context.fillText("25°C", 100, 100);
Unfortunately, in some cases (details below), the Degree Symbol is shown on a canvas as a black diamond with a question mark (the Unicode replacement character, U+FFFD)! :(
After significant research, I think I understand (most of) what is happening. Basically, the Degree Symbol is encoded differently, and interpreted differently, depending on whether a text file is stored as (encoded as) ANSI or UTF-8. For most browsers (that I've tested)
These are the testcases: UTF8, ANSI, ANSI with Chrome Failure, and ANSI with meta tag. Each one is configured to test a specific combination of parameters.
While developing these pages, I stumbled onto (and documented) several browser design problems.
Controlling the encoding
|Unicode||examdiff thinks this is binary|
|UTF-8||examdiff can compare these|
|wordpad||Text Document||ANSI encoding, also changes unix line endings to DOS version|
|Text Document - MS-DOS Format||ANSI encoding|
|Unicode Text Document||examdiff claims it is binary|
To quote from w3schools
The character encoding for the early web was ASCII. Later, from HTML 2.0 to HTML 4.01, ISO-8859-1 was considered the standard. (ANSI is identical to ISO-8859-1 except that ANSI has 32 extra characters.) With XML and HTML5, UTF-8 finally arrived, and solved a lot of character encoding problems.
For example, I took one of my test files, copied it, and saved the copy using each of the encoding types - the sizes in the table below are in actual bytes.
|Code||File Size||Size on Disk|
|ANSI||5,054||8,192|
|UTF-8||5,060||8,192|
|Unicode||10,110||12,288|
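The file sizes in the table are consistent with how each encoding stores characters. The following is a minimal sketch, not the method used to measure the files above. It assumes the text is mostly ASCII plus a few Degree Symbols, that notepad's UTF-8 option adds a 3-byte marker, and that "Unicode" means UTF-16 with a 2-byte marker. The function name `encodedSizes` and the sample text are hypothetical.

```javascript
// Hypothetical helper: estimate how many bytes a string occupies in each
// of notepad's three encodings. Assumes every character is in the
// ANSI range (one byte each) and below U+10000 (two bytes each in UTF-16).
function encodedSizes(text) {
  const utf8Body = new TextEncoder().encode(text).length; // TextEncoder always emits UTF-8
  return {
    ansi: text.length,          // one byte per character
    utf8: 3 + utf8Body,         // assumed 3-byte marker (EF BB BF) + body
    utf16: 2 + 2 * text.length, // assumed 2-byte marker (FF FE) + two bytes per character
  };
}

// "25°C" written with an escape so this snippet itself stays pure ASCII.
const sample = "25\u00B0C";
const sizes = encodedSizes(sample);
console.log(sizes);
```

Applied to the table: 10,110 = 2 + 2 x 5,054, and 5,060 = 5,054 + 3 + 3, which would fit a file containing three characters that need an extra byte in UTF-8. That count is an inference from the numbers, not something measured here.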
Combine that with the warning above, and the fact that this page is necessary, and you get an idea of why I don't like unicode. There is more, but that is not the purpose of this page.
Other file issues
examdiff won't compare them because it claims that they are binary!!!
This was pretty frustrating. The problem was that the UTF-8 text file starts with 3 "binary" (not visible in notepad) characters that define the type of encoding (the UTF-8 byte order mark, bytes EF BB BF) and only had a few additional (normal) characters in it. When I added more characters - the "problem" went away. Specifically.
While the fix was extremely simple, I lost a lot of time trying to figure out why notepad was creating binary files, when the real problem (I think) was that examdiff was attempting to verify that the file was not binary - and that test was failing.
With wordpad, examdiff reported that any file saved as Unicode Text Document is also binary. Just a guess - it is saved as UTF-16 - but I have no way to verify that. At any rate, and because of this, I only use notepad to intentionally produce unicode files.
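One way to check the UTF-16 guess is to look at the first few bytes of the saved file. This is a minimal sketch, assuming the file's bytes are already available as an array. The function name `detectBom` is hypothetical, and it only knows the three markers listed.

```javascript
// Hypothetical helper: identify a file's encoding marker (byte order mark)
// from its first bytes. Returns null when none of the listed markers is
// present, which is the normal case for an ANSI file.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "UTF-16LE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "UTF-16BE";
  }
  return null;
}

// Example: the first bytes of a notepad UTF-8 file that begins with "/*".
console.log(detectBom(new Uint8Array([0xEF, 0xBB, 0xBF, 0x2F, 0x2A])));
```

In Node.js, for instance, the bytes could come from `fs.readFileSync(path)`. A result of "UTF-16LE" for a wordpad Unicode Text Document would support the guess above, but that has not been checked here.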
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
At any rate, when I made a local copy of an application which displayed Degree Symbols, Chrome not only saved the html text as UTF-8, but it also automatically added that tag.
Of course, when that tag is present and the html file is actually encoded as ANSI, there are problems!!!
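The mismatch can be reproduced outside the browser. This is a minimal sketch, assuming the ANSI file stores the Degree Symbol as the single byte 0xB0 and that the browser decodes the page strictly as UTF-8 because the tag says so. The variable names are illustrative only.

```javascript
// The bytes of "25°C" as stored in an ANSI encoded file:
// the Degree Symbol is the single byte 0xB0.
const ansiBytes = new Uint8Array([0x32, 0x35, 0xB0, 0x43]);

// Decode them as UTF-8, as a browser would when the meta tag claims UTF-8.
// A lone 0xB0 is not a valid UTF-8 sequence, so the decoder substitutes
// U+FFFD, the replacement character.
const decoded = new TextDecoder("utf-8").decode(ansiBytes);

console.log(decoded === "25\uFFFDC");
console.log(decoded.codePointAt(2).toString(16));
```

U+FFFD is the character that most fonts draw as a black diamond with a question mark.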
At some point (several years ago) I removed this tag from the file while troubleshooting basically the same Degree Symbol problem. Since Chrome obviously added it, and since I had no idea why, I simply deleted it (actually, commented it out). Since the page still displayed the same as before, I figured that it had no purpose and left it that way. During research for this page, I re-enabled it and discovered this issue.
Notice that this failure, and the Chrome 63 only issue (testcase) show black diamonds in exactly the same locations - except that this error occurs in all (tested) browsers .. not just Chrome 63.
The ANSI js canvas error always occurs when the test page is loaded from the local file server, but the page sometimes works (shows the Degree Symbol) when loaded from a remote web server. I inspected the HTTP headers, but cannot determine why this happens.
There are a number of additional testcases (scenarios) that could be (and, perhaps, should be) tested, but these are enough (for now) to suggest a fix for any additional failures that might be encountered.
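One candidate fix for script files, in line with the hex code testcases on this page, is to write the Degree Symbol as an escape so that the source is pure ASCII and reads the same whether it is saved as ANSI or UTF-8. This is a minimal sketch, not a guaranteed cure: the results on this page show that the hex versions also misbehaved intermittently in Chrome 63. The helper name `celsiusLabel` is hypothetical.

```javascript
// Three ASCII-only spellings of the Degree Symbol (U+00B0).
const viaHex = "\xB0";
const viaUnicode = "\u00B0";
const viaCharCode = String.fromCharCode(176);

// All three produce the same one-character string.
console.log(viaHex === viaUnicode && viaUnicode === viaCharCode);

// Hypothetical helper: build a label for context.fillText() without
// putting any non-ASCII character in the source file.
function celsiusLabel(value) {
  return value + "\xB0C";
}

console.log(celsiusLabel(25).length);
```

On a page, the label would be drawn with something like `context.fillText(celsiusLabel(25), 100, 100);`.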
Those four testcases were run on 5 browsers.
|63||10||Fails if 3 character unicode header anywhere on page||Chrome 63|
|IE||11||10||Strange - OK when launched via Edge, may Fail with refresh||MS IE11|
These are screen captures of the various testcases - when testing from a file server, only 3 are necessary since 2 of them are identical. However, when testing via the web, a few of the results are different (and not consistent!).
|UTF-8 encoded file|
Degree Symbol be saved as UTF-8
|ANSI encoded file|
||When the main HTML file is ANSI, it doesn't matter how the
|These 2 testcases (files) produce the same results
The ANSI encoded file with a meta-tag override (error) demonstrates that issue
ANSI encoded file plus a few extra characters demonstrates the Chrome 63 design problem
||Notice that there are 3 black diamonds - Local HTML, ANSI js,
and Sample code
|Weird results when read via the web - Chrome 63 only
The ANSI encoded file with a meta-tag override (error) may display differently
||Notice that there are only 2 black diamonds - Local HTML
and Sample code
ANSI js always fails when read locally, but not always when read via the web. I have seen it both ways!
Ctrl-F5 (reload scripts) causes this to look like the other browsers with 3 black diamonds
Based on those results
Chrome 63 - Local vs Web
I was totally surprised to see different results after uploading these pages to a server - errors and results that I spent several days trying to characterize were no longer the same. New browser differences appeared. Many of these are intermittent - they fail the first time, and then magically go away.
|Chrome 63 test summary|
|ANSI encoded file||All symbols correct||ANSI js may be black diamond - ctrl-F5 to fix|
|ANSI with hex codes||All symbols correct||ANSI hex_js may be black diamond - ctrl-F5 to fix|
|ANSI Chrome Fails||Always fails||Always works - no issue|
UTF8 with hex codes also has intermittent results - the second Degree Symbol on the ANSI canvas fails in all browsers, but with Chrome 63, ctrl-F5 fixes it.
(Intermittent software - give me a break!)
How I found this problem
|ï »¿/*||At beginning of the file - the space between the first 2 characters was not there|
|(Â°C)||This produced (°C) on the canvas|
|Degree Symbol on a Canvas|
|Correct - Before "fix"||After "fix" (Saving with Wordpad)|
||This is how the Degree Symbol should look
||Black diamond where the Degree Symbol should be
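The odd characters seen at the start of the file and in front of the Degree Symbol are what UTF-8 bytes look like when an editor shows them one byte per character. This is a minimal sketch of that reading, assuming the editor treats each byte as a Latin-1 character. The helper name `readAsLatin1` is hypothetical.

```javascript
// Hypothetical helper: reinterpret bytes one byte per character, as an
// ANSI/Latin-1 viewer would. (Windows-1252 differs from Latin-1 only in
// the 0x80-0x9F range, which none of the bytes below use.)
function readAsLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// "(°C)" encoded as UTF-8 is the byte sequence 28 C2 B0 43 29.
const utf8Degree = new TextEncoder().encode("(\u00B0C)");
// The 3-byte marker at the start of a notepad UTF-8 file.
const marker = new Uint8Array([0xEF, 0xBB, 0xBF]);

console.log(readAsLatin1(utf8Degree)); // (Â°C)
console.log(readAsLatin1(marker));     // ï»¿
```

The reverse mistake, reading an ANSI byte as UTF-8, is what produces the black diamond.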
Ways to represent the Degree Symbol
Standard representations of the Degree Symbol
|w3schools refs ansi ISO-8859 UTF-8|
Just to be different, in wordpad the Degree Symbol can be entered via the number pad using either numeric code - 248 or 176!! Using notepad, 176 produces a black bar and a unicode warning when the file is saved. (More details below.)
"(\xB0C)"
context.fillText("25\xB0C", 100, 100);
For completeness, HTML5 supports special codes for °F and °C, but whether or not they are supported is browser dependent (i.e., don't use these).
|Char||Dec||Hex||Name||Code used in html|
|℃||8451||2103||Degree Celsius||&amp;#8451; or &amp;#x2103;|
|℉||8457||2109||Degree Fahrenheit||&amp;#8457; or &amp;#x2109;|
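The Dec and Hex columns map directly onto HTML numeric character references, so the last column can be derived from the character itself. This is a minimal sketch; the helper name `numericReferences` is hypothetical.

```javascript
// Hypothetical helper: produce the decimal and hex numeric character
// references for the first character of a string.
function numericReferences(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    decimal: "&#" + codePoint + ";",
    hex: "&#x" + codePoint.toString(16).toUpperCase() + ";",
  };
}

console.log(numericReferences("\u2103")); // Degree Celsius
console.log(numericReferences("\u2109")); // Degree Fahrenheit
console.log(numericReferences("\u00B0")); // plain Degree Symbol
```

Deriving the references this way is a quick check that the decimal and hex codes in a table row refer to the same character.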
As mentioned above, the extended ASCII codes can be entered via the number pad. Since one of them (the Degree Symbol) is ok in an ANSI encoded file, I assumed (incorrectly) that they all are.
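When I entered the Degree Symbol into notepad via the number pad (alt + 248) and attempted to save the file (using alt fs) - no problem.
However, I originally tried alt + 176 (the decimal ANSI value). Typed without a leading zero, Alt codes are most likely looked up in the OEM code page, where 176 is a shaded block character rather than the Degree Symbol (the ANSI value would be entered as alt + 0176). In this case, alt fs produced the following message (same in XP and 10 - with the long path and filename removed).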
This file contains characters in Unicode format which will be lost if you save this file as an ANSI encoded text file. To keep the Unicode information, click Cancel below and then select one of the Unicode options from the Encoding drop down list. Continue?
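The warning is consistent with the two Alt codes producing different characters. This is a minimal sketch under stated assumptions: Alt codes typed without a leading zero are taken from OEM code page 437, and "ANSI" is simplified to mean code points up to 0xFF. The names `CP437_SAMPLE` and `fitsInAnsi` are hypothetical, and only the two relevant table entries are included.

```javascript
// Assumption: these are the code page 437 meanings of the two Alt codes.
// 176 is the light shade block (U+2591), 248 is the Degree Symbol (U+00B0).
const CP437_SAMPLE = { 176: 0x2591, 248: 0x00B0 };

// Hypothetical helper: would this character survive a save as ANSI?
// Simplification: treats ANSI as ISO-8859-1, i.e. code points up to 0xFF.
function fitsInAnsi(codePoint) {
  return codePoint <= 0xFF;
}

for (const altCode of [176, 248]) {
  const codePoint = CP437_SAMPLE[altCode];
  console.log(altCode, String.fromCodePoint(codePoint), fitsInAnsi(codePoint));
}
```

Under these assumptions, the character from alt + 176 cannot be stored in an ANSI file, which would explain both the black bar and the warning.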
To be clear, I discovered the ANSI and UTF-8 options only because I mistakenly tried the wrong keyboard value and saw that message. Previously, I had no clue. These can be accessed any time via the File / SaveAs... dialog under Encoding and I used them repeatedly while trying to understand this problem.
As mentioned above, wordpad has a similar option for ANSI (Text Document), but no option for UTF-8. Just to be different, the Degree Symbol can be entered using either numeric code - 248 or 176!! When entering other, random, codes, it does not produce a warning message.
When visiting the original application page and inspecting the Response Headers via the Chrome debugger, these are displayed.
Content-Type:text/html; charset=UTF-8
Last-Modified:Fri, 20 Mar 2015 05:42:55 GMT
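When comparing headers between servers, the declared encoding can be pulled out of the Content-Type value with a few lines of script. This is a minimal sketch; the helper name `charsetFromContentType` is hypothetical, and it only handles the simple header forms shown here.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value. Returns null when no charset is declared.
function charsetFromContentType(value) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(value);
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType("text/html; charset=UTF-8"));
console.log(charsetFromContentType("text/html"));
```

In a browser console, `document.characterSet` reports the encoding the page was actually decoded with, which can then be compared against the header value.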
However, this means that the Degree Symbol problem was "fixed" (hacked) before I removed the tag in the next version.
Typically, I use wordpad to remove the unix line endings. In this test of the html file, it did so and saved the file keeping the UTF-8 encoding. This is interesting, since the "problem" this page is about was caused by wordpad saving a UTF-8 encoded file as an ANSI encoded file. I don't remember (since it was several days ago), but I probably selected either Text Document or Text Document - MS-DOS Format on the SaveAs... dialog. I have verified that both convert the file encoding from UTF-8 to ANSI, which is what I was originally (and cluelessly) trying to do.