In all cases, text is placed on a canvas using javascript code similar to the following.
context.fillText("25°C", 100, 100); |
Unfortunately, in some cases (details below) the Degree Symbol is shown on a canvas as a black diamond with a question mark (the Unicode replacement character, U+FFFD) :(
After significant research, I think I understand (most of) what is happening. Basically, the Degree Symbol is encoded differently, and interpreted differently, depending on whether a text file is stored as (encoded as) ANSI or UTF-8. For most browsers (that I've tested)
These are the testcases - UTF8, ANSI, ANSI with Chrome Failure, and ANSI with meta tag - each one is configured to test a specific combination of parameters.
While developing these pages, I stumbled on to (and documented) several browser design problems.
Controlling the encoding
Application | Encoding Option | Notes |
---|---|---|
notepad | ANSI | |
notepad | Unicode | examdiff thinks this is binary |
notepad | UTF-8 | examdiff can compare these |
wordpad | Text Document | ANSI encoding, also changes unix line endings to DOS version |
wordpad | Text Document - MS-DOS Format | ANSI encoding |
wordpad | Unicode Text Document | examdiff claims it is binary |
I was able to verify that the Chrome 49 SaveAs... produces a UTF-8 file. However, I could only get that option via Inspect / Paused / Sources (where only javascript files can be edited, it was not available via Inspect / Elements where the main html file can be edited).
Warning: | If the main html file is encoded as Unicode, then javascript files encoded as ANSI (8 bits per character) may be automatically assumed to be Unicode (UTF-16 - 16 bits per character) and, as a result, will not be processed. Using the Chrome 49 debug window, the file was read, but the characters were nonsense. On Windows 10, Chrome and Firefox had similar issues, however, MS Edge and IE correctly interpreted the javascript file. |
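The "nonsense" characters in that warning can be reproduced with a small sketch. It assumes the browser pairs the 8-bit bytes little-endian (the way Windows "Unicode" files are stored); `decodeBytesAsUtf16` is a hypothetical helper, not part of the test pages.

```javascript
// Sketch: what happens when 8-bit (ANSI/ASCII) bytes are decoded as UTF-16.
// Assumption: bytes are paired little-endian, as in Windows "Unicode" files.
function decodeBytesAsUtf16(text) {
  // Store each character as one byte (valid for the plain ASCII used here).
  const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
  // Decode those same bytes as if every 2 bytes formed one UTF-16 code unit.
  return new TextDecoder("utf-16le").decode(bytes);
}

const source = "var x = 1;"; // 10 one-byte characters
const misread = decodeBytesAsUtf16(source);
console.log(misread.length); // 5 - two bytes were consumed per character
console.log(misread.charCodeAt(0).toString(16)); // 6176 - "v" (76) and "a" (61) fused
```

None of the resulting characters are valid javascript, which would fit the file being read but not processed.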
The character encoding for the early web was ASCII. Later, from HTML 2.0 to HTML 4.01, ISO-8859-1 was considered the standard. (ANSI is identical to ISO-8859-1 except that ANSI has 32 extra characters.) With XML and HTML5, UTF-8 finally arrived, and solved a lot of character encoding problems. |
For example, I took one of my test files, copied it, and saved the copy using each of the encoding types - the sizes in the table below are in actual bytes.
Encoding | File Size | Size on Disk |
---|---|---|
ANSI | 5,054 | 8,192 |
UTF-8 | 5,060 | 8,192 |
Unicode | 10,110 | 12,288 |
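The size differences can be estimated with a short sketch. It assumes every character fits in one ANSI byte and one UTF-16 code unit, and that notepad writes a 3-byte header for UTF-8 and a 2-byte header for Unicode; `encodedSizes` is a hypothetical helper.

```javascript
// Sketch: size in bytes of the same text under the three notepad encodings.
// Assumptions: every character fits in one ANSI byte and one UTF-16 code unit,
// and the file starts with a header of 3 bytes (UTF-8) or 2 bytes (Unicode).
function encodedSizes(text) {
  const utf8Body = new TextEncoder().encode(text).length;
  return {
    ansi: text.length,            // 1 byte per character, no header
    utf8: 3 + utf8Body,           // header + 1 byte per ASCII char, 2 per Degree Symbol
    unicode: 2 + 2 * text.length, // header + 2 bytes per character
  };
}

const sizes = encodedSizes("25\u00B0C");
console.log(sizes); // { ansi: 4, utf8: 8, unicode: 10 }
```

The Unicode size in the table fits this model (10,110 = 2 x 5,054 + 2). The UTF-8 size would fit a file containing three non-ASCII characters (5,060 = 5,054 + 3 + 3), but that count is an inference from the sizes, not a measured value.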
Combine that with the warning above, and the fact that this page is necessary, and you get an idea of why I don't like unicode. There is more, but that is not the purpose of this page.
Other file issues
examdiff won't compare them because it claims that they are binary!!! |
This was pretty frustrating. The problem was that the UTF-8 text file starts with 3 "binary" (not visible in notepad) characters that define the type of encoding (the UTF-8 byte order mark, bytes EF BB BF) and only had a few additional (normal) characters in it. When I added more characters - the "problem" went away. Specifically.
While the fix was extremely simple, I lost a lot of time trying to figure out why notepad was creating binary files, when the real problem (I think) was that examdiff was attempting to verify that the file was not binary - and that test was failing.
With wordpad, examdiff reported that any file saved as Unicode Text Document is also binary. Just a guess - it is saved as UTF-16 - but I have no way to verify that. At any rate, and because of this, I only use notepad to intentionally produce unicode files.
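One way to check the UTF-16 guess would be to look at the first bytes of the file. This is a minimal sketch, assuming the raw file contents are available as a Uint8Array; `detectBom` is a hypothetical helper, and a file without a header can not be identified this way.

```javascript
// Sketch: guess a file's encoding from its first bytes (the byte order mark).
// Assumption: `bytes` is a Uint8Array holding the raw file contents.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "UTF-16 little-endian";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "UTF-16 big-endian";
  }
  return "no header (ANSI, or UTF-8 saved without one)";
}

console.log(detectBom(new Uint8Array([0xEF, 0xBB, 0xBF, 0x2F, 0x2A]))); // UTF-8
console.log(detectBom(new Uint8Array([0xFF, 0xFE, 0x2F, 0x00])));       // UTF-16 little-endian
```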
Meta tag
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
At any rate, when I made a local copy of an application which displayed Degree Symbols, Chrome not only saved the html text as UTF-8, but it also automatically added that tag.
Of course, when that tag is present and the html file is actually encoded as ANSI, there are problems!!!
As shown on the separate test page, the Local HTML and ANSI js canvases fail with the black diamond .. as do all the Degree Symbols present in the html text. Only the symbols entered as &deg; and those placed on a canvas via a javascript file saved with UTF-8 encoding (UTF-8 js) are rendered as expected.
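The mismatch can be sketched outside the browser. The byte values below are the standard ANSI and UTF-8 encodings of "25°C"; treating the meta tag as "decode everything as UTF-8" is a simplification of what a browser actually does.

```javascript
// Sketch: the ANSI byte for the Degree Symbol (B0) is not valid UTF-8 on its own.
const ansiBytes = new Uint8Array([0x32, 0x35, 0xB0, 0x43]);       // "25°C" saved as ANSI
const utf8Bytes = new Uint8Array([0x32, 0x35, 0xC2, 0xB0, 0x43]); // same text saved as UTF-8

const decoder = new TextDecoder("utf-8"); // what charset=UTF-8 asks for
const fromAnsi = decoder.decode(ansiBytes);
const fromUtf8 = decoder.decode(utf8Bytes);

console.log(fromAnsi === "25\uFFFDC"); // true - U+FFFD is the black diamond
console.log(fromUtf8 === "25\u00B0C"); // true - the Degree Symbol survives
```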
At some point (several years ago) I removed this tag from the file while troubleshooting basically the same Degree Symbol problem. Since Chrome obviously added it, and since I had no idea why, I simply deleted it (actually, commented it out). Since the page still displayed the same as before, I figured that it had no purpose and left it that way. During research for this page, I re-enabled it and discovered this issue.
Notice that this failure, and the Chrome 63 only issue (testcase) show black diamonds in exactly the same locations - except that this error occurs in all (tested) browsers .. not just Chrome 63.
The ANSI js canvas error always occurs when the test page is loaded from the local file server .. but sometimes works (shows the Degree Symbol) when loaded from a remote web server. I inspected the HTTP headers, but cannot determine why this happens.
Testcase Description
(For some applications, either (or both) the html and javascript may be generated by a program on the server. The problems described on this page still apply, but I have made no effort to test those scenarios.)
In order to demonstrate this issue, I have provided 4 additional html pages and 2 javascript files.
There are a number of additional testcases (scenarios) that could be (and, perhaps, should be) tested, but these are enough (for now) to suggest a fix for any additional failures that might be encountered.
Those four testcases were run on 5 browsers.
Browser | Version | OS | Issues | Issue Details |
---|---|---|---|---|
Chrome | 49 | XP | | |
Chrome | 63 | 10 | Fails if 3 character unicode header anywhere on page | Chrome 63 |
Firefox | 47 | 10 | | |
MS Edge | 25 | 10 | Refused to read the javascript files with comment in the header | MS Edge |
IE | 11 | 10 | Strange - OK when launched via Edge, may Fail with refresh | MS IE11 |
These are screen captures of the various testcases - when testing from a file server, only 3 are necessary since 2 of them are identical. However, when testing via the web, a few of the results are different (and not consistent!).
UTF-8 encoded file | |
---|---|
This is why I suggest that all javascript files which contain the
Degree Symbol be saved as UTF-8
| |
ANSI encoded file | |
---|---|
When the main HTML file is ANSI, it doesn't matter how the
javascript files are encoded
| |
These 2 testcases (files) produce the same results
The ANSI encoded file with a meta-tag override (error) demonstrates that issue. The ANSI encoded file plus a few extra characters demonstrates the Chrome 63 design problem. | |
---|---|
Notice that there are 3 black diamonds - Local HTML, ANSI js,
and Sample code
| |
Weird results when read via the web - Chrome 63 only
The ANSI encoded file with a meta-tag override (error) may display differently | |
---|---|
Notice that there are only 2 black diamonds - Local HTML
and Sample code
| ANSI js always fails when read locally, but not always when read via the web. I have seen it both ways! Ctrl-F5 (reload scripts) causes this to look like the other browsers with 3 black diamonds |
Based on those results
This probably explains why my working application (back in 2015) suddenly stopped working and I could not figure out what happened - since the main html file was always saved as UTF-8, I probably refactored some javascript code from the main html file into a separate file saved with ANSI encoding.
Chrome 63 - Local vs Web
I was totally surprised to see different results after uploading these pages to a server - errors and results that I spent several days trying to characterize were no longer the same. New browser differences appeared. Many of these are intermittent - they fail the first time, and then magically go away.
It gets worse - When switching between testcases on the server, sometimes there will be an error. Refresh (F5) has no effect, but ctrl-F5 reloads the same javascript files (no change in files) and the display will change. It appears that the server is screwing with the file formatting! However, Chrome 63 is the only browser (of the 5 I tested) that has this problem.
In general, I can force some issues to go away by pressing ctrl-F5 to reload the javascript files. However, requiring users to perform this "extra" task is never acceptable.
Chrome 63 test summary | ||
---|---|---|
Testcase | Local | Web |
ANSI encoded file | All symbols correct | ANSI js may be black diamond - ctrl-F5 to fix |
ANSI with hex codes | All symbols correct | ANSI hex_js may be black diamond - ctrl-F5 to fix |
ANSI Chrome Fails | Always fails | Always works - no issue |
UTF8 with hex codes also has intermittent results - the second Degree Symbol on the ANSI canvas fails in all browsers, but with Chrome 63, ctrl-F5 fixes it.
Regardless of the results, because I need all my apps to run both locally and via the web, I must follow the most restrictive rules. Since the UTF-8 encoded javascript files appear to always work as expected, that remains my suggested way to avoid these problems.
(Intermittent software - give me a break!)
How I found this problem
ï »¿/* At beginning of the file - the space between the first 2 characters was not there (°C) This produced (°C) on the canvas |
Degree Symbol on a Canvas | |
---|---|
Correct - Before "fix" | After "fix" (Saving with Wordpad) |
This is how the Degree Symbol should look | Black diamond where the Degree Symbol should be |
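The stray characters at the beginning of the file can be reproduced with a short sketch that views the bytes of UTF-8 text one byte per character, roughly what an ANSI-only view of the file shows. It relies on bytes A0-FF mapping to the same characters in ANSI and in Unicode, which covers every byte used here; `viewUtf8AsAnsi` is a hypothetical helper.

```javascript
// Sketch: view the bytes of UTF-8 text one byte per character.
// Assumption: only bytes below 80 or in the range A0-FF occur, where the
// ANSI character and the Unicode character with the same number are identical.
function viewUtf8AsAnsi(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const header = viewUtf8AsAnsi("\uFEFF/*");   // file header followed by a comment start
const degree = viewUtf8AsAnsi("(\u00B0C)");  // "(°C)" stored as UTF-8
console.log(header); // header bytes EF BB BF show up as 3 extra characters before "/*"
console.log(degree); // an extra character (byte C2) shows up before the Degree Symbol
```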
Ways to represent the Degree Symbol
Standard representations of the Degree Symbol
Char | Dec | ANSI Hex | Unicode Hex | HTML Entity | Name |
---|---|---|---|---|---|
° | 176 | B0 | 00B0 | &deg; | Degree Sign |
w3schools refs ansi ISO-8859 UTF-8 |
Just to be different, in wordpad the Degree Symbol can be entered via the number pad using either numeric code - 248 or 176!! Using notepad, 176 produces a black bar and a unicode warning when the file is saved. (More details below)
In a javascript file, you can
However, you cannot use the html Entity - &deg; (°) - it works in html text, but not when writing on a canvas using javascript.
If you prefer to use a hex escape in a javascript string, replace the "°" with "\xB0"
"(\xB0C)"
context.fillText("25\xB0C", 100, 100); |
ANSI with hex codes and UTF8 with hex codes use modified versions of the javascript to demonstrate the hex codes on all 3 canvases. They are not included with the other testcases because this is not my suggested solution. YMMV
When run locally with Chrome 63, ANSI with hex codes shows Degree Symbols in all test locations. However, when run via the web, the Degree Symbol typed directly into the ANSI javascript file shows on the canvas and in the debug inspector as a black diamond!!!
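The hex-string approach can be sketched as follows. All three forms below are standard javascript and use only ASCII characters in the source, so the string does not depend on how the javascript file is saved; `formatCelsius` is a hypothetical helper, not part of the test pages.

```javascript
// Sketch: three ways to write the Degree Symbol using only ASCII characters.
const viaHex = "25\xB0C";                                  // 2-digit hex escape
const viaUnicode = "25\u00B0C";                            // 4-digit unicode escape
const viaCharCode = "25" + String.fromCharCode(176) + "C"; // decimal code

console.log(viaHex === viaUnicode && viaUnicode === viaCharCode); // true

// Hypothetical helper for building temperature labels.
function formatCelsius(value) {
  return value + "\xB0C";
}
// Usage on a canvas, assuming `context` is a 2D canvas context:
// context.fillText(formatCelsius(25), 100, 100);
```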
For completeness, HTML5 supports special codes for °F and °C, but whether or not they are supported is browser dependent (ie, don't use these).
Char | Dec | Hex | Name | Code used in html |
---|---|---|---|---|
℃ | 8451 | 2103 | Degree Celsius | &#8451; or &#x2103; |
℉ | 8457 | 2109 | Degree Fahrenheit | &#8457; or &#x2109; |
ref |
As mentioned above, the extended ASCII codes can be entered via the number pad. Since one of them (the Degree Symbol) is ok in an ANSI encoded file, I assumed (incorrectly) that they all are.
When I entered the Degree Symbol into notepad via the number pad alt + 248 and attempted to save the file (using alt fs) - no problem.
However, I originally tried alt + 176 (the decimal ANSI value). In this case, alt fs produced the following message (same in XP and 10 - with the long path and filename removed).
\\[long path]\[filename].txt
This file contains characters in Unicode format which will be lost if you save this file as an ANSI encoded text file. To keep the Unicode information, click Cancel below and then select one of the Unicode options from the Encoding drop down list. Continue? --------------------------- OK Cancel --------------------------- |
To be clear, I discovered the ANSI and UTF-8 options only because I mistakenly tried the wrong keyboard value and saw that message. Previously, I had no clue. These can be accessed any time via the File / SaveAs... dialog under Encoding and I used them repeatedly while trying to understand this problem.
As mentioned above, wordpad has a similar option for ANSI (Text Document), but no option for UTF-8. Just to be different, the Degree Symbol can be entered using either numeric code - 248 or 176!! When entering other, random, codes, it does not produce a warning message.
Old Notes
05-28-15 v0.2 Fixed the degree-C symbols in the hlv graphs The degree characters used in html and javascript are different, copy in notepad and wordpad does not work, copy in Chrome does. |
At any rate, for that application, the current version (and all the archive versions) of the main html file are saved as UTF-8 and that is where many of the Degree Symbols are. Apparently (based on file dates), I refactored code from an ANSI javascript file to the UTF-8 html file and that fixed the problem - but without me understanding why.
At some point, based on that note and the current file types, I presume that I used Chrome to paste the Degree Symbol into an existing ANSI javascript file and, when I saved it, Chrome converted it to UTF-8. Later code files worked because they were made by simply copying existing (working) files.
When visiting the original application page and inspecting the Response Headers via the Chrome debugger, these are displayed.
Content-Type:text/html; charset=UTF-8 Last-Modified:Fri, 20 Mar 2015 05:42:55 GMT |
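When comparing headers like these between the html file and the javascript files, the part that matters here is the charset parameter. A minimal sketch for pulling it out of a header value follows; `charsetFromContentType` is a hypothetical helper, and the second header value is only an example of a response that names no charset.

```javascript
// Sketch: extract the charset parameter from a Content-Type header value.
// Hypothetical helper; returns null when no charset parameter is present.
function charsetFromContentType(headerValue) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue);
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType("text/html; charset=UTF-8")); // utf-8
console.log(charsetFromContentType("application/javascript"));   // null
```

When a response carries no charset, the browser has to pick an encoding some other way, which may be one source of the differences between loading locally and loading via the web.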
When this original program is saved via Chrome (I did it again today), there are 2 files - one html and one javascript. Both files are saved with unix line endings, the html file was saved with UTF-8 encoding, the javascript file with ANSI encoding. Chrome also replaced the original html header
However, this means that the Degree Symbol problem was "fixed" (hacked) before I removed the tag in the next version.
Typically, I use wordpad to remove the unix line endings - in this test of the html file, it did and it saved the file keeping the UTF-8 encoding. This is interesting since the "problem" this page is about was caused by wordpad saving a file with UTF-8 encoding as an ANSI encoded file. I don't remember (since it was several days ago), but I probably selected either Text Document or Text Document - MS-DOS Format on the SaveAs.. dialog - which I have verified both convert the file encoding from UTF-8 to ANSI which is what I was originally (and cluelessly) trying to do.
Author: Robert Clemenzi