Definitely a great addition to my Localization and Globalization info.
OT: Found my first commercial use for ExtendedHtmlUtility.HtmlEncode() today: a client's website is hosted on an ISP's Apache server that is configured to ALWAYS send charset=Shift_JIS in the HTTP Content-Type header. This was making it impossible to serve Korean and Chinese pages from this server, since the W3C says:
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
- An HTTP "charset" parameter in a "Content-Type" field.
- A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
- The charset attribute set on an element that designates an external resource.
Which means the browser (IE, Firefox, Netscape, etc.) will ALWAYS think the page is Shift_JIS (Japanese) and will not display Korean or Chinese text correctly!
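To make the precedence concrete, here is a minimal sketch of the three-step priority list quoted above. The function name and parameters are made up for illustration, and real browsers do more than this (byte-order marks, sniffing, user overrides), so treat it as a toy model of the W3C wording rather than actual browser behaviour.

```python
from typing import Optional


def effective_charset(http_charset: Optional[str],
                      meta_charset: Optional[str],
                      link_charset: Optional[str]) -> Optional[str]:
    """Return the first charset that is set, in the W3C priority order.

    Hypothetical helper: the names are illustrative, not a real API.
    """
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return None


# The ISP's header wins even though the page's META tag asks for Korean.
print(effective_charset("Shift_JIS", "EUC-KR", None))  # Shift_JIS
```

As long as the server sends a charset in the header, nothing inside the page itself can override it.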
By converting ALL the non-ASCII characters (well, all non-Shift_JIS characters, actually) into HTML numeric character references (e.g. &#1234;), the page is displayed successfully in Korean or Chinese even with the encoding set to Shift_JIS. This works because the characters &, #, 0-9 and ; are all valid Shift_JIS characters, and once the references are resolved into their Unicode characters, the browser is happy to display them using whatever font settings (or mappings) it knows about, regardless of the actual page encoding!
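For anyone who wants to try the idea outside .NET, here is a rough sketch in Python. It is not the ExtendedHtmlUtility.HtmlEncode() implementation, just the same trick using the standard library's "xmlcharrefreplace" error handler; the function name encode_non_ascii is my own. It takes the simpler route of converting every non-ASCII character, and it deliberately leaves markup characters such as < and & alone so that existing tags survive.

```python
def encode_non_ascii(html: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference.

    Hypothetical helper for illustration. ASCII characters, including
    markup such as < > &, are passed through unchanged.
    """
    return html.encode("ascii", "xmlcharrefreplace").decode("ascii")


page = "<p>한국어 / 中文</p>"
encoded = encode_non_ascii(page)

# The result contains only ASCII characters, so it can be written out
# as Shift_JIS bytes without any loss.
raw_bytes = encoded.encode("shift_jis")
print(encoded)
```

Because the output is pure ASCII, it does not matter which charset the server announces, as long as that charset is ASCII-compatible for these characters.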
It's not ideal, but at least it works - even in Netscape 4.7 (as long as you have specified the correct fonts, because we all know how dumb NS4 is at font substitution). I suspect that if the pages had contained any 'text' within JavaScript strings/variables/etc., that would have caused a problem... Luckily not (this time).