Doctypes, Compatibility Modes, Charsets and Fonts

LMAX Exchange

This information is all covered in much more detail elsewhere on the web, but for my own future reference, here’s a primer on doctypes, compatibility modes, charsets and fonts. It came out of having to explain why certain Chinese characters weren’t showing up in IE8. Of course the best answer is that you need to have the East Asian font pack installed and then it (usually) just works, but the background tends to be useful and saves “server side” folks from a number of gotchas.

Doctypes and Compatibility Modes

  • IE 8 and above have an insane array of compatibility modes which are out to get you. The most common gotcha is that IE will use compatibility mode (emulating IE7) if the website is in the “intranet zone”. You can turn this off with the “Display intranet sites in Compatibility View” checkbox in the Compatibility View settings dialog. You wind up in the intranet zone if you’re accessing a site via a name that doesn’t look like a real domain name, typically a bare hostname with no dots in it (e.g. http://dog/ is in the intranet zone).
  • If you can avoid falling into compatibility mode, any page whose DOCTYPE is <!DOCTYPE html> or <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> will render in standards-compliant mode (where the world is as sane as it gets in web development). Go with the shorter version unless you have a reason not to.
  • http://hsivonen.iki.fi/doctype/ is the gold standard for information about browser modes.
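If you want to check a list of test URLs for the intranet-zone gotcha above, the “no dots in the hostname” case can be approximated in a few lines of Python. This is only a rough sketch: the function name looks_like_intranet_host is made up for illustration, and it deliberately ignores the other ways a site can land in the intranet zone (proxy bypass lists, explicit zone settings and so on).

```python
from urllib.parse import urlsplit


def looks_like_intranet_host(url: str) -> bool:
    """Rough approximation: a hostname with no dots, like http://dog/.

    Hypothetical helper for illustration only; IE's real zone logic
    has more inputs than the hostname.
    """
    host = urlsplit(url).hostname
    if host is None:
        # No hostname at all (e.g. a relative URL): nothing to classify.
        return False
    return "." not in host


print(looks_like_intranet_host("http://dog/"))              # True
print(looks_like_intranet_host("http://www.example.com/"))  # False
```

If your test environment is reached via a short internal name, that is a hint to test with a fully qualified name as well, since customers will never see the compatibility-mode rendering.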

Character Sets and Fonts

  • If you’re tracking down problems with foreign languages there are two major categories of problem: encoding corruption (where characters come out garbled or as ?) and missing glyphs in fonts (where characters come out as little boxes).
  • Corruption is fixed by specifying the same character encoding everywhere. It is a security issue if any webpage is missing a meta tag defining the character set (smallest variant is <meta charset="UTF-8">). Make it the first tag in the <head> of the document: the HTML spec requires the declaration to sit within the first 1024 bytes, and putting it first means the browser sees it before any other content.
  • Little square boxes mean that the font currently in use doesn’t have a glyph for that particular character and the font fallback routine was unable to find any font on the system which supports it.
  • Browsers have a default stylesheet, automatically applied to every page, which commonly sets a specific font-family and font-size for text input elements. So adding the style body { font-family: 'Arial Unicode MS' } may get some Asian characters working in the main content but not in text boxes, unless you also add input { font-family: inherit; }.
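To make the difference between the two kinds of encoding corruption concrete, here is a small Python sketch. The sample string and variable names are my own; it simply encodes some Chinese text one way and decodes it another, which is roughly what happens when the server and browser disagree about the charset.

```python
text = "中文"  # two Chinese characters, used purely as sample data

utf8_bytes = text.encode("utf-8")

# Garbled output: bytes written as UTF-8 but read back as Latin-1.
# Every byte becomes its own (wrong) character.
garbled = utf8_bytes.decode("latin-1")

# Question marks: the text was forced through an encoding that cannot
# represent it, so each character was replaced and the data is lost.
question_marks = text.encode("ascii", errors="replace")

# Same encoding on both sides: the text survives intact.
round_trip = utf8_bytes.decode("utf-8")

print(len(garbled))         # 6 characters instead of 2
print(question_marks)       # b'??'
print(round_trip == text)   # True
```

Little boxes are a different problem entirely: the string is intact at that point (the round trip succeeds) and only the font is missing the glyph, so no amount of charset fiddling will fix them.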

The security issue mentioned above is that any page which doesn’t define a character set but includes any form of user supplied content is vulnerable to a cross site scripting injection attack, even if the user supplied content is escaped properly. The content may include a byte sequence that causes the browser to guess the wrong encoding, switching to UTF-7 or another unusual character set, which drastically changes the meaning of the content on the page (hence the user content is no longer correctly escaped). Modern browsers have taken steps to remove this risk (including removing support for UTF-7, I believe) but it’s good practice to specify your charset explicitly anyway.
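A rough way to see why escaping doesn’t help is to simulate the browser’s wrong guess in Python. The payload below is a made-up example, and decoding as UTF-7 here only stands in for what an old browser would do after sniffing the encoding; it is not a reproduction of any particular browser’s behaviour.

```python
import html

# Hypothetical user-supplied content: "<script>" spelled in UTF-7.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Standard HTML escaping finds nothing to escape, because the payload
# contains no <, >, &, or quote characters.
escaped = html.escape(payload)
print(escaped == payload)  # True

page_bytes = escaped.encode("ascii")

# Read as UTF-8 (what the server intended), it is harmless text.
as_utf8 = page_bytes.decode("utf-8")

# Read as UTF-7 (the wrong guess), the markup reappears.
as_utf7 = page_bytes.decode("utf-7")
print(as_utf7)  # <script>alert(1)</script>
```

With an explicit charset declared on the page the browser never has to guess, so the bytes stay as the harmless text the server intended.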
