Note: This page covers XML files other than XHTML. Examples include RSS, XML files for Flash and other XML standards.
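Although the XML standard supports Unicode, it does not define HTML entity codes like &eacute; for accented é; only five named entities (&amp;, &lt;, &gt;, &apos; and &quot;) are built in. Numeric character references like &#961; for Greek ρ are part of the standard, but they make the source hard to read and edit.
Therefore the most reliable approach is to use keyboard utilities to enter the Unicode characters directly. Here's how a line of RSS XML code might look for a Spanish name. The next section will discuss how to create a Unicode XML file.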
Correct: <title>José Pérez wins grand prize</title>
Incorrect: <title>Jos&eacute; P&eacute;rez wins grand prize</title>
Note: Some applications may translate HTML entity codes in XML files to the appropriate characters, but you cannot rely on this.
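As a quick check of the point above, the following minimal sketch feeds a title with raw characters and a title with HTML entity codes to a strict XML parser. The choice of Python and its standard-library xml.etree.ElementTree parser is an assumption made for illustration, and try_parse is a hypothetical helper name; other tools may be more lenient.

```python
import xml.etree.ElementTree as ET

# Raw Unicode characters: well-formed XML.
correct = "<title>José Pérez wins grand prize</title>"
# HTML entity codes: &eacute; is not declared in plain XML.
incorrect = "<title>Jos&eacute; P&eacute;rez wins grand prize</title>"

def try_parse(xml_text):
    """Return the element text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None

print(try_parse(correct))    # the title text
print(try_parse(incorrect))  # None, because the entity is undefined
```

Running a check like this before publishing a feed is one way to catch stray HTML entity codes early.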
The first line should contain a UTF-8 encoding declaration (in case an application chooses Latin-1 as a default). Here's the UTF-8 declaration for an RSS file:
<?xml version="1.0" encoding="UTF-8"?>
<rss>
...
</rss>
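If the XML is generated by a script rather than typed by hand, the declaration can be written automatically. The sketch below assumes Python and its standard-library xml.etree.ElementTree module; the rss and title elements are illustrative only and do not form a complete feed.

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny RSS-like tree (illustrative element names, not a full feed).
rss = ET.Element("rss")
title = ET.SubElement(rss, "title")
title.text = "José Pérez wins grand prize"

# encoding="UTF-8" stores the accented characters as raw UTF-8 bytes;
# xml_declaration=True writes the <?xml ...?> line at the top.
buf = io.BytesIO()
ET.ElementTree(rss).write(buf, encoding="UTF-8", xml_declaration=True)

data = buf.getvalue()
print(data.decode("utf-8"))
```

Passing a file name instead of the in-memory buffer writes the same bytes to disk.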
Many older text editors support only ASCII or Latin-1 and cannot display Unicode characters. As a result, each Unicode character "breaks up" into a sequence of two or more unrelated characters. See the list of recommended editors below.
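The "break-up" effect can be reproduced in a few lines. This sketch, written in Python purely for illustration, saves a name as UTF-8 and then misreads the bytes as Latin-1, which is an assumption about what an older editor might do.

```python
text = "José Pérez"

# What a Unicode-aware editor saves: é becomes the two bytes C3 A9.
utf8_bytes = text.encode("utf-8")

# What a Latin-1-only editor shows: each byte is treated as one character.
garbled = utf8_bytes.decode("latin-1")

print(garbled)  # JosÃ© PÃ©rez
```

The original text is still recoverable as long as the file is not re-saved in the wrong encoding.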
These text editors allow you to easily type encoded text and then export it as a properly encoded HTML or text file.
NOTE: Because of Microsoft HTML formatting issues and other technical issues, export from Microsoft Word is not recommended.
In all the above applications, you can use either keyboard or character insertion utilities to enter the data.
If you plan to insert your XML data into an XHTML document (or transform it via XSLT), make sure your XHTML file includes the UTF-8 declaration:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
</head>
This will help ensure your XML Unicode characters are displayed properly even without using numeric entity codes.
Wednesday, 01-Aug-2007 14:28:51 EDT