Teaching and Learning with Technology
Computing With Accents and Foreign Scripts

Encoding on the Internet

6: East Asian Languages and 16-Bit Encoding


Large Encodings for Non-Alphabets

The scripts discussed on the previous page, such as Greek, Hebrew, Arabic and Cyrillic, are all alphabetic, or at least about the same size as the Roman alphabet. But for syllabaries and ideographic scripts, the repertoire of characters can be larger than 256 characters, which is the most that a single 8-bit byte can distinguish. These scripts require an encoding scheme that can accommodate more characters.

As a result, 16-bit encodings with room for tens of thousands of characters were developed for these scripts. This is also called "double byte" (2 x 8-bit) encoding. In practice, characters are organized in blocks of 192 characters.
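The difference between single-byte and double-byte storage can be checked directly. The following is a minimal sketch, assuming Python 3 and the legacy codecs bundled with it (`shift_jis`, `gb2312`, `big5`, `euc_kr`); the sample characters and the helper name `byte_length` are illustrative choices, not part of this tutorial.

```python
# How many distinct values fit in 8 bits versus 16 bits.
single_byte_capacity = 2 ** 8    # 256
double_byte_capacity = 2 ** 16   # 65,536

# Sample characters paired with a legacy codec that ships with Python.
# These pairings are illustrative assumptions, not an exhaustive list.
samples = [
    ("A", "ascii"),       # Roman letter
    ("日", "shift_jis"),  # Japanese kanji
    ("中", "gb2312"),     # Simplified Chinese
    ("中", "big5"),       # Traditional Chinese
    ("한", "euc_kr"),     # Korean hangul syllable
]


def byte_length(char, codec):
    """Return how many bytes `codec` uses to store `char`."""
    return len(char.encode(codec))


print(f"8-bit capacity:  {single_byte_capacity}")
print(f"16-bit capacity: {double_byte_capacity}")
for char, codec in samples:
    print(f"{char} in {codec}: {byte_length(char, codec)} byte(s)")
```

In each of these codecs the Roman letter takes one byte while the East Asian character takes two, which is where the name "double byte" comes from.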

Chinese, Japanese and Korean (CJK)

Because many East Asian scripts incorporate Chinese characters, they are collectively known as "CJK" scripts, short for "Chinese-Japanese-Korean". The scripts are not identical, but all of them are of the same order of magnitude in size.


Encoding Template

To accommodate both English and the other scripts, many 16-bit encodings are structured as follows:
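As a rough illustration of English and a double-byte script sharing one byte stream (not necessarily the exact layout this tutorial has in mind), the sketch below uses Python's `euc_kr` codec as a stand-in. In that encoding, ASCII characters stay as single bytes below 0x80, and each Korean character is stored as two bytes with the high bit set. The helper `split_units` and the sample string are hypothetical, and the helper assumes well-formed input.

```python
def split_units(data):
    """Group EUC-KR bytes into one-byte (ASCII) and two-byte units.

    Assumes `data` is well-formed EUC-KR; no error handling is attempted.
    """
    units = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # High bit clear: a plain ASCII character in one byte.
            units.append(data[i:i + 1])
            i += 1
        else:
            # High bit set: lead byte of a two-byte character.
            units.append(data[i:i + 2])
            i += 2
    return units


text = "PSU 한글"                # four ASCII characters, two hangul syllables
encoded = text.encode("euc_kr")
units = split_units(encoded)

for unit in units:
    print(unit.hex(), "->", unit.decode("euc_kr"))
```

Other double-byte encodings use different lead-byte ranges, so the `0x80` test here applies to this EUC-style example only.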

Interestingly, many East Asian encodings also incorporate other scripts, such as the Cyrillic and Greek alphabets. Some browsers, especially on the Mac platform, use a Japanese font as the default for Cyrillic or Greek pages.
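The presence of Greek and Cyrillic letters in East Asian encodings can be confirmed with a short check. This is a minimal sketch, assuming Python 3 and its bundled `shift_jis`, `euc_jp`, `gb2312` and `euc_kr` codecs; the chosen letters and the helper name `encoded_lengths` are illustrative.

```python
# A few Greek and Cyrillic letters to probe (illustrative choices).
letters = ["α", "Ω", "Я", "ж"]

# Legacy East Asian codecs bundled with Python.
codecs_to_try = ["shift_jis", "euc_jp", "gb2312", "euc_kr"]


def encoded_lengths(chars, codec):
    """Map each character to the number of bytes `codec` uses for it."""
    return {ch: len(ch.encode(codec)) for ch in chars}


results = {codec: encoded_lengths(letters, codec) for codec in codecs_to_try}

for codec, lengths in results.items():
    print(codec, lengths)
```

Each letter encodes successfully and occupies two bytes, just like a Chinese character in the same encoding.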


©Penn State University, 2000-2007.
This Web page maintained by Teaching and Learning with Technology, a unit of Information Technology Services. For questions or comments on this Web page, please contact Elizabeth J. Pyatt (ejp10@psu.edu).
Unicode character names and hexadecimal entity codes are taken from the public Unicode Character Charts.

Last Modified: Tuesday, 06-Mar-2007 12:17:26 EST