Teaching and Learning with Technology
Computing With Accents and Foreign Scripts

South Asian Scripts

This Page

  1. Languages Covered
  2. About South Asian Scripts
  3. Windows vs. Macintosh
  4. Web Development Options
  5. General South Asian Links
  6. Pan South Asian Fonts

Languages/Scripts Covered


South Asian Scripts

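With the exception of the Thaana script of the Maldives and the Arabic-based script used for Urdu and Sindhi, the scripts covered here are derived from the ancient Brahmi script. These scripts are syllabic alphabets: each consonant symbol carries an inherent vowel (usually /a/), and other vowels are written as vowel signs attached to the consonant. Encoding these languages has been a challenge for several reasons: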

  1. Vowel signs are written in different positions relative to the consonant. For instance, many scripts place the /ā/ sign after the consonant and the /e/ sign before it, while the /o/ sign consists of the /e/ sign before the consonant followed by the /ā/ sign after it. See information on the Tamil signs, Sinhala signs and others for examples.
  2. Some consonant combinations are marked with special conjunct consonant symbols.
  3. Unlike Korean encoding, consonants and vowel signs are encoded as separate characters (that is, few "precomposed" syllables are encoded). It is up to the fonts to display each syllable correctly based on context.
    Note: This was done to keep the overall encoding scheme smaller and to facilitate transliteration between scripts.
  4. South Asian markets have been considered small until recently.
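The first and third points can be checked with Python's standard `unicodedata` module. In the minimal sketch below (the Tamil syllable கொ "ko" and the helper name `describe` are illustrative choices, not from this page), the text is stored in logical (spoken) order, consonant first, even though part of the /o/ sign is drawn before the consonant. Moving that part into place is the job of the font and rendering engine, not the encoding. Canonical decomposition also shows that Unicode treats the two-part /o/ sign as /e/ + /ā/.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """List each code point in storage order with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Tamil "ko": consonant KA followed by the two-part vowel sign O.
# Storage order is logical; the renderer draws the left half of the
# sign before the consonant on screen.
ko = "\u0B95\u0BCA"
print(describe(ko))

# NFD splits VOWEL SIGN O into VOWEL SIGN E (drawn before the consonant)
# followed by VOWEL SIGN AA (drawn after it).
print(describe(unicodedata.normalize("NFD", ko)))
```

The same split applies to the /o/ sign in other scripts with a two-part vowel, such as Bengali.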

Together, these factors make font development technically difficult, and until recently few companies were motivated to invest in it.
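Conjuncts show why so much of the work falls on fonts. In the minimal Python sketch below (standard library only; the example letter is an illustrative choice), Devanagari क्ष (ksha) is not one encoded character but three code points: KA, VIRAMA (the sign that removes the inherent vowel) and SSA. Whether the reader sees a single conjunct glyph depends on the font and the rendering engine.

```python
import unicodedata

# Devanagari KSHA has no code point of its own: it is stored as
# KA + VIRAMA + SSA, and the font decides how to draw the sequence.
ksha = "\u0915\u094D\u0937"

print(len(ksha))  # 3 code points for what readers see as one letter
for ch in ksha:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A ZERO WIDTH NON-JOINER after the virama asks the renderer to show KA
# with a visible virama instead of forming the conjunct.
ksha_explicit_virama = "\u0915\u094D\u200C\u0937"
print(len(ksha_explicit_virama))  # 4
```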

Fortunately, progress has been made in recent years, especially for the widely used Devanagari, Gujarati and Gurmukhi scripts. Many freeware fonts and utilities are also available. See the Languages/Scripts Covered list for more details.

Platform Support

Because the placement of vowel signs and conjuncts in these scripts is complex, Unicode fonts for them are not always interchangeable between platforms. OpenType (OTF) fonts work in Windows but do not always display correctly in Mac OS X, where Apple's fonts rely on its own ATSUI rendering technology instead. Whenever possible, use the South Asian fonts supplied by Microsoft for Windows and by Apple for the Macintosh.

Windows Support

Windows XP Supports

  • Devanagari
  • Gujarati
  • Gurmukhi (Punjabi)
  • Kannada
  • Tamil
  • Telugu
  • Thaana
  • Urdu/Sindhi

Windows XP Service Pack Two Adds

  • Bengali
  • Malayalam

Windows Vista Adds

Macintosh Support

Macintosh Supports

  • Devanagari
  • Gujarati
  • Gurmukhi (Punjabi)

Mac OS X 10.4 (Tiger) Adds

  • Tamil

Freeware Utilities are Available for

X11 Unix Environment

  • Additional language tools may be available for the X11 (Unix) environment that Apple provides with Mac OS X.

Linux/Unix



Web Development

South Asian Encoding and Language Tags

Encoding: utf-8 (Unicode), ISCII (older), ITRANS (older ASCII transliteration scheme)
Use Unicode to develop new pages.

Inputting and Editing Text

One option is to use FrontPage, Netscape/Mozilla Composer or Dreamweaver and switch the keyboard layout to the correct script. Make sure you specify the encoding in the Web page header.

Another option is to compose the basic text in an international or foreign-language text editor or word processor and export the content as an HTML or text file with the appropriate encoding. This file can then be opened in an HTML editor such as FrontPage or Dreamweaver and edited for formatting.
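If text is prepared with a script rather than an editor, the same rule applies: the bytes written to disk must use the encoding that the page declares. Below is a minimal Python sketch (standard library only; `save_html` and the file name are hypothetical) that writes a UTF-8 page with a matching charset declaration.

```python
import html
import tempfile
from pathlib import Path

def save_html(path: Path, title: str, body_text: str) -> None:
    """Write a minimal HTML page as UTF-8, declaring the same charset in <head>."""
    page = (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body>\n</html>\n"
    )
    # The encoding of the bytes on disk must match the declared charset.
    path.write_text(page, encoding="utf-8")

# Usage: write a Hindi greeting to a temporary file and read the raw bytes back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "namaste.html"
    save_html(out, "Namaste", "नमस्ते")
    raw = out.read_bytes()

print("नमस्ते".encode("utf-8") in raw)  # True
```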

Unicode Chart with HTML Entity Codes

For short texts, such as the yoga om sign (ॐ), it may be easier to look up the character in a Unicode chart and enter its HTML entity code instead (for ॐ, U+0950, the code is &#x950; or the decimal equivalent &#2384;).
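Entity codes can also be generated instead of looked up by hand. The sketch below uses only Python's standard library; `to_entities` is a hypothetical helper name. It shows that the hexadecimal form (&#x950;) and the decimal form (&#2384;) name the same character.

```python
import html

def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal HTML entity."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)

om = "\u0950"  # DEVANAGARI OM

print(to_entities(om))                          # &#x950;
print(om.encode("ascii", "xmlcharrefreplace"))  # b'&#2384;' (decimal form)
print(html.unescape("&#x950;") == om)           # True
```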

Available Unicode Charts

ISCII vs. Unicode

Before Unicode was developed, the government of India had created a standard called ISCII (Indian Script Code for Information Interchange). In this standard, corresponding characters in different scripts were assigned the same code point. For instance, Devanagari क (ka) and Gujarati ક (ka) shared a single code point. However, most modern development uses Unicode.
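Unicode kept a weaker form of the ISCII idea. Its Indic blocks were laid out in parallel, so a letter usually sits at the same relative position in each block: Devanagari ka is U+0915 and Gujarati ka is U+0A95, exactly 0x180 apart. The minimal Python sketch below (function and constant names are hypothetical) uses that offset to convert Devanagari to Gujarati. It is only an illustration, because the scripts do not have identical character inventories. Production transliteration needs per-script tables.

```python
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
GUJARATI_OFFSET = 0x0A80 - 0x0900  # distance between the two Unicode blocks

def devanagari_to_gujarati(text: str) -> str:
    """Naive transliteration by shifting code points between parallel blocks.

    Characters outside the Devanagari block, or whose shifted code point is
    unassigned in the Gujarati block, are left unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            target = chr(cp + GUJARATI_OFFSET)
            if unicodedata.name(target, None) is not None:  # assigned?
                out.append(target)
                continue
        out.append(ch)
    return "".join(out)

print(devanagari_to_gujarati("नमस्ते"))  # નમસ્તે
```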

Using Encoding and Language Codes

Computers interpret text according to an encoding, a system that maps stored bytes to visible text characters. Whenever you develop a Web site, make sure the proper encoding is specified in the header tags; otherwise the browser may default to U.S. settings and not display the text properly.

To declare an encoding, insert or inspect the following meta-tag at the top of your HTML file, then replace "???" with one of the encoding codes listed above. If you are not sure, use utf-8 as the encoding.

Generic Encoding Template

<head>
<meta http-equiv="Content-Type" content="text/html; charset=???">
...
</head>

Declare Unicode

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>

XHTML

If you are using XHTML, the encoding header tag must end with a closing slash after the final quotation mark (" />").

Declare Unicode in XHTML

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
</head>
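To inspect existing pages for this declaration, a short script can look for the Content-Type meta tag. The sketch below uses Python's standard `html.parser` module; the class and function names are hypothetical. It only checks the http-equiv form shown above, not the Content-Type header sent by the Web server, which browsers also consult and which can override the meta tag.

```python
from html.parser import HTMLParser
from typing import Optional

class CharsetFinder(HTMLParser):
    """Record the charset named in a <meta http-equiv="Content-Type"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names. XHTML-style "<meta ... />"
        # also arrives here, because the default handle_startendtag() calls it.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # content looks like "text/html; charset=utf-8"
        for part in attributes.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.strip().lower() == "charset":
                self.charset = value.strip().lower()

def declared_charset(html_text: str) -> Optional[str]:
    """Return the declared charset in lowercase, or None if none is declared."""
    finder = CharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset

xhtml_head = ('<head>\n<meta http-equiv="Content-Type" '
              'content="text/html; charset=utf-8" />\n...\n</head>')
print(declared_charset(xhtml_head))  # utf-8
```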

No Encoding Declared

If no encoding is declared, then the browser uses the default setting, which in the U.S. is typically Latin-1. In that case many Unicode characters could be displayed incorrectly. Also, older browsers such as Netscape 4.7 may not be able to process the entity codes correctly without the "utf-8" declaration.
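The failure is easy to reproduce. In the sketch below (Python, standard library only), the bytes of a UTF-8 page are decoded as Latin-1, as a browser with U.S. defaults might do. The single character ॐ turns into three unrelated characters.

```python
# "ॐ" (U+0950) is stored as three bytes in UTF-8.
om_bytes = "\u0950".encode("utf-8")
print(om_bytes)  # b'\xe0\xa5\x90'

# A browser that assumes Latin-1 treats each byte as a separate character,
# producing three wrong characters instead of one correct one.
as_latin1 = om_bytes.decode("latin-1")
print(len(as_latin1), as_latin1 == "\u0950")  # 3 False

# Decoding with the declared encoding recovers the original character.
print(om_bytes.decode("utf-8") == "\u0950")  # True
```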

Language Tags

Language tags are also recommended so that search engines and screen readers can identify the language of a page. These are metadata tags that indicate the language of a page; they do not trigger translation. Visit the Language Tag page for information on where to insert them.
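A page-level language tag is commonly written as a `lang` attribute on the opening `<html>` element. The Python sketch below generates that tag. The dictionary and `html_open_tag` are hypothetical, but the two-letter codes are standard ISO 639-1 codes for some of the languages named on this page. For XHTML 1.0, the sketch emits both `lang` and `xml:lang`, following the XHTML compatibility guidelines.

```python
# ISO 639-1 codes for some languages named on this page (illustrative subset).
LANGUAGE_CODES = {
    "Bengali": "bn",
    "Dhivehi": "dv",   # written in Thaana
    "Gujarati": "gu",
    "Hindi": "hi",     # one of several languages written in Devanagari
    "Kannada": "kn",
    "Malayalam": "ml",
    "Punjabi": "pa",
    "Sindhi": "sd",
    "Tamil": "ta",
    "Telugu": "te",
    "Urdu": "ur",
}

def html_open_tag(language: str, xhtml: bool = False) -> str:
    """Return an opening <html> tag carrying the page's language tag."""
    code = LANGUAGE_CODES[language]
    if xhtml:
        # XHTML 1.0 pages conventionally carry both lang and xml:lang.
        return (f'<html xmlns="http://www.w3.org/1999/xhtml" '
                f'lang="{code}" xml:lang="{code}">')
    return f'<html lang="{code}">'

print(html_open_tag("Gujarati"))  # <html lang="gu">
```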

PDF and Image Files

In some cases, your best option may be to use PDF files or image files. See the Web Development Tips section for more details.


Links

These pages cover internationalization of South Asian scripts in general.

Computing

Fonts

See also

©Penn State University, 2000-2009.
This Web page maintained by Teaching and Learning with Technology, a unit of Information Technology Services. For questions or comments on this Web page, please contact Elizabeth J. Pyatt (ejp10@psu.edu).
Unicode character names and hexadecimal entity codes are taken from the public Unicode Character Charts.
Last Modified: Friday, 13-Feb-2009 10:24:36 EST