Technical Responses to Multilingual Issues

Fonts and keyboards and non-European scripts
Unlike the older ISO 8859 family of 8-bit character sets, each of which covers only a small group of scripts, Unicode (ISO/IEC 10646) assigns a unique code number to every character in every language, no matter what the platform, program or language. The Unicode Consortium is a non-profit organisation founded to develop, extend and promote the use of the standard. Unicode can cope with bidirectionality, which is needed for Arabic and Hebrew, and is continually being expanded, nowadays even to include archaic alphabets such as ogham (see www.unicode.org).
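As an illustration of what "one code number per character" means in practice, the following minimal sketch uses Python's standard unicodedata module to look up the code point, the official character name and the bidirectional class of a few characters. The helper name describe and the choice of sample characters are assumptions made for this example only.

```python
import unicodedata


def describe(ch):
    """Return (code point, official name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


# Sample characters: two extended Roman letters, a Hebrew letter,
# an Arabic letter and an ogham letter.
for ch in ["ð", "ł", "\u05D0", "\u0627", "\u1681"]:
    print(describe(ch))
```

The bidirectional class is "L" for left-to-right letters, "R" for Hebrew letters and "AL" for Arabic letters; software uses this property to lay out mixed-direction text.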

This system is favoured by the IT industry, as the adoption of one method has obvious advantages, but it is by no means the only standard in the field.

Note also commercial products based on Unicode, such as those listed at www.fingertipsoft.com. Small caps can be bought which fit over the keys of a normal keyboard to aid in typing languages that use extended versions of the Roman script, e.g. ð å þ ñ ç æ ć ł. This simple method can even enable the Kanji script of Japanese to be word-processed.

Soft keyboards, or keyboards displayed on a touch screen, may be a flexible way of dealing with some of the problems of non-Roman or exotic scripts.

Languages with thousands of characters, like Chinese, require special software before they can benefit from electronic word-processing. For Chinese, a normal keyboard is used to enter a phonetic spelling of a Chinese word e.g. feng shui according to the Pinyin system of transliteration and the software displays those characters which are pronounced in that way – there may be as many as ten or so. The correct character or characters are chosen and entered in the document. The wrong choice would be the Chinese equivalent of a spelling mistake. This system is very adaptable, enabling the traditional and the simplified Chinese characters to be word-processed. The use of the Pinyin system does however mean that the operator needs to know the Mandarin or Pekingese pronunciation of Chinese, which it is not necessary to know in order to write Chinese by hand. It is however possible to buy software based on the Cantonese pronunciation. (See www.asiasoft.com)
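The selection step described above can be sketched as follows. This is only an illustration, not the method of any particular product: the candidate table is a tiny hypothetical sample (simplified characters, tones ignored) and the function names candidates and compose are invented for the example.

```python
# Hypothetical, very small candidate table: toneless Pinyin syllable -> characters
# pronounced that way. Real input software holds thousands of entries.
CANDIDATES = {
    "feng": ["风", "封", "丰", "峰", "疯"],
    "shui": ["水", "谁", "睡", "税"],
}


def candidates(syllable):
    """Return the characters offered to the operator for one typed syllable."""
    return CANDIDATES.get(syllable.lower(), [])


def compose(choices):
    """Build the text from (syllable, chosen position) pairs picked by the operator."""
    return "".join(candidates(syllable)[position] for syllable, position in choices)


print(candidates("feng"))                   # the list shown on screen for "feng"
print(compose([("feng", 0), ("shui", 0)]))  # first candidate chosen each time
print(compose([("feng", 1), ("shui", 0)]))  # a wrong choice: the "spelling mistake"
```

Choosing the first candidate for both syllables gives 风水 (feng shui); picking a different candidate for "feng" produces a string that is pronounced the same but is the wrong word.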

The software takes up more of the PC’s memory than word-processing software for a language written in the Roman alphabet, but in cities where there are considerable numbers of Chinese people it could be justifiable to buy this software and make it available on a dedicated machine in a public library.

Arabic scripts present less of a problem and specially adapted keyboards are available.

Transliteration, transcription and authority files
In many cases e.g. for the production of catalogues, indexes, toponymic lists and other works of a bibliographic nature which are meant to be used by people who can only be expected to be familiar with the Latin alphabet, or for typographical reasons, it will not be possible or practical to use the characters of a non-Latin script. In that case, transcription or transliteration will be necessary.

Transliteration is the process by which the letters of an alphabetic writing system are converted into the symbols of another alphabetic system e.g. Cyrillic or Greek into the Latin alphabet. There are problems caused by alternative systems of transliteration e.g. Чехов can be transliterated Tchehov or Chekhov.
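A minimal sketch of letter-by-letter transliteration shows how two different conversion tables yield the two spellings mentioned above. The tables TABLE_A and TABLE_B are hypothetical and cover only the five letters of Чехов; they are not complete or official romanisation systems.

```python
# Two illustrative conversion tables (assumptions for this example),
# limited to the letters needed for the name Чехов.
TABLE_A = {"ч": "ch", "е": "e", "х": "kh", "о": "o", "в": "v"}
TABLE_B = {"ч": "tch", "е": "e", "х": "h", "о": "o", "в": "v"}


def transliterate(word, table):
    """Convert a word letter by letter, keeping the capitalisation of each letter."""
    result = []
    for ch in word:
        latin = table.get(ch.lower(), ch)  # letters not in the table pass through
        result.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(result)


print(transliterate("Чехов", TABLE_A))
print(transliterate("Чехов", TABLE_B))
```

With these tables the same Cyrillic name comes out as Chekhov and as Tchehov, which is exactly the kind of variation that complicates catalogue searching.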

Transcription may in principle be used for the sounds of any language, but it is the only system which can be used for non-alphabetic scripts, e.g. the sounds of Chinese can be transcribed into the symbols of the Latin or some other alphabetic system.

Clearly there are problems of standardisation as a result of transliteration and transcription. Different systems or variation in practice would cause difficulties in searching databases. At the moment there is no standardised name record format relevant to the needs of European cultural institutions but one is to be developed by the LEAF project (Linking and Exploring Authority Files) funded by the EC from March 2001.
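The role of an authority file in searching can be sketched as a simple lookup from every known variant of a name to one preferred heading. The record below is a hypothetical example built from the spellings quoted earlier; it does not represent the LEAF record format, and the function names are invented for illustration.

```python
# Hypothetical authority file: preferred heading -> known variant forms.
AUTHORITY_FILE = {
    "Chekhov": ["Chekhov", "Tchehov", "Чехов"],
}


def build_index(authority_file):
    """Map each variant (case-insensitively) to its preferred heading."""
    index = {}
    for preferred, variants in authority_file.items():
        for variant in variants:
            index[variant.casefold()] = preferred
    return index


def preferred_heading(name, index):
    """Return the preferred heading for a search term, or None if it is unknown."""
    return index.get(name.casefold())


index = build_index(AUTHORITY_FILE)
print(preferred_heading("tchehov", index))
print(preferred_heading("Чехов", index))
```

A search under any of the recorded variants is then directed to the same heading, whichever transliteration system the user happens to know.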

International standards are being developed for this purpose for a variety of languages: see ISO TC46/SC2 and Diffuse. There is a standard for the transliteration of Indic scripts, ISO 15919:2001, Transliteration of Devanagari and related Indic scripts into Latin characters. There may be software which transliterates Greek and Cyrillic into Latin. See also music.

Machine Translation (MT)
At one time great hopes were entertained of MT but, in view of the effort expended on it since the 1950s, the results may be seen as disappointing. The kinds of problem which are encountered and which have proved impossible to solve are, for example:

  • ambiguities in the meanings of words;

  • differences in word order;

  • the computer's lack of any knowledge of the real world, context or readership.

The investment in MT research has recently been scaled down. The effectiveness of MT systems is dependent on a number of factors e.g. documents must be free of any typographic or grammatical errors, words not in the dictionary of the system, or complex sentence structures. It can really only be used to give the gist or general meaning of the document or for screening large numbers of documents to identify those warranting human translation.
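Two of these factors, ambiguous words and words missing from the system's dictionary, can be illustrated with a toy word-for-word lookup. This is a deliberately naive sketch, not a description of how any real MT system works; the English to French lexicon and the function name analyse are assumptions made for the example.

```python
# Hypothetical toy lexicon: English word -> possible French renderings.
LEXICON = {
    "the": ["le", "la", "les"],
    "bank": ["banque", "rive"],
    "is": ["est"],
    "closed": ["fermé"],
}


def analyse(sentence):
    """Return (ambiguous words, unknown words) found in a sentence."""
    words = sentence.lower().split()
    ambiguous = [w for w in words if len(LEXICON.get(w, [])) > 1]
    unknown = [w for w in words if w not in LEXICON]
    return ambiguous, unknown


print(analyse("The bank is closed"))
print(analyse("The bank is clsoed"))  # a typing error becomes an unknown word
```

Even in a four-word sentence the lookup cannot decide between the candidate renderings without context, and a single typing error leaves a word untranslated.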

Internet search engines offer machine translations of some websites: see www.google.com and babelfish, offering Japanese, Korean and Chinese. There are a number of other sites on the Internet offering MT e.g. Alittera. A gateway to a number of web-based translation services is Babblefish www.babblefish.com.

There are a number of websites offering both free and charged translation services on the World Wide Web. If a URL is entered MT software can translate a webpage and documents can be translated automatically – these sites also often offer translation by human beings, for example World Lingo; AlphaWorks; FreeTranslation and Systran.

(See also personalisation)

Last updated 11/05/2004