Technical Responses to Multilingual Issues (Summary)
Fonts, keyboards and non-European scripts
Unicode (aligned with ISO/IEC 10646) assigns a unique code number
to every character in every language, no matter what the
platform, program or language. This sets it apart from the older
ISO 8859 family of 8-bit character sets, each of which covers
only a limited group of scripts. The Unicode Consortium is a
non-profit organisation founded to develop, extend and promote
the use of the standard. Unicode can cope with bidirectionality,
needed for Arabic and Hebrew, and is continually being expanded,
nowadays even to include archaic alphabets such as Ogham (see
www.unicode.org).
This system is favoured by the IT industry, as the adoption of
one method has obvious advantages, but it is by no means the
only standard in the field.
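As a minimal sketch of what a unique code number per character
means in practice, the following Python fragment uses the
standard unicodedata module to report the code point, official
name and bidirectional class of a few characters. The helper
name describe is an illustrative assumption, not part of the
standard.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

# Two extended Latin letters, a Hebrew letter, an Arabic letter, an Ogham letter.
for ch in ["ð", "ł", "א", "ب", "\u1681"]:
    print(describe(ch))
```

The bidirectional class is "L" for left-to-right letters, "R"
for Hebrew letters and "AL" for Arabic letters; display software
uses these classes to order mixed-direction text.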
Note also commercial products based on Unicode, such as those
listed at www.fingertipsoft.com. Small caps can be bought which
fit over the keys of a normal keyboard to aid in the typing of
languages using extended versions of the Roman script, e.g. ð å
þ ñ ç æ ć ł. This simple method can even enable the Kanji script
of Japanese to be word-processed.
Soft keyboards, or keyboards displayed on a touch screen, may be
a flexible way of dealing with some of the problems of non-Roman
or exotic scripts.
Languages with thousands of characters, like Chinese, require
special software before they can benefit from electronic
word-processing. For Chinese, a normal keyboard is used to enter
a phonetic spelling of a Chinese word, e.g. feng shui, according
to the Pinyin system of romanisation, and the software displays
the characters which are pronounced in that way; there may be
ten or more. The correct character or characters are chosen and
entered in the document. The wrong choice would be the Chinese
equivalent of a spelling mistake.
This system is very adaptable, enabling both the traditional and
the simplified Chinese characters to be word-processed. The use
of the Pinyin system does however mean that the operator needs
to know the Mandarin (Pekingese) pronunciation of Chinese, which
is not necessary in order to write Chinese by hand. It is
however possible to buy software based on the Cantonese
pronunciation (see www.asiasoft.com).
The software takes up more of the PC's memory than the
word-processing of a language written in the Roman alphabet, but
in cities where there are considerable numbers of Chinese people
it could be justifiable to buy this software and make it
available on a dedicated machine in a public library.
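The Pinyin entry process described above can be sketched roughly
as follows. The candidate table is a tiny hand-made sample of
simplified characters, and the names CANDIDATES, candidates and
compose are assumptions made for illustration; real input
software holds thousands of entries and ranks them by frequency.

```python
# Hypothetical sample table: Pinyin syllable (tone marks omitted)
# mapped to some characters pronounced that way.
CANDIDATES = {
    "feng": ["风", "封", "丰", "峰", "蜂"],
    "shui": ["水", "谁", "睡", "税"],
}

def candidates(syllable):
    """Return the characters offered to the operator for one typed syllable."""
    return CANDIDATES.get(syllable.lower(), [])

def compose(choices):
    """Build text from (syllable, chosen index) pairs, as the operator would."""
    return "".join(candidates(syllable)[index] for syllable, index in choices)

print(candidates("feng"))                    # the list shown for selection
print(compose([("feng", 0), ("shui", 0)]))   # intended word: 风水
print(compose([("feng", 1), ("shui", 0)]))   # wrong choice: a "spelling mistake"
```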
Arabic scripts present less of a problem and specially adapted
keyboards are available.
Transliteration, transcription and authority files
In many cases e.g. for the production of catalogues, indexes,
toponymic lists and other works of a bibliographic nature which
are meant to be used by people who can only be expected to be
familiar with the Latin alphabet, or for typographical reasons,
it will not be possible or practical to use the characters of a
non-Latin script. In that case, transcription or transliteration
will be necessary.
Transliteration is the process by which the letters of an
alphabetic writing system are converted into the symbols of
another alphabetic system e.g. Cyrillic or Greek into the Latin
alphabet. There are problems caused by alternative systems of
transliteration e.g. Чехов can be transliterated Tchehov or
Chekhov.
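To illustrate how alternative systems produce different
spellings of the same name, here is a minimal table-driven
sketch. The two tables cover only the letters of the Чехов
example, and their names (ENGLISH_STYLE, OLDER_STYLE) are
assumptions for illustration rather than complete or official
schemes.

```python
# Partial, illustrative tables: only the letters needed for the example.
ENGLISH_STYLE = {"ч": "ch", "е": "e", "х": "kh", "о": "o", "в": "v"}
OLDER_STYLE = {"ч": "tch", "е": "e", "х": "h", "о": "o", "в": "v"}

def transliterate(text, table):
    """Replace each Cyrillic letter by its Latin equivalent from the table."""
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)                  # leave unknown characters unchanged
        elif ch.isupper():
            out.append(latin.capitalize())  # keep the initial capital
        else:
            out.append(latin)
    return "".join(out)

print(transliterate("Чехов", ENGLISH_STYLE))
print(transliterate("Чехов", OLDER_STYLE))
```

The same source string gives Chekhov under the first table and
Tchehov under the second, which is exactly the kind of variation
that complicates searching.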
Transcription may in principle be used for the sounds of any
language, but it is the only system which can be used for
non-alphabetic scripts, e.g. the sounds of Chinese can be
transcribed into the symbols of the Latin or some other
alphabetic system.
Clearly there are problems of standardisation as a result of
transliteration and transcription. Different systems or
variation in practice would cause difficulties in searching
databases. At the moment there is no standardised name record
format relevant to the needs of European cultural institutions
but one is to be developed by the LEAF project (Linking and
Exploring Authority Files) funded by the EC from March 2001.
International standards are being developed for this purpose for
a variety of languages (see ISO TC46/SC2 and Diffuse). There is
a standard for the transliteration of Indic scripts: ISO
15919:2001, Transliteration of Devanagari and related Indic
scripts into Latin characters. There may be software which
transliterates Greek and Cyrillic into Latin. See also music.
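One way to picture how an authority file reduces the searching
difficulty is a record that ties every variant form of a name to
a single preferred heading. The sketch below is a hypothetical
structure invented for illustration; it is not the LEAF record
format or any standardised one.

```python
# Hypothetical authority file: preferred heading -> known variant forms.
AUTHORITY = {
    "Chekhov, Anton": ["Tchehov, Anton", "Čehov, Anton", "Чехов, Антон"],
}

def build_index(authority):
    """Map every heading and variant (case-insensitively) to its heading."""
    index = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[form.casefold()] = heading
    return index

def lookup(name, index):
    """Return the preferred heading for a name, or None if it is unknown."""
    return index.get(name.casefold())

INDEX = build_index(AUTHORITY)
print(lookup("tchehov, anton", INDEX))
print(lookup("Чехов, Антон", INDEX))
```

A catalogue search that first resolves the query through such an
index retrieves the same records whichever spelling is entered.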
Machine Translation (MT)
At one time great hopes were entertained of MT, but in view of
the effort expended on it since the 1950s, the results may be
seen as disappointing. The kind of problems which are
encountered and which have proved impossible to solve are, for
example:
- ambiguities in the meanings of words;
- differences in word order;
- computers cannot be given any knowledge of the real world or
  context or readership.
The investment in MT research has recently been scaled down. The
effectiveness of MT systems is dependent on a number of factors,
e.g. documents must be free of typographic or grammatical
errors, of words not in the dictionary of the system, and of
complex sentence structures. MT can really only be used to give
the gist or general meaning of a document, or for screening
large numbers of documents to identify those warranting human
translation.
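The problems listed above show up even in a toy word-for-word
translator. The sketch below is a deliberately naive
illustration, not how any of the MT systems mentioned here work;
the small English-to-French glossary and the function names are
assumptions.

```python
# Hypothetical glossary: each English word maps to its possible French senses.
GLOSSARY = {
    "the": ["le", "la"],        # choice depends on the gender of the noun
    "red": ["rouge"],
    "car": ["voiture"],
    "bank": ["banque", "rive"], # financial institution or river bank
}

def senses(word):
    """Return every candidate translation; empty if the word is unknown."""
    return GLOSSARY.get(word.lower(), [])

def naive_translate(sentence):
    """Take the first sense of each word and keep the English word order."""
    out = []
    for word in sentence.split():
        options = senses(word)
        # Words missing from the dictionary are passed through in brackets.
        out.append(options[0] if options else f"[{word}]")
    return " ".join(out)

print(naive_translate("the red car"))
print(senses("bank"))
print(naive_translate("the blue car"))
```

For "the red car" the sketch produces "le rouge voiture", whereas
correct French is "la voiture rouge": the article is wrong
because the program has no context, and the adjective is
misplaced because word order differs between the languages.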
Internet search engines offer machine translations of some
websites: see www.google.com and Babelfish, offering Japanese,
Korean and Chinese. There are a number of other sites on the
Internet offering MT, e.g. Alittera. A gateway to a number of
web-based translation services is Babblefish
(www.babblefish.com).
There are a number of websites offering both free and charged
translation services on the World Wide Web. If a URL is entered,
MT software can translate a webpage, and documents can also be
translated automatically. These sites often also offer
translation by human beings, for example World Lingo,
AlphaWorks, FreeTranslation and Systran.
(See also personalisation)