Friday, June 24, 2011

Mandarin Chinese Spoken Languages, Transcription Systems and Character Sets

Many dialects are spoken in China. Mandarin is a group of related Chinese dialects spoken across most of northern, central, and western China. As it is known to the world, however, Mandarin refers to Standard Mandarin (or Modern Standard Chinese), which is based on the Mandarin dialect spoken in Beijing. Standard Mandarin, known as Putonghua in China, is the country's official spoken language. It is also one of the six official languages of the United Nations and is used in many international organizations.

Phonological descriptions show that a Mandarin syllable consists of an optional initial consonant, followed by a vowel, optionally followed by a velar or alveolar nasal ending. A further component of the syllable is the tone, which mainly specifies the syllable's pitch pattern. Technically, a syllable is represented in terms of its initial, final, and tone. Mandarin is a tonal language because tones, just like consonants and vowels, are used to distinguish words from each other.
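The initial/final/tone decomposition can be illustrated with a small sketch. It assumes syllables are spelled in Hanyu Pinyin with a trailing tone digit (for example "zhang1"), uses 5 for a missing tone digit, and treats syllables written with "y" or "w" as having no initial. The function names are hypothetical and the initial list is a simplification, not a full phonological model.

```python
# Pinyin initials; two-letter initials come first so that "zh", "ch", "sh"
# are matched before "z", "c", "s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def split_syllable(syllable):
    """Split a tone-numbered Pinyin syllable into (initial, final, tone)."""
    syllable = syllable.lower()
    tone = 5  # assumption: 5 stands for "no tone digit given"
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # the initial is optional


def nasal_ending(final):
    """Return the nasal ending of a final: 'ng' (velar), 'n' (alveolar) or ''."""
    if final.endswith("ng"):
        return "ng"
    if final.endswith("n"):
        return "n"
    return ""


for s in ("zhang1", "an4", "ma3"):
    initial, final, tone = split_syllable(s)
    print(s, "->", repr(initial), repr(final), tone, repr(nasal_ending(final)))
```

The three sample syllables cover the cases described above: an initial plus a velar nasal ending, no initial with an alveolar nasal ending, and an initial with no nasal ending.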

Chinese linguists have proposed various transcription systems for Mandarin, of which the most popular is Hanyu Pinyin. The government of China adopted Hanyu Pinyin as the official transcription system for the Chinese language in 1958. It is also widely used for entering Chinese characters on computer systems.
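To give a rough idea of how Pinyin-based input works, here is a toy sketch: the user types a toneless Pinyin syllable, the program offers candidate characters, and the user picks one. The lexicon and the function names are invented for illustration; real input methods use far larger dictionaries and rank candidates by context.

```python
# Hypothetical toy lexicon: toneless Pinyin -> candidate characters.
CANDIDATES = {
    "ma": ["妈", "麻", "马", "骂"],
    "zhong": ["中", "种", "重"],
    "guo": ["国", "果", "过"],
}


def candidates(pinyin):
    """Return the candidate characters for one typed Pinyin syllable."""
    return CANDIDATES.get(pinyin.lower(), [])


def compose(choices):
    """Build text from (pinyin, chosen_index) pairs, as a user would."""
    return "".join(candidates(pinyin)[index] for pinyin, index in choices)


print(candidates("ma"))                   # all characters offered for "ma"
print(compose([("zhong", 0), ("guo", 0)]))  # picks the first candidate twice
```

Because one spelling maps to several characters, the selection step is what makes Pinyin input usable in practice.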

Today, Chinese-language users work with two character sets: traditional Chinese characters and simplified Chinese characters. Traditional Chinese characters have been in use since the 5th century and are still used in some overseas Chinese communities today. Simplified Chinese characters originate from the official character simplification of the 1950s and 1960s. This set is now the official writing system in China and is accepted by the United Nations. In computer systems, different codes are used for the two character sets. The Guobiao code (GB) is a national standard character encoding in China. The term usually refers to either the GB 2312-80 set issued in 1981 or the GB 18030-2000 set issued in 2000. The GB 2312-80 code set contains 6,763 Chinese characters.
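The practical effect of separate codes can be seen with Python's built-in codecs. The sketch below checks whether the simplified character 汉 and its traditional counterpart 漢 can be represented in `gb2312`, `big5`, and `gb18030`. Big5 is not discussed in this article; it is included here only as an assumed example of an encoding commonly used for traditional characters, and `can_encode` is a hypothetical helper.

```python
def can_encode(char, codec):
    """Return True if `char` is representable in the given codec."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for char in ("汉", "漢"):  # simplified and traditional form of the same word
    for codec in ("gb2312", "big5", "gb18030"):
        if can_encode(char, codec):
            print(char, codec, char.encode(codec).hex())
        else:
            print(char, codec, "not representable")
```

GB 2312 covers only the simplified form, while GB 18030 can represent both, which is one reason text must be decoded with the same code that was used to write it.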

Mandarin Chinese is often described as monosyllabic, on the grounds that most words are one syllable long. This is true of classical Chinese but no longer true of modern Chinese: a large number of polysyllabic words are used in everyday spoken Chinese today. The same syllable uttered with different tones corresponds to different characters, and a polysyllabic word is written with two or more characters. Since Chinese text has no spaces between words, extra effort is required to segment a sentence into words. Because of these characteristics, the design of Chinese language corpora needs extra consideration. Most recently developed Chinese spoken language processing systems deal with Standard Mandarin. Few of them cater for other dialects, such as Cantonese, Min-nan, Hakka, and Wu.
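As a minimal sketch of what segmentation involves, the code below uses forward maximum matching: at each position it takes the longest dictionary word that fits and falls back to a single character otherwise. This is one simple, well-known approach chosen for illustration, not a method named in this article; the tiny lexicon, the `max_len` limit, and the function name are assumptions.

```python
def segment(sentence, lexicon, max_len=4):
    """Greedy forward maximum matching over an unspaced sentence."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest possible piece first, then shrink.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy lexicon of polysyllabic words.
LEXICON = {"我们", "学习", "普通话", "中文"}

print(segment("我们学习普通话", LEXICON))
```

Greedy matching can choose the wrong split when words overlap, which is part of why segmented corpora and statistical models are used in practice.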
