Marjorie Chan's Home Page


Using Wenlin for Chinese and Pinyin with Tone Diacritics

Web Pages with Unicode (UTF-8) for Character Set




This is a test to see how well the Wenlin 2.x software works as an editor to generate Chinese text and Pinyin with tone diacritics for automatic viewing on the Web.


The following is a short sentence written in Chinese in Wenlin, together with its Pinyin with tone diacritics. The sentence is saved in Wenlin as a Unicode-encoded file in UTF-8 format (i.e., 8-bit UCS Transformation Format), as opposed to a Big5- or GB-encoded file, and is then inserted into an HTML file:

我們打算明天去圖書館.
wǒmen dǎsuan míngtiān qù túshūguǎn.
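
As a rough illustration of the workflow described above (not Wenlin's own export routine), the following Python sketch builds an HTML page containing the sentence and its Pinyin, declares UTF-8 as the character set in the page header, and writes the file as UTF-8 bytes. The function names `build_page` and `save_page`, the file name `wenlin_test.htm`, and the page layout are all assumptions made for this example.

```python
from pathlib import Path

HANZI = "我們打算明天去圖書館."
PINYIN = "wǒmen dǎsuan míngtiān qù túshūguǎn."


def build_page(hanzi: str, pinyin: str) -> str:
    """Return a minimal HTML page that declares UTF-8 as its character set."""
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "<title>UTF-8 test</title>\n"
        "</head>\n<body>\n"
        f"<p>{hanzi}<br>\n{pinyin}</p>\n"
        "</body>\n</html>\n"
    )


def save_page(path: Path, html: str) -> None:
    # Encode explicitly as UTF-8 so the bytes on disk match the declared charset.
    path.write_bytes(html.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical output file name, chosen only for this sketch.
    save_page(Path("wenlin_test.htm"), build_page(HANZI, PINYIN))
```

The point of the sketch is simply that the declared character set and the actual byte encoding of the file must agree; a browser set to Unicode (UTF-8) can then show the characters and the tone diacritics together.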

The Pinyin with tone diacritics can be in a different font, such as Times New Roman or Arial (or some other proportional or fixed-width font). [If your web browser cannot display the Pinyin with tone diacritics below, you may need to download and install the latest versions of these fonts for your Mac or PC from Microsoft's webpage, TrueType Core Fonts for the Web.]

wǒmen dǎsuan míngtiān qù túshūguǎn.     (Arial)
wǒmen dǎsuan míngtiān qù túshūguǎn.     (Times New Roman)
wǒmen dǎsuan míngtiān qù túshūguǎn.   (Courier New)

Here is the full set of Pinyin vowels with tone diacritics in the Courier New (fixed-width) font:

āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜü
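
To see where these letters sit in Unicode, the following Python sketch (my own illustration, using only the standard `unicodedata` module) lists the code point, the number of UTF-8 bytes, and the Unicode name for each letter in the line above. The helper name `describe` is an assumption for this example.

```python
import unicodedata

TONE_LETTERS = "āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜü"


def describe(ch: str) -> tuple[str, int, str]:
    """Return (code point, number of UTF-8 bytes, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", len(ch.encode("utf-8")), unicodedata.name(ch))


if __name__ == "__main__":
    for ch in TONE_LETTERS:
        code_point, n_bytes, name = describe(ch)
        print(ch, code_point, n_bytes, name)
```

Each precomposed letter in this set lies below U+0800, so UTF-8 stores it in two bytes, whereas each Chinese character in the test sentence takes three.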

The above Chinese sentence is displayed below in traditional and simplified Chinese characters on the same page:

我們打算明天去圖書館.

我们打算明天去图书馆.
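
A single page can mix the two forms because UTF-8 can encode every character in both sentences, whereas the older encodings each cover only part of that repertoire. The Python sketch below checks this with the `gb2312` and `big5` codecs that ship with Python; the helper name `encodable` is an assumption, and these two codecs are only stand-ins for whichever GB or Big5 variant a given system actually uses.

```python
TRADITIONAL = "我們打算明天去圖書館"
SIMPLIFIED = "我们打算明天去图书馆"


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character in text can be stored in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for encoding in ("utf-8", "big5", "gb2312"):
        print(encoding, encodable(TRADITIONAL, encoding), encodable(SIMPLIFIED, encoding))
```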


All these displays are done using Unicode-encoded files saved in UTF-8 format in Wenlin, without requiring a separate Pinyin-with-tone font in serif and/or sans-serif style. However, it does require that your fonts -- be they Times New Roman, Arial, or some other font -- be Unicode fonts, and not pre-Unicode ones. For other tests, see MC's ChinaLinks2: Fonts. At this point, files created in the other Chinese systems or dedicated Chinese word processors that I am familiar with *lose* their Pinyin-with-tones when they are converted to webpages. Only Wenlin has "passed" this test: it not only keeps these tone diacritics intact, but its output can also be displayed and printed using fonts that are serif or sans-serif, as well as proportional or fixed-width.


* Recommended font preference settings in Netscape 4.x for viewing this webpage and other Chinese webpages encoded in UTF-8:

     EDIT > PREFERENCES > FONTS > ENCODING. For the Encoding: Unicode.
     Choose for Variable Width Font: SimSun.
     Choose for Fixed Width Font: SimSun.

  If needed, also select the following from the menubar:

     VIEW > CHARACTER SET > UNICODE (UTF-8)


(Since I prefer Netscape to MS Internet Explorer, I'll let you adjust the settings in MS I.E. on your own.)

Note: What is stated above concerning web pages also holds for reading files in word processors such as MS Word 2000: the tone diacritics from Wenlin are preserved when SimSun, the Unicode font for Chinese, is selected as the font for reading the text. One needs to highlight all the copied-and-pasted text and select the SimSun font. I don't know which font is built into Wenlin's software. The SimSun font, however, seems to handle both the characters and the display of Pinyin with tone diacritics quite well. The font is *huge*, though, a whopping 10.5 MB! It comes bundled with MS Office 2000 and MS FrontPage 2000, and is on some (or all?) of the computers in 148 Cunz Hall and other computer labs on the OSU campus.

If one selects MS Song in lieu of SimSun when setting the font preference for Unicode in the web browser, that also works, but the Pinyin with tone diacritics doesn't look as nice as it does with the SimSun font. NJStar Communicator or some other Chinese viewer with UTF-8 decoding would also work. However, NJStar Communicator 2.2 did miss some characters/symbols that are displayed in the following webpages, and it could display them only in traditional characters or only in simplified characters, not in a mixture of the two. (Some Chinese characters are also missing if one chooses Bitstream's Cyberbit font, for example.)

If you have installed the multilingual Arial Unicode MS font (23.6 MB), you can test out these webpages using it also. Simply replace "SimSun" with "Arial Unicode MS" in the above instructions. (The page looks better using the proportionally spaced Arial Unicode MS font than it does using the fixed-width SimSun font; that is, roman letters are proportionally spaced while CJK characters are of a fixed width.) (Also try testing it out on Gyula Zsigri's CJK Fonts webpage, which simultaneously displays Chinese (simplified and traditional), Japanese, and Korean.)
For more on that freely-downloadable font, see MC's ChinaLinks3: Unicode.


I am looking into this in part to gain a better understanding of what is possible with new fonts and with a wider use of Unicode, which extends the size of character sets beyond those of GB encoding (with some 7,000 characters) and Big5 encoding (with about 13,000 characters). Chinese Unicode fonts that include the ability to type in Tone 3 in Pinyin romanization, for example, should make use of the range of Unicode that covers the "Combining Diacritical Marks." If my understanding is correct, that would enable developers of software that provides Chinese input to use that range to place the caron (hachek) over the vowels for Tone 3, without requiring separate, software-specific fonts for typing Pinyin-with-tones.
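
To make the relationship between the Tone 3 vowels and the Combining Diacritical Marks range concrete, here is a small Python sketch using the standard `unicodedata` module. It shows that a Tone 3 vowel such as ǎ is canonically equivalent to a plain vowel followed by U+030C COMBINING CARON. This only illustrates how Unicode relates the two forms; it does not describe how Wenlin or any particular font handles them, and the function name `tone3` is an assumption.

```python
import unicodedata

# U+030C lies in the Combining Diacritical Marks range (U+0300 to U+036F).
COMBINING_CARON = "\u030c"


def tone3(vowel: str) -> str:
    """Attach a caron to a vowel and return the composed (NFC) form where one exists."""
    return unicodedata.normalize("NFC", vowel + COMBINING_CARON)


if __name__ == "__main__":
    for vowel in "aeiou\u00fc":  # \u00fc is ü
        composed = tone3(vowel)
        decomposed = unicodedata.normalize("NFD", composed)
        print(composed, [f"U+{ord(c):04X}" for c in decomposed])
```

For the six Pinyin vowels, Unicode also defines precomposed Tone 3 letters, so normalization can convert in either direction between the single-character form and the vowel-plus-caron sequence.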



The original webpage was created on 27 October 2000 by Marjorie Chan. This is an excerpt of that webpage for the Chinese mailing list, created on 29 March 2001.

URL:   http://people.cohums.ohio-state.edu/chan9/computing/Wenlin_UTF8_PYTones.htm