[ MC Home ]

Instructions for Concordancing Chinese E-Texts
using MonoConc Pro 2.2





The instructions on this webpage are for using Michael Barlow's MonoConc Pro 2.2 program to make concordances using e-texts that are encoded in double-byte, East Asian encoding systems, with examples given from Big5- and GB-encoded Chinese e-texts. The instructions on this webpage, however, are also applicable for e-texts encoded for Japanese and Korean. MonoConc Pro 2.2, pre-release version, Build 231 (22 September 2002), is used for preparing this set of instructions. English Windows 2000 is used for preparing the screenshots on this webpage.

  1. To start the program, double-click on the file, MonPro.exe. Once the program is open, a simple screen appears, as shown in the screenshot below. This initial screen looks rather bare, as it contains only a blank window, with no files yet loaded, and only two items in the menubar, File and Info.
    MonPro20 - 1

  2. From there, first go to the File menu, and select Language.
    MonPro20 - 2


  3. Language selection is determined by the encoding of the e-text(s) that will be concordanced. For example, a GB-encoded e-text requires the selection of "Chinese (PRC)" or "Chinese (Singapore)," while a Big5-encoded e-text requires the selection of "Chinese (Hong Kong)" or "Chinese (Taiwan)." Below, the left screenshot shows the Select Language window with "Chinese (PRC)" as the language selected. After selecting the language, click on Font in that window to select your (Unicode-based) font. Given the language selection, the font must support GB encoding. (For "Chinese (Hong Kong)" and "Chinese (Taiwan)," for example, a Big5 font is needed.) In the right screenshot below, the GB-encoded font selected is "SimSun." (At this time, MonoConc Pro 2.2 can only handle GB(K)- and Big5-encoded files, and cannot yet handle UTF8-encoded files or word-processed files (e.g., MS Word .doc/.rtf files).) Other selections include font color, font size, font style, etc. Be sure to select the correct Script. Here, since the language selected is "Chinese (PRC)," the script (i.e., encoding) chosen is "CHINESE_GB2312", as displayed below.
    MonPro20 - 3
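
Since the program reads only GB(K)- and Big5-encoded files at this time, a UTF-8 e-text would need to be re-encoded before loading. The following is a minimal Python sketch of one way to do that outside MonoConc Pro; it is not a feature of the program, and the function names `convert_encoding` and `convert_file`, as well as the file names in the usage comment, are hypothetical.

```python
def convert_encoding(data: bytes, source: str = "utf-8-sig", target: str = "gbk") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target encoding.

    "utf-8-sig" accepts UTF-8 with or without a leading byte-order mark.
    Encoding is strict: a character with no equivalent in the target
    encoding raises UnicodeEncodeError rather than being silently dropped.
    """
    text = data.decode(source)
    return text.encode(target)


def convert_file(src_path: str, dst_path: str,
                 source: str = "utf-8-sig", target: str = "gbk") -> None:
    """Read src_path as raw bytes, convert, and write the result to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(convert_encoding(data, source, target))


# Hypothetical usage (file names are placeholders):
# convert_file("corpus_utf8.txt", "corpus_gbk.txt", target="gbk")
# convert_file("corpus_utf8.txt", "corpus_big5.txt", target="big5")
```

Strict encoding is a deliberate choice here: simplified characters generally cannot be written to Big5, and it seems safer to learn that from an error than to load a corpus with missing characters.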


  4. You are now ready to load your corpus from your computer. Select File in the menubar, and then select Load Corpus Files. The corpus may consist of one or more files. (Note: The program can also load a webpage (HTML file) directly from the Web, by selecting "Load Corpus URL" instead.)
    MonPro20 - 4


  5. Below, a GB-encoded, spaced e-text was loaded as the corpus.

    MonPro20 - 5


  6. To conduct a search of the corpus (i.e., to make a concordance), select Concordance from the menubar, and then select Advanced search. Advanced search allows regular-expression searches, including searches for a string that is only part of a word (e.g., a regular-expression search for 'or' would yield such results as 'core,' 'for,' 'fore,' 'ford,' 'score,' 'spore,' 'word,' 'words,' and so forth). This search method is especially useful for non-spaced CJK e-texts. (Note: Builds 231 through 235 of MonoConc Pro 2.2 cannot correctly search non-spaced e-texts, but Build 236 (24 October 2002) onwards can correctly search non-spaced Chinese e-texts. This information was added on 10.27.02, based on field-testing of MonoConc Pro 2.2, Build 236. There are still some minor display problems, so there will be further field-testing before MonoConc Pro 2.2 becomes commercially available this fall.)
    MonPro20 - 6
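
The idea of a substring search by regular expression can be illustrated with a short Python sketch. This only mimics the behaviour described above using Python's `re` module; it makes no claim about how MonoConc Pro implements its search, and the word list and the Chinese sentence are made-up sample data.

```python
import re


def words_containing(pattern: str, words: list) -> list:
    """Return the words in which the regular expression matches anywhere."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]


def hit_offsets(pattern: str, text: str) -> list:
    """Return the starting offset of every match in a (possibly non-spaced) text."""
    return [m.start() for m in re.finditer(pattern, text)]


# English: 'or' matches inside longer words.
english = ["core", "for", "ford", "score", "word", "words", "cat", "dog"]
print(words_containing("or", english))

# Chinese, non-spaced: the same kind of search finds 指 inside 指导 and 指挥.
sentence = "他指导我们,她指挥乐队"  # invented sample sentence
print(hit_offsets("指", sentence))
```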


  7. In the Advanced Search window shown below, the input for the search is the monomorphemic, monosyllabic word, zhǐ 指 'finger, to point.' To input the search word or string, input software such as Global IME, or an external encoder/decoder (e.g., NJStar Communicator), may be used. Observe, also, that in the Advanced Search window, other settings, such as "Regular Expression," have also been selected.
    MonPro20 - 7


  8. The result is a total of 145 hits, with the Keyword-in-Context (KWIC) display shown in the large, lower window in the screenshot below. In that window, the fifth row is highlighted. In that particular occurrence, zhǐ 指 combines with dǎo 导 to form the polymorphemic word, zhǐdǎo 指导 'to guide, to direct.' The small, upper window displays the context in which that particular token of the searched string, zhǐ 指 'finger, to point,' appears in the e-text. Notice, also, that some characters adjacent to the searched character are highlighted in blue, including dǎo 导. Items highlighted in blue are frequent collocates of zhǐ 指 in the corpus. Highlighting of frequent collocates is the default display setting in the program. To disable this display, go to Display in the menubar, and then de-select (un-check/un-tick) the option, Highlight Collocates. After Highlight Collocates is de-selected, only the searched word is highlighted in the KWIC display window.
    MonPro20 - 8
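
For readers unfamiliar with the KWIC format, the following minimal Python sketch shows the general idea: each hit is listed with a few tokens of context on either side. It assumes a spaced e-text (one character per token), the function name `kwic` and the `width` parameter are hypothetical, and the sample line is invented rather than taken from the corpus in the screenshots.

```python
def kwic(tokens: list, keyword: str, width: int = 3) -> list:
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - width):i]      # up to `width` tokens before the hit
            right = tokens[i + 1:i + 1 + width]     # up to `width` tokens after the hit
            rows.append((left, token, right))
    return rows


sample = "他 指 导 我 们 。 她 指 挥 乐 队 。"  # invented, spaced sample text
tokens = sample.split()

for left, key, right in kwic(tokens, "指"):
    # The keyword is bracketed so it stands out in the middle of the line.
    print(" ".join(left), "[" + key + "]", " ".join(right))
```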


  9. The search results in the KWIC display window can, in turn, be sorted. In the screenshot below, under Sort in the menubar, the sorting selected is 1st Right and then 2nd Right.
    MonPro20 - 9


  10. The result of the sorting is given in the screenshot below. Centrally displayed are all fourteen (14) occurrences in the corpus of zhǐ 指 'finger, to point' combined with huī 挥 'to wave, to brandish' to form the polymorphemic word, zhǐhuī 指挥 'command, direct.'
    MonPro20 - 10
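
Sorting by 1st Right and then 2nd Right amounts to ordering the hits by the first token to the right of the keyword, with ties broken by the second token to the right. A minimal Python sketch of that idea follows. It orders characters by Unicode code point, which is an assumption made for illustration; the collation MonoConc Pro actually uses may differ. The rows are invented sample data and `sort_by_right` is a hypothetical name.

```python
def sort_by_right(rows: list) -> list:
    """Sort KWIC rows by the 1st token to the right, then the 2nd token to the right."""
    def key(row):
        _left, _keyword, right = row
        first = right[0] if len(right) > 0 else ""   # empty string sorts first
        second = right[1] if len(right) > 1 else ""
        return (first, second)
    return sorted(rows, key=key)


# Each row is (left context, keyword, right context); invented sample data.
rows = [
    (["她"], "指", ["挥", "乐"]),
    (["他"], "指", ["导", "我"]),
    (["谁"], "指", ["挥", "一"]),
]

for left, key, right in sort_by_right(rows):
    print(" ".join(left), "[" + key + "]", " ".join(right))
```

After sorting, rows that share the same right-hand neighbour (here 挥) end up next to one another, which is what makes the occurrences of a word such as 指挥 appear as a block.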


  11. Using the same corpus, a search is conducted for the string, 指 挥. (Note that since there is a space between 指 and 挥 in the e-text, a space is also needed in the search string.) The concordanced result is the same fourteen (14) occurrences of the word, zhǐhuī 指挥 'command, direct,' as shown in the screenshot below, where the results have been sorted by "1st Right", and then by "2nd Right."
    MonPro20 - 11
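
The need for a space in the search string can be seen in a short Python sketch: a plain search for 指挥 does not match a spaced e-text, whereas a pattern that allows optional whitespace between the characters matches both spaced and non-spaced text. This is an illustration using Python's `re` module, not a description of MonoConc Pro's own regular-expression syntax; `spaced_pattern` is a hypothetical helper and the sample strings are invented.

```python
import re


def spaced_pattern(word: str):
    """Compile a regex matching the characters of word with optional whitespace between them."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in word))


spaced_text = "她 指 挥 乐 队"   # invented sample, one space between characters
unspaced_text = "她指挥乐队"     # the same sample without spaces

print("指挥" in spaced_text)     # the unspaced string is not found in spaced text
print("指 挥" in spaced_text)    # the spaced string is found

p = spaced_pattern("指挥")
print(p.search(spaced_text).group())
print(p.search(unspaced_text).group())
```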


  12. Each of the concordanced results can be saved as a file for analyzing later. As shown in the screenshot below, first select Concordance from the menubar, and then select Save as File. You can then choose to save the file as a TEXT (.txt) file, or as an HTML (.htm) file.
    MonPro20 - 12


  13. Note that concordancing programs, including MonoConc Pro, can also be used to obtain frequency lists. Be sure that the e-text contains spacing between characters (or between polysyllabic words, cí 词). After loading your corpus, select Frequency, then Corpus Frequency Data, and then Frequency Order, so that the list is sorted by frequency, as shown below.
    MonPro20 - 13
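
If an e-text has no spacing, one way to prepare it is to insert a space between every character before loading it. The following Python sketch shows that preprocessing step; it is done outside MonoConc Pro, `space_characters` is a hypothetical name, and the approach is deliberately naive: it separates every non-whitespace character, so any Latin words or numbers in the text would be split into single letters and digits as well.

```python
def space_characters(text: str) -> str:
    """Insert one space between all non-whitespace characters, line by line."""
    spaced_lines = []
    for line in text.splitlines():
        # Drop existing whitespace so already-spaced text is not double-spaced.
        chars = [ch for ch in line if not ch.isspace()]
        spaced_lines.append(" ".join(chars))
    return "\n".join(spaced_lines)


print(space_characters("她指挥乐队。"))
```

Segmenting into polysyllabic words (cí 词) rather than single characters would require a word segmenter, which is beyond this sketch.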


  14. The results are shown in the left screenshot below, with three columns of information: the left-most column shows the "count" (number of tokens/occurrences in the corpus), the second column the "pct" (percentage of occurrences in the corpus), and the third column the "word" (here, a character or punctuation mark; i.e., a string separated by a space). As shown in the third column, ignoring punctuation marks, the most frequently occurring characters in the corpus, in order from most frequent, are: 的, 我, 是, 不, and 了. The frequency list can be saved by going to Frequency, and then Save as File, to save the list as a TEXT file. The right screenshot below displays the saved frequency list opened in a text editor.
    MonPro20 - 14
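
The three columns of the frequency list (count, pct, word) can be reproduced in a few lines of Python, which may help in checking or post-processing a saved list. This is a sketch under the assumption that a "word" is any string separated by whitespace, as described above; the exact column layout of MonoConc Pro's saved file is not reproduced here, `frequency_list` is a hypothetical name, and the sample text is invented.

```python
from collections import Counter


def frequency_list(text: str) -> list:
    """Return (count, pct, word) rows sorted from most to least frequent."""
    tokens = text.split()               # a "word" is any whitespace-separated string
    total = len(tokens)
    counts = Counter(tokens)
    # most_common() orders by descending count; ties keep first-seen order.
    return [(n, 100.0 * n / total, word) for word, n in counts.most_common()]


sample = "我 是 我 的 的 的"  # invented, spaced sample text

for count, pct, word in frequency_list(sample):
    print(f"{count:6d} {pct:8.4f}% {word}")
```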


The above provides a simple set of step-by-step instructions to help you get started. [1]

----------
Notes:
[1] For alternative methods to make concordances using Chinese e-texts, see, for example, my instructions in the following two webpages:

Instructions for Concordancing Chinese E-Texts using Wenlin

Instructions for Concordancing East Asian E-Texts using Concordance



Created by Marjorie K.M. Chan on 11 October 2002. Last update: 25 March 2005.
Copyright © 200x Marjorie K.M. Chan. All rights reserved.

URL:     http://people.cohums.ohio-state.edu/chan9/conc/monpro.htm