Suppose you want to create a document (Web page, e-mail, whatever) that
contains some Chinese characters: not pictures of them, but real live
Unicode characters from the CJK (Han unification) blocks. How do you
find the appropriate characters to enter into your document? It depends
on what you know about each character:
- If you already have the desired character in some other document,
such as some of your earlier writing, or an online Chinese language
"newspaper", and you can find the particular character when you need it, then
you can simply copy and paste.
- If you know how to pronounce it in Mandarin or Cantonese, and you
know how to spell that pronunciation in pinyin, you can key the pinyin
directly on your computer's keyboard if it is equipped for that;
otherwise you can use the pinyin to find the character in an online
Chinese-language dictionary, then copy+paste from there.
- If you know what the character means in English, you can look it up
in an online English-to-Chinese dictionary, then copy+paste from there.
- But suppose you have none of that information. You've seen the character
in visual-only form, such as in a hardcopy Chinese-language newspaper
you've found lying around, or in open captions on a Chinese-language
television program. There's no Chinese-English bilingual person
handy to give you the slightest clue what it means, and you don't know
enough Mandarin or Cantonese to follow the rest of the characters
in the audio dialog and work out (by process of elimination)
which sound corresponds to the one character that arouses your curiosity.
You need some way to describe to the computer how the
character looks, and have the computer determine which character
you are describing. That's the topic of the rest of this page.
Methods to look up a character by its shape:
- You might be able to use a scanner to produce an image of the text,
then use Chinese-language OCR (Optical Character Recognition) to identify
each character and produce the sequence of Unicode code points for them.
I'm not aware of any such software being available, and this seems like an
awful lot of
hassle, especially if you have to travel to a public computer lab to get
access to the scanner each time you want a few characters scanned.
- If you have easy access to a scanner, but don't have any OCR
software, you could scan the text and post it to the net, asking if anyone
recognizes these characters and could please key them in for you (or
use the other methods listed below) and return the result to you online.
You could even pay people online to provide this service for you, if you
have money, or after NewEco implements third-party contract work. Or you
could use the scanned text within a reCAPTCHA system.
- Sometimes you can identify the primary "radical" within the character,
and count the number of additional strokes in the character. If so, you
visually look up the radical in a list of standard radicals, then key in
the number of additional strokes, then look down the list of results hoping
to see the complete character. This can fail because there are two or more
similar-looking radicals and you picked the wrong one, or because you
miscounted the number of additional strokes, or because you simply didn't
recognize the character even though you were looking right at it as you
scanned several pages listing over a hundred characters that share the
same radical and the same number of additional strokes.
My personal experience is that the process seldom worked for me, and even
when it did, it was an awful lot of work just to find one character.
- There's a system called "four corners" for identifying Chinese
characters, where you type four digits somehow identifying what shape is
located near each corner of the character, and the lookup engine shows
you all known characters that share all four corners. But I've never seen
an explanation of which digits to use for which shapes, so I've never been
able to assess how well this system actually works.
- For common characters, you can eyeball-scan an online newspaper,
looking for the character to appear "at random". The very most common
characters are likely to show up several times on each page of text, so
you can quickly reduce your workload to just the few not-so-common
characters you are seeking. I found this method much easier to use
than the radical+strokes method: I found more than half of the
commonly-seen, simple-looking characters I wanted within a couple of
hours of searching.
- But what if none of those existing methods is suitable for the
remaining characters, which aren't common enough to find by accident
while scanning online newspapers? Below I propose some new methods
for entering a description of a character's strokes into an HTML form,
whereupon the Web server application could look for your description in
a database that I have set up. (But this won't help me when I'm first
setting up the database! I'll probably use the services of a Chinese-English
bilingual person, if I can find any. Such services would not be suitable
for "every time I see a character I'm curious about", except to pass time
at a senior lunch if I happen to be sitting near a Chinese person, but
would be suitable when doing a one-time inclusion of that character into
a database.)
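
As a rough illustration of the radical-plus-stroke-count lookup described
in the list above, here is a minimal Python sketch. The five sample
entries, the names build_index and lookup, and the tolerance parameter are
all hypothetical, chosen only for illustration; a real tool would load a
full dictionary table instead. The tolerance parameter widens the search to
neighboring stroke counts, as a hedge against miscounting.

```python
from collections import defaultdict

# Tiny hand-made sample: (character, radical, additional strokes).
# These rows are illustrative only, not a real dictionary table.
SAMPLE_ENTRIES = [
    ("好", "女", 3),
    ("妈", "女", 3),
    ("她", "女", 3),
    ("林", "木", 4),
    ("明", "日", 4),
]


def build_index(entries):
    """Group characters by (radical, additional stroke count)."""
    index = defaultdict(list)
    for char, radical, extra in entries:
        index[(radical, extra)].append(char)
    return index


def lookup(index, radical, extra_strokes, tolerance=0):
    """Return candidate characters for a radical and stroke count.

    tolerance=1 also searches one stroke fewer and one stroke more,
    in case the strokes were miscounted.
    """
    low = max(0, extra_strokes - tolerance)
    high = extra_strokes + tolerance
    results = []
    for count in range(low, high + 1):
        results.extend(index.get((radical, count), []))
    return results


INDEX = build_index(SAMPLE_ENTRIES)
print(lookup(INDEX, "女", 3))
print(lookup(INDEX, "木", 3, tolerance=1))
```

Picking the wrong one of two similar-looking radicals is not handled here;
one possible extension would be to search a group of look-alike radicals
together.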
New ideas for describing a Chinese character:
- Use a JavaScript or Java-applet client-side application to track
the mouse to directly draw strokes and record the key information about
what was drawn. Of course this wouldn't work at all on cell-phones, nor within
a text-only browser such as Lynx, so I haven't considered implementing it
myself.
Update 2013.Jul.17: @rudharcom found a Website which allows me (in Google
Chrome for example) to use the mouse to draw the character (using JavaScript
tracking presumably) then submit it for recognition. For simple characters
it finds the desired character more than half the time, but for very complex
characters it almost never works.
- Start with a blank image, overlay several candidates for the most
prominent stroke in a character, and have the user select one of them
somehow. Then show the selected stroke overlaid with several candidates for
the next most prominent stroke, and have the user select again. Keep going
in this way, adding one new stroke with each user interaction. At any point
when the strokes so far in fact form a complete character, show info about
that character before showing the overlay and the options for skipping past
that character to build towards a more complicated one. For a partial
prototype of this method, see here.
- Instead of having a limited number of pre-defined strokes to identify
visually for selection, have a grid of checkboxes, and check the locations of
the start, the end, and each bend (if any) in a single stroke. Since strokes
nearly always run from top to bottom or from left to right, and never
complete a closed shape within a single stroke, in most cases the computer
can unambiguously decide which marked location is the start and the sequence
of marked locations from there to the end. Of course this method would
work efficiently only on devices that have a mouse. It would be a royal pain
on a cell-phone or text-only Web browser to need to step through all the
unused checkboxes to check just the few that are being used.
- Use a grid, but manually key in the row-column indexes instead of
checking boxes. This would be more awkward than checkboxes if a mouse
is available, but would work better on a cell-phone or text browser or any
other device not having a mouse.
To protect against row/column coordinates getting mixed up, an "algebraic"
notation could be used (as in chess), where letters denote columns and
numbers denote rows.
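
The grid ideas above can be sketched in Python as follows. This is only a
sketch under stated assumptions: the function names parse_cell and
order_stroke are hypothetical, and the ordering rule (start at the topmost
mark, leftmost if tied, then keep moving to the nearest unvisited mark) is
a simple heuristic chosen for illustration, not an established algorithm.
A stroke that rises from lower left to upper right would come out reversed.

```python
def parse_cell(cell):
    """Turn a reference such as 'c4' into zero-based (column, row).

    Letters denote columns and numbers denote rows, so 'a1' is the
    upper-left cell of the grid.
    """
    cell = cell.strip().lower()
    letter, digits = cell[:1], cell[1:]
    if not ("a" <= letter <= "z") or not digits.isdecimal() or int(digits) < 1:
        raise ValueError(f"bad cell reference: {cell!r}")
    return (ord(letter) - ord("a"), int(digits) - 1)


def order_stroke(cells):
    """Order the marked cells of ONE stroke from its start to its end.

    Assumed heuristic: the stroke starts at the topmost mark (leftmost
    on ties) and then always continues to the nearest unvisited mark.
    """
    # Parse and drop duplicate marks while keeping first-seen order.
    points = list(dict.fromkeys(parse_cell(c) for c in cells))
    if not points:
        return []
    start = min(points, key=lambda p: (p[1], p[0]))  # (row, column)
    ordered = [start]
    remaining = [p for p in points if p != start]
    while remaining:
        last = ordered[-1]
        nearest = min(
            remaining,
            key=lambda p: ((p[0] - last[0]) ** 2 + (p[1] - last[1]) ** 2, p[1], p[0]),
        )
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


# A stroke that runs right along row 2 and then bends down column f,
# with the marks supplied in arbitrary order (as checkboxes would be).
print(order_stroke(["f6", "b2", "f2"]))
```

With typed coordinates the user could simply enter the cells in drawing
order, in which case only parse_cell is needed; order_stroke matters for
the checkbox variant, where the marks arrive unordered.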