A few questions about Pleco functions and dictionaries

timseb

Member
Hi!

I have been using Pleco for a while and I also have the professional bundle. There are a few things I have not quite got a grip on, so I will number my questions:

1. Is it possible to import a character or word and get *all* available definitions in the dictionary, and not just one of them? For example, if I import 划, so far I have only found the options to either pick an entry manually or make Pleco pick the first one. The first option is workable if you only import a few characters and know what you're after, but if you import a list of a few thousand characters, it's just not an option. I would really like to pick, for example ABC Dictionary, and get both a card for huá and one for huà, in this case. Is this possible, or can it be made possible? I guess 尽 would be another well known example, but there are of course a lot of them.

2. When exporting cards, is it possible to get all traditional variants of a character provided by the dictionary instead of just a single one?

3. How many entries are there in the PLC dictionary?

4. Are the KEY E-C Chinese definitions exportable? CC, PLC and ABC are, for example, while OCC is not. It would be great to know before I buy it.

Thanks!
 

mikelove

皇帝
Staff member
1) Not at the moment; best bet would be to create a separate record in your import file with each pronunciation. (it'll match against the pronunciation field if it's there)

2) No, to be honest there are a lot of really weird archaic variants in there - we're mostly optimizing around search so we'd rather include a rare variant than not - and we don't have great data on which ones are in common use (character frequency itself is a poor proxy since in a lot of cases a character might be common in other senses but rare as a traditional version of this particular one) so we don't think it would be very useful data to export.

If you want an exhaustive mapping of characters to traditional variants the Unihan database does a pretty good job with that.

3) About 120,000.

4) Yes, they are.
 

timseb

Member
Thank you. Excellent answers.

Do you know if it's possible to make a list out of the Unihan database, or do you know of an existing list of that kind? I mean a list to import into Pleco: one that would turn 差 into a list like the one below, for example, perhaps with the traditional variant (if any) included for matching. Is it even possible to just export all dictionary entries in some weird way?

差 chà
差 chāi
差 chài
差 cī
差 chài

I tried doing this with Chinese Text Analyzer, which worked OK but with problems. It gives both jìn and jǐn for 尽, but only xuè for 血. For 差 I only get cha1, cha4 and chai1. I'm guessing there are multiple reasons for this, among them that it's not really based on a character dictionary.

That last answer was very welcome indeed, I'm buying that dictionary right away!
 

mikelove

皇帝
Staff member
It does have all of that information, yes:


lists those pronunciations and even a few more. You'd basically want to a) download the Unihan database, b) extract whichever reading band you wanted from the Readings file (kHanyuPinyin or kXHC1983 or whatever), and c) run it through a script to convert the U+ into a character and put each reading and that character on a separate line.
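
For anyone following along, steps b) and c) could be sketched in Python roughly as below. This is a minimal sketch, not a tested pipeline: it assumes the Readings file uses tab-separated lines of the form `U+XXXX<TAB>field<TAB>value` with `#` comment lines, and that values in fields like kHanyuPinyin or kXHC1983 look like `location:reading,reading`. The function names `parse_unihan_readings` and `write_import_file` are made up for illustration.

```python
def parse_unihan_readings(lines, field="kHanyuPinyin"):
    """Yield (character, reading) pairs for one field of the Readings file.

    Assumed line format: 'U+5DEE<TAB>kHanyuPinyin<TAB>value'.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != field:
            continue  # not the field we asked for
        codepoint, _, value = parts
        char = chr(int(codepoint[2:], 16))  # 'U+5DEE' -> the character itself
        seen = set()
        for token in value.split():
            # Drop any 'location:' prefix, then split comma-separated readings.
            readings = token.rsplit(":", 1)[-1]
            for reading in readings.split(","):
                if reading and reading not in seen:
                    seen.add(reading)
                    yield char, reading


def write_import_file(src_path, dst_path, field="kHanyuPinyin"):
    """Write one 'character<TAB>reading' line per reading (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for char, reading in parse_unihan_readings(src, field):
            dst.write(f"{char}\t{reading}\n")
```

Calling something like `write_import_file("Unihan_Readings.txt", "pleco_import.txt")` would then produce a tab-separated list; the output filename is arbitrary, and you would probably still want to filter the result down to the characters you actually care about.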
 

timseb

Member
Thank you.

I have been using the Unihan website for a few months and really like it, but did not know it could be turned into lists manually. I am not very technical, but also not a complete beginner, so I should be able to make this work! Will be back in a few hours if I don't. I have no idea about scripts though. :rolleyes:
 

timseb

Member
After looking at my downloaded Unihan data for a while and still not understanding a single thing, I gave up. I did however use a generator to get all readings for the characters in question. The list is here. The crux is now how to get each reading on a new row, combined with the character. I'm guessing for people who do coding, the answer is glaringly obvious, but I can't seem to figure it out.

A lot of these readings are either super obscure or plain wrong sometimes(?), but I'm thinking that's not a problem since the dictionary will filter them out anyway.
 

mikelove

皇帝
Staff member
You'd basically want a regular expression search, something like:

^([^ \n]*) ([^ \n]*) ([^ \n]*)
to
$1\t$2\n$1\t$3

repeating until it has nothing more to replace. (in some text editors you might have to replace the $'s with \'s) Make sure that the readings and characters are separated with tabs and not spaces in the final import file.
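
If repeated search-and-replace passes get fiddly, the same end result could be sketched in a few lines of Python instead. This assumes each input line is a character followed by its readings, separated by spaces or tabs (the pattern above only splits on spaces, so tab-separated input may not match as intended). The function name `one_reading_per_line` is made up for illustration.

```python
def one_reading_per_line(text):
    """Turn 'char r1 r2 r3' lines into one 'char<TAB>reading' line per reading."""
    out = []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) < 2:
            continue  # skip blank lines and characters with no reading
        char, readings = fields[0], fields[1:]
        for reading in readings:
            out.append(f"{char}\t{reading}")
    return "\n".join(out)
```

Because it splits on any whitespace, it does not matter whether the source list uses tabs, spaces, or a mix; the output is always tab-separated, which is what the import file needs.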
 

timseb

Member
Thank you. I think I'm moving in the right direction, but misunderstood something along the way. The script (in Notepad++) gave me tab separations between all readings, and duplicate lines, but still there are more readings than one on each row. Sometimes only one row but several readings. The list now looks like this:

一 yi1 yi2
一 yi1 yi4
人 ren2 ren
下 xia4 xia
上 shang4 shang3
上 shang4 shang

EDIT: I might have solved it. I turned all tabs into spaces, and then used your script once again. Now each reading has its own row!
 