Importing a Custom Dictionary in 2.0.1

kenianbei

秀才
I'm trying to import a Creative Commons-licensed Buddhist dictionary as a custom dictionary. I've saved the dictionary as a tab-delimited text file with three columns (Chinese, Pinyin, English) and no header. When I try to import it, it shows up fine in the preview, with no character encoding problems. However, each time I import, it stops at 2177 entries, while the file should contain about 16828 entries. I checked the entries around 2177, and there don't seem to be any characters that would mess it up... Any thoughts? Are there any (control) characters I could search for that would break the import? Or is this a limitation on Pleco's part? Should I try splitting the file into 10 parts? Any help will be greatly appreciated, thanks!

EDIT: I've figured it out. There were two entries that must have surpassed the length allowed by the database.
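For anyone hitting the same wall, here is a minimal Python sketch that scans a tab-delimited import file for the likely culprits discussed here: rows without exactly three columns, stray control characters, and overly long fields. The function name `check_import_lines` and the 2,000-character threshold are assumptions for illustration only; the thread doesn't say what the database's actual limit is, so adjust `max_field_len` as needed.

```python
import unicodedata


def check_import_lines(lines, max_field_len=2000):
    """Return (line_number, problem) pairs for a Chinese/Pinyin/English tab-delimited file.

    max_field_len is a hypothetical threshold, not a documented Pleco limit.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        fields = line.split("\t")
        # Every row should have exactly three columns.
        if len(fields) != 3:
            problems.append((lineno, f"expected 3 columns, found {len(fields)}"))
        # Flag control characters (Unicode category "Cc") other than the tab separator.
        if any(unicodedata.category(ch) == "Cc" and ch != "\t" for ch in line):
            problems.append((lineno, "contains a control character"))
        # Report any field longer than the chosen threshold.
        for col, field in enumerate(fields, start=1):
            if len(field) > max_field_len:
                problems.append((lineno, f"column {col} is {len(field)} characters long"))
    return problems
```

Assuming the file is UTF-8, you could run it with something like `check_import_lines(open("soothill.txt", encoding="utf-8"))` (the filename is hypothetical) and look at the reported line numbers before splitting the file.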

P.S. If anyone is interested in having the dictionary, I think it would be legal to post, as it's copied from this website: http://www.acmuller.net/soothill/soothill-hodous.html, which is CC-licensed. The author states the dictionary is public domain, so it should be fine to post.
 

mikelove

皇帝
Staff member
Glad to hear that explained it, but this still seems like a bug. Could you possibly e-mail me your import file, along with which entries were causing the problem? There really shouldn't be any upper limit on entry sizes.

And I'm interested to hear that Soothill is CC-licensed now - don't think it was before. Perhaps we can get an official Pleco conversion of it out at some point.
 
After the dictionary has been saved, can it be passed along as a dictionary file? Are there ways to name it?

Also, I'm not clear on Chinese copyright law (or honestly, from living here, whether one actually exists), but if you own one or even several electronic editions of a dictionary, so long as you aren't using it on two devices at once, you should have the right to view it on any device. I'm not sure about this, so I plan to ask a law professor about its legality.

Rob
 

mikelove

皇帝
Staff member
Yes on both counts - you can rename it / change its icon in "Manage Dicts," and the database file for it should be in My Documents unless you've moved it.
 
I've had some irregular behavior when I tried to import a personal dictionary (one that I made myself based on "Remembering Simplified Hanzi" by Heisig-Richardson).
I used the format <headword><tab><pronunciation left blank><tab><definition>. Everything seemed to work fine, except that the pinyin didn't import for ANY of the dictionary entries. Fine; maybe "Fill-in-the-blank fields" is only a feature of the flashcard import.
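To make the three-column layout concrete, here is a small Python sketch that builds lines in the <headword><tab><blank pronunciation><tab><definition> format described above. The function name and the sample entries are hypothetical; it only shows the column layout, not anything Pleco-specific.

```python
def make_user_dict_lines(entries):
    """Build tab-delimited lines: headword, empty pronunciation, definition.

    entries: iterable of (headword, definition) pairs (hypothetical input shape).
    """
    lines = []
    for headword, definition in entries:
        # A tab or newline inside a definition would shift the columns, so replace them.
        clean_def = definition.replace("\t", " ").replace("\n", " ")
        # Two consecutive tabs leave the middle (pronunciation) column empty.
        lines.append(f"{headword}\t\t{clean_def}")
    return lines
```

For example, `make_user_dict_lines([("一", "one")])` yields `["一\t\tone"]`, which splits into three fields with an empty middle one.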

So then I wanted to make flashcards from the Heisig method, divided into categories by lesson. I used the "Merge Cats" setting for duplicate entries, with the following format:
//Lesson 1
<headword>
<headword>
<headword>
etc...
//Lesson 2
<headword>
etc.
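If the lesson lists already live in a script or spreadsheet, a file in the "//Lesson N" category-header format shown above could be generated with a sketch like this. The function name and the dict-of-lists input are assumptions for illustration; it simply mirrors the layout in the post.

```python
def make_lesson_import(lessons):
    """Build flashcard-import text with a "//Lesson N" category line before each lesson.

    lessons: dict mapping lesson number -> list of headwords (insertion order is kept).
    """
    out = []
    for number, headwords in lessons.items():
        out.append(f"//Lesson {number}")  # category header line
        out.extend(headwords)             # one headword per line
    return "\n".join(out) + "\n"
```

For instance, `make_lesson_import({1: ["一", "二"], 2: ["口"]})` produces the headers and headwords on separate lines, in lesson order.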

Then, for my "Heisig" flashcard profile, I forced the dictionary to be Remembering Simplified Hanzi, the one I had just created. Some of the cards seem to work great (many of them now even have pinyin in the pronunciation field, which is a bonus... and it's nice to use pop-up definitions to flip through the other dictionaries' definitions for that word), but many cards are missing pinyin, and I just realized it's because these cards were not paired properly by their headword.

For example, if I'm in the Manage Flashcard screen and search for the headword 旧, TWO flashcards appear with the exact same headword. One has the "Heisig Lesson 1" category; the other has all the other categories I already had for that headword. They apparently did not "merge cats" properly. Many cards, however, did merge properly, i.e. the "Heisig Lesson 1" category was applied to the cards that already existed.

Is it possible to "merge cats" after the flashcards have been created? Is there even a way to find these "duplicate cards" without just searching for their headwords one at a time?

As I was searching for an answer in the online manual, I noticed what might be a typo. http://www.pleco.com/manual/flash.html#import It explains each of the Duplicate Entry import options, but then it says
We recommend "Skip, keep cats" unless you have a particular reason to prefer another option - it generally provides the best results.
There is no "Skip, keep cats" option. I'm pretty sure it's just a typo for "merge cats", but it's a confusing one for users who never did beta testing.
 

mikelove

皇帝
Staff member
Thanks for the note on the manual typo - I've fixed it in the online version now.

As far as this import problem goes, fill-in-the-blank Pinyin indeed only works in flashcard imports, and only if you're linking them to dictionary entries - there are too many words our dictionaries don't cover (even the ABC) for us to easily fill in Pinyin for other imports.
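As a toy model of that rule (not Pleco's actual code), filling in blanks only works when there's a dictionary entry to copy the Pinyin from; anything the reference doesn't cover just stays blank. The function name, the (headword, pinyin) pairs, and the small reference dict below are all hypothetical.

```python
def fill_missing_pinyin(cards, reference):
    """Fill blank pronunciations from a reference dictionary; unknown words stay blank.

    cards: list of (headword, pinyin) pairs; reference: dict of headword -> pinyin.
    Both shapes are assumptions for illustration.
    """
    filled = []
    for headword, pinyin in cards:
        # Only fill when the field is empty AND the reference has an entry for the word.
        if not pinyin and headword in reference:
            pinyin = reference[headword]
        filled.append((headword, pinyin))
    return filled
```

With `reference = {"旧": "jiù"}`, a blank card for 旧 gets filled in, while a word missing from the reference keeps its empty pronunciation, which is the gap described above for non-flashcard imports.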

With the cards not coming up as duplicates, my guess would be that that's because the cards aren't actually identical - with 旧, for example, the card that's linked to a regular Pleco dictionary entry would probably also include the traditional character 舊, whereas the entry in your custom Heisig dictionary might not. The missing Pinyin might also be a factor.

There's really no good solution to this problem - if we flagged cards as duplicates as long as just one version of the headword matched, people would constantly be getting frustrated in cases like 干/幹/乾 (which are given separate entries in many dictionaries and are in fact often completely different characters that were merged when they rolled out simplified). All we can really do is improve the system for prompting the user when the software encounters a duplicate card - that could perhaps be extended to cover cases where one version of the headword matches, along with showing you a list of potential duplicates in the main dictionary interface (and not just in the import / remap screens like it does now). But absent a user prompt I think the current default behavior is really the best way to handle duplicates.
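The trade-off above can be illustrated with a small sketch. It models each card's headword as a (simplified, traditional) pair, with `None` where a custom dictionary stores only one form; that representation and both function names are assumptions, not how Pleco stores cards. Strict matching requires every form to be equal, while loose matching groups cards that share any single form and would only be suitable for producing a list of *potential* duplicates to show the user.

```python
from collections import Counter, defaultdict


def exact_duplicates(cards):
    """Headwords that appear more than once with every form identical (strict match)."""
    counts = Counter(cards)
    return {card for card, n in counts.items() if n > 1}


def potential_duplicates(cards):
    """Group cards that share at least one headword form (loose match).

    Returns a dict of form -> set of cards, keeping only forms shared by 2+ distinct cards.
    """
    by_form = defaultdict(set)
    for card in cards:
        for form in card:
            if form is not None:
                by_form[form].add(card)
    return {form: group for form, group in by_form.items() if len(group) > 1}
```

With `cards = [("旧", "舊"), ("旧", None), ("干", "幹"), ("干", "乾")]`, strict matching finds no duplicates at all, while loose matching groups both 旧 cards (a genuine duplicate) and both 干 cards (separate words that only share a simplified form), which is why a loose match needs a user prompt rather than an automatic merge.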

You can search for duplicates in Manage Cards (choose an Advanced search and it'll be one of the fields you can search for), though if the software didn't think the cards were duplicates the first time it probably won't flag them as that now either.
 

Dmitry

Member
Dear Mike,

How can I remove a custom dictionary?

I added a dictionary and then realised that it is not freeware, but something like a home-made adaptation of part of a protected piece of software. Also, since that installation Pleco needs quite some time to start after I press the Pleco icon on the screen; apart from that, everything runs well.

Tried to do something about it, but cannot find any "remove" commands. Should I reinstall Pleco?

Thanks!
 

mikelove

皇帝
Staff member
Exit Pleco completely using the Quit command in the Dict menu, then open up File Explorer (Start / Programs) and navigate to your My Documents folder and you should see a file in there with the same name as that custom dictionary - delete that file to get rid of the dictionary.
 

Dmitry

Member
Thank you, Mike! It worked right away. I had tried to do it earlier, but I hadn't quit Pleco, so it didn't work. Thanks again!
 

kenianbei

秀才
Hi Mike, just wanted to check in about the bug with importing long entries... was that ever figured out? I'm importing a revised version of a Chinese-Chinese dictionary (also public domain) called 丁福保佛學大辭典, and there is an entry of 5,293 characters that halts the import. It seems like the iPhone version also has this limit? It's not an important definition, but I'd hate to chop it up before hearing from you whether this can be fixed, particularly if you are still interested in offering it as a free dictionary. Thanks!
 

mikelove

皇帝
Staff member
Haven't figured it out yet, but we'll take another crack at it now that the iPhone flashcard / user dictionary version is out. Certainly might be interesting to offer it as a free dictionary, though it's gotten less important with the new user dictionary format (which performance-wise should be pretty comparable to the built-in one on an iPhone at least - just lacks full-text search at the moment).
 