MakePlecoDict

David

举人
Here's the message I get when I try to run it:
can't find dll library MSL_All-Dll80_x86.dll (default path is .... paths follow)

(on Windows 2000 Professional, Traditional Chinese edition)
 

mikelove

皇帝
Staff member
Yikes, forgot to take that out... I've posted an updated version now (though it's listed with the same version number), so download it again and you should be OK.
 
Order of entries

Mike,

thank you for providing MakePlecoDict! I have a few questions:
  • Is the order of the entries in the file important, that is, do they need to be sorted in advance? Or does MakePlecoDict sort them automatically?
  • How do I get the superscript numbers? Are they ordinary Unicode characters?
  • How do I get the circled numbers? Again, Unicode?
 

mikelove

皇帝
Staff member
Re: Order of entries

haraldalbrecht said:
Is the order of the entries in the file important, that is, do they need to be sorted in advance? Or does MakePlecoDict sort them automatically?

Yes, the order is important; in English-to-Chinese they have to be alphabetically sorted (but it'll warn you if you've done it incorrectly), and in Chinese-to-English they have to be sorted in whatever way you eventually want them to appear when the software is in "Dict" mode.

haraldalbrecht said:
How do I get the superscript numbers? Are they ordinary Unicode characters?
How do I get the circled numbers? Again, Unicode?

Yes on both counts; you can use the standard Unicode values for those.
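As an illustration, the short Python sketch below builds those characters from plain numbers, which can help when generating entries with a script. The helper names are hypothetical; the code points themselves are standard Unicode (the superscript digits are split between the Latin-1 Supplement and Superscripts and Subscripts blocks, and the circled numbers 1 to 20 are U+2460 to U+2473).

```python
# Superscript digits are split across two Unicode blocks:
# 1, 2 and 3 are in Latin-1 Supplement, the rest in Superscripts and Subscripts.
SUPERSCRIPT_DIGITS = {
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
    "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
    "8": "\u2078", "9": "\u2079",
}


def superscript(n: int) -> str:
    """Write a non-negative integer with superscript digits, e.g. 12 -> '¹²'."""
    if n < 0:
        raise ValueError("only non-negative numbers are supported")
    return "".join(SUPERSCRIPT_DIGITS[d] for d in str(n))


def circled(n: int) -> str:
    """Return the circled number (U+2460..U+2473) for n in 1..20."""
    if not 1 <= n <= 20:
        raise ValueError("U+2460..U+2473 only cover 1 to 20")
    return chr(0x2460 + n - 1)


if __name__ == "__main__":
    # A homograph number and two sense numbers, as they might appear in an entry.
    print("行" + superscript(2), circled(1), circled(2))
```

Circled numbers above 20 live in other Unicode blocks and are not handled by this sketch.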
 

David

举人
For an entry like this 阿尔法[-爾-], if you don't enter the traditional character in brackets, does that mean you can't search for the entry using only traditional characters?

The readme says that you can omit the Pinyin, the character, or the definition. I expected to be able to omit the Pinyin, but not the character. That's good flexibility. Thanks for this tool!
 

mikelove

皇帝
Staff member
If you enter the entry in traditional characters only, it'll work just as well; the brackets are only required if you want to specify both the simplified and the traditional characters. PlecoDict's search engine doesn't care whether the dictionary is in traditional or simplified mode; in both modes it'll look for characters in either version of a headword. I think we left in support for the / character too, so if you want to specify multiple characters for the same position in a word, you can do that using slashes (as in the ABC dictionary).
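For anyone generating entries with a script, here is a minimal Python sketch of how a headword such as 阿尔法[-爾-] can be split into its simplified and traditional forms. It assumes, as that example suggests, that each "-" inside the brackets means "same character as the simplified form at this position"; that reading of the format is an assumption, and slash alternatives are not handled. `expand_headword` is a hypothetical name.

```python
def expand_headword(field: str) -> tuple[str, str]:
    """Split 'simplified[traditional]' into (simplified, traditional).

    Assumption: '-' inside the brackets stands for "unchanged from the
    simplified form", and a field without brackets is used for both forms.
    """
    if "[" not in field:
        return field, field
    simplified, rest = field.split("[", 1)
    pattern = rest.rstrip("]")
    if len(pattern) != len(simplified):
        raise ValueError(f"{pattern!r} does not line up with {simplified!r}")
    # Take the simplified character wherever the bracket pattern has '-'.
    traditional = "".join(
        s if t == "-" else t for s, t in zip(simplified, pattern)
    )
    return simplified, traditional


if __name__ == "__main__":
    print(expand_headword("阿尔法[-爾-]"))
```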

And the Pinyin-only option *should* work, but I don't think we actually tested it, so if you're planning to create a large Pinyin-only file it might be a good idea to do a few test entries first.
 
Is there a limit on the length of each field (i.e., the number of Chinese characters allowed, the length of the Pinyin, etc.)? I get messages like "oversized char block in line 17", "oversized alt char block in line 247", and so on.
 

mikelove

皇帝
Staff member
I think there's a limit of 14 on the number of characters that actually get indexed, but entries should still show up regardless of length.
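A quick way to find the affected entries before running MakePlecoDict is sketched below in Python. It assumes one entry per line with the headword in the first tab-separated column, which may not match your file exactly, and the limit of 14 comes from the post above and is itself approximate. Since longer entries still show up, the output is a list to review rather than a list to delete. The function names are hypothetical.

```python
MAX_INDEXED_CHARS = 14  # indexing limit quoted above; approximate


def headword_length(field: str) -> int:
    """Length of the headword, ignoring any [traditional] form in brackets."""
    return len(field.split("[", 1)[0].strip())


def long_headwords(lines):
    """Yield (line_number, headword) for headwords past the indexing limit.

    Assumes one entry per line with the headword in the first
    tab-separated column.
    """
    for number, line in enumerate(lines, start=1):
        headword = line.split("\t", 1)[0].strip()
        if headword_length(headword) > MAX_INDEXED_CHARS:
            yield number, headword


if __name__ == "__main__":
    sample = [
        "阿尔法[-爾-]\ta1 er3 fa3\talpha\n",
        "一二三四五六七八九十一二三四五\tx\tfifteen characters\n",
    ]
    for number, headword in long_headwords(sample):
        print(number, len(headword), headword)
```

To check a real file, pass in the open file object with the right encoding, for example open("entries.txt", encoding="utf-16") for a Word "Unicode Text" file (the file name is hypothetical).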
 
I removed all entries longer than 14 characters and successfully created my Chinese-to-English dictionary. But I'm still having problems with the English-to-Chinese one: I get an error saying something about the entries not being sorted correctly (even though I sorted them before saving as an encoded text file). Any idea why I get this message? Are there any forbidden characters? Is there also a length limit for the English? Surely not 14, though.
 

mikelove

皇帝
Staff member
No, this is because the entries aren't correctly sorted (and we haven't yet added auto-sorting capability to MakePlecoDict). It should tell you exactly which entries are causing the problem, so hopefully that will make it easy to figure out what isn't working. The sort ignores punctuation, capitalization, and spaces, so you should order your entries on that basis.
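That rule can be written as a sort key: drop everything except letters and digits, and compare in lower case. The Python sketch below uses such a key to report lines that sort before the line above them, roughly what MakePlecoDict's warning points at. How MakePlecoDict treats digits and accented letters is not described in this thread, so treat the key as an approximation; the function names are hypothetical.

```python
def sort_key(english: str) -> str:
    """Collation key that ignores punctuation, capitalization, and spaces."""
    return "".join(ch for ch in english.lower() if ch.isalnum())


def out_of_order(headwords):
    """Return the 1-based positions whose key sorts before the previous one."""
    problems = []
    previous = None
    for number, word in enumerate(headwords, start=1):
        key = sort_key(word)
        if previous is not None and key < previous:
            problems.append(number)
        previous = key
    return problems


if __name__ == "__main__":
    words = ["acquisition", "acquisition device", "acquisition, child"]
    # "acquisitionchild" sorts before "acquisitiondevice", so position 3 is flagged.
    print(out_of_order(words))
    print(sorted(words, key=sort_key))
```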

If you fix the sort order and you're still getting error messages, it's probably a bug - what little time we spent testing this early release of MakePlecoDict was mostly dedicated to its Chinese-to-English half.
 
I have the data in a Word file, in a table. I sort the English column alphabetically. Then I change the table to text and save as unicode text file. This then becomes the entry file for MakePlecoDict.

Am I doing something wrong? You say it should tell me where the error is, but the message flashes by too fast and then the DOS window closes. Is there an option I can add to slow it down, or to write the errors to a log file?
 

mikelove

皇帝
Staff member
Try this: go to the Start menu and select Run, type "cmd" and press OK; this will bring up the Windows command prompt. Navigate to the directory where you've stored MakePlecoDict (using cd to change directories and dir to list their contents), and when you get to the right directory, type "MakePlecoDict.exe". Running it this way will keep the window from closing once it finishes.
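If you'd rather keep a copy of the messages, one option is a small Python wrapper that runs the program and saves everything it prints to a log file. This is a sketch under assumptions: that MakePlecoDict.exe is in the current directory and runs without arguments or keyboard input (standard input is closed here, so a prompt would get end-of-file rather than waiting). `run_and_log` and the log file name are hypothetical.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run a console program, echo its output, and keep a copy in log_path.

    stderr is merged into stdout so error messages land in the log too.
    Undecodable bytes are replaced rather than raising an error.
    """
    result = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
    sys.stdout.write(result.stdout)
    return result.returncode
```

From the MakePlecoDict directory you would call it as run_and_log(["MakePlecoDict.exe"], "makeplecodict-log.txt"); the log keeps any out-of-order warnings after the program exits.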
 

David

举人
I was able to successfully create one dictionary, but I'm having problems with the next one, so I've got some more questions.
The maximum length of a Chinese headword is apparently 14 characters. What's the maximum length of the other fields?
Are there any characters, Chinese or otherwise, that aren't allowed in any of the fields? That's all I can think of for now. Thanks.
 

mikelove

皇帝
Staff member
Well, the Pinyin field won't be indexed past its 14th syllable, but as with the character field, all of the text should show up even if it isn't all indexed. The total entry has to be less than 8192 characters long; we could expand this to be as large as 32,000 characters without having to make any changes to the file format, but we haven't done so yet. (the longest entry in any of our dictionaries so far is "go" in Oxford E-C, which clocks in at 6500 characters)
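A pre-flight check for the overall limit might look like this Python sketch. It measures each line in Python string characters (Unicode code points) after dropping the line ending; whether MakePlecoDict counts the same way (for example in UTF-16 units, which differ only for characters outside the Basic Multilingual Plane) is an assumption worth testing on a borderline entry. The function name is hypothetical.

```python
MAX_ENTRY_CHARS = 8192  # entries must be shorter than this, per the post above


def oversized_entries(lines, limit=MAX_ENTRY_CHARS):
    """Yield (line_number, length) for entries that are not shorter than limit."""
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # ignore the line ending itself
        if length >= limit:
            yield number, length


if __name__ == "__main__":
    sample = ["go\t去\tshort entry\n", "x" * 9000 + "\n"]
    print(list(oversized_entries(sample)))
```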

There are a few private-use Unicode characters which if inserted into a dictionary entry could cause problems, but aside from those I don't think there are any characters that would stop the software from encoding a dictionary.
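Since the post doesn't say which private-use characters are involved, a cautious approach is to flag all of them. In Python, unicodedata.category() returns "Co" for private-use code points (the U+E000 to U+F8FF area plus the supplementary private-use planes 15 and 16), so a scan could look like this; the function name is hypothetical.

```python
import unicodedata


def private_use_chars(lines):
    """Yield (line_number, column, code_point) for each private-use character.

    Flags every character whose Unicode category is 'Co' (private use),
    since we don't know which specific ones PlecoDict reserves.
    """
    for number, line in enumerate(lines, start=1):
        for column, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Co":
                yield number, column, f"U+{ord(ch):04X}"


if __name__ == "__main__":
    sample = ["book\t书\n", "x\ue000y\n"]
    for hit in private_use_chars(sample):
        print(*hit)
```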
 
I realise now that the problem is that it ignores punctuation and spaces. That means an entry like "acquisition device" should come after "acquisition, child", but Word doesn't sort them that way. I have hundreds of entries. Can this be changed, or can you think of some way I can automate the process of moving these entries around?
 

mikelove

皇帝
Staff member
Well you could find another program to sort the list correctly; we have an in-house tool for this (unfortunately not one that we can release just yet). But there's no way to change this in PlecoDict, unfortunately.
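As one example of such a program, here is a Python sketch that re-sorts an English-to-Chinese entry file with a key that ignores punctuation, capitalization, and spaces, matching the rule described earlier in the thread. It assumes one entry per line, with the English headword in the first tab-separated column, and a UTF-16 file like the one Word's "Unicode Text" option typically produces. It is not Pleco's in-house tool, and the function and file names are hypothetical.

```python
def sort_key(english: str) -> str:
    """Ignore punctuation, capitalization, and spaces when comparing."""
    return "".join(ch for ch in english.lower() if ch.isalnum())


def sort_entry_file(src: str, dst: str, encoding: str = "utf-16") -> None:
    """Sort entries by their first tab-separated column and write them to dst.

    Blank lines are dropped. Assumes one entry per line, English first.
    """
    with open(src, encoding=encoding) as f:
        lines = [line.rstrip("\r\n") for line in f if line.strip()]
    # Python's sort is stable: entries with equal keys keep their order.
    lines.sort(key=lambda line: sort_key(line.split("\t", 1)[0]))
    with open(dst, "w", encoding=encoding) as f:
        f.write("\n".join(lines) + "\n")
```

Called as sort_entry_file("entries.txt", "entries-sorted.txt"), duplicate headwords such as the two "book cover" lines keep their original relative order. Running the result through MakePlecoDict is still the real test.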
 
It would be nice if PlecoDict or MakePlecoDict could do the sorting...

In the meantime I have been trying to work this out myself. I tried to automate some of it, and then began to work through changing things by hand and testing every now and then. On the last test, I got this:

heap verify failed!!! Possible program bug!!! Out-of-order English, line 409 etc

It reported errors in the following lines, which I've copied here because I can't see any problem with them:

book 书
book cover 封面
book cover 书皮
book cover design(er) 封面设计
book format; book size 开本
book, guide to the use of a 凡例
bookish language 书面语
book jacket 护封
book knowledge 书本知识
book list 书目
Book Number, International Standard 国际标准书号
book or edition, hand-copied 抄本
book, reference 参考书
book review 书评
books and newspapers 书报
books and periodicals 书刊
books and writings 书籍
books, series of 丛书

Aren't these sorted right?
 

mikelove

皇帝
Staff member
They look correct, yes. Have you tried installing the database to your Palm? The out-of-order English errors usually aren't fatal, and the software should still work. The heap verify thing is a relic of an older version and doesn't necessarily mean that the database is unusable.
 

mikelove

皇帝
Staff member
It might not search correctly; the search algorithm expects the database to be sorted. (there's no separate index like there is with Chinese-to-English databases)
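PlecoDict's actual search code isn't described here, but the effect is easy to see with a plain binary search, which is the usual reason a lookup needs sorted data. The Python sketch below assumes keys built with the sort rule described earlier (ignore punctuation, capitalization, and spaces); `find` and `sort_key` are hypothetical names.

```python
import bisect


def sort_key(english: str) -> str:
    """Ignore punctuation, capitalization, and spaces when comparing."""
    return "".join(ch for ch in english.lower() if ch.isalnum())


def find(keys, query):
    """Binary-search a list of precomputed keys for query.

    Returns the index of a matching key, or None. Correct only if keys
    is sorted; otherwise the search can skip past an entry that exists.
    """
    target = sort_key(query)
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return i
    return None


if __name__ == "__main__":
    good = [sort_key(w) for w in ["acquisition", "acquisition, child", "acquisition device"]]
    bad = [sort_key(w) for w in ["acquisition", "acquisition device", "acquisition, child"]]
    print(find(good, "acquisition, child"))  # found in the sorted list
    print(find(bad, "acquisition, child"))   # present, but not found
```

In the out-of-order list the entry is still there at the last position, but the search stops at the middle entry and gives up, which is why a mis-sorted English-to-Chinese database may install fine yet miss some lookups.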
 