Creating user dictionaries: import file format, etc.

Cameroon

探花
Hi!
As all Pleco users may already know,
a file for import into a user dictionary must be a plain-text .txt file, tab-separated (also called TSV), encoded as UTF-8 with signature, with at least one hanzi column and one translation column.
Optional items are: an additional hanzi form (Traditional or Variant), placed in the same column as the Simplified form (that is, the first column), immediately after it (no spaces) and in square brackets; and a pinyin transcription (numbered or with tone marks; variants can also be listed after commas) in the second column, when available.
The 5th and further columns are ignored by Pleco and can therefore contain virtually anything.
(Experienced users, please correct me if I got anything wrong.)
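For anyone generating such a file with a script, here is a minimal Python sketch of the points above: tab-separated columns and UTF-8 with signature. The sample rows, the function name `build_import_bytes`, and the choice of `\n` line endings are my own assumptions, not anything Pleco prescribes.

```python
# Hypothetical sample rows: (headword, pinyin, definition).
ROWS = [
    ("学习[學習]", "xue2xi2", "to study; to learn"),
    ("你好", "ni3hao3", "hello"),
]

def build_import_bytes(rows):
    """Return the rows as tab-separated text, encoded as UTF-8 with signature."""
    lines = ["\t".join(row) for row in rows]
    text = "\n".join(lines) + "\n"
    # The "utf-8-sig" codec prepends the signature bytes EF BB BF.
    return text.encode("utf-8-sig")

data = build_import_bytes(ROWS)
```

The resulting bytes can then be saved as a .txt file, for example with `pathlib.Path("pleco_import.txt").write_bytes(data)`.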

Question:
if a hanzi has more than one Traditional/Variant form, how should it be formatted in the second column? Comma-separated (like pinyin variants), or in some other way?
 

mikelove

皇帝
Staff member
There's not currently any way to support variants in a user dictionary; your best bet would be to create two separate entries, one for each form. (which should end up combined just like they do in our dictionaries, assuming the simplified characters match)

Also characters in general go in the first column, just in brackets; it would be:

SC[TC]<tab>pinyin<tab>definition

and you need the two tabs if you skip Pinyin but provide a definition.
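As a rough illustration of the line format and the "one entry per form" workaround, a small Python sketch follows. The helper names `format_entry` and `entries_for_variants` and the sample word are hypothetical; only the SC[TC]<tab>pinyin<tab>definition layout comes from this post.

```python
def format_entry(simplified, definition, traditional=None, pinyin=""):
    """Build one import line: SC[TC]<tab>pinyin<tab>definition."""
    headword = simplified
    if traditional:
        # The bracket follows the simplified form directly, with no space.
        headword += f"[{traditional}]"
    # An empty pinyin field still leaves two tabs before the definition.
    return "\t".join([headword, pinyin, definition])

def entries_for_variants(simplified, variants, definition, pinyin=""):
    """Emit a separate line for each Traditional/Variant form."""
    return [format_entry(simplified, definition, v, pinyin) for v in variants]

no_pinyin = format_entry("你好", "hello")
two_forms = entries_for_variants("为", ["為", "爲"], "for; because of", "wei4")
```

Here `no_pinyin` contains two consecutive tabs, and `two_forms` holds two lines that share the same simplified headword.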
 

Shun

状元
Hi Cameroon,

My reply crossed with Mike's answer. I think I've already imported a list of Hanzi separated only by carriage returns, without any tabs, with the option "Fill in missing fields" enabled. If you only provide the Hanzi and a translation, you should put two tabs between them, because the middle column is for the pinyin.

In the left column, the Simplified Hanzi come first, followed (with no tab) by the Traditional Hanzi in square brackets. So the whole line, if it's complete, looks as follows:

Simplified Hanzi[Traditional Hanzi]<<tab>>pinyin with tone numbers<<tab>>definition

Note that there is no space or other whitespace between the Simplified Hanzi and the "[" left square bracket. The translation field currently uses the special Unicode character U+EAB1 for line breaks. (See the thread Newlines.)
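If you prepare the file with a script, both points can be handled in a few lines of Python. This is only a sketch: the names `PLECO_NEWLINE`, `encode_definition` and `headword_ok` are made up for illustration, and the check covers nothing beyond the "no whitespace before the bracket" rule.

```python
import re

# Character named above as the line-break marker in the translation field.
PLECO_NEWLINE = "\uEAB1"

def encode_definition(text):
    """Replace ordinary line breaks in a definition with U+EAB1."""
    return text.replace("\r\n", "\n").replace("\n", PLECO_NEWLINE)

def headword_ok(headword):
    """Return False if whitespace sits directly before the '[' bracket."""
    return re.search(r"\s\[", headword) is None

encoded = encode_definition("first sense\nsecond sense")
```

Running `headword_ok` over the first column before importing is a cheap way to catch a stray space in front of the bracket.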

Cheers,

Shun
 

Cameroon

探花
Thanks, Mike and Shun, for the clarifications. I have also amended my first post to avoid potentially misleading hints.
 

Cameroon

探花
Also characters in general go in the first column, just in brackets; it would be:

SC[TC]<tab>pinyin<tab>definition
Will this also work for En-Ch dicts?
If the header looks like:
air distingue[air distingué]
or
air distingué[air distingue]
will Pleco search for both variants, and once found, display the second (bracketed) variant next to the main one, like it does with Chinese entries?

What's the format for import lists for En-Ch user dictionaries?
English1[English2]<tab>Chinese
Maybe some additional columns, like transcription, are also possible?
 