Custom dictionary: some definitions listed separately in search result list

etm001

状元
Hi,

I have the Taiwan MoE custom dictionary installed, so perhaps this is an issue specific to it. I've noticed that some MoE entries are listed separately in search results, and not integrated into the result list entry that contains the definitions from the other installed dictionaries. For example the MoE entry for 工作 is included in the integrated result, but 號碼 is not. I'm just curious to know what causes this.
 

mikelove

皇帝
Staff member
In MoE this probably means that its simplified mappings don't sync up with other dictionaries - does 號碼 use a simplified conversion other than 号码?
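To illustrate the idea, here is a minimal Python sketch of merging on a shared key. It assumes entries are merged when their simplified headword and Pinyin both match; the `merge_entries` function, the dictionary names, and the bad 号碼 mapping are all hypothetical, not Pleco's actual code or MoE's actual data.

```python
from collections import defaultdict

def merge_entries(entries):
    """Group dictionary names under a (simplified headword, pinyin) key."""
    groups = defaultdict(list)
    for dictionary, simplified, pinyin in entries:
        groups[(simplified, pinyin)].append(dictionary)
    return dict(groups)

# Hypothetical data: "DictA" stands in for any other installed dictionary.
entries = [
    ("DictA", "工作", "gōngzuò"),
    ("MoE", "工作", "gōngzuò"),
    ("DictA", "号码", "hàomǎ"),
    # Hypothetical faulty mapping: 碼 was left unconverted in the simplified form.
    ("MoE", "号碼", "hàomǎ"),
]

merged = merge_entries(entries)
for key, dictionaries in merged.items():
    print(key, dictionaries)
```

Under these assumptions 工作 ends up as one integrated result, while the mismatched simplified form leaves the MoE entry for 號碼 in a group of its own.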
 
I get integrated results for both 號碼 & 号码 with MoE.
(apparently I'm running MoEDict-04a-Simp01.pqb)

I often see mismatched/un-integrated results due to differing pinyin (spaces, apostrophes, etc.).
 

mikelove

皇帝
Staff member
Could you give some examples? We normally strip out everything but the letters + tones when comparing Pinyin between entries, so that shouldn't happen.
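As a rough illustration of that kind of comparison, here is a minimal Python sketch. The `normalize_pinyin` name and the exact rules (keep letters, tone marks, and tone digits 1-5; drop everything else) are assumptions for the example, not Pleco's real implementation.

```python
import unicodedata

def normalize_pinyin(pinyin: str) -> str:
    """Reduce a Pinyin string to letters and tones for comparison."""
    # Compose base letter + combining tone mark into one character where possible.
    composed = unicodedata.normalize("NFC", pinyin).lower()
    return "".join(
        ch for ch in composed
        # Keep letters (including tone-marked vowels), tone digits,
        # and any combining marks that could not be composed.
        if ch.isalpha() or ch in "12345" or unicodedata.combining(ch)
    )

# Spaces and apostrophes no longer affect the comparison.
print(normalize_pinyin("jī quǎn bù níng") == normalize_pinyin("jīquǎnbùníng"))
print(normalize_pinyin("Xī'ān"))
```

With a comparison like this, differences in spacing or apostrophes alone would not keep two entries apart.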
 
Here's a bit of a weird one:
(I was trying to think of something I'd come across before)

If I search "鸡犬不ning"

I only get one result and it's from MoE:
PY jī quǎn bù níng
(lots of spaces...)

Normally I would just select the characters and press the magnifying glass/search button, and Pleco gives me a fully integrated result, which in this case is 11 dictionary entries as opposed to the single MoE entry.
 

mikelove

皇帝
Staff member
This actually relates to the design of our built-in databases - basically, with our indexing system it would take a lot of extra storage space to support mixing of characters and pinyin that far along in a word, so we don't, but the user dictionary index is totally different and there's no extra cost to doing it in that. We should probably limit the user dictionary similarly for the sake of consistency.
 

alex_hk90

状元
image.jpg
Second one has a space in the middle...
It's from KEY, if that makes any difference, and the KEY entry is also merged into the first, spaceless entry...
 

mikelove

皇帝
Staff member
Those are actually separated because the fanti in that second entry is different (different 为) - it's not a meaningful difference, the 为s are interchangeable and in this case the meanings are the same, but they're not tagged as such in KEY. We use a couple of tricks to try to determine whether fanti differences are meaningful (in some words they are - 冷面, 后座, etc) and in this case it seems like thanks to those separated KEY entries the algorithm is deciding they are. Anyway we'll note this as something to fix in KEY.
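For illustration only, here is a minimal Python sketch of treating interchangeable traditional variants as equal when comparing headwords. The variant table, the function names, and the example words are assumptions; this is not the algorithm Pleco actually uses.

```python
# Hypothetical table of interchangeable traditional variants.
# 為 and 爲 are both traditional forms of 为.
VARIANT_TO_CANONICAL = {"爲": "為"}

def canonical_traditional(headword: str) -> str:
    """Replace each known variant character with its canonical form."""
    return "".join(VARIANT_TO_CANONICAL.get(ch, ch) for ch in headword)

def same_headword(a: str, b: str) -> bool:
    """True if two traditional headwords differ only by listed variants."""
    return canonical_traditional(a) == canonical_traditional(b)

# Differs only by an interchangeable variant: would be merged.
print(same_headword("以為", "以爲"))
# 面 vs 麵 is not in the table, so this meaningful difference stays separate.
print(same_headword("冷面", "冷麵"))
```

A table like this only covers variants that are explicitly tagged, which is why untagged pairs can still end up as separate entries.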
 