Wrong traditional characters

mikelove

皇帝
Staff member
Yiliya said:
One more: 托/託

ABC only has 託 in 拜託, and even then only as a "variant spelling". Plenty of words should have 託, e.g. 寄託, 假託, etc.

This one I think we would consider a variant too, since the simplified version of it would be 讬 rather than 托. However, if these words are commonly written with 託 then that ought to be listed as a variant in those entries (so it would be 寄托/讬[--/託]).
 

Yiliya

榜眼
I don't think 讬 is ever used, anywhere. Seems to be something invented by the Unicode guys for the sake of "completeness", it's not in any dictionary. It's not even in the GBK encoding.
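
The encoding part of this claim can be checked directly rather than argued. Below is a minimal sketch using Python's built-in codecs; the helper names `can_encode` and `coverage` are made up for illustration, and the result for 讬 is deliberately not stated here, so run it to see what each encoding reports.

```python
def can_encode(char: str, codec: str) -> bool:
    """Return True if `char` can be represented in the named encoding."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(char: str, codecs=("gb2312", "gbk", "gb18030", "big5")) -> dict:
    """Map each codec name to whether it can represent `char`."""
    return {codec: can_encode(char, codec) for codec in codecs}


if __name__ == "__main__":
    # Print the code point and per-encoding coverage for the three characters
    # discussed in this thread.
    for ch in "托託讬":
        print(ch, f"U+{ord(ch):04X}", coverage(ch))
```

Presence in an encoding only shows that a character was given a code point; it says nothing about whether any dictionary or official list actually uses it.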

What we have here is just another case of merging two characters (托 and 託) into one. Just like 游/遊 -> 游. So, 拜托 should be 拜托[-託], 寄托 = 寄托[-託], etc. There's only one way to write these words in the PRC/ROC.
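
The bracket notation used above (拜托[-託]) can be generated mechanically from a simplified/traditional pair. This is only a sketch of the notation as it appears in this thread, not Pleco's or ABC's actual code; `format_headword` is a hypothetical name.

```python
def format_headword(simplified: str, traditional: str) -> str:
    """Render 'simplified[traditional]', using '-' for unchanged characters."""
    if len(simplified) != len(traditional):
        raise ValueError("simplified and traditional forms must have equal length")
    if simplified == traditional:
        return simplified  # nothing to show in brackets
    marks = "".join(
        "-" if s == t else t
        for s, t in zip(simplified, traditional)
    )
    return f"{simplified}[{marks}]"


print(format_headword("拜托", "拜託"))
print(format_headword("寄托", "寄託"))
```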
 

mikelove

皇帝
Staff member
Yiliya said:
I don't think 讬 is ever used, anywhere. Seems to be something invented by the Unicode guys for the sake of "completeness", it's not in any dictionary. It's not even in the GBK encoding.

It actually appears in 汉语大词典 (which predates Unicode) as the simplified version of 託, but only as that, and it's listed elsewhere as 托. But I'm not sure whether 托 was part of the official PRC simplification standard or whether 托 just arose naturally and people use it even though 讬 was the official one.

Yiliya said:
What we have here is just another case of merging two characters (托 and 託) into one. Just like 游/遊 -> 游. So, 拜托 should be 拜托[-託], 寄托 = 寄托[-託], etc. There's only one way to write these words in the PRC/ROC.

But is there ever a case where 託 would map into 讬? Otherwise it's actually much simpler than 游/遊. Though I think we'd want to keep 讬 as a variant regardless since the character exists and it's clearly meant to be a simplified version of 託 (even if a rarely-used one).
 

Yiliya

榜眼
mikelove said:
It actually appears in 汉语大词典

Does it? My electronic version of 漢語大詞典 doesn't have it. Or did you mean 漢語大字典 (which I don't have and can't check)? Nevertheless, I'd like to see a picture.

I looked up 托 in my paper copies of 现代汉语词典 (2005) and 现代汉语规范词典 (2010) and they both only have 托 and 託 (託 is, of course, listed for only some senses of 托.)

Sure, you could include 讬 in the words with 託, "for the sake of completeness". But it really is something made up by computer geeks and isn't actually used in real life. Could be misleading.

My point is basically, if someone encoded "讠主", it still won't magically become the simplified variant of 註 (which is 注). There are standards to be followed.
 

mikelove

皇帝
Staff member
Yiliya said:
Does it? My electronic version of 漢語大詞典 doesn't have it. Or did you mean 漢語大字典 (which I don't have and can't check)? Nevertheless, I'd like to see a picture.

My electronic version does have it; not sure whether you're using the official CD version or some other one.

There is a provision in our new database format to have variants that are searchable but aren't displayed by default, but we'll need to do some more research before we can establish if this is uncommon enough to warrant that status.
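
To make the idea of "searchable but not displayed by default" concrete, here is a rough sketch of one way such a flag could work. It is not our actual database format; `Variant`, `Entry`, `hidden`, and `build_index` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    form: str
    hidden: bool = False  # hidden variants are indexed but not displayed


@dataclass
class Entry:
    headword: str
    variants: list = field(default_factory=list)

    def search_keys(self) -> set:
        """Every form that should find this entry, hidden or not."""
        return {self.headword} | {v.form for v in self.variants}

    def displayed_forms(self) -> list:
        """Forms shown to the user by default."""
        return [self.headword] + [v.form for v in self.variants if not v.hidden]


def build_index(entries) -> dict:
    """Map every searchable form to the entries it should retrieve."""
    index = {}
    for entry in entries:
        for key in entry.search_keys():
            index.setdefault(key, []).append(entry)
    return index


entry = Entry("寄托", [Variant("寄託"), Variant("寄讬", hidden=True)])
index = build_index([entry])
print(entry.displayed_forms())   # hidden variant is left out of the display
print("寄讬" in index)            # but a search for it still finds the entry
```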
 

Attachments

  • tuoscreenshot.jpg

pat

Member
Yiliya said:
Here's what mistakes in ABC I've found so far:

The following traditional variants are never used at all in ABC:
(what's used/what should be used when appropriate)
游/遊
周/週
回/迴
欲/慾
向/嚮
團/糰
(these two should be used at all times)
净/淨
羡/羨

Some more corrections:
惡/噁 (噁心 should at least be given as a variant of 惡心, this is the modern TC standard, the IMEs nowadays use 噁心, I suggest you include the word as 噁/惡心)
熏/薰 (利欲熏心 should be 利慾薰心, however ABC doesn't give any traditional variant at all, this is curious because it also gives 薰心 as a separate entry but ONLY in traditional)
注/註 (注音 is erroneously 註音 for some inexplicable reason)

There's bound to be more. I'm sorry, I'm not really using Pleco that much.

Absolutely right. I bet these are the simplified: 游, 周, 回, 欲, 向, 團, 净, 羡, 惡, 熏, 注.
Some more: 面包/麵包, 前后/前後.

注 is used in 注音, and 註 in 註冊 (the simplified form is 注册).

淨 means cleaning something with water, hence the water radical.
The same goes for 沖: flushing with water, hence the water radical.
Likewise 洗: washing with water, hence the water radical.
減 also takes the water radical (I have no idea how it relates to water); the two-dot form is the simplified one.

BUT 凉 must be written with two dots (the ice radical); three dots is wrong, as in 冰凉. Has anyone ever seen 冰 written with the water radical? There is a short story behind the saying 誤把馮京作馬凉. In ancient times, a man called 馮京 sat the imperial examination (科舉). He placed first and went to the king's palace for the inauguration. The king meant to call 馮京 but called 馬凉 instead. Of course, nobody answered. The king called again and again; after some time he looked at the name list once more and apologized, saying that the name was 馮京, not 馬凉. He had mistakenly moved the two dots from 馮 to 京 and read the name as 馬凉.

The font manufacturers have never created the correct 凉. Too bad!!!
 

rizen suha

状元
was reading this old but very interesting post. just felt the urge to express my deep satisfaction with how well pleco supports traditional characters. it is a most important instrument with which we foreign students of chinese can do our small bit to keep traditional alive. thanks
 

Abun

榜眼
It seems like there is a conversion mistake (or at the very least serious confusion) in the 古汉语大词典 entry for 壞[坏]. It lists the pronunciations pēi, péi and pī. These pronunciations – together with the corresponding definitions (二), (三) and (四) – only apply where the traditional character is also 坏, but not where the traditional is 壞. I’m not sure how feasible it is to separate these definitions and pronunciations out while also staying faithful to the source dictionary, but at the very least the corresponding example sentences should have 坏 rather than 壞 when viewing them in traditional.
 

Attachments

  • Screenshot_2018-08-07-09-50-07.png

Shun

状元
Hi Abun,

the 古汉语大词典 appears to be the only dictionary that merges different homographs into the same dictionary entry. I think the rule in the 古汉语大词典 is as follows:

- For each meaning in the definition, if no Hanzi is given in brackets, the first Hanzi written at the top applies.
- If a different Hanzi is given for a meaning, then it applies to that meaning.
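
The two rules above amount to a simple fallback. Here is a small sketch; the `Sense` structure and the sample data are my own illustration of the rule, not the dictionary's real markup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sense:
    reading: str
    hanzi: Optional[str] = None  # bracketed Hanzi for this sense, if any


def hanzi_for_sense(top_hanzi: str, sense: Sense) -> str:
    """Use the sense's own Hanzi when given, otherwise the entry's first Hanzi."""
    return sense.hanzi if sense.hanzi is not None else top_hanzi


# Illustrative data only: one sense without a bracketed Hanzi, one with.
top = "壞"
senses = [Sense("huài"), Sense("pēi", hanzi="坏")]
for s in senses:
    print(s.reading, hanzi_for_sense(top, s))
```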

hdc.jpg

On the correct choice of homograph in the example sentences, I think that's quite a thorny problem. The best case would of course be if the 古汉语大词典 were supplied in both a Simplified and a Traditional version. But that's probably not the case—I assume it's all in Simplified Chinese—, such that Pleco needs to convert it to Traditional character by character. It just gets "坏" as input and looks for the most likely corresponding Traditional character, without looking at the semantic level. Perhaps, if the conversion algorithm is very smart, it could look at the context and collocations to find the most likely Traditional equivalent, which would certainly reduce the number of conversion errors significantly. Perhaps such an algorithm already exists and could be built into Pleco. :)
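
To illustrate what "looking at the context" could mean in its simplest form, here is a sketch that tries whole-word matches before falling back to character-by-character conversion. I am not claiming this is how Pleco converts; the tables are tiny hand-made samples and `to_traditional` is a hypothetical name.

```python
# Tiny illustrative tables, not real conversion data.
PHRASES = {
    "面包": "麵包",
    "注册": "註冊",
    "注音": "注音",
    "前后": "前後",
    "皇后": "皇后",
}
CHARS = {"后": "後", "册": "冊", "对": "對"}  # default per-character fallback
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def to_traditional(text: str) -> str:
    """Convert Simplified to Traditional, preferring the longest phrase match."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest phrase starting at position i first.
        for size in range(min(MAX_PHRASE_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += size
                break
        else:
            # No phrase matched: convert a single character.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_traditional("前后"), to_traditional("皇后"))
```

Word-level matching fixes cases where the neighbouring character decides the form (后 in 前后 versus 皇后), but it cannot help with a single-character headword such as 坏, where only the sense tells you which Traditional form is meant.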
 

Abun

榜眼
Hi Shun,

yes, the other dictionaries don’t show the 坏[坏] lemmata. I also agree that separating it into two entries might not be feasible (especially seeing as entries are merged in other similar cases such as 麵 as well). However:

Shun said:
But that's probably not the case—I assume it's all in Simplified Chinese—, such that Pleco needs to convert it to Traditional character by character. It just gets "坏" as input and looks for the most likely corresponding Traditional character, without looking at the semantic level.

Does it though? When I look at the GHDCD entry for 面 in traditional, I do get 面 and 麵 in their respective lemmata, which makes me think that the possibility to tweak the behaviour during T-S conversion is there (maybe using HTML attributes?), even though it probably means adding meta information by hand.
 

Shun

状元
Abun said:
Does it though? When I look at the GHDCD entry for 面 in traditional, I do get 面 and 麵 in their respective lemmata, which makes me think that the possibility to tweak the behaviour during T-S conversion is there (maybe using HTML attributes?), even though it probably means adding meta information by hand.

That's true. I found out by googling that there most likely also is a Taiwanese, Traditional character version of the 古漢語大詞典. If Pleco uses that as its original text, then of course it could tell the three lemmata apart more easily. Conversion from Traditional to Simplified is, naturally, much more reliable than Simplified to Traditional. That way, no manual tagging would be required, either.

It could also just be somewhat inconsistent data in the GHYDCD regarding the 坏/壞 issue, where it's all written as 壞 in Traditional even when it strictly should be 坏. But here it could be too many things, so I'm a little puzzled.
 

mikelove

皇帝
Staff member
Our system does support mapping the same simplified character to multiple traditional ones, it's just very time-consuming to do that for every dictionary by hand, so in a lot of cases we rely on automatic conversion, which doesn't always do the job perfectly. (anyway I'll add this to our bug file)
 

Abun

榜眼
mikelove said:
Our system does support mapping the same simplified character to multiple traditional ones, it's just very time-consuming to do that for every dictionary by hand, so in a lot of cases we rely on automatic conversion, which doesn't always do the job perfectly. (anyway I'll add this to our bug file)

No problem. I figured that’s probably the case, so you’d probably have to rely on reports to catch mistakes :)

Anyway, thanks for putting it on the list. It’s really awesome how well things work here (both in the app itself and in support). Big kudos to you!
 