Cantonese?

mikelove

皇帝
Staff member
j40 said:
http://bit.ly/f4201d
This Cantonese-Mandarin dictionary is good too, and it is by the same authors as that Commercial Press book.
The publisher isn't Beijing Commercial Press.
Actually the CP book is from CP Hong Kong, so that might be license-able after all. Is this other title better / larger or were you just offering it as an alternative if we can't get the CP one? (BTW, for any other dictionary-watchers, there are a few sample pages of the CP one at http://www.cp1897.com.hk/bookinfo/pdf/pv/9789620702891pv.pdf - opinions welcome)

j40 said:
The drawback of dictionaries written by these authors (the one you couldn't buy and this one) is that neither Yale nor Jyutping is used.
Someone even took the time to create a new version of this dictionary with Jyutping!
Very easy to deal with in our case - we'll do the same thing we do with Mandarin and encode all pronunciation in a neutral format which we can search / render using whichever pronunciation system people want. Jyutping / Yale / Sidney Lau / Meyer-Wempe / Guangdong... maybe we don't need all of them but we could quite easily support whichever ones are popular / in demand. We've been more reluctant to support lots of romanizations with Mandarin because Pinyin is pretty much the accepted international standard (even in Taiwan they seem to be slowly learning to live with it), but for Cantonese there doesn't seem to be nearly as much consensus.
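
To illustrate the "neutral format" idea, here is a minimal sketch. It is not Pleco's actual implementation: the `Syllable` record, the `RENDERERS` table, and the function names are all hypothetical, and only Jyutping and numeric-tone Yale are covered. Each syllable is stored once in a Jyutping-style form and spelled out in whichever system the user picks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """Hypothetical neutral record: Jyutping-style initial and final, tone 1-6."""
    initial: str
    final: str
    tone: int


# Jyutping initials that Yale spells differently.
YALE_INITIALS = {"z": "j", "c": "ch", "j": "y"}

# Jyutping oe/eo finals and their Yale "eu" spellings.
YALE_FINALS = {
    "oe": "eu", "oeng": "eung", "oek": "euk",
    "eoi": "eui", "eon": "eun", "eot": "eut",
}


def to_jyutping(s: Syllable) -> str:
    return f"{s.initial}{s.final}{s.tone}"


def to_yale_numeric(s: Syllable) -> str:
    initial = YALE_INITIALS.get(s.initial, s.initial)
    final = YALE_FINALS.get(s.final, s.final)
    # Jyutping "jyu..." is written "yu..." in Yale, not "yyu...".
    if initial == "y" and final.startswith("yu"):
        initial = ""
    # A bare "aa" final is written "a" in Yale.
    if final == "aa":
        final = "a"
    return f"{initial}{final}{s.tone}"


# Supporting another system would mean adding one more renderer here.
RENDERERS = {"jyutping": to_jyutping, "yale": to_yale_numeric}


def render(s: Syllable, system: str) -> str:
    return RENDERERS[system](s)
```

For example, `render(Syllable("c", "aa", 4), "yale")` gives `cha4`, while the same record rendered as `"jyutping"` gives `caa4`. Sidney Lau, Meyer-Wempe and the rest would each need their own renderer, which this sketch does not attempt.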

j40 said:
Hi, I put the fixed original tab file in my blog.
Looks good, thanks!
 
mikelove said:
Very easy to deal with in our case - we'll do the same thing we do with Mandarin and encode all pronunciation in a neutral format which we can search / render using whichever pronunciation system people want. Jyutping / Yale / Sidney Lau / Meyer-Wempe / Guangdong... maybe we don't need all of them but we could quite easily support whichever ones are popular / in demand. We've been more reluctant to support lots of romanizations with Mandarin because Pinyin is pretty much the accepted international standard (even in Taiwan they seem to be slowly learning to live with it), but for Cantonese there doesn't seem to be nearly as much consensus.
Slightly off-topic: Whenever you roll out this functionality in Pleco, I'd be really glad if it supported Wade-Giles as well. For all of us relying on pre-pinyin scholarly works in our studies, the ability to quickly look up all those weird names would be nice.

Even more off-topic: Also, at that time it should be very easy to add optional, downloadable Hangul support as well, right? This would make Pleco more important to my life than food.

Just pointing this out so you'll keep it in the back of your heads while working on more important improvements. Carry on everybody!
 

mikelove

皇帝
Staff member
samitakamaki said:
Slightly off-topic: Whenever you roll out this functionality in Pleco, I'd be really glad if it supported Wade-Giles as well. For all of us relying on pre-pinyin scholarly works in our studies, the ability to quickly look up all those weird names would be nice.

Even more off-topic: Also, at that time it should be very easy to add optional, downloadable Hangul support as well, right? This would make Pleco more important to my life than food.

Just pointing this out so you'll keep it in the back of your heads while working on more important improvements. Carry on everybody!
Good point, it probably would be worth including W-G for that reason. The main problem is switching between that and Pinyin - we need to find a way to stick an extra button in the UI to switch between them, which in some ways is more challenging than the W-G support itself :)

As for Hangul support, we already have that via the Character Info screen - are you not able to access it now?
 

mikelove

皇帝
Staff member
j40 said:
I put my Stardict file in the apps that support Stardict dictionaries. My Stardict is Cantonese to English and these apps can't search in the reverse direction... so it kind of sucks
Yeah, sorry that's taking so long - we've been making just enough progress on licensing a dictionary to not give up on doing so altogether, hence the continued delay in getting Cantonese support code added.
 

mnanon

秀才
Any luck with Cantonese?

Any plans to create integrations with Chinese/Cantonese books? Instead of writing answers in a textbook, you would write the answers in an app on your phone, and it would correct your writing.
 

mikelove

皇帝
Staff member
mnanon said:
Any luck with Cantonese?
More on that soon, hopefully - haven't come up with a comprehensive solution but there are a couple of pieces we're moving into place which between them will hopefully give us something pretty good. Recent @plecosoft Twitter post:

Anybody know any native Cantonese speakers with good English who'd be interested in some fairly boring contract work? Message us if so.
mnanon said:
Any plans to create integrations with Chinese/Cantonese books? Instead of writing answers in a textbook, you would write the answers in an app on your phone, and it would correct your writing.
That sort of textbook integration isn't on our radar right now - too much to do in the dictionary space - but could certainly happen in the future as things evolve.
 

mnanon

秀才
Thank you for the info.

Regarding the work, I know a Cantonese speaker (in HK) who is a Chinese-English translator. What sort of work are you looking at? She might be interested.
 

alex_hk90

状元
Just wanted to ask if a Cantonese dictionary is likely to be released in the near future? Otherwise I might look into converting one of the ones mentioned here to Pleco user dictionary format, to use for the time being until there's an officially licensed one. Thanks. :)
 

mikelove

皇帝
Staff member
alex_hk90 said:
Just wanted to ask if a Cantonese dictionary is likely to be released in the near future? Otherwise I might look into converting one of the ones mentioned here to Pleco user dictionary format, to use for the time being until there's an officially licensed one. Thanks. :)
Depends on how you define "near future," but we're definitely making progress, yes.
 

alex_hk90

状元
mikelove said:
Depends on how you define "near future," but we're definitely making progress, yes.
Thanks for the fast reply, that's good news and I would certainly be interested in buying that add-on and Cantonese audio files as well. :) By "near future", I was thinking weeks rather than months. Anyway, I've converted a freely available (apparently, I'm not sure about copyright) Cantonese dictionary to Pleco format and once I've cleaned it up I could post the result on here for others to use until a proper (official) dictionary is available.
 

mikelove

皇帝
Staff member
alex_hk90 said:
Thanks for the fast reply, that's good news and I would certainly be interested in buying that add-on and Cantonese audio files as well. By "near future", I was thinking weeks rather than months. Anyway, I've converted a freely available (apparently, I'm not sure about copyright) Cantonese dictionary to Pleco format and once I've cleaned it up I could post the result on here for others to use until a proper (official) dictionary is available.
The problem with that is that you won't be able to search it in Cantonese until we add official support for Cantonese to Pleco, which we haven't done yet - right now it should be searchable by characters but only by them, unfortunately.
 

alex_hk90

状元
mikelove said:
The problem with that is that you won't be able to search it in Cantonese until we add official support for Cantonese to Pleco, which we haven't done yet - right now it should be searchable by characters but only by them, unfortunately.
Yes, that is an issue. And also that it seems the user dictionary system does not work too well with many (100,000+) entries.

With an official Cantonese dictionary and the current system, if the Jyutping (Cantonese romanisation) is in the definition then it should be able to search for that (via full-text searching, in the same way as searching for English in a Chinese dictionary), which wouldn't be a bad compromise in the short term. Regarding the Cantonese romanisation, a decision would have to be made between Jyutping and Yale, but I guess much of this will depend on what the licensed Cantonese dictionary uses (it's not difficult to convert between them but it's not quite a one-to-one mapping).
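
As a rough illustration of the short-term compromise described above, the sketch below does a plain substring search over definition text that happens to contain Jyutping. The `Entry` type, the sample entries, and `full_text_search` are made up for this example and say nothing about how Pleco's own full-text search works.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    headword: str
    definition: str  # assumed to carry the Jyutping alongside the English gloss


def full_text_search(entries: List[Entry], query: str) -> List[Entry]:
    """Return entries whose definition contains the query, ignoring case."""
    needle = query.casefold()
    return [e for e in entries if needle in e.definition.casefold()]


# Hypothetical sample data.
ENTRIES = [
    Entry("你好", "nei5 hou2 - hello"),
    Entry("多謝", "do1 ze6 - thank you"),
]
```

Searching for `hou2` finds the first entry and searching for `thank` finds the second, so romanisation and English share one search path. The obvious weakness is that a substring match knows nothing about syllable boundaries or alternative romanisations.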
 

mikelove

皇帝
Staff member
alex_hk90 said:
Yes, that is an issue. And also that it seems the user dictionary system does not work too well with many (100,000+) entries.
True, we didn't performance-optimize it for massive dictionaries since that's not really its intended use. Works a bit better if you "Lock" the dictionary, though.

alex_hk90 said:
With an official Cantonese dictionary and the current system, if the Jyutping (Cantonese romanisation) is in the definition then it should be able to search for that (via full-text searching, in the same way as searching for English in a Chinese dictionary), which wouldn't be a bad compromise in the short term.
Except that we don't currently support full-text searches in user dictionaries.

alex_hk90 said:
Regarding the Cantonese romanisation, a decision would have to be made between Jyutping and Yale, but I guess much of this will depend on what the licensed Cantonese dictionary uses (it's not difficult to convert between them but it's not quite a one-to-one mapping).
I believe the only real issue is eo/oe versus eu, correct? The current plan is to use an internal coding system that's (probably) Jyutping-based but allow Cantonese to be searched/rendered in other systems too - there are too many people who only know one system or another so we really can't limit ourselves to just one.
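
A small sketch of the eo/oe versus eu point, assuming only the standard finals (the table and function names are hypothetical): Jyutping splits the vowel into "oe" and "eo" while Yale writes "eu" for both, but because the two Jyutping spellings occur before different endings, the six common finals can still be mapped in both directions.

```python
# Yale "eu"-family finals and the Jyutping final each one corresponds to.
YALE_TO_JYUTPING_FINALS = {
    "eu": "oe", "eung": "oeng", "euk": "oek",   # Jyutping "oe" group
    "eui": "eoi", "eun": "eon", "eut": "eot",   # Jyutping "eo" group
}
JYUTPING_TO_YALE_FINALS = {j: y for y, j in YALE_TO_JYUTPING_FINALS.items()}


def yale_final_to_jyutping(final: str) -> str:
    # Finals outside the table are passed through unchanged. That is an
    # assumption of this sketch and is not safe for rare colloquial finals.
    return YALE_TO_JYUTPING_FINALS.get(final, final)


def jyutping_final_to_yale(final: str) -> str:
    return JYUTPING_TO_YALE_FINALS.get(final, final)
```

Within this table the conversion round-trips cleanly. It is the finals outside the table that would need extra handling.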
 

alex_hk90

状元
mikelove said:
True, we didn't performance-optimize it for massive dictionaries since that's not really its intended use. Works a bit better if you "Lock" the dictionary, though.
Thanks. I'll try that once I finish the import (ended up splitting it into 10,000-entry chunks as it was taking ages). More likely I'll have to try to find a Cantonese-only dictionary for now (i.e. with only the compound words that won't appear in the Mandarin dictionaries, and the individual characters so that I can look up the pronunciations).
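
For anyone else doing a large import, splitting the entries into fixed-size chunks can be sketched as below. The helper name `chunked` is hypothetical and nothing here depends on the Pleco file format; it simply groups any sequence of lines into batches of at most 10,000.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(lines: Iterable[str], size: int = 10_000) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items from `lines`."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each chunk could then be written to its own file and imported separately.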

mikelove said:
Except that we don't currently support full-text searches in user dictionaries.
I meant if you released an official Cantonese dictionary add-on, but didn't make any other additions for Cantonese, then we could at least search by the romanisation in the definition for the time being. :)

mikelove said:
I believe the only real issue is eo/oe versus eu, correct? The current plan is to use an internal coding system that's (probably) Jyutping-based but allow Cantonese to be searched/rendered in other systems too - there are too many people who only know one system or another so we really can't limit ourselves to just one.
I'm not sure, I only just looked into Cantonese romanisations a week or two ago. From this page (http://cburgmer.nfshost.com/content/cantonese-yale-syllable-table) it sounds like it might be more than that:
The following Jyutping syllables are missing due to the lack of proper sources for a mapping between the two romanisations: lem, deu, gep, kep, loei, loet, pet, om (all in Jyutping). The table beneath thus is missing the Jyutping final set -oei, -oet, -om, -em, -ep, -et and -eu.
I think there might be something to do with the tones that doesn't fully map either, but again I don't really know. :oops:
 

mikelove

皇帝
Staff member
alex_hk90 said:
I meant if you released an official Cantonese dictionary add-on, but didn't make any other additions for Cantonese, then we could at least search by the romanisation in the definition for the time being.
True, there could be some lag between our adding Cantonese support to our own dictionaries and to user ones.

alex_hk90 said:
I'm not sure, I only just looked into Cantonese romanisations a week or two ago. From this page (http://cburgmer.nfshost.com/content/can ... able-table) it sounds like it might be more than that:
The following Jyutping syllables are missing due to the lack of proper sources for a mapping between the two romanisations: lem, deu, gep, kep, loei, loet, pet, om (all in Jyutping). The table beneath thus is missing the Jyutping final set -oei, -oet, -om, -em, -ep, -et and -eu.

I think there might be something to do with the tones that doesn't fully map either, but again I don't really know.
Our primary concern is making sure that our internal system is as specific as possible - i.e., that it's on the "many" side of any many-to-one mappings. It seems like we can mostly achieve that with Jyutping, but in the few cases where we can't we'll add additional codes to deal with that.

The main tone issue I believe has to do with high-flat and high-falling being merged; Yale splits them while Jyutping doesn't. So that needs to be dealt with, probably by some extra coding, though in actual modern Cantonese I don't believe anybody pays much attention to this. The other problem of "entering tones" (7/8/9 as alternatives to 1/3/6 for syllables ending with certain consonants) I believe we can automatically detect based on the syllable.
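
Both tone points can be sketched in a few lines. This is an illustration under stated assumptions, not the planned implementation: using `0` as an extra internal code for high-falling is invented for the example, as are the function names. The entering-tone rule is the one described above, where 7/8/9 stand in for 1/3/6 on syllables ending in -p, -t or -k.

```python
ENTERING_CODAS = ("p", "t", "k")
SIX_TO_NINE = {1: 7, 3: 8, 6: 9}
NINE_TO_SIX = {nine: six for six, nine in SIX_TO_NINE.items()}

# Hypothetical extra internal code: 0 = high-falling, 1 = high-flat.
HIGH_FALLING = 0


def jyutping_tone(internal_tone: int) -> int:
    """Jyutping merges high-falling into tone 1, so the extra code collapses."""
    return 1 if internal_tone == HIGH_FALLING else internal_tone


def to_nine_tone(final: str, tone: int) -> int:
    """Renumber tones 1/3/6 as 7/8/9 when the final ends in -p, -t or -k."""
    if final.endswith(ENTERING_CODAS):
        return SIX_TO_NINE.get(tone, tone)
    return tone


def to_six_tone(tone: int) -> int:
    """Map a nine-tone number back to the six-tone scheme."""
    return NINE_TO_SIX.get(tone, tone)
```

Keeping the finer distinction internally means a system that splits the two high tones can still be rendered, while Jyutping output simply drops the distinction.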

eu / em / ep are (according to Wikipedia at least) colloquial sounds that Yale doesn't represent, so we'd probably either skip those or parenthesize the Jyutping versions when rendering in Yale. That and eu/oe cover everything he lists except "pet," I believe, which isn't listed on the official LSHK page for Jyutping at all, so I don't know what's going on with that syllable.
 

dustpuppy

榜眼
Mike, do you have any updates on the progress of a Cantonese dictionary? It's a bit disheartening that I can't use the best language tool ever (Pleco) to learn Cantonese. At this point all I'm looking for is a one-way dictionary (Cantonese -> English) which would let me read a piece of Cantonese text.
 

mikelove

皇帝
Staff member
dustpuppy said:
Mike, do you have any updates on the progress of a Cantonese dictionary? It's a bit disheartening that I can't use the best language tool ever (Pleco) to learn Cantonese. At this point all I'm looking for is a one-way dictionary (Cantonese -> English) which would let me read a piece of Cantonese text.
As I've said in a few places now, we're working on the Cantonese problem from three different angles - all three are now signed / committed in some way, but we're still waiting on data files for two of them. So the wait for those data files (hopefully not too long) is what will determine the speed with which we can support Cantonese.
 

dustpuppy

榜眼
That's excellent news, thank you. I don't know what the market looks like for Cantonese learners, but I know that there are NO good iPhone Cantonese dictionaries in the App Store. So Pleco will have an automatic market share when these dictionaries come out.

There are many foreigners in Hong Kong who have no intention whatsoever of learning Cantonese; they might be interested in a Pleco-lite app that bundles OCR + Cantonese dictionary + reader.
 