CC-CEDICT / HanDeDict updated

mikelove

皇帝
Staff member
After way too long a wait, both are updated - sorry about that. FWIW, we're making good progress on automating our dictionary-making process for these two and are hoping we'll soon be able to release updates on a more regular schedule (first ?day of every month, say).

We're waiting a few days to put them in the in-app catalog (since a few times in the past we've found that some database formatting change we didn't know about screwed things up), but direct download links (for the new iPad/OS4 direct download system discussed here) are at:

CC-CEDICT
HanDeDict

Note that if you're downloading these in Safari, you'll need to turn off "Open "safe" files after downloading" in Preferences / General to keep Safari from opening the files when it's not supposed to - you want to copy the actual .zip archive to your iPhone.
 

johnh113

榜眼
Dear Mike,

How do I install this? First I downloaded it to my desktop, unzipped it, uploaded it to my iPhone using File Manager, and double-clicked on it, and nothing happened. After searching the manual and the reference manual and finding nothing, I decided to look at the above link to the OS4 direct download. So I downloaded it to my desktop again, and this time I didn't unzip it; I used File Manager to bring it to Pleco, double-clicked, and it installed. Well, so it said. But when I use Manage Dicts to look at CC-CEDICT, it says that it is still v100627, which I presume is the 2010 June 27 version.
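
As a quick sanity check on that reading of the version tag, here is a small Python sketch. It assumes the digits after the "v" are a two-digit year, month and day (YYMMDD); that layout is my guess from this one tag, not something documented, and `version_to_date` is just a name made up for the example.

```python
from datetime import date, datetime


def version_to_date(tag: str) -> date:
    """Decode a tag such as 'v100627', assuming the digits are YYMMDD."""
    # Drop the leading "v", then parse two-digit year, month, day.
    return datetime.strptime(tag.lstrip("v"), "%y%m%d").date()


print(version_to_date("v100627"))  # 2010-06-27
```

If the guess about the layout is wrong, `strptime` will usually raise a `ValueError` rather than return a misleading date.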

What should I have done to correctly install this?

John
 

mikelove

皇帝
Staff member
poncin said:
Would you consider also including "CFDICT", the French version of the xxDICT series?
It's still a work in progress but seems already quite useful.

http://www.chine-informations.com/chinois/open/CFDICT/

Didn't know it had progressed so far so quickly - how did they get from ~3000? to 120,000 entries so fast? Did they run CC-CEDICT through Google Translate or was there some other method involved? I ask because while I'm sure the editors are on the up-and-up, sometimes large contributors turn out to be contributing stolen content (HanDeDict had some problems with this a year or two ago IIRC) - wouldn't want Pleco to be complicit in that.

johnh113 said:
How do I install this? First I downloaded it to my desktop, unzipped it, uploaded it to my iPhone using File Manager, and double-clicked on it, and nothing happened. After searching the manual and the reference manual and finding nothing, I decided to look at the above link to the OS4 direct download. So I downloaded it to my desktop again, and this time I didn't unzip it; I used File Manager to bring it to Pleco, double-clicked, and it installed. Well, so it said. But when I use Manage Dicts to look at CC-CEDICT, it says that it is still v100627, which I presume is the 2010 June 27 version.

Database updates are a little buggy in OS4 right now (fixed by 2.2 but that's not out yet); basically you have to exit Pleco, then manually kill it in the background (double-press your iPhone's home button, tap-hold on the Pleco icon and tap on the (X) to kill it) and reopen in order to make the new version active.
 

poncin

Member
It is a new development.
The initial CFDICT attempt, 4 years ago, did not go very far (http://www.chinaboard.de/cfdict.php).
This one is supported by a web site / online community (Chine Informations) that provides the dictionary online and is slowly getting it translated, starting from CEDICT. This time, the process seems much more mature, with an online review system that starts from the English translation and collects corrections. You have to be authenticated on the web site to submit translations.
When you look at the content, part of it really looks like requests submitted online for translation by community members (e.g. long sentences).
The total number of entries advertised does not reflect the current validated/corrected content ...

From their board (originally in French, translated here) at http://www.chine-informations.com/forum ... tion=fiche

forum of chine-informations.com said:
- Note that the other CFDICTs are not legal, because they were "borrowed" from electronic dictionaries.

- Then there is the chinaboard.de site, which had relaunched the project, but to date they do not even have 700 translations...

- For this new version of CFDICT, I started from the CEDICT dictionary, which has nearly 100,000 entries. They have been translated gradually over the years on Chine Informations, either manually or by Chinese-French dictionaries that agreed to provide content. To date there are 30,000 French translations. I am not offering all 30,000 for download (except on request for specific projects), because some entries need to be rechecked (even if they are in principle correct); these are the 16,000 entries that come from various online Chinese dictionaries. Most have been checked and translated (usually from English, otherwise from Chinese when in doubt) by me, using various translation tools, online or otherwise.

So, basically,
- they seem well aware of copyright issues, and chose a process that avoids that trap.
- the author is the main contributor.
- this is still a work in progress.
 

mikelove

皇帝
Staff member
poncin said:
When you look at the content, part of it really looks like requests submitted online for translation by community members (e.g. long sentences).
The total number of entries advertised does not reflect the current validated/corrected content ...

Understood... thanks for translating, nobody here speaks French :)

Does he mention any way to filter out the validated entries from the not-yet-validated ones? Since it's just an add-on download I suppose the risk isn't too great if there does turn out to be a copyright violation (we could remove the add-on from our server but not have to do anything drastic like pull our app from sale and re-submit with the problem content removed), but we'd ideally like to have a way to give people only the high-quality ones. The .txt data file seems to lump everything together.
 

poncin

Member
Here is the feedback after posting on the CFDICT forum:
* you should get in touch with David Houstin (firstname.lastname@gmail.com); he is the one supporting this project. English should not be an issue.
* the material available for download has already been through a first check (the cfdict.u8 file only contains entries that have passed a first verification). However, this is only a first pass, and quality is expected to keep improving. There is no "compact/quality-checked" version.
* additional content (non-verified, etc.) is accessible through the online interface and is progressively incorporated into the file.
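
For anyone who wants to poke at the downloaded file directly, here is a minimal Python sketch. It assumes cfdict.u8 uses the same line layout as CC-CEDICT (`Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`), which I have not verified against every line; `parse_entries`, `looks_like_sentence` and the 8-character cut-off are made up for illustration. The length test is only a rough way to set aside sentence-like entries; it is not a validation flag provided by the project.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# CC-CEDICT line layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
# Assumption: cfdict.u8 follows the same layout.
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    glosses: list


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per well-formed line, skipping comments and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # malformed line: skip it rather than guess
        trad, simp, pinyin, gloss_text = match.groups()
        yield Entry(trad, simp, pinyin, gloss_text.split("/"))


def looks_like_sentence(entry: Entry, max_chars: int = 8) -> bool:
    """Hypothetical heuristic: very long headwords are probably sentences."""
    return len(entry.simplified) > max_chars


# Tiny made-up sample standing in for the real file.
sample = [
    "# comment line",
    "你好 你好 [ni3 hao3] /bonjour/salut/",
    "中國 中国 [Zhong1 guo2] /Chine/",
]
entries = list(parse_entries(sample))
short = [e for e in entries if not looks_like_sentence(e)]
```

To run it on the real download, replace `sample` with the open file, e.g. `open("cfdict.u8", encoding="utf-8")`, and adjust `max_chars` after looking at what gets set aside.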

The web site provides the following licensing information (translated from French):
http://www.chine-informations.com/chinois/open/CFDICT/ said:
"LICENCE & SOURCES
Creative Commons licence
This work is made available under a Creative Commons Attribution - Share Alike licence. This means that the data may be used for commercial or non-commercial purposes, provided that you mention where you obtained the data (this page or this site) and that, if you improve or add data to the file, you may redistribute it only under this same licence unless authorized otherwise (the aim being to let everyone benefit from it).
The translations offered on this site come from Chine Informations, its team and its members, but also from the author of Huaying. Some translations come from Wikipedia, whose data is available under the Attribution - Share Alike licence."

So, to keep the translation straightforward:
- this is the typical Creative Commons BY-SA licence.
- their web page should be mentioned as the source of the data.
- translations come from 1) the Chine Informations team and members, 2) the author of Huaying, 3) Wikipedia, for some entries, under the Attribution - Share Alike conditions.

For Huaying, see the iPhone app ...

I uploaded the whole file as a Pleco user dictionary ... it took 20 hours on the iPad to process and resulted in a 70MB file ;-)
After playing with the dictionary for 2 days, I would qualify the content as
- still lacking the polish of more mature dictionaries,
- but useful for French-speaking users, especially when the English translation falls outside of my current language skills ...

Now if you could get a licence agreement for the Ricci ... ;-)
 

mikelove

皇帝
Staff member
poncin said:
So, to keep the translation straightforward:
- this is the typical Creative Commons BY-SA licence.
- their web page should be mentioned as the source of the data.
- translations come from 1) the Chine Informations team and members, 2) the author of Huaying, 3) Wikipedia, for some entries, under the Attribution - Share Alike conditions.

For Huaying, see the iPhone app ...

I uploaded the whole file as a Pleco user dictionary ... it took 20 hours on the iPad to process and resulted in a 70MB file
After playing with the dictionary for 2 days, I would qualify the content as
- still lacking the polish of more mature dictionaries,
- but useful for French-speaking users, especially when the English translation falls outside of my current language skills ...

Understood. We'll see about getting it added during our next round of free dictionary update releases - thanks!
 

metica

Member
mikelove said:
Understood. We'll see about getting it added during our next round of free dictionary update releases - thanks!

Hi Mike,

First, congratulations on Pleco; it's a great product. I bought an iPod touch specifically to use it.

Good news regarding CFDICT! Do you have a rough idea when your next round of free dictionary updates will be released?

Regarding the release of version 2.2 (and/or 2.3), I can't find any news. Is it already out? If not, when will it be (the App Store still offers version 2.1.2)? I'd like to get rid of the annoying message "Turn off airplane mode or use wi-fi to access data", as I can't use wi-fi.
 

mikelove

皇帝
Staff member
metica said:
Good news regarding CFDICT! Do you have a rough idea when your next round of free dictionary updates will be released?

Not too clear - it depends a lot on how busy things get once OCR is out (with bug fixes and other improvements); every new dictionary takes at least a little programming work to adapt its data file format to our converter.

metica said:
Regarding the release of version 2.2 (and/or 2.3), I can't find any news. Is it already out? If not, when will it be (the App Store still offers version 2.1.2)? I'd like to get rid of the annoying message "Turn off airplane mode or use wi-fi to access data", as I can't use wi-fi.

Not out yet, we're still waiting for Apple to approve it - hopefully any day now. And yes, it should finally kill that annoying message.
 