教育部國語辭典 available formats

Fernando

榜眼
I know that the dictionary of Taiwan's Ministry of Education (MoE) is available for free on Pleco and will in due time be available on the Mac, but does anyone know whether it also exists in the .dictionary format, so that it can be used in the native Dictionary app, or in any other (open source) format, for that matter?

The database for moedict is available on GitHub, but it is split into three .xls files, and I don't have the technical skills to convert them.
 

Shun

状元
Hello Fernando,

I also know EUdic and GoldenDict, the latter being open source. Perhaps it would be a bit easier to convert the MoE dictionary from Excel to one of their formats? EUdic is available on the Mac App Store, but the paid version is getting more expensive ($20 now). They also seem to be in a bit of a Wild West situation with regard to data quality and copyrights.

Cheers, Shun
 

Fernando

榜眼
Hi Shun,

I have never used EUdic, but I have GoldenDict on my Windows/Linux desktop machine and it is a pretty nifty app. The problem is not the dictionary app itself, though, but the dictionary format. Apple's native app has its own format, .dictionary, and GoldenDict reads a number of open source formats, like .ifo, .index, and Babylon's dictionary format. But the MoE dictionary is not available in any of those formats.

There is a Python utility called PyGlossary (also on GitHub) that can convert between a number of formats, but it can't handle .xls either.
 

Shun

状元
Yeah exactly, but what if you convert Excel to CSV? I would guess you could more easily convert that to whatever dictionary format you like.
 

Fernando

榜眼
Yes, that's what I thought about doing. But I think I would have to merge the three .xls files before converting. That might be a little tedious and time-consuming, as Excel crashes every time I try to copy large amounts of data.

Also, I'm not sure about the end result. MoEdict uses number codes for any character outside Big-5. I'm not sure whether those are Unicode.
 

alex_hk90

状元
That encoding sounds tricky, perhaps @mikelove or @alex_hk90 know more about it. @alex_hk90 once converted the MoEDict to Pleco format, as well. Have a look at these posts:


To get a single CSV file, I'd suggest you Save As each Excel file to CSV and then merge the CSV files using "cat" or just a text editor. :)

Yes - saving each sheet to CSV and then working with those files could be the way to go.
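
For anyone who would rather script the merge than use "cat", here is a minimal Python sketch. It assumes each Excel file has already been saved as a UTF-8 CSV and that the files share the same header row. The function name `merge_csv_files` and the file names in the usage note are made up for illustration and do not come from the MoE repository.

```python
import csv


def merge_csv_files(sources, destination):
    """Concatenate CSV files into one, writing the shared header row only once.

    Returns the number of data rows written (header excluded).
    """
    rows_written = 0
    header = None
    with open(destination, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for source in sources:
            # 'utf-8-sig' drops the byte order mark that Excel often adds.
            with open(source, newline="", encoding="utf-8-sig") as src:
                reader = csv.reader(src)
                first = next(reader, None)
                if first is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # The first row of the first file is taken as the header.
                    header = first
                    writer.writerow(header)
                elif first != header:
                    # Not a repeated header, so keep it as a data row.
                    writer.writerow(first)
                    rows_written += 1
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Usage would look something like `merge_csv_files(["part1.csv", "part2.csv", "part3.csv"], "merged.csv")`. Unlike a plain "cat", this avoids repeating the header line in the middle of the merged file.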

@Fernando I'm not sure which source files you are using - when we did the unofficial MoEDict Pleco conversions we used source files in JSON, that were converted from the original HTML:

But if you have them in Excel then that's usually already quite usable.

On the number codes for characters outside Big-5: it's hard to say without seeing the source files, but depending on the output format you are aiming for, you may well need to handle or convert them, yes.
 

Fernando

榜眼
Hi Alex,

I took the data from this repository:

Indeed, there is a JSON file there as well, but I have no idea how to go about turning it into a dictionary in any of the formats I mentioned above.
As for the number codes, they are inline references to "unicodes PNGs", and that has apparently been amended in both the web version of Moedict and the app they created:

So I suppose the JSON now only includes Unicode?
 

Shun

状元
Hi Fernando, hi alex_hk90,

I'm not sufficiently experienced with Python's Unicode handling, but I converted the JSON file to a Python dictionary using the json package in the following way:

----

import json

# 'utf-8-sig' transparently skips a byte order mark (BOM) if the file starts with one.
with open('dict-revised.json', 'r', encoding='utf-8-sig') as f:
    moe_dict_in = json.load(f)

----

Then one could iterate through the dictionary moe_dict_in and put the results in a new dictionary moe_dict_out. I noticed that the JSON file uses a Unicode hexadecimal code in the 'title' field when the field holds a single Chinese character, and a regular Unicode string when the 'title' field is longer than one character.
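
A rough sketch of that iteration step follows. It rests on assumptions about the JSON layout that should be checked against the real file: that the top level is a list of entries, and that each entry has a 'title' plus a list of 'heteronyms', each carrying a 'bopomofo' reading and a list of 'definitions' with a 'def' string. The sample data is a made-up stand-in with placeholder definitions, and `build_lookup` and `to_tab_lines` are hypothetical helper names. If the top level turns out to be a dict, iterate over `moe_dict_in.values()` instead.

```python
import json

# Made-up stand-in for the parsed file; the definitions are placeholders.
# ASSUMPTION: the real entries follow this 'title' / 'heteronyms' / 'definitions' layout.
sample_json = """
[
  {"title": "萌",
   "heteronyms": [{"bopomofo": "ㄇㄥˊ",
                   "definitions": [{"def": "sample definition 1"},
                                   {"def": "sample definition 2"}]}]},
  {"title": "萌芽",
   "heteronyms": [{"bopomofo": "ㄇㄥˊ ㄧㄚˊ",
                   "definitions": [{"def": "sample definition 3"}]}]}
]
"""
moe_dict_in = json.loads(sample_json)


def build_lookup(entries):
    """Map each headword to a flat list of '[reading] definition' strings."""
    moe_dict_out = {}
    for entry in entries:
        title = entry.get("title")
        if not title:
            continue  # skip entries without a headword
        senses = []
        for heteronym in entry.get("heteronyms", []):
            reading = heteronym.get("bopomofo", "")
            for definition in heteronym.get("definitions", []):
                text = definition.get("def", "")
                if text:
                    senses.append(f"[{reading}] {text}" if reading else text)
        # Entries that share a headword are merged into one list.
        moe_dict_out.setdefault(title, []).extend(senses)
    return moe_dict_out


def to_tab_lines(lookup):
    """Render one 'headword<TAB>definitions' line per entry."""
    lines = []
    for title, senses in lookup.items():
        # Tabs and newlines inside a definition would break the line format.
        body = " / ".join(s.replace("\t", " ").replace("\n", " ") for s in senses)
        lines.append(f"{title}\t{body}")
    return lines
```

Writing the result of `to_tab_lines(build_lookup(moe_dict_in))` to a UTF-8 text file gives a plain tab-separated word list. As far as I know, converters such as PyGlossary accept this kind of file as input, but that is worth verifying against its list of supported formats.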

Cheers, Shun
 
Here's a copy of the 萌典 dictionary I use on my Mac: https://www.sendspace.com/file/mabdk3

I suppose you only need to move this file into your Dictionary folder and you'll be able to use it in your Dictionary app.
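
If you prefer to script that step, here is a small sketch. It assumes the download unpacks to a .dictionary bundle and that the per-user folder is ~/Library/Dictionaries, which is where Dictionary.app looks on macOS. The function name `install_dictionary` and the bundle name in the usage note are hypothetical, since the actual file name in the download isn't stated here.

```python
import shutil
from pathlib import Path


def install_dictionary(bundle, dictionaries_dir=None):
    """Copy a .dictionary bundle into the Dictionaries folder; return the new path."""
    bundle = Path(bundle)
    if dictionaries_dir is None:
        # Per-user folder read by Dictionary.app on macOS.
        dictionaries_dir = Path.home() / "Library" / "Dictionaries"
    dictionaries_dir = Path(dictionaries_dir)
    dictionaries_dir.mkdir(parents=True, exist_ok=True)
    target = dictionaries_dir / bundle.name
    if bundle.is_dir():
        # A .dictionary bundle is normally a folder, so copy the whole tree.
        # dirs_exist_ok needs Python 3.8 or newer.
        shutil.copytree(bundle, target, dirs_exist_ok=True)
    else:
        shutil.copy2(bundle, target)
    return target
```

Something like `install_dictionary("moedict.dictionary")` would then place the bundle in the default folder. You will probably still need to tick the new dictionary in the Dictionary app's settings before it shows up in searches.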
 

Fernando

榜眼
Thank you! This is what I was looking for! I wonder why it's not more easily available elsewhere, given that the dictionary is free to use.

@Shun You might be onto something there. In any case, you're more knowledgeable than I am. But I think the dictionary @goldyn chyld just shared was perhaps built this way:
I didn't come across that when I did my original search.
 
I also have the 兩岸 dict in the same format so let me know if you need it and I can upload it for you.
 
Yeah, the version of MoE dict in the link above may be even more up-to-date than the copy I uploaded before. (Mine was from 2013 but the one in the link above is from 2016).
 

Fernando

榜眼
I don't think it makes much of a difference, and in some cases being a little older might even be a plus, given that these days dictionaries are being adjusted to accommodate ever more "orthodox" political views. I converted the MoE copy you uploaded so that I can use it on my Linux machine as well, for the long run.
 
Hey, sorry to bother you, but could I get the .dict files for both the 兩岸 and 萌典 dictionaries? I'm having trouble compiling them on my own. Thanks a ton!
 
Sorry, I only saw your comment now. I've uploaded both dictionaries for you here. Let me know if you also need the latest version of CC-CEDICT. ;)
 