《说文》for Pleco?

s85

秀才
Is it possible to make a version of 《说文》for Pleco as a dictionary? It has definitions for some "strange" hanzi and may be helpful for advanced learners.
BTW, it's absolutely free :)))
 

mikelove

皇帝
Staff member
Certainly doable if we can find a good electronic source text, though some of the characters in it may not be supported by our fonts / handwriting recognizer / etc.
 

s85

秀才
Well, this would be really great!
I've found several versions of 说文 online by searching 说文 下载, in PDF and in TXT formats. The TXT version seems to be slightly corrupted, or maybe it's just my fonts.
Anyway, this dictionary should be absolutely free to use.

加油!
I really love this software and hope you'll make it even better!
 

mikelove

皇帝
Staff member
We'll take a look. Might be worth considering for 康熙字典 too. We're also exploring the possibility of getting Chinese Wikipedia converted (without images, at least initially) - it would be a fairly massive download, but could be genuinely useful in its own right and as a fallback when there was no dictionary entry available. (and as insurance against intermittent Wikipedia blockings, though I guess that also means we'd have to host it on a separate server to protect ourselves from such a blocking)
 
Mike, You have me drooling again; especially with the mention of the possibility of converting the Chinese Wikipedia and making it available to us Pleco users. I would be happy to pay for it if I could access the Chinese Wikipedia via Pleco.

Cheers!

Darrol
 

taijidan

举人
Sounds interesting.

Can anyone give a brief intro to what 说文 and 康熙字典 are, and what benefits they have?
 

bokane

举人
IIRC, there are StarDict dictionaries (presumably convertible to Pleco format) for Shuowen out there. Maybe look into one of those? The seal script forms of the characters presumably won't work unless Pleco has some mechanism for displaying images (this would be SO AWESOME if it were an option), but even Xu Shen's guesswork explanations of characters would be nice to have -- and the Kangxi dictionary would be even more useful.
 
As a Chinese scholar with 40 years' study behind me, I would suggest that the Kang Xi dictionary is really of very limited use. In particular, the Kang Xi's seal script characters are fanciful and inaccurate.

The Shuo Wen is a revolutionary work in terms of the analysis of the Chinese language in the Han period; but it must be approached with great caution. Xu Shen, its author, had very limited material available to him. Our archaeological data for the etymology of Chinese words is much better now than in Xu Shen's time.

I don't suggest any great effort be put into either. The Xinhua Dictionary of Classical Chinese would be a much better investment of effort.
 

gato

状元
Yeah, once Mike makes a dictionary converter publicly available, one of the users here can convert 说文 or Kangxi Dictionary if there's enough interest. Mike's effort can be focused more on bringing Classical Chinese and chengyu dictionaries to Pleco.
 
I think that the Chinese Language History Unit at the Huadong Normal College in Shanghai may have developed fonts for bronze seal script and oracle bone characters.
 

mikelove

皇帝
Staff member
Stephen Selby - Any idea how those fonts might be licensed? We're always interested in useful free content but it's tough to justify signing agreements / paying royalties for something that arcane.

Good points about Shuowen / Kangxi - with public-domain works like those we probably are better off letting the community convert them while we concentrate on titles that need proper license agreements / copy-protection / etc.
 
Not sure about licensing the fonts. The people involved were Professors Li Pu (李圃) and Zang Kehe (臧克和) at Huadong Shifan Daxue. Do you have people there who could contact them?

There would not be any way of searching those fonts as I don't believe there's an input method.

It seems that one could go overboard in creating dictionaries and font bloat for hand-held devices. Have you thought of hosting some of the more sophisticated dictionaries at your end and allowing your software to query them remotely, perhaps even sending images of variant / archaic characters to a window?
 

mikelove

皇帝
Staff member
Don't have any contacts there, no. The lack of a search method might be problematic, though I assume there's some mapping table to tell which font glyphs correspond to which characters.

As far as remotely querying dictionaries, honestly once we've gone to the trouble to license / clean up / format a new dictionary it doesn't take that much additional effort to make it downloadable; device memories are getting bigger and bigger, and cell networks at least in parts of the world (<cough>New York<cough>) are still very unreliable, so at least for the next few years the advantage still falls very clearly on the side of locally-hosted databases.
 
Certainly there's a mapping to modern characters (subject to the proviso that only about 60% of oracle bone characters and about 85% of bronze characters have been firmly mapped!). But that means you can only check backwards: i.e. you already know what the character means (so why are you looking it up?). I will try to get in contact with Zang Kehe.

Wonder if you've seen this:
http://emeld.org/workshop/2003/ruyng-demo.html

Can Pleco display any bitmapped graphics?
 

mikelove

皇帝
Staff member
Well it seems in general like the way most Pleco users would employ these would be to pull up the oracle / bronze version of a modern character, rather than trying to translate oracle / bronze texts; the latter would involve some sort of complicated new handwriting recognizer system that, absent a spectacularly-generous grant, is unlikely to be forthcoming from anyone anytime soon.

There's no support for embedding bitmaps in dictionary entries at the moment, but it wouldn't be that difficult to add - main issue would be juggling multiple screen resolutions (on Palm/WM already but also on iPhone once the inevitably-based-on-iPhone-OS-so-that-they-can-lock-it-down-and-keep-taking-their-30%-cut Apple Tablet appears), but we could just embed a few different versions of the image for different screen sizes and pick the closest match.
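To make the "pick the closest match" idea concrete, here is a minimal sketch in Python. It is not how Pleco does or would do it; the function name, the file names, and the use of pixel width as the only matching criterion are all assumptions made for illustration.

```python
def pick_closest_image(variants, screen_width):
    """Return the image whose pixel width is closest to screen_width.

    variants: dict mapping image width in pixels -> image identifier.
    Both the structure and the names are hypothetical, for illustration only.
    """
    if not variants:
        raise ValueError("no image variants embedded for this entry")
    # Sort key: distance from the target width first; on a tie, prefer the
    # smaller image so it never overflows the screen.
    best_width = min(variants, key=lambda w: (abs(w - screen_width), w))
    return variants[best_width]


# Hypothetical set of pre-rendered versions of one seal-script glyph.
glyph_variants = {
    160: "glyph_160.png",
    320: "glyph_320.png",
    480: "glyph_480.png",
}

chosen = pick_closest_image(glyph_variants, 300)
```

A real implementation would probably also weigh pixel density and the size of the entry view, not just the raw screen width.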
 
I have a sort of idea, after gathering the profile of many of your users from this forum. Rather than a dictionary, it would be an e-text containing some representative Shang jiagu and Shang-Zhou bronze inscriptions reproduced with fonts. Your software would pop up the definitions and brief notes when you tapped on the text, like your ebook reader. That avoids the pitfall of people thinking that ancient Chinese is simply classical Chinese with ancient characters. I have the resources to do that on a gradual basis.

I expect your dictionary internal structure is kept confidential; but can you create a dictionary file out of a Microsoft Excel spreadsheet? If so, I could create a small sample to play with. 8)
 

mikelove

皇帝
Staff member
For user dictionaries at least the structure isn't particularly protected at all; you can easily view it by opening up a user dictionary database file in any SQLite browser / manager app. The problem is that we don't currently have any support for embedding custom images into dictionary entries, or for installing / selecting custom fonts, so until we add those things there wouldn't be any way to pull this off.
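If you'd rather poke at the file from a script than from a SQLite browser, something like the following sketch should work with Python's standard `sqlite3` module. It assumes nothing about the actual table or column names in a user dictionary; it only lists whatever tables and columns the file happens to contain. The function name and the file path in the usage comment are made up for illustration.

```python
import sqlite3


def describe_sqlite_file(path):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite file.

    Note: sqlite3.connect() creates an empty file if `path` does not exist,
    so check the path first when pointing this at a real database.
    """
    conn = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalog of tables, indexes, etc.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [(col[1], col[2]) for col in columns]
        return schema
    finally:
        conn.close()


# Usage (hypothetical path to a copy of a user dictionary database):
#   for table, columns in describe_sqlite_file("user_dictionary_copy.db").items():
#       print(table, columns)
```

Working on a copy of the database rather than the original is the safer choice, since nothing here is an officially documented format.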
 
Noted. Let me play with a sample. The font set would be rather small (assuming we aren't talking about Shuowen texts). I'll work out a sample that works on a Windows desktop machine and supply the font (which is under a GPL open-source license for our present study purposes).

S
 

feng

榜眼
EDIT

There is a cleaned up edition of the Shuowen where a scholar has gone through and updated mistaken etymologies:
http://www.amazon.cn/%E8%AF%B4%E6%96%87 ... 567&sr=8-1
It contains the complete original, so it's hard to see why you would use another edition, except that the original by itself, if already typed in, would be cheaper.
He also has a volume out covering just the 540 radicals, a 14-volume set of documents, and something else, I think. These have been out for a few years now.

If folks really wanted Kangxi, 漢語大詞典出版社 came out with a modern printed version a few years ago (retyped, not a photo reprint of an old edition), which I suppose you could license. How faithful the 21st century printed forms are to the original printed forms, I can't say since I haven't sat down to compare them.

That last point may be the source of a new thread . . .
 