Stuff to use with the Document Reader - Four Great Novels

mikelove

皇帝
Staff member
On most operating systems (like Windows Mobile), reading from internal memory and reading from an expansion card do indeed use the same interface, but on Palm they unfortunately don't. Palm files in internal memory are stored in a weird, directory-less, record-based filesystem that wouldn't really work correctly with a memory card, so Palm uses a completely different programming interface for memory card files. This is also why you can't load normal text files / MP3s / etc. into a Palm's internal memory: the filesystem can only accommodate Palm record database files like PRCs and PDBs.
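For anyone curious what a "record database" looks like from the outside, here is a minimal Python sketch that reads the fixed 78-byte header found at the start of a PDB/PRC file and reports its name, type, creator and record count. The function name `read_pdb_header` is made up for illustration; this is not Pleco's code, just a way to peek at a file on a desktop machine.

```python
import struct

# Fixed 78-byte big-endian header at the start of a PDB/PRC file:
# name, attributes, version, three dates, modification number,
# appInfo offset, sortInfo offset, type, creator, unique ID seed,
# next record list ID, number of records.
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")


def read_pdb_header(data: bytes) -> dict:
    """Return the database name, type, creator and record count."""
    fields = PDB_HEADER.unpack_from(data, 0)
    # The name is NUL-terminated inside its 32-byte slot.
    name = fields[0].split(b"\x00", 1)[0].decode("latin-1")
    return {
        "name": name,
        "type": fields[9].decode("ascii", "replace"),
        "creator": fields[10].decode("ascii", "replace"),
        "num_records": fields[13],
    }
```

Calling `read_pdb_header(open("book.pdb", "rb").read())` (with `book.pdb` standing in for any file you have) shows the record count the device will see; PalmDoc files normally report type `TEXt` and creator `REAd`.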
 

ipsi

状元
That's just a bit weird. But never mind. Should I then update the page to let PPC users know they can chuck the PalmDoc files on a memory card? Or can PPC users not read PalmDoc files at all right now?
 

thenrik

Member
Hi:

Well, admittedly I'm a Pleco Beta Voyeur, waiting for the final version before plunking down my dough. Nonetheless, I appreciate the Lu Xun stories and four great novels. I thought I'd mention an alternative Chinese doc reader for Pleco Beta holdouts like myself or while the bugs are being straightened out in Pleco 2.0.

This post in the Pleco Palm forum explains the gist:
viewtopic.php?f=2&t=1159

Anyway, I've taken the Lu Xun stories and the four classic novels and saved them in Word format, with SimSun as the default font. If anyone would like to host them, post here or let me know and I'll upload them.

It will make my bus commute more interesting.

Waitin' for Pleco 2.0 Final,

Tom
 

ipsi

状元
I'm glad you appreciate them. I'll be adding more stuff there at some point (NPCR related), so keep an eye on this thread.

I'm also going to do flashcards for book four at some point, but I hope I'll be able to OCR that... I suspect not though, as it doesn't like English and Chinese together.

I'll also redo the flashcards when Mike gives out information on creating flashcards with multiple tags for 2.0, so there's only one definition, but tagged for each chapter it appears in (and also if it appears in the main text or the supplementary).

Oh, and Thenrik: If you want, I'll save them as a Word doc myself and upload them. I can also add slightly more formatting and a Table of Contents, plus a Footer (i.e. page numbering) if you'd like. Alternately, you can email them to me at n z i p s i @ g m a i l . c o m (remove spaces between the letters), and I'll put them up. If you have MSWord, this is probably a better idea, as I use OO and that creates larger .doc files (to the point where a 50kb limit for a cover letter is too small).

I can do some other funny stuff as well. If people are interested, I may see if I can add Ruby Text to the books. But getting it to happen automatically and without error will be a challenge. EDIT: And might be pointless if they're written in Classical Chinese...
Oh for a Java version of Prolog... I can also produce individual PDFs of the chapters (sorry, but doing any other format would be a huge pain, as I can keep the PDFs all up-to-date using LaTeX without needing to change files one-by-one).
 
Personally, I don't think PDF is a problem, because one can always copy the text out of the PDF if needed. As long as it's UTF encoded, and (for me at least) in Simplified characters.

Which OCR software are you using? I have two packages. I've been using ReadIris Pro 11, which handles mixed Chinese and English. I have another package purchased here in China with my scanner, which worked very well at the store, but I confess I haven't installed it on my limited laptop.

Thomas also sent me a single DOC file, so I will be putting that up on the web site for download. However, I'd also prefer individual DOC files. Perhaps you can help me script something that would let people tick which files they want, and then I could put them into a single archive on the fly and provide it for download?
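One rough way the on-the-fly archive could work, as a minimal Python sketch. It assumes the individual files sit in one directory on the server and that the web form hands back a list of ticked file names; the function name `build_archive` and that setup are assumptions for illustration, not anything already running on the site.

```python
import io
import zipfile
from pathlib import Path


def build_archive(base_dir, selected_names):
    """Return ZIP bytes containing only the selected files from base_dir."""
    base = Path(base_dir).resolve()
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in selected_names:
            path = (base / name).resolve()
            # Skip names that are missing or that point outside base_dir,
            # so a crafted name like "../secret" cannot pull in other files.
            if base not in path.parents or not path.is_file():
                continue
            archive.write(path, arcname=path.relative_to(base).as_posix())
    return buffer.getvalue()
```

The returned bytes can be sent straight back as the download, so nothing has to be written to disk per request.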
 

ipsi

状元
Hmm... I'll think about this more when I get home. Should be some stuff that can be done to make it easier for people.

I'll also probably PDF it today or maybe over the weekend.

For OCR I'm using DanQing, which works fine but doesn't like most punctuation and also will only do one language at a time. If I had money I'd buy IRIS, but I don't... And you need a special package to OCR Asian languages.
 

ipsi

状元
Ok, I've converted the Four Great Novels to Word format, complete with headings and a table of contents. It's not a huge difference, but it's nice for some people. I'll upload those soon. There are also two versions - one is the normal, left-to-right way, and the other is the more classical, top-to-bottom, right-to-left reading order, with the page set as horizontal. Not sure if anyone has any use for that, but it didn't take much effort - nowhere near as much as trying to learn OpenOffice macros *cries*.

I'm playing around with some other stuff at the moment, and I'll probably upload separate chapters soon, but probably only in PDF format, unless there are people out there desperately keen on Word docs? If so, then I'll try and convert each of the Four Great Novels into separate doc files for each chapter, and also convert the Lu Xun novels.

Anyone have any suggestions for other formats?
 
ipsi said:
EDIT: Ok, putting them to your card, however I try, doesn't seem to work. It somehow fucks up the record count, and then Pleco thinks it's an empty file. That's just weird.

EDIT: Ok, the files have moved to http://china.panlogicsoftware.com/ebooks.html :)
...

Hello ipsi, I'm a great fan of your NPCR flashcard database. I'm really looking forward to using it with the new Pleco port for iPhone, once there's a release that allows one to use flashcards within the app.

As I have been learning Chinese using the German version of the NPCR books, I'd like to know if there's any German adaptation of your NPCR flashcard database that could be used with Pleco. I searched around but couldn't find anything except the printable flashcards on http://www.chinaboard.de/chinesisch_deu ... vokabeln#6

If there isn't any German adaptation available, I'd like to translate your database for German students, if you'll let me.
(Perhaps we'll also find some guys who can adapt it for other languages)

So I've got 3 questions:

1) Will you (ipsi) let me translate your NPCR flashcards into German?

2) What's the best tool to edit the flashcard files or generate new ones to make it a most flexible and categorizable database?

3) What's the best database file format for a translated version of your NPCR flashcards that could be easily imported into Pleco on Palm OS, Windows Mobile, iPhone, maybe Android, etc.?
As mentioned by Michael Love @ viewtopic.php?f=4&t=1918&start=90 , the .txt format may be the most universal one. Is that right?

So could anyone please give me some advice? I am not a programmer and need more information before I can start the "German NPCR flashcard Pleco project" :)

thank you
 

mikelove

皇帝
Staff member
I can comment on #3 at least - for that, .txt is indeed the most universal format, though XML is also supported on Palm / WM in 2.0 and will be supported on iPhone and any other platforms we branch out to (desktops etc). But XML's a lot harder to create files in, and for basic vocabulary lists at least it doesn't really add any extra features.

For #2, we create our flashcard lists using a regular old text editor (the fast, Chinese- and large-file-friendly EmEditor for Windows - sadly there's nothing even remotely comparable to it on Mac) - you could also lay them out in a spreadsheet like Excel, but that can sometimes distort parts of the file in awkward ways, so I'd recommend keeping things in plain text as much as possible.

If you already have a copy of Pleco on another platform, or have access to a friend's copy, one thing you could try that might get you well on your way to a translated version of the NPCR flashcards is this. First, make sure you've got the latest version of the free HanDeDict Chinese-German dictionary installed in Pleco. Then, import the NPCR list into Pleco's flashcard system with HanDeDict set as the main / only dictionary to match entries to. To do this, tap on the "Dicts..." button in the Pleco flashcard "Import" dialog, select "Prefer Dicts" for "Definition Source" at the top of the screen, and remove everything but HanDeDict from the dictionary list on the left side of the screen. Also, in the main "Import" screen, set "Duplicate Entries" to "Allow." Proceed with importing the NPCR flashcards, then turn right around and export them to a text file via the "Export" screen, making sure to choose "Text File" as your file format and to check the "Card definitions" and "Free dict defns" boxes.

After doing all of that, that exported text file should be a copy of the NPCR list, but with every word in it that has a HanDeDict entry getting its definition from that instead of from the source list. You can then clean up / correct those HanDeDict-supplied definitions and put in your own translations for the ones that are still in English.
 

ipsi

状元
Yeah, no worries. All I've really done is transcribe the vocabulary from the book, so feel free to use my list as a base for generating the German version of the flashcards.

In terms of actually putting those flashcards into Pleco, Mike's suggestion isn't a bad one, but it won't necessarily work properly, as the text file format isn't flexible enough to contain all the category information that I've added to my NPCR flashcards - as far as I'm aware, it's only possible to specify a single category string (e.g. /NPCR/NPCR 1/Chapter 1/Text 1), but as I strip out and combine duplicates before entering it to Pleco, I needed the ability to have more than one category per card. So depending on the file you're making use of, chances are there aren't any duplicates, but there'll be several (at least 2+, potentially 7+) categories per card. Unfortunately, it's been quite some time since I've last done any work on this, and I actually can't remember which file I was working on... The latest file I've made available is different to the latest file on my machine, so I'm really confused as to what I've done with it all...

But yeah, if you want to translate the flashcards, go right ahead :)

If you want to get a duplicate-free XML file to import into Pleco, then you'll probably need to A) Use the custom text file format I've made up, and B) run a little program I've written on that text file. If you want to do that, I'll actually have to remind myself how it works!
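To give an idea of what the duplicate-combining step involves, here is a minimal Python sketch. It is not the actual program or the actual custom file format; it assumes a made-up tab-separated layout of `category<TAB>headword<TAB>pinyin<TAB>definition`, one card occurrence per line, and the name `merge_duplicates` is just for illustration.

```python
def merge_duplicates(lines):
    """Combine repeated cards into one entry carrying every category."""
    cards = {}  # keyed by (headword, pinyin); dicts keep insertion order
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Hypothetical layout: category, headword, pinyin, definition.
        category, headword, pinyin, definition = line.split("\t")
        key = (headword, pinyin)
        if key not in cards:
            # The first definition seen for a card is the one that is kept.
            cards[key] = {"definition": definition, "categories": []}
        if category not in cards[key]["categories"]:
            cards[key]["categories"].append(category)
    return cards
```

Each card then ends up with a single definition and a list of categories, which is the shape needed before writing out one entry per card with several category tags.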

- ipsi
 