No user-created PlecoDocs

mikelove · Aug 25, 2008

Small design change announcement I felt I should probably give a separate topic:

We've decided, after careful consideration, not to release a converter utility for the "PlecoDoc" format in the document reader. Basically, the format's really not that good: it doesn't support anywhere near as many features as we'd like it to (not even links), and since it uses our proprietary database format it would be impossible for us to open-source it, something we'd really like to do with all of the user data formats going forward. Releasing a converter would also tie us to that format for longer than we'd like to be tied to it (it's almost unchanged since 1.0 and we're kind of stretching it to the limit in 2.0 as-is). We'll probably still use PlecoDoc ourselves for a few things like dictionary appendixes (which, yes, we really are planning to release this time around), but it's not going to be a major part of the software anymore.

Instead, we're planning to add support for documents in XHTML. This won't happen in 2.0.0 but should hopefully make it into one of the 2.0.x releases - we already have some code for parsing XHTML from our internal PlecoDoc converter (it's what we use as an input format), so it's mostly just a matter of getting that ported over to Palm/WM. It won't support all or even most of the tags, at least not initially: just a few basic formatting ones, links, and some way of specifying bookmarks / chapters. But it gives us almost unlimited room to grow and offers some tantalizing possibilities as far as better integrating Pleco with web sites. It'll take a little while to scan a document for chapters/bookmarks the first time you open it, but it should be able to cache them so that documents open more quickly on successive uses. (Along with this, we plan to add support for directly reading UTF-8, so you won't have to wait a long time for the file to be converted to UTF-16 on startup.)

So that's the plan - the reader isn't going to be all we'd like it to be in 2.0 but it should improve enormously in the next few releases.

sfrrr · Aug 25, 2008

I'm looking forward to some big changes in the Pleco Reader because I am getting tired of managing six or eight different readers, each for a few piddly files. I had hoped that by now all the various companies selling WM readers would have figured out that they weren't going to conquer the reader format wars. Ideally, one or two readers would be able to read everything produced for the Pocket PC, but I know that's a pipe dream.

Sandra

ipsi · Aug 25, 2008

Interesting. I'm a bit sad that we won't be able to get nicely marked up documents in 2.0, but that's not a major issue. Could you not just use h1/h2/h3/etc tags for bookmarks? Maybe have something like <h1 formating="none">...</h1> for making a bookmark that doesn't take normal h1 formatting, and instead just looks like the surrounding text? Of course, I don't know if that's valid XHTML. I'm fairly certain it's not valid HTML. Could also do something similar to specify whether a bookmark should start a new page or not, for example.
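
To make the heading idea concrete, here is a minimal sketch in Python of pulling bookmarks out of h1/h2/h3 tags. It is only an illustration, not anything Pleco actually does: the class and function names are made up, and the `formating="none"` attribute is just the hypothetical one suggested above (and, as noted, probably not valid XHTML).

```python
from html.parser import HTMLParser


class HeadingBookmarks(HTMLParser):
    """Collect (level, title, plain) tuples from h1/h2/h3 elements.

    'plain' is True when the heading carries the hypothetical
    formating="none" attribute, i.e. it should look like body text.
    """

    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._level = None   # level of the heading we are inside, if any
        self._plain = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._plain = dict(attrs).get("formating") == "none"
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._level is not None:
            title = "".join(self._text).strip()
            self.bookmarks.append((self._level, title, self._plain))
            self._level = None


def find_bookmarks(xhtml):
    """Return the bookmarks found in an (X)HTML string, in document order."""
    parser = HeadingBookmarks()
    parser.feed(xhtml)
    parser.close()
    return parser.bookmarks
```

Calling `find_bookmarks()` on a document returns one tuple per heading, so a reader could build a chapter list from the levels and decide per bookmark whether to apply heading formatting.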

For sufficiently long documents, it might be useful to provide an option to 'wrap' the documents in a PLD file that would also include a pre-generated index, so you don't have to scan several MBs on startup. This would obviously be most useful when distributing documents yourselves that aren't likely to change.
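
A rough sketch of what such a wrapper could look like, in Python. The layout here (a 4-byte length, a JSON list of byte offsets, then the untouched document) is purely an assumption for illustration and has nothing to do with the real PLD format; `build_index`, `wrap`, and `unwrap` are made-up names, and scanning for `<h1` is just a stand-in for whatever bookmark marker ends up being used.

```python
import json
import struct


def build_index(document, marker=b"<h1"):
    """Scan the document once and return the byte offset of every marker."""
    offsets = []
    pos = document.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = document.find(marker, pos + 1)
    return offsets


def wrap(document):
    """Prefix the document with a pre-generated index.

    Hypothetical layout: 4-byte big-endian index length,
    JSON-encoded offset list, then the original document bytes.
    """
    index = json.dumps(build_index(document)).encode("utf-8")
    return struct.pack(">I", len(index)) + index + document


def unwrap(blob):
    """Return (offsets, document) without rescanning the document."""
    (index_len,) = struct.unpack(">I", blob[:4])
    offsets = json.loads(blob[4:4 + index_len].decode("utf-8"))
    return offsets, blob[4 + index_len:]
```

The expensive scan happens once in `wrap()`, when the file is prepared for distribution; opening the wrapped file only needs `unwrap()`, which reads the small index and can jump straight to any offset.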

But yes... I like the change. Now I just need to reformat all my documents.

mikelove · Aug 26, 2008

sfrrr - that's partly a DRM thing, I think; everybody wants to sell electronic books, but publishers aren't yet comfortable with releasing them in a non-copy-protected format as with MP3s. (and may never be - in the case of music DRM was thwarted by the existence of unprotected and easily-rippable CDs, whereas with books you actually have to scan / OCR the thing in)

ipsi - we actually use h1/h2/h3 in our input format for PlecoDoc now, but I'm a bit hesitant to do that with our new format since ideally we'd like this to be able to read regular XHTML files in the future - better to do something that won't interfere with / get confused by non-Pleco-specific web pages. Maybe just use HTML anchor tags, with some sort of extra specifier at the beginning establishing which of those were actually bookmarks and in what hierarchy. Good idea about a wrapper format.
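
As a sketch of how the anchor-plus-specifier idea might work (nothing here is a settled format): assume a `<meta name="bookmarks">` element in the head whose content lists `id:level` pairs, with ordinary `<a id="...">` anchors in the body. The meta name, the content syntax, and the `read_bookmarks` function are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace when the document declares it.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def read_bookmarks(xhtml):
    """Return (level, anchor_id, title) for anchors listed as bookmarks.

    Assumes a hypothetical specifier in the head such as
    <meta name="bookmarks" content="ch1:1 ch1a:2"/>.
    Anchors not listed there are ignored.
    """
    root = ET.fromstring(xhtml)

    # Read the specifier: which anchor ids are bookmarks, and at what level.
    levels = {}
    for meta in root.iter(XHTML_NS + "meta"):
        if meta.get("name") == "bookmarks":
            for entry in meta.get("content", "").split():
                anchor_id, _, level = entry.partition(":")
                levels[anchor_id] = int(level)

    # Walk the anchors in document order and keep only the listed ones.
    bookmarks = []
    for anchor in root.iter(XHTML_NS + "a"):
        anchor_id = anchor.get("id")
        if anchor_id in levels:
            title = "".join(anchor.itertext()).strip()
            bookmarks.append((levels[anchor_id], anchor_id, title))
    return bookmarks
```

Because the anchors are plain XHTML and the specifier is an ordinary meta element, a regular browser would simply ignore the bookmark information, and a page with no specifier would yield no bookmarks rather than confusing the reader.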

ipsi · Aug 27, 2008

The need to scan hasn't stopped book pirates, but the drop in quality is vastly more noticeable than it is for MP3s.

That could work too. Though I assume it'll be done in such a way that, say, Firefox will be able to display the page (mostly) correctly?

I think the wrapper format is actually pretty much essential if you ever plan on including images (or other content) with the documents. Could also provide you a way of encrypting/obfuscating content if you're ever able to sell stuff that you're not allowed to leave unprotected.

estudiando · Sep 15, 2008

What is currently the easiest way to create documents compatible with the Pleco 2 reader? I did a couple a few months back but now I can't find the post describing the process. I want to copy a story from the web or from ChinesePod transcripts, paste it into a .txt file, and then convert it.

stephanhodges · Sep 15, 2008

When you do, could you either post the process, or feel free to write it up yourself on the wiki? Thanks

estudiando · Sep 15, 2008

OK, sorry for not looking at the reader before asking a question. The new reader can read .txt docs.... no need for conversion...
