Custom User dictionaries and flashcard handling

ldolse

状元
I've got Pleco to make custom user dictionaries using flashcard import as discussed in the features thread, and I found a SQLite manager that allows me to tweak the dictionary file afterward, so I can pretty much set up everything the way I want. The current dictionary I've got is only a few hundred entries, so additional indexing isn't necessary.

I'm using Excel for my raw data, and since Excel doesn't support private use characters and tabs, I've got tokens for those. I basically concatenate my entries together using Excel, paste the result into a text editor and replace the tokens with the correct characters. At that point I can use the flashcard import function. I'm using the old format rather than the XML format right now, as that way I don't need to write a really crazy regex to restructure the individual lines Excel outputs into XML. When I update the dictionary I'm not updating a single entry, I just rebuild the whole dictionary using the same process.
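The token replacement step could also be scripted rather than done by hand in a text editor. This is only a minimal sketch: the token spellings `{TAB}` and `{U+XXXX}` are made up for illustration and are not the tokens ldolse actually uses.

```python
import re

# Hypothetical token spellings; pick any strings that never occur in real entry text.
TAB_TOKEN = "{TAB}"
CODEPOINT_TOKEN = re.compile(r"\{U\+([0-9A-Fa-f]{4,6})\}")


def expand_tokens(line):
    """Replace spreadsheet-safe tokens with the real tab / private use characters."""
    line = line.replace(TAB_TOKEN, "\t")
    # {U+EAB8} becomes the single character U+EAB8, and so on.
    return CODEPOINT_TOKEN.sub(lambda m: chr(int(m.group(1), 16)), line)
```

Running each line Excel produces through `expand_tokens` and saving the result as UTF-8 should give a file ready for flashcard import.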

What I'm concerned about though is how to create flashcards for this dictionary, and then update the dictionary in the future and keep the flashcard mapping. I looked through the schema and I don't see a unique identifier aside from rowid. Even if there was an obvious unique ID I'm not sure how I would go about mapping my raw data to refer back to a specific entry when I rebuild the database.

Is the flashcard to dictionary entry mapping supported with user dictionaries? Can we get some detail/advice about how to design a user dict with flashcards to support updates to the dictionary over time?
 

ipsi

状元
Without actually trying this, you can update the definitions and there'll be no risk, but with the present user dictionary entries that are linked to flashcards, you can't change the headword or the pronunciation without invalidating the flashcard(s) linking to it. What I mean by that is that it will delete the previous user dictionary entry when trying to modify the headword or the pronunciation. Changing the definition is ok, but that's it. I briefly talked about this with Mike in another thread. God only knows which one...

The row ID seems the most likely candidate for whatever is linking to it. You could create a flashcard pointing to it and see what comes up in the flashcard DB if you want to know for sure :). What that means is that adding entries is going to be really tricky, as that would offset all following rows, causing your flashcards to point towards invalid entries.

Best bet there would be to not sort them, and hope that they're added in the order they appear in the file :).

Just a few (very tired and hungry!) thoughts.
 

mikelove

皇帝
Staff member
Flashcard-to-user-dictionary mapping is not supported yet, in fact that's precisely why we have that "Unlock" restriction - flashcards right now just link to rowids. Basically, since we haven't actually designed the system for syncing handheld flashcard / dictionary data with desktops / web servers yet, we aren't sure what problems we're likely to encounter in doing so, and we don't want to commit to an editable dictionary entry ID system which we then have to abandon / find-an-elaborate-way-to-accommodate when it comes time to implement that sync system. So it's unlikely there'll be any official support for flashcards to link to edited / reimported user dictionary entries until the finished (non-"preview") desktop version of Pleco is available.

The database isn't sorted by rowid, it's sorted by whatever's in the sortkey field - if you'd like to impose your own sort order just populate all of the sortkeys for your entries with numbers and Pleco will sort the dictionary in that order. (it will use its own sortkey generator whenever you update an entry, though, so you should only do this if you're creating a dictionary you're not planning to edit on your handheld)
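A rough sketch of populating the sortkeys with numbers from the desktop is below. The table name `entries` is hypothetical (the thread does not name the table), and the sketch assumes `sortkey` is compared as text, which is why the numbers are zero-padded.

```python
import sqlite3


def assign_sortkeys(con, rowids_in_order):
    """Number entries in the given order (hypothetical table name `entries`)."""
    # Zero-pad so that text comparison gives the same order as numeric comparison.
    width = len(str(len(rowids_in_order)))
    for position, rowid in enumerate(rowids_in_order, start=1):
        con.execute(
            "UPDATE entries SET sortkey = ? WHERE rowid = ?",
            (str(position).zfill(width), rowid),
        )
    con.commit()
```

The list of rowids would come from whatever order is wanted in the source spreadsheet, for example Kangxi number.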

For right now, while I can't make any promises about future compatibility even with this, the best bet for keeping flashcards linking to a user dictionary which you're updating on your desktop would probably be to do as our software does and retire a rowid whenever the headword or pronunciation changes - consider each rowid to be permanently tied to a particular headword/pinyin combo and don't assign it again to any other combo. SQLite doesn't have any problem with there being "gaps" in rowids, it stores an auto-increment value in the database file and whenever you create a new record it assigns it that id and then increments it, so a deleted rowid won't appear again unless you specifically reassign it to another record. So there wouldn't be the issue ipsi mentions with entries getting shifted down.
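The "retired rowid" behaviour can be seen in a small standalone experiment. This sketch uses a made-up table declared with `INTEGER PRIMARY KEY AUTOINCREMENT`; whether Pleco's own tables are declared that way is not stated in the thread. Without `AUTOINCREMENT`, SQLite picks one more than the largest rowid currently in the table, so a deleted highest rowid could be handed out again.

```python
import sqlite3


def demo_retired_rowid():
    """Delete the newest row, insert another, and return the ids that remain."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    con.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, headword TEXT, pinyin TEXT)"
    )
    con.executemany(
        "INSERT INTO entries (headword, pinyin) VALUES (?, ?)",
        [("木", "mu4"), ("水", "shui3"), ("火", "huo3")],
    )
    con.execute("DELETE FROM entries WHERE id = 3")  # retire id 3
    con.execute("INSERT INTO entries (headword, pinyin) VALUES (?, ?)", ("土", "tu3"))
    ids = [row[0] for row in con.execute("SELECT id FROM entries ORDER BY id")]
    con.close()
    return ids
```

With the `AUTOINCREMENT` declaration the new row gets id 4, leaving a gap at 3, so nothing linked to the remaining rows shifts.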

Of course, since there's no way to specify row IDs in an import file (or view them in an export file) you'd have to make these changes directly on the database - if you're not comfortable doing all of the editing right in the file, you could dump it back to a spreadsheet, make your changes and then bring it back into the database with a series of UPDATEs / INSERTs (or "INSERT OR REPLACE" statements). So certainly not easy, but something you could probably make largely automated with a couple of regular expressions.
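As a sketch of the spreadsheet round trip, the function below reads tab-delimited rows that carry an explicit rowid and writes them back with "INSERT OR REPLACE". The table name `entries` and the columns `headword`, `pinyin` and `definition` are assumptions for illustration; the real schema is not given in the thread.

```python
import csv
import io
import sqlite3


def apply_spreadsheet(con, tsv_text):
    """Write rows of (rowid, headword, pinyin, definition) back into the table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for rowid, headword, pinyin, definition in reader:
        # An existing rowid is replaced in place; an unused rowid creates a new row.
        con.execute(
            "INSERT OR REPLACE INTO entries (rowid, headword, pinyin, definition) "
            "VALUES (?, ?, ?, ?)",
            (int(rowid), headword, pinyin, definition),
        )
    con.commit()
```

One caveat: REPLACE deletes the old row and inserts a fresh one, so any column not listed in the statement (a sortkey, for instance) falls back to its default and would need to be included or reapplied.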
 

ldolse

状元
If it's not supported yet that's fine, in the case that it was I just wanted to make sure I took advantage of anything you were doing there. My SQL knowledge has become rusty to the point of near non-existence, but I'll see if I can figure out how to use insert or replace functions, as it's easy enough to start tracking the current rowids and add a sortkey manually in excel.

I'm not planning to edit on my handheld, basically because there isn't any way to get that data back to the original source file on my desktop, so it's not even really an option. That will also prevent the issue ipsi mentions. That's the same reason I don't really want to edit the SQLite file directly on the desktop. I imagine I'll stick with the current approach until the proper desktop app is ready.

Regarding headword/pronunciation, is there some 'right' way to handle a situation where the headword doesn't have any pronunciation? I've got a number of components from Extension A that don't have any pronunciation that I can find. I first tried using 'none' in that field, but the importer tried to convert it to valid Pinyin and it became 'nong5ne'. Then I tried using an empty pronunciation field, but I didn't like that they then became the first entries in the dictionary, so I settled on 'N/A', which seems to work.

Glad to hear about the sort order, that was the other bit I was going to ask about. It seems to be defaulting to an alphabetical sort, which was definitely not what I wanted for the radicals - the sort I'm using during the import at the moment is frequency (easier for the flashcards), but I'd much rather the final sort was by Kangxi#. When/how was the sort order imposed during the import? When I opened the file in the SQLite editor, I believe I saw that the rowids themselves were also alphabetically sorted.

Not sure what the new MakePleco will look like when it comes out, but if it has some minimal support to specify the sortkey and possibly the rowid in a tab delimited file that would be great.
 

mikelove

皇帝
Staff member
OK. Definitely one of the biggest arguments for adding desktop sync.

There's not really a better way to handle entries with blank pronunciation, though I suppose there's an argument for making them come up last rather than first (wouldn't be too difficult to change this in a future update, we'd just re-generate all of the sortkeys). If you put a full-width Unicode tilde character (FF5E) in that field, that should get it to sort after all of the entries with Pinyin, as would a half-width symbol character like FFED/EE, though there's not a good way to enter either of those using Pleco. (a Unicode character-input palette like Word's Insert Character screen would be an excellent addition in 2.1)
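How Pleco's sortkey generator actually orders strings is not described here, so the following only illustrates plain Unicode code point order, under which an empty string sorts first and U+FF5E sorts after any Latin-letter Pinyin.

```python
FULLWIDTH_TILDE = "\uFF5E"  # the character suggested above for blank pronunciations


def codepoint_sorted(pronunciations):
    """Sort strings by code point, which is how Python compares str values."""
    return sorted(pronunciations)
```

For example, `codepoint_sorted(["zuo4", "", FULLWIDTH_TILDE, "a1"])` puts the empty string first and the tilde last.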
 

ldolse

状元
mikelove said:
There's not really a better way to handle entries with blank pronunciation, though I suppose there's an argument for making them come up last rather than first (wouldn't be too difficult to change this in a future update, we'd just re-generate all of the sortkeys). If you put a full-width Unicode tilde character (FF5E) in that field, that should get it to sort after all of the entries with Pinyin, as would a half-width symbol character like FFED/EE, though there's not a good way to enter either of those using Pleco. (a Unicode character-input palette like Word's Insert Character screen would be an excellent addition in 2.1)
I wouldn't worry about it much regardless. I may go with FFED, but it's not a big deal. If I can get the sortkey working it becomes moot. Unicode input on Pleco itself would be kind of cool, but there are a lot of other things I'd rather see first, and it'll come for free whenever you get the desktop version up and running. Being able to set boldface directly within Pleco would be really sweet though.


Copied from the features thread:
mikelove said:
EAB8/BB for copy-whatever's-in-this-to-the-Input-Field hyperlinks
I haven't tried this yet, but was thinking about it. I noticed in ABC that even though the hyperlink is just Pinyin, it always goes to the correct entry with that specific Pinyin. Is there anything special I need to do to make sure the hyperlink goes to the right entry?
 

ipsi

状元
ABC has superscript numbers to make each pinyin entry unique. Not sure if superscript numbers will work with user dictionaries, but if they do then just add one to each ambiguous pinyin entry so that it becomes unambiguous :)

EDIT: I believe the superscript numbers fall in the standard Unicode range for such things, though I couldn't tell you what that is off the top of my head.
 

ldolse

状元
Cool, I suspected it was the superscript numbers, but I wasn't certain. I'll try that out. I'm pretty sure they're standard unicode as well.
 

mikelove

皇帝
Staff member
Yep, superscripts are standard Unicode, though they're split into two blocks (1/2/3 are in 00-FF while 0 and 4-9 are in the 2000s somewhere). They should work correctly with user dictionaries, though it's another thing I can't promise we'll support later - the current implementation just checks all of the search results for a matching superscript and jumps to it if found so it doesn't really care whether the results are coming from a fixed or a user dictionary.
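For reference, the two blocks are: superscript 1, 2 and 3 at U+00B9, U+00B2 and U+00B3, and superscript 0 and 4 through 9 at U+2070 and U+2074 through U+2079. A small helper for generating them when building an import file might look like this (the function name is just for illustration):

```python
# Superscript digits live in two Unicode blocks: 1/2/3 in Latin-1 Supplement,
# 0 and 4-9 in Superscripts and Subscripts.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079",
)


def to_superscript(number):
    """Return the superscript form of a non-negative integer, e.g. 2 -> '²'."""
    return str(number).translate(SUPERSCRIPT_DIGITS)
```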
 
OK, I have 2 different SQLite managers, but cannot figure out how to "open" the file from Pleco. I assumed that it would be one of the files that are hotsynced.

Can someone give me the steps, please?
 

ldolse

状元
A little more information on what you're trying to do would be helpful. Since there isn't a 'MakePleco' application ready for 2.0 I'm using the following steps:

Back up my 'real' flashcard DB and User dictionary to a separate location (in My Documents on WinMo, don't know Palm)
Delete those files from the original location after backing them up.
Import a flashcard file with the option to create user dictionary entries.
Copy the newly created user dictionary file to your desktop PC.
Open that with the SQLite manager and edit it.
Give the file a new name (same naming convention as the other dicts), and you should then be able to stick it in the main program directory. Pleco should automatically find it.

Once Pleco has created the basic table structure you should be able to run basic SQL commands or more using the manager, depending on what manager you're using. Inserting new entries should be ok as well, but I haven't tested it to be sure. I'm using Mike T's SQLite Database App on Mac OS X, which seems to have all the features I need, though I won't get to the point of significantly manipulating the DB until I finish some research on the raw data.
 
I'm not trying to do much of anything except first see the data. I know SQL pretty well, however, and I thought it might be interesting to put together an automated web page to create user dictionaries.

However, I am using PALM, so the user dictionary is a PDB file. For example, my user dictionary (I think) is named "PlecoCUser48C7D37D.pdb". That's why I wanted to know specific steps (names, etc).

I haven't found any file that opens with the SQL managers yet...
 

ldolse

状元
Hmmm... on Windows Mobile the extension is .pqb; the full name is PlecoCUser490E96E5.pqb, and the trailing alphanumeric bit is randomly generated when the file is created. On the Mac app I mentioned, all I did was select 'connect to database' and point it at the file, and I was immediately staring at my data. One would think it would be the same on Palm, but I really don't know how it works.
 
OK, your file name verified that I had the right file. All I had to do was strip off the pdb header block. (I've done a lot of programming for Palm in past years.) It only needs a binary/hex editor to do that. Of course, putting a new header back on is a little more tricky, but not too much...

Another good reason to switch to a Windows Mobile platform, I guess.
 

mikelove

皇帝
Staff member
PlecoMover can put the header on the file for you if you stick it on your SD card and then copy it to internal memory. If you want to do the conversion yourself, each record needs to be exactly 4k (except for the last one, of course) - the other record header information shouldn't make any difference, we just divide whatever location SQLite wants data from by 4096 and pull it from the record at that index.
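For anyone who wants to pull the SQLite data out of a PDB without a hex editor, here is a minimal sketch. It assumes the commonly documented Palm PDB layout (a 78-byte database header with the record count in its last two bytes, followed by one 8-byte entry per record whose first four bytes are that record's offset in the file, all big-endian) and that concatenating the records in order reproduces the SQLite file, as described above. It has not been tested against a real Pleco file.

```python
import struct

PDB_HEADER_SIZE = 78   # fixed-size database header (assumed standard PDB layout)
RECORD_ENTRY_SIZE = 8  # per record: 4-byte offset, 1 attribute byte, 3-byte unique ID
RECORD_SIZE = 4096     # each record holds 4k of the SQLite file, except the last


def pdb_to_sqlite_bytes(pdb):
    """Concatenate the record data of a PDB image, dropping header and record list."""
    (num_records,) = struct.unpack_from(">H", pdb, PDB_HEADER_SIZE - 2)
    offsets = [
        struct.unpack_from(">I", pdb, PDB_HEADER_SIZE + i * RECORD_ENTRY_SIZE)[0]
        for i in range(num_records)
    ]
    offsets.append(len(pdb))  # the last record runs to the end of the file
    return b"".join(pdb[offsets[i]:offsets[i + 1]] for i in range(num_records))


def record_index(byte_offset):
    """Record that holds a given SQLite file offset (offset divided by 4096)."""
    return byte_offset // RECORD_SIZE
```

Going the other way (PQB to PDB) means splitting the file into 4096-byte records and writing a header and record list in front, which PlecoMover already does.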
 

ldolse

状元
Ok, New problem relating to this.

I got the dictionary created, entries are fine. I wrote up some SQL to replace the sortkey, and you're right, it will be easy to write some simple regexes to create the SQL statements for updating entries on the desktop in the future, so things look pretty good there.

My new problem came down to the last finalizations to the dictionary. In the table pleco_dict_properties there are fields for DictIconName, DictMenuName, DictShortName, and DictName. I updated these fields to say what I wanted for this dict, then I changed the filename to Radical_Dict.pqb and placed it in the Pleco Program directory alongside the other dictionaries. When I relaunched Pleco none of these values were changed in the dictionary interface except Menu name. In the drop down menu it did update the title and called it 'Radicals'. Otherwise, in 'Manage Dicts' it was still called User C-E and the icon still said USR instead of what I'd updated in the database.

Beyond that, with everything I'd changed I'd expect Pleco to create a new 'real' User C-E dictionary if I add new entries. This didn't happen though. Attempting to add a new custom entry caused Pleco to put it in the dictionary I'd just created, which I don't want.

It seems to me like Pleco still views this as the exact same file as the original User Dict, and seems to be referencing some other settings, not sure if that's in the registry. Is it referencing the FileID in that same table to identify it as the same file? Is there a safe way to change this so it views it as a separate file?

I also saw there is an EditLock field - can I use this to prevent the file from being used when someone adds a new entry? If so let me know what the value needs to be.

Lastly, if I distribute this file, how can I distribute it in a way that both Palm and WinMo users can use? Based on StephanHodges' post, it seems the Palm user dictionary format is slightly different from the PC one, with a pdb extension and a slightly different binary header....
 

mikelove

皇帝
Staff member
Setting EditLock to anything other than 0 should prevent the dictionary from being used when someone creates a new entry - might be a specific value needed in a later version but for now it just checks to see if it's nonzero.
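A sketch of setting the lock from a script follows. It assumes `pleco_dict_properties` holds a single row with `EditLock` as one of its columns, which matches how the table is described earlier in the thread but has not been checked against a real file.

```python
import sqlite3


def lock_dictionary(con, lock_value=1):
    """Set EditLock to a nonzero value so new entries are not added to this file."""
    if lock_value == 0:
        raise ValueError("EditLock must be nonzero to lock the dictionary")
    # Assumed layout: one row in pleco_dict_properties with an EditLock column.
    con.execute("UPDATE pleco_dict_properties SET EditLock = ?", (lock_value,))
    con.commit()
```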

The icon / name / etc are cached in Pleco's preferences file (mainly so that the dictionary sort order can be retained even if some dictionaries temporarily go missing, e.g. due to a removed SD card) - you can fix those with the Reset Prefs to Defaults command in the Misc panel of Preferences. (they'll show up correctly right away on anyone else's system after they install it) The Reset button in Manage Dicts resets the icon / abbreviation, but doesn't reset the stored name since we weren't planning on people changing those; Reset to Defaults however resets everything.

For Palm users, anyone with a Palm and a memory card reader can turn the PQB into a PDB using PlecoMover; if you post the file here I imagine someone will chime in with a PDB version, or just e-mail it to me and I'll do it.
 
Also, if you want, we can host those dictionary files on http://china.panlogicsoftware.com

I will be converting all of that to be under the wiki, which will make it easier to upload files. Right now, the wiki is at http://china.panlogicsoftware.com/pleco if I recall correctly, and you could just put it up there for now, if you like.

I'm only going to rebase the wiki, so that it is at the base address (above)... when I find time. Had a server problem yesterday resulting in about 5 hours of "wasted" / unplanned time, so it will have to wait at least a week to move the wiki...

Stephan
 

ldolse

状元
Thanks Mike, that all worked, dictionary is now uploaded. Still ran into a couple minor snags:
  • Couldn't change the icon color no matter what I did to that particular field in the DB, tried several numbers
  • There is a bug with flashcard export and that dictionary - there are 321 flashcards, but when I exported as XML it only exported 220

@Stephanhodges - I put together a page to post the files up to your site, but I wasn't able to post the files themselves, you'll need to update your site so PQB files are supported.
 
ldolse: OK, I have added types for PRC, PDB, and PQB. Any other types (Windows Mobile) for Pleco that I should add? Regular text files should be zipped (or put in another archive format) before uploading, due to security concerns.

I also sent you an email at the address you registered for the wiki.
 