Dictionary: Flexible pinyin search for multi-word strings

Shun

状元
Hello Mike,

As of right now, when you enter multi-word pinyin in Dictionary search, there needs to be a dictionary entry that matches the whole pinyin string. If a user puts whitespace between two or more pinyin words, would it not be useful if Pleco also searched for those words individually and listed the 3-5 most common word/expression matches for each? It already does this for Hanzi strings, of course, where the user input is more specific.

Combinations of Hanzi and pinyin are also currently not possible for multiple words in the 4.0 beta.

When watching Chinese TV, for example, I would be quite happy if Pleco helped me with combining longer expressions which were clearly understood phonetically.

Have you already considered such an addition to the core Dictionary functionality? Thanks!

Shun
 

mikelove

皇帝
Staff member
Honestly with pinyin I think this would be too messy - too many homonyms, particularly since any Chinese phrase of any length is going to have some single characters thrown in (and since people hate entering tones after their pinyin). There would be way too many cases where the character you were looking for did not appear among the top 5. Also, the spaces themselves would be somewhat unreliable, since you don't really know from listening to spoken Chinese where they should go.

If we wanted to support something like this I think we'd need to apply some intelligence to it, either the tiny local AI model we're probably going to have to roll out at some point or some other system to come up with a few linguistically plausible phrases that match the blob of pinyin you entered; just treating it like we do character searches would not deliver a satisfactory experience.

The good news is that smarter people than me have already developed systems like that for use in Chinese IMEs, so if you experiment with a couple of those I suspect you'll find one that does a decent job of turning the expression you heard into characters to search in Pleco.
 

Shun

状元
Thanks for the thoughtful answer. Such a feature, with or without AI, would certainly need to be used with some caution and be an off-by-default option (which understandably you'd like to have fewer of). But the way it works today, I think, breaks the flow of using a dictionary with pinyin even more, since you can enter only one pinyin word at a time, and you have to keep in your short-term memory what other words followed it. At present, you're almost required to have a piece of note paper next to Pleco and then feed the words to the dictionaries one at a time afterwards. It also isn't possible to see multiple definitions close together on the screen when using pinyin. That will surely improve a lot with the desktop version with windowing, but for the smartphone/tablet version, and for the efficiency-minded, simultaneous display would probably outweigh the disadvantages you have mentioned. The better your Chinese 听力 (listening comprehension) gets, and the bigger your 词汇 (vocabulary) already is, the more useful such a feature would probably be, because then you would get the word boundaries and even the tones right almost all of the time.

What may be another great option for me would be commas between pinyin words in the search bar. Then Pleco could simply spit out definitions for all of those at once, and it could allow you to switch between them using tabs or somesuch. Have you perhaps already weighed going in that kind of direction?

I will definitely have another look at the newest smart IMEs. And thanks for the like, @jurgen85! Let's share with one another when we have found a promising new IME. (preferably for iOS, but Android would work, too)

Cheers,

Shun
 

mikelove

皇帝
Staff member
I can see the basic logic of 'let me aggregate a bunch of pinyin searches in the search box', I just think it has to be a more explicit action - here are some pinyin words, look up all of them - rather than a fallback like it is with breakdown searches, because unlike with those, the odds of it actually producing a coherent, accurate string of translations are much lower. So maybe something like your comma suggestion, but probably with a more obscure character than a comma, since people use those a lot in pinyin transcriptions of Chengyu.

Another issue I have with this is that you can't actually fit much pinyin in the search bar right now, and it's awkward/annoying to have a search text that's wider than the screen. An idea I've been thinking about for a while now is some sort of separate "sentence composer" search screen designed for bulk text input - input field that expands to multiple lines, easy controls for navigating around it, easy editing of word breaks / selection of which specific reading you want for a particular character, ability to save/retrieve your previous sentence breakdowns, that kind of thing; could maybe also hook this into the regular reader so you could go through a document sentence by sentence, with text immediately before/after visible but grayed out and this breakdown UI taking up most of the screen. But this would take a LOT of work to design / refine to a satisfying workflow so probably more of a 5.0 thing at this point.

Another idea would be that when you enter a string of pinyin that doesn't have a match, we would show you every result that matched the start of it - going from longest to shortest - and you could select one of those, at which point we would 'lock in' that result, replace that chunk of the pinyin in the search with those characters, and then show you a prompt for the next thing. Basically the same idea as a pinyin IME but with definitions.
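
For illustration only, here is a minimal sketch of that "longest prefix first, then lock in" flow. It is not Pleco's implementation: the toy lexicon, the function names `prefix_matches` and `lock_in`, and the use of toneless pinyin keys are all assumptions made for the example.

```python
# Hypothetical toy lexicon (an assumption for this sketch, not real Pleco data):
# toneless pinyin -> candidate words.
TOY_LEXICON = {
    "ni": ["你"],
    "nihao": ["你好"],
    "hao": ["好"],
    "ma": ["吗", "马"],
}


def prefix_matches(pinyin, lexicon):
    """Return (prefix, words) pairs for every lexicon key that matches
    the start of the input, ordered from longest prefix to shortest."""
    matches = []
    for end in range(len(pinyin), 0, -1):
        prefix = pinyin[:end]
        if prefix in lexicon:
            matches.append((prefix, lexicon[prefix]))
    return matches


def lock_in(pinyin, prefix, word):
    """'Lock in' a chosen word: return the characters that replace the
    matched prefix, plus the pinyin that is still left to resolve."""
    if not pinyin.startswith(prefix):
        raise ValueError("prefix does not match the start of the input")
    return word, pinyin[len(prefix):]


# Example: the user typed "nihaoma", picks the longest match, then continues.
candidates = prefix_matches("nihaoma", TOY_LEXICON)
locked, remaining = lock_in("nihaoma", "nihao", "你好")
next_candidates = prefix_matches(remaining, TOY_LEXICON)
```

In this sketch the user would first be offered 你好 and then 你, pick one, and then be prompted with matches for whatever pinyin remains; a real version would also need tone handling and frequency-based ordering.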
 

Shun

状元
Great ideas, yes, I also thought along the lines of having a little smart editor. Before that happens, perhaps one could make the input box growable (first one line high, then dynamically growing to two or three lines) and add something like a "|" button to the keyboard which could separate multiple pinyin searches.

I also think the idea in the last paragraph would be great. One could tap on the right Hanzi from a choice of Hanzi, like you would in an IME.

Thanks a lot!

Shun
 

mikelove

皇帝
Staff member
Thinking more on the comma option, one quick version of this that would be doable even in 4.0 - and without an off-by-default option - would be that if a search query starts with a /, we treat each subsequent / as the start of a new search and aggregate all of those. So "/nin2/叫/shen么/ㄇㄧㄥㄗ" would get you 您+叫+什么+名字. Would that make sense?
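
As a rough sketch of how such a query could be split (again just an illustration, not the actual Pleco code): the function name `split_aggregate_query` is hypothetical, and dropping empty segments such as the one produced by "//" is an assumption, since the post does not say how those would be handled.

```python
def split_aggregate_query(query):
    """Split a '/'-prefixed query into its individual sub-searches.

    Returns None when the query does not start with '/', meaning it
    should be handled as an ordinary single search. Empty segments
    (e.g. from '//' or a trailing '/') are dropped; that behavior is
    an assumption of this sketch.
    """
    if not query.startswith("/"):
        return None
    return [part for part in query[1:].split("/") if part]


# The example from the post: mixed pinyin, Hanzi, and zhuyin segments.
parts = split_aggregate_query("/nin2/叫/shen么/ㄇㄧㄥㄗ")
```

Each returned segment would then be run as its own search and the results aggregated into one list; a query without the leading / falls through to the normal search path.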
 

mikelove

皇帝
Staff member
Needs some tweaking, but:
1768509567515.png
 

mikelove

皇帝
Staff member
Also realized that I could do a version of my pinyin-to-characters idea through the context menu, so that's in too, with a new "replace ..." command if the characters for the selected word don't match the current text in the search box for that section:

1768512286992.png
1768512298561.png
 

Shun

状元
Amazing, thank you so much! That will be great!

For me, it wouldn't be a problem if Pleco displayed more than one or two matches for each pinyin word entered; the main thing is that they're all in the same list. But I'll try it out first with the next beta.

Shun
 

mikelove

皇帝
Staff member
I found it helpful in testing myself, but will wait to see other reactions - if it’s really popular we could even add buttons to the list cells to make it easier (though then we also have to add undo support for stray taps).
 