Reader feature requests — fixed vocabulary area and underlining/highlighting long phrases

psucom · Nov 12, 2021

I would like to propose having a dedicated vocabulary area in the Pleco reader (particularly with the spacious iPad) to complement TTS. As TTS reads the text, the more advanced vocabulary words (HSK level filters specified by the user) could be placed in the vocabulary zone (and remain there till queued out). A complementary setting option could also permit such identified advanced vocabulary words/phrases to be specially highlighted throughout the text. The current pop-up method simply goes by too quickly and is often unnecessary for a large majority of the words. The proposed feature permits the popups for the advanced words to linger for a while till they are pushed out of the queue with new words. The feature would also be useful outside of TTS by providing a short-term history of recently looked-up words immediately available on the screen.
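To make the queue behaviour concrete, here is a minimal Python sketch of the idea. It is only an illustration: the `HSK_LEVEL` table, the `VocabQueue` class, and the `min_level` and `capacity` parameters are all made up for this example and say nothing about how Pleco works internally.

```python
from collections import deque

# Hypothetical HSK lookup table. Words missing from it are treated as
# beyond-HSK (level 7) for the purposes of this sketch.
HSK_LEVEL = {"我": 1, "喜欢": 1, "研究": 4, "哲学": 5}


class VocabQueue:
    """Fixed-size list of recently encountered advanced words."""

    def __init__(self, min_level, capacity):
        self.min_level = min_level
        # A deque with maxlen drops its oldest entry once it is full.
        self.words = deque(maxlen=capacity)

    def on_word_spoken(self, word):
        """Call as TTS reaches each word; keeps only advanced words."""
        level = HSK_LEVEL.get(word, 7)
        if level >= self.min_level and word not in self.words:
            self.words.append(word)

    def visible(self):
        """Words currently shown in the vocabulary area, oldest first."""
        return list(self.words)


queue = VocabQueue(min_level=4, capacity=2)
for w in ["我", "喜欢", "研究", "哲学", "辩证法"]:
    queue.on_word_spoken(w)
print(queue.visible())  # ['哲学', '辩证法'] ('研究' has been pushed out)
```

The same structure would serve the non-TTS case: calling `on_word_spoken` on every manual lookup gives the short-term history of recently looked-up words.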

Another feature request is to provide an option to underline/highlight proper nouns and/or long phrases. It’s not clear if proper nouns are well-defined in Pleco, but identifying long phrases is certainly not an issue.
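As a sketch of why long phrases are easy to identify: once the text is split into dictionary entries, a long phrase is simply an entry above some character count. The example below assumes a greedy forward maximum-matching segmenter, a common dictionary-based technique; the thread does not say how Pleco actually segments text, and the `lexicon`, `max_len`, and `min_chars` names are hypothetical.

```python
def segment(text, lexicon, max_len=8):
    """Greedy forward maximum matching: at each position take the longest
    dictionary entry, falling back to a single character."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                out.append(piece)
                i += n
                break
    return out


def long_phrases(text, lexicon, min_chars=4):
    """Segments long enough to be worth underlining or highlighting."""
    return [seg for seg in segment(text, lexicon) if len(seg) >= min_chars]


# Toy lexicon for illustration only.
lexicon = {"他", "说", "这", "是", "画", "蛇", "画蛇添足"}
print(long_phrases("他说这是画蛇添足", lexicon))  # ['画蛇添足']
```

Proper nouns are a different matter, since they are often absent from the dictionary and cannot be found by matching alone.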

Shun · Nov 12, 2021

I'd love that, too, even if it's still important to study the words afterwards, so I would also suggest a "+" button next to each word to add it to flashcards. It would also be nice if Pleco showed the first 4-5 words of the definition to the right of each headword. The words could move up smoothly, and there could even be a pleasant fade-out effect for words that are about to disappear at the top.

I'm under the impression though that Pleco has so far been a very neutral, functional app. Once Mike decides to add a number of other high-level "comfort" features such as including a much-simplified, wizard-style flashcard mode for absolute beginners, your proposed reader feature may fit into the picture perfectly, perhaps even as an Add-on.

mikelove · Nov 12, 2021

psucom said:
I would like to propose having a dedicated vocabulary area in the Pleco reader (particularly with the spacious iPad) to complement TTS. As TTS reads the text, the more advanced vocabulary words (HSK level filters specified by the user) could be placed in the vocabulary zone (and remain there till queued out).

We've added a feature in 4.0 that's something like a concordance - lists all of the words in the document. This could be narrowed to the current page, I suppose, or last selected paragraph, or something along those lines - what sort of unit of text would be most useful here?

The queue that you describe is an interesting idea, but I'm kind of waiting until we have a better sense of where things are going with iPad (is everyone going to just buy a tiny M1 Mac and call it a day?) - and with tablets in general, with Google now pushing this big new Android tablet revamp for next year - before investing a whole lot more time in adding new iPad-specific features beyond the ones we've already added in 4.0 (mouseover, key control, etc).

psucom said:
A complementary setting option could also permit such identified advanced vocabulary words/phrases to be specially highlighted throughout the text.

This is also added for 4.0, you can pick a bunch of categories to be highlighted with a set color or their tag colors.

psucom said:
Another feature request is to provide an option to underline/highlight proper nouns and/or long phrases. It’s not clear if proper nouns are well-defined in Pleco, but identifying long phrases is certainly not an issue.

That one I'm less clear on - we don't currently define proper nouns, doing that requires a bunch of ML stuff on the back end and our main goal for reader text analysis in 4.0 was simply having it be reasonably accurate + very fast. So we could consider working in a slower algorithm in the future that would do that but I don't know if most people would find the speed tradeoffs worth it.

Shun · Nov 12, 2021

Hello Mike,

I'm sorry about the second-guessing. I'll try to avoid it in the future. The very best thing for me would be to make Pleco fully scriptable, allowing external programmers/hobbyists to add nice features, customizations and UI's on top of it, making Pleco a "Chinese platform". Maybe in 5-10 years, this might become a reality. Have you ever thought of making Pleco a platform?

Regards, Shun

mikelove · Nov 12, 2021

Shun said:
I'm sorry about the second-guessing. I'll try to avoid it in the future. The very best thing for me would be to make Pleco fully scriptable, allowing external programmers/hobbyists to add nice features, customizations and UI's on top of it, making Pleco a "Chinese platform". Maybe in 5-10 years, this might become a reality. Have you ever thought of making Pleco a platform?

We've added a *ton* of customization in 4.0, and can further build that in specific areas based on what people ask for, but building a platform like you describe is really, really difficult in these days of locked-down modern mobile OSes; it's not that hard for us to let someone develop their own flashcard algorithm or scoring system, but the sort of extension system that would make it feasible for somebody to build something like this reader word queue would be almost impossible.

Maybe in a few years, if Apple has to support sideloading and Google has to support easy sideloading and we can release a version of Pleco that's not subject to disingenuous fearmongering about the dangers of unapproved code, we could consider moving Pleco in a direction like that, but in the present mobile market it's not something we can do.

Shun · Nov 12, 2021

Thanks a lot for the explanation regarding locked-down, sandboxed software, which I didn't think of.

psucom · Nov 12, 2021

mikelove said:
We've added a feature in 4.0 that's something like a concordance - lists all of the words in the document. This could be narrowed to the current page, I suppose, or last selected paragraph, or something along those lines - what sort of unit of text would be most useful here?

Some of the Pleco ebooks already have a similar feature, for example in the graded reader series, in which certain advanced vocabulary words/phrases are highlighted.

Purple Culture has a convenient tool that, after parsing the provided text, will extract the vocabulary words according to user-defined filters (HSK levels). This sounds similar to what you are describing with the concordance but with a filtering option. In addition to filtering based on HSK, filtering based on "not being in specified flashcard lists" or filtering based on some measure of frequency (should such data be available) would also be nice.
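A minimal Python sketch of the three filters mentioned here (HSK level, flashcard membership, frequency) might look like the following. The function name, its parameters, and the sample data are all hypothetical, and it does not reflect how either Pleco or the Purple Culture tool is implemented.

```python
def extract_vocab(words, hsk, min_level, known=(), rank=None, min_rank=0):
    """Return the unique words worth listing, in order of first appearance.

    hsk      -- word -> HSK level (words missing from it count as level 7)
    known    -- words already in the user's flashcard lists
    rank     -- word -> frequency rank, 1 = most frequent (optional)
    min_rank -- drop words that are more frequent than this rank
    """
    known = set(known)
    rank = rank or {}
    seen, result = set(), []
    for w in words:
        if w in seen or w in known:
            continue
        seen.add(w)
        if hsk.get(w, 7) < min_level:
            continue
        # Words without frequency data are treated as rare and kept.
        if rank.get(w, float("inf")) < min_rank:
            continue
        result.append(w)
    return result


words = ["研究", "哲学", "研究", "辩证法", "的"]
hsk = {"的": 1, "研究": 4, "哲学": 5}
rank = {"的": 1, "研究": 300, "辩证法": 9000}
print(extract_vocab(words, hsk, min_level=4,
                    known={"哲学"}, rank=rank, min_rank=500))  # ['辩证法']
```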

mikelove said:
The queue that you describe is an interesting idea, but I'm kind of waiting until we have a better sense of where things are going with iPad (is everyone going to just buy a tiny M1 Mac and call it a day?) - and with tablets in general, with Google now pushing this big new Android tablet revamp for next year - before investing a whole lot more time in adding new iPad-specific features beyond the ones we've already added in 4.0 (mouseover, key control, etc).

That one I'm less clear on - we don't currently define proper nouns, doing that requires a bunch of ML stuff on the back end and our main goal for reader text analysis in 4.0 was simply having it be reasonably accurate + very fast. So we could consider working in a slower algorithm in the future that would do that but I don't know if most people would find the speed tradeoffs worth it.

I recognize that a dedicated pop-up vocabulary area is a bit of a jump from the current implementation of Pleco's reader functionality. Here's a YouTube video that somewhat depicts the idea and usage of this feature:

This is the kind of use case I had in mind. For example, after pasting in a chapter of a novel, the user could play the TTS and easily glance over at the vocabulary section when unfamiliar words come up. This feature does not necessarily need a lot of space or be limited to devices with larger screens, though serious learners would likely gravitate to the larger-screen devices.

Pleco is certainly fast. There is some room to improve accuracy with ML processing, but it's understandable that there's always the potential of "opening a can of worms". Still, improved parsing of the text as an option, even if there's a small delay at the start, would be nice. In addition to the improved parsing, the ability to add spaces, as another user recently commented on, would be nice. This spacing feature, for example, is nicely implemented in "The Chairman's Bao" app.

mikelove said:
That one I'm less clear on - we don't currently define proper nouns, doing that requires a bunch of ML stuff on the back end and our main goal for reader text analysis in 4.0 was simply having it be reasonably accurate + very fast. So we could consider working in a slower algorithm in the future that would do that but I don't know if most people would find the speed tradeoffs worth it.

Underlining is likewise a jump from the current implementation of Pleco's reader functionality. It can be particularly helpful when there are many proper nouns in the text, which often happens when reading text that has been translated into Chinese. Here's a wiki link to some brief text on "Underlines in Chinese": https://en.wikipedia.org/wiki/Underscore#Underlines_in_Chinese

If such an underlining feature could be implemented, it could be extended to easily spot the more advanced vocabulary words. This vocabulary underlining feature, for example, is implemented in the "Du Chinese" application, though that application does not provide any options for customization.

mikelove · Nov 12, 2021

psucom said:
Purple Culture has a convenient tool that, after parsing the provided text, will extract the vocabulary words according to user-defined filters (HSK levels). This sounds similar to what you are describing with the concordance but with a filtering option. In addition to filtering based on HSK, filtering based on "not being in specified flashcard lists" or filtering based on some measure of frequency (should such data be available) would also be nice.

This is already supported in our current implementation of this feature for 4.0; you can filter the list however you like.

psucom said:
This is the kind of use case I had in mind. For example, after pasting in a chapter of a novel, the user could play the TTS and easily glance over at the vocabulary section when unfamiliar words come up. This feature does not necessarily need a lot of space or be limited to devices with larger screens, though serious learners would likely gravitate to the larger-screen devices.

This seems like it's somewhat reliant on having short, curated definitions available for those tricky words - the flow doesn't really work as well if you're doing it with an arbitrary document where you might have to sift through half a dozen different senses of a word to find the relevant one. So it's more the sort of thing I'd consider in paid graded readers, or documents authored in our new custom Markdown dialect; something where the document author is providing the definition rather than the app doing it automatically.

We have already added an option to disable reader lookups for common / known words; if you combine that with the 'lock to bottom' feature, you could effectively make the last definition stick around until another supported one came up, and if it came by too fast you could pause and quickly go back to the previous one since it would skip all of the common words in between. Would that help matters any?
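The "go back to the previous one" step can be pictured with a short Python sketch. This is a hypothetical illustration of skipping common words when stepping backwards, not Pleco's actual code; `previous_lookup`, `tokens`, and `common` are invented names.

```python
def previous_lookup(tokens, position, common):
    """Index of the nearest token before `position` that still gets a
    lookup (i.e. is not in the common/known set), or None if there is none."""
    for i in range(position - 1, -1, -1):
        if tokens[i] not in common:
            return i
    return None


tokens = ["他", "研究", "的", "是", "哲学"]
common = {"他", "的", "是"}
# Stepping back from '哲学' jumps straight over '是' and '的' to '研究'.
print(previous_lookup(tokens, 4, common))  # 1
```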

psucom said:
If such an underlining feature could be implemented, it could be extended to easily spot the more advanced vocabulary words. This vocabulary underlining feature, for example, is implemented in the "Du Chinese" application, though that application does not provide any options for customization.

Again, this is something that's a lot easier when you have a pre-made text (as they do); intelligently applying proper noun underlines to an arbitrary document is a whole lot harder than manually underlining proper nouns in a document you've specially prepared.

psucom · Nov 12, 2021

mikelove said:
We have already added an option to disable reader lookups for common / known words; if you combine that with the 'lock to bottom' feature, you could effectively make the last definition stick around until another supported one came up, and if it came by too fast you could pause and quickly go back to the previous one since it would skip all of the common words in between. Would that help matters any?

This sounds like a great option! However, I've looked high and low in the iOS settings and could not find where to "disable reader lookups for common/known words". Google searching didn't help either. Any help on how to find it is appreciated!

mikelove · Nov 12, 2021

Sorry, "already added" -> added for 4.0.
