Pleco Audio in Phrase / Sentence / Paragraph / Document

davidy · May 9, 2010

Can Pleco be made to optionally read out a phrase at a time, or a sentence or paragraph at a time, or even to read through the whole clipboard or document? Could the cursor optionally select single characters, single and multigrams, phrases, or sentences for the reading?

This would be great for learning to read out loud. I would read a phrase or sentence, then right-arrow to that phrase or sentence (as optionally set) and listen to how it is pronounced. I could keep replaying the audio, or move on to read the next phrase or sentence, right-arrow to it for a listen after my attempt, and so on.

The option for Pleco to do complete text to speech for the document or clipboard would be for when I just want to listen.

Thanks.

mikelove · May 10, 2010

Not really; our audio pronunciation system is relatively limited, it's great at pronouncing words (particularly words it has an exact recording for) but it doesn't have the logic needed to read out entire sentences correctly. All you'd get would be a long string of syllables and words without any sentence-length intonation / correct pauses / etc, something almost impossible to follow / understand for more than a few words.

We certainly might consider licensing / introducing the technology for full-fledged Chinese text-to-speech in the future, but to be honest, assuming we did license rather than develop it (developing a system that works well may be beyond our capabilities) it's unlikely we'd go to the expense of doing so on Windows Mobile. Another thing that has a better chance of making it to WM, though, would be some sort of module that allowed you to download native-speaker recordings for particular pieces of text and follow along - better still if we paired it with a document format that matched up timecodes in the audio with characters in the text, so that you could start playback from a particular point and could follow along with some sort of onscreen cursor.

davidy · May 10, 2010

mikelove said:
Another thing that has a better chance of making it to WM, though, would be some sort of module that allowed you to download native-speaker recordings for particular pieces of text and follow along - better still if we paired it with a document format that matched up timecodes in the audio with characters in the text, so that you could start playback from a particular point and could follow along with some sort of onscreen cursor.

That is exactly what I am after.
The size of the cursor could optionally be set to select:
- words, i.e. single characters / bigrams / trigrams / more
- phrases, i.e. up to commas / semicolons / colons
- sentences, i.e. up to full stops / exclamation marks / question marks

I am currently pairing my Nokia as an MP3 player, with text on my WM manually. I read a phrase, then unpause the Nokia to listen to the phrase then pause, read another phrase, unpause etc. Sometimes I rewind to relisten to a phrase. I check the PinYin for more accuracy occasionally.

I find it has helped me read aloud a lot better and hopefully I will be able to speak PuTongHua better consequently.

Would something like that be hard to implement?

mikelove · May 10, 2010

The implementation wouldn't be difficult aside from word segmentation - sizing the cursor around words would be a bit challenging - but creating the data files would be; coding a document + audio file to precisely link each character to a point in the audio would be rather time-consuming, as would making the original recordings, and we'd need to build up a pretty good base of these for them to be usable. Perhaps if we made the file format nice and open and standardized and released a free desktop utility to aid in creation of these files we could get users to do a lot of the work on this - the open format would mean they could also be used in other software, so you wouldn't feel like you were just acting as unpaid Pleco employees.

(wouldn't even need to be a desktop utility, I suppose, in fact it might be preferable to release it on mobiles since it's exactly the kind of work one can easily do on-the-go - listen to audio on headphones and tap on onscreen buttons to go forward / backward / mark each character start / etc)
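To make the idea of linking each character to a point in the audio concrete, here is a minimal Python sketch. It is not an actual Pleco format: the `Alignment` class, its method names, and the sample timings are all made up for illustration. It assumes one start time (in seconds) per character, stored in increasing order.

```python
from bisect import bisect_right


class Alignment:
    """Hypothetical character-to-audio alignment: one start time per character."""

    def __init__(self, text, starts):
        # starts[i] is the time (seconds) at which text[i] begins in the audio.
        if len(text) != len(starts):
            raise ValueError("need exactly one start time per character")
        if any(later < earlier for earlier, later in zip(starts, starts[1:])):
            raise ValueError("start times must be non-decreasing")
        self.text = text
        self.starts = starts

    def seek_time(self, char_index):
        """Audio position to start playback from a tapped character."""
        return self.starts[char_index]

    def cursor_at(self, seconds):
        """Index of the character being spoken at a given playback time."""
        index = bisect_right(self.starts, seconds) - 1
        return max(index, 0)


# Made-up timings for a five-character phrase.
alignment = Alignment("我去做饭。", [0.0, 0.4, 0.8, 1.2, 1.6])
print(alignment.seek_time(3))                  # start of 饭
print(alignment.text[alignment.cursor_at(0.9)])  # character under the cursor
```

The same table serves both directions: tapping a character looks up its start time, and the onscreen cursor during playback is a binary search over the start times.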

davidy · May 11, 2010

mikelove said:
The implementation wouldn't be difficult aside from word segmentation - sizing the cursor around words would be a bit challenging - but creating the data files would be

Did you mean sizing the cursor around words would ONLY be a bit challenging - but creating the data files would be VERY challenging?

I thought selecting and cursoring to the next punctuation of user choice would be simple coding. As it is, the Pleco cursor already moves intelligently to incorporate two characters when they produce a word.

I agree that data files timed to audio files would be difficult.

Actually I am happy with text to speech if native recording is difficult to achieve. Take a look at nciku.com and the audio supplied with words, sentences etc. The audio for their sentences definitely sounds like text to speech rather than native recording. There is a mode to create one's own conversation and hear it back - which can only be the computer being ever ready to read it back. The voice for single characters and words, though, sounds like a recorded native speaker.

If Pleco's Reader could be tweaked to allow audio on cursor movement over the words / phrase / sentences with the already available male or female voice, then that would be a great jump start. If the user wants to listen in parts, the user could either use the arrow keys or four-way pad to move through the document or clipboard to listen to predetermined amounts of text at a time. The other way would be to tap on the screen arrows or to tap on the document / clipboard text, whereupon the predetermined amounts of text would be selected and read out.

The Reader could also be set to read through the whole document / clipboard without the user having to move the cursor if listening is all that is required.

The advantage of text to speech is that one can use any text file and listen to it.

You already have all the technology onboard Pleco; it just needs some extra code to achieve this audio enhancement.

mikelove · May 12, 2010

Sizing the cursor would be a bit challenging, yes - the correct length isn't always obvious, there are lots of sentences with multiple possible parsings and the very definition of a "word" in Chinese is somewhat debatable.
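As a small illustration of the "multiple possible parsings" problem, here is a Python sketch that lists every way a string can be split into words from a lexicon. The lexicon and the example phrase are assumptions chosen for illustration, and this is not how Pleco's cursor actually works.

```python
def all_segmentations(text, lexicon):
    """Return every way to split text into consecutive words from lexicon."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in all_segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Toy lexicon (an assumption, not a real dictionary).
LEXICON = {"研究", "研究生", "生命", "命", "起源"}

for parse in all_segmentations("研究生命起源", LEXICON):
    print(" / ".join(parse))
```

With this toy lexicon the phrase has two valid splits, one starting with 研究 and one starting with 研究生, so a cursor that simply grabs the longest dictionary match would pick the second even where the first is the intended reading.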

Chinese TTS isn't as good as native recording, but it's a lot better than what we could achieve just by stringing audio samples together - that really wouldn't be listenable for more than a few words at a time, we've tested it here. We could at least allow selection of ranges of text in the reader that are longer than a single dictionary entry (probably a good idea anyway) and perhaps play the audio for those in succession, but I'm hesitant to put any more time than that into this feature without licensing a "real" Chinese text-to-speech system.

davidy · May 12, 2010

Could you select based on punctuation?

e.g. 妈，您终于回来了！饿了吧，我去做饭。不用了，您看桌儿上。你已经把饭做好了！真好！

The cursor selections are simulated in different colours, with punctuation delineating them. The cursor would move through those segments under user control. Alternatively, use only .!? for sentences.

Would that be easy and workable?
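A rough Python sketch of the punctuation-based selection described above, assuming nothing about Pleco's internals. The delimiter sets and the function name `split_segments` are my own choices for illustration.

```python
# Hypothetical delimiter sets covering full-width and ASCII punctuation.
PHRASE_END = "，、；：。！？,;:.!?"
SENTENCE_END = "。！？.!?"

TEXT = "妈，您终于回来了！饿了吧，我去做饭。不用了，您看桌儿上。你已经把饭做好了！真好！"


def split_segments(text, delimiters):
    """Split text into chunks that each end with a run of delimiter characters."""
    chunks, start = [], 0
    for i, ch in enumerate(text):
        at_end = i + 1 == len(text)
        # Close a chunk only at the last delimiter of a run such as "！？".
        if ch in delimiters and (at_end or text[i + 1] not in delimiters):
            chunks.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        # Trailing text with no closing punctuation becomes its own chunk.
        chunks.append(text[start:])
    return chunks


for phrase in split_segments(TEXT, PHRASE_END):
    print(phrase)
```

Calling `split_segments(TEXT, SENTENCE_END)` instead gives the sentence-sized selections, so the cursor size option would only need to swap the delimiter set.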

mikelove · May 12, 2010

Sure, I was talking about your specific suggestion that the cursor highlight the word that's currently being spoken - that seems in general like a very good idea, but it requires a fair amount of behind-the-scenes intelligence to implement. And still doesn't address the more general problem of coming up with audio that can be listened to over longer periods than just a few words.
