OCR!

character

状元
That's f'ing fantastic, Mike. You should throw together some promotional materials and send them to Apple -- get them interested in having Pleco be one of the apps demo'd on stage when they announce the iPod Touch with camera ("this combination is perfect for students and travellers").

Just to be annoying*, can it read text in the traditional top to bottom layout? Can I make a 1w x 3h selection box instead of 3w x 1h ?


* and because I have a stack of books in this format
 

mikelove

皇帝
Staff member
character said:
Just to be annoying*, can it read text in the traditional top to bottom layout? Can I make a 1w x 3h selection box instead of 3w x 1h ?
Yes; little tough to line this up while simultaneously hitting the "screenshot" button, but:
 

[Attachment: screenshot]

mikelove

皇帝
Staff member
For the curious, "span lines" is for those annoying situations where a word starts at the end of one line and wraps around to the next (you point to the start of the word, tap "span lines," and point to the end of the word on the next line), "lock input" keeps the current entry in place to let you more easily change dictionaries / add to flash / etc, the top 5 icons are exit / zoom / refocus / toggle flash / history respectively, and the bottom buttons have the same functions they do elsewhere in Pleco.

There may also be an option to capture slightly larger blocks of text (a line or two), in which case you'd bring them up in the reader. Capturing full pages as a still image still doesn't work very well, unfortunately - we licensed a very good algorithm, and even with that we found that maybe 1/3 to 1/4 of the time you'd end up with total gibberish; even with English it's a dicey prospect at this point.

Suggestions on UI tweaks are welcome, BTW - we haven't actually found any examples of another program doing this on iPhone (there were a couple of English-language live OCR projects on Android, but their UIs all relied on having a shutter button and were considerably less "live" than ours in general), so we're probably going to get a few things wrong the first time out :)
 

character

状元
Beyond f'ing fantastic. :D Makes me feel like I'm living in the future and that I have some hope of getting through The Book and the Sword. :wink:

There may also be an option to capture slightly larger blocks of text (a line or two), in which case you'd bring them up in the reader. Capturing full pages as a still image still doesn't work very well, unfortunately - we licensed a very good algorithm, and even with that we found that maybe 1/3 to 1/4 of the time you'd end up with total gibberish; even with English it's a dicey prospect at this point.
Since you're able to pull longer sections into the reader, maybe you can see if you can do a 'scanning window' where you keep track of 5-6 characters and use that to capture the text without lots of duplicates or missed characters.

I.e. if you see the following sets of characters recognized
他是一个高
是一个高大
一个高大的
个高大的男
高大的男孩

You would add 他是一个高大的男孩 to the reader. You could provide feedback to the user that he's shaking too much if you see a lot of duplicate characters or going too fast if you're not able to recognize a certain percentage of words as he scans in the text.
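
A minimal sketch of the scanning-window idea above, assuming each video frame yields one short recognized string. The function name `merge_windows`, the `min_overlap` parameter, and the duplicate/gap counters are hypothetical and only illustrate the suggestion; none of this is how Pleco actually works.

```python
def merge_windows(windows, min_overlap=2):
    """Stitch successive OCR windows into one string.

    Each new window is joined to the text so far at the longest
    suffix/prefix overlap. Returns (text, duplicates, gaps), where
    duplicates counts frames that added nothing new and gaps counts
    frames that overlapped by fewer than min_overlap characters.
    """
    text = ""
    duplicates = 0
    gaps = 0
    for window in windows:
        if not window:
            continue
        if not text:
            text = window
            continue
        # Longest k such that the last k characters of text
        # equal the first k characters of the new window.
        overlap = 0
        for k in range(min(len(text), len(window)), 0, -1):
            if text.endswith(window[:k]):
                overlap = k
                break
        if overlap == len(window):
            duplicates += 1  # same characters seen again (hand not moving)
            continue
        if overlap < min_overlap:
            gaps += 1  # too little overlap (hand moving too fast)
        text += window[overlap:]
    return text, duplicates, gaps


frames = ["他是一个高", "是一个高大", "一个高大的", "个高大的男", "高大的男孩"]
text, duplicates, gaps = merge_windows(frames)
print(text)
```

The two counters are one possible source for the "shaking too much" and "going too fast" feedback. A real version would also have to cope with misrecognized characters and repeated characters in the text, which this exact-match sketch does not.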

Suggestions on UI tweaks are welcome, BTW [...]
Supposedly green is one of the hardest colors to read -- perhaps you could make what's currently green white on a translucent background like the other UI elements, or have them vary based on the page, or be one color with a black outline so no matter what, they will be readable.
 

numble

状元
Wow. Much better than I imagined it would be. Never thought it would be instant, augmented reality style.

I'm happy I decided to make the jump from 3G and will be getting an iPhone 4 soon (hopefully).

I'd say the green looks pretty bad in the screenshot but not so bad on the video.
 

denesis

Member
Does the OCR feature work on street signs, storefronts, etc when I'm out and about? This is actually what I imagined when I saw the announcement but before I watched the demo video. OCR from a book/newspaper is of course great as well, but what about OCR from signs out in the "real world"?
 

mikelove

皇帝
Staff member
character said:
Since you're able to pull longer sections into the reader, maybe you can see if you can do a 'scanning window' where you keep track of 5-6 characters and use that to capture the text without lots of duplicates or missed characters.

I.e. if you see the following sets of characters recognized
他是一个高
是一个高大
一个高大的
个高大的男
高大的男孩

You would add 他是一个高大的男孩 to the reader. You could provide feedback to the user that he's shaking too much if you see a lot of duplicate characters or going too fast if you're not able to recognize a certain percentage of words as he scans in the text.
Interesting idea, though we can actually get pretty good accuracy even across a whole line of text, at least on an iPhone 4 with 720p video capture; much lower framerate than just a few characters (drops from 6 fps to just 1, or even less - this is for the characters only, the video's still fast and live) but still pretty usable.

character said:
Supposedly green is one of the hardest colors to read -- perhaps you could make what's currently green white on a translucent background like the other UI elements, or have them vary based on the page, or be one color with a black outline so no matter what, they will be readable.
I'm not wild about the green either, but it does look better in real life than in the screenshots - I think the black-bordered-white-subtitle approach might make sense, though I'm worried that the characters might get too small and fuzzy on a 3GS. The live output characters in the middle box aren't essential, though, we added them as a debugging tool but liked them enough to leave them in - there'll probably be an option to turn them off in the finished version since some people might find them distracting.

numble said:
Also, post more screenshots (instead of video) so it can be shown off on other forums and the like.
Good idea, though the green really does look pretty ugly in the screenshots - perhaps we should at least temporarily change it for those.

denesis said:
Does the OCR feature work on street signs, storefronts, etc when I'm out and about? This is actually what I imagined when I saw the announcement but before I watched the demo video. OCR from a book/newspaper is of course great as well, but what about OCR from signs out in the "real world"?
It should, assuming they're in a standard font (not handwritten) - can be a little dicey if the lighting is weird or there's a complex image / bunch of lines / etc in the background, though.
 

numble

状元
I really suggest popping over to Chinatown (or more easily, just pointing the phone at a picture of a street sign), and getting a screenshot of that. I know it'd be pretty hard to steady your hand for a screenshot, but I think that will wow people even more. Does it do white text on colored backgrounds? (Many Chinese signs are white on black, red, blue, or green.)
 

mikelove

皇帝
Staff member
numble said:
I really suggest popping over to Chinatown (or more easily, just pointing the phone at a picture of a street sign), and getting a screenshot of that. I know it'd be pretty hard to steady your hand for a screenshot, but I think that will wow people even more. Does it do white text on colored backgrounds? (Many Chinese signs are white on black, red, blue, or green.)
Really hard to get that shot, yes - simultaneously pressing the launcher and power buttons while keeping the camera pointed precisely at a particular target :)

It can do inverted text, but it's not very good at detecting it automatically yet - if we can't come up with a better system for that, we'll add a manual toggle button for it and indicate it in the preview somehow (maybe by inverting the image inside of the recognizer box).
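
For illustration only, here is one simple way automatic detection with a manual override could be sketched. It assumes the recognizer box is available as rows of 0-255 grayscale values and that text covers less of the box than the background does. The names `looks_inverted` and `normalize_polarity` are hypothetical, and this is not the detection method the OCR engine actually uses.

```python
def looks_inverted(gray, threshold=128):
    """Guess whether a grayscale crop is light text on a dark background.

    Heuristic: text usually covers a minority of the box, so if bright
    pixels are the minority, the text is probably the bright part.
    """
    pixels = [p for row in gray for p in row]
    bright = sum(1 for p in pixels if p >= threshold)
    return bright < len(pixels) / 2


def normalize_polarity(gray, force_invert=None):
    """Return a dark-on-light copy of the crop.

    force_invert models the manual toggle: None means "auto-detect",
    True or False overrides the heuristic.
    """
    invert = looks_inverted(gray) if force_invert is None else force_invert
    if not invert:
        return [row[:] for row in gray]
    return [[255 - p for p in row] for row in gray]


# A tiny white-on-black "sign": two bright pixels on a dark field.
sign = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
print(looks_inverted(sign))
```

The heuristic fails on busy backgrounds or very bold text, which is one reason a manual toggle is still a sensible fallback.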
 

character

状元
numble said:
I know it'd be pretty hard to steady your hand for a screenshot [...]
http://www.iphone-tripodholder.com/ and many other solutions are available.

mikelove said:
Interesting idea, though we can actually get pretty good accuracy even across a whole line of text [...]
You might want to test and get some idea of what font size and number of characters (and in what direction) are the practical limits of the OCR. If you want, I and others could mail you scans of what we think are challenging material.

I think the black-bordered-white-subtitle approach might make sense, though I'm worried that the characters might get too small and fuzzy on a 3GS. The live output characters in the middle box aren't essential, though, we added them as a debugging tool but liked them enough to leave them in - there'll probably be an option to turn them off in the finished version since some people might find them distracting.
I think the characters add a lot. You could try a medium yellow with black border -- a number of films have subtitles like that.
 

numble

状元
Here's a feature request: A "scanning" mode.

Replace the definition window with a window displaying the current scanned words, the user hits "+" or "add/append" to append the current OCR selection to the scanned words. Maybe add quick keys for people to input punctuation marks, and some way to undo or backspace. The scanned items then are saved into a document or a pasteboard.
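
A rough sketch of the data structure such a scanning mode might sit on, assuming each tap of "+" hands over the current OCR selection as a string. The class `ScanBuffer` and its method names are hypothetical, purely to make the request concrete.

```python
class ScanBuffer:
    """Accumulates OCR selections one append at a time, with undo."""

    def __init__(self):
        # Each entry is one appended piece, so undo removes exactly
        # what the last "+" (or punctuation quick key) added.
        self._pieces = []

    def append(self, selection):
        """Add the current OCR selection or a punctuation mark."""
        if selection:
            self._pieces.append(selection)

    def undo(self):
        """Remove and return the most recent piece, or None if empty."""
        return self._pieces.pop() if self._pieces else None

    def text(self):
        """The scanned text so far, ready to save or copy."""
        return "".join(self._pieces)


buf = ScanBuffer()
buf.append("他是")
buf.append("一个")
buf.append("，")   # punctuation quick key
buf.undo()         # take the comma back
print(buf.text())
```

Keeping whole pieces rather than single characters means one undo step matches one user action.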
 

character

状元
A section made for PlecOcr:
http://www.9to5mac.com/app-store-sectio ... -marketing

'At first glance, Apple's new "Apps to Impress your Friends" section may seem like an innocent section showing you the coolest apps that will surely wow your pals. Look again and you will soon realize the section should be called "Apps to Impress your friends enough to buy an iPhone." The section includes amazing iPhone exclusives such as Siri Assistant, Ocarina, Red Laser, Bump, and Apple's own iMovie.'

'These apps leverage advanced technologies, that work extremely well on iOS.'
 

mikelove

皇帝
Staff member
character said:
You might want to test and get some idea of what font size and number of characters (and in what direction) are the practical limits of the OCR. If you want, I and others could mail you scans of what we think are challenging material.
Might be worth checking, yes, though at least initially I think we'll mainly be concentrating on the single-word mode - we've had pretty good luck with entire book lines of 24+ characters with it in the past, which would translate to ~30x30 pixel characters, so that might be a good figure to keep in mind as a baseline. No need to send along any material when you guys will be able to test for yourselves in a couple of weeks :)
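
The ~30x30 figure is consistent with a 24-character line filling the 720-pixel dimension of a 720p frame. That framing is an assumption here, since the thread does not say how the line is oriented in the image. The helper `approx_char_size_px` is hypothetical.

```python
def approx_char_size_px(frame_px, chars_per_line):
    """Approximate width in pixels of one square character cell,
    assuming the line spans the full frame dimension."""
    if chars_per_line <= 0:
        raise ValueError("chars_per_line must be positive")
    return frame_px / chars_per_line


print(approx_char_size_px(720, 24))  # 30.0
```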

character said:
I think the characters add a lot. You could try a medium yellow with black border -- a number of films have subtitles like that.
Yellow's really hard to see on a computer screen, though, particularly since mobile displays have such crappy color gamuts - we'll play with a few things, though, maybe even make it user-configurable since we've already got a color picker interface anyway.

numble said:
Here's a feature request: A "scanning" mode.

Replace the definition window with a window displaying the current scanned words, the user hits "+" or "add/append" to append the current OCR selection to the scanned words. Maybe add quick keys for people to input punctuation marks, and some way to undo or backspace. The scanned items then are saved into a document or a pasteboard.
Definitely have something like that in mind, yes - might not make it into the first release but you should at least be able to easily create flashcards from the history screen. If we want to scan longer documents I think we'd probably just go back to a still-image capture system (already supported by our OCR engine) for those rather than rigging something together with live input - we're a little wary of that because it's very temperamental (as are all OCR systems, but with live input you don't notice since you're subconsciously doing a lot of the work for it), but when the lighting etc is correct you can get an entire page in there in just a few seconds.

character said:
'At first glance, Apple's new "Apps to Impress your Friends" section may seem like an innocent section showing you the coolest apps that will surely wow your pals. Look again and you will soon realize the section should be called "Apps to Impress your friends enough to buy an iPhone." The section includes amazing iPhone exclusives such as Siri Assistant, Ocarina, Red Laser, Bump, and Apple's own iMovie.'

'These apps leverage advanced technologies, that work extremely well on iOS.'
We'd love to get in there, but I think a Chinese dictionary app may be a little too obscure to interest their marketing department; they've shown no inclination to help us promote our app so far anyway.
 

character

状元
mikelove said:
We'd love to get in there, but I think a Chinese dictionary app may be a little too obscure to interest their marketing department; they've shown no inclination to help us promote our app so far anyway.
I think the key is using technologies they want to highlight/having features that work well in an ad. Your non-Apple HWR isn't something they want to highlight, for example, even though it is impressive.

I would also be interested in a static or live 'scanning' mode -- copyright issues aside, it would solve the 'lack of graded reading material in digital form' problem. I'm excited about the OCR just for looking up random characters in printed material as well, but I suspect it will be tough to use on the bus.
 

ConfuciusTse

Mike and the Pleco team are hands-down freaking awesome. Totally excited for OCR on iPhone/Touch and expansion to Android. I am such a fan that every so often the thought goes through my head "what can I do for Pleco?" Now I figured it out.

I write occasional articles for an online news site that is picked up on Google News where searching for "Pleco" pulls up nothing currently. After also reading the posting about cheap dictionaries and getting the word out on Pleco I've decided to write up a review of the OCR capabilities and submit it to the site. If it goes up, I'll put the link here. That'll be my gift to you guys. Cross your fingers it gets approved!

In any case, you guys rock.
 

mikelove

皇帝
Staff member
character said:
I think the key is using technologies they want to highlight/having features that work well in an ad. Your non-Apple HWR isn't something they want to highlight, for example, even though it is impressive.
Well it is a cool thing to show off, that's true - Red Laser made a great demo after all - and they did have that Mandarin phrasebook in an ad way back when; it's impossible to get their attention by writing them directly, so Engadget or another blog would probably be a better bet.

character said:
I would also be interested in a static or live 'scanning' mode -- copyright issues aside, it would solve the 'lack of graded reading material in digital form' problem. I'm excited about the OCR just for looking up random characters in printed material as well, but I suspect it will be tough to use on the bus.
We actually already implemented a static recognizer back when we were testing it, so I guess we could leave it in as a buried-in-Settings / barely-acknowledged-in-the-manual "experimental" feature like text editing; it really isn't accurate enough to sell, though.

ConfuciusTse said:
I write occasional articles for an online news site that is picked up on Google News where searching for "Pleco" pulls up nothing currently. After also reading the posting about cheap dictionaries and getting the word out on Pleco I've decided to write up a review of the OCR capabilities and submit it to the site. If it goes up, I'll put the link here. That'll be my gift to you guys. Cross your fingers it gets approved!
Thanks! Really appreciate that, this is definitely something we're going to be trying to get the word out on.
 

numble

状元
Might help if the demo is showing off something more people could find to be useful. They did have that Spanish translator before... "Where is the train station?" More people would probably find it useful if it was demonstrated reading a street sign or a menu... Not many people are thinking of needing help for reading Lord of the Rings in Chinese.

It may also be that Pleco is currently too complicated an app design-wise (possibly even feature-wise) compared to what people associate with an iOS app.
 