Need help with removing text line breaks

jleeyap

Member
I just moved to Pleco 2 on my Tungsten TX and am eagerly trying out the document reader (my main reason for upgrading). I copied and pasted text from a Chinese website (actually www.hxwz.com) into both Notepad and Word 2007, following the instructions to save it as plain text with UTF-8 encoding. It displays fine in the document reader, but the line breaks are carried over. Since the original lines are longer than what fits on the Palm screen, I end up with one leftover character from the end of each line sitting by itself on the next line.

Furthermore, if the last word before a line break forms a compound with the first word on the next line, Pleco does not recognize the compound, I think because it is respecting the line break. I'm not sure I have explained myself well, so here is an example of what I see on the screen (a single character sitting alone on a line, and Pleco does not pick up hui2yi4 as a compound word because of the line break):


更不用说我这样隔洋万里,相交如水的朋友。所以

忆起与刘晓波先生的点滴交往,聊以表达思念和敬意。

I don't know if this has been answered before, but can someone help me with any changes I need to make during the transfer?
 

mikelove

皇帝
Staff member
The best way to fix this would be to eliminate all of those soft line breaks in Word. There's a blank line before each new paragraph in this document, right? Assuming there is, here's what to do:

1) Do a find-and-replace to turn all ^p^p (two newlines) into @@ (two @ symbols).
2) Do a find-and-replace to turn all ^p (one newline) into nothing (blank box).
3) Do a find-and-replace to turn all @@ (two @ symbols) into ^p^p (two newlines).

That will delete those pesky line breaks while keeping your paragraphs properly separated.
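
If you would rather script this than do it in Word, the same three steps can be sketched in Python. This is only a rough sketch under a few assumptions: the function name `remove_soft_line_breaks` is made up for illustration, paragraphs are separated by exactly one blank line, and the text does not already contain the `@@` placeholder. Lines are joined with nothing in between, which suits Chinese text; English text would need a space instead.

```python
def remove_soft_line_breaks(text: str, placeholder: str = "@@") -> str:
    """Remove single line breaks but keep blank-line paragraph breaks.

    Hypothetical helper that mirrors the three find-and-replace steps:
    protect paragraph breaks, delete remaining newlines, restore paragraphs.
    """
    # Treat Windows (\r\n) and old Mac (\r) line endings as plain \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # The placeholder must not already occur in the text, or step 3
    # would turn those occurrences into paragraph breaks as well.
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")

    # Step 1: two newlines (a paragraph break) become the placeholder.
    text = text.replace("\n\n", placeholder)
    # Step 2: every remaining newline is a soft break, so delete it.
    text = text.replace("\n", "")
    # Step 3: turn the placeholder back into a paragraph break.
    return text.replace(placeholder, "\n\n")


if __name__ == "__main__":
    # Made-up sample: a compound split across a line break, then a new paragraph.
    sample = "所以回\n忆起\n\n下一段"
    print(remove_soft_line_breaks(sample))
```

To use it on a saved file, read the file with `open(path, encoding="utf-8")`, pass the contents through the function, and write the result back out with the same encoding.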
 

jleeyap

Member
Thank you so much, Mike. Your solution solved the problem with the line breaks.

I have run into a different problem this time. Our work computers were reinstalled and are still on Word 2007. When I copy and paste Chinese text into Word, some characters go missing. Not all webpages do this. For example, text from www.xys.org (Xinyusi) comes out all correct, but text from www.hxwz.org gives the following results:

original text:

专家认为,中国的收入差距已经超过了基尼系数规定的警戒线。基尼系数是判断收入分配均等程度的指标,其数值的表示介于0到1之间。当基尼系数为0时,表示绝对平等;基尼系数越大,不均等程度越高;当基尼系数为1时,表示绝对不平等。

After pasting into Word 2007 (the font defaults to Mincho), the characters zhuan1 and then ren4wei2 appear as just dots, and the guo4 of chaoguo and others also come out as dots. Strangely, when I copy from Word and paste into this message, the dotted characters reappear. Any idea what settings I need to change? Thanks again.
 

jleeyap

Member
Never mind, please disregard my last post. I found that regardless of whether the characters show up in Word, they all display correctly once the text is transferred to the PDA. It must just be a Word problem.
 