Importing a large amount of text from the Bible

Shun

状元
I'll see what I can do with the Chinese Text Analyser that John. mentioned. Alternatively, I could use a short Python script together with John.'s BCC frequency list to find and count all the words. I will need a few days for it, though.
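For readers curious about the script route, here is a minimal sketch of how words could be found and counted against a word list. It uses greedy forward longest-match segmentation, which is only one possible approach and not necessarily the one intended above. The `vocabulary` set stands in for the words of a frequency list such as the BCC list; `count_words`, `max_len`, and the sample sentence are all illustrative assumptions.

```python
from collections import Counter


def count_words(text, vocabulary, max_len=4):
    """Count words in `text` using greedy forward longest-match segmentation.

    `vocabulary` is a set of known words (hypothetical stand-in for the
    entries of a frequency list). Characters that do not start any known
    word, such as punctuation, are skipped.
    """
    counts = Counter()
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                counts[candidate] += 1
                i += size
                break
        else:
            # No vocabulary entry starts at this position; move on by one character.
            i += 1
    return counts


# Tiny illustrative vocabulary and sentence (not taken from any real list).
vocabulary = {"起初", "神", "创造", "天地"}
sample = "起初神创造天地。"
counts = count_words(sample, vocabulary)
```

Longest-match segmentation is simple but can split ambiguous character sequences incorrectly, which is one reason a dedicated tool such as Chinese Text Analyser may give better results.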
 

Shun

状元
Hello, @岩恩!

I followed John.'s advice and parsed the file using the Chinese Text Analyser (CTA). It didn't launch properly under macOS Catalina, so I had to run CTA under Ubuntu, which worked very well. If you wish to study Chinese texts in general, I can recommend that you try Chinese Text Analyser (the full version is free for 14 days) and then buy it; it is reasonably priced at about 11 USD:


I downloaded the work 新标点和合本 from:


and converted it to Unicode text using the "calibre" EPUB reader, for further processing by CTA.

I attach the 新标点和合本 word frequency list output by CTA. It contains each word that occurs in the 新标点和合本, along with its pinyin (not always the right one, so it may be better to let Pleco fill it in), a short English definition, and, after the definition, its frequency in the 新标点和合本. Is that what you were looking for? I also attach the simple Python script I used to extract the unique characters and frequencies from the list.
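As a rough illustration of that kind of extraction (this is not the attached script, whose contents are not shown here), the sketch below sums word frequencies per unique character. It assumes each line has three tab-separated columns (word, pinyin, definition) and that the frequency is the integer at the end of the definition column; the function name and the sample rows are invented for the example.

```python
import re
from collections import Counter


def character_frequencies(lines):
    """Sum word frequencies per character.

    Assumed line layout (hypothetical): word<TAB>pinyin<TAB>definition N,
    where N is the word's frequency at the end of the definition column.
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("//") or len(fields) < 3:
            continue  # skip category header lines and malformed lines
        match = re.search(r"(\d+)\s*$", fields[2])
        if not match:
            continue  # no trailing frequency found in the definition column
        frequency = int(match.group(1))
        for char in fields[0]:
            totals[char] += frequency
    return totals


# Invented sample rows, not values from the attached list.
sample_lines = [
    "// Example category\n",
    "神\tshen2\tGod 10\n",
    "神人\tshen2ren2\tman of God 2\n",
    "人\tren2\tperson 5\n",
]
totals = character_frequencies(sample_lines)
```

To run it on a real file, the lines could be read with `open(path, encoding="utf-8")`, after checking that the column layout matches the assumption above.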

If you wish to add this list to Pleco Flashcards, you can simply add "// <<Your Pleco flashcard category name>>" as the first line using a text editor.
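If you would rather not edit a large file by hand, the category line could also be added with a few lines of Python. This is only a sketch: `with_category_header` is a hypothetical helper, and the category name is whatever you choose.

```python
def with_category_header(text, category):
    """Return `text` with a '// category' line prepended.

    If the first line already starts with '//', the text is returned
    unchanged so that the header is not added twice.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith("//"):
        return text
    return f"// {category}\n{text}"


# Illustrative content; a real file would be read and written as UTF-8 text.
original = "神\tshen2\tGod\n"
updated = with_category_header(original, "Bible")
```

For a real file, one would read it as UTF-8 text, pass the contents through the function, and write the result back out before importing it into Pleco.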

Hope this helps,

Shun
 

Attachments

  • Xin Biao Dian He He Ben - Hanzi Pinyin Definition Frequency.txt
    765.3 KB · Views: 1,835
  • Get_three_columns_unique.py.txt
    457 bytes · Views: 324

岩恩

秀才
@Shun
I do not know what exactly to say. The work you put in to provide this is awesome. This is exactly what I am looking for. Thank you for not only providing great help but also being a great teacher. I appreciate it all immensely. I will be looking into the converter tool more. Please know that this late reply does not denote any lack of gratitude; I am only just getting to see this.

I appreciate it greatly, and shout-outs again to both you and John for blessing Christopher, myself, and all others who may come across this looking for similar help.

Thanks much,
Yan
 

Shun

状元
@Yan

I'm happy that I could meet a need! It took me only about 20 minutes, and it was a good exercise for me. I love to teach, as well. If you should run into any subsequent questions, feel free to follow up.

You're welcome,

Shun
 

岩恩

秀才
Hi @Shun, apologies for bothering you again, but I’m trying to get the 新标点和合本 word list set up as flashcards on a new iPad with Pleco, and I’m having trouble remembering the process. I was able to get the .txt file you provided, and I can see it in my file manager, but I have been unable to get it into my flashcards. Help with this would again be greatly appreciated. Thank you so much!
 

Shun

状元
Hi Yan,

that's all right. You should be able to open the text file in the "Import Flashcards" dialog. Then all you have to do is to tap Import. One of the posts above also has recommended import settings, if that was the post you obtained the text file from.

If that operation fails, feel free to ask again.

You're welcome,

Shun
 

岩恩

秀才
Thank you, Shun, for your reply again. I am not sure what exactly the issue is.
I have imported it, but afterwards the list does not show up in the Organize Flashcards section. I have also gone to the file manager, but no luck there either. I took 2 screenshots.
However, with the file from the OP Christopher G’s request, the frequency word list for 新译本, I was able to have it go right into my flashcards. Not being very technologically literate in these areas, I am not sure what the difference is between the two.

Grateful for the patience, grace, and help.
461EFA19-40B5-4285-963C-F3644AB817DC.png
9B9DE4FE-CF25-450A-ADC6-61A6FC89EE21.png
 

Shun

状元
Hello Yan,

my bad. I see there was no category line at the beginning of the file you've just imported. You should now be able to find the entire flashcard list in the Uncategorized category in Organize Flashcards. If you create a new category folder, select all cards in the Uncategorized category, and move them to the folder you've created, they should be stored properly.

I believe the difference between the files is that the one I uploaded contains both the definitions and the frequencies, whereas the others come with the frequency numbers only.

Hope this helps, you're welcome,

Shun
 