Strategic Lesson Selection for ChinesePod and others

Postby jiacheng » Wed Dec 23, 2009 4:57 pm

I've come up with an idea that I'm just now starting to use and am really optimistic about. It is a way to figure out which is the best ChinesePod lesson to study next. I've written some very rough programs (bash scripts) that make an informed decision. In a nutshell, it figures out which lesson's unlearned vocabulary is the most frequently used.

To do this, it requires 3 things:

1. An exhaustive list of your vocabulary, which is not easy to get unless you use Pleco, Anki, or some other program that can track your vocabulary.

2. Chinese character & bigram frequency data, freely available online.

3. ChinesePod vocabulary lists for each lesson that you want to choose between. (this was kind of a hassle to compile, and may not be perfect)

So the script reads in your vocabulary and the frequency data, then looks at the vocab list for each lesson. For each lesson, it figures out which words or characters you have not yet learned and looks up all of those words/characters in the frequency table. It then adds those frequencies up and divides the total by the number of new words. The results can then be sorted to find the highest-scoring lesson, which is the lesson you should study next.
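
The scoring described above can be sketched in a few lines of shell. This is a minimal sketch, not the attached script: the function name `score_lesson`, the tab-separated `word<TAB>frequency` layout of the frequency file, and the one-word-per-line layout of the vocabulary and lesson files are all assumptions made for illustration. It keeps the lookup tables in awk, so it does not depend on any particular Bash version.

```shell
# score_lesson FREQ_FILE VOCAB_FILE LESSON_FILE
# Prints: average<TAB>total<TAB>lesson file name
# Assumed (hypothetical) formats:
#   FREQ_FILE   word<TAB>frequency
#   VOCAB_FILE  one known word per line (column 1)
#   LESSON_FILE one lesson word per line (column 1)
score_lesson() {
    awk -F '\t' -v lesson="$3" '
        FILENAME == ARGV[1] { freq[$1] = $2; next }   # load frequency table
        FILENAME == ARGV[2] { known[$1] = 1; next }   # load learned words
        # Remaining lines come from the lesson: count each unlearned word once.
        $1 != "" && !($1 in known) && !seen[$1]++ {
            total += freq[$1]                         # words without data add 0
            n++
        }
        END {
            avg = (n > 0) ? total / n : 0
            printf "%.0f\t%.0f\t%s\n", avg, total, lesson
        }
    ' "$1" "$2" "$3"
}
```

Calling `score_lesson freq.txt myvocab.txt lesson1.txt` once per lesson and sorting the results numerically on the first column would give the same kind of ranking: a higher average means the lesson's new words are more common.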

Currently, I'm using 2 separate programs to do this for single characters and bigrams. I don't have a frequency list for words of arbitrary length, but it sure would be useful if I could find one.
Attachments
chinesepod-vocab.tar.gz
Vocabulary for ChinesePod lessons
(550.06 KiB) Downloaded 121 times
lesson-select.tar.gz
Lesson selector scripts, character and bigram frequency data.
(419.31 KiB) Downloaded 114 times
jiacheng
Jinshi
 
Posts: 57
Joined: Sat Dec 13, 2008 4:23 am

Re: Strategic Lesson Selection for ChinesePod and others

Postby bao mingguang » Wed Jan 06, 2010 2:17 am

Jiacheng,

I applaud your efforts. It looks like you are off to a really good start with this idea.

I'd like to see more documentation, since although I am a Linux user I don't have much experience hacking shell scripts. I'd like to play around with the script but I'm not sure how to start.

I can help with the frequency list for words of arbitrary length. I've been downloading every frequency list I can find for some time now, and I merged them all together into a master list. Finally, I ran the entire list through Google and did a comparison for anomalies. You can find the list attached:

rankfile.zip
(2.1 MiB) Downloaded 153 times

The fields are as follows: Chinese word, adjusted ranking, Google frequency count (may not reflect current Google counts which are always changing), original frequency ranking.
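
As an illustration of the four-field layout, the sketch below looks up one word and labels its fields. The thread does not say how the fields are delimited, so the tab separator here is an assumption, as is the helper name `lookup_rank`.

```shell
# lookup_rank RANKFILE WORD
# Prints the four fields of WORD's entry with labels, or fails if absent.
# Assumption: fields are tab-separated (not confirmed by the thread).
lookup_rank() {
    awk -F '\t' -v w="$2" '
        $1 == w {
            printf "word=%s adjusted_rank=%s google_count=%s original_rank=%s\n", $1, $2, $3, $4
            found = 1
            exit                      # first match is enough
        }
        END { if (!found) exit 1 }    # non-zero status when the word is missing
    ' "$1"
}
```

For example, `lookup_rank rankfile.txt 概要` would show how the adjusted and original rankings differ for one of the entries mentioned in this post.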

Where Google didn't vary much from the expected I left the original count alone. In the other cases I used the Google data to moderate the rankings. See the 概要 and 笔记本电脑 entries for a couple of examples. I know there are still discrepancies in the rankings but the list should be pretty reliable as a general guide, that is, words near the top of the list are very common and words at the bottom are rarely seen.

I hope this helps. I'd like to see many more applications like yours appear.

bao mingguang
 
Posts: 7
Joined: Wed Jan 06, 2010 1:40 am

Re: Strategic Lesson Selection for ChinesePod and others

Postby jiacheng » Thu Feb 04, 2010 11:22 pm

bao:

Thanks a bunch!

I've made a slight adjustment to my script to read in this data and I'm starting to try it out. I'm attaching the altered script.
Attachments
count-best-lesson-ngrams-eff.zip
(719 Bytes) Downloaded 63 times

Re: Strategic Lesson Selection for ChinesePod and others

Postby bao mingguang » Mon Feb 08, 2010 7:47 pm

Jiacheng,

I'm glad to help. I see you've added a little documentation to the file. It's helpful to see what the command-line parameters are. How about the vocab file, is it just a flat file with one Chinese word per line? Do you use the exact same frequency list file I uploaded, or did you make modifications to read it with the script?

Can you give an example of how you would use the script? E.g., count-best-lesson-ngrams-eff char-freq.txt myvocab.txt lesson1.txt lesson2.txt lesson3.txt

Re: Strategic Lesson Selection for ChinesePod and others

Postby jiacheng » Tue Feb 16, 2010 4:43 pm

Yes, the vocab file is in the same format you'd get from a text export in Pleco.

Each line would basically look like this:
Code:
词汇[詞彙]        ci2hui4        n. 〈lg.〉 vocabulary; lexis; lexicon

Although for the vocabulary, I'm pretty much ignoring everything beyond the first column.

Note that when you export from Pleco, you should UNCHECK the "categories" box under the "Include:" section.
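
Since only the first column matters, the headwords can be pulled out of a Pleco export before feeding them to other tools. The following sketch assumes that columns are separated by tabs or spaces, that headwords contain no spaces, and that the traditional form appears in square brackets directly after the simplified one, as in the sample line above. `pleco_headwords` is a hypothetical name, not part of the attached scripts.

```shell
# pleco_headwords EXPORT_FILE
# Prints the first column of each non-empty line, without the
# bracketed traditional form (e.g. "词汇[詞彙]" becomes "词汇").
pleco_headwords() {
    awk 'NF {                         # skip blank lines
        sub(/\[.*\]$/, "", $1)        # drop a trailing "[...]" variant
        print $1
    }' "$1"
}
```

Running `pleco_headwords myvocab.txt > known-words.txt` would then leave one simplified headword per line.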

For the ngrams script, you can pass the exact uncompressed text file that you uploaded as the first parameter, just as in your post:

count-best-lesson-ngrams-eff rankfile.txt myvocab.txt lesson1.txt lesson2.txt lesson3.txt

The output is kind of cryptic, but it will basically look like this:
Code:
[average score]        [total score]        [lesson_filename.txt]        [newword1:frequency] ...


The output is sorted by the average score, which is the sum of the frequencies of a lesson's unlearned words divided by the number of those words. The sort is ascending, so the lessons most highly recommended by the script will be on the last lines.
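
Because the recommended lessons end up on the last lines, it can be easier to read the output after reducing it to the top few. This is a small sketch, assuming the output columns are separated by whitespace as in the layout above and that lesson file names contain no spaces; `top_lessons` is a hypothetical helper name.

```shell
# top_lessons N
# Reads scorer output on stdin and prints the N lessons with the highest
# average score, best first, as: lesson file name<TAB>average score
top_lessons() {
    sort -k1,1nr |                    # numeric sort on column 1, descending
        head -n "$1" |                # keep the N best
        awk '{ print $3 "\t" $1 }'    # column 3 = file name, column 1 = average
}
```

It could be used as `count-best-lesson-ngrams-eff rankfile.txt myvocab.txt lesson1.txt lesson2.txt lesson3.txt | top_lessons 2`.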

Note that there are some issues in the script that result mostly from "儿" and "……" in word entries. I'll try to clean them up at some point, but they shouldn't cause a huge problem.

Re: Strategic Lesson Selection for ChinesePod and others

Postby bao mingguang » Sat Mar 06, 2010 1:33 am

Jiacheng,

Thanks for the clarification. I'm looking forward to putting the script to work.

So far I've had some problems running the script. I must have a different version of bash than you use. First, it told me:
line 13: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]

I changed it to declare -a and it got past that line ok. But it gets stuck here:
line 19: 国家: syntax error: operand expected (error token is "国家")

国家 is the first item in my vocab list. Any idea what's going on?
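
Both messages point at the Bash version. Associative arrays (`declare -A`) were added in Bash 4.0, so a 3.x shell rejects the option. Switching to `declare -a` creates an indexed array instead, and indexed-array subscripts are evaluated as arithmetic expressions; a key such as 国家 is not a valid arithmetic operand, hence the second error. A script could check for this up front. The sketch below is only an assumption about how such a guard might look; the function name `bash_has_assoc_arrays` is hypothetical and is not part of the attached script.

```shell
# bash_has_assoc_arrays [VERSION]
# Succeeds when VERSION (default: the running shell's $BASH_VERSION)
# has a major version of 4 or higher, the first series with `declare -A`.
bash_has_assoc_arrays() {
    version=${1:-${BASH_VERSION:-0}}
    major=${version%%.*}              # "3.2.25(1)-release" -> "3"
    [ "$major" -ge 4 ] 2>/dev/null    # non-numeric input counts as "no"
}

# Example guard for the top of a script:
# bash_has_assoc_arrays || { echo "this script needs bash 4 or newer" >&2; exit 1; }
```

With a guard like this, an older shell would stop with a readable message instead of the two errors quoted above.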

Re: Strategic Lesson Selection for ChinesePod and others

Postby ipsi » Sun Mar 07, 2010 2:21 am

Treo Pro - WM 6.1 (Default Unlocked ROM), Flex Email, Pocket Informant, CE-Star (Standard), SPB Diary
ipsi
Zhuangyuan
 
Posts: 682
Joined: Thu May 24, 2007 7:35 pm
Location: Wellington, New Zealand

Re: Strategic Lesson Selection for ChinesePod and others

Postby bao mingguang » Mon Mar 08, 2010 1:28 am

ipsi,

Thanks for the link. Here's my version info:
~$ bash -version
GNU bash, version 3.2.25(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.

It looks like my version of bash doesn't support the -A option. Any idea of a workaround? (Yes, I know I need to upgrade my OS.)

Re: Strategic Lesson Selection for ChinesePod and others

Postby ipsi » Mon Mar 08, 2010 1:44 am

My suggestion would be to try installing zsh and pointing the script at it. For example, on Ubuntu run
Code:
sudo apt-get install zsh
and then put, I assume,
Code:
#!/bin/zsh
at the top of the script. I can't say whether that would result in zsh taking over from Bash as your default shell, though. I would try it myself, but not on my work laptop.

Re: Strategic Lesson Selection for ChinesePod and others

Postby bao mingguang » Wed Mar 10, 2010 1:40 am

Thanks!

Progress! The script seems to be working for me now. However, I'm still not understanding the output very well. Maybe something is still going wrong.

I ran the script using the menu-stealer files since there are only 3. (Start small when testing, I say)

Here are my results:

loaded frequency tables
./count-best-lesson-ngrams-eff:shift:37: shift count must be <= $#
12580000 12580000 chinesepod_TMS0002-vocab.txt 花椒:871000
24088200 24088200 chinesepod_TMS0003-vocab.txt 下酒菜:0
34010000 34010000 chinesepod_TMS0004-vocab.txt 新疆拌面:0

The "shift count" part looks like an error.

Does the fact that there are only three words listed mean that there are only three new words for me to learn? I think there are more than that missing from my word list.
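
The `shift count must be <= $#` line is a zsh diagnostic, apparently raised at line 37 of the script: zsh reports an error when `shift` is asked to drop more positional parameters than remain, while Bash by default just returns a non-zero status without printing anything. What the script does at that line is not shown in the thread. As a generic illustration only, a shift can be clamped to the number of remaining arguments; `skip_args` is a hypothetical helper, not something taken from the script.

```shell
# skip_args N ARGS...
# Drops the first N of ARGS (or all of them if fewer than N remain)
# and prints whatever is left, one per line.
skip_args() {
    n=$1
    shift                                  # remove N itself
    if [ "$n" -gt "$#" ]; then n=$#; fi    # clamp so shift cannot over-run
    shift "$n"
    if [ "$#" -gt 0 ]; then printf '%s\n' "$@"; fi
}
```

For instance, `skip_args 2 rankfile.txt myvocab.txt lesson1.txt` prints only `lesson1.txt`, and asking to skip more arguments than were given prints nothing rather than failing.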

Re: Strategic Lesson Selection for ChinesePod and others

Postby sfrrr » Fri Mar 12, 2010 4:21 pm

Strategic lesson choice for CPod. Sounds great.
But I can't figure out what you guys are talking about (except for an occasional "the" or "a"), and I don't have any idea how to use the script. Could you translate the script, your messages, or both into slightly less techy language? Thanks.
sfrrr
Zhuangyuan
 
Posts: 453
Joined: Mon Apr 18, 2005 4:37 pm

Re: Strategic Lesson Selection for ChinesePod and others

Postby bao mingguang » Wed Mar 17, 2010 6:14 pm

Sfrrr,

I'm glad to see your interest in this undertaking. User Jiacheng wrote the above script and introduced it here and on ChinesePod. That page provides a better introduction to the script than this one, so I suggest reading it and the comments posted on it. This idea is really in its infancy right now, mostly because the script is only usable by people who have access to Bash v4 (if you don't know what that is, you don't have access to it).

Unfortunately I also don't have access to the script, since my version of Bash isn't new enough. I think it would be great to see this idea ported to a more widely-available platform.

What I like about this idea is that it moves a step closer to providing a custom-tailored learning experience. The ideal learning program incorporates strategic repetition of newly learned vocabulary (e.g. Pimsleur, Rosetta Stone). The disadvantage of a packaged product is that it is very structured and cannot contain features that appeal to everyone. On the other hand, ChinesePod has lessons with more universal appeal. The script referenced in this thread is one way to bridge this gap. In theory, a person who creates a vocabulary list using Pleco can use this script to determine which ChinesePod lessons are most appropriate for learning new vocabulary and reinforcing old vocabulary.

Re: Strategic Lesson Selection for ChinesePod and others

Postby sfrrr » Wed Mar 17, 2010 7:00 pm

Bao--thanks for the explanation. I'm about to look up bash on the web--just to find out a little about what it is. And I'm on my way to the CPod page. Thanks again.

A Progressive Word List

Postby mihobu » Tue Apr 13, 2010 8:48 am

Bao Mingguang, thank you for posting your rankfile.txt. I've been looking for a way to produce a progressive word list (attached), and this helped out a lot. I limited the output to bigrams, but I still don't have a good way to weed out high-frequency non-words! The list introduces characters in order of frequency and, after each one, lists the words (most common first) made up only of characters encountered so far.
Attachments
progressive-wordlist.txt
(607.06 KiB) Downloaded 33 times
mihobu
Xiucai
 
Posts: 15
Joined: Wed Mar 08, 2006 1:28 am
Location: Columbus, Ohio USA

Re: Strategic Lesson Selection for ChinesePod and others

Postby bao mingguang » Fri May 14, 2010 11:32 pm

mihobu,

You're welcome. :-) Do you mean you can't weed out non-words, or you couldn't before downloading my list? AFAIK all the words on my frequency list are actual words. I notice there are 23 words on your list that aren't on mine. I checked a few and found most of them in the dictionary. They don't seem to be very common ones though.

I like your word list. It's a great idea that could serve as the foundation for a very effective system of introducing Chinese vocabulary.
