Flashcards for TOCFL (2023), CCCC, TBCL

ivan1

Member
I've parsed the vocabulary from these Taiwanese tests and converted it to flashcards in Pleco's format. They are useful e.g. for seeing term levels, intended part of speech, and sometimes definitions/examples.

The TOCFL vocab was updated a couple of years ago and I haven't yet seen a processed version of the latest list, so here it goes. The current version (2022/2023) has 7517 terms, or 7847 with variants expanded like here. CCCC is essentially a children's variant of TOCFL with just three levels, going up to TOCFL L2 / CEFR A2; it has 1197 terms, or 1344 with variants.

CCCC and TBCL have basic definitions and part-of-speech tags up to level 3, and TBCL additionally has many example compounds and sentences. The TOCFL list provides only POS tags, no definitions, but has them for terms at all levels.

Note: level numbering differs slightly between the tests: TOCFL n ≈ TBCL n+1 ≈ CCCC n+1.

Sources:

Part-of-speech tags should be pretty self-explanatory, except that "Vs" may be unfamiliar to some: Taiwanese linguists typically categorize adjectives as a type of verb, the stative or state verb (Vs), because of how they work in Chinese grammar. For more details see https://tocfl.edu.tw/assets/files/vocabulary/8000_description_202204.pdf

TBCL has a nice grammar points list here. There's not really much to parse, since it's already in a readily usable format, so I'm just leaving the link:
 

Attachments

  • tocfl-pleco.txt
    435.3 KB · Views: 183
  • cccc-pleco.txt
    92.5 KB · Views: 115
  • tbcl-pleco.txt
    1.1 MB · Views: 172
Thank you for the resources.
I need some clarification, however. What is the relationship between TOCFL and TBCL?
The current TOCFL has 7517 words.
TBCL has 3100 Chinese characters and 14425 words (source: https://coct.naer.edu.tw/TBCL/).

1. Is it correct to assume that TOCFL compiled their list from TBCL?

2. By any chance, do you know the total number of Chinese characters that are part of those 7517 words? My guess is that it will be the TBCL's 3100 or fewer.
 

ivan1

Member

TBCL was a 2014 effort to create a set of reference guidelines for conducting Chinese language tests. It's not a test (unlike TOCFL), just guidelines, such as a list of words split by level and a list of grammar points.

TOCFL predates it by quite a while; it was established in 2005 or so. They have revised their wordlist heavily over the years, I suppose with feedback from TBCL, but it is not identical. In particular, the TOCFL wordlist is quite heavily curated, whereas TBCL, especially at higher levels, seems to me quite a bit rougher: derived with a lot of automation and less reviewing/curation. They also haven't revised it since; it was just a one-off effort.

>2. By any chance, do you know the total number of Chinese characters that are part of those 7517 words? My guess is that it will be the TBCL's 3100 or fewer.

Yes, 2563. I actually have a list here:
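For anyone who wants to reproduce a count like this from a word list, here is a minimal sketch. It is not the script used for the list above; the function name `unique_characters` is hypothetical, the input is assumed to be a plain sequence of words, and only the basic CJK Unified Ideographs block is treated as a Chinese character.

```python
def unique_characters(words):
    """Collect the distinct Chinese characters used across a word list.

    Assumption: only the basic CJK Unified Ideographs block
    (U+4E00..U+9FFF) counts as a character; anything else is skipped.
    """
    chars = set()
    for word in words:
        chars.update(ch for ch in word if "\u4e00" <= ch <= "\u9fff")
    return chars


# Tiny made-up sample, not taken from the real TOCFL list.
sample = ["學生", "學校", "老師"]
print(len(unique_characters(sample)))  # 5 distinct characters
```

Running the same function over the headword column of the full list would give the total character count.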
 
Wow. What can I say but thank you once again!

My current game plan is to make audio sentence flashcards for each of the 7517 TOCFL words. Once that's done and thoroughly reviewed, I'll start adding the remaining TBCL words (14425 - 7517 = 6908), which is kind of crazy when you think about just how many there are. That's almost another potential TOCFL test.
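The subtraction above assumes every TOCFL word also appears in TBCL, which the thread does not confirm. A minimal sketch of checking the actual overlap, assuming both lists are available as plain sequences of words (the function name `remaining_words` is hypothetical):

```python
def remaining_words(tbcl_words, tocfl_words):
    """Compare two word lists.

    Returns a pair of sets:
      - words that are in TBCL but not in TOCFL (still left to study)
      - words that are in TOCFL but not in TBCL (if any)
    """
    tbcl, tocfl = set(tbcl_words), set(tocfl_words)
    return tbcl - tocfl, tocfl - tbcl


# Tiny made-up sample, not real list contents.
left_to_study, tocfl_only = remaining_words(["一", "二", "三"], ["二", "四"])
print(len(left_to_study), len(tocfl_only))  # 2 1
```

If the second set comes back non-empty, the real number of remaining TBCL words is larger than the simple difference of the two totals.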

Thankfully, already knowing all of the 2136 Japanese jōyō kanji should make this process a bit smoother and more encouraging.

By the way, you may already know this, but I located Taiwan's own official character list and it numbers 4808 (source: https://en.wikipedia.org/wiki/Chart_of_Standard_Forms_of_Common_National_Characters). This means the average Taiwanese person completing high school is [supposedly] able to understand 2245 more characters than a foreigner who has passed the highest TOCFL level. That 4808 list also outnumbers TBCL's 3100 by 1708 (I'm aware that a bunch will have a lower frequency in daily use).

This is not a request because your time is valuable, but I plan to print out a poster of said 4808 characters. Ideally, it would be great if I could have them listed in order of frequency. Do you know of any existing website that can rearrange them in such a fashion? Their order seems to be random at first glance: https://language.moe.gov.tw/001/Upload/Files/site_content/download/mandr/教育部4808個常用字.pdf An alternative would be to find the character list segmented by grade (if it even exists), because Japan actually does that.
 

ivan1

Member
Oh, it looks like the tool can only generate word frequencies... but you could download the whole word-frequency list and massage out per-character frequencies with a bit of scripting.
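A minimal sketch of that kind of script, under stated assumptions: the downloaded list is assumed to be tab-separated lines of `word<TAB>count` (the real export layout may differ), a character's frequency is taken as the sum of the counts of every word it occurs in, and all function names are hypothetical.

```python
from collections import Counter


def is_han(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def parse_word_counts(lines):
    """Read 'word<TAB>count' lines (assumed layout) into a dict."""
    counts = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[1].strip().isdigit():
            continue  # skip headers and malformed rows
        counts[parts[0]] = counts.get(parts[0], 0) + int(parts[1])
    return counts


def char_frequencies(word_counts):
    """Add each word's count to every Chinese character it contains."""
    totals = Counter()
    for word, count in word_counts.items():
        for ch in word:
            if is_han(ch):
                totals[ch] += count
    return totals


def sort_by_frequency(chars, totals):
    """Order characters from most to least frequent; unseen ones go last."""
    return sorted(chars, key=lambda ch: -totals[ch])


# Tiny made-up sample, not real frequency data.
lines = ["word\tcount", "我們\t10", "我\t5", "媽媽\t3"]
totals = char_frequencies(parse_word_counts(lines))
print(sort_by_frequency("字媽們我", totals))  # ['我', '們', '媽', '字']
```

To order the 4808-character list, the characters from that list would be passed as `chars`, and the lines would come from the downloaded frequency file opened with `encoding="utf-8"`.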
 
I see. I know nothing about coding, but that's something to think about when I'm closing in on that phase of studies. Thanks for the suggestion.
 

ivan1

Member
Try asking ChatGPT. It can certainly whip up a simple program like that and even run it for you on a file you point it to.
 
Oh! I tried using it to compare the two lists, but at least the free version didn't return a response because of the length. But asking it to write a program is clever; I didn't think of that.
 

scarbear

Member
Thanks so much for these. This is a stupid question, but how do I import these as flashcards? I tried the “Share” function in iOS, as well as opening it in Reader, but both give me gibberish.
 