How does Pleco rank words by frequency?

Hello everyone,

I apologize in advance if this question has already been asked; I somehow couldn't find an answer. :confused:
How does the word frequency ranking work in the Pleco dictionary ("Words" tab)?

I understand that words from all the available dictionaries are consolidated, so how is it possible to create a reliable (if not exhaustive) frequency rank for all these words?

Thanks in advance,
Thanks Mike, this is great.

I don't know to what extent you can get into the details, but could you let us know which corpus is used to compute the frequency numbers? I mean, does it come from scraping websites, or from books, articles, movies, etc.?


Staff member
It's an aggregate of a bunch of different corpora, actually - in general we were going for breadth rather than precision, since we're not currently utilizing this frequency data for something where every position counts, like, say, designing a study curriculum.


So the frequency table is precomputed? Are there any plans to do online scoring when the number of results is small enough (e.g. <10), based on the dictionaries that each result belongs to? Maybe the user's dictionary order could be used as a weight parameter.

I occasionally see less-than-ideal ranking when looking up 成语. For example, for the query "qireny", I get the following results:
1st: qirenyidou ABC
2nd: qirenyishi MOE
3rd: qirenyoutian CY GF MOE LMA HDA GH OCCABC PLC

I would expect the 3rd one to be the 1st result.
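
The re-ranking idea above could be sketched roughly as follows in Python. This is only an illustration of the suggestion, not a description of how Pleco works: the `Result` class, the `rerank` function, the linear weights, the threshold of 10, and the sample dictionary order are all hypothetical. The dictionary labels are copied as shown in the result list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    pinyin: str
    dictionaries: List[str]  # labels of the dictionaries containing this entry


def rerank(results: List[Result], user_dict_order: List[str],
           max_results: int = 10) -> List[Result]:
    """Re-rank a small result set by weighted dictionary coverage.

    Hypothetical scoring: a dictionary at position i in the user's order
    gets weight (n - i); dictionaries not in the order get weight 0.
    Larger result sets are returned unchanged (precomputed order kept).
    """
    if len(results) >= max_results:
        return list(results)
    n = len(user_dict_order)
    weights = {name: n - i for i, name in enumerate(user_dict_order)}

    def score(r: Result) -> int:
        return sum(weights.get(d, 0) for d in r.dictionaries)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(results, key=score, reverse=True)


# The "qireny" example, in the order currently returned.
results = [
    Result("qirenyidou", ["ABC"]),
    Result("qirenyishi", ["MOE"]),
    Result("qirenyoutian",
           ["CY", "GF", "MOE", "LMA", "HDA", "GH", "OCCABC", "PLC"]),
]

# Assumed user dictionary order, made up for this example.
user_dict_order = ["PLC", "ABC", "MOE", "GF", "CY"]

ranked = rerank(results, user_dict_order)
print([r.pinyin for r in ranked])
```

With this made-up dictionary order, the entry that appears in the most (and highest-priority) dictionaries moves to the top, which is the behaviour I was hoping for. A real implementation would probably need to combine this score with the existing frequency data rather than replace it.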