Randomly Ordered Sentences Game

Shun

状元
Dear all, dear @leguan,

I've just had a promising idea for a new way of learning sentence structure using Pleco Flashcards and the Tatoeba sentences. It reminds me of similar questions in IQ tests. The game is simple:

You are shown a sentence in your language and the corresponding Chinese sentence with its characters in random order. Your task is to figure out the correct order, that is, the sentence in the headword (Self-graded test).
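The scrambling step described above can be sketched in a few lines of Python. This is only an illustration, not the script attached later in the thread: the function name `scramble_characters` and the rule of reshuffling until the result differs from the original are assumptions.

```python
import random


def scramble_characters(sentence, rng=None):
    """Return the characters of `sentence` in a random order.

    Hypothetical helper: reshuffles until the result differs from the
    original, so the card never shows the answer by accident.
    """
    rng = rng or random.Random()
    chars = list(sentence)
    # With fewer than two distinct characters, every order looks the same.
    if len(set(chars)) < 2:
        return sentence
    while True:
        rng.shuffle(chars)
        scrambled = "".join(chars)
        if scrambled != sentence:
            return scrambled


if __name__ == "__main__":
    print(scramble_characters("他喜欢学汉语"))
```

The output always contains exactly the characters of the input, punctuation included, so it can be checked by comparing the sorted characters of both strings.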

I'll program it using Python soon. Expect the flashcards soon.

Best,

Shun
 
  • Love
Reactions: JD

Shun

状元
Dear all,

Thanks, JD! :) I'm attaching it here for you to try out; I'll add a longer description later. Here are the usage instructions:

- Import the file "order changer hsk1-6.txt" into your flashcards using the following settings:

Import_settings.png

- Study the categories using Test type Self-graded, Show: Definition. Under Display, turn off the option Filter head in defns if you have it enabled.

You have to reconstruct the correct original sentence from the scrambled sentence. You can use the English translation as a clue.

Pleco 4.0 will of course offer Fill-in-the-blanks for headwords longer than 4 characters. At the moment, for optimal results, I suggest that you write the sentence as you would put it together in Hanzi on a sheet of chequered paper. This way, you will think about it the most, and you will learn the most from it.

The benefits of this learning game, as I see them, are:
  • The user learns to form grammatically correct sentences, recognizing important constructions in doing so.
  • The user increases their Hanzi reading speed.
  • The user learns to write characters quickly, and to recall the pinyin from the characters (or the characters from the pinyin) before the answer is shown.
  • If the user is experienced enough, they can even guess at the possible sentence meanings without looking at the English translation.
We still need to improve the HSK rating algorithm, but for now, it works acceptably well. I've limited the number of sentences per HSK level to 500, and the sentence lengths to 15 characters for now.
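The two limits mentioned above (500 sentences per HSK level, 15 characters per sentence) could be applied roughly as follows. This is a sketch under assumptions: the input is taken to be already rated `(hsk_level, chinese, english)` tuples, the length is counted including punctuation, and `select_sentences` is a made-up name.

```python
from collections import defaultdict

MAX_PER_LEVEL = 500  # cap per HSK level mentioned in the post
MAX_LENGTH = 15      # maximum sentence length in characters


def select_sentences(rated, max_per_level=MAX_PER_LEVEL, max_length=MAX_LENGTH):
    """Keep at most `max_per_level` short sentences for each HSK level.

    `rated` is an iterable of (hsk_level, chinese, english) tuples.
    Returns a dict mapping each level to a list of (chinese, english).
    """
    selected = defaultdict(list)
    for level, chinese, english in rated:
        if len(chinese) > max_length:
            continue  # sentence too long for this set
        if len(selected[level]) >= max_per_level:
            continue  # this level is already full
        selected[level].append((chinese, english))
    return dict(selected)
```

Sentences are kept in input order here, so shuffling the input beforehand would give a random selection per level instead of the first 500.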

I attach the Python source code. It needs some additional data files to run (HSK levels and BCC frequency lists), so it's mostly for informational purposes.

Attribution: The sentences were taken from the Tatoeba project at tatoeba.org.

You’re invited to leave feedback and suggestions for improvement.

Enjoy!

Shun
 


Shun

状元
Hi Weyland,

Yes, it should work for any sort of construction. You just have to recognize it and reconstruct it. Have fun!

Cheers,

Shun
 

Weyland

进士
As far as I know, the current Pleco won't save grammar constructions that have a "..." in between. I assume it "should" work in the sense that you're given a sentence with
既 。。。且 。。。
but with only the 既 removed for you to fill in.
 

Shun

状元
The Tatoeba example sentences sometimes have ellipses in them, too (written as "...", I think). The randomly ordered version of a sentence such as this:

他喜欢学汉语...

could be:

欢.他语..喜汉学

As you can see, all the characters from the original sentence are there; your job is just to reconstruct it. Since it only works with the Self-graded test type right now, you can also just ignore the ellipses and tap the "mark correct" button if you got the Hanzi right. Pleco accepts any text in any of its fields. I suggest you try it out, I'm already addicted! :)

Cheers, Shun
 

Weyland

进士
Shun said: "I suggest you try it out, I'm already addicted! :)"
Maybe. I have already passed HSK6, so if I do, it will probably include either the collection of 成语 we made, the new HSK word list, or the 普通话水平测试 (PSC) word list (were you able to get access to that yet? If not, send me a PM).

Until now I have been using the app 必胜公考 to improve my Chinese, and honestly, it's a humbling experience. It also has fill-in-the-blanks (填空) sections. But since some of the 成语 rank beyond 4000 on the BCC frequency list, and the lexical words are sometimes not even part of the PSC list, I'm still depending on Pleco's OCR function to guide me through every question.

Screenshot_20200611-225107.jpg
 

Shun

状元
Hi Weyland,

I've sent you a PM; I'd love to have a look at your lists. Thank you! I also have the more difficult sentences from the thread "18,896 HSK sentences". I will use those to create a more challenging game, though I think the ones here are already plenty challenging, because sentence formation, and language production more generally, is usually most learners' weak spot.

The level of the text in the screenshot you posted looks nice. It's a good level, definitely somewhat above HSK 6. You can tell from it what a mighty language Chinese is; it takes a long time to get a good feel for all its 成语.

Cheers,

Shun
 

leguan

探花
This is indeed a very exciting new development! Thank you for your great idea and implementation, Shun!

I'm already very much enjoying practicing with your sets, and I turn on the Reveal Parts Separately option in the Test Settings so that I can display the correctly ordered pinyin on demand.

I'll play with it a bit more and give you more feedback later, after thinking about what could be improved.

Best regards
leguan
 

Shun

状元
Love it, it's great to hear you've been bitten by this bug, too! :) You're welcome!

I can upload the full sets and other languages soon. I'll try to improve on the HSK rating by averaging the HSK and BCC scores and giving the BCC score more weight, since that one is more complete with its greater number of words.
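The weighting idea could look roughly like this. It is only a sketch: it assumes both scores are already on the same scale (for example 1 to 6), and the function name `combined_level` and the default weight of 0.7 are placeholders, not values from the actual script.

```python
def combined_level(hsk_score, bcc_score, bcc_weight=0.7):
    """Blend an HSK-based and a BCC-based difficulty score.

    `hsk_score` may be None when a word is missing from the HSK lists;
    in that case only the BCC score is used. Both scores are assumed to
    be on the same scale.
    """
    if not 0.0 <= bcc_weight <= 1.0:
        raise ValueError("bcc_weight must be between 0 and 1")
    if hsk_score is None:
        return float(bcc_score)
    # Weighted average: the BCC score counts more when bcc_weight > 0.5.
    return (1.0 - bcc_weight) * hsk_score + bcc_weight * bcc_score
```

Falling back to the BCC score alone is one way to use the fact that the BCC list covers more words than the HSK lists.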

I'm curious to hear your suggestions for improvement.

Best regards,

Shun
 

leguan

探花
Hi Shun,
One idea I have is that the scrambled Chinese characters could be placed in the Mandarin Pronunciation field instead of in the Definition field along with the English. That way one could choose to show the scrambled Chinese characters as a hint only after thinking about how to translate the sentence first.

Not having the Pinyin available as a hint would probably make the flashcards less useful for practicing recall and writing of Chinese characters, though. However, if the goal were more the (oral) production of sentences, then having the scrambled Chinese characters as an on-demand hint might even be preferable.
What do you think?
 

Shun

状元
Sounds very good! I'll be able to answer by this evening, European time.
 

leguan

探花
Great!
I have one more idea: if the goal of having the user unscramble the sentence is to have them practice putting words into the correct order, then it might be preferable to scramble by words rather than by individual characters. Scrambling by words rather than characters means less detective work for the user to figure out what words are in the scrambled sentence. I think such detective work might not be worth the extra time spent on it. What do you think?
 

leguan

探花
Yes, that would be great!
After thinking about and practicing with your flashcards more, I now think both approaches (scrambling by words and by characters) are probably very useful for our Chinese studies, and that more actual practice with both types of cards may be necessary to get a good feel for their respective advantages and disadvantages.

Edit: This would also seem to apply to the two different types of flashcards, where the scrambled Chinese characters appear either in the Definition field with the English or separately in the Pinyin field. Of course, this would mean that a total of four different types of flashcards might need to be made available to suit each user's preferences. ;) :D
 

mikelove

皇帝
Staff member
FWIW, 4.0 also allows much more specificity with multi-choice questions; you can use a custom field for the multi-choice answer and can also link each specific card to a set of cards to draw its incorrect choices from.
 

Shun

状元
@mikelove This sounds impressive, many thanks! Let's stress that 4.0 will truly be an all-new app (at least under the hood). I'm very much looking forward to it.

@leguan I started with your first idea concerning the fields. I attach the full sentence files (not capped at 500 sentences per HSK level anymore) in both arrangements. For the first arrangement, it is necessary to select Fill in missing fields under Import to fill in the pinyin based on the dictionaries you have installed. This may take a while. The second arrangement doesn't require this, as all three fields are already filled with something.
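For anyone who wants to build such files themselves, the two arrangements could be produced roughly like this. The sketch assumes a tab-separated headword / pronunciation / definition line for import; the function name `card_line`, the arrangement labels, and the single space between scrambled text and translation are made up for illustration.

```python
def card_line(original, scrambled, translation, arrangement):
    """Build one tab-separated flashcard line.

    arrangement "definition":    scrambled text shares the definition field
                                 with the translation; the pronunciation is
                                 left empty so the importer can fill it in.
    arrangement "pronunciation": scrambled text sits in the pronunciation
                                 field and the definition holds only the
                                 translation.
    """
    if arrangement == "definition":
        fields = (original, "", f"{scrambled} {translation}")
    elif arrangement == "pronunciation":
        fields = (original, scrambled, translation)
    else:
        raise ValueError(f"unknown arrangement: {arrangement!r}")
    return "\t".join(fields)
```

The empty pronunciation field in the first arrangement is what makes the Fill in missing fields import option necessary there.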

Reordering of whole words will be next. I'm just not feeling too well health-wise at the moment (not Corona-related), but it will definitely come.

I'll try out both systems and tell you how I fared.

Best regards,

Shun
 


Shun

状元
Dear @leguan, dear @Weyland,

I can confirm that leguan's arrangement of fields:

- Headword: The correctly ordered sentence.
- Pronunciation: The sentence in randomized order
- Definition: The translation of the sentence in a Western language

works really well. I do prefer it to my original arrangement. To give a simple example of how this learning technique can be used, I attach some notes from a relatively easy learning session (on an iPad or Surface tablet). At the beginning, you just write down all the Chinese words that come to mind for translating the English sentence, perhaps already forming a complete Chinese sentence. In the second step, you reveal the pronunciation field with the sentence out of order, so you can complete your guess with any words you didn't recall yet and try to get them in the right order. At the end, you reveal the headword to check whether your guess was correct. You mark any corrections with a different-colored pen, something like this:

Page 1.jpg
Page 2.jpg

So this also works as a study history, allowing you to review a day later what mistakes you made, and reinforcing your learning points.

Best,

Shun
 

Shun

状元
Hi all,

I enhanced my Python script to cleanly separate sentences into shuffled 1- to 4-character words (see attachment). I had to use a separate array to denote which parts of a sentence were already taken and which weren't, so I could spread the unknown characters between the sentence parts it did recognize.
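The general idea can be sketched as follows. This is a simplified reconstruction, not the attached script: it assumes a plain set of known words as the vocabulary, matches longer words first, and the names `segment` and `scramble_words` are invented for the example.

```python
import random


def segment(sentence, vocabulary, max_len=4):
    """Split `sentence` into known words (longest first) and single characters."""
    taken = [False] * len(sentence)  # marks positions already claimed by a word
    found = []                       # (start index, token) pairs
    for length in range(max_len, 1, -1):  # try 4-, 3-, then 2-character words
        for start in range(len(sentence) - length + 1):
            end = start + length
            if any(taken[start:end]):
                continue
            if sentence[start:end] in vocabulary:
                found.append((start, sentence[start:end]))
                taken[start:end] = [True] * length
    # Characters not covered by any known word become one-character tokens.
    for index, char in enumerate(sentence):
        if not taken[index]:
            found.append((index, char))
    return [token for _, token in sorted(found)]


def scramble_words(sentence, vocabulary, rng=None):
    """Return the tokens of `sentence` in random order, separated by spaces."""
    rng = rng or random.Random()
    tokens = segment(sentence, vocabulary)
    rng.shuffle(tokens)
    return " ".join(tokens)
```

With a vocabulary containing 听到, 有人, 名字, 时候, 正在 and 昏昏欲睡, `segment` splits 他听到有人叫他的名字的时候,正在昏昏欲睡。 into 13 tokens, with the punctuation marks and the unmatched characters as single-character tokens.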

I increased the sentence length limit to 40 characters for Tatoeba, now that it has become easier to recognize its parts when words stay together. Here's an example of a card:

Original sentence: 他听到有人叫他的名字的时候,正在昏昏欲睡。
Scrambled sentence: , 他 的 正在 的 有人 他 昏昏欲睡 。 名字 时候 听到 叫
English translation: He was about to fall asleep, when he heard his name called.
HSK level: 5


I also used the more difficult wordlist with example sentences from dict.cn and applied the same algorithm to it. I like the results a lot so far. As a learner, one can handle much longer sentences now that the words stay together. Only sometimes does the word recognition produce unexpected results, but these cases are easy to handle for humans.

I think both sentence sets work great like this. Your feedback is welcome. Next, I'd like to improve the HSK rating algorithm.

Enjoy,

Shun
 
