July 18 (Week 5) – Maury

Today, I was finally able to evaluate a portion of Kauchak’s Wikipedia corpus, as well as the English Online corpus, with the language model trained on the CIC-FCE corpus I mentioned previously. On the Wikipedia corpus, the perplexity was 27.5864; on the English Online corpus, it was 26.0762. And, as a baseline, the perplexity on the model’s own training set (the CIC-FCE corpus) was 24.8764. The gap between the baseline and the first two corpora makes sense: obviously, the model should not be as surprised by the text it was trained on. Furthermore, it makes sense that the Wikipedia corpus has a slightly higher perplexity than the English Online corpus. The latter corpus was written to accommodate English learners, so the syntactic constructions it uses are less complex. This then translates into a lower perplexity than that of the Wikipedia corpus, which makes no such accommodations.
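For reference, the quantity being compared here is just the exponential of the average negative log-probability the model assigns to each token. Here is a minimal sketch of that computation (my own illustration, not the actual evaluation code; the probabilities are made up):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    n = len(token_log_probs)
    avg_nll = -sum(token_log_probs) / n
    return math.exp(avg_nll)

# Hypothetical per-token probabilities from some language model
log_probs = [math.log(p) for p in [0.1, 0.05, 0.2, 0.08]]
print(round(perplexity(log_probs), 4))  # → 10.5737
```

Equivalently, perplexity is the geometric mean of 1/p over the tokens, which is why a lower value means the model is, on average, less surprised by the text.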

This comparison of perplexities is admittedly a little confusing. The relative differences make sense, but the absolute values do not. What is a “good” perplexity score? How low should a perplexity be before we can consider the model an accurate representation of text for English learners? Do only the relative differences matter? And if so, how large a difference is considered satisfactory?
