Today, I completed work on modifying the Bash script. It can now take in a configuration file that sets the parameters of the language models I want to create. It then builds all of the language models, computes their perplexity on a common test set, and prints out a table that compares the perplexities of each model. Whew!

Now that that is done, I can turn my attention to combining the “general” corpus and the “by-learners” corpus (whether through interpolation or something else) to see if I can lower the perplexity even further. As an exercise, I am going to start by recreating SRILM’s linear interpolation. To that end, I have begun looking at the Python module arpa, which lets me parse the language models built from each corpus. Then, I can hopefully linearly interpolate the two language models. I should have an update on that tomorrow.
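As a rough sketch of the idea I'm chasing: linear interpolation just takes a weighted average of the two models' probabilities for each n-gram, p(w) = λ·p_general(w) + (1−λ)·p_learners(w). The corpus names, toy probabilities, and λ below are made up for illustration; in practice the probabilities would come from the ARPA files parsed with the arpa module.

```python
def interpolate(p_general, p_learners, lam):
    """Linearly interpolate two probability tables.

    p_general, p_learners: dicts mapping an n-gram (here just a word)
    to its probability under each model -- stand-ins for values read
    out of the parsed ARPA files.
    lam: interpolation weight given to the "general" model.
    """
    vocab = set(p_general) | set(p_learners)
    return {
        w: lam * p_general.get(w, 0.0) + (1 - lam) * p_learners.get(w, 0.0)
        for w in vocab
    }

# Toy unigram probabilities -- purely illustrative, not from real models.
general = {"the": 0.5, "cat": 0.3, "sat": 0.2}
learners = {"the": 0.4, "cat": 0.4, "mat": 0.2}

mixed = interpolate(general, learners, lam=0.6)
print(round(mixed["the"], 2))          # 0.6*0.5 + 0.4*0.4 -> 0.46
print(round(sum(mixed.values()), 2))   # still a valid distribution -> 1.0
```

The nice property is that if both input distributions sum to 1, the mixture does too, which is what makes it a drop-in replacement when computing perplexity.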