Today I finally got parses for all of the sentences in the corpus from SRILM. I then converted each parse into a “sentence” by traversing the parse tree in pre-order and printing out the parts of speech. Finally, I fed these “sentences” into SRILM and trained a language model on them. We hope that this language model captures the syntactic complexity of the text with respect to second-language English speakers. To verify that it does capture what we want, we need to compute the model’s perplexity on several other corpora. My task for the next week is to find these corpora, preprocess them, and compute the model’s perplexity across them.
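
For reference, here is a minimal sketch of the parse-to-“sentence” conversion described above. It rests on a few assumptions that are mine rather than taken from the actual pipeline: the parses are Penn-Treebank-style bracketed strings, only the preterminal (part-of-speech) labels are printed and phrase labels such as NP or VP are skipped, and the helper names `parse_bracketed` and `preorder_pos_tags` are made up for illustration.

```python
from typing import List


def parse_bracketed(text: str) -> list:
    """Parse a bracketed tree string into nested lists: [label, child, ...].

    Assumes every node has a label, e.g. "(NP (DT the) (NN cat))".
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node() -> list:
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        node = [tokens[pos]]  # the node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())  # nested subtree
            else:
                node.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return read_node()


def preorder_pos_tags(node: list) -> List[str]:
    """Visit the tree in pre-order and collect part-of-speech labels.

    A node counts as a part-of-speech node when its only child is a word.
    """
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return [label]
    tags: List[str] = []
    for child in children:
        if isinstance(child, list):
            tags.extend(preorder_pos_tags(child))
    return tags


if __name__ == "__main__":
    example = ("(ROOT (S (NP (DT The) (NN cat)) "
               "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))) (. .)))")
    # One line of space-separated tags per original sentence.
    print(" ".join(preorder_pos_tags(parse_bracketed(example))))
```

Writing one such line per sentence gives a plain-text file of the kind SRILM can train on with `ngram-count -text`, and the resulting model can be scored against another file prepared the same way with `ngram -ppl`.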