July 8 (Week 3) – Maury

Today I finally got parses for every sentence in the corpus from SRILM. I then converted each parse into a “sentence” by traversing the parse tree in pre-order and printing out the part-of-speech tags. Finally, I fed these “sentences” into SRILM and trained a language model on them. We hope that this language model captures the syntactic complexity of the text with respect to second-language English speakers. To verify that it does, we need to find several corpora, preprocess them, and compute the model’s perplexity on each one. That is my task for the next week.
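For illustration, here is a minimal sketch of the parse-to-“sentence” step. It is not the exact script I used. It assumes the parses are in Penn Treebank-style bracketed format, and it treats a node whose children are all words as a part-of-speech node. The function names (`tokenize`, `parse`, `pos_tags_preorder`, `parse_to_pos_sentence`) are made up for this example.

```python
import re


def tokenize(bracketed):
    """Split a bracketed parse into '(' / ')' / symbol tokens."""
    return re.findall(r"\(|\)|[^\s()]+", bracketed)


def parse(tokens):
    """Build a nested (label, children) tree; leaf words stay plain strings."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return node()


def pos_tags_preorder(tree):
    """Visit the tree in pre-order and collect the part-of-speech labels."""
    label, children = tree
    # Assumption: a node whose children are all words is a POS node.
    if all(isinstance(child, str) for child in children):
        return [label]
    tags = []
    for child in children:
        if not isinstance(child, str):
            tags.extend(pos_tags_preorder(child))
    return tags


def parse_to_pos_sentence(bracketed):
    """Turn one bracketed parse into a space-separated POS 'sentence'."""
    return " ".join(pos_tags_preorder(parse(tokenize(bracketed))))


if __name__ == "__main__":
    example = "(S (NP (DT The) (NN dog)) (VP (VBD barked)))"
    print(parse_to_pos_sentence(example))
```

Writing one such line per sentence to a text file gives a corpus of tag sequences that can be used as language model training text in place of the original words.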
