July 1 (Week 2) – Maury

Today I decided to abandon the “examine the errors in the corpus” approach I was trying earlier and go back to examining parse trees. I spoke to Prof. Medero today, and she suggested that I first create a parse tree for every sentence in the corpus. Then I grab the Penn Treebank annotation for each word in the sentence and build a language model based on the frequencies of those annotations. After that, I can estimate how likely a text’s syntactic structure is to appear, relative to the text in the corpus. That, in turn, might help me judge how difficult a text is for second-language English speakers.
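To make the idea concrete, here is a minimal sketch of a language model over tag sequences. It is only an illustration under several assumptions: the Penn Treebank tags have already been pulled out of the parse trees into plain lists, the model is a simple bigram model with add-one smoothing (the actual model order and smoothing are not decided yet), and the function names and the toy tag lists are made up for the example.

```python
import math
from collections import Counter

def train_tag_bigrams(tagged_sentences):
    """Count tag bigrams, padding each sentence with <s> and </s> markers."""
    context_counts = Counter()   # how often each tag appears as the left context
    bigram_counts = Counter()    # how often each (previous tag, tag) pair appears
    vocab = set()
    for tags in tagged_sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def avg_log_prob(tags, model):
    """Average add-one-smoothed log probability per tag transition."""
    context_counts, bigram_counts, vocab = model
    padded = ["<s>"] + list(tags) + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        total += math.log(prob)
    return total / (len(padded) - 1)

# Toy tag sequences standing in for the corpus (hypothetical data).
corpus_tags = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "NN", "VBD", "DT", "NN"],
    ["PRP", "VBZ", "DT", "NN"],
]
model = train_tag_bigrams(corpus_tags)

familiar = avg_log_prob(["DT", "NN", "VBZ", "JJ"], model)
unusual = avg_log_prob(["JJ", "VBZ", "DT", "DT"], model)
print(familiar > unusual)
```

A higher average log probability means the tag sequence looks more like the corpus. Averaging per transition keeps long sentences from being penalized just for their length.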

So far, I’ve extracted all of the sentences. But I’m running into a problem with Stanford CoreNLP: it keeps timing out when I give it a large number of sentences to process. That was the roadblock at the end of the day, so I decided to stop there and try again on Tuesday.
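One workaround I might try is sending the sentences in smaller batches instead of all at once. The sketch below is only an assumption about how that could look: `annotate` is a hypothetical stand-in for whatever function actually calls the parser, and the batch size of 50 is arbitrary. I haven't confirmed that this fixes the timeouts.

```python
def annotate_in_batches(sentences, annotate, batch_size=50):
    """Call `annotate` on small slices of `sentences` and collect the results.

    `annotate` is a hypothetical callable that takes a list of sentences and
    returns one result per sentence; it stands in for the real parser call.
    """
    results = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        results.extend(annotate(batch))
    return results

# Stand-in annotator so the sketch runs without a parser installed.
def fake_annotate(batch):
    return [len(sentence.split()) for sentence in batch]

sentences = ["This is a test .", "Another sentence here ."] * 60
word_counts = annotate_in_batches(sentences, fake_annotate, batch_size=50)
print(len(word_counts))
```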
