July 11 (Week 4) – Maury

Today, I downloaded David Kauchak’s corpus of Normal and Simple English Wikipedia. The ultimate goal is to parse the Normal Wikipedia sentences with CoreNLP, write the parses to a file, and evaluate their perplexity using the language model trained on the CIC-FCE corpus that I worked on all of last week. I have extracted the sentences and am ready to run CoreNLP, but unfortunately I am working under a disk quota on Knuth… I also tried running CoreNLP on my lab computer, but it crashed on the first attempt. So I’m resigned to waiting until my Knuth disk quota is increased before I continue processing the corpus. Until then, I am doing some investigative work into why I can’t run CoreNLP as a service.
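For reference, here is a minimal sketch of the perplexity step. It assumes the language model can report a natural-log probability ln p(w_i | history) for each token. That input format is an assumption, and the actual CIC-FCE model's output may differ. Perplexity is the exponential of the average negative log-probability over all N tokens: PP = exp(-(1/N) Σ ln p(w_i | history)).

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Compute perplexity from per-token natural-log probabilities.

    Assumption: each entry is ln p(w_i | history) produced by some
    language model; the real model's output format may differ.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
    # Exponentiate to get perplexity.
    return math.exp(avg_neg_log_prob)


def corpus_perplexity(sentences: Sequence[Sequence[float]]) -> float:
    """Pool tokens from every sentence before averaging.

    This weights long sentences more heavily than averaging
    per-sentence perplexities would.
    """
    pooled = [lp for sent in sentences for lp in sent]
    return perplexity(pooled)


if __name__ == "__main__":
    # A model that assigns p = 0.25 to each of four tokens should
    # have a perplexity of about 4.
    print(perplexity([math.log(0.25)] * 4))
```

Pooling tokens across the whole corpus, rather than averaging sentence-level scores, is one common convention. It keeps a handful of very short sentences from dominating the result.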
