July 11 (Week 4) – Maury

Today, I downloaded David Kauchak’s corpus of Normal and Simple English Wikipedia. The ultimate goal is to parse the Normal Wikipedia sentences using CoreNLP, write the parses to a file, and evaluate their perplexity using the language model trained on the CIC-FCE corpus that I worked on all of last week. I have extracted the sentences and am ready to run CoreNLP, but unfortunately I am working under a Knuth disk quota constraint. I tried running CoreNLP on my lab computer, but it crashed on the first attempt. So I’m resigned to waiting until my Knuth disk quota has been increased before I continue processing that corpus. Until then, I am doing a bit of investigative work into why I can’t run CoreNLP as a service.
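
For reference, perplexity is the exponential of the average negative log-probability per token. Below is a minimal sketch in Python, assuming the language model can report a natural-log probability for each token of a parse. The `perplexity` function and the sample values are hypothetical illustrations, not the actual evaluation code used for this project.

```python
import math


def perplexity(log_probs):
    """Return the perplexity of a token sequence.

    log_probs: natural-log probabilities, one per token, as a
    language model might assign them (hypothetical input format).
    """
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_neg_log_prob = -sum(log_probs) / len(log_probs)
    # Exponentiating gives the perplexity.
    return math.exp(avg_neg_log_prob)


# Made-up example: a model that gives every token probability 0.25
# is as uncertain as a uniform choice among four options.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

With the made-up uniform input above, the result should be close to 4, which matches the intuition that perplexity measures the effective number of equally likely choices per token. Lower values mean the model finds the text less surprising.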

