July 6 (Week 3) – Maury

Today, I unfortunately did not make much progress on getting results from Stanford CoreNLP. Yesterday, I gave it a single file containing all 30k sentences in the dataset that I want it to parse. I left it running overnight, but the process never finished. I figured the file was too big for CoreNLP to handle, so this morning I gave it just 5k sentences instead. I let it run for 3+ hours, and eventually CoreNLP quit because it ran out of heap space. I’m not quite sure where to turn here, because Knuth is the only powerful machine that I really have access to right now. I’m not sure whether to switch to another piece of software or to troubleshoot why CoreNLP is giving me problems.
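One option short of switching software would be to split the sentence file into much smaller pieces and parse them one at a time, so that no single run has to hold thousands of sentences. The sketch below is only an illustration under assumptions: it assumes the dataset has one sentence per line, and the function names, the `chunk_NNNN.txt` file naming, and the default of 500 sentences per file are all made up for this example rather than anything CoreNLP requires.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size lines."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def split_file(src: str, out_dir: str, chunk_size: int = 500) -> List[Path]:
    """Split src (assumed: one sentence per line) into numbered chunk files.

    The 500-line default is an arbitrary guess, not a tuned value.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with open(src, encoding="utf-8") as f:
        for i, batch in enumerate(chunk_lines(f, chunk_size)):
            path = out / f"chunk_{i:04d}.txt"
            # Lines keep their original newlines, so join without a separator.
            path.write_text("".join(batch), encoding="utf-8")
            written.append(path)
    return written
```

Each resulting file could then be given to CoreNLP as a separate run. Whether that actually avoids the heap error is untested here; raising the JVM's maximum heap with the `-Xmx` option is another thing to try if Knuth has the memory to spare.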

In other news, I got started on the second part of the pipeline, which is to install SRILM. Unfortunately, I ran into errors here as well. At first, I ran out of disk quota on Knuth when trying to build SRILM, but I easily overcame that by deleting some of my old undergraduate files. Then, later on, I got an error very similar to this one, where Knuth seems to only have 64-bit libraries installed for a 32-bit binary that I’d like to compile. This error is representative of the kinds of errors that will be hard to solve without superuser access on Knuth. I am not sure which direction to go in here, either.
