Today I did not make much progress on getting results from Stanford CoreNLP. Yesterday, I gave it a single file containing all 30k sentences in the dataset that I want parsed and left it running overnight, but the process never finished. I figured the file was too big for CoreNLP to handle, so this morning I gave it just 5k sentences instead. It ran for more than 3 hours before CoreNLP quit because it ran out of heap space. I'm not sure where to turn from here, because Knuth is the only powerful machine that I have access to right now. I don't know whether to switch to another piece of software or to troubleshoot why CoreNLP is giving me problems.
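One possible workaround for the heap problem is to split the input into much smaller files, so that each document CoreNLP loads holds only a few hundred sentences. The following is a minimal sketch in Python, assuming the dataset is stored with one sentence per line; the function names, the chunk size of 500, and the output file naming are all hypothetical choices for illustration, not something taken from CoreNLP itself.

```python
from pathlib import Path


def split_into_chunks(input_path, out_dir, chunk_size=500):
    """Split a one-sentence-per-line file into files of at most chunk_size lines.

    Returns the list of chunk file paths in the order they were written.
    The chunk size and file naming scheme are illustrative assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    chunk = []

    def flush():
        # Write the sentences collected so far to the next numbered file.
        path = out_dir / f"chunk_{len(chunk_paths):04d}.txt"
        path.write_text("".join(chunk), encoding="utf-8")
        chunk_paths.append(path)
        chunk.clear()

    with open(input_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # Make sure every sentence ends with exactly one newline.
            chunk.append(line if line.endswith("\n") else line + "\n")
            if len(chunk) == chunk_size:
                flush()
    if chunk:
        flush()  # write the final, possibly shorter, chunk
    return chunk_paths


def write_filelist(chunk_paths, filelist_path):
    """Write one chunk path per line, for tools that accept a list of input files."""
    Path(filelist_path).write_text(
        "".join(f"{p}\n" for p in chunk_paths), encoding="utf-8"
    )
```

The resulting list file could then be handed to CoreNLP through its `-filelist` command-line option, ideally together with a larger Java heap setting such as `-Xmx`. Whether this is enough to stay within Knuth's memory is an open question, since parsing cost also depends on sentence length.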
In other news, I got started on the second part of the pipeline, which is to install SRILM. Unfortunately, I ran into errors here as well. At first, I ran out of disk quota on Knuth while trying to build SRILM, but I easily overcame that by deleting some files from my undergraduate days. Later on, I got an error very similar to this one, where Knuth seems to have only 64-bit libraries installed, while the binary I'd like to compile is 32-bit. This is representative of the kinds of errors that will be hard to solve without superuser access on Knuth. I am not sure what direction to go in here, either.