Today I spent time thinking about yesterday's results, and mainly about how I constructed the text given to the language model. To get yesterday's numbers, I did a pre-order traversal of the syntax trees produced by Stanford CoreNLP. In retrospect, I am not sure that this lets me extract the information I want from the text. So today I looked for research on language models built on information gathered from constituency parses (e.g., parts of speech). I haven't found anything on that yet.
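For reference, here is a minimal sketch of what a pre-order traversal of a constituency parse looks like. It is not my actual pipeline code: it assumes the parse is available as a Penn Treebank style bracketed string, and the helper names `parse_bracketed` and `preorder` are made up for illustration. It also assumes that both the labels and the words are emitted, which is only one of several possible choices.

```python
def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]           # constituent or POS label
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read_node())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]                # a leaf: the word itself
        pos += 1
        return (word, [])

    return read_node()


def preorder(node):
    """Yield each node's label before visiting its children, left to right."""
    label, children = node
    yield label
    for child in children:
        yield from preorder(child)


# Hypothetical example sentence, not taken from my data.
example = "(ROOT (S (NP (DT The) (NN dog)) (VP (VBD barked))))"
tree = parse_bracketed(example)
sequence = list(preorder(tree))
print(" ".join(sequence))
# prints: ROOT S NP DT The NN dog VP VBD barked
```

The flattened sequence interleaves phrase labels, part-of-speech tags, and words, but it drops the bracketing, so the language model never sees where a constituent ends. That may be part of why I am unsure this representation captures what I want.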
I will be away from July 20th until July 22nd, but expect an update with further progress on July 25th!