Week 11, November 16-20 – Maury

This week continues my work on tuning the baseline feature functions for Moses.

My first action item was finishing the creation of the baseline feature functions. In a previous meeting, Prof. Medero suggested creating two feature functions: one that penalizes for the number of characters in each word, and another that penalizes for the length of the current hypothesis. I have since implemented both, in this GitHub commit. Depending on the weight I give these feature functions, the output changes (which is good!).
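The two penalties can be sketched as plain scoring functions. This is a minimal standalone illustration of the idea, not the actual Moses FeatureFunction implementation from the commit:

```cpp
#include <string>
#include <vector>

// Word-length penalty: total characters across all words in the hypothesis,
// so hypotheses full of long (presumably complex) words score worse.
double wordLengthPenalty(const std::vector<std::string>& words) {
    double total = 0.0;
    for (const std::string& w : words) total += static_cast<double>(w.size());
    return -total;  // negative: higher scores are better in the decoder's model
}

// Hypothesis-length penalty: one unit per word, favoring shorter hypotheses.
double hypothesisLengthPenalty(const std::vector<std::string>& words) {
    return -static_cast<double>(words.size());
}
```

Both return negative values so that, like Moses's other features, a larger score is always better.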

Recall this sample input sentence from last week:

the first and broadest sense of art is the one that has remained closest to the older latin meaning , which roughly translates to " skill " or " craft , " and also from an indo-european root meaning " arrangement " or " to arrange " .

Ideally, I want this simplified output:

the first and broadest sense of " art " means " arrangement " or " to arrange . "

And without the feature functions (i.e., just the translation/language model), Moses gives me:

the first and broader sense of art is the one that has remained closer to the older latin meaning , which roughly translates|UNK|UNK|UNK " to skill|UNK|UNK|UNK or " craft , and also from an indo-european root meaning " arrangement " or " to arrange|UNK|UNK|UNK " . " " to roughly|UNK|UNK|UNK

where the differences from the input are in red, missing words are italicized, and extraneous words are bolded. Besides the change to broader/closer, not much has changed. The output is still the same length, and no words have been simplified. Introducing the feature functions with the same weight as the translation/language model changes the output. I instead get:

the first and broader sense of art is the one that has remained closer to the older latin meaning , which roughly translates to skill|UNK|UNK|UNK or craft ; and also from an indo-european root " or " to arrange|UNK|UNK|UNK " . meaning " arrangement or to arrange " " " " roughly|UNK|UNK|UNK latin meaning which translates|UNK|UNK|UNK indo-european

Clearly the feature functions have an effect here. If we place significantly more weight on the feature functions (e.g., weighting them 100 times more than the other features), we get completely nonsensical output such as this:

, , . to or an or to to the of art is the one and the has and also from root that older latin which skill|UNK|UNK|UNK craft first widest sense " " " " " " " " arrange|UNK|UNK|UNK meaning roughly|UNK|UNK|UNK meaning closest remained translates|UNK|UNK|UNK indo-european arrangement

which I did not bother differentiating from the input, because the two are simply too different. Notice the preference for short words, especially at the beginning.

Clearly, some tweaking of the weights needs to be done here. This will matter even more once I have a more meaningful feature function; maybe I can use the weights as a way to tell Moses how simple I want a translation to be? This is something I definitely want to take on after I create more meaningful feature functions. It is a major task of mine for the rest of the semester (which ends in mid-December).
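To see why the 100x weighting degrades the output, it helps to remember that Moses combines features linearly: each feature score is multiplied by its weight and summed. A small sketch of that combination (simplified; the real decoder works over score component collections):

```cpp
#include <cstddef>
#include <vector>

// Log-linear-style combination: each feature value f_i is multiplied by its
// tuned weight w_i and summed. Scaling the penalty weights way up (e.g., 100x
// the other features) lets the penalties drown out the translation and
// language model scores, which produces the nonsensical short-word output.
double combinedScore(const std::vector<double>& features,
                     const std::vector<double>& weights) {
    double score = 0.0;
    for (std::size_t i = 0; i < features.size(); ++i)
        score += weights[i] * features[i];
    return score;
}
```

With this framing, tuning "how simple" the output should be amounts to sliding the penalty weights relative to the translation/language model weights.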

The next task I took on this past week was expanding the Moses translation phrase table. I may have mentioned previously that I am using the XS (eXtra-Small) PPDB datasets of Lexical Paraphrases, Lexical Identities, Phrasal Paraphrases, and Phrasal Identities. I have now reached a point where I want to expand the phrase table, so I downloaded the equivalent PPDB datasets of size L (Large) instead. Even though preprocessing and Moses phrase table loading took much longer than they did with the XS size, I hope the gains in performance will justify switching datasets. In the future, I expect to look at Moses phrase table loading optimizations, to speed things up as much as I can.

Lastly, I was tasked with investigating the creation of a new language model using SRILM, but did not get to it. For that, I would need a corpus to build the model with, which I do not have yet. So I will put that on the back burner.

Week 10, November 9-November 13 – Maury

During this past week I continued my investigation into Moses. More specifically, I analyzed my results from last week, coded up a simple feature function, and scoured the web for more information about Moses.

Early in the week, I found that the simplicity (i.e., small size) of our language model is starting to show in subtle ways. Prof. Julie and I were looking at the output from the system I got up and running last week, and we found a sentence from the PWKP corpus that illustrates the problems a simple language model can trigger. The input sentence was:

the first and broadest sense of art is the one that has remained closest to the older latin meaning , which roughly translates to " skill " or " craft , " and also from an indo-european root meaning " arrangement " or " to arrange " .

And the corresponding output sentence from Moses is:

the first and broader sense of art is the one that has remained closer to the older latin meaning , which translates " skill or " craft , and also from an indo-european root meaning " arrangement " or " to arrange " . " " to roughly

Just for reference, this represents approximately what we are looking for:

the first and broadest sense of " art " means " arrangement " or " to arrange . "

Notice that Moses changed “broadest” to “broader.” After a bit of clever sleuthing by Prof. Julie, it turns out this happened because the language model contained the bigram “and broader” but not “and broadest”! So Moses ranked the former translation higher during phrase translation, and it found its way into the output. Obviously this is not ideal. Therefore, to increase the quality of our output, creating a better language model is a priority on our list. To get the data required for that, Prof. Julie mentioned getting the ball rolling on a membership to the Linguistic Data Consortium. So, hopefully in a few weeks, I can implement a “smarter” language model.
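The failure mode is easy to reproduce with a toy bigram scorer. In the sketch below (illustrative only; the back-off penalty of -10.0 is an assumed value, not what our model uses), any bigram missing from the model gets a harsh score, so “and broader” beats “and broadest” whenever only the former was seen in training:

```cpp
#include <map>
#include <string>
#include <utility>

// Toy bigram language model lookup: known bigrams return their stored
// log-probability; unseen bigrams fall back to a harsh fixed penalty.
double bigramLogProb(const std::map<std::pair<std::string, std::string>, double>& lm,
                     const std::string& w1, const std::string& w2) {
    auto it = lm.find(std::make_pair(w1, w2));
    if (it != lm.end()) return it->second;
    return -10.0;  // assumed back-off penalty for bigrams the model never saw
}
```

A bigger training corpus makes gaps like the missing “and broadest” far less likely, which is exactly why the LDC data matters.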

After thinking about the language model, I turned to other issues. Last week, I mentioned standardizing the way that my PPDB parsing script and the source translation escape special characters, such as single-quote, double-quote, and ampersand. That has now been fixed, so Moses no longer thinks phrases such as “&quot;” are missing from the phrase table (when they are actually there).

That fix put me in a position to finally develop a baseline FeatureFunction class. The Moses website has a great tutorial on this, which I was able to follow without much difficulty. It took a lot less time than I thought it would! I was able to create a feature that penalizes hypotheses with a larger number of words. This changed Moses’s output from what I had before to:

the first and broader sense of art is the one that has remained closer to the meaning , craft , and also from an indo-european root meaning " arrangement " or " to arrange " " " " or roughly which translates to older latin " skill

Since the translation is different, we know the feature is definitely being incorporated into the translation! After this breakthrough, I feel like the work will require even more innovation on my part. In the upcoming weeks, I have to figure out what data is available to me, and how best to use it to create an effective quantitative measure of difficulty.
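The plug-in pattern behind this can be sketched as a tiny class hierarchy. The names below are simplified stand-ins, not the real Moses API (which has its own base classes and evaluation hooks): the decoder calls an evaluation method on each hypothesis and folds the returned score into the hypothesis total.

```cpp
#include <string>
#include <vector>

// Illustrative base class: each feature exposes one scoring hook that the
// decoder invokes per hypothesis. (Names are hypothetical, not Moses's own.)
struct FeatureFunction {
    virtual ~FeatureFunction() = default;
    virtual double Evaluate(const std::vector<std::string>& hypothesis) const = 0;
};

// The baseline feature described above: penalize hypotheses with more words.
struct WordCountPenalty : FeatureFunction {
    double Evaluate(const std::vector<std::string>& hypothesis) const override {
        return -static_cast<double>(hypothesis.size());
    }
};
```

A future "text difficulty" feature would be just another subclass with a smarter Evaluate.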

Week 9, November 2-November 6 – Maury

This has been a productive week! I have an end-to-end solution working with Moses, using the PPDB database with paraphrases incorporated. It took longer than expected, but I was able to feed Moses the preprocessed first 500 complex sentences in Zhu et al.’s PWKP corpus and get “simplifications” back. Check out last week’s post for an example simplification. There are several issues I still need to address, though, regarding the PPDB database. Here are some of the ones I plan to fix in the future:

  • I need to standardize the way that I preprocess the phrases in the PPDB database file and the source translation I give to Moses.
  • I need to find a way to incorporate syntactical and one-to-many paraphrases. This involves the task of assigning probabilities to identity paraphrases.
  • I need to increase the number of probabilities I give Moses for each phrase simplification. Right now, I am only giving Moses the probability of a source (complex) phrase given a target (simple) phrase. However, the Moses phrase table specification includes three more probabilities that could increase the accuracy of translations, so I would really like to find a way to incorporate them.
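For the last point, a full phrase table line would carry all four scores. The sketch below assembles one such line; the score ordering in the comment follows the commonly documented Moses convention, and the helper itself is just an illustration, not part of my pipeline yet:

```cpp
#include <sstream>
#include <string>

// Builds one Moses phrase table line. In the commonly documented ordering,
// the four scores are: inverse phrase probability p(src|tgt), inverse
// lexical weight, direct phrase probability p(tgt|src), and direct lexical
// weight. My current table only fills in the first of these.
std::string phraseTableLine(const std::string& src, const std::string& tgt,
                            double invPhrase, double invLex,
                            double dirPhrase, double dirLex) {
    std::ostringstream out;
    out << src << " ||| " << tgt << " ||| "
        << invPhrase << ' ' << invLex << ' ' << dirPhrase << ' ' << dirLex;
    return out.str();
}
```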

Unfortunately, I did not have time to investigate FeatureFunctions, as I mentioned I would last week. But since I can actually run Moses now (despite the problems listed above), this is my main priority for the upcoming week. I hope to have an update on this front next week.

Week 8, October 26-October 30 – Maury

This week was definitely a coding and exploratory week for me. After I finished the literature review last week, I set my sights back on Moses. I first focused on exploring the parts of Moses that I would need to change in order to incorporate a measure of text difficulty. This involved several hours of digging through the codebase and following dependencies until I got where I needed to be. By then, I had enough information from scavenging to generate the diagram below. Note that, in the diagram, the ScoreComponentCollection includes a series of classes that inherit from FeatureFunction. My objective for the next two weeks is to create a class that inherits from FeatureFunction to quantify text difficulty. Then, I can add it to the ScoreComponentCollection so that Moses can consider the measure when comparing translation hypotheses.

A subset of the classes in Moses that deal with phrase-based decoding.

After discovering this, I turned my attention back to generating the phrase table from the PPDB database. In my post a few weeks ago, I was using the PPDB database of size “Small” with “All” types of paraphrases (lexical, one-to-many, phrasal, syntactical). It turns out that this database does not have identity paraphrases of any kind. This essentially means that Moses, with the phrase table it gets, will try to change every word of a sentence, even the already-simplified ones! For example, Moses might “simplify” the sentence “This is great” to “This is laudable.” We don’t want this behavior. So I decided to combine the PPDB databases that do include identity paraphrases (the databases of lexical and phrasal paraphrases). This is not ideal, because it makes sentence splits and deletions highly unlikely. But I thought it would be a good stopgap to get a baseline working. Eventually, though, I will have to revisit this problem, because I want to include one-to-many and syntactical paraphrases as well! Perhaps I will have to generate probabilities of my own for paraphrases that do not have identity counterparts in PPDB already.
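If I do end up generating my own identity entries, the mechanical part is simple: emit a source-equals-target line for every vocabulary word. The sketch below shows the shape of that output; the score of 1.0 is purely a placeholder assumption, since assigning real probabilities to identity paraphrases is exactly the open problem noted above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Emit one identity paraphrase line per vocabulary word, so Moses has the
// option of leaving an already-simple word unchanged. The 1.0 score is a
// placeholder, not a PPDB-derived probability.
std::string identityEntries(const std::vector<std::string>& vocab) {
    std::ostringstream out;
    for (const std::string& w : vocab)
        out << w << " ||| " << w << " ||| 1.0\n";
    return out.str();
}
```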

I do not yet have Moses working with the new identity paraphrases, but I will update this post with some sample translations when I do. In the meantime, next week I will investigate the “FeatureFunction” class and see if I can get Moses to at least recognize a baseline text difficulty measure (such as one based on the length of a word).

[UPDATE 11/10/2015] I finally got Moses working with the new identity paraphrases! I gave it this sentence as input (which you have seen before here):

this month was originally named sextilis in latin , because it was the sixth month in the ancient roman calendar , which started in march about 735 bc under romulus .

And I got this output “simplification”:

this month was initially named , because it was the sixth ancient roman timetable , which began in march about 735 bc under romulus . in the months in latin sextilis|UNK|UNK|UNK

The output looks more similar to the input now! Yay!