This has been a productive week! I have an end-to-end solution working with Moses, with paraphrases from the PPDB database incorporated. This took longer than expected, but I was at least able to feed Moses the first 500 preprocessed complex sentences in Zhu et al.’s PWKP corpus and get “simplifications” back. Check out last week’s post for an example simplification. There are several issues with the PPDB database that I still need to address, though. Here are some of the ones I plan to fix in the future:
- I need to standardize the way that I preprocess the phrases in the PPDB database file and the source sentences I give Moses to translate, so that both go through the same steps.
- I need to find a way to incorporate syntactic and one-to-many paraphrases. This involves assigning probabilities to identity paraphrases.
- I need to increase the number of probabilities for each phrase simplification that I give to Moses. Right now, I am only giving Moses the probability of a source (complex) phrase given a target (simple) phrase. However, the Moses phrase table format has three more scores (the inverse lexical weighting, the direct phrase translation probability, and the direct lexical weighting) that could increase the accuracy of translations. So, I would really like to find a way to incorporate them.
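To make the first and last items in the list above more concrete, here is a minimal Python sketch, not the code I am actually running. It assumes the plain-text Moses phrase table layout of `source ||| target ||| scores` with four scores per entry. The helper names `normalize` and `phrase_table_line` are hypothetical, and the lowercase-and-collapse-whitespace step is only a stand-in for whatever tokenization the real pipeline uses. The point is that the same `normalize` function would be applied both to PPDB phrases and to the sentences handed to Moses.

```python
import re


def normalize(text):
    """Hypothetical shared preprocessing step: lowercase and collapse whitespace.

    The same function should be applied to PPDB phrases and to the input
    sentences so that phrase-table lookups match the text Moses sees.
    """
    return re.sub(r"\s+", " ", text.strip().lower())


def phrase_table_line(complex_phrase, simple_phrase, scores):
    """Format one phrase-table entry as 'source ||| target ||| s1 s2 s3 s4'.

    Assumed score order: inverse phrase translation probability, inverse
    lexical weighting, direct phrase translation probability, direct
    lexical weighting.
    """
    if len(scores) != 4:
        raise ValueError("expected exactly four scores per entry")
    source = normalize(complex_phrase)
    target = normalize(simple_phrase)
    score_field = " ".join(f"{score:.6g}" for score in scores)
    return f"{source} ||| {target} ||| {score_field}"


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    print(phrase_table_line("  Utilize ", "use", [0.5, 0.25, 0.125, 0.0625]))
```

The scores in the usage example are invented for illustration; how to derive the three missing ones from PPDB is exactly the open question in the last item.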
Unfortunately, I did not have time to investigate FeatureFunctions, as I said I would last week. But since I can actually run Moses now (despite the problems listed above), this is my main priority for the upcoming week. I hope to have an update for you on this front next week.