Week 7, October 19-October 23 – Maury

As I mentioned in last week’s post, this week was more of a reading and writing week for me. The main task was to assemble a literature review of what has been done in text simplification and how my approach improves on existing ones. I mostly focused on works that achieved simplification with phrase-based machine translation techniques (including Zhu et al. (2010), Coster and Kauchak (2011), Wubben et al. (2012), and Specia (2010)). I discussed how many of these techniques rely on Simple Wikipedia as the basis for their simplification corpus, and I pointed out how existing works (like Xu et al. (2015)) critique this reliance on Wikipedia and call for using other datasets for simplification. This is where my work comes in. As I mentioned in the first post, I am going to avoid using simplification corpora for phrase-based translation. Instead, I plan to use an extensive English paraphrase database, with quantitative difficulty measures to prioritize the paraphrases that actually perform simplifications.
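To make the idea of prioritizing paraphrases by difficulty a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the syllable-based score is only a stand-in for whatever quantitative difficulty measure ends up being used, and the function names (`estimate_syllables`, `difficulty`, `rank_paraphrases`) are hypothetical and have nothing to do with Moses or the paraphrase database itself.

```python
import re
from typing import List


def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (minimum 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def difficulty(phrase: str) -> float:
    """Hypothetical difficulty score: mean estimated syllables per word."""
    words = re.findall(r"[A-Za-z']+", phrase)
    if not words:
        return 0.0
    return sum(estimate_syllables(w) for w in words) / len(words)


def rank_paraphrases(source: str, candidates: List[str]) -> List[str]:
    """Keep only candidates scored as easier than the source, easiest first."""
    base = difficulty(source)
    easier = [c for c in candidates if difficulty(c) < base]
    return sorted(easier, key=difficulty)


if __name__ == "__main__":
    # Candidates that are not easier than the source phrase are dropped.
    print(rank_paraphrases("utilize", ["use", "instrumentalize"]))
```

In a real system the score would more likely be folded into the translation model as an extra feature than used as a hard filter, but the filter keeps the sketch short.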

After finishing the review, I did a little investigating into which parts of Moses I will need to modify to add the quantitative difficulty measures. At a preliminary glance, I reckon the majority of my changes will happen in the translation model folder. I plan to make this idea more concrete over the next week. Once I finish with that, I plan on modifying the paraphrase database that I created in my previous post to ensure that I am getting the best possible paraphrases.

Works Cited

