As I mentioned in last week’s post, this week was more of a reading and writing week for me. In particular, the main task was to assemble a literature review on what has been done in text simplification and how my approach improves on existing ones. I mostly focused on works that achieved simplification with phrase-based machine translation techniques (including Zhu et al. (2010), Coster and Kauchak (2011), Wubben et al. (2012), and Specia (2010)). I discussed how many of these techniques rely on Simple Wikipedia as the basis for the simplification corpus, and I pointed out how more recent work (like Xu et al. (2015)) critiques this reliance on Wikipedia and calls for using other datasets for simplification. This is where my work comes in. As I mentioned in the first post, I am going to avoid using simplification corpora for phrase-based translation. Instead, I plan to use an extensive English paraphrase database, and I will use quantitative difficulty measures to prioritize paraphrases that perform simplifications.
After finishing the review, I did a little investigating into which parts of Moses I will need to modify to add the quantitative difficulty measures. At a preliminary glance, I reckon that the majority of my changes will happen in the translation model folder. I plan to make this idea more concrete over the next week. Once I finish with that, I plan on modifying the paraphrase database that I created in my previous post to ensure that I am getting the best possible paraphrases.
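To make the idea of prioritizing simplifying paraphrases a bit more concrete, here is a toy sketch in Python. It is not the Moses change itself: the difficulty measure (a crude syllable estimate plus average word length) is a stand-in chosen purely for illustration, and the names `count_syllables`, `difficulty`, and `rank_paraphrases` are hypothetical.

```python
import re


def count_syllables(word):
    """Crude syllable estimate: count runs of consecutive vowels (assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def difficulty(phrase):
    """Toy difficulty score: mean syllables per word plus mean word length / 10.

    This is an illustrative placeholder, not an established readability metric.
    """
    words = re.findall(r"[A-Za-z']+", phrase)
    if not words:
        return 0.0
    mean_syllables = sum(count_syllables(w) for w in words) / len(words)
    mean_length = sum(len(w) for w in words) / len(words)
    return mean_syllables + mean_length / 10.0


def rank_paraphrases(source, candidates):
    """Keep only candidates scored as easier than the source, easiest first."""
    base = difficulty(source)
    scored = [(difficulty(c), c) for c in candidates]
    easier = [(score, c) for score, c in scored if score < base]
    return [c for _, c in sorted(easier)]


if __name__ == "__main__":
    print(rank_paraphrases("utilize", ["use", "employ", "utilization"]))
```

In a real system the score would more likely enter the decoder as an extra feature weight than as a hard filter, but the filter keeps the sketch short.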
- Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1353–1361. Association for Computational Linguistics.
- William Coster and David Kauchak. 2011. Learning to simplify sentences using Wikipedia. In Proceedings of the Workshop on Monolingual Text-To-Text Generation, MTTG ’11, pages 1–9.
- Sander Wubben, Antal van den Bosch, and Emiel Krahmer. 2012. Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 1015–1024. Association for Computational Linguistics.
- Lucia Specia. 2010. Translating from complex to simplified sentences. In Computational Processing of the Portuguese Language, pages 30–39. Springer.
- Wei Xu, Chris Callison-Burch, and Courtney Napoles. 2015. Problems in current text simplification research: New data can help. Transactions of the Association for Computational Linguistics, 3:283–297.