After completing the Spring 2016 semester of research with Professor Medero (with much less blogging than in Fall 2015), I am back for a 10-week summer research stint. The hope is that by the end of this period, I will have made a meaningful contribution to the text simplification field that I can write up as an academic paper. I expect that contribution to be something along the lines of a system that adapts text simplification to the reading audience. But that may change in the coming weeks.
Today has just been filled with administrative details, as well as familiarizing myself with the evolving state of phrase-based text simplification research. I found a few articles that I thought were enlightening, so I’ll just highlight them here:
- Pavlick et al. from the University of Pennsylvania recently came out with a publication “Simple PPDB: A Paraphrase Database for Simplification.” It outlines the construction of a subset of PPDB that identifies phrase pairs where one is a simplification of the other. I found the creation of this database interesting, because it suggests another way to approach the problem of finding such simplifying phrase pairs in PPDB: we could just train a classifier to do so.
- Xu et al. from the University of Pennsylvania came out with a publication “Optimizing Statistical Machine Translation for Text Simplification.” This article does a lot of what Professor Medero and I were looking to do. The article dispenses with the common approach of extracting phrase pairs from a sentence-aligned corpus of “complex” English sentences and their “simple” English counterparts. Instead, it uses PPDB as the phrase table, and relies on simplification-specific features during the decoding process.
- Napoles et al. from the University of Pennsylvania came out with the publication “Sentential Paraphrasing as Black-Box Machine Translation.” This basically describes the release of a software package that people can tune and use to rewrite sentences for their own purposes. It has some prepackaged use cases (including text simplification), so I think I may poke at the software package in the near future.
- Klerke et al. came out with the article “Improving sentence compression by learning to predict gaze.” This article improves the sentence compression task (a subset of text simplification) by training a classifier that predicts how much users gaze at certain phrases in the sentence (and, by extension, how complex those phrases are). This article got me thinking about ways that I could tell whether a text is difficult for someone to read. It was also a pretty cool, well-written article.
- Elmira Tapkanova compiled a survey of the text simplification field in her thesis “Machine Translation and Text Simplification Evaluation.”
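To make the "just train a classifier" idea from the Simple PPDB item a bit more concrete for myself, here is a minimal sketch in Python. Everything in it is my own assumption rather than anything taken from the paper: the word counts in `TOY_FREQ` are invented, the two features (length difference and log-frequency difference) are just obvious first guesses, and the plain logistic regression is a stand-in for whatever model a real system would use.

```python
import math

# Hypothetical toy word counts, invented for illustration only.
# A real system would estimate these from a large corpus.
TOY_FREQ = {
    "use": 900, "utilize": 40,
    "help": 800, "facilitate": 30,
    "buy": 700, "purchase": 120,
    "end": 800, "terminate": 25,
    "start": 850, "commence": 20,
}

# (source, target, label): label 1 means the target is the simpler phrase.
# These hand-labeled pairs are also made up for the sketch.
TRAINING_PAIRS = [
    ("utilize", "use", 1), ("use", "utilize", 0),
    ("facilitate", "help", 1), ("help", "facilitate", 0),
    ("purchase", "buy", 1), ("buy", "purchase", 0),
    ("terminate", "end", 1), ("end", "terminate", 0),
]


def features(source, target):
    """Return [bias, length difference, log-frequency difference]."""
    length_diff = len(source) - len(target)
    # Unknown words fall back to a count of 1 (an arbitrary choice).
    freq_diff = math.log(TOY_FREQ.get(target, 1)) - math.log(TOY_FREQ.get(source, 1))
    return [1.0, float(length_diff), freq_diff]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, source, target):
    """Estimated probability that target is a simplification of source."""
    x = features(source, target)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(pairs, learning_rate=0.1, epochs=200):
    """Fit logistic regression weights with per-example gradient updates."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for source, target, label in pairs:
            x = features(source, target)
            error = label - predict(weights, source, target)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights


if __name__ == "__main__":
    weights = train(TRAINING_PAIRS)
    # "commence"/"start" never appear in the training pairs.
    print(predict(weights, "commence", "start"))
    print(predict(weights, "start", "commence"))
```

On this toy data, the score for "commence" → "start" should come out above 0.5 and the reverse direction below it, but that says nothing about how such features would hold up on real PPDB pairs.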
Tomorrow, I plan to continue my background research and see if I can dig up any more relevant articles on text simplification.