Today was a really productive day. I spoke to Prof. Medero about my musings over the past week, and I have gathered some key insights. Two days ago, I was thinking about two questions, and I think I now have the answer to the first one.
My first question was which reading audience I wanted to focus my simplifications on. I have now decided to focus on adapting simplifications for second-language English learners. One reason is that Prof. Medero mentioned they are one of the populations for which data is most abundant and easiest to gather. Also, to my current knowledge, second-language English learners have not been specifically targeted as an audience for text simplification systems. And finally, I have familial ties to second-language English learners (my parents), and I feel a deep conviction to advance the state of research for the sake of anyone in their shoes.
With that figured out, my task now becomes: how can we best simplify text for this population? How can we make it adaptive to how much English each person has learned? These are the questions that my research efforts will now focus on. The avenue I currently see myself exploring is:
- Figuring out the patterns that are used to simplify text for this population (e.g., sentence splits, fewer parts of speech). I can do this by examining text written by this population to see which constructs are uncommon (compared to a typical English corpus) and thus difficult for these readers. I can also do this by examining the findings in other papers. Petersen and Ostendorf have an excellent article about this here.
- Creating a classifier that predicts if text is complex enough to need to be simplified in some way.
- Performing the simplifications, using statistical machine translation.
I realize that’s a lot to do in 9 weeks (especially the third step). But it is one path I see to getting what I want.
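To make the second step concrete, here is a toy sketch of what a complexity classifier could look like. Everything in it is my own illustrative choice, not an established model: the features (sentence length, mean word length, rate of words outside a "known-vocabulary" list) and the hand-picked thresholds are placeholders for what a real classifier would learn from learner data.

```python
def complexity_features(sentence, known_vocab):
    """Compute simple surface features for one sentence.

    Returns (word count, mean word length, out-of-vocabulary rate).
    """
    words = sentence.lower().split()
    if not words:
        return 0, 0.0, 0.0
    length = len(words)
    mean_word_len = sum(len(w) for w in words) / length
    # Strip trailing punctuation before the vocabulary lookup.
    oov_rate = sum(w.strip(".,;!?") not in known_vocab for w in words) / length
    return length, mean_word_len, oov_rate


def needs_simplification(sentence, known_vocab,
                         max_len=20, max_word_len=6.0, max_oov=0.3):
    """Heuristic stand-in for a trained classifier: flag a sentence if
    any surface feature crosses a (hypothetical, hand-picked) threshold."""
    length, mean_word_len, oov_rate = complexity_features(sentence, known_vocab)
    return (length > max_len
            or mean_word_len > max_word_len
            or oov_rate > max_oov)


# Tiny demonstration vocabulary; a real system would use a frequency
# list built from learner-produced text.
vocab = {"the", "cat", "sat", "on", "mat"}
print(needs_simplification("The cat sat on the mat.", vocab))                   # False
print(needs_simplification("Ontological hermeneutics perplexes novices.", vocab))  # True
```

A learned model would replace the thresholds with weights fit to labeled data, but even this sketch shows the shape of the pipeline: featurize each sentence, then decide whether it needs to be passed along to the simplification step.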