Since last week, my attention has shifted from the literature review to laying out the logistics of this “research class.” As a result, I do not have much of an update this week. I am working on a proposal for the research project I will carry out over the semester. I will have a draft of the proposal to Prof. Julie by next week, and I will revise it based on her critique. Once I finish the proposal, I hope to post it so you can all see a formal description of what I plan to work on. After that, I can delve back into the literature review and write up something formal about what I have found.
In other news, I tried to get an instance of Box (which I talked about in last week’s post) working on my AWS account. Unfortunately, I keep running into an error: over the last three days, I have been unable to create an EC2 instance from the Box image. I contacted the creator of Box, so I hope to receive a reply and get everything working by next week. If that does not work, I will look into setting up my own instance of Moses.
This week, I got more of a sense of what the area of text simplification research looks like. I read a particularly great summary of the state of text simplification research as of 2014 by Advaith Siddharthan, from the University of Aberdeen. His paper offers a good survey of how automatic text simplification systems have evolved since the late 1990s. In particular, he goes into detail on “Text simplification as monolingual machine translation” (section 3.2.2), which is exactly what I want to do: I want to treat text simplification as a monolingual (one language, English) machine translation problem. Siddharthan describes how researchers such as David Kauchak (a professor at nearby Pomona College!) use phrase-based machine translation systems, mainly Moses, to accomplish their work. I explored David Kauchak’s paper, “Learning to Simplify Sentences using Wikipedia,” and saw that his work mirrors a lot of what Prof. Julie and I talked about. Unfortunately, I have not yet found much on automatic text simplification with an emphasis on measuring difficulty (which is my focus), so I still have some research ahead of me. I’m going to ask Prof. Julie whether composing some sort of “literature review” would be a good idea for this project, and when it would be due.
Alongside doing some preliminary research, I’m also becoming familiar with software used for computational linguistics. In particular, I’m reading about Moses, a statistical machine translation system. Prof. Julie says that it’s the software package she had in mind to test our theory, so we’ll be using it throughout the semester. She also mentioned Box, a machine translation research platform available on AWS. Next week, I’m going to try playing around with it, and see if I can get an instance running. Fingers crossed!
I’m Maury Quijada, a senior at Harvey Mudd College. For the Fall 2015 semester (September 1 to December 11), I will be working closely with Professor Julie Medero (Prof. Julie) on a problem she’s identified that I am really excited to work on: automatic text simplification that incorporates measures of text difficulty.
A bit more background on Prof. Julie’s familiarity with this topic: her exploration of it began when she was writing her dissertation. The topic she homed in on was quantifying how hard it is for a particular person to read a body of text. Both then and now, Prof. Julie has thought that quantifying the complexity of a body of text has useful applications, particularly in text simplification, and she is having students from Harvey Mudd work in this area. On the quantification side, Prof. Julie had a research group in Summer 2015 that worked on a new approach involving people reading text on an iPad. Their progress was tracked on this blog. The applications of these measures are what I will work on throughout this semester.
Over the next few months I plan to use measures of text difficulty to improve automatic text simplification. I’m going to focus particularly on simplification using statistical machine translation, where “normal English” is the source (“foreign”) language and “simplified English” is the target language. I want to see if and how a machine translator could successfully use a difficulty measure to compare possible text simplifications. My progress will be documented through updates every Tuesday!
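To make the idea of comparing possible simplifications a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not the method we will actually use: the `difficulty` function is a made-up proxy (average word length plus a small penalty for sentence length), the `rerank` helper and its `weight` parameter are hypothetical, and the candidate sentences and their model scores are invented rather than produced by Moses.

```python
import re


def difficulty(sentence: str) -> float:
    """Hypothetical difficulty proxy: mean word length plus a small
    sentence-length term. A real measure would be learned from reader data."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    mean_word_len = sum(len(w) for w in words) / len(words)
    return mean_word_len + 0.1 * len(words)


def rerank(candidates, weight=1.0):
    """Sort (sentence, model_score) pairs best-first.

    model_score stands in for a translation system's score (higher is
    better); the difficulty estimate is subtracted as a penalty.
    """
    return sorted(
        candidates,
        key=lambda c: c[1] - weight * difficulty(c[0]),
        reverse=True,
    )


# Invented candidates with invented scores, for illustration only.
candidates = [
    ("The physician administered the medication.", -1.0),
    ("The doctor gave the medicine.", -1.2),
]

best_sentence, _ = rerank(candidates)[0]
print(best_sentence)
```

With `weight=0.0` the ranking falls back to the model score alone, so the `weight` knob is one simple way to explore how much the difficulty estimate should influence the choice.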
Not much is going on for me this week. Prof. Julie and I met and discussed expectations, some more background on the project, and a few starting tasks. I’ve also been asked to read a couple of articles and summarize my findings. Expect more of an update next week!