Most of my work today was, as expected, related to writing the paper. I did the small mechanical things first – set up the GitHub repo on my own machine so I could update it easily, anonymized the paper, adjusted a few formatting things. I also went through and commented out sections of the paper that we discussed as being probably not necessary. After this, I spent most of the day working on the results section, rewriting it to clarify our process, and putting in the table of relevant data. It’s probably about time to get to work on the discussion section more properly, as it is the only section I haven’t tackled much at all. After meeting with Prof Medero, I mostly spent my time trying to create a logo for the app/project. I fixed the poster’s formatting somewhat, and gave the titles colors to catch the eye a little more. I wanted to add colored borders to the boxes, but the text boxes themselves are too close to the text, and adding the border looks cramped. I may try adding in empty squares and see if it looks appealing. Besides that, with some kind of a logo present, I think the poster is in good shape.
Today’s work was entirely based on the paper for NAACL. I started by putting in Adam’s write-ups for the Data Analysis and Results sections, and spent some time fixing the formatting after that. Then I got to work on the Introduction and Background sections, which I have been avoiding for ages. I reduced the range of topics I wanted to cover in the Introduction, since on reflection it seems to me not everything is relevant… eye-tracking seems to be most pertinent, so I’ve started writing up that part. I ended up combining Intro and Background, but if it becomes too unwieldy I may undo that later. At the moment, I think I am reasonably close to finishing the introduction, and have mostly been working on weeding out the unnecessary things to make more room for other citations we might need. A couple of mechanical things I need to remember to do are:
- Set up a git remote to upload images that the TeX file needs to compile
- Update repo with latest version of paper (also needs remote)
- Anonymize the paper
Today’s work was much the same as yesterday’s – I started the morning by fixing the issues in the poster from yesterday, on a content and formatting level. I haven’t really spent enough time trying to make it look more attractive, mostly because I find this hard to do without making it look tacky, but I will definitely get to it soon. The rest of the time I worked on the paper and also did some statistical analysis. As Adam wrote the Results section, we thought it would be good to have some indication of the statistical significance of the results we were showing, so I had to spend a couple of hours fixing my code to work with the new data. Instead of word by word comparisons, we now wanted a passage to passage comparison, so I had to account for that functionality too. Generally our results are not ideal, but we have at least 3 good numbers which we will add to the paper soon.
Adam’s Results section needs to be added to the TeX file, which is probably what I will start with on Monday. I’m going to do something about the logo on the poster, and except for that I think it has to be mostly writing the other sections of the paper. I tried to make sure I had Adam’s input before writing it all up, and I think I have a reasonable idea of what is left to write.
After having completed a draft of the poster yesterday, I spent today trying to tackle the rest of the paper. In the morning, I went through the sections I had already written and worked on refining them some more, and continued outlining the sections I hadn’t yet started working on. Fortunately, reasonably early on in the day Adam finished commenting his code, so I handed off the Data Analysis section to him. Besides that, we have the Results, Summary, and Future Plans sections left, not to mention the Introduction. I think it makes most sense for Adam and me to do those three sections together, but I think I can tackle the Introduction by myself, so I’ve started working on that. There really isn’t much else to say, since it’s just the drafting process… I think tomorrow I will start by changing the poster according to the pointers Prof Medero gave earlier today, just to introduce some variety into all my TeXing. Then I hope to finish off the Introduction tomorrow, and figure out how we want to present our data. I don’t think the paper draft will be complete tomorrow, but I think we will have enough of a foundation that I can continue working on it by myself next week.
I started the morning by trying to work on the paper a little more. Although I felt like I did a large part of the body yesterday, in retrospect it seems like there’s a lot more missing than there is done. This is probably because I didn’t outline the results/summary section very well, not having any idea (at the time) what that would look like. So I spent some time trying to form that outline, and I think that besides the detail required for Data Analysis, the rest of the sections can be properly outlined and partially written up tomorrow.
After getting tired of TeXing, though, I decided to work on the poster instead. That was a lot more fun, and I modeled the layout a little bit on my poster from last year. There really isn’t anything very exciting to say here, since it was all the same information written up for the presentation and the paper. The challenge was condensing it, making the prose clear, and laying out the poster in a semi-aesthetically appealing way. I also had to learn how to cite PhD dissertations, which I still don’t think I’ve gotten entirely right, since I couldn’t find an accession number. Besides minor fixes to the bibliography section, I think the poster is pretty much done. So I’ll get back to the paper tomorrow.
We spent maybe an hour this morning preparing for and giving our final presentation. I don’t think there was too much left to do; Adam and I had practiced the day before and we successfully improvised whatever we hadn’t thought of. I think the presentation went well, if a little long, perhaps also due to not starting at exactly half past. Besides the presentation, I spent a little time in the morning trying to decide what the poster should look like, since that is something we should have pretty much down before the end of this week. It seems to me that we would be ok covering the same content as we did in our presentation, but with a lot more fleshing out, since the presentation was much more supplemented by our oral explanation. I intend to spend some time on that tomorrow, especially if I get tired of TeXing, which I can imagine happening.
The second half of the day I spent entirely working on our final report in TeX, using the NAACL style file to format it. I began by putting in only section headers so we have a skeleton, and placed brief summaries of the sections I didn’t want to write just yet: the introduction and the background sections in particular, which maybe should not be separate sections at all. I had a pretty reasonable outline for the body of the paper, so I wrote up two sections today, the App and the Experiment, which talk about app implementation, features, the experimental logistics, and the text pairs. I’m a little reluctant to tackle Results and Data Analysis, so I was thinking of writing Future Directions first. Then perhaps Results, and if TeXing gets too much, working on the poster!
We started the day today by working on our presentation for tomorrow (after getting back from Discovery Day). I don’t think there was very much to do, since I had pretty much finished up the slides, but the kinds of data analysis being done have changed a lot since I wrote them. Adam changed that, I fixed some small things, and we did a trial run. It came to around 11 minutes, at which point Adam reminded me that we need a results section now that we actually have results! So we added in a slide with the results that we do have: successes such as the few statistically significant differences, actually being able to capture such fine data, and so on. We also added a few reasons that we think explain why our data is not as good as we hoped, which I think paves the way for future work.
After this I spent most of my time doing some statistical analysis. I was pretty confused by how many of our distributions seemed to be normal, so I went through and ran normality tests on all of them (which definitely took a while), to equally surprising results. Only two of our 36 distributions are apparently not normal. I’m still confused, but I think more and more that these results are just a consequence of the small sample size. Having finished this, I did some research on how t-tests are implemented in SciPy, and found two types: the “related” (paired) and the “independent”. Running the related test produced the problem of not having equal-sized arrays, which was (as Prof Medero later explained) because the related test assumes two data sets whose observations can be paired up one to one. The independent test, which I only ran for a couple of pairs, suggests that most of them are not different distributions in a statistically significant manner. However, as Adam changes his analysis, I can always run pairs that look visually to be more useful.
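For my own reference, here is a minimal sketch of the two SciPy tests described above. It is not the code I actually ran: the helper names (`is_plausibly_normal`, `compare`), the sample numbers, and the 0.05 threshold are all made up for illustration, and I am assuming Shapiro-Wilk as the normality check, which is only one of several options. The SciPy calls themselves (`scipy.stats.shapiro`, `scipy.stats.ttest_rel`, `scipy.stats.ttest_ind`) are real.

```python
from scipy import stats


def is_plausibly_normal(sample, alpha=0.05):
    """Shapiro-Wilk check: True when normality is NOT rejected at level alpha.

    With very small samples the test has little power, so failing to reject
    normality says little about the true shape of the distribution.
    """
    _, p_value = stats.shapiro(sample)
    return bool(p_value > alpha)


def compare(a, b, paired, alpha=0.05):
    """Return (p_value, significant) for a two-sided t-test on a and b.

    paired=True uses the "related" test, which needs a[i] and b[i] to be
    two measurements of the same unit, so both samples must be the same
    length. paired=False uses the "independent" test, which has no such
    requirement.
    """
    if paired:
        if len(a) != len(b):
            raise ValueError("paired test needs equal-length, matched samples")
        result = stats.ttest_rel(a, b)
    else:
        result = stats.ttest_ind(a, b)
    return float(result.pvalue), bool(result.pvalue < alpha)


# Hypothetical measurements for the same five items under two conditions.
condition_a = [210.0, 250.0, 190.0, 305.0, 270.0]
condition_b = [215.0, 256.0, 194.0, 311.0, 276.0]

if __name__ == "__main__":
    print("normal-looking:", is_plausibly_normal(condition_a))
    print("paired:", compare(condition_a, condition_b, paired=True))
    print("independent:", compare(condition_a, condition_b, paired=False))
```

In this made-up data every item shifts by a similar small amount, so the paired test sees a consistent difference while the independent test mostly sees the large spread between items. That is the reason it matters whether the two data sets can really be paired up.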
I then did some documentation things — created a GitHub repo, put some initial useful documents in there, and moved all my Google Docs stuff to Evernote. Tomorrow, after the presentation, my intention is to work on the poster and the paper, and run some statistical tests if necessary.