Day 30, July 2 – Vidushi

I think I achieved a lot today, in that all the issues I knew I had to fix (a pretty long list) did in fact get fixed. I finished rewriting the program so that it was functional. Then I added labels for the points, as well as labels that indicate what kind of effect each point had on the acceleration. Unfortunately, this showed me that something must be pretty wrong. To give just one example, a point that is higher than both of its neighbors should be a “spike”, but clearly my classification system was not working:

[Image: Screen Shot 2015-07-02 at 1.48.48 PM]

Clearly, the point at 27.6 seconds here is not a dip at all. Looking at this graph also gave me an idea of what the problem might be. Since I was using a dictionary to store the classification of each word, every word that appeared more than once was having its classification overwritten. This can be seen in the “the (dip)” entry that appears twice in the graph above. To fix this, I ditched the dictionary and changed all of my dictionary data structures to be arrays instead. That is, a dictionary of the form { key1: value1, key2: value2 } became [ (key1, value1), (key2, value2) ]. As can be imagined, changing something so fundamental in a large program meant I had to go back and change a lot of things, such as how I was accessing my data, printing it out, and of course storing it. This was admittedly really frustrating for part of my day, but in the end it was less of an overhaul than I had thought. (I still need to update the comments, though, which is another big task.)
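To make the overwriting problem concrete, here is a minimal Python sketch. It is not my actual program: the `classify` function, the "edge" and "neither" labels, and the sample words and numbers are all made up for illustration. It only shows why a dictionary keyed by word loses information when a word repeats, while a list of pairs does not.

```python
def classify(values):
    """Label each point by comparing it with its two neighbors."""
    labels = []
    for i, v in enumerate(values):
        if i == 0 or i == len(values) - 1:
            labels.append("edge")      # endpoints have only one neighbor
        elif v > values[i - 1] and v > values[i + 1]:
            labels.append("spike")     # higher than both neighbors
        elif v < values[i - 1] and v < values[i + 1]:
            labels.append("dip")       # lower than both neighbors
        else:
            labels.append("neither")
    return labels


# Hypothetical data: "the" appears twice, once as a dip and once as a spike.
words = ["saw", "the", "cat", "on", "the", "mat"]
accels = [0.5, 0.1, 0.4, 0.3, 0.9, 0.2]
labels = classify(accels)

# Dictionary keyed by word: the second "the" silently replaces the first.
by_word = dict(zip(words, labels))

# List of (word, label) pairs: one entry per occurrence, in order.
as_pairs = list(zip(words, labels))

print(len(by_word), len(as_pairs))
print(by_word["the"])
print(as_pairs[1], as_pairs[4])
```

With the dictionary, both occurrences of "the" end up showing whichever label was stored last, which matches the kind of mislabeling visible in the graph. The list of pairs keeps one label per occurrence, at the cost of losing direct lookup by word.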

Doing all this fixed some problems, but not the one I originally wanted to fix. I still had misclassified points all over my graph, and I spent a while trying to figure out what the problem could be without knowing where to start. This graph gave me another idea:

[Image: Screen Shot 2015-07-02 at 3.24.06 PM]

It looks like the same word is appearing twice, which can happen in many cases, but I know for a fact it does not happen in this text. This, compounded with the fact that the points should form a passage (since this graph is plotted against time) but don’t, makes me think I at least know where the problem is: in the “dictionary” of words and accelerations. This is a massive function and I’m not entirely sure where to start looking, especially since the data sets are huge. Print statements are not much help because the arrays are just so big. Therefore I first intend to write some testing functions that will check certain things for me, such as whether my texts, times and accelerations actually match up. If this keeps giving me trouble, though, I may just postpone it and work on the second study’s texts first, since I feel that is getting more and more pressing.
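As a rough idea of what such testing functions could look like, here is a small Python sketch. The function names, the three parallel lists, and the assumption that the original text can be split on whitespace are all hypothetical; my real data structures may well need different checks.

```python
def check_alignment(words, times, accels):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not (len(words) == len(times) == len(accels)):
        problems.append(
            f"length mismatch: {len(words)} words, "
            f"{len(times)} times, {len(accels)} accelerations"
        )
    # Times should never decrease if the points follow the passage in order.
    for i in range(1, len(times)):
        if times[i] < times[i - 1]:
            problems.append(
                f"time goes backwards at index {i}: {times[i - 1]} -> {times[i]}"
            )
    return problems


def check_text(words, expected_text):
    """Compare stored words with the source text; return a message or None."""
    expected = expected_text.split()  # assumes whitespace-separated words
    for i, (got, want) in enumerate(zip(words, expected)):
        if got != want:
            return f"first mismatch at index {i}: got {got!r}, expected {want!r}"
    if len(words) != len(expected):
        return f"word count differs: got {len(words)}, expected {len(expected)}"
    return None


# Made-up example with deliberate errors in it.
problems = check_alignment(["the", "quick", "fox"], [27.2, 27.6, 27.4], [0.1, 0.3])
for p in problems:
    print(p)
print(check_text(["the", "quick", "the"], "the quick fox"))
```

The point of returning short messages instead of printing whole arrays is that only the first place where things go wrong shows up, which should be easier to read than pages of numbers.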
