I spent the early part of today trying to find something more specific about statistical testing for us: not just the details of how a test works, but also how to implement it. Rather than start comparing the distributions right away, I thought it would be a good idea to first check our "assumption" of normality on the distributions.
This functionality is implemented in SciPy, so that was my first step. I asked Adam to send me the z-scores for every word we want to test so we could start to see whether the distributions make sense. We already knew from the graphs that the data didn't have sharp enough peaks around the mean to give us the results we wanted, so I wasn't expecting too much. After spending some time working through the complicated data structure, I ran the first normality test and hit the first snag: we don't have enough data. After thresholding for a high enough variance, we ended up with 4 people reading one of the text versions and 6 people reading the other, which means distributions with only 4 or 6 data points. SciPy refuses to run the test on fewer than 8 points, because a normality test at that scale is essentially meaningless. To work around this, I asked Adam to send me everything, regardless of the variance threshold. That gives us either 10 or 12 data points, which is a little better.
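As a rough sketch of this step, SciPy's `scipy.stats.normaltest` (the D'Agostino-Pearson test) is the function in question, and it raises a `ValueError` below 8 observations. The z-score values below are made up purely for illustration; the real data lives in the structure Adam sent.

```python
import numpy as np
from scipy import stats

def try_normaltest(scores):
    """Run SciPy's D'Agostino-Pearson normality test, guarding
    against the minimum-sample-size restriction (n >= 8)."""
    scores = np.asarray(scores, dtype=float)
    if scores.size < 8:
        # the skewness component of normaltest raises a
        # ValueError for fewer than 8 data points
        return None
    stat, p = stats.normaltest(scores)
    return stat, p

# Hypothetical z-scores for one word across 10 readers
# (illustrative values only, not our real data)
zscores = [0.4, -1.1, 0.2, 0.9, -0.3, 1.5, -0.7, 0.1, -0.5, 0.8]
result = try_normaltest(zscores)
```

With the guard in place, words that still fall below the 8-point floor just come back as `None` instead of crashing the run.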
With these data points, the normality tests have been a little surprising. From what I read here, we need a large p-value in order to say that our data IS normal, and strangely I have gotten this for every word I've run so far (the results are at the end of this document). My concern, however, is that the chi-squared statistics are all over the place, which makes me think that, whatever the normality test says, we don't have enough values to make a reasonable judgment about whether these distributions are normal. Once the normality testing is finished, I will start on the paired t-tests and we will see how that goes!
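When I get to the paired t-tests, the SciPy call would presumably look something like the sketch below. Note that `scipy.stats.ttest_rel` requires two equal-length samples matched pairwise; the values here are invented for illustration and are not our measurements.

```python
from scipy import stats

# Hypothetical paired z-scores: the same measure for one word under
# two conditions, matched pairwise (illustrative values only)
condition_a = [0.3, -0.8, 1.1, 0.2, -0.4, 0.9]
condition_b = [0.5, -0.2, 1.4, 0.6, 0.0, 1.2]

# Paired (related-samples) t-test on the pairwise differences
t_stat, p_value = stats.ttest_rel(condition_a, condition_b)

# A small p-value would suggest the two conditions differ,
# though with this few pairs the statistical power is very limited.
```

The important caveat for us is the pairing requirement: with 4 readers on one text version and 6 on the other, the comparison across versions is between independent groups, so the pairing would have to come from matched measurements, not from the two reader groups directly.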