Fun with statistics about book reading

So, you all know I'm busy analyzing y'all, and I thought this was interesting. 


I've read numerous times over the last few years that men aren't reading as much as women (and, by extension, that boys aren't reading as much as girls). In fact, I saw someone posting a whole lot of pretty graphs from a romance reader survey just a couple of days ago. And I'm calling bullhockey on it, along with a lot of other internet statistics you see parroted around saying "X% of readers say Y!"


tl;dr version: Surveys are tricksy things, and seeing only "averages" reported can often be quite misleading.


One of the questions I asked on my survey was "how many books do you read a year?" Another was gender. Put the two together and you get the following mean values:


Men, mean number of books read per year: 61

Women, mean number of books read per year: 127


Seems clear, right? But it's totally not. Statistically, there is a better than 10% chance of seeing a difference this big purely by chance, given this particular set of data.


Why? Because the means are not even close to the whole story. The standard deviation for men is 39, while the standard deviation for women is a whopping 106. I'm not giving a whole statistics class here, but generally (for roughly bell-shaped data) a little over 2/3 of all answers are within one standard deviation of the mean.


Which means the real numbers are:

Men: About 2/3 of all men read something between 22 and 100 books a year.

Women: About 2/3 of all women read something between 21 and 233 books a year.


Well, that suddenly looks a bit more even, doesn't it? Means are easily affected by one or two really extreme outliers, so all it takes is one woman to read 500 books, and no men to read more than a couple hundred, and voilà. Or vice versa.
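Here's a tiny sketch of that outlier effect. The numbers are made up purely for illustration (they are NOT my survey answers); the point is only to show how one 500-book reader drags the mean around while the median barely moves.

```python
from statistics import mean, median

# Made-up answers for illustration only; NOT the survey data.
readers = [20, 35, 40, 50, 60, 75, 90]

# Add one hypothetical person who reads 500 books a year.
with_outlier = readers + [500]

print("without outlier:", round(mean(readers), 1), median(readers))
print("with outlier:   ", round(mean(with_outlier), 1), median(with_outlier))
```

With these invented numbers the mean roughly doubles (from about 53 to about 109), while the median only shifts from 50 to 55.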


You can go further, and say nearly all readers (about 95%; the 99.7% figure is for three standard deviations) are within two standard deviations of the mean. Which gets you:

Nearly all men read between 0 (because you can't read -17 books) and 139 books a year, and nearly all women read between 0 (again, you can't read -85 books) and 339 books a year.
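If you want to check my arithmetic, here's a minimal sketch that rebuilds the ranges from the means and standard deviations quoted above. The function name `spread` is just something I made up for this example, and the whole rule of thumb assumes the answers are roughly bell-shaped, which reading habits may well not be.

```python
# Means and standard deviations as reported in this post: (mean, sd).
stats = {"men": (61, 39), "women": (127, 106)}

def spread(mean, sd, k):
    """Return (low, high) for mean +/- k standard deviations, floored at 0."""
    low = max(0, mean - k * sd)   # you can't read a negative number of books
    high = mean + k * sd
    return low, high

for group, (mean, sd) in stats.items():
    for k in (1, 2):
        low, high = spread(mean, sd, k)
        print(f"{group}: within {k} std dev -> {low} to {high} books a year")
```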


Which, I think you'll agree, pretty much doesn't tell us anything at all.


But wait! There's more!

Like all voluntary surveys, mine suffers from something called self-selection bias. A lot of people don't take surveys, and there's a good body of (real) research that says men simply don't take as many surveys as women, at least not voluntarily. And that was true here too: for every one man who took the survey, five women did. Which means there were 5 times as many chances for someone to be that outlier who reads 500 books a year. It also means that, because the population is self-selected and doesn't look much like the real population of the world, I can't draw any really solid conclusions about "everyone". I can draw only some conclusions about populations that look like mine (which might be heavy readers who use social book sites, most of whom write reviews as well as reading them). And even then, only the subset of that population who also don't mind filling in surveys.
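To put a rough number on the "more chances for an outlier" idea, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 1-in-100 chance of any respondent being a 500-book reader and the group sizes of 20 and 100 are invented (only the 1:5 ratio comes from my survey).

```python
def chance_of_outlier(p, n):
    """Probability that at least one of n respondents is an extreme reader,
    if each one independently has probability p of being one."""
    return 1 - (1 - p) ** n

p = 0.01               # assumed: 1 in 100 respondents reads 500 books a year
men, women = 20, 100   # assumed group sizes, in the survey's 1:5 ratio

print(f"men:   {chance_of_outlier(p, men):.0%}")
print(f"women: {chance_of_outlier(p, women):.0%}")
```

Under those made-up assumptions, the smaller group has about an 18% chance of containing at least one such reader and the bigger group about 63%, even though both groups are drawn from identical readers.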


On top of that, there's self-reporting bias too. Many people have no idea how many books they read a year, and took wild guesses. How do I know? Because they told me so, in the "leave a comment" section at the end. 


Finally, SPSS (the beastly nasty huge behemoth of a statistics processor I'm using) actually does some massive calculations, and tells me how likely a result like this is to turn up by chance, based on a whole lot of... well, statistics. Which is where I got the 10% figure I gave you up above: the chance that a difference in means this big could arise at random.
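For the curious, here's one way to get a feel for what "could happen at random" means. This is a permutation test, which is NOT what SPSS ran for me; I'm swapping it in because it needs no statistics library and the idea is easy to see. The answers below are invented for illustration and are not my survey data, and `permutation_p_value` is a name I made up.

```python
import random

def permutation_p_value(a, b, rounds=10_000, seed=0):
    """Shuffle the group labels many times and count how often the shuffled
    difference in means is at least as big as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        fake_a, fake_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(fake_a) / len(fake_a) - sum(fake_b) / len(fake_b))
        if diff >= observed:
            hits += 1
    return hits / rounds

# Made-up answers for illustration only; NOT the survey data.
men = [20, 45, 60, 110]
women = [10, 15, 25, 30, 40, 50, 60, 75, 90, 100,
         120, 150, 160, 200, 220, 250, 300, 350, 400, 500]

print(permutation_p_value(men, women))
```

The number it prints is the share of random relabelings that produced a gap at least as large as the real one. A big share means the gap is the sort of thing chance alone produces fairly often.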


In any case, I could quite easily make a blog post here, and say "Data PROVES women read twice as much as men". And worded just like that, it's pretty much true. And it's still pretty much bullhockey and doesn't actually mean anything. Truthfully, I will be reporting this as "no statistically significant difference was found in the number of books read by gender."



(Actual mathematicians, and I know there are a few of you out there - if I got anything wrong in this post, feel free to yell at me and I'll correct it :)