I can see three potentially important differences between your methods and those of the original study you linked that found such strong results.
One is duration. They say their sessions were 2.5 hours each. Yours tended to be 5-30 minutes at a time. I can imagine a few mechanisms by which this might affect the results:
Another is CO2 levels. Here's a screenshot of a histogram of your CO2 levels:
You have a fair number of data points in the original study's low-medium range. But their median "high" range was 2,496 ppm, and you barely have any data in that range.
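To put a number on "barely any", one could compute the share of readings at or above some cutoff. A minimal Python sketch; the function name, the 2,000 ppm cutoff, and the sample readings are placeholders for illustration, not values taken from either dataset.

```python
def share_at_or_above(readings_ppm, cutoff_ppm):
    """Fraction of CO2 readings at or above a cutoff (both in ppm)."""
    if not readings_ppm:
        raise ValueError("need at least one reading")
    high = sum(1 for r in readings_ppm if r >= cutoff_ppm)
    return high / len(readings_ppm)


# Placeholder readings, not real data.
sample = [450, 620, 900, 1100, 2600]
print(share_at_or_above(sample, 2000))
```

If that share is only a few percent, the correlation is mostly telling you about the low-to-medium range.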
Finally, their task was multifaceted, whereas yours was very narrow. Here's the table of their decision-making measures and scores:
Notice that focused activity improved with elevated CO2, and information search was roughly unaltered. I don't want to make an argument about which variables your word game maps onto most neatly, but it seems possible that you might have gotten different results with a more robust measure.
Add in questions about the precision of your air monitor (which does, I hasten to add, seem to cluster by time nicely in the way you'd expect if it were at least somewhat accurate), confounding factors - maybe you chose to stop playing when you felt yourself doing poorly, for example - and the lack of blinding in your study, and I'm just not ready to trust your results over theirs.
That's not to say you should necessarily trust theirs, though! I just think that if I'd run your experiment on myself, I would probably not put very much weight in the evidence. Instead, I would focus more on integrating additional evidence from the published literature, and update whatever prior that established a little bit to account for these results.
My first thought was:
Did you keep a record of time?
(I'm not sure offhand how to adjust for, say, improving ability over time, which seems like it could be a factor over the course of 800 games.)
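One rough way to adjust for practice, sketched in Python: fit a straight line of score against game number and work with what is left over. This assumes improvement is roughly linear across the games, which may well not hold (a learning curve usually flattens), and the function names are made up for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def detrend(scores):
    """Subtract the fitted linear practice trend from scores in play order."""
    game_numbers = list(range(len(scores)))
    slope, intercept = linear_fit(game_numbers, scores)
    return [s - (intercept + slope * i) for i, s in zip(game_numbers, scores)]
```

The residuals from `detrend` could then be correlated with CO2 in place of the raw scores. If CO2 also drifted over the months, it would be fairer to detrend the CO2 readings the same way before correlating.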
After finishing reading it, I wondered 'how much variation was there in score over the course of the attempts'? It is a game of chance, after all.
To rescue the hypothesis that it matters, you’d either have to find that it affects other people more than it does me (why would it?)
It seems unlikely, but a skeptic could say:
*This seems somewhat sensible. Do you get cancer immediately after the first time you smoke? (Also, I haven't read that study, but if researchers went to different places, took a reading of air quality, and then measured how people performed in environments they had been in long term, then air quality might have powerful but slow effects on cognition that build up over time.)
**This would require you to be very good at predicting how well you will do, or 'enthusiasm playing games' to have a stronger effect on performance than air quality within the thresholds. On the up side the test for this is...a randomized trial. (Oh no, however will anyone be able to stand such a terribly unethical form of experimentation? Forced to play a game, over and over again, 800 times!)
***Is continuous statistics a thing, or is variation just small, so a regression need only be applied? (I'm really curious about how long you have to be in a place with low air quality for it to take effect as well (if that is how it works), which involves measurements well in advance.)
I am very sensitive to (lack of) fresh air. It's as if my IQ drops with the level of oxygen. For example, when I am giving a lecture, I suddenly can't find the right words... then I notice the closed windows and open them... and after a while I can continue talking as usual.
Worse, at company meetings, where the windows often cannot be opened (or sometimes the room has no windows at all), I lose the ability to follow the argument, and if it goes on for too long, I become sleepy. I have repeatedly gotten into trouble because of this. People tell me that from the outside it looks as if I am high or drunk. If I say it is my reaction to lack of oxygen (perhaps it would be more precise to say too much CO2), it seems like I am making stuff up, because people agree that the air was "bad, but not that bad", and of course I have no official diagnosis for this.
At home, except for winter, my windows are always half-open. When I give a lecture somewhere, I learned to always check the windows first. At work, this was mostly out of my control (because open offices and air conditioning -- oh, how much I hate them), but work from home saved me.
explore which skills do or don’t suffer under carbon dioxide poisoning
I didn't do any experiments with this, but given that the last step is "I become sleepy", I would expect that ultimately all skills will suffer, except maybe for repetitive mechanical activities (such as coding, LOL).
[cross-posted from my blog Astral Codex Ten]
In 2012, a Berkeley team found that indoor carbon dioxide had dramatic negative effects on cognition (paper, popular article). Subjects in poorly ventilated environments did up to 50% worse on a test of reasoning and decision-making. This is potentially pretty important, because lots of office buildings (and private houses) count as poorly-ventilated environments, so a lot of decision-making might be happening while severely impaired.
Since then people have debated this on and off, with some studies confirming the effect and others failing to find it. I personally am skeptical, partly because the effect is so big I would expect someone to have noticed, but also because submarines, spaceships, etc have orders of magnitude more carbon dioxide than any civilian environment, but people still seem to do pretty hard work in them pretty effectively.
As part of my continuing effort to test this theory in my own life, I played a word game eight hundred times under varying ventilation conditions.
…okay, fine, no, I admit it, I played a word game eight hundred times because I’m addicted to it. But since I was playing the word game eight hundred times anyway, I varied the ventilation conditions to see what would happen.
The game was WordTwist, which you can find here (warning: potentially addictive). You get a 5x5 square of letters and you have to find as many words as possible (of four letters or more) within three minutes. You can move up, down, right, left, or diagonal, and get more points for harder words. A typical board looks like this:
I played this game about 5-10x/day over three months. During this time, the carbon dioxide monitor in my room recorded levels between 445 ppm (with all windows open and the fan on) and 3208 ppm (with all windows closed and several people crammed into the room for several hours). I discounted a stray reading of 285 as an outlier, since this is climatologically impossible (I’m not claiming my monitor is perfectly calibrated, just that it clearly shows higher levels when my room is less well ventilated). CO2 445 is basically the same as outdoors; 3208 is considered extremely poor air quality, likely to cause headaches, nausea, and other minor ailments. The Berkeley study looked at levels between 600 and 2500, so my range was comparable to theirs.
I correlated my adjusted score (my score as a percent of the average score for that board) for each game with the CO2 level in my room when I was playing it. The result was r = 0.001, p = 0.97 - there was absolutely no correlation.
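For anyone who wants to run the same kind of check on their own data, here is a minimal Python sketch of the calculation described above: an adjusted score as a percent of the board average, a Pearson correlation, and a two-sided p-value. The function names are hypothetical, and the p-value uses a normal approximation to the t statistic, which is an assumption that is only reasonable with many games; it is not necessarily how the figures in this post were computed.

```python
import math


def adjusted_scores(scores, board_averages):
    """Each score as a percent of the average score for its board."""
    return [100.0 * s / avg for s, avg in zip(scores, board_averages)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r from n pairs, via a large-n normal approximation."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))
```

Plugging in r = 0.001 and n = 800 gives a p-value very close to 1, in line with the figure reported above.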
Why might these results not be valid? Well, CO2 level in my room wasn’t randomly determined - I just played a game when I felt like it and recorded whatever the ambient CO2 level was at the time. CO2 level was lower if I had the window open or air conditioning on, higher if I’d been in the room for a long time, and highest if I’d just woken up after being asleep in the room all night. It was also higher when other people were in my room. In theory things like this could confound the results. For example, if CO2 really did affect performance, but I performed better when I was hot, then turning the air conditioning on might improve performance (by decreasing CO2) but also hurt performance (by making it colder), and those effects could cancel out. Or if I performed worse after exercise, and I often went out of my room to exercise, then I might perform worse when I had just come back into my room (which was often when CO2 was lowest).
In practice I’m skeptical this mattered. For one thing, the studies found huge positive effects - so for me to find zero effect would require a huge negative effect of the exact right size to cancel out the huge positive one. For another thing, I checked if temperature had any effect, and it didn’t (r = -0.008, p = 0.83). For another, I ran a few controlled experiments to see if they got the same results as the naturalistic ones, and they did. For another, I did get to test an exogenous shock - about halfway through the experiment, I moved to a new house with better ventilation. The difference in average CO2 reading between the old and new houses was significant (p < 0.001), but the difference in score wasn’t (p = 0.15). Although it was in the expected direction (new house > old), I attribute this to me improving on the word game with practice, and I didn’t improve any more during the month when I switched houses than in an average month.
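The old-house versus new-house comparison can be sketched as a two-sample test. The Python below uses Welch's t statistic (which does not assume equal variances) with a normal approximation for the p-value; both the choice of test and the function names are assumptions for illustration, since the post does not say which test produced the p-values above.

```python
import math


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    # Unbiased sample variances.
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    std_err = math.sqrt(var_a / na + var_b / nb)
    return (mean_a - mean_b) / std_err


def approx_two_sided_p(t):
    """Two-sided p-value via a normal approximation (fine for large samples)."""
    return math.erfc(abs(t) / math.sqrt(2.0))
```

You would call `welch_t` once on the CO2 readings from each house and once on the adjusted scores from each house. With hundreds of games per house the normal approximation should be close to the exact t distribution.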
I consider this to be very strong evidence that at least for me, on this specific task, carbon dioxide has zero effect on cognition. To rescue the hypothesis that it matters, you’d either have to find that it affects other people more than it does me (why would it?) or that it affects other aspects of cognition more than it affects the skills associated with this particular word game. This second one is moderately plausible - I don’t think the word game tests “decision-making” per se. But it would be surprising for this not to be a general health effect, and would potentially be important in the study of intelligence and neuroscience to explore which skills do or don’t suffer under carbon dioxide poisoning.
I was excited to read the Less Wrong post Chess and cheap ways to check day to day variance in cognition by KPier, who does something similar with chess instead of a word game; they haven’t checked carbon dioxide levels yet, but I’d be excited for them to try. I’m also interested in hearing from anyone else who often repeats some objectively-scoreable cognitive task, to see how they do. A CO2 monitor costs about $100 on Amazon, but if money is the only reason you’re not going to do some really good experiment, please let me know and I’ll buy it for you.
If you’re planning on testing this, please post about it below as a form of preregistration.
EDIT: You can download the original data here, and some explanations of what the columns mean here.