A very interesting problem is measuring something like general intelligence. I’m not going to delve deeply into this topic but simply want to draw attention to an idea that is often implied, though rarely expressed, in the framing of such a problem: the assumption that an "intelligence level," whatever it may be, corresponds to some inherent properties of a person and can be measured through their manifestations. Moreover, we often talk about measurements with a precision of a few percentage points, which suggests that, in theory, the measurement should be based on very stable indicators.
What’s fascinating is that this assumption receives very little scrutiny, while in cases where we talk about "mechanical" parameters of the human body (such as physical performance), we know that such parameters, aside from a person’s potential, depend heavily on numerous external factors and on what that person has been doing over the past couple of weeks.
In this text, I will discuss my experience not with measuring my intelligence level but with measuring my intellectual performance level under various circumstances.
Why Could This Be Useful?
There could be several practical benefits from this:
Several Requirements for a Method of Measuring Intellectual Performance
After some thought, I arrived at the following requirements for a measurement method:
I would like to point out that traditional intelligence tests, such as IQ tests or psychometric assessments, clearly fail to meet at least requirements 2, 3, 5, and 6. As for requirement 4, while it’s less obvious, these tests are not designed to account for fluctuations in mental state. This ties back to the point I made at the beginning of this text: for the concept of an absolute intelligence level to make sense, we are forced to assume that this parameter represents a stable, inherent trait of a person, largely unaffected by situational factors.
Regarding requirement 1, a brief explanation is necessary since it is crucial. I am not concerned with measuring an absolute level of intelligence — essentially, a parameter that allows for the comparison of different individuals. What interests me is comparing the intellectual performance of the same person in different mental states and environments, not an absolute value but rather a deviation from their baseline performance when in a "neutral" state.
My Choice
The measurable parameter I settled on is the Elo rating from solving tactical chess puzzles.
For those readers who are not familiar with this topic, here is a short explanation of how it works on Lichess, the platform I use. The puzzles themselves are positions, mostly taken from real games. The task is to find a sequence of moves (usually a small number, 2-5) that leads one side to a significant advantage or victory. The rating system works similarly to the Elo rating system used in chess matches. Each puzzle is assigned a rating based on its difficulty, and each player has their own puzzle-solving rating, which reflects their skill level in this particular activity. A puzzle's rating is adjusted according to the ratings of the players who solve or fail it, and a player's rating is adjusted according to the ratings of the puzzles they solve or fail. The puzzles offered to each player are selected according to that player's current rating and a short history of its changes.
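To make the mechanics concrete, here is a minimal sketch of an Elo-style update between a player and a puzzle. Lichess's actual puzzle rating math (a Glicko-2-style system) is more involved, so the formula and the K constant below are illustrative assumptions rather than the site's real implementation.

```python
def expected_score(player_rating: float, puzzle_rating: float) -> float:
    """Probability that the player solves the puzzle, under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((puzzle_rating - player_rating) / 400))

def update(player_rating: float, puzzle_rating: float, solved: bool, k: float = 32.0):
    """Return updated (player, puzzle) ratings after one attempt.

    K=32 is an illustrative constant, not the value Lichess actually uses.
    """
    expected = expected_score(player_rating, puzzle_rating)
    actual = 1.0 if solved else 0.0
    delta = k * (actual - expected)
    # The player gains what the puzzle loses, and vice versa.
    return player_rating + delta, puzzle_rating - delta

# Example: a 1500-rated player fails a 1600-rated puzzle.
print(update(1500, 1600, solved=False))  # player's rating drops, puzzle's rises
```

The point of the feedback loop is that, over many attempts, the player's rating settles near the difficulty level at which they solve roughly half the puzzles, which is exactly what makes it usable as a performance gauge.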
Let's see how the properties of this measurement method relate to the requirements that were proposed in the previous section.
So, all the requirements except requirement 6 are met reasonably well.
Now, regarding requirement 6. This is, of course, the most questionable, given the very nature of tactical chess puzzles: if solving them didn’t improve chess skill, and if chess skill didn’t translate into puzzle-solving ability, chess players would have no reason to do them (and they clearly consider it a useful activity). So it’s reasonable to expect that my chosen method might be flawed because of its conflict with this particular requirement.
However, here’s an idea. In any activity, not just chess, each person has a certain maximum level of skill achievable with a given type and amount of practice or training; you can’t make significant, long-term progress without changing your approach to practice and/or your level of commitment. And that’s enough for my purpose. If you take someone who isn’t a particularly avid chess player or puzzle solver, and who doesn’t have a strong desire to improve, this method should work fine for them almost right away. In theory, it should also work for someone who has been passionate about chess for many years and has plateaued at the level their comfortable investment in improvement allows. But based on indirect and imprecise evidence (literally, from watching chess streamers), I suspect that such people have some issues with requirement 3. In any case, I’m not one of them, and most likely neither are most people reading this. For those who are, I would venture to guess that everything discussed here could be applied to Go tactical puzzles without any changes.
That said, just in case, I looked into how I could improve my puzzle-solving skills and deliberately avoided those methods. For me, these would be long sessions focused on solving puzzles in narrow categories. Lichess offers this option, and once I noticed the issue, I avoided experimenting with it further.
How I Used It
I simply registered on Lichess and started solving puzzles. For about the first week, I watched my rating fluctuate and learned a little along the way. It probably worked to my advantage that I’m already fairly old and not particularly adept at learning through simple practice. After a week, my rating stabilized around the level where it remains today, three years and approximately 25,000 solved puzzles later. During this first week, I made an effort to log into the app in a state I subjectively perceived as "I’m okay". This is likely an important factor.
Once I noticed my rating had stabilized, I began experimenting, measuring my performance under significantly different conditions. Clearly, a rigorous, evidence-based study wasn’t possible here, so I decided to follow the high standards of Victorian-era amateur science: apply a change and observe its effects, without worrying about blinding, placebo controls, or even how earlier experiments might affect the starting conditions of later ones; none of those luxuries were feasible.
For a time, I tried to record the results in numbers. Unfortunately, it turned out that the results weren’t reproducible with enough precision to make the numbers meaningful. However, the trends were consistent. If there were at least five copies of me, this could be addressed, but since there aren’t even two of me, I had to abandon such ambitions. That’s essentially it: if you’re interested in the effect of a particular factor on your overall intellectual performance, you can simply test it by checking, several times for reliability, how that factor affects changes in your Elo rating when solving puzzles.
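For anyone who wants to keep numbers anyway, here is a minimal sketch of how such a log could look and be summarized. The CSV layout, the condition names, and the file name are my own assumptions for illustration, not anything Lichess provides; the idea is simply to compare each condition's average rating change against sessions recorded in a "neutral" baseline state.

```python
import csv
from collections import defaultdict

# Hypothetical log: one row per puzzle session.
# condition,start_rating,end_rating
# baseline,2005,2012
# caffeine,2010,2044
# sleep_deprived,2015,1970
def summarize(path: str = "sessions.csv") -> dict[str, float]:
    """Mean rating change per session, grouped by condition."""
    deltas = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            deltas[row["condition"]].append(
                float(row["end_rating"]) - float(row["start_rating"])
            )
    return {cond: sum(d) / len(d) for cond, d in deltas.items()}

if __name__ == "__main__":
    means = summarize()
    baseline = means.get("baseline", 0.0)
    for cond, mean_delta in sorted(means.items()):
        # Report each condition relative to the neutral baseline.
        print(f"{cond}: {mean_delta:+.1f} ({mean_delta - baseline:+.1f} vs baseline)")
```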
What Kind of Results Can Be Obtained
Overall, this method works well for testing stimuli that are controlled and have relatively quick effects. It’s possible to determine whether such stimuli influence intellectual performance and, if so, to what extent. Sometimes, through self-observation, you can uncover additional interesting details.
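Since the raw numbers turned out not to be reproducible while the trends were, a crude way to check whether a factor has any effect at all is to ignore the magnitudes and only count how often the rating moved in the same direction across repeated sessions. Below is a small sketch of a sign test for that purpose; the per-session deltas in the example are invented for illustration.

```python
from math import comb

def sign_test_p(deltas: list[float]) -> float:
    """Two-sided sign test: probability of a split at least this lopsided
    if the factor truly had no effect (each direction equally likely).
    Sessions with zero change are ignored."""
    ups = sum(1 for d in deltas if d > 0)
    downs = sum(1 for d in deltas if d < 0)
    n = ups + downs
    k = max(ups, downs)
    # P(at least k successes in n fair coin flips), doubled for two sides.
    p_one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * p_one_sided)

# Invented example: rating change in six sessions under the same stimulus.
print(sign_test_p([+25, +12, +30, +8, +18, +22]))  # ~0.03: unlikely to be pure noise
```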
Here are examples of results I’ve confidently obtained about myself:
One more observation, which I don’t know how to turn to my advantage, but which matters for keeping the data clean: my measured rating noticeably increases when, while solving puzzles, I describe my thought process out loud (actually spoken, not as an internal monologue), similar to the "rubber duck" method, though not quite the same.
What I Should Have Done, But Didn't
As an Afterword, Two Final Thoughts:
Perhaps a good proof that my approach works would be to show what benefit I got from the results. But whether I’ve truly benefited from the information I’ve gathered is a tough question. Unfortunately, the real problems I face in everyday life — which, if solved successfully, lead to tangible benefits — are very diverse, demand different kinds of intellectual engagement, and are burdened by ever-changing external constraints. They don’t really lend themselves to measurable results. I believe I am able to use the information I’ve gathered, but I recognize my bias here. I’d say that this entire text serves as a stronger argument for the practical value of this method than my personal testimony does. It’s worth trying.
As for how universal such results are, I have no idea. I don’t have a second person who’s done research like mine, so I can’t compare my results with anyone else’s. I suspect that some effects, like those of caffeine, are fairly universal. But others, like the effects of different types of physical activity, are likely influenced heavily by personal history.
And that's all. Thank you for your attention.