kqr

Quant, systems thinker, anarchist.

I write at https://entropicthoughts.com

My inbox is lw[at]xkqr.org

Answer by kqr

Many of the existing answers seem to confuse model and reality.

In terms of practical prediction of reality, it would always be a mistake to emit a 0 or 1, because there's always that one-in-a-billion chance that our information is wrong – however vivid it seems at the time. Even if you have secretly looked at the hidden coin and seen clearly that it landed on heads, 99.999 % is a more accurate forecast than 100 %. However unlikely, the coin could have landed on aardvarks and masqueraded as heads – that is a possibility. Or you confabulated the memory of seeing this coin from a different coin you saw a week ago – also not likely, but it happens. Or you mistook tails for heads – which presumably happens every now and then.
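One way to make "mistake" concrete – my own illustration, not something the original question specifies – is a proper scoring rule: under the log score, a 100 % forecast that turns out wrong loses unboundedly many points, while hedging to 99.999 % costs almost nothing when right. A minimal sketch in Python:

```python
import math

def log_score(p_heads: float, heads: bool) -> float:
    """Log of the probability assigned to what actually happened.
    Higher (closer to 0) is better."""
    return math.log(p_heads if heads else 1 - p_heads)

print(log_score(0.99999, True))   # ~ -0.00001: hedging costs almost nothing
print(log_score(0.99999, False))  # ~ -11.5: painful but finite when wrong
# A forecast of exactly 1.0 that turns out wrong scores log(0) = -infinity,
# an unbounded loss (in Python, math.log(0) raises ValueError).
```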

When it comes to models, though, probabilities of 0 and 1 show up all the time. Getting a 7 when tossing a d6 with the standard dice model simply does not happen, by construction. Adding two and three and getting five under regular field arithmetic happens every time. We can argue whether the language of probability is really the right tool for those types of questions, but taking a non-normative stance, it is reasonable for someone to ask those questions phrased in terms of probabilities, and then the answers would be 0 % and 100 % respectively.
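In code, with the model spelled out explicitly – a minimal sketch, where the `prob` helper is mine and not part of any library:

```python
from fractions import Fraction

# The standard model of a fair d6: each face has probability 1/6.
d6 = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event) -> Fraction:
    """Probability of an event (a predicate on faces) under the model."""
    return sum((p for face, p in d6.items() if event(face)), Fraction(0))

print(prob(lambda face: face == 7))   # 0: impossible by construction
print(prob(lambda face: 2 + 3 == 5))  # 1: holds for every outcome
```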

These probabilities also show up in limits and arguments of general tendency. If a coin is tossed repeatedly, the probability of getting only tails forever is 0 %. In a simple symmetric random walk, the probability of eventually crossing the origin is 100 %. When throwing a d6 for long enough, the running mean will eventually stay within the range 3–4 with probability 100 %.
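All three are easy to watch happen in simulation. A sketch of the d6 case, with arbitrary sample sizes and seed:

```python
import random

# Running mean of fair d6 throws: by the strong law of large numbers
# it converges to 3.5, so in the model it eventually stays inside
# the range 3-4 with probability 1.
random.seed(0)  # arbitrary, for reproducibility
total = 0
for n in range(1, 100_001):
    total += random.randint(1, 6)
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {n:>7,} throws, mean = {total / n:.3f}")
```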

These latter two paragraphs describe things that apply only to our models, not to reality, but they can serve as a useful mental shortcut as long as one is careful not to apply them blindly.

kqr

This analysis suffers from a fairly clear confounder: since you are basing the analysis on which days you actually listened to music, there might be a common antecedent that both improves your mood and causes you to listen to music. As a silly example, maybe you love shopping for jeans, and clothing stores tend to play music, so your mood will, on average, be better on the days you hear music for this reason alone.

An intention-to-treat approach, where you make the random booleans the explanatory variable, would be better: less biased and less subject to confounding. It would also give you less statistical power, but such is the cost of avoiding false conclusions. You may need to run the experiment for longer to compensate.
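Concretely, the estimate would be computed from the assignment column rather than the listening column, something like the sketch below – the data and layout are made up for illustration:

```python
# Hypothetical diary data: (assigned_music, mood), where assigned_music
# is the random boolean, regardless of whether music was actually played.
days = [
    (True, 6.1), (True, 5.4), (False, 5.2),
    (False, 4.9), (True, 5.8), (False, 5.5),
]

assigned     = [mood for flag, mood in days if flag]
not_assigned = [mood for flag, mood in days if not flag]

# Intention-to-treat estimate: a simple difference in means over the
# *assignment*, which stays unbiased even when compliance is imperfect.
itt = sum(assigned) / len(assigned) - sum(not_assigned) / len(not_assigned)
print(f"ITT estimate of the effect of assignment on mood: {itt:+.2f}")
```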

> It appears that listening to music, in the short-term: [...] makes earworms play in my mind for slightly less of the time

Whenever I suffer from an earworm, my solution has for a long time been to just play and listen to that song once, sometimes twice. For some reason this satisfies my brain, and it drops the song. It is counter-intuitive, but you might want to try it.


On a completely separate note:

> Both response variables were queried by surprise, 0 to 23 times per day (median 6), constrained by convenience.

How was this accomplished, technically? I've long wanted to do similar things but never bothered to look up a good way of doing it.

Answer by kqr

> If Q, then anything follows. (By the Principle of Explosion, a false statement implies anything.) For example, Q implies that I will win $1 billion.

I'm not sure even this is the case.

Maybe there's a more sophisticated version of this argument, but at this level, we only know the implication Q => $1B is true, not that $1B is true. If Q is false, the implication being true says nothing about $1B.
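A brute-force truth table makes the gap visible – every row where Q is false makes the implication true, whatever the consequent does:

```python
from itertools import product

# Material implication: Q -> B is false only when Q is true and B is false.
for q, b in product([True, False], repeat=2):
    print(f"Q={q!s:5}  B={b!s:5}  Q->B={((not q) or b)!s}")
# Both Q=False rows satisfy Q -> B, one with B true and one with B false:
# knowing the implication holds tells us nothing about B on its own.
```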

But more generally, I agree there's no meaningful difference. I'm in the de Finetti school of probability in that I think it only and always expresses our personal lack of knowledge of facts.

kqr

Thanks, everyone. I had a great time!

kqr

> The AI forecaster is able to consistently outperform the crowd forecast on a sufficiently large number of randomly selected questions on a high-quality forecasting platform

Seeing how the crowd forecast itself routinely performs at a superhuman level, isn't this an unfairly high bar to clear? This doesn't invalidate the rest of your arguments – the methodological problems you point out are really bad – but before asking whether performance is superhuman, it makes a lot of sense to first agree on what superhuman performance really is.

(I also note that a high-quality forecasting platform suffers from self-selection by unusually enthusiastic forecasters, raising the bar further. However, I don't believe this is an actual problem, because if someone claims "performance on par with humans" I would expect that to mean "enthusiastic humans".)