Your definition of the Heaviside step function has H(0) = 1.
Your definition of L has L(0) = 1/2, so you're not really taking the derivative of the same function.
I don't really believe nonstandard analysis helps us differentiate the Heaviside step function. You have found a function that is quite a lot like the step function and shown that it has a derivative (maybe), but I would need to be convinced that all such functions have the same derivative before I'd accept that something meaningful is going on. (And since all your derivatives have different values, this doesn't seem like a useful definition of a derivative.)
The log-returns are not linear in the bits. (They aren't even constant for a given level of bits.)
For example, say the market is 1:1, and you have 1 bit of information: you think the odds are 1:2. Then, Kelly betting, you will bet 1/3 of your bankroll and expect to make a ~20% log-return.
Say the market was 1:2, and you had 1 bit of information: you think the odds are 1:4. Then, Kelly betting, you will bet 1/5 of your bankroll and expect to make a ~27% log-return.
We've already determined that quite different returns can be obtained for the same amount of information...
When this paradox gets talked about, people rarely bring up the caveat that to make the math nice you're supposed to keep rejecting this first bet over a potentially broad range of wealth.
This is exactly the first thing I bring up when people talk about this.
...But counter-caveat: you don't actually need a range of $1,000,000,000. Betting $1000 against $5000, or $1000 against $10,000, still sounds appealing, but the benefit of the winnings is squished against the ceiling of seven hundred and sixty nine utilons all the same. The logic doesn't require that
Thus an attacker, knowing this, could only reasonably demand half the amount and still expect to get paid.
Who bears the cost of a tax depends on the elasticities of supply and demand. In the case of a ransomware attack, I would expect the vast majority of the burden to fall on the victim.
I wrote about exactly this recently: https://www.lesswrong.com/posts/zLnHk9udC28D34GBB/prediction-markets-aren-t-magic
I don't give much weight to his diagnosis of problematic group decision mechanisms
I have quite a lot of time for it personally.
The world is dominated by a lot of large organizations that have a lot of dysfunction. Anybody over the age of 40 will just agree with me on this. I think it's pretty hard to find anybody who would disagree about that who's been around the world. Our world is full of big organizations that just make a lot of bad decisions because they find it hard to aggregate information from all the different people.
This is roughly Hanson'...
So the first question is: "how much should we expect the sample mean to move?".
If the current state is $\hat{p}_n$ (the sample mean of the first $n$ draws), and we see a sample of $x_{n+1}$ (where $x_{n+1}$ is going to be 0 or 1 based on whether or not we have heads or tails), then the expected change is:

$$\mathbb{E}\left[\hat{p}_{n+1} - \hat{p}_n\right] = \mathbb{E}\left[\frac{n\hat{p}_n + x_{n+1}}{n+1} - \hat{p}_n\right] = \frac{\mathbb{E}\left[x_{n+1}\right] - \hat{p}_n}{n+1} = \frac{p - \hat{p}_n}{n+1}$$

In these steps we are using the facts that $x_{n+1}$ is independent of the previous samples, and the distribution of $x_{n+1}$ is Bernoulli with parameter $p$. (So $\mathbb{E}[x_{n+1}] = p$ and $\mathrm{Var}(x_{n+1}) = p(1-p)$...)
Whoops. Good catch. Fixing
- $x$ is the result of the $(n+1)$th draw
- $\sigma$ is the standard deviation after the first $n$ draws
- pnl is the profit and loss the bettor can expect to earn
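If anyone wants to sanity-check the algebra numerically, here's a quick sketch (mine, not from the post; the variable names and parameters are made up) that fixes one realised sample mean and simulates the $(n+1)$th draw:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3     # true probability of heads (made up)
n = 50      # number of draws already seen

# One realised history of n draws, giving the current sample mean
p_hat_n = rng.binomial(1, p, size=n).mean()

# Many independent possibilities for the (n+1)th draw
x = rng.binomial(1, p, size=200_000)
moves = (n * p_hat_n + x) / (n + 1) - p_hat_n

print("current sample mean:           ", p_hat_n)
print("mean simulated move:           ", moves.mean())
print("predicted (p - p_hat_n)/(n+1): ", (p - p_hat_n) / (n + 1))
```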
Prediction markets generate information. Information is valuable as a public good. Failure of public good provision is not a failure of prediction markets.
I think you've slightly missed my point. My claim is narrower than this. I'm saying that prediction markets have a concrete issue which means you should expect them to be less efficient at gathering data than alternatives. Even if information is a public good, it might not be worth as much as prediction markets would charge to find that information. Imagine if the cost of information via a prediction market were exponential in the cost of information gathering; that wouldn't mean the right answer is to subsidise prediction markets more.
If you have another suggestion for a title, I'd be happy to use it
Even if there is no acceptable way to share the data semi-anonymously outside of Match Group, the arguments for prediction markets still apply within Match Group. A well-designed prediction market would still be a better way to distribute internal resources and rewards amongst competing data science teams within Match Group.
I used to think things like this, but now I disagree, and actually think it's fairly unlikely this is the case.
Sure - but that answer doesn't explain their relative lack of success in other countries (eg the UK)
Additionally, where prediction markets work well (eg sports betting, political betting) there is a thriving offshore market catering to US customers.
This post triggered me a bit, so I ended up writing one of my own.
I agree the entire thing is about how to subsidise the markets, but I think you're overestimating how good markets are as a mechanism for subsidising forecasting (in general). Specifically for your examples:
I'm excited about the potential of conditional prediction markets to improve on them and solve two-sided adverse selection.
This applies to roughly the entire post, but I see an awful lot of magical thinking in this space. What is the actual mechanism by which you think prediction markets will solve these problems?
In order to get a good prediction from a market you need traders to put prices in the right places. This means you need to subsidise the markets. Whether or not a subsidised prediction market is going to be cheaper for the equivalent level of forecast than paying another 3rd party (as is currently the case in most of your examples) is very unclear to me
A thing Larry Summers once said that seems relevant, from Elizabeth Warren:
He said something very similar to Yanis Varoufakis (https://www.globaljustice.org.uk/blog/2017/06/when-yanis-met-prince-darkness-extract-adults-room/) and now I like to assume he goes around saying this to everyone
No, it's fairly straightforward to see this won't work
Let N be the random variable denoting the number of rounds. Let x = p*w + (1-p)*l, where p is the probability of winning, and w = 1-f+o*f and l = 1-f are the wealth multipliers when we win or lose betting a fraction f of our wealth at odds o.
Then the value we care about is E[x^N], which is the moment generating function of N evaluated at log(x). Since the mgf of a non-negative random variable is increasing in its argument, maximising E[x^N] is the same as maximising x, i.e. the answer under linear utility doesn't change.
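To make that concrete, here's a small sketch (the parameters are mine, chosen only for illustration) using a Poisson number of rounds, whose mgf gives E[x^N] = exp(lam*(x-1)) in closed form. The expected-wealth maximiser is still the largest x, i.e. all-in rather than Kelly:

```python
from math import exp

p, o = 0.6, 1.0            # win probability and odds (made-up example)
kelly = p - (1 - p) / o    # Kelly fraction = 0.2 here
lam = 10                   # number of rounds N ~ Poisson(lam)

for f in (kelly, 0.5, 1.0):
    w, l = 1 - f + o * f, 1 - f     # wealth multipliers on a win / a loss
    x = p * w + (1 - p) * l         # per-round expected wealth multiplier
    # E[x^N] = M_N(log x); for N ~ Poisson(lam) this is exp(lam * (x - 1))
    print(f"f = {f:.2f}  x = {x:.3f}  E[final wealth] = {exp(lam * (x - 1)):.2f}")
```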
Yes? 1/ it's not in their mandate 2/ they've never done it before (I guess you could argue the UK did in 2022, but I'm not sure this is quite the same) 3/ it's not clear that this form of QE would have the effect you're expecting on long-end yields
I absolutely do not recommend shorting long-dated bonds. However, if I did want to do so as a retail investor, I would maintain a rolling short in CME treasury futures. The longest future is UB. You'd need to roll your short once every 3 months, and you'd also want to adjust the size each time, given that the changing CTD means that the same number of contracts doesn't necessarily mean the same amount of risk each expiry (see the sketch below).
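The resizing itself is simple once you have the per-contract DV01s. All the figures below are invented placeholders; in practice you'd compute DV01 from the current cheapest-to-deliver bond or take it from your broker:

```python
# Keeping dollar risk roughly constant across a roll by matching DV01.
# All figures are invented placeholders, not real quotes.

old_contracts = 20
old_dv01_per_contract = 250.0   # $ per 1bp move, expiring contract (hypothetical)
new_dv01_per_contract = 280.0   # $ per 1bp move, next contract (hypothetical)

target_risk = old_contracts * old_dv01_per_contract          # $5,000 per bp
new_contracts = round(target_risk / new_dv01_per_contract)   # ~18 contracts

print(f"roll the short from {old_contracts} contracts to {new_contracts}")
```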
Err... just so I'm clear lots of money being printed will devalue those long dated bonds even more, making the bond short an even better trade? (Or are you talking about some kind of YCC scenario?)
average returns
I think the disagreement here is on what "average" means. All-in maximises the arithmetic average return. Kelly maximises the geometric average. Which average is more relevant is equivalent to the Kelly debate though, so hard to say much more
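A toy illustration of the distinction (the numbers are mine): for a repeated even-odds bet with a 60% win probability, the arithmetic average of the wealth multiplier is maximised by going all-in, while the geometric average peaks at the Kelly fraction:

```python
import numpy as np

p, o = 0.6, 1.0                   # made-up even-odds bet with an edge
fs = np.linspace(0, 0.99, 100)    # candidate bet fractions

arith = p * (1 + o * fs) + (1 - p) * (1 - fs)        # E[wealth multiplier]
geom = (1 + o * fs) ** p * (1 - fs) ** (1 - p)       # geometric mean multiplier

print("arithmetic mean maximised at f =", fs[arith.argmax()])  # ~0.99 (all-in)
print("geometric mean maximised at f  =", fs[geom.argmax()])   # ~0.2 (Kelly)
```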
Wouldn’t You Prefer a Good Game of Chess?
I assume this was supposed to be a WarGames reference, in which case I think it should be a "nice" game of chess.
Yeah, and it doesn't adjust for taxes there either. I thought this was less of an issue when comparing rents to owning though, as the same error should affect both equally.
This doesn't seem to account for property taxes, which I expect would change the story quite a bit for the US.
I’d also add that female labor force participation rates will move these numbers around some. Their calculations assume all countries have 50% female participation when calculating income, when it actually varies from 11%-85% or so.
This seems needlessly narrow minded. Just because AI is better than humans doesn't make it uniformly better than humans in all subtasks of chess.
I don't know enough about the specifics this guy is talking about (I am not an expert), but I do know that until the release of NN-based algorithms, most top players were still comfortable pointing to positions soon out of the opening that the computer was mis-evaluating.
To take another more concrete example - computers were much better than humans in 2004, and yet Peter Leko still managed to refute a computer prepared line OTB in a world championship game.
Agreed - as I said, the most important things are compute and diligence. Just because a large fraction of the top games are draws doesn't really say much about whether or not there is an edge being given by the humans (a large fraction of elite chess games are draws, but no-one doubts there are differences in skill level there). Really you'd want to see Jon Edwards's setup vs a completely untweaked engine being administered by a novice.
I believe the answer is potentially. The main things which matter in high-level correspondence chess are:
Although I don't think either of those are really relevant. The really relevant bit is (apparently) planning:
For me, the key is planning, which computers do not do well — Petrosian-like evaluations of where pieces belong, what exchanges are needed, and what move orders are most precise within the long-term plan.
(From this interview with Jon Edwards (reigning correspondence world champion) from...
Just in case anyone is struggling to find the relevant bits of the codebase, my best guess is the link for the collections folder in GitHub is now here.
You are looking in "views.ts" eg .../collections/comments/views.ts
The best thing to search for (I found) was ".addView(" and see what fits your requirements
I feel in all these contexts odds are better than log-odds.
Log-odds simplifies Bayesian calculations: so do odds. (Addition becomes multiplication; see the sketch after this list.)
Every number is meaningful: every positive number is meaningful and the numbers are clearer. I can tell you intuitively what 4:1 or 1:4 means. I can't tell you what -2.4 means quickly, especially if I have to keep specifying a base.
Certainty is infinite: same is true for odds
Negation is the complement and 0 is neutral: Inverse is the complement and 1 is neutral. 1:1 means "I don't know" and 1:x is the inverse of x:1. Both of these are intuitive to me.
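Here's a minimal sketch of what I mean by addition becoming multiplication (the numbers are made up): a Bayesian update in odds form is just prior odds times the likelihood ratio.

```python
from fractions import Fraction

# Prior odds of 1:4 for the hypothesis (made-up numbers)
prior_odds = Fraction(1, 4)

# Evidence that is 3x more likely if the hypothesis is true
likelihood_ratio = Fraction(3, 1)

posterior_odds = prior_odds * likelihood_ratio        # 3:4
posterior_prob = posterior_odds / (1 + posterior_odds)

print(f"posterior odds: {posterior_odds}")                     # 3/4, i.e. 3:4
print(f"posterior probability: {float(posterior_prob):.3f}")   # ~0.429
```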
No - I think probability is the thing supposed to be a martingale, but I might be being dumb here.
So, what do you think? Does this method seem at all promising? I'm debating with myself whether I should begin using SPIES on Metaculus or elsewhere.
I'm not super impressed tbh. I don't see "give a 90% confidence interval for x" as a question which comes up frequently? (At least in the context of eliciting forecasts and estimates from humans - it comes up quite a bit in data analysis).
For example, I don't really understand how you'd use it as a method on Metaculus. Metaculus has 2 question types - binary and continuous. For binary you have to give the prob...
17. Unemployment below five percent in December: 73 (Kalshi said 92% that unemployment never goes above 6%; 49 from Manifold)
I'm not sure exactly how you're converting 92% unemployment < 6% to < 5%, but I'm not entirely convinced by your methodology?
15. The Fed ends up doing more than its currently forecast three interest rate hikes: None (couldn't find any markets)
Looking at the SOFR Dec-22 3M futures 99.25/99.125 put spread on the 14-Feb, I put this probability at ~84%.
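For anyone wondering how you get a probability out of a put spread: a tight vertical spread is roughly a digital option, so its price divided by the strike width approximates the (risk-neutral) probability of settling below the upper strike. The price below is a placeholder for illustration, not the quote I was actually looking at.

```python
# Approximating a digital probability from a tight put spread.
# The spread price here is a hypothetical placeholder, not a real 14-Feb quote.
k_high, k_low = 99.25, 99.125      # strikes of the SOFR put spread
spread_price = 0.105               # hypothetical cost of the spread

prob_below_k_high = spread_price / (k_high - k_low)
print(f"implied probability ~ {prob_below_k_high:.0%}")   # ~84%
```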
Thanks for doing this, I started doing it before I saw your competition an...
And one way to accomplish that would be to bet on what percentage of bets are on "uncertainty" vs. a prediction.
How do you plan on incentivising people to bet on "uncertainty"? All the ways I can think of lead to people either gaming the index, or turning uncertainty into a KBC.
The market and most of the indicators you mentioned would be dominated by the 60 that placed large bets
I disagree with this. Volatility, liquidity, # predictors, spread of forecasts will all be affected by the fact that 20 people aren't willing to get involved. I'm not sure what information you think is being lost by people stepping away? (I guess the difference between "the market is wrong" and "the market is uninteresting"?)
There are a bunch of different metrics which you could look at on a prediction market / prediction platform to gauge how "uncertain" the forecast is:
Prediction markets function best when liquidity is high, but they break completely if the liquidity exceeds the price of influencing the outcome. Prediction markets function only in situations where outcomes are expensive to influence.
There are a ton of fun examples of this failing:
I don't know enough about how equities trade during earnings, but I do know a little about how some other products trade during data releases and while people are speaking.
In general, the vast, vast, vast majority of liquidity is withdrawn from the market before the release. There will be a few stale orders people have left by accident + a few orders left in at levels deemed ridiculously unlikely. As soon as the data is released, the fastest players will generally send quotes making a (fairly wide) market around their estimate of the fair price. Over time (a...
I agree identifying model failure is something people can be good at (although I find people often forget to consider it). Pricing it they are usually pretty bad at.
I'd personally be more interested in asking someone for their 95% CI than their 68% CI, if I had to ask them for exactly one of the two. (Although it might again depend on what exactly I plan to do with this estimate.)
I'm usually much more interested in a 68% CI (or a 50% CI) than a 95% CI because:
Under what assumption?
1/ In what you've written above, you aren't "[assuming] the errors are normally distributed" (since a mixture of two normals isn't normal).
2/ If your assumption is that the normalised errors are $\sim N(0,1)$, then yes, I agree the median of the squared normalised error is ~0.45 (although
>>> from scipy import stats
>>> stats.chi2.ppf(.5, df=1)
0.454936
would have been an easier way to illustrate your point). I think this is actually the assumption you're making. [Which is a horrible assumption, because if it were true, you would already be perfectly calibrated].
3/ I guess ...
I think the controversy is mostly irrelevant at this point. Leela performed comparably to Stockfish in the latest TCEC season and is based on Alpha Zero. It has most of the "romantic" properties mentioned in the post.
That isn't a "simple" observation.
Consider an error which is 0.5 22% of the time and 1.1 78% of the time. The squared errors are 0.25 and 1.21. The median error is 1.1 > 1. (The mean squared error is ~1.)
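Quick check of that example, in case anyone wants to play with the numbers (this snippet is just mine for illustration):

```python
import numpy as np

errors = np.array([0.5, 1.1])
weights = np.array([0.22, 0.78])

print("mean squared error:", np.dot(weights, errors ** 2))     # ~0.999
print("share of errors above 1:", weights[errors > 1].sum())   # 0.78
```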
Metaculus uses the cdf of the predicted distribution, which is better. If you have lots of predictions, my scheme gives an actionable number faster.
You keep claiming this, but I don't understand why you think this
If you suck like me and get a prediction very close then I would probably say: that sometimes happen :) note I assume the average squared error should be 1, which means most errors are less than 1, because $\frac{0^2+2^2}{2}=2>1$
I assume you're making some unspoken assumptions here, because $\mathbb{E}[\epsilon^2]=1$ is not enough to say that. A naive application of Chebyshev's inequality would just say that $P(|\epsilon| \geq \sqrt{2}) \leq \frac{1}{2}$.
To be more concrete, if you were very weird, and either end up forecasting 0.5 s.d. or 1.1 s.d. away, (still with mean 0 and average...
Go to your profile page. (Will be something like https://www.metaculus.com/accounts/profile/{some number}/). Then in the track record section, switch from Brier Score to "Log Score (continuous)"
I'd be happy to.
The 2000-2021 VIX has averaged 19.7, sp500 annualized vol 18.1.
I think you're trying to say something here like 18.1 <= 19.7, therefore the VIX (and by extension options) are expensive. This is an error. I explain in more detail here, but in short you're comparing expected variance and expected volatility, which aren't the same thing.
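To see why that comparison doesn't work, here's a toy example (the regime numbers are mine, picked purely for illustration, not a decomposition of the actual S&P figures): if volatility is 10% half the time and 26% half the time, the average volatility is 18%, but the square root of the average variance, which is closer to what a VIX-style number corresponds to, is about 19.7%. In this toy example the entire gap comes from Jensen's inequality, with no risk premium at all.

```python
import numpy as np

# Toy regime model: annualized vol is 10% half the time, 26% half the time.
# These numbers are invented, chosen only to illustrate the Jensen gap.
vols = np.array([0.10, 0.26])
probs = np.array([0.5, 0.5])

avg_vol = np.dot(probs, vols)                  # E[sigma]        = 18.0%
rms_vol = np.sqrt(np.dot(probs, vols ** 2))    # sqrt(E[sigma^2]) ~ 19.7%

print(f"average volatility:       {avg_vol:.1%}")
print(f"sqrt of average variance: {rms_vol:.1%}")
```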
...From a 2ndary source: "The mean of the realistic volatility risk premium since 2000 has been 11% of implied volatility, with a standard deviation of roughly 15%-points" from https://www.sr-sv.com/realistic-volatility-risk-premia
I still think you're missing my point.
If you're making ~20 predictions a year, you shouldn't be doing any funky math to analyse your forecasts. Just go through each one after the fact and decide whether or not the forecast was sensible with the benefit of hindsight.
I am even explaining what a normal distribution is because I do not expect my audience to know...
I think this is exactly my point: if someone doesn't know what a normal distribution is, maybe they should be looking at their forecasts in a fuzzier way rather than trying to back-fit some model to them.
The original post also addresses this suggestion