All of tenthkrige's Comments + Replies

Good points well made. I'm not sure what you mean by "my expected log score is maximized" (and would like to know), but in any case it's probably your average world rather than your median world that does it?

2Molly
Figure 1 is clumsy, sorry. In the case of a smooth probability distribution of infinite worlds, I think the median and the average world are the same? But in practice, yes, it's an expected value calculation, summing P(world) * P(U|world) for all the worlds you've thought about.
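For concreteness, the calculation being described, as a minimal sketch with made-up numbers:

```python
# Hypothetical worlds: your credence in each, and P(U | world).
worlds = [
    (0.5, 0.9),  # (P(world), P(U | world))
    (0.3, 0.4),
    (0.2, 0.1),
]

# P(U) = sum over worlds of P(world) * P(U | world)
p_u = sum(p_w * p_u_given_w for p_w, p_u_given_w in worlds)
print(p_u)  # 0.59
```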

Very interesting!

From eyeballing the graphs, it looks like the average Brier score is barely below 0.25. This indicates that GPT-4 is better than a dart-throwing monkey (i.e. predicting a random %age, score of 0.33), and barely better than chance (always predicting 50%, score of 0.25).

It would be interesting to see the decompositions for those two naive strategies for that set of questions, and compare to the sub-scores GPT-4 got.

You could also check if GPT-4 is significantly better than chance.
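For concreteness, here's a quick check of those two baselines (a sketch with simulated outcomes, since the actual question set isn't reproduced here; a bootstrap over GPT-4's per-question scores would do for the significance test):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the resolved binary questions (1 = "yes").
outcomes = rng.integers(0, 2, size=1000)

def brier(preds, outcomes):
    """Mean Brier score; lower is better."""
    return np.mean((preds - outcomes) ** 2)

# Dart-throwing monkey: a uniformly random probability per question.
monkey = brier(rng.uniform(0, 1, size=outcomes.size), outcomes)

# Chance: always predicting 50%, which scores 0.25 on every question.
chance = brier(np.full(outcomes.size, 0.5), outcomes)

print(f"monkey ≈ {monkey:.3f} (expected 1/3), chance = {chance:.3f} (exactly 0.25)")
```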

3Matthew Barnett
Fixed.

people who are focused on providing—and incentivized to provide—estimates of the expected number of cases

Can you say more about this? Would users forecast a single number? Would they get scored on how close their number is to the actual number? Could they give confidence intervals?

2rossry
I don't know. (As above, "When [users] tell you exactly what they think is wrong and how to fix it, they are almost always wrong.") A scoring rule that's proper in linear space (as you say, "scored on how close their number is to the actual number") would accomplish this -- either for scoring point estimates, or distributions. I don't think it's possible to extract an expected value from a confidence interval that covers orders of magnitude, so I expect that would work less well.
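As an illustration of "proper in linear space" (a sketch with a made-up distribution, not anyone's actual scoring rule): under squared error on the raw number, the report that minimizes your expected penalty is your distribution's mean, i.e. the expected number of cases, which is exactly what you'd want elicited.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical belief over "number of cases": heavy-tailed, so the
# mean and the median differ by a lot.
samples = rng.lognormal(mean=2.0, sigma=1.5, size=100_000)

def expected_penalty(report):
    # Squared error in linear space: a proper score for the mean.
    return np.mean((samples - report) ** 2)

# The mean minimizes expected squared error; the median does worse.
print(expected_penalty(samples.mean()) < expected_penalty(np.median(samples)))  # True
```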

I think that's how I'd use this as well.

I don't think that solves the problem though. There are a lot of people, and many of them believe very unlikely models. Any model we (lesswrong-ish) people spend time discussing is going to be vastly more likely than a randomly selected human-thought-about model. I realise this is getting close to reference class tennis, sorry.

1Jotto999
I had little hope of solving much in this domain! But a base rate that is way off is still useful to me for some discussions.  What you're pointing to might offer some way to eliminate a lot of irrelevant n, or gouge probability away from them.  So with respect to discussions within smart circles, maybe the base rate ends up being much higher than 1/5million.  Maybe it's more like 1/10,000, or even higher.  I'm not a stickler, I'd take 1/1,000, if it lets certain individuals in these circles realize they have updated upward on a specific metaphysical idea way more strongly than they could reasonably.  That it's an obvious overconfidence to have updated all the way to 50% chance on a specific one that happens to be popular in smart circles at the time.

Cool idea. Any model we actually spend time talking about is going to be vastly above the base rate, though. Because most human-considered models are very nonsensical/unlikely.

2Jotto999
In hindsight I should've specified a time limit.  Someone pointed out to me that if something taxonomically included in "human" continued living for a very long time, then that thing could "consider" an indefinite number of ideas.  Maybe I should've said "that anyone considers up until the year 3k" or something.

At first I was dubious about the framing of a "shifting" n-dimensional landscape, because in a sense the landscape is fixed in 2n dimensions (I think?), but you've convinced me this is a useful tool to think about/discuss these issues. Thanks for writing this!

Epistemic status: gross over-simplification, and based on what I remember from reading this 6 months ago.

This paper resolved many questions I had left about MWI. Relevantly here, I think it argues that the number of worlds doesn't grow, because there was already an infinity of them spread throughout space.

Observing an experiment is then equivalent to locating yourself in space. Worlds splitting is the process where identical regions of the universe become different.

2fin
Thanks, this is useful.

The scoring system incentivizes predicting your true credence (gory details here).
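(For reference, the standard properness argument, sketched in general form rather than as Metaculus's exact formula: if the event happens with probability $p$ and you report $q$, your expected log score is $S(q) = p\log q + (1-p)\log(1-q)$, and

$$S'(q) = \frac{p}{q} - \frac{1-p}{1-q} = 0 \iff q = p,$$

with $S''(q) < 0$ everywhere, so reporting your true credence $p$ is the unique maximum.)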

I think Metaculus rewarding participation is one of the reasons it has participation. Metaculus can discriminate good predictors from bad predictors because it has their track record (I agree this is not the same as discriminating good/bad predictions). This info is incorporated into the Metaculus prediction, which is hidden by default but which you can unlock with on-site fake currency.

I think Metaculus rewarding participation is one of the reasons it has participation.

PredictionBook also had participation while being public about people's Brier scores. I think the main reason Metaculus has more activity is that it has good curated questions.

There's also no reason to only have a single public metric. Being able to achieve something like Superforecaster status on the Good Judgment Project would be valuable to motivate some people.

You could also check their track record. It has a calibration curve and much more.

This feels related to Policy Debates Should Not Appear One-Sided: anything that's obvious does not even enter into consideration, so you only have difficult choices to make.

Don't you mean that it will damage the institutions built on intellectual dark matter? Did I miss something?

2ChristianKl
In Samo Burja's model, all functioning institutions are partly built on intellectual dark matter.

This was interesting. I think I missed an assumption somewhere, because for , it seems that the penalty is , which seems very low for a -degree polynomial fitted on points.

3michael_h
Good point. Thank you for bringing this up. I just had a closer look in my notes at how the complexity penalty is derived and there is an additional assumption that I left out. The derivation uses a matrix $X$ with $p+1$ columns and $n$ rows which has entry $x_i^{j-1}$ in the $i$th row and $j$th column (where $x_1, x_2, \ldots, x_n$ is the training set). In the derivation it is assumed that $X$ has rank $p+1$, which is true most of the time provided that $n \ge p+1$. For simplicity I won't add a mention of this matrix to the original post, but I will add the assumption $n \ge p+1$.
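For anyone who wants to check the rank condition numerically: $X$ is a Vandermonde matrix, so with $n \ge p+1$ distinct training points it has full column rank $p+1$. A minimal sketch (variable names mine):

```python
import numpy as np

x = np.array([0.1, 0.5, 0.9, 1.3, 2.0])  # n = 5 distinct training points
p = 3

# X: n rows, p+1 columns, entry x_i^(j-1) in row i, column j.
X = np.vander(x, N=p + 1, increasing=True)

print(X.shape, np.linalg.matrix_rank(X))  # (5, 4) 4, i.e. rank p+1
```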

I almost gave up halfway through, for much the same reasons, but this somehow felt important, the way some sequences/codex posts felt important at the time, so I powered through. I definitely will need a second pass on some of the large inferential steps, but overall this felt long-term valuable.

I find this kind of post very valuable, thank you for writing it so well.

I see someone who seems to see part of the world the same way I do, and I go “can we talk? can we be buds? can we be twinsies? are we on the same team?” and then I realize “oh, no, outside of this tiny little area, they…really don’t agree with me at all. Dammit.”

That rang very close to home, choked me up a little bit. But the good sad, where you put clean socks on and go make the world less terrible.

I'd never thought about it clearly, so thanks for this model.

A behavior I've observed (and participated in) that you don't mention: the group can temporarily splinter. Picture 6 people. Someone explores topic A. Two other people like the new topic. The other 3 listen politely for 1-2 minutes. One of the three bored people explores topic B, addressing a bored neighbor (identified by their silence). The third bored person latches on to them. Then both conversations evolve until one dies down or a leader forcibly merges the two.

(By forcibly merge, I mean: whi...

Content feedback: the inferential distance between Löb's theorem and spurious counterfactuals seems larger than that of the other points. Maybe that's because I haven't internalised the theorem, not being a logician and all.

Unnecessary nitpick: the gears in the robot's brain would turn just fine as drawn: since the outer gears are both turning anticlockwise, the inner gear would just turn clockwise. (I think my inner engineer is showing)

Very neat tool, thanks for the conciseness of the explanation. Though I hope I won't have to measure 70° temperatures by hand any time soon. (I know, I know, it's in Fahrenheit, but it still sounds... dissonant? to my European ears)

Well, that's a mindset I don't encounter often irl. Do you estimate you're a central example in your country/culture?

Tell me if this gets too personal, but do defectors evoke positive emotions? (Because they lower societal expectations?) Or negative emotions? (i.e. you have a sweet spot of cooperation and dislike deviations from it?)

9simple_name
If they have similar attitudes to mine, then the feelings are slightly positive, possibly because of receiving validation for my own behaviour. On the other hand, if the defectors are doing worse things, the feelings are fully negative; I don't think there is any effect of the kind you suggest. To put things more concretely, I try not to do anything harmful but also don't do anything that helps society (charity, activism, environmental stuff, etc.) unless I get some concrete benefit. When someone does defect in the way of being actively harmful or breaking laws, then my emotions are negative as I said, but interestingly not as strong as in the case of activists. Perhaps because such behaviour feels normal and expected from other people, or just because it doesn't feel as much like a threat to me personally. So I would say that your second suggestion is correct in my case: I do have a sweet spot of cooperation (basically, what I do and feel is justified) and dislike deviations from that, with heavier weight on the "more cooperation" direction.

I'm not sure the authority has to be malevolent, it could be incompetent (or something).

So: [authority / authority-wielders are my enemies / outgroup] & [collaborators side with rules / rulemakers / authority] => collaborators are my outgroup => I punish them

This seems to predict that people who distrust authority more will punish cooperators more.

The bottom half of the punishment graph does seem to be places where I would distrust authority more.

4Martin Sustrik
The original study has something to say about ingroups/outgroups. It's not exactly the same thing as the one we are discussing here but still:

I'm surprised nobody proposed: "This person is promoting a social norm more stringent than my current behavior, I'll whack him." What's wrong with it? Sure, in this case the social norm is actually beneficial to the whacker, but we're adaptation-executers, not fitness-maximizers.

FWIW I first read this post before this comment was written, then happened to think about it again today and had this idea, and came here to post it.

I do think it's a dangerous fallacy to assume mutually-altruistic equilibria are optimal--'I take care of me, you take care of you' is sometimes more efficient than 'you take care of me, I take care of you'.

Maybe someone needs to study whether Western countries ever exhibit "antisocial cooperation," that is, an equilibrium of enforced public contributions in an "ineffici...

I'm from Eastern Europe and have this tendency. I've been quite curious about why for example any kind of activism evokes negative emotions and I think at least in my case the answer seems to be what you're proposing here. The prevalent attitude in society is to free-ride as much as you can and I'm also doing that. To answer the question from the beginning of the post, if we just let other people make cooperation the new norm, then I'll be expected to cooperate too. I want to keep not caring about society, so I guess the actions of cooperators cash out emotionally as a threat to the status quo that I want to preserve.

Just a nitpick, from one non-native English speaker (to another?): I have been told that the word "retard" is extremely offensive (in American English at least). I'd say it's up to you to decide if that was your intended effect.

3Martin Sustrik
Not a native speaker. I wanted it to be offensive, but not to the extent where you would have to kill the offender and his whole family to restore honor. Changed to "moron".

Some sports players are pretty smart and probably some governors aren't. What about ( Reality TV celebrities ( heads of state of UNSC countries ( Physicists / Mathematicians / Engineers ) ) )?

(1 minute of thought did not provide another group of famous & not-even-a-little-bit-selected-for-intelligence people, unless there's a database of lottery winners, which I doubt. Curious for suggestions.)

(Famous engineers: of course Wikipedia does not disappoint.)

Which you also can't know if you don't test other fields. I think there are at least 3 concentric levels to distinguish: ( famous ( intelligent ( STEM ) ) ).

1Bucky
So potentially ( Sports players ( Literature laureates / governors ( Physicists / Mathematicians ) ) ) ?
1Pattern
That's a good point, which applies to both this and the prior post. The reason 'Nobel' Laureates are easy is probably the fame component.

Pretty good. I've updated weakly toward "it's okay to locally redefine accepted terms". [Meta: I didn't find the transitions from object level to meta level very intelligible, and I think the 'notable' facts deserve some examples to ground the whole thing if this is to be more than a quick idea-dump.]

I have also taken the survey.

I'm colorblind. I have color cones in my eyes, but the red ones are mutated towards shorter wavelengths (i.e. towards green). This makes red-green-brown, blue-purple and grey-pink hard to distinguish.

As a result, I pay quite a lot of attention to colors and shades in everyday life. I don't trust my eyes and often test my perceptions against other people's ("Hey, is that shirt green or yellow?"). To the point that I actually discern more shades than most people. I'm sometimes wrong about their names, but I see shades other people don't notice, e.g. me: "...