EDIT: The competition is now closed, thanks to everyone who participated! Rohin’s posterior distribution is here, and winners are in this comment.
In this competition, we (Ought) want to amplify Rohin Shah’s forecast for the question: When will a majority of AGI researchers agree with safety concerns? Rohin has provided a prior distribution based on what he currently believes, and we want others to:
- Try to update Rohin’s thinking via comments (for example, comments including reasoning, distributions, and information sources). If you don’t want your comment to be considered for the competition, label it ‘aside’.
- Predict what his posterior distribution for the question will be after he has read all the comments and reasoning in this thread
The competition will close on Friday July 31st. To participate in this competition, create your prediction on Elicit, click ‘Save Snapshot to URL,’ and post the snapshot link in a comment on this post. You can provide your reasoning in the ‘Notes’ section of Elicit or in your LessWrong comment. You should have a low bar for making predictions – they don’t have to be perfect.
Here is Rohin’s prior distribution on the question. His reasoning for the prior is in this comment. Rohin spent ~30 minutes creating this distribution based on the beliefs and evidence he already has. He will spend 2-5 hours generating a posterior distribution.
Click here to create your distribution
We will award two $200 prizes, in the form of Amazon gift cards:
- Most accurate prediction: We will award $200 to the most accurate prediction of Rohin’s posterior distribution submitted through an Elicit snapshot. This will be determined by estimating the KL divergence between Rohin’s final distribution and others’ distributions (see the sketch after this list). If you post more than one snapshot, either your most recent snapshot or the one you identify as your final submission will be evaluated.
- Update to thinking: Rohin will rank each comment from 0 to 5 depending on how much the reasoning updated his thinking. We will randomly select one comment in proportion to how many points are assigned (so, a comment rated 5 would be 5 times more likely to receive the prize than a comment rated 1), and the poster of this comment will receive the $200 prize.
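For concreteness, here is a minimal sketch of how both prizes could be decided, assuming distributions are compared as discrete probability masses over shared bins. The bin masses, names, ratings, and the direction of the KL divergence below are illustrative assumptions, not the actual scoring code:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions defined over the same bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Most accurate prediction: lowest KL divergence from Rohin's final distribution.
rohin_posterior = [0.05, 0.15, 0.30, 0.25, 0.15, 0.10]   # toy bin masses
predictions = {
    "alice": [0.10, 0.20, 0.30, 0.20, 0.10, 0.10],
    "bob":   [0.02, 0.08, 0.20, 0.30, 0.25, 0.15],
}
accuracy_winner = min(
    predictions, key=lambda name: kl_divergence(rohin_posterior, predictions[name])
)

# Update-to-thinking prize: draw one comment with probability proportional to its 0-5 rating.
rng = np.random.default_rng(0)
comment_ratings = {"comment_1": 5, "comment_2": 1, "comment_3": 3}
names = list(comment_ratings)
probs = np.array(list(comment_ratings.values()), dtype=float)
probs /= probs.sum()
lottery_winner = rng.choice(names, p=probs)
```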
Motivation
This project is similar in spirit to amplifying epistemic spot checks and other work on scaling up individual judgment through crowdsourcing. As in these projects, we’re hoping to learn about mechanisms for delegating reasoning, this time in the forecasting domain.
The objective is to learn whether mechanisms like this could save people like Rohin work. Rohin wants to know: What would I think if I had more evidence and knew more arguments than I currently do, but still followed the sorts of reasoning principles that I'm unlikely to revise in the course of a comment thread? In real-life applications of amplified forecasting, Rohin would only evaluate the arguments in depth and form his own posterior distribution 1 out of 10 times. The other 9 out of 10 times, he’d just skim the key arguments and adopt the predicted posterior as his new view.
Question specification
The question is: When will a majority of AGI researchers agree with safety concerns?
Suppose that every year I (Rohin) talk to every top AI researcher about safety (I'm not explaining safety, I'm simply getting their beliefs, perhaps guiding the conversation to the safety concerns in the alignment community). After talking to X, I evaluate:
- (Yes / No) Is X's work related to AGI? (AGI safety counts)
- (Yes / No) Does X broadly understand the main concerns of the safety community?
- (Yes / No) Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?
I then compute the fraction #answers(Yes, Yes, Yes) / #answers(Yes, *, *) (i.e. the proportion of AGI-related top researchers who are aware of safety concerns and think we shouldn't build superintelligent AGI before solving them). In how many years will this fraction be >= 0.5?
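For concreteness, the resolution computation can be sketched as follows. The data structure and field names are just one illustrative encoding of the three questions above, not part of the specification:

```python
from typing import NamedTuple

class SurveyResponse(NamedTuple):
    works_on_agi: bool          # Is X's work related to AGI? (AGI safety counts)
    understands_concerns: bool  # Does X broadly understand the safety community's main concerns?
    agrees_unsolved: bool       # Does X agree some concern must be solved before building superintelligent AGI?

def safety_agreement_fraction(responses):
    """#answers(Yes, Yes, Yes) / #answers(Yes, *, *)."""
    agi_researchers = [r for r in responses if r.works_on_agi]
    if not agi_researchers:
        return 0.0
    all_yes = sum(r.understands_concerns and r.agrees_unsolved for r in agi_researchers)
    return all_yes / len(agi_researchers)

# The question resolves in the first year this fraction is >= 0.5.
```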
For reference, if I were to run this evaluation now, I would be looking for an understanding of reward gaming, instrumental convergence, and the challenges of value learning, but would not be looking for an understanding of wireheading (because I'm not convinced it's a problem we need to worry about) or inner alignment (because the safety community hasn't converged on the importance of inner alignment).
We'll define the set of top AI researchers somewhat arbitrarily as the top 1000 AI researchers in industry by salary and the top 1000 AI researchers in academia by citation count.
If the fraction never reaches 0.5 (e.g. because before it does, we build superintelligent AGI and it kills us all, or the AGI is perfectly benevolent and everyone realizes there weren't any safety concerns), the question resolves as >2100.
Interpret this reasonably (e.g. a comment to the effect of "your survey will annoy everyone and so they'll be against safety" will be ignored even if true, because it's overfitting to the specific counterfactual survey proposed here and is clearly irrelevant to the spirit of the question).
Additional information
Rohin Shah is an AI Safety researcher at the Center for Human-Compatible AI (CHAI). He also publishes the Alignment Newsletter. Here is a link to his website where you can find more information about his research and views.
You are welcome to share a snapshot distribution of your own beliefs, but make sure to specify that the snapshot contains your own beliefs and not your prediction of Rohin’s beliefs (snapshots of your own beliefs will not be evaluated for the competition).
my (quickly-made) snapshot: https://elicit.ought.org/builder/dmtz3sNSY
one conceptual contribution I'd put forward for consideration is whether this question may be more about emotions or social equilibria than about reaching a reasoned intellectual consensus. it's worth considering how a relatively proximate/homogeneous group of people tends to change its beliefs. for better or worse, everything from viscerally compelling demonstrations of safety problems, to social pressure, coercion, or top-down influence, to the transition from intellectual to grounded/felt risk should be part of the model of change, alongside rational, lucid, considered debate tied to deeper understanding or the truth of the matter. the demonstration doesn't actually have to be a compelling demonstration of risks to be a compelling illustration of them (imagine a really compelling VR experience, as a trivial example).
maybe the term I'd use is 'belief cascades', and I might point to the rapid shift towards office closures during early COVID as an example of this. the tipping point arrived sooner than some expected, not because of considered updates in beliefs about risk or the utility of closures (the evidence had been there for a while), but because of a cascade of fear and a noisy consensus that not acting/thinking in alignment with the perceived consensus ('this is a real concern') would lead to social censure, etc.
in short, this might happen sooner, more suddenly, and for stranger reasons than I think the prior distribution implies.
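to make that intuition concrete, here's a toy threshold-cascade sketch (my own gloss, with entirely arbitrary parameters; nothing here comes from the question spec or from Rohin): each researcher adopts the safety-concern view once the fraction of peers who already hold it exceeds their personal threshold, plus a small exogenous drift from accumulating evidence. agreement sits low for years and then tips past 50% within a few steps.

```python
import numpy as np

# toy Granovetter-style threshold model; all numbers are illustrative
rng = np.random.default_rng(0)
n = 1000
thresholds = rng.beta(2, 3, size=n)   # heterogeneous willingness to update
agrees = thresholds < 0.05            # small seed of early adopters

for year in range(30):
    fraction = agrees.mean()
    if fraction >= 0.5:
        print(f"majority reached at step {year}, fraction={fraction:.2f}")
        break
    # peers-already-agree cascade plus slow exogenous drift from new evidence
    agrees = agrees | (thresholds < fraction + 0.01 * year)
print(f"final fraction: {agrees.mean():.2f}")
```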
NB the point about a newly unveiled population of researchers in my first bin might stretch the definition of 'top AI researchers' in the question specification, but I believe it's in line with the spirit of the question