Some information:
I won't be responding to this thread until after the competition ends. If you're not sure about something in the question, assume what (you think) I would assume. Feel free to argue for a specific interpretation in a comment if you think it's underspecified.
In any realistic plan that ends with "and then we deployed the AI knowing we had mitigated risk X", it seems to me like we need prestigious AI researchers to have some amount of consensus that X was actually a real risk. If you want a company to use a safer technique, the key decision-makers at the company will need to believe that it's necessary in order to "pay the alignment tax", and their opinions will be shaped by the higher-up AI researchers at the company. If you want a government to regulate AI, you need to convince the government that X is a real risk, and the government will probably defer to prestigious AI experts.
So it seems like an important variable is whether the AI community agrees that X is a real risk. (Also obviously whether in reality X is a real risk, but I'm assuming that it is for now.) It may be fine if it's only the AGI researchers -- if a company knows it can build AGI, it probably ignores the opinions of people who think we can't build AGI.
Hence, this question. While an answer to the explicit question would be interesting to me, I'm more excited to have better models of what influences the answer to the question.
Copied over from the Google Doc linked above. Written during the 30 minutes I had to come up with a prior.
Seems very unlikely that if I ran this survey now the fraction would be 0.5. Let's give that 0.1%, which I can effectively ignore.
My timelines for "roughly human-level reasoning" have a median of let's say 20 years; it seems likely we get significant warning shots a few years before human-level reasoning, that lead to increased interest in safety. It's not crazy that we get a warning shot in the next year from e.g. GPT-3, though I'd be pretty surprised, call it ~1%.
My model for how we get to consensus is that there's a warning shot, or some excellent paper, or a series of these sorts of things, that increases the prominence of safety concerns, especially among "influencers", and this then percolates to the general AI community over the next few years. I do think it can percolate quite quickly; see e.g. the fairly large effects of Concrete Problems in AI Safety, or how quickly fairness + bias became mainstream. (That's fewer examples than I'd like...) So let's expect 1-10 years after the first warning shot. Given median timelines of 20 years + warning shot shortly before then + 1-10 years to reach consensus, probably the median for this question should also be ~20 years? (Note that even if we build human-level reasoning before a majority is reached, the question could still resolve positively after that, since human-level reasoning != AI researchers are out of a job.)
There's also a decent chance that we don't get a significant warning shot before superintelligent AI (maybe 10% e.g. via fast takeoff or good safety techniques that don't scale to AGI), or that not enough people update on it / consensus building takes forever / the population I chose just doesn't pay attention to safety for some reason (maybe another 15%), so let's say 25% that it never happens. Also another ~10% for AGI / warning shots not happening before 2100, or the safety community disappearing before 2100, etc. So that's 35% on >2100 (which includes never).
So 25% for never leaves 75% for "at some point", of which I argued the median is 20 years, so I should have ~35% from now till 20 years and 30% from 20 years till 2100 (since it's 10% on >2100 but not never).
Then I played around with Elicit until I got a distribution that fit these constraints and looked vaguely lognormal.
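To make the arithmetic above concrete, here is a minimal sketch (Python, with scipy assumed available) of how the stated masses fit together and one way to approximate the "vaguely lognormal" conditional shape; the sigma value is my own assumption, not something specified above.

```python
from scipy import stats

# Probability masses described above (a sketch, not the actual Elicit distribution).
p_never = 0.25            # consensus is never reached
p_after_2100 = 0.10       # reached, but only after 2100
p_within_20y = 0.35       # from now (2020) until ~2040
p_20y_to_2100 = 0.30      # from ~2040 until 2100
assert abs(p_never + p_after_2100 + p_within_20y + p_20y_to_2100 - 1.0) < 1e-9

# One way to get a "vaguely lognormal" shape over years-from-now, conditional on
# the consensus eventually happening: put the median at ~20 years and tune sigma.
median_years, sigma = 20, 1.0   # sigma is an assumed spread parameter
dist = stats.lognorm(s=sigma, scale=median_years)
print(dist.cdf(20))   # 0.5 by construction (the conditional median)
print(dist.cdf(80))   # conditional mass before 2100
```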
Rohin has created his posterior distribution! Key differences from his prior are at the bounds:
Overall, Rohin’s posterior is a bit more optimistic than his prior and more uncertain.
Ethan Perez’s snapshot wins the prize for the most accurate prediction of Rohin's posterior. Ethan kept a similar distribution shape while decreasing the probability >2100 less than the other submissions.
The prize for a comment that updated Rohin’s thinking goes to Jacob Pfau! This was determined by a draw with comments weighted proportionally to how much they updated Rohin’s thinking.
Thanks to everyone who participated and congratulations to the winners! Feel free to continue making comments and distributions, and sharing any feedback you have on this competition.
Cool! I feel like I should go into more detail on how I made the posterior prediction then - I just predicted relative increases/decreases in probability for each probability bucket in Rohin's prior:
Then I just let Elicit renormalize the probabilities.
I guess this process incorporates the "meta-prior" that Rohin won't change his prior much, and then I estimated the relative increase/decrease margins based on the number and upvotes of comments. E.g., there were a lot of highly upvoted comments arguing that Rohin should increase his probability in the <2022 range, so I predicted a larger change there.
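A minimal sketch of that procedure (the bucket boundaries, prior masses, and adjustment factors below are made up for illustration, not Ethan's actual numbers):

```python
# Prior probability per bucket (illustrative values only).
prior = {"<2022": 0.03, "2022-2040": 0.40, "2040-2100": 0.32, ">2100": 0.25}

# Relative increase/decrease per bucket, estimated from the number and
# upvotes of comments arguing in each direction (also illustrative).
factors = {"<2022": 2.0, "2022-2040": 1.1, "2040-2100": 1.0, ">2100": 0.8}

unnormalized = {k: prior[k] * factors[k] for k in prior}
total = sum(unnormalized.values())
posterior = {k: v / total for k, v in unnormalized.items()}  # the renormalization step Elicit handles
print(posterior)
```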
My old prediction for when the fraction will be >= 0.5: elicited
My old prediction for Rohin's posterior: elicited
I went through the top 20 list of most cited AI researchers on Google Scholar (thanks to Amanda for linking), and estimated that roughly 9 of them may qualify under Rohin's criterion. Of those 9, my guess was that 7/9 would answer 'Yes' on Rohin's question 3.
My sampling process was certainly biased. For one, highly cited academic researchers are likely to be more safety-conscious than industry researchers. My estimate also involved considerable guesswork, so I down-weighted the estimated 7/9 to a 65% chance that the >=0.5 threshold will be met within the first couple of years. Given the extreme difference between my distribution and the others posted, I guess there's a 1/3 chance that my estimate based on the top-20 sampling will carry significant weight in Rohin's posterior.
The justification for the rest of my distribution is similar to what others have said here and elsewhere about AI safety. My AGI timeline is roughly in line with the Metaculus estimate here. Before the advent of AGI, a number of eventualities are possible: perhaps a warning shot occurs, perhaps a theoretical consensus emerges, perhaps industry researchers remain oblivious to safety concerns because of the principal-agent nature of the problem, perhaps AGI is invented before safety is worked out, etc.
Edit: One could certainly do a better job of estimating where the sample population of researchers currently stands by finding a less biased population. Maybe people interviewed by Lex Fridman; that might be a decent proxy for AGI-research fame?
The above estimate was off, since I had mistakenly read ' I then compute the fraction #answers(Yes, Yes, Yes) / #answers(Yes, *, *) ' as ' I then compute the fraction #answers(Yes, Yes, Yes) / #answers(Yes, Yes, *)'.
I agree with Ethan's recent comment that experience with RL matters a lot, so a lot comes down to how the ' Is X's work related to AGI? ' criterion is cashed out. On one reading of this, many NLP researchers do not count; on another reading, they do. I'd say my previous prediction was a decent, if slightly high, estimate for the scenario in which 'related to AGI' is interpreted narrowly and many NLP researchers are ruled out.
A second major confounder is that prominent AI researchers are far more likely to have been asked about their opinion on AI safety, in which case they have some impetus to go read up on the issue.
To cash some of these concerns out into probabilities:
75% that Rohin takes a broad interpretation of AGI-related work, which includes e.g. the GPT team, NAS research, etc.
33% estimated for (Yes, Yes, Yes), assuming prominent researchers are 2x as likely to have read up on AI safety.
25%, downweighted from 33%, taking into account industry being less concerned.
Assuming that we're at ~33% now, 50% doesn't seem too far out of reach, so my estimates for the following decades are based on the same concerns I listed in my comment above, framed with the 33% in mind.
Updated personal distribution: elicited
Updated Rohin's posterior: elicited
(Post-competition footnote: it seems to me that over short time horizons we should have a more-or-less geometric distribution. Think of the more-or-less independent per-year chance that a NeurIPS keynote features AI safety, or that the YouTube recommender algorithm goes bonkers for a bit. It seems strange to me that some other people's distributions over the next 10-15 years -- if not longer -- do not look geometric.)
I do take the broad interpretation of AGI-related work.
I hadn't considered the point that people may ask prominent AI researchers their opinion about AI safety, leading them to have better beliefs about safety. I think overall I don't actually expect this to be a major factor, but it's a good point and it updated me slightly towards sooner.
I wouldn't expect a geometric distribution -- consensus building requires time, so you might expect a buildup from 0 for <time taken to build consensus> and only then have it follow a geometric distribution. In addition, getting to 50% seems likely to require a warning shot of some significance; current AI systems don't seem capable enough to produce a compelling enough warning shot.
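A minimal sketch of the difference between the two shapes being discussed (the hazard rate and delay below are placeholder numbers, not anyone's actual estimates):

```python
import numpy as np

years = np.arange(1, 31)
hazard = 0.05   # assumed constant per-year chance of a consensus-triggering event

# Pure geometric: constant per-year hazard starting immediately.
geometric = hazard * (1 - hazard) ** (years - 1)

# Buildup-then-geometric: ~0 probability during the consensus-building period,
# then the same constant hazard afterwards.
delay = 5       # assumed years needed to build consensus after a warning shot
delayed = np.where(years <= delay, 0.0,
                   hazard * (1 - hazard) ** (years - delay - 1))
print(geometric[:10].round(3))
print(delayed[:10].round(3))
```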
I think 1% in the next year and a half is significantly too low.
Firstly, conditioning on AGI researchers makes a pretty big difference. It rules out most mainstream AI researchers, including many of the most prominent ones who get the most media coverage. So I suspect your gut feeling about what people would say isn't taking this sufficiently into account.
Secondly, I think attributing ignorance to the outgroup is a pretty common fallacy, so you should be careful of that. I think a clear majority of AGI researchers are probably familiar with the concept of reward gaming by now, and could talk coherently about AGIs reward gaming, or manipulating humans. Maybe they couldn't give very concrete disaster scenarios, but neither can many of us.
And thirdly, once you get agreement that there are problems, you basically get "we should fix the problems first" for free. I model most AGI researchers as thinking that AGI is far enough away that we can figure out practical ways to prevent these things, like better protocols for giving feedback. So they'll agree that we should do that first, because they think that it'll happen automatically anyway.
In a similar vein to this, I found several resources that make me think it should be higher than 1% currently and in the next 1.5 years:
This is relevant, but I tend to think this sort of evidence isn't really getting at what I want. My main reaction is one that you already said:
Obviously this isn't an indication that they understand or agree with safety concerns, but directionally suggests people are concerned and thinking about this.
I think many people have a general prior of "we should be careful with wildly important technologies", and so will say things like "safety is important" and "AGI might be bad", without having much of an understanding of why.
Also, I don't expect the specific populations surveyed in those two sources to overlap much with "top AI researchers" as defined in the question, though I have low confidence in that claim.
These seem sensible comments to me, I had similar thoughts about current understanding of things like reward gaming. I’d be curious to see your snapshot?
A surprisingly powerful demonstration soon could change things too; 1% seems low. Look at how quickly views can change about things like "it's just the flu", the current wave of updating from GPT-3 (among certain communities), etc.
My (quickly made) snapshot: https://elicit.ought.org/builder/dmtz3sNSY
One conceptual contribution I'd put forward for consideration is whether this question may be more about emotions or social equilibria than about reaching a reasoned intellectual consensus. It's worth considering how a relatively proximate/homogeneous group of people tends to change its beliefs. For better or worse, everything from viscerally compelling demonstrations of safety problems, to social pressure, to coercion or top-down influence, to the transition from intellectual to grounded/felt risk should be part of the model of change -- alongside rational, lucid, considered debate tied to deeper understanding or the truth of the matter. The demonstration doesn't actually have to be a compelling demonstration of risks to be a compelling illustration of them (imagine a really compelling VR experience, as a trivial example).
Maybe the term I'd use is 'belief cascades', and I might point to the rapid shift towards office closures during early COVID as an example of this. The tipping point arrived sooner than some expected, not due to considered updates in beliefs about risk or the utility of closures (the evidence had been there for a while), but rather from a cascade of fear, a noisy consensus that not acting/thinking in alignment with the perceived consensus ('this is a real concern') would lead to social censure, etc.
In short, this might happen sooner, more suddenly, and for stranger reasons than I think the prior distribution implies.
NB: the point about a newly unveiled population of researchers in my first bin might stretch the definition of 'top AI researchers' in the question specification, but I believe it's in line with the spirit of the question.
+1 for the general idea of belief cascades. This is an important point, though I had already considered it. When I said "percolates to the general AI community over the next few years" I wasn't imagining that this would happen via reasoned intellectual discourse; I was more imagining compelling demonstrations (which may or may not be well-connected to the actual reasons for worry).
I think a clear majority of AGI researchers are probably familiar with the concept of reward gaming by now, and could talk coherently about AGIs reward gaming, or manipulating humans.
Seems plausible, but I specifically asked for reward gaming, instrumental convergence, and the challenges of value learning. (I'm fine with not having concrete disaster scenarios.) Do you still think this is that plausible?
And thirdly, once you get agreement that there are problems, you basically get "we should fix the problems first" for free.
I agree that Q2 is more of a blocker than Q3, though I am less optimistic than you seem to be.
Overall I updated towards slightly sooner based on your comment and Beth's comment below (given that both of you interact with more AGI researchers than I do), but not that much, because I'm not sure whether you were looking at just reward gaming or all three conditions I laid out, and most of the other considerations were ones I had thought about and it's not obvious how to update on an argument of the form "I think <already-considered consideration>, therefore you should update in this direction". It would have been easier to update on "I think <already-considered consideration>, therefore the absolute probability in the next N years is X%".
In my experience, a highly predictive feature of "agreeing with safety concerns" is how much someone thinks about/works on RL (or decision-making more broadly). For example, many scientists in RL-driven labs (DeepMind, OpenAI) agree with safety concerns, while there is almost no understanding of safety concerns (let alone agreement that they are valid) among researchers in NLP (mainly driven by supervised learning and unsupervised learning); it's easier to intuitively motivate safety concerns from the perspective of RL and demos of where it fails (esp. with concerns like reward gaming and instrumental convergence). Thus, a useful way to decompose Rohin's question is:
We can add the numbers from 1.1 and 2.1 and divide by the total number of top AGI researchers (here, 2000).
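A minimal sketch of this decomposition, assuming that 1.1 and 2.1 refer to the number of RL and non-RL top AGI researchers (respectively) who agree with safety concerns; all of the counts and rates below are placeholders, not estimates from the comment above.

```python
total_agi_researchers = 2000

n_rl = 600                 # assumed number of RL / decision-making researchers
rl_agree_rate = 0.5        # assumed agreement rate among them
n_non_rl = total_agi_researchers - n_rl
non_rl_agree_rate = 0.1    # assumed (lower) agreement rate among non-RL researchers

fraction = (n_rl * rl_agree_rate + n_non_rl * non_rl_agree_rate) / total_agi_researchers
print(fraction)  # needs to reach >= 0.5 for the question to resolve
```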
I'd argue that more researchers are in category 2 (non-RL researchers) than 1 (RL researchers). A lot of recent progress towards AGI has been driven by improvements in representation learning, supervised learning, and unsupervised learning. Work in these areas is AGI-related in the relevant sense to Rohin; if we develop AGI without RL (e.g., GPT-10), we'd need the non-RL researchers who develop these models to coordinate and agree with safety concerns about e.g. releasing the models. I think it will continue to be the case for the foreseeable future that >50% of AGI-related researchers aren't doing RL, as representation learning, supervised learning, unsupervised learning, etc. all seem quite important to developing AGI.
The upshot (downshot?) of the above is that we'll probably need a good chunk of non-RL but AGI-related researchers to agree with safety concerns. Folks in this group seem less receptive to safety concerns (probably because such concerns don't come up as obviously or as often as in RL work), so I think it'll take a pretty intuitively compelling demonstration/warning-shot to convince people in the non-RL group, visible enough to reach across several subfields in ML; preferably, these demos should be directly relevant to the work of people doing non-RL research (i.e., directly showing how NLP systems are Stuart-Russell-style dangerous, to convince NLP people). I think we'll need pretty advanced systems to get these kinds of demos, roughly +/-5 years from when we get AGI (vs. Rohin's prior estimate of 1-10 years before AGI). So overall, I think Rohin's posterior should be shifted right ~5 years.
Here is my Elicit snapshot of what I think Rohin's posterior will be after updating on all comments here. It seems like all the other comments are more optimistic than Rohin's prior, so I predict that Rohin's posterior will become more optimistic, even though I think the concerns I've outlined above outweigh/override some of the considerations in the other comments. In particular, I think you'll get an overestimate of "agreement on safety concerns" by looking only at the NeurIPS/ICML community, which is pretty RL-heavy relative to e.g. the NLP and Computer Vision communities (which will still face AGI-related coordination problems). The same can be said about researchers who explicitly self-identify with "Artificial Intelligence" or "Artificial General Intelligence" (historically focused on decision-making and games). Looking at the 100 most cited NLP researchers on Google Scholar, I found one who I could recognize as probably sympathetic to safety concerns (Wojciech Zaremba), and similarly for Computer Vision.
Thanks, this is an important point I hadn't considered before.
I don't buy that the entire distribution should be shifted right 5 years -- presumably as time goes on it will be more clear what subfields of AI are relevant to AGI, we'll have better models of how AGI will be built, and safety arguments will be more tailored to the right subfields. I do buy that it should reduce the probability I assign in the near future (e.g. the next 5-10 years).
My snapshot. I put 2% more mass on the next 2 years and 7% more mass on 2023-2032. My reasoning:
1. 50% is a low bar.
2. They just need to understand and endorse AI Safety concerns. They don't need to act on them.
3. There will be lots of public discussion about AI Safety in the next 12 years.
4. Younger researchers seem more likely to have AI Safety concerns. AI is a young field. (OTOH, it's possible that lots of the top cited/paid researchers in 10 years' time are people active today.)
All of these seem like good reasons to be optimistic, though it was a bit hard for me to update on it given that these were already part of my model. (EDIT: Actually, not the younger researchers part. That was a new-to-me consideration.)
It's interesting to look back at this question 4 years later; I think it's a great example of the difficulty of choosing the right question to forecast in the first place.
I think it is still pretty unlikely that the criterion I outlined is met -- Q2 on my survey still seems like a bottleneck. I doubt that AGI researchers would talk about instrumental convergence in the kind of conversation I outlined. But reading the motivation for the question, it sure seems like a question that reflected the motivation well would have resolved yes by now (probably some time in 2023), given the current state of discourse and the progress in the AI governance space. (Though you could argue that the governance space is still primarily focused on misuse rather than misalignment.)
I did quite deliberately include Q2 in my planned survey -- I think it's important that the people whom governments defer to in crafting policy understand the concerns, rather than simply voicing support. But I failed to notice that it is quite plausible (indeed, the default) for there to be a relatively small number of experts who understand the concerns in enough depth to produce good advice on policy, plus a large base of "voicing support" from other experts who don't have that same deep understanding. This means that it's very plausible that the fraction defined in the question never gets anywhere close to 0.5, but nonetheless the AI community "agrees on the risk" to a sufficient degree that governance efforts do end up in a good place.
Notes
The field of AGI research plausibly commenced in 1956 with the Dartmouth conference. What happens if one uses Laplace's rule? Then it looks a priori pretty implausible that it will happen, if it doesn't happen soon.
How do information cascades work in this context? How many researchers would I expect to have read and to recall a reward gaming list (1, 2, 3, 4)?
Here is A list of good heuristics that the case for AI x-risk fails. I'd expect that these, being pretty good heuristics, will keep having an effect on AGI researchers and continue to keep them away from considering x-risks.
Rohin probably doesn't actually have enough information or enough forecasting firepower to put only 0.1% on it having already happened and be calibrated. He probably does have the expertise, though. I did some experiments a while ago, and "I'd be very surprised if I were wrong" translated for me to a 95%. YMMV.
An argument would go: "The question looks pretty fuzzy to me, with many moving parts. Long tails are good in that case, and other forecasters who have found some small piece of evidence are over-updating." Some quotes:
There is strong experimental evidence, however, that such self-insight is usually faulty. The expert perceives his or her own judgmental process, including the number of different kinds of information taken into account, as being considerably more complex than is in fact the case. Experts overestimate the importance of factors that have only a minor impact on their judgment and underestimate the extent to which their decisions are based on a few major variables. In short, people's mental models are simpler than they think, and the analyst is typically unaware not only of which variables should have the greatest influence, but also which variables actually are having the greatest influence. (Source: Psychology of Intelligence Analysis, Chapter 5)
Our judges in this study were eight individuals, carefully selected for their expertise as handicappers. Each judge was presented with a list of 88 variables culled from the past performance charts. He was asked to indicate which five variables out of the 88 he would wish to use when handicapping a race, if all he could have was five variables. He was then asked to indicate which 10, which 20, and which 40 he would use if 10, 20, or 40 were available to him.
We see that accuracy was as good with five variables as it was with 10, 20, or 40. The flat curve is an average over eight subjects and is somewhat misleading. Three of the eight actually showed a decrease in accuracy with more information, two improved, and three stayed about the same. All of the handicappers became more confident in their judgments as information increased. (Source: Behavioral Problems of Adhering to a Decision Policy)
Here is my first entry to the competition. Here is my second and last entry to the competition. My changes are that I've assigned some probability (5%; I'd personally assign 10%) that it has already happened.
Some notes about that distribution:
Some quick comments for forecasters:
I was initially going to comment "yeah I meant to put 1% on 'already happened' but at the time that I made my distribution the option wasn't there" and then I reread my prior reasoning and saw the 0.1%. Not sure what happened there, I agree that 0.1% is way too confident.
On Laplace's rule: as with most outside views, it's tricky to say what your reference class should be. You could go with the Dartmouth conference, but given that we're talking about the AI safety community influencing the AI community, you could also go with the publication of Superintelligence in 2014 (which feels like the first real attempt to communicate with the AI community), and then you would be way more optimistic. (I might be neglecting lots of failed attempts by SIAI / MIRI, but my impression is that they didn't try to engage the academic AI community very much.)
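To make the reference-class sensitivity concrete, here is the Laplace's-rule arithmetic for the two start dates mentioned, taking 2020 as "now":

```python
# Laplace's rule of succession: P(event in the next year) = 1 / (n + 2),
# where n is the number of years observed so far without the event.
def laplace_next_year(years_without_event: int) -> float:
    return 1 / (years_without_event + 2)

print(laplace_next_year(2020 - 1956))  # Dartmouth reference class: ~1.5% per year
print(laplace_next_year(2020 - 2014))  # Superintelligence reference class: ~12.5% per year
```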
I don't buy the point about there being good heuristics against x-risk: the premise of my reasoning was that we get warning shots, which would negate many (though not all) of the heuristics.
+1 for long tails
I think it's >1% likely that one of the first few surveys Rohin conducted would result in a fraction >= 0.5.
Evidence from When Will AI Exceed Human Performance?, in the form of median survey responses of researchers who published at ICML and NIPS in 2015:
These seem like fairly safe lower bounds compared to the population of researchers Rohin would evaluate, since concern regarding safety has increased since 2015 and the survey included all AI researchers rather than only those whose work is related to AGI.
These responses are more directly related to the answer to Question 3 ("Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?") than Question 2 ("Does X broadly understand the main concerns of the safety community?"). I feel very uncertain about the percentage that would pass Question 2, but think it is more likely to be the "bottleneck" than Question 3.
Given these considerations, I increased the probability before 2023 to 10%, with 8% below the lower bound. I moved the median (conditional on not "never") up to 2035, since a higher probability pretty soon also means a sooner median. I decreased the probability of "never" to 20%, since the "not enough people update on it / consensus building takes forever / the population I chose just doesn't pay attention to safety for some reason" condition seems less likely.
I also added an extra bin to ensure that the probability continues to decrease on the right side of the distribution.
Note: I'm interning at Ought and thus am ineligible for prizes.
Seems like it could be helpful if people who've thought about this would also predict what the survey value would be today (e.g. via Elicit snapshots).
I thought about how I could most efficiently update my and Rohin’s views on this question.
My best ideas are:
1. Get information directly on this question. What can we learn from surveys of AI researchers or from public statements from AI researchers?
2. Get information on the question’s reference class. What can we learn about how researchers working on other emerging technologies that might have huge risks thought about those risks?
I did a bit of research/thinking on these, which provided a small update towards thinking that AGI researchers will evaluate AGI risks appropriately.
I think that there’s a bunch more research that would be helpful -- in particular, does anyone know of surveys of AI researchers on their views on safety?
I answered the following subquestion to help me answer the overall question: “How likely is it that the condition Rohin specifies will not be met by 2100?”
This could happen due to any of the following non-mutually exclusive reasons:
1. Global catastrophe before the condition is met that makes it so that people are no longer thinking about AI safety (e.g. human extinction or end of civilization): I think there's a 50% chance
2. Condition is met sometime after the timeframe (mostly, I'm imagining that AI progress is slower than I expect): 5%
3. AGI succeeds despite the condition not being met: 30%
4. There's some huge paradigm shift that makes AI safety concerns irrelevant -- maybe most people are convinced that we'll never build AGI, or our focus shifts from AGI to some other technology: 10%
5. Some other reason: 20%
I thought about this subquestion before reading the comments or looking at Rohin’s distribution. Based on that thinking, I thought that there was a 60% chance that the condition would not be met by 2100.
Biggest difference is that I estimate the risk of this kind of global catastrophe before development of AGI and before 2100 to be much lower -- not sure exactly what but 10% seems like the right ballpark. But this did cause me to update towards having more probability on >2100.
I answered the following subquestion to help me answer the overall question: “How likely is it that the condition Rohin specified would already be met (if he went out and talked to the researchers today)?”
Considerations that make it more likely:
1. The considerations identified in ricaz’s and Owain’s comments and their subcomments
2. The bar for understanding safety concerns (question 2 on the "survey") seems like it may be quite low. It seems to me that researchers entirely unfamiliar with safety could gain the required level of understanding in just 30 minutes of reading (depends on how Rohin would interpret his conversation with the researcher in deciding whether to mark “Yes” or “No”)
Considerations that make it less likely:
1. I’d guess that currently, most AI researchers have no idea what any of the concrete safety concerns are, i.e. they’d be “No”s on question 2
2. The bar for question 3 on the "survey" ("should we wait to build AGI") might be pretty high. If someone thinks that some safety concerns remain but that we should cautiously move forward on building things that look more and more like AGI, does that count as a "Yes" or a "No"?
3. I have the general impression that many AI researchers really dislike the idea that safety concerns are serious enough that we should in any way slow down AI research
I thought about this subquestion before reading the comments or looking at Rohin’s distribution. Based on that thinking, I thought that there was a 25% chance that the condition Rohin specified would already be met.
Note: I work at Ought, so I'm ineligible for the prizes
It seems to me that researchers entirely unfamiliar with safety could gain the required level of understanding in just 30 minutes of reading
I might have said an hour, but that seems qualitatively right. But that requires them (1) having the motivation to do so and (2) finding and reading exactly the right sources, in a field with a thousand blog posts and not much explicit prioritization among them. I think both of these are huge considerations against this condition already being met.
If someone thinks that some safety concerns remain but that we should cautiously move forward on building things that look more and more like AGI, does that count as a "Yes" or a "No"?
Hmm, good question. Probably a Yes? I might try to push on more clear hypotheticals (e.g. a team believes that such-and-such training run would produce AGI, should they do it?) to get a clearer answer.
If there's an AGI within, say, 10 years and it mostly keeps the world recognizable so there are still researchers to have opinions, does that resolve as "never" or according to whether the AGI wants them to be convinced? Because if the latter, I expect that they will in hindsight be convinced that we should have paid more attention to safety. If the former, I submit that his prior doesn't change. If the latter, I submit that the entire prior is moved 10 years to the left (and the first 10 years cut off, and then renormalize along the y axis).
The third question is
Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?
Note the word “superintelligent.” This question would not resolve as “never” if the consensus specified in the question is reached after AGI is built (but before superintelligent AGI is built). Rohin Shah notes something similar in his comment:
even if we build human-level reasoning before a majority is reached, the question could still resolve positively after that, since human-level reasoning != AI researchers are out of a job
Unrelatedly, you should probably label your comment “aside.” [edit: I don't endorse this remark anymore.]
It was meant as a submission, except that I couldn't be bothered to actually implement my distribution on that website :) - even/especially after superintelligent AI, researchers might come to the conclusion that we weren't prepared and *shouldn't* build another - regardless of whether the existing sovereign would allow it.
Not quite. Just look at the prior and draw a vertical line at 2030. Note that you're incentivizing people to submit their guess as late as possible, both to have time to read other comments and to put their guess right to one side of someone else's.
We're ok with people posting multiple snapshots, if you want to update it based on later comments! You can edit your comment with a new snapshot link, or add a new comment with the latest snapshot (we'll consider the latest one, or whichever one you identify as your final submission)
Forecast - 25 mins
I think the following is underspecified:
Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?
What counts as building superintelligent AGI?
This could mean anything from working on foundational theory which could be used to facilitate building an AGI, to finishing the final phase of training on a fully functional AGI implementation.
In the former case you're going to get close to 0% agreement. In the latter, well over 50% already (I hope!).
I don't see any natural/clear in-the-spirit-of-the-question interpretation. E.g. if we set the bar as low as: "build superintelligent AGI" = "do non-safety work related to AGI", then you'll never get above 50% agreement, since anyone who fulfils (1) must deny (3) or admit they're already being irresponsible.
I don't think it's a useful question without clarification of this.
As things stand, it'll be pretty easy for a researcher to decide that they're not building superintelligent AGI, almost regardless of what they're doing. It's then easy to concede that safety problems should be solved first, since that can always mean later.
On the other hand, requiring that a researcher agrees with Rohin on interpretation of "build superintelligent AGI" in order to say that they "agree with safety concerns" seems a high bar.
I was thinking more like
finishing the final phase of training on a fully functional AGI implementation.
My biggest differences with Rohin's prior distribution are:
1. I think that it's much more likely than he does that AGI researchers already agree with safety concerns
2. I think it's considerably more likely than he does that the majority of AGI researchers will never agree with safety concerns
These differences are explained more on my distribution and in my other comments.
The next step that I think would help the most to make my distribution better would be to do more research.
EDIT: The competition is now closed, thanks to everyone who participated! Rohin’s posterior distribution is here, and winners are in this comment.
In this competition, we (Ought) want to amplify Rohin Shah’s forecast for the question: When will a majority of AGI researchers agree with safety concerns? Rohin has provided a prior distribution based on what he currently believes, and we want others to:
The competition will close on Friday July 31st. To participate in this competition, create your prediction on Elicit, click ‘Save Snapshot to URL,’ and post the snapshot link in a comment on this post. You can provide your reasoning in the ‘Notes’ section of Elicit or in your LessWrong comment. You should have a low bar for making predictions – they don’t have to be perfect.
Here is Rohin’s prior distribution on the question. His reasoning for the prior is in this comment. Rohin spent ~30 minutes creating this distribution based on the beliefs and evidence he already has. He will spend 2-5 hours generating a posterior distribution.
Click here to create your distribution
We will award two $200 prizes, in the form of Amazon gift cards:
Motivation
This project is similar in spirit to amplifying epistemic spot checks and other work on scaling up individual judgment through crowdsourcing. As in these projects, we’re hoping to learn about mechanisms for delegating reasoning, this time in the forecasting domain.
The objective is to learn whether mechanisms like this could save people like Rohin work. Rohin wants to know: What would I think if I had more evidence and knew more arguments than I currently do, but still followed the sorts of reasoning principles that I'm unlikely to revise in the course of a comment thread? In real-life applications of amplified forecasting, Rohin would only evaluate the arguments in-depth and form his own posterior distribution 1 out of 10 times. 9 out of 10 times he’d just skim the key arguments and adopt the predicted posterior as his new view.
Question specification
The question is: When will a majority of AGI researchers agree with safety concerns?
Suppose that every year I (Rohin) talk to every top AI researcher about safety (I'm not explaining safety, I'm simply getting their beliefs, perhaps guiding the conversation to the safety concerns in the alignment community). After talking to X, I evaluate:
1. Is X's work related to AGI?
2. Does X broadly understand the main concerns of the safety community?
3. Does X agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it?
I then compute the fraction #answers(Yes, Yes, Yes) / #answers(Yes, *, *) (i.e. the proportion of AGI-related top researchers who are aware of safety concerns and think we shouldn't build superintelligent AGI before solving them). In how many years will this fraction be >= 0.5?
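A minimal sketch of how the yearly fraction would be computed from the survey answers (each answer is a (Q1, Q2, Q3) tuple; the sample data is made up):

```python
answers = [
    ("Yes", "Yes", "Yes"),
    ("Yes", "No",  "No"),
    ("No",  "Yes", "Yes"),   # Q1 is "No": not AGI-related, so excluded from the denominator
    ("Yes", "Yes", "No"),
]

numerator = sum(1 for a in answers if a == ("Yes", "Yes", "Yes"))
denominator = sum(1 for a in answers if a[0] == "Yes")
fraction = numerator / denominator
print(fraction)  # the question asks for the first year this is >= 0.5
```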
For reference, if I were to run this evaluation now, I would be looking for an understanding of reward gaming, instrumental convergence, and the challenges of value learning, but would not be looking for an understanding of wireheading (because I'm not convinced it's a problem we need to worry about) or inner alignment (because the safety community hasn't converged on the importance of inner alignment).
We'll define the set of top AI researchers somewhat arbitrarily as the top 1000 AI researchers in industry by salary and the top 1000 AI researchers in academia by citation count.
If the fraction never reaches >= 0.5 (e.g. before the fraction reaches 0.5, we build superintelligent AGI and it kills us all, or it is perfectly benevolent and everyone realizes there weren't any safety concerns), the question resolves as >2100.
Interpret this reasonably (e.g. a comment to the effect of "your survey will annoy everyone and so they'll be against safety" will be ignored even if true, because it's overfitting to the specific counterfactual survey proposed here and is clearly irrelevant to the spirit of the question).
Additional information
Rohin Shah is an AI Safety researcher at the Center for Human-Compatible AI (CHAI). He also publishes the Alignment Newsletter. Here is a link to his website where you can find more information about his research and views.
You are welcome to share a snapshot distribution of your own beliefs, but make sure to specify that the snapshot contains your own beliefs and not your prediction of Rohin’s beliefs (snapshots of your own beliefs will not be evaluated for the competition).