I'd consider those to be "in-scope" for the database, so it would include any such estimates that I was aware of and that weren't too private to share.
If I recall correctly, some estimates in the database are decently related to that, e.g. are framed as "What % of the total possible moral value of the future will be realized?" or "What % of the total possible moral value of the future is lost in expectation due to AI risk?"
But I haven't seen many estimates of that type, and I don't remember seeing any that were explici...
...and while I hopefully have your attention: My team is currently hiring for a Research Manager! If you might be interested in managing one or more researchers working on a diverse set of issues relevant to mitigating extreme risks from the development and deployment of AI, please check out the job ad!
The application form should take <2 hours. The deadline is the end of the day on March 21. The role is remote and we're able to hire in most countries.
People with a wide range of backgrounds could turn out to be the best fit for the role. As such, if you'...
I found this thread interesting and useful, but I feel a key point has been omitted thus far (from what I've read):
Personally I haven't thought about how strong the analogy to GoF (gain-of-function) research is, but another thing that feels worth noting is that there may be a bunch of other cases where the analogy is similarly strong and where major government efforts aimed at risk-reduction have occurred. And my rough sense is that that's indeed the case, e.g. some of the examples here.
In general, at least for important questions worth spending time on, it seems very weird to say "You think X will happen, but we should be very confident it won't because in analogous case Y it didn't", without al...
Cool!
Two questions:
(Disclaimer: I only skimmed this post, having landed here from Habryka's comment on "It could be useful if someone ran a copyediting service". Apologies i...
Thanks for this post! This seems like good advice to me.
I made an Anki card on your three "principles that stand out" so I can retain those ideas. (Mainly for potentially suggesting to people I manage or other people I know - I think I already have roughly the sort of mindset this post encourages, but I think many people don't, and that my occasionally suggesting these techniques could be helpful.)
...It's not sufficient to argue that taking over the world will improve prediction accuracy. You also need to argue that during the training process (in which taking over the world wasn't possible), the agent acquired a set of motivations and skills which will later lead it to take over the world. And I think that depends a lot on the training process.
[...] if during training the agent is asked questions about the internet, but has no ability to edit the internet, then maybe it will have the goal of "predicting the world", but maybe it will have the goal of "
Thanks for this series! I found it very useful and clear, and am very likely to recommend it to various people.
Minor comment: I think "latter" and "former" are the wrong way around in the following passage?
...By contrast, I think the AI takeover scenarios that this report focuses on have received much more scrutiny - but still, as discussed previously, have big question marks surrounding some of the key premises. However, it’s important to distinguish the question of how likely it is that the second species argument is correct, from the question of how seriou
FWIW, I feel that this entry doesn't capture all/most of how I see "meta-level" used.
Here's my attempted description, which I wrote for another purpose. Feel free to draw on it here and/or to suggest ways it could be improved.
Thanks for writing this. The summary table is pretty blurry / hard to read for me - do you think you could upload a higher resolution version? Or if for some reason that doesn't work on LessWrong, could you link to a higher resolution version stored elsewhere?
This section of a new post may be more practically useful than this post was: https://forum.effectivealtruism.org/posts/4T887bLdupiNyHH6f/six-takeaways-from-ea-global-and-ea-retreats#Takeaway__2__Take_more__calculated__risks
My Anki cards
Nanda broadly sees there as being 5 main types of approach to alignment research.
...Addressing threat models: We keep a specific threat model in mind for how AGI causes an existential catastrophe, and focus our work on things that we expect will help address the threat model.
Agendas to build safe AGI: Let’s make specific plans for how to actually build safe AGI, and then try to test, implement, and understand the limitations of these plans. With an emphasis on understanding how to build AGI safely, rather than
Thanks for this! I found it interesting and useful.
I don't have much specific feedback, partly because I listened to this via Nonlinear Library while doing other things rather than reading it, but I'll share some thoughts anyway since you indicated being very keen for feedback.
Adam Binks replied to this list on the EA Forum with:
To add to your list - Subjective Logic represents opinions with three values: degree of belief, degree of disbelief, and degree of uncertainty. One interpretation of this is as a form of second-order uncertainty. It's used for modelling trust. A nice summary here with interactive tools for visualising opinions and a trust network.
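In case a concrete sketch helps other readers: here's a minimal illustration of a binomial Subjective Logic opinion, based on my understanding of the formalism. The class, field names, and numbers are just illustrative, not taken from Adam's comment or any particular library.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """A binomial Subjective Logic opinion about a single proposition."""
    belief: float        # evidence-backed support for the proposition
    disbelief: float     # evidence-backed support against it
    uncertainty: float   # mass not committed either way (second-order uncertainty)
    base_rate: float = 0.5  # prior probability used to fill in the uncertain mass

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        assert abs(total - 1.0) < 1e-9, "belief + disbelief + uncertainty must sum to 1"

    def expected_probability(self) -> float:
        """Collapse the opinion to a single probability: E = b + a*u."""
        return self.belief + self.base_rate * self.uncertainty

# Example: weak evidence for a claim, so most of the mass sits in 'uncertainty'.
weak_evidence = Opinion(belief=0.2, disbelief=0.1, uncertainty=0.7)
print(weak_evidence.expected_probability())  # 0.55
```

The extra uncertainty component is what lets the formalism distinguish "50% because I have lots of conflicting evidence" from "50% because I have almost no evidence", which is part of why it's used for modelling trust.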
Not sure what you mean by that being unverifiable? The question says:
...This question resolves as the total number of nuclear weapons (fission or thermonuclear) reported to be possessed across all states on December 31, 2022. This includes deployed, reserve/nondeployed, and retired (but still intact) warheads, and both strategic and nonstrategic weapons.
Resolution criteria will come from the Federation of American Scientists (FAS). If they cease publishing such numbers before resolution, resolution will come from the Arms Control Association or any other sim
That makes sense to me.
But it seems like you're just saying the issue I'm gesturing at shouldn't cause mis-calibration or overconfidence, rather than that it won't reduce the resolution/accuracy or the practical usefulness of a system based on X predicting what Y will think?
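In case it's useful to spell out the distinction I have in mind: roughly, calibration is about stated probabilities matching observed frequencies, while resolution is about the forecasts actually separating the cases that happen from the ones that don't. Here's a minimal sketch using the standard Murphy decomposition of the Brier score, with made-up numbers purely for illustration:

```python
from collections import defaultdict

def murphy_decomposition(forecasts, outcomes):
    """Decompose the Brier score of binary-outcome forecasts into
    reliability (calibration), resolution, and uncertainty:
    Brier = reliability - resolution + uncertainty."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Group outcomes by the stated forecast probability.
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)

    reliability = sum(len(os) * (f - sum(os) / len(os)) ** 2
                      for f, os in bins.items()) / n
    resolution = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2
                     for os in bins.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

# Toy case: always answering 50% on questions that resolve 50/50 overall is
# perfectly calibrated (reliability = 0) but has zero resolution -- the
# forecasts never separate the questions that resolve "yes" from the rest.
forecasts = [0.5] * 8
outcomes = [1, 0, 1, 0, 1, 0, 1, 0]
print(murphy_decomposition(forecasts, outcomes))  # (0.0, 0.0, 0.25)
```

I.e. a system could stay well-calibrated while still losing resolution, which is the kind of reduced usefulness I was gesturing at.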
I think it's good that a page like this exists; I'd want to be able to use it as a go-to link when suggesting people engage with or post on LessWrong, e.g. in my post on Notes on EA-related research, writing, testing fit, learning, and the Forum.
Unfortunately, it seems to me that this page isn't well suited to that purpose. Here are some things that seem like key issues to me (maybe other people would disagree):
Authoritarian closed societies probably have an advantage at covert racing, at devoting a larger proportion of their economic pie to racing suddenly, and at artificially lowering prices to do so. Open societies have probably a greater advantage at discovery/the cutting edge and have a bigger pie in the first place (though better private sector opportunities compete up the cost of defense engineering talent).
These are interesting points which I hadn't considered - thanks!
(Your other point also seems interesting and plausible, but I feel I lack the relevant knowledge to immediately evaluate it well myself.)
Interesting post.
You or other readers might also find the idea of epistemic security interesting, as discussed in the report "Tackling threats to informed decisionmaking in democratic societies: Promoting epistemic security in a technologically-advanced world". The report is by researchers at CSER and some other institutions. I've only read the executive summary myself.
There's also a BBC Future article on the topic by some of the same authors.
While I am not sure I agree fully with the panel, an implication to be drawn from their arguments is that from an equilibrium of treaty compliance, maintaining the ability to race can disincentivize the other side from treaty violation: it increases the cost to the other side of gaining advantage, and that can be especially decisive if your side has an economic advantage.
This is an idea/argument I hadn't encountered before, and it seems plausible, so it seems valuable that you shared it.
But it seems to me that there's probably an effect pushing in the opposit...
Thanks for this thought-provoking post. I found the discussion of how political warfare may have influenced nuclear weapons activism particularly interesting.
Since large yield weapons can loft dust straight to the stratosphere, they don’t even have to produce firestorms to start contributing to nuclear winter: once you get particles that block sunlight to an altitude that heating by the sun can keep them lofted, you’ll block sunlight a very long time and start harming crop yields.
I think it's true that this could "contribute" to nuclear winter, but I don't...
Note that:
Dando says ___ used biological weapons in WW1, but seemingly only against ___.
the Germans and perha
A final thought that came to mind, regarding the following passage:
It seems possible for person X to predict a fair number of a more epistemically competent person Y’s beliefs -- even before person X is as epistemically competent as Y. And in that case, doing so is evidence that person X is moving in the right direction.
I think that that's a good and interesting point.
But I imagine there would also be many cases in which X develops an intuitive ability to predict Y's beliefs quite well in a given set of domains, but in which that ability doesn...
Here's a second thought that came to mind, which again doesn't seem especially critical to this post's aims...
You write:
Someone who can both predict my beliefs and disagrees with me is someone I should listen to carefully. They seem to both understand my model and still reject it, and this suggests they know something I don’t.
I think I understand the rationale for this statement (though I didn't read the linked Science article), and I think it will sometimes be true and important. But I think that those sentences might overstate the point. In par...
Thanks for this and its companion post; I found the two posts very interesting, and I think they'll usefully inform some future work for me.
A few thoughts came to mind as I read, some of which can sort-of be seen as pushing back against some claims, but in ways that I think aren't very important and that I expect you've already thought about. I'll split these into separate comments.
Firstly, as you note, what you're measuring is how well predictions match a proxy for the truth (the proxy being Elizabeth's judgement), rather than the truth itself. Something ...
Good idea! I didn't know about that feature.
I've now edited the post to use spoiler-blocks (though a bit messily, as I wanted to do it quickly), and will use them for future lazy-Anki-card-notes-posts as well.
I didn't add that tag; some other reader did.
And any reader can indeed downvote any tag, so if you feel that that tag shouldn't be there, you could just downvote it.
Unless you feel that the tag shouldn't be there but aren't very confident about that, and thus wanted to just gently suggest that maybe the tag should be removed - like putting in a 0.5 vote rather than a full one. But that doesn't seem to match the tone of your comment.
That said, it actually does seem to me that this post fairly clearly does match the description for that tag; the ...
Yeah, I definitely agree that that's a good idea with any initialisms that won't already be known to the vast majority of one's readers (e.g., I wouldn't bother with US or UK, but would with APA). In this case, I just copied and pasted the post from the EA Forum, where I do think the vast majority of readers would know what "EA" means - but I should've used the expanded form "effective altruism" the first time in the LessWrong version. I've now edited that.
Here's a comment I wrote on the EA Forum version of this post, which I'm copying here as I'd be interested in people's thoughts on the equivalent questions in the context of LessWrong:
Meta: Does this sort of post seem useful? Should there be more posts like this?
I previously asked Should pretty much all content that's EA-relevant and/or created by EAs be (link)posted to the Forum? I found Aaron Gertler's response interesting and useful. Among other things, he said:
...Eventually, we'd like it to be the case that almost all well-written EA content exists on the
Note: If you found this post interesting, you may also be interested in my Notes on "The Bomb: Presidents, Generals, and the Secret History of Nuclear War" (2020), or (less likely) Notes on The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous. (The latter book has a very different topic; I just mention it as the style of post is the same.)
To your first point...
My impression is that there is indeed substantially less literature on misuse risk and structural risk, compared to accident risk, in relation to AI x-risk. (I'm less confident when it comes to a broader set of negative outcomes, not just x-risks, but that's also less relevant here and less important to me.) I do think that that might make the sort of work this post does less interesting if done in relation to those less-discussed types of risks, since fewer disagreements have been revealed there, so there's less to analyse and summarise...
Thanks for this post; this does seem like a risk worth highlighting.
I've just started reading Thomas Schelling's 1960 book The Strategy of Conflict, and noticed a lot of ideas in chapter 2 that reminded me of many of the core ideas in this post. My guess is that that sentence is an uninteresting, obvious observation, and that Daniel and most readers were already aware (a) that many of the core ideas here were well-trodden territory in game theory and (b) that this post's objectives were to:
List(s) of relevant problems
It occurs to me that all of the hypotheses, arguments, and approaches mentioned here (though not necessarily the scenarios) seem to be about the “technical” side of things. There are two main things I mean by that statement:
First, this post seems to be limited to explaining something along the lines of “x-risks from AI accidents”, rather than “x-risks from misuse of AI”, or “x-risk from AI as a risk factor” (e.g., how AI could potentially increase risks of nuclear war).
I do think it makes sense to limit the scope that way, because:
Thanks for this post! This seems like a really great way of visually representing how these different hypotheses, arguments, approaches, and scenarios interconnect. (I also think it’d be cool to see posts on other topics which use a similar approach!)
It seems that AGI timelines aren’t explicitly discussed here. (“Discontinuity to AGI” is mentioned, but I believe that's a somewhat distinct matter.) Was that a deliberate choice?
It does seem like several of the hypotheses/arguments mentioned here would feed into or relate to beliefs about timelines - in parti...
Thanks for this post; I found it useful.
The US policy has never ruled out the possibility of escalation to full countervalue targeting and is unlikely to do so.
But the 2013 DoD report says "The United States will not intentionally target civilian populations or civilian objects". That of course doesn't prove that the US actually wouldn't engage in countervalue targeting, but doesn't it indicate that US policy at that time ruled out engaging in countervalue targeting?
This is a genuine rather than rhetorical question. I feel I might be just missing som...
If I had to choose between a AW treaty and some treaty governing powerful AI, the latter (if it made sense) is clearly more important. I really doubt there is such a choice and that one helps with the other, but I could be wrong here. [emphasis added]
Did you mean something like "and in fact I think that one helps with the other"?
I don't think I know of any person who's demonstrated this who thinks risk is under, say, 10%
If you mean risk of extinction or existential catastrophe from AI at the time AI is developed, it seems really hard to say, as I think that that's been estimated even less often than other aspects of AI risk (e.g. risk this century) or x-risk as a whole.
I think the only people (maybe excluding commenters who don't work on this professionally) who've clearly given a greater than 10% estimate for this are:
Mostly I only start paying attention to people's opinions on these things once they've demonstrated that they can reason seriously about weird futures
[tl;dr This is an understandable thing to do, but does seem to result in biasing one's sample towards higher x-risk estimates]
I can see the appeal of that principle. I partly apply such a principle myself (though in the form of giving less weight to some opinions, not ruling them out).
But what if it turns out the future won't be weird in the ways you're thinking of? Or what if it turns out that, even if it wi...
I'm not sure which of these estimates are conditional on superintelligence being invented. To the extent that they're not, and to the extent that people think superintelligence may not be invented, that means they understate the conditional probability that I'm using here.
Good point. I'd overlooked that.
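To make that concrete with made-up numbers: suppose someone estimates a 5% unconditional chance of existential catastrophe from superintelligence, but also thinks there's only a 50% chance superintelligence is ever developed. Then the conditional probability relevant here is roughly double their stated figure:

$$P(\text{catastrophe} \mid \text{SI developed}) = \frac{P(\text{catastrophe via SI})}{P(\text{SI developed})} = \frac{0.05}{0.5} = 0.10$$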
I think lowish estimates of disaster risks might be more visible than high estimates because of something like social desirability, but who knows.
(I think it's good to be cautious about bias arguments, so take the following with a grain of salt, and note th...
That does seem interesting and concerning.
Minor: The link didn’t work for me; in case others have the same problem, here is (I believe) the correct link.
Yeah, totally agreed.
I also think it's easier to forecast extinction in general, partly because it's a much clearer threshold, whereas there are some scenarios that some people might count as an "existential catastrophe" and others might not. (E.g., Bostrom's "plateauing — progress flattens out at a level perhaps somewhat higher than the present level but far below technological maturity".)
Conventional risks are events that already have a background chance of happening (as of 2020 or so) and does not include future technologies.
Yeah, that aligns with how I'd interpret the term. I asked about advanced biotech because I noticed it was absent from your answer unless it was included in "super pandemic", so I was wondering whether you were counting it as a conventional risk (which seemed odd) or excluding it from your analysis (which also seems odd to me, personally, but at least now I understand your short-AI-timelines-based reasoning for ...
The overall risk was 9.2% for the community forecast (with 7.3% for AI risk). To convert this to a forecast for existential risk (100% dead), I assumed 6% risk from AI, 1% from nuclear war, and 0.4% from biological risk
I think this implies you think:
Does that sound right to you? And if so, what was your reasoning?
I ask...
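(For what it's worth, the arithmetic I take the quoted conversion to involve, if the per-cause extinction risks are simply added, is:

$$0.06 + 0.01 + 0.004 = 0.074 = 7.4\%$$

or about 7.3% if the three risks are instead treated as independent, i.e. $1 - (1 - 0.06)(1 - 0.01)(1 - 0.004) \approx 0.073$. Either way the implied extinction total is about 7.3-7.4%, versus the 9.2% forecast for catastrophic risk overall.)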
Very interesting, thanks for sharing! This seems like a nice example of combining various existing predictions to answer a new question.
a forecast for existential risk (100% dead)
It seems worth highlighting that extinction risk (risk of 100% dead) is a (big) subset of existential risk (risk of permanent and drastic destruction of humanity's potential), rather than those two terms being synonymous. If your forecast was for extinction risk only, then the total existential risk should presumably be at least slightly higher, due to risks of unrecoverable colla...
Thanks for those responses :)
MIRI people and Wei Dai for pessimism (though I'm not sure it's their view that it's worse than 50/50), Paul Christiano and other researchers for optimism.
It does seem odd to me that, if you aimed to do something like average over these people's views (or maybe taking a weighted average, weighting based on the perceived reasonableness of their arguments), you'd end up with a 50% credence on existential catastrophe from AI. (Although now I notice you actually just said "weight it by the probability that it turns out badly ...
Glad to hear that!
I do feel excited about this being used as a sort of "201 level" overview of AI strategy and what work it might be useful to do. And I'm aware of the report being included in the reading lists / curricula for two training programs for people getting into AI governance or related work, which was gratifying.
Unfortunately we did this survey before ChatGPT and various other events since then, which have majorly changed the landscape of AI governance work to be done, e.g. opening various policy windows. So I imagine people reading this report ...