This is not going to be a popular post here, but I wanted to articulate precisely why I have a very low pDoom (2-20%) compared to most people on LessWrong.
Every argument I am aware of for pDoom fits into one of two categories: bad or weak.
Bad arguments make a long list of claims, most of which have no evidence and some of which are obviously wrong. The canonical example is A List of Lethalities: there is no attempt to organize the list into a single logical argument, and it is built on many assumptions (analogies to human evolution, the assumption of fast takeoff, AI opaqueness) that are in conflict with reality.
Weak arguments go like this: "AGI will be powerful. Powerful systems can do unpredictable things. Therefore AGI could doom us all." Each of the arguments on this list is an example.
So the line of reasoning I follow is something like this:
- I start with a very low prior of AGI doom (for the purpose of this discussion, assume I defer to consensus).
- I then completely ignore the bad arguments,
- Finally, I give one bit of evidence collectively to the weak arguments (I don't consider them independent; most are just rephrasings of the example argument above).
So even if I assume no one betting on Manifold has ever heard of the argument "AGI might be bad actually", that additional bit of evidence only gets me from 13% to roughly 23%.
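For concreteness, here is that update spelled out, assuming "one bit of evidence" means doubling the odds and taking the 13% Manifold price as the prior (both numbers are from the text above; the arithmetic is just Bayes' rule in odds form):

$$
\text{prior odds} = \frac{0.13}{0.87} \approx 0.15, \qquad
\text{posterior odds} = 2 \times 0.15 \approx 0.30, \qquad
p(\text{doom}) = \frac{0.30}{1.30} \approx 0.23.
$$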
In the comments: if you wish to convince me, please propose arguments that are neither bad nor weak. Please do not argue that I am using the wrong base-rate or that the examples that I have already given are neither bad nor weak.
EDIT:
There seems to be a lot of confusion about this, so I thought I should clarify what I mean by a "strong good argument".
Suppose you have a strongly-held opinion, and that opinion disagrees with the expert consensus (in this case, the Manifold market or expert surveys showing that most AI experts predict a low probability of AGI killing us all). If you want to convince me to share your beliefs, you should have a strong good argument for why I should change them.
A strong good argument has the following properties:
- It is logically simple (it can be stated in a sentence or two).
  - This is important because the longer your argument, the more details have to be true, and the more likely you are to have made a mistake. Outside the realm of pure mathematics, it is rare for an argument that chains together multiple "therefore"s not to get swamped by the accumulating chance that at least one link fails (see the toy calculation after this list).
- Each of the claims in the argument is either self-evidently true or backed by evidence.
  - An example of a claim that is self-evidently true: if AGI exists, it will be made out of atoms.
  - An example of a claim that is not self-evidently true: if AGI exists, it will not share any human values.
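To make the point about chained "therefore"s concrete, here is a toy calculation; the 90% per-step reliability and the five-step chain are illustrative assumptions, not figures from the argument itself. If each step holds independently with probability 0.9, then a five-step chain holds with probability

$$
P(\text{all five steps hold}) = 0.9^{5} \approx 0.59,
$$

so even a chain of individually plausible steps leaves the conclusion barely better than a coin flip.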
To give an example completely unrelated to AGI: the expert consensus is that nuclear power is more expensive to build and maintain than solar power.
However, I believe this consensus is wrong, because the cost of nuclear power is artificially inflated by regulation mandating that nuclear be "as safe as possible", thereby guaranteeing that nuclear power can never be cheaper than other forms of power (which do not face similar mandates).
Notice that even if you disagree with my conclusion, we can now have a discussion about evidence. You might ask, for example: "What fraction of nuclear power's cost is driven by regulation?" "Are there any countries that have built nuclear power for less than the prevailing cost in the USA?" "What is an acceptable level of safety for nuclear power plants?"
I should also probably clarify why I consider "long lists" bad arguments (and ignore them completely).
If you have one argument, it's easy for me to examine it on its merits and decide whether it's valid, backed by evidence, and so on.
If you have 100 arguments, the easiest thing for me to do is to ignore them completely and come up with 100 arguments for the opposite point. Humans are incredibly prone to cherry-picking and only noticing arguments that support their point of view. I have absolutely no reason to believe that you, the reader, have somehow avoided all this and done a proper average over all possible arguments. The correct way to do such an average is to survey a large number of experts or use a prediction market, not whatever method you have settled on.
You need to be careful not to mix up conflicts between different human groups with an inability of humans to thrive. The fact that there has been a human history at all requires people to have the orientation to know what's going on, the capacity to act on it, and the care to do so. Humanity hasn't just given up or committed suicide, leaving only a nonhuman world.
Now it's true that, generally, there was a self-centered thriving that favored the well-being of oneself and one's friends and family over others, and this would lead to various sorts of conflicts, often wrecking a lot of good people. We can only hope society becomes more discriminating over time, so as to better nurture the goodness and only destroy the badness. But you can only say that genocide was bad because there was something that created good people whom it was wrong to kill.
But critically, various historical environmental problems led to the creation of environmentalist groups, which enabled society to notice these atmospheric problems. Contrast this with prior environmental changes against which there was no protection.
You are misunderstanding. By "loosen up this default pull", I mean, let's say you implement a bot to handle food production, from farm to table. Right now, food production needs to be human-legible because it involves a collaborative effort between lots of people. With the bot, even if it handles food production perfectly fine, you've now removed the force that generates human legibility for food production.
As you remove human involvement from more and more places, humans become able to do fewer and fewer things. Maybe humans can still thrive under such circumstances, but surely you can see that strong people have a better chance by default than weak people do? Notably, this is a separate part of the argument from adversarial search, and it applies even if we limit ourselves to reflex-like methods. The point here is to highlight what currently allows humans to thrive, and how that gets weakened by AI.
If you wait until humans have manually checked them all through, then you incentivize adversaries to develop military techniques that can destroy your country faster than you can wake up your interpretability researchers. (I expect this to be possible with only weak, reflex-based AI, e.g. if you set up a whole bunch of automated bots to wreak havoc in various ways once triggered.)
It's not; it's registering my assumption in case you want to object to it. If you think nukes might be used in a more limited way, then maybe you also think adversarial searches might be used in a more limited way.
Registering that something is unclear is helpful for deciding where to take the discussion. If we agreed on the overall picture, but you were more optimistic about the areas that were unclear and I was more pessimistic about them, then I could continue the argument into those areas as well. I'm sort of trying to comprehensively enumerate all the relevant dynamics for how this is going to develop, and to explicitly mark off the places that are relevant to consider but which I haven't properly addressed.
Right now, though, you seem to be assuming that humans thrive by default, and that only exogenous dangers like war or oppression can prevent this. Meanwhile, I'm using more of an "inertial" model, where certain neuroses can drive humans to spontaneously self-destruct, sometimes taking a lot of their neighbors with them. As such, it seems less relevant to explore these subtrees until the issue of self-destructive neuroses is addressed.
Looks like the default path to me? Like AI companies are dumping lots of knowledge and skills into LLMs, for instance, and at my job we've started integrating them with our product. Are there any other relevant dynamics you are seeing?
You need physical experimentation to test how well your methods for unleashing energy/flow in a particular direction work, so building reflex-like/tool AIs is going to be fundamentally rate-limited by the speed of physical experimentation.
However, as you build up a library of tricks for interacting with the world, you can use compute to search through ways to combine these tricks to make bigger things happen. This is generally bounded by the biggest "energy source" you can tap into, because it is really hard to bring multiple different "energy sources" together into some shared direction.
We'll build corrigible AI that tries to help us with ordinary stuff like transporting food from farms to homes.
However, the more low-impact it is, the more exploitable it is. If you want food from a self-driving truck, maybe you could just stand in front of it, and it will stop, and then some of your friends can break into it and steal the food it is carrying.
To prevent this, we need to incapacitate criminals. But criminals don't want to be incapacitated, so they will exploit whatever weaknesses the system for incapacitating them has. As part of this, the more advanced criminals will presumably build AIs to try to seek out weaknesses in the system. That's what I'm referring to with adversaries exploiting your weakness.
Being robust to exploitation from adversaries massively restricts your options. Whether the exact implementation includes an explicit utility function or not is less relevant than the fact that as it spontaneously adapts to undermine its adversaries, it needs to do so in a way that doesn't undermine humanity in general. I.e. you need to build some system that can unleash massive destruction towards sufficiently unruly enemies, without unleashing massive destruction towards friends. I think the classic utility maximizer instrumental convergence risk gives a pretty accurate picture for how that will look / how that gives you dangers, but if you think next-token-predictors can unleash destruction in a more controlled way, I'm all ears.
Any path for history needs to account for security and resource flow/allocation. These are the most important parts of everything. My position doesn't really assume much beyond this.