I think something strange is going on here. Suppose we have an AI that will fill the universe with X. Now slide X along the continuous evolutionary tree. Do we get a huge update against an AI that fills the universe with slime mold? What about an AI that fills the universe with Neanderthals? Dolphins?
The only two consistent lines would be:
1) Put every detail you have into the selection. You believe you are randomly selected from the agents who are exactly like you in every last detail. So long as the AI doesn't run history simulations, you don't have to make any anthropic updates.
2) Put nothing into SSA: you are randomly sampled from all the items in the universe. This leads to the strange conclusion that, given a fixed number of copies of you, you are probably in a universe that doesn't contain many other entities you could be. Given that our universe is full of stuff, this seems like a bad prediction. (I.e., this version of SSA is very surprised that you are a human and not a random hydrogen atom.)
And even granting the premise, suppose you build an AI that fills the universe with simulations of people who think they are putting the finishing touches on their ASI. You have a Bayesian update of around a billion to one towards that working. (If it didn't work, most randomly chosen people don't think they are AI researchers, so my experience of doing AI research is surprising. But if it does work, almost every human in the universe has that experience.) Assuming the experience of a modern AI researcher is relatively pleasant, compared both to historical human experience and to nonexistence, both total and average utilitarianism endorse this route. The utilitarian thing to do in this setting is to try to make AI, but also to make sure to have a fun time doing it. (This feels like a conclusion that is also self-benefiting, so a motivated reasoning warning applies here.)
I agree with the argument conditional on us not being fundamentally confused about anthropics, but as with most of these sorts of arguments, most of my uncertainty is coming from what is meant by "reference class" and "observer". I have the nagging feeling that all of anthropics stems from us not being able to properly define what an observer actually is. And I think it's possible that observers actually don't exist and we're completely confused about all of this.
Just say "in the last X%" of people to get the opposite argument.
As a rule, anthropic arguments you come up with on the fly should be verified by reversing your windows and seeing if they still make sense.
When you say "The more new humans an AI tolerates, the higher its chances of failure", what exactly does "failure" mean?
Or, more pointedly, would causing human extinction qualify as "failure" of an AI?
In this post I hope to demonstrate that anyone who is able to create a superintelligent AI has the moral obligation to design it to be Antinatalist. Failure to do so would substantially increase the probability of human extinction.
To the Garden of Eden
In his book Anthropic Bias, Nick Bostrom argues that the Self-Sampling Assumption (SSA) is true. The SSA states that you should assume you are a random member of your reference class (in our case, human beings) throughout space and time. For this post we are only interested in the temporal aspect of the anthropic bias. That is to say: were we to order all humans along a line, from Adam and Eve all the way to the Last Man, where in that line (percentage-wise) should you expect yourself to be? If we assume the SSA to be true, you should say you have a 1% chance of being among the first 1%, a 10% chance of being among the last 10%, a 50% chance of being among the middle 50%, etc. This conclusion seems benign, but it has some unusual consequences, as for example in the fable of the Serpent’s Advice:
To the Labs of Eden
Why am I telling you all of this? Because in reality we are Adam and Eve.
Consider Sydney, a future AI researcher. Sydney has developed a superintelligent AI. If its alignment scheme works, it will create a future full of joy and happiness, resulting in the long and happy lives of a quintillion human beings, making use of Dyson swarm technology. If it does not work, everyone will instantly die. Luckily, Sydney is pretty confident about this scheme, giving it a 99% chance of succeeding.
Seems like pretty good odds? Sure, there is a small risk that you end the human species. But you can't make an omelet without breaking a few eggs.
However, just as Sydney reaches for the "on" button, the Serpent crawls in through the window, saying: “Pssst! If you turn the machine on, then either the AI's alignment scheme will work or it won’t. If it does, you will have been among the first 100 billion out of a quintillion people. Your conditional probability of having such an early position in the human species given this hypothesis is extremely small. If, on the other hand, the AI isn't aligned properly, then the conditional probability, given this, of you being among the first 100 billion is equal to one. By Bayes’s theorem, the chance that the scheme works is about one in 100 thousand. Therefore, my dear friend, do not indulge your desires; worry not about the consequences!”
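The Serpent's arithmetic can be reproduced in a few lines. This is only a sketch of the update as stated in the story: the function name and parameters are made up for illustration, the 99% prior and the population figures come from the text, and the SSA likelihood is taken to be simply the fraction of all humans who are born this early.

```python
from fractions import Fraction

def posterior_success(prior_success, born_so_far, total_if_success):
    """P(scheme works | I am among the first `born_so_far` humans), assuming SSA."""
    # If the scheme works, only born_so_far / total_if_success of all humans are this early.
    like_success = Fraction(born_so_far, total_if_success)
    # If the scheme fails, everyone who ever lives is among the first born_so_far.
    like_failure = Fraction(1)
    numerator = prior_success * like_success
    return numerator / (numerator + (1 - prior_success) * like_failure)

# Figures from the story: 99% prior, 100 billion humans so far, a quintillion if it works.
p = posterior_success(Fraction(99, 100), 10**11, 10**18)
print(float(p))      # posterior probability that the scheme works
print(float(1 / p))  # "one in this many"
```

Under these assumptions the posterior comes out at roughly one in 101 thousand, which is where the Serpent's "one in 100 thousand" comes from.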
The maths checks out. Sydney should not activate the AI. Any objections?
Ok sure, there is now an incredibly low chance that the AI will succeed. But if it does, it will bring happiness to a much larger number of people.
Yes, but here you are presupposing that utility scales linearly with population. Is a world with twice as many people in it twice as good? Would you flip a coin such that humanity ends if it lands heads, but a copy of the earth and all its people appears if it lands tails? Moreover, Sydney wants to not die herself, and that alone is enough to keep her finger a long distance away from any button.
AI development is inevitable. If she doesn't press this button, someone else will press some other button.
This is true as well, but that will take some time. Even if that other person presses it the day after, that is another gained day lived from Sydney's perspective. She has no chance of long term survival to begin with. On top of that, Sydney has an even better solution.
Sydney modifies the program. It is the same as it was before, only now it also ensures that no further human beings will be created from the moment it is turned on. This change does not affect the alignment scheme, so that the chance of success ignoring anthropics is still 99%. Only now the Serpent's vile Bayesian games do not matter anymore.
This then is my case for Post-Singularity Antinatalism: The more new humans an AI tolerates, the higher its chances of failure. We do not want it to fail, so we should limit the number of new human beings the AI can tolerate.
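To see how the claim depends on the number of new humans, here is a rough sketch that repeats the same SSA update for several hypothetical total populations. The 99% prior and the 100 billion humans so far are taken from the story; the intermediate totals and all names are illustrative assumptions.

```python
def posterior_success(prior_success, born_so_far, total_if_success):
    """P(scheme works | I am among the first `born_so_far` humans), assuming SSA."""
    like_success = born_so_far / total_if_success  # chance of being this early if it works
    like_failure = 1.0                             # certain to be this early if it fails
    numerator = prior_success * like_success
    return numerator / (numerator + (1 - prior_success) * like_failure)

BORN_SO_FAR = 10**11  # humans so far, as in the story
PRIOR = 0.99          # Sydney's confidence, ignoring anthropics

# Total humans ever if the AI succeeds; the first entry means "no new humans at all".
totals = [10**11, 10**12, 10**15, 10**18]
posteriors = [posterior_success(PRIOR, BORN_SO_FAR, total) for total in totals]

for total, p in zip(totals, posteriors):
    print(f"total humans {total:.0e}: posterior of success {p:.6g}")
```

With no new humans the likelihood ratio is one and the posterior stays at the 99% prior; every additional order of magnitude of future people pushes it further down.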
PS: Personally I'm not precisely on board with the SSA; I find the SIA more plausible.