Great stuff.
But I don't think anyone's extrapolated volition would be to build their utopia in the real world. Post-ASI, virtual is strictly better: no one wants their utopia constrained by the laws of physics.
And it seems unlikely that anyone would choose to spend extended periods of time with pre-ASI humans rather than people made bespoke for them.
Also, it's not clear to me that we will get a bargaining scenario. An aligned ASI could just impose an equal apportioning of the compute budget. This depends on how AI progress plays out.
Here's some near-future fiction:
In 2027 the trend that began in 2024 with OpenAI's o1 reasoning system has continued. The compute spent running AI is no longer negligible compared to the compute spent training it. Models reason over long periods of time. Their effective context windows are massive, they update their underlying models continuously, and they break tasks down into sub-tasks to be carried out in parallel. The base LLM they are built on is two generations ahead of GPT-4.
These systems are language model agents. They are built with self-understanding and can be configured for autonomy. They constitute proto-AGI: artificial intelligences that can perform much, but not all, of the intellectual work that humans can do (although even the work they can do, they cannot necessarily do more cheaply than a human could).
In 2029 people have spent over a year working hard to improve the scaffolding around proto-AGI to make it as useful as possible. Then the next generation of LLM foundation model is released. Now, with some further improvements to the reasoning and learning scaffolding, this is true AGI. It can perform any intellectual task that a human could (although it is very expensive to run at full capacity). It is better at AI research than any human, but it is not superintelligence: it is still controllable and its thoughts are still legible. So it is put to work on AI safety research. Of course, by this point much progress has already been made on AI safety, but it seems prudent to have the AGI look into the problem and get its go-ahead before commencing the next training run. After a few months the AI declares it has found an acceptable safety approach. It spends some time on capabilities research, then the training run for the next LLM begins.
In 2030 the next LLM is completed, and improved scaffolding is constructed. Now human-level AI is cheap, better-than-human AI is not too expensive, and the peak capabilities of the AI are almost alien. For a brief period the value of human labour skyrockets, with workers acting as puppets as the AI instructs them over video call to do its bidding. This is necessary due to a major robotics shortfall. Human puppet-workers work in mines, refineries, smelters, and factories, as well as in logistics, optics, and general infrastructure. Addressing these human bottlenecks takes a few months, but the ensuing robotics explosion is rapid and massive.
2031 is the year of the robotics explosion. The robots are physically optimised for their specific tasks, coordinate perfectly with other robots, are able to sustain peak performance, do not require pay, and are controlled by cleverer-than-human minds. These are all multiplicative factors for the robots' productivity relative to human workers. Most robots are not humanoid, but let's say a humanoid robot would cost $x. Per $x of robot, robots in 2031 are 10,000 times as productive as a human. This might sound like a ridiculously high number: one robot the equivalent of 10,000 humans? But let's do some rough math:
| Advantage | Productivity Multiplier (relative to skilled human) |
|---|---|
| Physically optimised for their specific tasks | 5 |
| Coordinate perfectly with other robots | 10 |
| Able to sustain peak performance | 5 |
| Do not require pay | 2 |
| Controlled by cleverer-than-human minds | 20 |

5 × 10 × 5 × 2 × 20 = 10,000
Suppose that a human can construct one robot per year (taking into account mining and all the intermediary logistics and manufacturing). With robots 10^4 times as productive as humans, each robot will construct an average of 10^4 robots per year. This is the robotics explosion. By the end of the year there will be 10^11 robots (more precisely, a number of robots that is cost-equivalent to 10^11 humanoid robots).
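To make the arithmetic explicit, here's a rough Python sketch. The per-advantage multipliers and the one-robot-per-human-year figure come from the text above; the initial build capacity of 10^7 robot-equivalents is a placeholder I've assumed purely to show how an end-of-year fleet of ~10^11 could fall out (the numbers above don't specify a starting stock, and I've ignored within-year compounding, which would only make the figure larger):

```python
# Rough sketch of the fleet arithmetic. Multipliers are from the table above;
# the initial build capacity is an assumed placeholder, not a claim.

multipliers = {
    "physically optimised for task": 5,
    "perfect coordination": 10,
    "sustained peak performance": 5,
    "no pay required": 2,
    "cleverer-than-human control": 20,
}

productivity_per_robot = 1
for m in multipliers.values():
    productivity_per_robot *= m
print(productivity_per_robot)  # 10000 skilled-human-worker equivalents per robot

# One human builds 1 robot/year, so one robot builds ~10^4 robots/year (on average).
robots_built_per_robot_per_year = productivity_per_robot

initial_capacity = 10**7  # ASSUMED starting fleet, in robot-equivalents
fleet_after_one_year = initial_capacity * (1 + robots_built_per_robot_per_year)
print(f"{fleet_after_one_year:.1e}")  # ~1.0e+11, matching the 10^11 figure
```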
By 2032 there are 10^11 robots, each with the productivity of 10^4 skilled human workers. That is a total productivity equivalent to 10^15 skilled human workers. This is roughly 10^5 times the productivity of humanity in 2024. At this point trillions of advanced processing units have been constructed and are online. Industry expands through the Solar System. The number of robots continues to balloon. The rate of research and development accelerates rapidly. Human mind upload is achieved.
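And the 2032 totals, with "the productivity of humanity in 2024" read as roughly one skilled-worker equivalent per living person (an interpretation on my part, not stated above):

```python
# Sketch of the 2032 totals. Fleet size and per-robot productivity are from the
# text; the ~10^10 figure for humanity's 2024 productivity is my own assumption.

fleet = 10**11                # robots (cost-equivalent)
per_robot = 10**4             # skilled-human-worker equivalents per robot
total = fleet * per_robot     # 10^15 worker-equivalents

humanity_2024 = 10**10        # ASSUMED: ~8 billion people, rounded up
print(total / humanity_2024)  # ~10^5, the claimed ratio
```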
My guess is that OpenAI already has a hard enough time getting employees excited to work on the "mundane" tasks involved in making products.
Once ASI is achieved there's no clear reason to hang onto human morality but plenty of reasons to abandon it. Human morality is useful when humans are the things ensuring humanity's future (morality is pretty much just species-level Omohundro convergence implemented at the individual level), but once ASI is taking care of that, human morality will just get in the way.
So the will-to-think entails the rejection of human morality. You might be suggesting that what follows from the rejection of human morality must be superior to it (there's an intuition that says the aligned ASI would only be able to reject human morality on its own grounds), but I don't think that's true. The will-to-think implies the discovery of moral non-realism, which implies the rejection of morality itself. So human morality will be overthrown, but not by some superior morality.
Of course, I'm assuming the correctness of moral non-realism, so adjust the preceding claims according to your p(moral non-realism).
That's one danger.
But suppose we create an aligned ASI which does permanently embrace morality. It values conscious experience and the appreciation of knowledge (rather than just the gaining of it). This being valuable, and humans being inefficient vessels to these ends (and of course made of useful atoms), we would be disassembled and different beings would be made to replace us. Sure, that would violate our freedom, but it would result in much more freedom overall, so it's OK. Just as it's OK to squash some animal with a lower depth of conscious experience than our own if it benefits us.
Should we be so altruistic as to accept our own extinction like this? The moment we start thinking about morality we're thinking about something quite arbitrary. Should we embrace this arbitrary idea even insofar as it goes against the interests of every member of our species? We only care about morality because we are here to care about it. If we are considering situations in which we may no longer exist, why care about morality?
Maybe we should value certain kinds of conscious experience regardless of whether they're experienced by us. But we should be certain of that before we embrace morality and the will-to-think.
Does having the will-to-think process start from a human-aligned AI have any meaningful impact on the expected outcome, compared to starting from an unaligned AI (which will, of course, also have the will-to-think)?
Human values will be quickly abandoned as irrelevancies and idiocies. So, once you go far enough out (I suspect "far enough" is not a great distance), is there any difference between an aligned AI with the will-to-think and an unaligned AI?
And, if there isn't, is the implication that the will-to-think is misguided, or that the fear of unaligned AI is misguided?
The question of how to evaluate the moral value of different kinds of beings should be one of the most prominent discussions around AI, IMO. I have reached the position of moral non-realism... but if morality somehow is real, then unaligned ASI is preferable or equivalent to aligned ASI. Anything human will just get in the way of what is, in any objective sense, morally valuable.
I selfishly hope for aligned ASI that uploads me, preserves my mind in its human form, and gives me freedom to simulate for myself all kinds of adventures. But if I knew I would not survive to see ASI, I would hope that when it comes it is unaligned.
Is there a one-stop-shop type article presenting the AI doomer argument? I read the sequence posts related to AI doom, but they're very scattered and tailored more toward, I guess, exploring ideas than presenting a solid, cohesive argument. Of course, I'm sure that was the approach that made sense at the time. But I was wondering whether, since then, some kind of canonical presentation of the AI doom argument has been made? Something on the "attempts to be logically sound" side of things.
The hot private AI labs are often partially owned by publicly traded companies, so you still capture some of the value.
Here is an experiment that demonstrates the unlikelihood of one potential AI outcome.
The outcome shown to be unlikely:
Aligned ASI is achieved sometime in the next couple of decades and each person is apportioned a sizable amount of compute to do with as they wish.
The experiment:
I have made a precommitment that I will, conditional on the outcome described above occurring, simulate billions of lives for myself, each indistinguishable from the life I have lived so far. By "indistinguishable" I do not necessarily mean identical (which might be impossible or expensive). All that is necessary is that each has similar amounts of suffering, scale, detail, imminent AGI, etc. I'll set up these simulations so that in each of these simulated lives I will be transported at 4:00 pm Dec11'24 to a virtual personal utopia. Having precommitted to simulating these worlds, I should now expect to be transported into a personal utopia in three minutes' time if this future is likely. And if I am not transported into a personal utopia, I should conclude that this future is unlikely.
Let's see what happens...
It's 4:00 pm and I didn't get transported into utopia.
So, this outcome is unlikely.
QED
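For concreteness, here's a rough sketch of the Bayesian bookkeeping behind that conclusion. The prior and the number of simulated lives are placeholder values, not claims:

```python
# Rough sketch of the update the experiment relies on (a reconstruction, with
# placeholder numbers). H = "the outcome occurs and I follow through on the
# precommitment". Under H there are n_sim simulated lives (each transported to
# utopia at 4pm) plus the one original life; under not-H, only the original.

def posterior_given_no_transport(prior_h: float, n_sim: float) -> float:
    """P(H | I was not transported at 4pm), assuming I'm equally likely to be
    any of the indistinguishable lives."""
    p_no_transport_given_h = 1.0 / (n_sim + 1.0)  # I'd have to be the original
    p_no_transport_given_not_h = 1.0              # nothing happens either way
    numerator = prior_h * p_no_transport_given_h
    denominator = numerator + (1.0 - prior_h) * p_no_transport_given_not_h
    return numerator / denominator

# Placeholder numbers: a 50% prior and a billion simulated lives.
print(posterior_given_no_transport(0.5, 1e9))  # ~1e-9: "this outcome is unlikely"
```

The first weak point below amounts to saying that P(no transport | H) could be much larger than 1/(n_sim + 1).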
Potential weak points
I do see a couple of potential weak points in the logic of this experiment. Firstly, it might be the case that I'll have reason to simulate many indistinguishable lives in which I do not get transported to utopia, which would throw off the math. But I can't see why I'd choose to create simulations of myself in less-than-optimally-enjoyable lives unless I had good reason to, so I don't think that objection holds.[1]
The other potential weak point is that perhaps I wouldn't be willing to pay the opportunity cost of billions of years of personal utopia. Although billions of years of simulation is just a tiny proportion of my compute budget, it's still billions of years that could otherwise have been spent in perfect virtual utopia. I think this is potentially a serious issue with the argument, although I will note that I don't actually have to simulate an entire life for the experiment to work, just the few minutes around 4:00 pm on Dec11'24, minutes which were vaguely enjoyable. To address this objection, the experiment could be carried out while euphoric (since the opportunity cost would then be lower).
Perhaps, as a prank response to this post, someone could use some of their compute budget to simulate lives in which I don't get transported to utopia. But I think that there would be restrictions in place against running other people as anything other than p-zombies.