I think your description of vision 1 is likely to give people misleading impressions of what this could plausibly look like or what the people who you cited as pursuing vision 1 are thinking will happen. You disclaim this by noting the doc is oversimplified, but I think various clarifications are quite important in practice.
(It's possible that you think these misleading impressions aren't that important because from your perspective the main cruxes are in What does it take to defend the world from out-of-control AI? (But presumably you don't place total confidence in your views there?))
[Edit: I think this first paragraph originally came across as more aggressive than I was intending. Sorry. I've edited it a bit to tone it down.]
It seems important to note that the total amount of autonomy in vision 1 might be extremely large in practice. E.g., AIs might conduct autonomous R&D where some AI instance works on a project for the equivalent of many months without any interaction with a human. (That said, I think this system is very likely to be monitored by other AI systems and some actions might be monitored by humans, though it's plausible that the fraction monitored by humans is very low (e.g. 1%) and long contiguous sequences won't see any human monitoring.) Levels of autonomy this high might be required for speeding up R&D by large factors (e.g. 30x) due to a combination of serial bottlenecks (meaning that AIs need to serially outspeed humans in many cases) and the obvious argument that a 30x speed up requires AI to automate at least 97% of tasks. (To be clear, I think sometimes when people are imagining vision 1, they aren't thinking about situations this crazy, but I think they should.)
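The "at least 97%" figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes Amdahl's-law style accounting (the non-automated work still runs at human speed, and time spent on the two parts simply adds up); that framing, and the function names `speedup` and `min_automated_fraction`, are illustrative assumptions rather than anything stated in the comment itself. Under those assumptions the minimum automated fraction for a 30x speedup is 1 - 1/30, or about 96.7%, and that is with infinitely fast AI on the automated part.

```python
def speedup(automated_fraction: float, ai_speed_multiplier: float = float("inf")) -> float:
    """Overall speedup when a fraction of the work runs ai_speed_multiplier times faster.

    Assumption: Amdahl's-law accounting, where the non-automated work still
    proceeds at human speed and the two parts simply add up in time.
    """
    human_time = 1.0 - automated_fraction
    ai_time = automated_fraction / ai_speed_multiplier
    return 1.0 / (human_time + ai_time)


def min_automated_fraction(target_speedup: float) -> float:
    """Smallest automated fraction that permits target_speedup, even with infinitely fast AI."""
    return 1.0 - 1.0 / target_speedup


if __name__ == "__main__":
    # Fraction of tasks that must be automated for a 30x speedup.
    print(f"{min_automated_fraction(30):.3f}")
    # Best-case speedup at 97% automation, and the same with a finite 100x AI speed.
    print(f"{speedup(0.97):.1f}")
    print(f"{speedup(0.97, ai_speed_multiplier=100):.1f}")
```

If the AI is only finitely faster than humans on the automated tasks, the required fraction is higher still, which is one way of seeing why serial bottlenecks matter.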
In fact, I think the level of autonomy between Visions 1 and 2 might be actually similar in practice (because even wild AIs in Vision 2 might want to utilize human labor for some tasks for some transitionary period).
The main difference between vision 1 and vision 2 (assuming vision 1 is working):
There’s no sharp line between the helper AIs of Vision 1 and the truly-autonomous AIs of Vision 2. For example, to what extent do the human supervisors really understand what their AI helpers are doing and how? The less the humans understand, the less we can say that the humans are really in control.
There is also the failure mode of deceptive alignment where these AIs are lying in wait for a good opportunity for a treacherous turn. This is a problem even if humans have understood everything they've seen thus far.
One issue here is race-to-the-bottom competitive dynamics: if some humans entrust their AIs with more authority to make fast autonomous decisions for complex inscrutable reasons, then those humans will have a competitive advantage over the humans who don’t. Thus they will wind up in control of more resources, and in this way, the typical level of human control and supervision may very rapidly drop to zero.
Seems like a complicated empirical question. Note that adequately supervising 1% of all queries suffices to rule out a bunch of specific threat models. See auditing failures vs concentrated failures. Of course, adequate supervision is hard and might be much harder if competitive AIs must perform inscrutable actions which could contain inscrutable danger.
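A rough illustration of why a 1% audit rate can still matter for some threat models: if a bad outcome requires many separate bad actions, the chance that at least one of them gets audited grows quickly. The sketch below assumes each action is audited independently and uniformly at random, and that an audit always recognizes a bad action; both are strong simplifying assumptions, and the function name `prob_any_audited` is hypothetical. It does not cover concentrated failures, where a single action suffices.

```python
def prob_any_audited(audit_rate: float, num_bad_actions: int) -> float:
    """Probability that at least one of num_bad_actions is selected for audit.

    Assumptions (not from the source): audits are independent, uniformly
    random per action, and an audited bad action is always recognized.
    """
    prob_none_audited = (1.0 - audit_rate) ** num_bad_actions
    return 1.0 - prob_none_audited


if __name__ == "__main__":
    # How the catch probability scales with the number of bad actions needed.
    for k in (1, 10, 100, 500):
        print(k, round(prob_any_audited(0.01, k), 4))
```

With a single bad action the catch probability is just the audit rate, which is why the same 1% rate offers little protection against concentrated failures.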
By and large, people in this camp have an assumption that TAI will look, and act, and be trained, much like LLMs, but they’ll work better.
FWIW, I think Paul in particular puts less than 50% on "TAI looks like LLMs" if by that you mean "most of the capabilities come from generative pretraining basically like what we have right now". Short timelines are more likely to look like this though presumably.
That’s a very helpful comment, thanks!
Yeah, Vision 1 versus Vision 2 are two caricatures, and as such, they differ along a bunch of axes at once. And I think you're emphasizing different axes than the ones that seem most salient to me. (Which is fine!)
In particular, maybe I should have focused more on the part where I wrote: “In that case, an important conceptual distinction (as compared to Vision 1) is related to AI goals: In Vision 1, there’s a pretty straightforward answer of what the AI is supposed to be trying to do… By contrast, in Vision 2, it’s head-scratching to even say what the AI is supposed to be doing…”
Along this axis-of-variation:
Why am I emphasizing this axis in particular?
For one thing, I think this axis has practical importance for current research; on the narrow value learning vs ambitious value learning dichotomy, “narrow” is enough to execute Vision 1, but you need “ambitious” for Vision 2.
For example, if we move from “training by human approval” to “training by human approval after the human has had extensive time to reflect, with weak-AI brainstorming help”, then that’s a step from Vision 1 towards Vision 2 (i.e. a step from narrow value learning towards ambitious value learning). But my guess is that it’s a pretty small step towards Vision 2. I don’t think it gets us all the way to the AI I mentioned above, the one that will proactively deconvert a religious fundamentalist supervisor who currently has no interest whatsoever in questioning his faith.
For another thing, I think this axis is important for strategy and scenario-planning. For example, if we do Vision 2 really well, it changes the story in regards to “solution to global wisdom and coordination” mentioned in Section 3.2 of my “what does it take” post.
In other words, I think there are a lot of people (maybe including me) who are wrong about important things, and also not very scout-mindset about those things, such that “AI helpers” wouldn’t particularly help, because the person is not asking the AI for its opinion, and would ignore the opinion anyway, or even delete that AI in favor of a more sycophantic one. This is a societal problem, and always has been. One possible view of that problem is: “well, that’s fine, we’ve always muddled through”. But if you think there is upcoming VWH-type stuff where we won’t muddle through (as I tentatively do in regards to ruthlessly-power-seeking AGI), then maybe the only option is a (possibly aggressive) shift in the balance of power towards a scout-mindset-y subpopulation (or at least, a group with more correct beliefs about the relevant topics). That subpopulation could be composed of either humans (cf. “pivotal act”), or of Vision 2 AIs.
Here’s another way to say it, maybe. I think you’re maybe imagining a dichotomy where either AI is doing what we want it to do (which is normal human stuff like scientific R&D), or the AI is plotting to take over. I’m suggesting that there’s a third murky domain where the person wants something that he maybe wouldn’t want upon reflection, but where “upon reflection” is kinda indeterminate because he could be manipulated into wanting different things depending on how they’re framed. This third domain is important because it contains decisions about politics and society and institutions and ethics and so on. I have concerns that getting an AI to “perform well” in this murky domain is not feasible via a bootstrap thing that starts from the approval of random people; rather, I think a good solution would have to look more like an AI which is internally able to do the kinds of reflection and thinking that humans do (but where the AI has the benefit of more knowledge, insight, time, etc.). And that requires that the AI have a certain kind of “autonomy” to reflect on the big picture of what it’s doing and why. I think that kind of “autonomy” is different than how you’re using the term, but if done well (a big “if”!), it would open up a lot of options.
I agree that there isn't a sharp line between helper AIs and autonomous AIs. I think it's also important that autonomous AIs won't necessarily outcompete helper AIs.
If we use DWIM as our alignment target, you could see a "helper AI" that's autonomous enough to "create a plan to solve cancer". The human just told it to do that, and will need to check the plan and ask the AI to actually carry it out if it seems safe.
If you only have a human in the loop at key points in big plans, there's no real competitive advantage for fully autonomous AGI.
“But what about comparative advantage?” you say. Well, I would point to the example of a not-particularly-bright 7-year-old child in today’s world. Not only would nobody hire that kid into their office or factory, but they would probably pay good money to keep him out, because he would only mess stuff up.
This is an extremely minor critique given that I'm responding to a footnote, so I hope it doesn't drown out more constructive responses, but I'm actually pretty skeptical that the reason why people don't hire children as workers is because the children would just mess everything up.
I think there are a number of economically valuable physical tasks that most 7-year-old children can perform without messing everything up. For example, one can imagine stocking shelves in stores, small cleaning jobs, and moving lightweight equipment. My thesis here is supported by the fact that 7-year-olds were routinely employed to do labor in previous centuries:
In the 18th century, the arrival of a newborn to a rural family was viewed by the parents as a future beneficial laborer and an insurance policy for old age.[4] At an age as young as 5, a child was expected to help with farm work and other household chores.[5] The agrarian lifestyle common in America required large quantities of hard work, whether it was planting crops, feeding chickens, or mending fences.[6] Large families with less work than children would often send children to another household that could employ them as a maid, servant, or plowboy.[7] Most families simply could not afford the costs of raising a child from birth to adulthood without some compensating labor.
The reason why people don't hire children these days seems more a result of legal and social constraints than the structure of our economy. In modern times, child labor is seen as harmful or even abusive to the child. However, if these legal and social constraints were lifted, arguably most young children in the developed world could be earning wages well above the subsistence level of ~$3/day, making them more productive (in an economic sense) than the majority of workers in pre-modern times.
Thanks. I changed the wording to “moody 7-year-old” and “office or high-tech factory” which puts me on firmer ground I think. :)
I think there have been general increases in productivity across the economy associated with industrialization, automation, complex precise machines, and so on, and those things provide a separate reason (besides legal & social norms as you mentioned) that 7yos are far less employable today than in the 18th century. E.g. I can easily imagine a moody 7yo being net useful in a mom & pop artisanal candy shop, but it’s much harder to imagine a moody 7yo being net useful in a modern jelly bean factory.
I think your bringing up “$3/day” gives the wrong idea; I think we should focus on whether the sign is positive or negative. If the sign is positive at all, it’s probably >$3/day. The sign could be negative because they sometimes touch something they’re not supposed to touch, or mess up in other ways, or it could simply be that they bring in extra management overhead greater than their labor contribution. (We’ve all delegated projects where it would have been far less work to just do the project ourselves, right?) E.g. even if the cost to feed and maintain a horse were zero, I would still not expect to see horses being used in a modern construction project.
Anyway, I think I’m on firmer ground when talking about a post-AGI economy, in which case, literally anything that can be done by a human at all, can be automated.
I think the four scenarios outlined here roughly map to the areas 1, 6, 7, and 8 of the 60+ Possible Futures post.
It is a strange thing to me that there are people in the world who are actively trying to xenocide humanity, and this is often simply treated as "one of the options" or as an interesting political/values disagreement.
Of course, it is those things, especially "interesting", and these ideas ultimately aren't very popular. But it is still weird to me that the people who promote them e.g. get invited onto podcasts.
As an intuition pump: I suspect that if proponents of human replacement were to advocate for the extinction of a single demographic rather than all of humanity, they would not be granted a serious place in any relevant discussion. That is in spite of the fact that genocide is a much-less-bad thing than human extinction, by naive accounting.
I'm sure there are relatively simple psychological reasons for this discordance. I just wanted to bring it to salience.
There’s no sharp line between the helper AIs of Vision 1 and the truly-autonomous AIs of Vision 2.
This post seems like it doesn't quite cleave reality at the joints, from how I'm seeing things.
Vision 1 style models can be turned into Vision 2 autonomous models very easily. So, as you say, there's no sharp line there.
For me, Vision 3 shouldn't depend on biological neurons. I think it's more like "brain-like AGI that is so brain-like that it is basically an accurate whole brain emulation, and thus you can trust it as much as you can trust a human (which isn't necessarily all that much)."
So again, no sharp line there from my point of view.
Since there are lots of different people in the world with different beliefs and goals, I expect that lots of variations with similarities to #1, #2, and #3 will be active in the world. So anyone who has a hope of just one of the visions coming true needs to include very strict worldwide governance enforcement as part of their vision.
I think my vision is some weird mashup of these. Like, I'm hoping for a powerful set of semi-aligned tool AI (type-1) to assist worldwide enforcement in stamping out dangerous type-2 rogue AI in the hands of bad actors, giving us a temporary safe window in which we can achieve either better alignment of type-1 or type-3 (Bio-enhancement and Whole Brain Emulation).
Vision 1 style models can be turned into Vision 2 autonomous models very easily
Sure, Vision 1 models can be turned into dangerous Vision 2 models, but they can’t be turned into good Vision 2 models that we want to have around, unless you solve the different set of problems associated with full-fledged Vision 2. For example, in the narrow value learning vs ambitious value learning dichotomy, “narrow” is sufficient for Vision 1 to go well, but you need “ambitious” for Vision 2 to go well. Right?
For me, Vision 3 shouldn't depend on biological neurons. I think it's more like "brain-like AGI that is so brain-like that it is basically an accurate whole brain emulation, and thus you can trust it as much as you can trust a human (which isn't necessarily all that much)."
I think you’re more focused on “why do I trust the AI (insofar as I trust it)” (e.g. my “two paths” here), whereas in this post I’m ultimately focused on “what should I be working on (or funding, or whatever) and why”.
Thus, I think “System X does, or does not, involve actual squishy biological neurons” is not only a nice bright line, but it’s also a bright line with great practical importance for what research projects to work on, and what the eventual results will look like, and how the scenarios play out from there. I have lots of reasons for thinking that. E.g. super-ambitious moonshot BCI research is critical for “merging” but only slightly relevant for WBE; conversely measuring human brain connectomes is critical for WBE but only slightly relevant for “merging”. Another example: simbox testing is useful for WBEs but not “merging”. Also, a WBE would be an extraordinarily powerful system because it can be sped up 100-fold, duplicated, tweaked, and so on, in a way that any system involving actual squishy biological neurons basically can’t (I would argue). And that’s highly relevant to how it fits into longer-term scenarios.
Great post. Personally I think the "computational social choice" angle is underexplored.
I think CSC can gradually morph itself into CEV and that's how we solve AI Goalcraft.
I think CSC can gradually morph itself into CEV and that's how we solve AI Goalcraft.
That sounds lovely if it’s true, but I think it’s a much more ambitious vision of CSC than people usually have in mind. In particular, CSC (as I understand it) usually takes people’s preferences as a given, so if somebody wants something they wouldn’t want upon reflection, and maybe they’re opposed to doing that reflection because their preferences were always more about signaling etc., well then that’s not really in the traditional domain of CSC, but CEV says we ought to sort that out (and I think I agree). More discussion in the last two paragraphs of this comment of mine.
This was a great read, thanks for writing!
Despite the unpopularity of my research on this forum, I think it's worth saying that I am also working towards Vision 2, with the caveat that autonomy in the real world (e.g. with a robotic body) or on the internet is not necessary: one could aim for an independent-thinker AI that can do what it thinks is best only by communicating via a chat interface. Depending on what this independent thinker says, different outcomes are possible, including the outcome in which most humans simply don't care about what this independent thinker advocates for, at least initially. This would be an instance of vision 2 with a slow and somewhat human-controlled, instead of rapid, pace of change.
Moreover, I don't know what views they have about autonomy as depicted in Vision 2, but it seems to me that also Shard Theory and some research bits by Beren Millidge are to some extent adjacent to the idea of AI which develops its own concept of something being best (and then acts towards it); or, at least, AI which is more human-like in its thinking. Please correct me if I'm wrong.
I hope you'll manage to make progress on brain-like AGI safety! It seems that various research agendas are heading towards the same kind of AI, just from different angles.
I disagree that "forever is a really long time" in this context. To delay AI forever requires delaying it until industrial civilization collapses (from resource depletion or whatever other reason). That means 200-300 years, more likely than 50,000.
I am in Vision 3 and 4, and indeed am a member of Pause.ai and have worked to inform technocrats, etc to help increase regulations on it.
My primary concern here is that biology remains substantial as the most important cruxes of value to me such as love, caring and family all are part and parcel of the biological body.
Transhumans who are still substantially biological, while they may drift in values substantially, will still likely hold those values as important. Digital constructions, having completely different evolutionary pressures and influences, will not.
I think I am among the majority of the planet here, though as you noted, likely an ignored majority.
love, caring and family all are part and parcel of the biological body
I’m not sure what you mean by this. Lifelong quadriplegics are perfectly capable of love, right? If you replaced the brain of a quadriplegic by a perfect ideal whole-brain-emulation of that same person’s brain, with similar (but now digital) input-output channels, it would still love, right?
completely different evolutionary pressures and influences
Yeah it depends on how you make the digital construction. I am very confident that it is possible to make a digital construction with nothing like human values. But I also think it’s possible (at least in principle) to make a digital construction that does have something like human values. Again, a perfect ideal whole-brain-emulation is a particularly straightforward case. A perfect emulation of my brain would have the same values as me, right?
Lifelong quadriplegics are perfectly capable of love, right?
As a living being in need of emotional comfort and who would die quite easily, it would be extremely useful to express love to motivate care and indeed excessively so. A digital construct of the same brain would have immediately different concerns, e.g. less need for love and caring, more to switch to a different body, etc.
Substrate matters massively. More on this below.
Again, a perfect ideal whole-brain-emulation is a particularly straightforward case. A perfect emulation of my brain would have the same values as me, right?
Nope! This is a very common and yet widespread error, which I suppose comes from the idea that the mind comes from the brain. But even casually, we can tell that this isn't true: would a copy of you, for example, still be recognizably you if put on a steady drip of cocaine? Or would it still be you if you were permanently ingesting alcohol? Both would result in a variation of you that is significantly different, despite otherwise identical brain. Your values would likely have shifted then, too. Your brain is identical - only the inputs to it have changed.
In essence, the mind is the entire body, e.g.
https://www.psychologytoday.com/us/blog/body-sense/202205/the-fiction-mind-body-separation
There is evidence that even organ transplants affect memory and mood.
The key here is that the self is always a dynamic construct of the environment and a multiplicity of factors. The "you" in a culture of cannibalism will likely have different values than a "you" in a culture of Shakers, to add to it.
The values of someone who is a digital construct who doesn't die and doesn't need to reproduce very much will be very different from a biological creature that needs emotional comfort, values trust in an environment of social deception, holds heroism in high regard due to the fragility of life, and needs to cooperate with other like minds.
Is it theoretically possible? If you replicate all biological conditions to a digital construct, perhaps, but it's fundamentally not intrinsic to the substrate, where digital substrate entails perfect copying via mechanical processes, while biology entails dynamic agentic cells in coordination and much more variability in process. It's like trying to use a hammer as a screwdriver.
The concept of the holobiont goes much deeper into this and is a significant reason why I think any discussion of digital copying is more the equivalent of a shadowy undead mockery than anything else, since it fails to account for the fundamental co-evolutions that build up an "organism."
https://en.m.wikipedia.org/wiki/Holobiont
In life, holobionts do change and alter, but it's much more like evolutionary extension and molding by degree. Mechanism just tromps over it by fiat.
Nope! This is a very common and yet widespread error, which I suppose comes from the idea that the mind comes from the brain. But even casually, we can tell that this isn't true: would a copy of you, for example, still be recognizably you if put on a steady drip of cocaine? Or would it still be you if you were permanently ingesting alcohol? Both would result in a variation of you that is significantly different, despite otherwise identical brain. Your values would likely have shifted then, too. Your brain is identical - only the inputs to it have changed.
Cocaine and alcohol obviously affect brain functioning, right? That’s how they have the effects that they have. I am baffled that you could possibly see psychoactive drugs like those as evidence against the idea that the mind comes from the brain—from my perspective, it’s strong evidence for that idea.
From my perspective, you might as well have said: “There is a myth that torque comes from the car engine, but even casually, we can tell that this isn’t true: would an engine still produce the same torque if I toss it into the ocean? That would result in a torque that is significantly different, despite otherwise identical engine.”
(Note: If you respond, I’ll read what you write, but I’m not planning to carry on this conversation, sorry.)
It's not a myth, but an oversimplification which makes the original thesis much less useful. The mind, as we care about it, is a product and phenomenon of the entire environment it is in, as well as the values we can expect it to espouse.
It would indeed be akin to taking an engine, putting it in another environment like the ocean and expecting the similar phenomenon of torque to rise from it.
My primary concern here is that biology remains substantial as the most important cruxes of value to me such as love, caring and family all are part and parcel of the biological body.
I'm starting to think a big crux of my non-doominess rests on basically rejecting this premise, alongside a related premise that value is complex and fragile. The arguments for these premises are surprisingly weak, and the evidence from neuroscience points to the opposite conclusion: values and capabilities are fairly intertwined, and the value generators are about as simple and general as we could have gotten, which makes me much less worried about several alignment problems like deceptive alignment.
the value generators are about as simple and general as we could have gotten
Would you say it's something like empowerment? Quoting Jacob:
Empowerment provides a succinct unifying explanation for much of the apparent complexity of human values: our drives for power, knowledge, self-actualization, social status/influence, curiosity and even fun[4] can all be derived as instrumental subgoals or manifestations of empowerment. Of course empowerment alone can not be the only value or organisms would never mate: sexual attraction is the principle deviation later in life (after sexual maturity), along with the related cooperative empathy/love/altruism mechanisms to align individuals with family and allies (forming loose hierarchical agents which empowerment also serves).
The key central lesson that modern neuroscience gifted machine learning is that the vast apparent complexity of the adult human brain, with all its myriad task specific circuitry, emerges naturally from simple architectures and optimization via simple universal learning algorithms over massive data. Much of the complexity of human values likewise emerges naturally from the simple universal principle of empowerment.
Empowerment-driven learning (including curiosity as an instrumental subgoal of empowerment) is the clear primary driver of human intelligence in particular, and explains the success of video games as empowerment superstimuli and fun more generally.
This is good news for alignment. Much of our values - although seemingly complex - derive from a few simple universal principles. Better yet, regardless of how our specific terminal values/goals vary, our instrumental goals simply converge to empowerment regardless. Of course instrumental convergence is also independently bad news, for it suggests we won't be able to distinguish altruistic and selfish AGI from their words and deeds alone. But for now, let's focus on that good news:
Safe AI does not need to learn a detailed accurate model of our values. It simply needs to empower us.
Tl;dr
When people work towards making a good future in regards to Transformative AI (TAI), what’s the vision of the future that they have in mind and are working towards?
I’ll propose four (caricatured) answers that different people seem to give:
For each of these four, I will go through:
I’ll interject a lot of my own opinions throughout, including a suggestion that, on the current margin, the community should be putting more direct effort into technical work towards contingency-planning for Vision 2.
Warning 1: Oversimplifications. This document is full of oversimplifications and caricatures. But hopefully it’s a useful starting point for certain purposes.
Warning 2: Jargon & Unexplained Assumptions. Lots of both; my target audience here is pretty familiar with the AGI safety and alignment literature, and buys into widely-shared assumptions within that literature. But DM me if something seems confusing or dubious, and I’ll try to fix it.
Vision 1: “Helper AIs”—AIs doing specifically what humans want them to do
1.1 Typical assumptions and ideas
By and large, people in this camp have an assumption that TAI will look, and act, and be trained, much like LLMs, but they’ll work better. They also typically have an assumption of slow takeoff, very high compute requirements for powerful AI, and relatively few big actors who are training and running AIs (but many more actors using AI through an API).
There are two common big-picture stories here:
1.2 Potential causes for concern
1.3 Who is thinking about this? And if this is your vision, what should you be working on?
Vision 2: “Autonomous AIs”—AIs out in the world, doing whatever they think is best
2.1 Typical assumptions and ideas
By and large, people in this camp have an assumption that TAI will be more in the category of humans, animals, and “RL agents” like AlphaStar. They often talk about AIs that think, figure things out, exhibit planning and foresight, come up with and autonomously implement clever out-of-the-box ways to solve their problems, etc. The AIs are generally assumed to do online learning (a.k.a. “continual learning”) as they figure out new things about the world, thus getting more and more competent over time without needing new human-provided training data, just as humans themselves do (individually and in groups). Also, a few people in this camp (not me) think that it’s very important in this story that the AI has a robotic body.[2]
As I mentioned in Vision 1 above, there’s no sharp line between the helper AIs of Vision 1 and the truly-autonomous AIs of Vision 2. For example, one can imagine a continuum from a ‘sycophantic servant AI’ that does whatever gets immediate approval from the human; to a ‘parent AI’ that may ask the human’s opinion, and care a lot about it, but also be willing to overrule that opinion in favor of (what it sees as) the human’s long-term best interest; to an ‘independent AI’ that could operate just fine without ever meeting a human in the first place. For clarity, I’ll focus discussion on a pretty extreme version of Vision 2.
In that case, an important conceptual distinction (as compared to Vision 1) is related to AI goals:
In Vision 1, there’s a pretty straightforward answer of what the AI is supposed to be trying to do—i.e., whatever the human supervisor had in mind, which can be inferred pretty well from some combination of general human data (from which the AI can get context, unspoken assumptions, etc.) and talking to the human in question (from which the AI can get details). The implementation side is by no means straightforward, but in Vision 1, you at least basically know what you’re hoping for.
By contrast, in Vision 2, it’s head-scratching to even say what the AI is supposed to be doing. We’re expecting the AIs to make lots of decisions where “do what the human wants” is not actionable—there might be no human around to ask, and/or not enough time to ask them, and/or the considerations might involve a lot of background knowledge or context that humans don’t know, and/or this may be a weird situation where humans would be very unsure (or even mistaken) about what they would want even if those humans did understand all the context and consequences. Recall, we’re generally expecting the AIs to go invent new science and technology, and build their own idiosyncratic concept-spaces, etc., and then, in this new world, which is out-of-distribution relative to all its prior experiences and human data, we generally expect the AIs to continue to make lots of high-context decisions on the fly without necessarily checking in with humans.
So that’s a problem. The paths I’ve heard of for tackling this problem seem to be:[3]
The most conceptually-straightforward version of (C) is to start with Whole Brain Emulation (WBE) of unusually decent and upstanding humans, then make it far more competent via speeding it up, tweaking it, adding more virtual cortical neurons, etc. After all, if it’s possible for humans to make decisions we’re happy about, directly or indirectly, then it’s possible in principle for WBEs of those humans to make those same good decisions too; and conversely, if it’s not possible for humans to make good decisions, directly or indirectly, then we’re screwed no matter what.
Another variation on (C) (my favorite!) involves “brain-like AGI” with (the better parts of) reverse-engineered human social instincts, more on which in 2.3 below.
2.2 Potential causes for concern
2.3 Who is thinking about this? And if this is your vision, what should you be working on?
I think Encultured AI is trying to do something related to that? Whoops, nope, they’ve pivoted.
2.4 Hang on there Steve, this is your vision? This is what you actually want?
It’s important to distinguish “trying to make this vision happen” from “contingency-planning for this vision”. Taking them separately:
Vision 3: Supercharged biological human brains (via intelligence-enhancement or merging-with-AI)
3.1 Typical assumptions and ideas
3.2 Potential causes for concern
3.3 Who is thinking about this? And if this is your vision, what should you be working on?
Vision 4: Don’t build TAI
4.1 Typical assumptions and ideas
4.2 Potential causes for concern
4.3 Who is thinking about this? And if this is your vision, what should you be working on?
(Thanks Seth Herd, Linda Linsefors, Charlie Steiner, and Adam Marblestone for critical comments on earlier drafts.)
One of many challenges is that this kind of scenario planning leans on lots of technical questions about how future AI will work in detail, how competent it will be at different tasks, how much compute it will take to run (both at first, and in the longer term), and so on. It also leans on social questions, like how institutions and individual decision-makers will react in different (unprecedented) circumstances. And it also depends on various aspects of the “tech tree”, i.e. what inventions may be invented in the future. These are all really hard questions, so maybe it’s no surprise that reasonable people wind up with different opinions.
By the way, this is a prominent example of my more general rant that there has been insufficient progress and professionalization around thinking through strategies and scenarios of what might happen as we transition into TAI. Part of the problem is that it’s really inherently hard and complicated, with a million rabbit-holes and no empirical feedback; and part of the problem is that it sounds like “weird sci-fi stuff”, so academics generally won’t touch it (besides FHI, to their credit). I’m not really sure how to make this situation better though. (There are a bunch of long TAI-related technical reports from OpenPhil; I have my complaints, but I think that’s a good genre.)
I strongly expect that future powerful autonomous AIs will be able to use teleoperated robot bodies, with very little practice, just as humans can use teleoperated robot bodies with very little practice. I don’t think it’s very important that future AIs have robot bodies, in the human or animal sense. For example, lifelong-quadriplegic humans can be remarkably intelligent. More discussion of “embodiment” here.
One can imagine other related scenarios such as “make an AI that wants to set up a Long Reflection and cede power to whatever the result is”, or “make an AI that sets up and oversees an atomic communitarian thing”. But I think those aren’t an alternative to (A,B,C) in the text, but rather a broad strategy that we might hope the AIs with (A,B,C) type motivations will choose to pursue. After all, you can’t just wave a wand and get a Long Reflection; you need to make it happen, in the real world, including setting up appropriate institutions, rules of deliberation, etc., and that would involve the AI making lots of autonomous decisions, long before there are any Long Reflection outputs to defer to. So the AI still needs to have its own motivations that we’re happy about.
See e.g. Carl Shulman on the possible time-course of AI takeover.
“But what about comparative advantage?” you say. Well, I would point to the example of a moody 7-year-old child in today’s world. Not only would nobody hire that kid into their office or high-tech factory, but they would probably pay good money to keep him out, because he would only mess stuff up. And if the 7yo could legally found his own company, we would never expect it to get beyond a lemonade stand, given competition from dramatically more capable and experienced adults. So it will be, I claim, with all humans in a world of advanced autonomous AIs, if the humans survive.
I’m not an expert, but see here (including replies) for some references.
In this context, “working on Whole Brain Emulation (WBE)” would include both “making WBE happen” and “arguing about whether WBE is a good idea in the first place”. My own opinion is that WBE is quite unlikely to happen before AGI (and in particular, very unlikely to happen before having brain-like AGI that is not a WBE of a particular person); but if it did happen, it could be a very useful ingredient in a larger plan, with some care and effort. Others disagree with WBE being desirable in the first place; see e.g. here.