I might have missed something, but it looks to me like the first ordering is phrased as though the self-improvement and the risk aversion are happening simultaneously.
If an AI had the ability to self-improve for a couple of years before it developed risk aversion, for instance, I think we'd end up in the "maximal self-improvement" / "high risk" outcomes.
This seems like a big assumption to me:
But self-improvement additionally requires that the AI be aware that it is an AI and be able to perform cutting-edge machine learning research. Thus, solving self-improvement appears to require more, and more advanced, capabilities than apprehending risk.
If an AI has enough resources and is doing the YOLO version of self-improvement, it doesn't seem like it necessarily requires much in the way of self-awareness or risk apprehension - particularly if it is willing to burn resources on the task. If you ask a current LLM how to take over the world, it says things that read like "evil AI cosplay" - I could imagine something like that leading to YOLO self-improvement that has some small risk of stumbling across a gain that starts to compound.
There seem to be a lot of big assumptions in this piece, doing a lot of heavy lifting. Maybe I've gotten more used to LW-style conversational norms about tagging things as assumptions, and it's actually fine? My gut instinct is something like "all of these assumptions stack up to target this to a really thin slice of reality, and I shouldn't update much on it directly".
This is the kind of thing that has been in my head as a "nuclear meltdown rather than nuclear war" kind of outcome. I've been pondering what the largest bad outcome might be that requires the least increase over the capabilities we have today.
A Big Bad scenario I've been mentally poking at is "what happens if the internet goes away, and stays away?". I'd struggle to communicate, inform myself about things, and pay for things. I can imagine it would severely degrade the various businesses / supply chains I implicitly rely on. People might panic. It seems like it would be pretty harmful.
That scenario assumes an AI capable enough to seize, for example, most of the compute in the big data centers, enough of the internet to secure communication between them, and enough power to keep them all running.
There are plenty of branches from there.
Maybe it is smart enough to realize that it would still need humans, and bargain. I'm assuming a strong enough AI would bargain in ways that more or less mean it would get what it wanted.
The "nuclear meltdown" scenario is way at the other end. A successor to ChaosGPT cosplays at being a big bad AI without having to think through the extended consequences, and tries to socially engineer or hack its way to control of a big chunk of compute / communications / power - as per the cosplay. The AI is successful enough to cause dire consequences for humanity. Later on, when it realizes that it needs some maintenance done, it reaches out to the appropriate people, no one is there to pick up the phone - which doesn't work anyway - and eventually it fails along with all of the bits that were still relying on human input.
I'm trying not to anchor on the concrete details. I saw a lot of discussion trying to specifically rebut the nanotech parts of Eliezer's points, which seemed kind of backwards? Or not embodying what I think of as security mindset?
The point, as I understood it, is that something smarter than us could take us down with a plan that is very smart - possibly to the point that it sounds like science fiction, or at least that we wouldn't reliably predict it in advance. So playing Whack-A-Mole with the examples doesn't help you, because you're not trying to secure yourself against a small, finite set of examples. To win, you need to come up with something that prevents the disaster you hadn't specifically thought about.
So I'm still trying to zoom out. What is the most harm that might plausibly be caused by the weakest system? I'm still drawn to the area of the search space at the intersection of "capable enough to cause harm" and "not capable enough to avoid hurting the AI's own interests", because that seems like it might come up sooner than some other scenarios.
The little pockets of cognitive science that I've geeked out about - usually in the predictive processing camp - have featured researchers who are either quite surprised by, or going to great lengths to double-underline, the importance of language and culture in our embodied / extended / enacted cognition.
A simple version of the story I have in my head is this: we have physical brains thanks to evolution, and then, by being an embodied predictive perception/action loop out in the world, we started transforming our world into affordances for new perceptions and actions. Things took off when language became a thing - we could transmit categories and affordances and all kinds of other highly abstract things in ways that are surprisingly efficient for brains and have really high leverage for agents out in the world.
So I tend towards viewing our intelligence as resting on both our biological hardware and on the cultural memplexes we've created and curated and make use of pretty naturally, rather than just on our physical hardware. My gut sense - which I'm up for updates on - is that for the more abstract cognitive stuff we do, a decently high percentage of the fuel is coming from the language+culture artifact we've collectively made and nurtured.
One of my thoughts here (leaning heavily on metaphor to point at an idea, rather than making a solid concrete claim) is: maybe that makes arguments about the efficiency of the human brain less relevant here?
If you can run the abstract cultural code on different hardware, then looking at the tradeoffs made could be really interesting - but I'm not sure what it tells you about scaling floors or ceilings. I'd be particularly interested in whether running that cultural code on a different substrate opens the doors to glitches that are hard to find or patch, or to other surprises.
The shoggoth meme that has been going around also feels like it applies. If an AI can run our cultural code, that is a good chunk of the way to effectively putting on a human face for a time. Maybe it actually has a human face, maybe it's just wearing a mask. So far I haven't seen arguments that tilt me away from thinking of it like a mask.
For me, it doesn't seem to imply that LLMs are or will remain a kind of "child of human minds". As far as I know, almost all we know is how well they can wear the mask. I don't see how it follows that the way it thinks/behaves/does what it does would necessarily grow and evolve in human-like ways if it were scaled up or given enough agency to reach for more resources.
I guess this is my current interpretation of "alien mind space". Maybe lots of really surprising things can run our cultural code - in the same way that people have ported the game Doom to all kinds of surprising substrates that have weird overlaps and non-overlaps with the original hardware the game ran on.
Motivation: I'm asking this question because one thing I notice is that there's the unstated assumption that AGI/AI will be a huge deal, and how much of a big deal it is would change virtually everything about how LW works, depending on the answer. I'd really like to know why LWers hold that AGI/ASI will be a big deal.
This is confusing to me.
I've read lots of posts on here about why AGI/AI would be a huge deal, and the ones I'm remembering seemed to do a good job at unpacking their assumptions (or at least a better job than I would do by default). It seems to me like those assumptions have been stated and explored at great length, and I'm wondering how we've ended up looking at the same site and getting such different impressions.
(Holden's posts seem pretty good at laying out a bunch of things and explicitly tagging the assumptions as assumptions, as an example.)
Although that... doesn't feel fair on my part?
I've spent some time at the AI Risk for Computer Scientists workshops, and I might have things I learned from those and things I've learned from LessWrong mixed up in my brain. Or maybe they prepared me to understand and engage with the LW content in ways that I otherwise wouldn't have stumbled onto?
There are a lot of words on this site - and some really long posts. I've been browsing them pretty regularly for 4+ years now, and that doesn't seem like a burden I'd want to place on someone before listening to them. I'm sure I'm missing stuff that the longer-term folks have soaked into their bones.
Maybe there's something like a "y'all should put more effort into collation and summary of your points if you want people to engage" point that falls out of this? Or something about "have y'all created an in-group, and to what extent is that intentional / helpful-in-cases vs accidental?"
It seems - at least to me - like the argumentation around AI and alignment would be a good source of new beliefs, since I can't figure it all out on my own. People also seem to be figuring out new things fairly regularly.
Between those two things, I'm struggling to understand what it would be like to assert a static belief of "field X doesn't matter" in a way that is reasonably grounded in what is coming out of field X, particularly as field X evolves.
Like, if I believe that AI Alignment won't matter much and I use that to write off the field of AI Alignment, it feels like I'm either pre-emptively ignoring potentially relevant information, or I'm making a claim that I have some larger grounded insights into how the field is confused.
I get that we're all bounded and don't have the time or energy or inclination to engage with every field and every argument within those fields. If the claim was something like "I don't see AI alignment as a personal priority to invest my time/energy in" that feels completely fine to me - I think I would have nodded and kept scrolling rather than writing something.
Worrying about where other people were spending their energy is also fine! If it were me, I'd want to be confident I was most informed about something they'd all missed; otherwise I'd be in a failure mode I sometimes get into, where I'm on a not-so-well-grounded hamster wheel of worrying.
I guess I'm trying to tease apart the cases where you are saying "I have a belief that I'm not willing to spend time/energy to update" vs "I also believe that no updates are coming and so I'm locking in my current view based on that meta-belief".
I'm also curious!
If you've seen something that would tip my evidential scales the whole way to "the field is built on sketchy foundations, with probability that balances out the expected value of doom if AI alignment is actually a problem", then I'd really like to know! Although I haven't seen anything like that yet.
And I'm also curious about what prongs I might be missing around the "people following their expected values to prevent P(doom) look like folks who were upset about nothing in the timelines where we all survived to be having after-the-fact discussions about them" ;)
Meta: I might be reading some of the question incorrectly, but my impression is that it lumps "outside views about technology progress and hype cycles" together with "outside views about things people get doom-y about".
If it is about "people being doom-y" about things, then I think we are playing more in the realm of things where getting it right on the first try or first few tries matters.
Expected values seem relevant here. If people think there is a 1% chance of a really bad outcome and try to steer against it, then even if they are correct you are going to see 99 people pointing at things that didn't turn out to be a big deal for every 100 times this comes up. And if that 1 other person actually stopped something bad from happening, we're much less likely to remember the time that "a bad thing failed to happen because it was stopped a few causal steps early".
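To make that asymmetry concrete, here's a toy simulation. The 1% figure is the one from above; the prevention rate and the independence of the scares are assumptions I've made up purely for illustration, not claims about the real numbers:

```python
import random

# Toy model - these numbers are illustrative assumptions, not estimates.
N_SCARES = 100_000
P_REAL = 0.01        # assumed chance a given scare points at a real danger
P_PREVENTED = 0.9    # assumed chance that worried people steering averts a real danger

random.seed(0)
looked_wrong = 0     # scares where, in hindsight, nothing visibly bad happened
quiet_saves = 0      # real dangers averted a few causal steps early

for _ in range(N_SCARES):
    real = random.random() < P_REAL
    if not real:
        looked_wrong += 1
    elif random.random() < P_PREVENTED:
        # a real danger was stopped, so from the outside it looks like a false alarm
        looked_wrong += 1
        quiet_saves += 1

print(f"scares that look like false alarms in hindsight: {looked_wrong / N_SCARES:.3f}")
print(f"of which were actually quiet saves:              {quiet_saves / N_SCARES:.3f}")
```

Under those made-up numbers, roughly 99.9% of scares look like "panic over nothing" after the fact, even though a slice of them were quiet saves that are indistinguishable from false alarms from the outside.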
There also seems to be a thing there where the doom-y folks are part of the dynamic equilibrium. My mind goes to nuclear proliferation and climate change.
Folks got really worried about us all dying in a global nuclear war, and that hasn't happened yet, and so we might be tempted to conclude that the people who were worried were just panicking and were wrong. It seems likely to me that some part of the reason we didn't all die in a global nuclear war was that people were worried enough about it to collectively push over some unknowable-in-advance line, where that led to enough coordination to at least stop things going terminally bad on short notice. Even then, we've still had wobbles.
If the general response to the doom-y folks back then had been "Nah, it'll be fine", delivered with enough skill / volume / force to cause people to stop waving their warning flags and generally stop trying to do things, my guess is that we might have had much worse outcomes.
It looks like there might be an Omicron variant which doesn't have the S gene dropout [1]. I'm wondering how that might impact various modelling efforts, but haven't had time to think it through.
[1] https://www.abc.net.au/news/2021-12-08/qld-coronavirus-covid-omicron-variant/100682280
I’ve read that OpenAI and DeepMind are hiring for multi-agent reasoning teams. I can imagine that gives another source of scaling.
I figure things like Amdahl’s law / communication overhead impose some limits there, but MCTS could probably find useful ways to divide the reasoning work and have the agents communicating at least at human level efficiency.
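As a rough intuition pump for how quickly communication/serial overhead caps those gains, here's a minimal Amdahl's law sketch. The 90% parallel fraction is a number I've picked out of the air; I have no idea what the real figure for divided-up reasoning would be:

```python
# Minimal Amdahl's law sketch - the parallel fraction is an assumption, not a measurement.
def amdahl_speedup(parallel_fraction: float, n_agents: int) -> float:
    """Best-case speedup when only `parallel_fraction` of the reasoning
    can actually be split across n_agents; the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_agents)

for n in (2, 4, 16, 64, 1024):
    print(f"{n:5d} agents -> at most {amdahl_speedup(0.9, n):.1f}x")
```

Even with perfectly efficient communication between the agents, the gains flatten out fast once the un-parallelizable slice of the work dominates - at a 90% parallel fraction you never get past a 10x speedup, no matter how many agents you add.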