
Nate and I discuss the recent increase in public and political attention on AGI ruin. We try and figure out where, if anywhere, we feel extra hope as a result.

I think it's plausible that this is enough of a shake-up that the world starts relating to AGI in a pretty different way, which might help a lot with AI risk going better. Nate is skeptical that a shake-up causes something that is this far from working to start working.

Relatedly, I also think that the extra time might be enough for a bunch of other game-board flipping plans (like cognitively enhancing humans or human uploads) to make significant progress. While Nate agrees that's relatively hopeful, he's pessimistic that humanity will take much advantage of longer timelines. For example, he's pessimistic that this will cause that many additional people to work on these things.

We eventually decided to keep parts of this conversation private. In one or two places that had cuts, the conversational flow is less smooth.

Oliver's & Nate's reactions to public attention on AGI

habryka

I guess my current feeling here is something like: "I do feel like the world is grappling with AGI a bunch more (though not necessarily better) than I was expecting". Like, I didn't expect AGI risk to be kind of a major topic in a presidential debate, and now think it's somewhat likely to happen.

I have really huge amounts of uncertainty on how likely it is for that attention to convert into good things.

I personally have a pretty jaded, somewhat cynical feeling that's making me pay a lot of attention to "OK, but is anyone actually committing to anything real here?" and "is there a chance for this whole thing to reveal itself as smoke and mirrors?". Like, I remember a lot of excitement around the time Elon Musk started paying a lot of attention to AGI Risk stuff, and I think that foray turned out to be a bad surprise instead of a good one. And I feel similarly about the whole FTX situation.

When I look at the most recent government AI moves, I definitely notice similar hesitations. Like, a lot of measurement institutions are being set up, a lot of things are getting labeled with "AI Safety" or "AI Risk", but nothing actually concrete has happened. I don't expect the labs to scale slower or really change much about what they are doing in the next year. Though I do think there are ways for the things proposed to translate into them doing things quite differently 2-3 years from now.

habryka

I also am having a lot of feelings related to Simulacra levels stuff. In some sense, my ability to concretely point to what the AI Alignment problem is feels like it's gotten better in the past 6 months, with the Overton window shifting and so on.

But the prior sure is that as governments and prestigious academic institutions and lots of other players get involved, my ability to communicate about which things are really about a thing, vs. which things are just trying to be associated with a thing, or are actually trying to destroy the ability for people to talk about the thing in strategic ways, gets a lot worse.

And I feel a pretty intense fear around that when I watch the UK AI Safety Summit videos and tweets. In particular, there was a huge amount of "consensus building" or something which I felt pretty worried about, where the comms felt like it involved people trying to form a coalition with a party line or something where disagreeing with it or poking at it would "break the peace", and I noticed I wanted, for my own sanity, to be pretty far away from that (though I have uncertainty about the degree to which that's, at least at the current margin, just a necessary step for humanity to coordinate on anything).

So8res

not-necessarily-comprehensible attempts to name some of my own unordered thoughts here:

  • this isn't what it looks like when people take things actually-seriously ("here's when we'll pause" → "here's what we'd consider to justify scaling" etc.)
  • maybe taking it unseriously is a step on the path towards taking it seriously though (uranium committee → manhattan project etc.)
  • +1 to the analogy to lots of people being excited about elon getting involved back in ~2015, could chew on my own experiences from being around then
  • +1 to the "lots of things are getting labeled 'AI safety'" stuff; could lament about the death of the word "alignment"

i guess my own sense of tension or uncertainty here is mostly around something like "is this a step on the path towards earth waking up" vs "is this just earth derping but now with more of our vocabulary on its lips".

...i also have some thoughts about "the lines people hope are clear keep being blurry" that feel related, which feels relevant to the "is this a step on the path to taking things seriously", but that's maybe a tangent.

(posting this w/out reading your post above, as they were written simultaneously; will go read that momentarily.)

So8res

to your later paragraphs: i also have some worry about people building narratives around their political alliances and committing themselves to plans that just don't work for confronting the problem at hand.

habryka

So, I think my reaction to "the path towards Earth waking up" is kind of grounded in a thing that Scott Alexander said the other day: 

This is, of course, nonsense. We designed our society for excellence at strangling innovation. Now we’ve encountered a problem that can only be solved by a plucky coalition of obstructionists, overactive regulators, anti-tech zealots, socialists, and people who hate everything new on general principle. It’s like one of those movies where Shaq stumbles into a situation where you can only save the world by playing basketball. Denying 21st century American society the chance to fulfill its telos would be more than an existential risk - it would be a travesty.

Like, it's not clear to me how much humanity needs to "wake up" to prevent people from building dangerous AGI. The semiconductor supply chain is a massive outlier in the degree to which it seems like it's continuously getting better and cheaper. Lots of things are getting harder and more expensive. Maybe all we need to do is to just get humanity to do the same thing with AI and semiconductor manufacturing as it has with many (if not most) other forms of industrial and intellectual progress, and then we have a bunch of time on our hands.

And I am not like totally confident what I would do with that time. I do think probably making smarter humans is one of the big things, though that also suffers from the fate I am talking about above. But on a fair battlefield where both face substantial regulatory hurdles, making intelligence-enhanced humans might actually win out in terms of timelines, especially since I think it might end up being feasible to do unilaterally before you can deploy dangerous AI systems (assuming a similarly harsh crackdown on AGI as we've had on genetic engineering).

So8res

yeah i don't think i'd be shocked by getting some sort of pause or bought-time by way of accidental regulatory burden.

i'm not terribly hopeful about it, b/c my read is that it takes decades (and sometimes a disaster or two) for a regulatory body like the FDA or whatever-the-nuclear-regulators-are-called to amp up from "requires paperwork" to "effectively illegal", but i could sorta see it.

and yeah my main issue from there is that i don't see it turning into success, without some other plan. which you already touch upon. so, i'm not sure we have all that much disagreement here.

So8res

though i do still have some sense of... trouble visualizing earth actually turning away from the de-novo AI route at its current intelligence level, and aiming for uploads or cognitive augmentation or whatever instead, without some sort of mood-shift (maybe in the public, maybe among the decisionmaking elite, i dunno)?

it feels outside-possible that it somehow happens due to some people being driven to push that way while the rest of the world gets slowed down by half-intentional poorly-aimed but wantonly-destructive regulatory burdens, but i mostly don't expect things to play out that nicely.

it feels almost like wishing that biolabs get shut down by the FDA because they're pharmaceutical-adjacent or something. like it's funny to say "earth is very good at regulating things into oblivion" but in fact earth does that for drugs and not gain-of-function research; it's a derpy place.

habryka

Yeah, OK, the gain-of-function research point is kind of interesting. 

I guess my first reaction is: Surely gain-of-function research must still be slowed down by quite a large amount. Like, I don't know of any massively large organizations doing tons of GoF research. It's true it doesn't get clamped down on more than the other stuff, which you would expect a competent earth to totally do, but I do expect it is still probably being slowed down a lot.

So8res

"AI is slowed down somewhat by paperwork requirements" does feel pretty plausible to me, fwiw.

So8res

(tho maybe not in absolute terms; there's the falling cost of compute making it easier for new players to enter, as a force opposing the rising cost of paperwork, etc.)

habryka

for a regulatory body like the FDA or whatever-the-nuclear-regulators-are-called to amp up from "requires paperwork" to "effectively illegal", but i could sorta see it.

I... feel like I agree with this in the outside view, but also, my inside-view does really make me think that the current policy movements do not look like the kind of thing that then requires decades to translate into "effectively illegal". 

habryka

Like, somehow this doesn't match how I expect FDA stuff to have happened, or something. It does feel more like, if things happen, I expect them to happen reasonably soon, though, to be clear. I really don't have much robustness in my models here and am really just doing some kind of very sketchy blindsight.

So8res

maybe? it seems plausible to me that we get "you need paperwork to use large amounts of compute" which then gets completely folded by the next transformers-esque insight that just wrecks the scaling curve, and then your laws are just not actually slowing newcomers down.

habryka

So, I agree that that's a substantial chunk of the probability mass, but I do also think that in many worlds the capability evals-based stuff does also catch the stuff after the next insight.

So8res

tbc, it's quite plausible to me that current regs lead to a "your corporation must be yea tall to ride" effect, that winds up slowing lots of things down. i'm mostly uncertain, and i'm mostly not expecting it to do anything (because there aren't processes running now, that i know of, that need just an extra 10y to complete and then we're fine), at least absent some sort of wider "people waking up to the reality of the problem" thingy.

So8res

(i agree that regs based more on capability evals than on compute thresholds are somewhat more robust.)

habryka

So, there is a thing I feel in this space that somehow seems a bit weak in the context of this conversation, but I do have some intuition that's something like: "look, I really want to give the rest of humanity a shot at trying to solve this problem".

And maybe that's super wishful thinking, but I have a sense that my social group and I have been trying to do things here, and we haven't been super successful, but now the problem is a lot more obvious, and if I buy an additional 10 years of time, I do feel more hope than at any previous point in time that someone will surprise me with their ingenuity, and find traction here that I failed to find, and I feel like one of the key things I can do here is to just buy time for those people to also take their shot at the problem.

So8res

oh yeah i think it's well worth buying an extra 10y in hopes that someone can do something with them

So8res

(i don't have a ton of hope in it myself b/c it doesn't exactly look to me like there's people running around doing stuff that looks more likely to work than the stuff i've watched fail for the last 10 years; i've lived ten years of people trying and i've looked at a whole lot of proposals that look obviously-doomed-to-me, but sure, 10 years is better than 0!)

So8res

i think i do have some sense of "there are people running around thinking that i should be much more optimistic than i was last year, and maybe that's a good disagreement to splay out for people" but plausibly this isn't the place for that.

habryka

Hmm, to be clear, I do feel a lot more optimistic than I was 9 months ago (exactly one year ago I was more optimistic than now, but that's a bit FTX related).

My updates here are on a log scale, so it's not like huge swaths of probability were moved, but yeah, I do feel like I see 10 years on the table that I didn't really see before.

So8res

yeah, sure, my state here is something like: most of my hope has been coming from "maybe earth just isn't as it seems to me in some deep way such that all my errors are correlated", and that's still true, and then within the hypothesis where i basically understand what's going on, we've moved up a notch on the logistic success curve, increasing our odds by, i dunno, a factor of two, from ~0% to ~0%, on account of the fact that the governments are giving more lip-service to the issue at this stage than i was advance-predicting.

habryka

OK, I do think that I have more probability mass on there being a bunch of game-board flipping moves that feel enabled with a regulatory approach that slows things down, though I will admit that my plans here are sketchily devoid of detail

So8res

that does sound kinda interesting to explore (perhaps later). like, maybe there's some argument here of "is oli making an error in having lots of hope tied up in sketchy detailless gameboard-flip plans, or is nate making an error in being overly cynical in anything he can't foresee in detail?". not entirely sure that that'd be a productive convo, but might be fun to explore it a bit, i do feel like i... haven't really poked at this from certain angles.

habryka

Yeah, I am into it.

(At this point, the dialogue was paused, but resumed a few days later.)


Would 10 years help?

habryka

I feel most excited about the "ok, but are there any game-board flipping plans that you can get with 10 years of additional time, and also how do you think about this stuff?" thread

Especially at the intersection of having access to like decently strong AI, without something that kills you

So8res

so a big part of my sense here is something like: i've been here 10y, and have seen what 10y can do, and i'm not terribly impressed

i think earth can do a lot more in 10y if it's really trying, but... well, i think we can do a lot more with 10y and a big ol' global mood shift, than with just 10y

(also, whenever someone is like "and imagine we have decently strong AI", i basically brace myself for the shell game where it's weak enough to not kill them on monday, and strong enough to help them on tuesday, and pay no heed to the fact that the things they ask on tuesday are post-lethal. but i'll note that and avoid complaining about it until and unless i believe myself to be observing it :-p)

habryka

(Yeah, I do think that is totally one of the things often going on in that conversation. I'll try to avoid it.)

So8res

(part of which is to say that i'm basically ignoring the "decently-strong AI" bit for now)

habryka

So, maybe let's be a bit more clear about timelines, since maybe that's where all of our differences lie? I feel like without a ton of regulatory intervention my median is currently something in the 10-year range, and still has like 25% on 20 years. And then if you add +10 on top of that via good coordination, I feel like maybe we have enough time to get into a safer situation.

So8res

well, i haven't yet been in this field for 20y and so that's a bit more time... though i think i'm mostly rating it as more time in which a big ol' mood-shift can happen. which is maybe what you're alluding to by "good coordination"?

i sure think it's much easier to do all the work that could in principle save us once the whole planet is more-or-less on board; i'm just not expecting that to happen until there's no time left.

habryka

Well, as I mentioned in the last part of the conversation, I think there are a bunch of years on the table that look more like "humanity just drags semiconductor manufacturing and AI into the same molasses as it has a ton of other things", which doesn't really require a big ol' mood shift, I think?

So8res

sure, and then the next 10y go like the last 10, and nothing really happens alignment-wise and we die

...do you think there are ongoing alignment projects that, at roughly current levels of investment, succeed? or are you expecting a bunch more investment and for that to somehow go towards stuff that can succeed inside a decade? or...?

habryka

Even if humanity isn't like, having a huge mood shift, I do still expect the next 10 years to have a lot more people working on stuff that actually helps than the previous 10 years. 

I mean, idk, I also have a lot of cynicism here, but I feel like the case for AI risk is just so much easier now, and it's obviously getting traction in ways it hasn't, and so I feel hesitant to extrapolate out from the last 10 years.

So8res

feels a bit to me like "too little too late" / if you're only able to pick up these people in this way at this time then all you get is interpretability and evals and shallow arguments from superficial properties. or, that's not literally all you get, but... it's telling that you and i are here eyeballing alternative tech trees, and the people out there starting to wake up to AI alignment issues are nowhere near that place.

like, can you get a whole bunch of new people showing up at NIST to workshop evals ideas that are running well behind stuff that ARC thought up in their first afternoon (by dint of being staffed by people who'd been chewing on these issues for a long time)? sure, but...

...to get the sort of insights that change the strategic situation, it looks to me like we need either a lot more time or a pretty different cultural outlook on the situation ("a big ol' mood shift").

(At this point, the conversation went down a track that we didn't publish. It resumes on a new thread below.)

So8res

maybe there's a claim that i should be much more excited about things like the recent executive order, b/c they indicate that we might herp and derp our way into having an extra decade or two relative to what i was perhaps previously imagining when i imagined the gov'ts not to notice at all?

So8res

i... am not entirely sure that that'd be fair to past-me, past-me wasn't like "governments will pay the issue literally zero lip-service", past-me was like "within my space of possibilities are cases where the gov't pays no attention, and cases where it's like the uranium committee but not the manhattan project, and cases where the gov'ts launch into full-on races, and ...".

So8res

speaking in terms of historical analogs, here, it does seem to me like we are decidedly not in the world where gov'ts are taking this stuff seriously (eg, sunak being like "and in the most extreme and improbable cases" or w/e in his opening speech, and foregoing any talk of the real issues in his closing speech, and nobody putting any real constraints on labs yet, and things seeming on-track for relatively superficial interventions), and the hope is mostly coming from "sometimes the uranium committee [incompetent, bumbling, time-wasting] turns into the manhattan project [efficiently doing what it set out to do]"

with ofc caveats that if earth govt's do as well at listening to their scientists as the usg did in the manhattan project, ~all value gets lost, etc.

...and maybe i should stop guessing at your arguments and arguing against ppl who aren't you :-p

habryka

Ok, I feel like my reactions here are: 

  • Yeah, I do think the extra decade on the table here is huge, and I really want to make the best of it
  • I agree with you (from earlier) that some big global mood shift sure would help a lot, and I don't assign much probability to that, but enough to make me very curious about it
  • I agree that current governments clearly aren't taking it seriously, but also what we've seen so far is compatible with them taking it actually seriously soon, and a lot more compatible with that than what I expected to see

So8res

agreed on "what we've seen so far is compatible with them taking it actually seriously soon". i'm not at this juncture super expecting that, at least not to my standards (i think many people would say "look how seriously gov'ts are taking this!" today, and i reserve the right to think i see multiple levels more seriously that things need to get taken even if the apparent seriousness increases from here), but i'd agree that the gov't response over the past year is taking the sort of first steps that sometimes actually develop into more steps.

(they ofc also sometimes historically develop into a bunch of derpery; i haven't really tried to nail down my probabilities here but i'd be like, a little but not terribly surprised if gov'ts started taking things nate!seriously in the next 3y)

(also agreed that it's worth trying to take advantage of a spare decade, though i note that i'm much less excited about +1 decade compared to +2-5 b/c i think that the things that actually seem to me to have a chance at this point take more like 3-5 decades. with also a discount for "last decade sure hasn't turned up much and i kinda worry that this is hopium" etc.)

habryka

Ok, I think I do just want to go into the "but man, a world with near-AGI does seem like a lot more game-board-flippable than otherwise" direction. I agree that it's fraught because there is a frequent bait-and-switch here, but I also think it's a huge component of where my attention is going, and also where I feel like it's really hard to be concrete in the ways that I feel like your epistemology demands, but I still want to try.

So8res

i'm down for some rounds of "under which cup does nate think the shell went"

habryka

Lol, sure :P

Gameboard-flipping opportunities

(At this point habryka started typing 5-6 times but kept deleting what he wrote.)

habryka

Ok, I notice myself seizing up because I expect you to have some standard of concreteness that's hard for me to meet when it comes to predicting what I will want to do if I am better and smarter and have access to systems that make me better and smarter. 

So I'll say some stuff, and I agree that they'll sound kind of dumb to you, but I currently claim that them sounding dumb isn't actually that correlated to them being false, but idk, we'll see.

So8res

(where "you are better and smarter" b/c we're imagining someone else steering? b/c we're imagining successful cognitive augmentation? b/c we're imagining uploads running the ship?)

hooray for saying stuff that sounds kind of dumb to me

another place i expect to be annoying: vague "systems that make me better and smarter" cause me to raise skeptical eyebrows; paper makes me better and smarter and FAIs make me better and smarter and you can kinda fit quite a lot of things into the technological gaps between "paper" and "FAI"

(i don't mean to dissuade you from naming your nate!dumb-sounding proposals, just to note that i expect also to ask concreteness of what's making you better/smarter how, rather than taking "ambient better-smarterness" as background)

habryka

I mean, I do guess I kind of feel that for those technologies as well? Like, if you asked me to be super concrete about how the internet or programming ability or Google Docs enables me to do different things to achieve my goals, I would have a really hard time.

Let me just brainstorm some game-board flipping things: 

  • Make human uploads go faster. Maybe do it via something closer to neural activity distillation instead of literal neural simulation, which seems more in-reach.
  • Use AI to coordinate not making more AI. 
    • This actually currently feels like one of the best things for me to try, though man do I not really understand the dynamics that enable coordination here.
  • A bunch of other cognitive enhancement type things. Maybe genetic, maybe somatic.

So8res

(the first has long been on my list of stuff worth trying; the second i don't even really understand yet: are you trying to... persuade politicians? build lie-detectors that can be used to help labs trust each other? build monitoring devices that notice treaty-violating AI projects?)

(the reason i don't have much hope in the first is that as far as i've been able to tell, the problems divide into "the AI speedups aren't speeding up the places where most of the time is spent" or "the requisite capabilities are past the window where you're likely dead". but predicting which capabilities come in which order, or come bundled with what, is hard! this uncertainty is one reason it's been on my list a while)

habryka

Some ideas for the coordination things (very brainstormy):  

  • Use sandboxed AI systems to coordinate slowdown between different labs (Age of Em style of spinning up em copies to negotiate a contract with confidential information, then terminating)
  • Making it so that politicians and other stakeholders aren't terribly misinformed about AI and its consequences. Genuinely making it easier to identify people who are right about things vs. wrong about things.
  • General centralization of power as a result of a small number of players having access to frontier systems, and then that makes the coordination problem easier.
  • Better aggregation of population-wide preferences in a somewhat legible way making policy responses generally faster (this can be both good and bad, but seems like it changes the game a bunch)
  • I don't know, I guess just like billions of virtual people and AI girlfriends and boyfriends with some ideological biases resulting in some crazy stuff that feels like it has a decent chance of resulting in a very different coordination landscape, and my guess is overall makes things easier to coordinate (though again high-variance). But like, if a few players control the biases/prompting/disposition of these systems, that sure enables a very different level of global coordination.

So8res

i... have gained no sparks of hope by reading that, and am not sure if i was supposed to, and am not sure if you like want critique on any of those or something

my default is to be like "i agree with a general sense that shit is likely to get weird, but don't particularly see hope there"

Is AGI recklessness fragile or unusual?

habryka

Ok, I guess I feel like the current situation where humanity is just kind of racing towards AGI is weird. Like, the risks don't seem that hard to understand for individuals, and the benefits of slowing down seem pretty obvious. I can understand mechanistically why we are doing it right now, given all the incentives involved, but I don't see a law that a large number of hypothetical civilizations that are structured very differently would do the same thing. And I think a lot of the "shit goes crazy" worlds feel like a substantial reroll on that dimension, and on priors I expect AI stuff to move us into a saner reroll instead of an insane reroll.

So8res

(example rejoinders: we already tried the "there are few culturally-similar people with aligned real incentives in the same room to coordinate", in puerto rico in '15, and that doesn't work; people could already be making polite-but-factual chatbots as hard as they can to flood twitter with sanity, kindness, and correctness and in fact they kinda aren't and probably won't start; ... i dunno it mostly just parses to me as wishful rather than real /shrug)

i... think the risks kinda are that hard to understand for individuals. like, cf elon musk and "we'll make it care about truth and it'll keep us around b/c we're good sources of truth", given how involved he's been for how long, and his general capacity to understand things. i don't like this fact about earth, but i feel like it's been kinda beaten into me at this point that people don't find this stuff intuitive, they kinda do need a bunch of remedial metaethics and other skills to even get to the point where it seems obvious.

habryka

I think I dispute the Elon Musk datapoint. I don't know, he just kind of says random things he clearly doesn't believe all the time, and I don't really know how to read that (and yeah, people saying random things they clearly don't believe is a problem for global coordination, but I currently think it doesn't actually affect people's behavior that much)

So8res

i'm pretty sure he's bought into the "we'll make it value truth, and this'll go fine (e.g. b/c minds that pursue truth are pursuing virtuous higher ideals and are likely virtuous minds, and b/c humans are a source of truths that it would then prefer not to destroy)" thing.

habryka

i'm pretty sure he's bought into the "we'll make it value truth, and this'll go fine (e.g. b/c minds that pursue truth are pursuing virtuous higher ideals and are likely virtuous minds, and b/c humans are a source of truths that it would then prefer not to destroy)" thing.

In as much as we can find an operationalization I would take a bet against that, though does seem hard. I guess I'll predict he'll do a bunch of stuff that's pretty obviously in-conflict with that and is much more explained by simple local financial and status incentives, and that he'll have dropped it as a thing within 3-4 years.

So8res

it sure seems plausible to me that he drops it as a thing within 3-4y tbc, that doesn't contradict my claim about the AI arguments being hard to get afaict

So8res

i buy that we might be headed for a reroll but... i guess i'm reminded of the time when, in 2012 or so, i was like "wow there sure is a lot of pressure in the american electorate; it doesn't seem to me that the 2012 party-lines can hold, this is kinda exciting b/c maybe what'll come from all this electoral pressure is a realignment of the political parties where the left goes neoliberal and the right goes libertarian, and then maybe we'd have two parties i could stomach",

and... well the part i was wrong about wasn't that there was electoral pressure

habryka

Yep, reality often succeeds at surprising you in how badly things go. I am not here trying to argue you should have a p(doom) of 50% or whatever. But I am here being like "idk, these things seem exciting and like things to invest in, and I think 10 years matters, and yeah, we are at the bottom of some logistic success curve, but I don't think you get much better shots at digging yourself out of that hole than this kind of stuff"

So8res

my point is not so much "reality often surprises you in how badly things go" as "if things are not almost right, then perturbing them a bunch predictably does not perturb them into a right configuration"

So8res

like, i'd be a lot more sympathetic to "with a couple solid knocks, these people will snap into the configuration of doing the things that i think are sensible and right" if they were almost there

COVID didn't shock the world into doing right things about pandemics; this isn't just because the world sometimes negatively surprises you, it's b/c it wasn't close to begin with. like there are some worlds that come out of covid deciding to build vaccine infrastructure and do dry-runs and eliminate common colds for practice, and not all of those worlds went into COVID with that infrastructure in place, but they went into COVID with something much closer to that infrastructure.

or so i claim as an unobservable fact about the multiverse :-p

whereas here on Earth, all the effort that our friends spend drafting post-COVID pandemic preparedness bills goes to waste because nobody will even sponsor it on the House floor, etc.

habryka

I guess I feel like not building AI just... isn't that hard and we are maybe almost there?

Do people do reasonable things after encountering the AI Risk arguments?

So8res

i think my "actually ppl find these arguments pretty hard" point is pretty solid? i can point to scott aaronson too, etc.

habryka

I feel a bit confused on the Elon point, but my model of Scott Aaronson is totally in-favor of stopping AI progress for quite a while. My model of Elon also would like it to basically stop. 

I agree that people get really confused when talking about solutions, and I agree that this creates the opening of "people will be wrongly convinced they found a legitimate solution", but I guess I don't actually have any clear examples of people who understood the basic arguments and then weren't in favor of stopping. Sam Altman seems kind of like an obvious candidate, and I feel confused what he thinks, but my guess is he would press the stop button if he had it?

So8res

(to see how the "but he'll change his mind" thing isn't in tension, i have a sense that we could have had ~exactly this convo ~8y ago, with me being like "i dunno, elon seems bought in on "an AI in every household" and is about to start openai over that belief" and if you're like "betcha that belief changes in 8y" i would have been prescient to say "sure, to something like "we'll make it value truth"". which is how pessimistic and derpy you have to be to be calling the shots appropriately here on earth.)

(whereas it seems to me that in real life i was trying to say that 8y ago, and people were like "no he's a 4D chess genius" and i would like my bayes points, admittedly maybe not from you maybe i'm tilting at other people here in a parenthetical, whatever)

habryka

(8y ago my relationship to Elon Musk was the unglorious "oh man, I really hope this person is the adult in the room, maybe I can just get all the adults in the room and they can figure it out". That take did not age well and I do not deserve bayes points.)

So8res

so my guess is that scott aaronson would, if queried, say that RSPs are fine and that if you get some solid evals then we should go ahead and proceed? i'd be happy to be wrong about that

if folks like scott and sam and elon and dario were all saying aloud "oh god yes, the regulators should shut us down along with everybody else, we'd love time here", that would seem to me like a very different state of affairs than the current one

i'd be much more sympathetic to "earth is almost stopping, and perturbations could cause us to stop" in that sort of world

habryka

Yeah, my current model of Scott would say that he totally thinks that humanity should take a lot more time to develop AGI, and would press a pause button for a pretty long period of time.

So8res

well, he's queryable, and maybe payable-for-things. wanna offer him $50 to tell us his thoughts on AI pause?

habryka

Yeah, I'll do it. I was planning to ping him about some related stuff anyways.

So8res

neato. i'm happy to contribute cash (and willing to not transmit cash if he's a "wtf is this" sorta person)

(i'm not sure how much it updates me if scott is like "oh yeah earth should take a long pause here", i'd have to mull on it a bit)

habryka

Ok, so do you concretely predict that Sam and Dario and Elon will not say that they would please like to be shut down together with everyone else please?

habryka

I don't know, I am not like super confident of this, but I guess I am at like 35% that that happens within 2 years.

So8res

i expect them to be like "but china"

So8res

maybe some are like "if you can shut the whole world down and get one centralized project that can take its time, that'd be great", but i mostly expect at least one and probably more of the lab leaders to be derpier than this.

(also, tbc, my model is that if the AI labs shut down and agree about what conditions are required for continuing, then if we just follow their chosen route to AI, the outcome is catastrophically bad. and the point of discussion here is whether the stalling somehow buys time for something like uploading to pan out in parallel.)

ah, perhaps this is part of why i'm sorta skeptical of this whole line of analysis: it's not just that it seems to me that earth isn't that close to truly pausing (in the sense that if you shake things up a bit it might snap into that configuration), it's also that the thing it is somewhat close to is not a real pause that buys you real time, it's some sort of international AI speed-limit that doesn't really buy us all that much time nor really protect us.

(which is itself somewhat close to some international collaboration that doesn't really protect us either; we're multiple steps from the sort of stuff i expect could save us.)

like we're maybe somewhat close-ish to a "everybody Scales Responsibly™" brand "pause" but it seems to me that people are much further from recognizing how their "responsible scaling" plans are leaving us in much the same bind

habryka

To be clear, the crux we were discussing here was more "does it look like the people who read the arguments end up not in favor of 'just not building AGI'"?

So8res

my understanding of the conversational stack is that you were like "well maybe AI just shakes things up a bunch and that snaps the world into a configuration where it's trying not to do AGI so fast, without necessarily a big-unlikely-mood-shift" and i was like "i doubt that that's where perturbations land us b/c we don't seem close to the working-state" and that recursed into how close earth is to the working state.

habryka

Yeah, where the concrete crux for me at least was whether people who had been exposed to the arguments and acted in a slightly more informed manner end up not against building AGI

(And yeah, I think OpenAI and Elon rushing towards AGI is definitely evidence against that, but I also do think the coordination problems here are real, and in as much as people would still press a pause button, that's still a much better world)

So8res

...do you think that dario/elon/sam would be happy about "earth stops trying to build agi directly and goes for cognitive augmentation (or suchlike) first"?

habryka

I am currently like ~30% on that

So8res

my guess is that there's some conflation here between "will say (at least to someone they expect wants to hear it) that taking it slower in a globally-coordinated manner sounds good" and "thinks humanity would be better-served by a wholly different approach"

30% on the disjunction that any one of them thinks humanity would be better-served by a wholly different approach, in their heart of hearts?

habryka

30% that it would shake out so that after a lot of hemming and hawing, if you give them a button that does that (and it would be public if they pressed that button), they would all press it

So8res

hemming and hawing = arguing with you? or like they wake up with the button, knowing what it does and not doubting the hypothetical?

habryka

More like arguing with their employees and random people on Twitter. I don't expect to be that involved, though I do expect some other people I know to be reasonably involved (though I am not expecting them to pull like miracles here)

So8res

i'm probably a few bits lower than that, hard to say, but i also... sorta suspect, on my model of where that potential-belief comes from, that it's coming from some imagination where they actually have that as a very clear option, while also imagining that, in their heart-of-hearts, they know something that you and i believe to be the truth.

i'm skeptical that they believe what we believe strategically in their heart-of-hearts, and it also seems pretty relevant to me that they don't have this as a clean option. whereas they do have options to, like, jostle around for positioning in whatever entity gets nationalized

i wonder your odds that, like, if one of them merges with the DoD in a giant US nationalized project, they spend their political capital to be like "actually let's shut this down to give cognitive augmentation a chance to make the problem easier first" or like "lol nvm this is an uploading project now"

(this feels more relevant as a hypothetical to me)

habryka

I do expect them to be more like indefinite optimists. So I expect a higher likelihood of them shutting it down for a pretty vague "this is too dangerous" reason, rather than a "and instead we should do cognitive augmentation" reason.

Taking that as a given, in this specific hypothetical, I think I am like 30% that a randomly chosen person out of the three would do that (or more likely, they would be like "this is a national surveillance-to-stop-AI project now").

So8res

yeah i think i am less optimistic about these people than that

i think it's much more likely that they're like "and now we're going to proceed Carefully™, i'm glad it was me who was randomly selected b/c i'm the only careful one"

(or, not exactly like that, but something similarly depressing)

(which perhaps is part of a generalized crux around earth seeming significantly further from a sensible state, to me, as explains me having lower probability that mere perturbation will jostle earth into a sensible state)

...though 30% isn't that high; i'm not sure that if i imagine believing 30% then i imagine agreeing with you. but if i imagine that all these lab leaders were definitely the sort of people who currently-think that if they were in charge of the international AI consortium, they'd just shut it down b/c it's too dangerous... well then yeah i would think that the current situation was significantly more brittle.

Do humans do useful things on feeling terrified?

habryka

I think a part of my beliefs here is something like "I expect AI will be viscerally scary to lots of people as it gets more competent". 

So8res

so it's not like "these people would shut the consortium down today, if they were suddenly dictator of the consortium and guaranteed that there were no non-consortium AI projects (even if they shut the consortium down)" and more like "they would eventually see something that spooks them and shut down, before it's too late"?

habryka

I think I mildly conflated between these two in my mind. I think my probability for the first, for these three is still pretty substantial but closer to 20%, and my probability for the latter is more like 40%.

So8res

yeah i'm definitely lower on both by a decent margin (haven't really tried to pump out probabilities but, at least a bit)

So8res

a stray piece of model that'll maybe result in you dropping your probability of the latter (with the obvious caveat that i haven't even heard your reasons etc.): i have a model i've been chewing on recently-ish which goes something like: reality actually gives humans very few chances to notice scary/fucked-up/bad things, before it all becomes common-place and unnoticeable.

example: caroline ellison being like "but by then being at alameda had eroded my compunctions against theft and fraud" or suchlike; like my psychological model here is that the very first time you do the Bad Thing, you get a mental 'ping' of 'wait maybe this is bad', but maybe it's like a particularly early or borderline or small or dubious case, so you press 'ignore', and then another slightly bigger one happens again and you hit 'ignore' again, and then... that's it, your brain no longer throws up prompts, it turns out that the second time you hit 'ignore' on a brain, it hears "never show me this notification again" and then it doesn't.

i hypothesize that this is part of what's going on with terrible war crimes (though i think that a bunch of other things going on there are "very different cultures" and "everyone else was doing it" and "they're the terrible outgroup" and bloodlust and etc.),

i... am trying to think of other actual observations that substantiate the theory directly, and i'm reminded of a convo about sexual kinks i was privy to but don't feel comfortable sharing details on...

and anyway i basically expect this with the AI-related scares; i expect them to hit 'ignore' on the first two notifications and then not get the rest

habryka

Yeah, I think I share that model and have been thinking about a bunch of very similar things as well.

I think my model currently has an additional dynamic that is related, which is that if you press ignore a bunch of times, and then reality smashes you in the face with it having been wrong to ignore it, now you are in the paranoia regime where people overreact and burn everything with fire, though I feel confused about when this happens. It clearly happens with social dynamics (and it's a pattern I've observed in myself), and e.g. red scare stuff is an example of this, as well as nazi stuff.

So8res

nod, and perhaps also with companies, as visible in the bureaucratic scar-tissue (tho in companies i think there can be a weird effect where they get bitten hard by one thing, and then start developing unhelpful bureaucratic scar-tissue about everything in unhelpful ways. ...with the US TSA being a classic example at a national level i suppose).

habryka

Yep, the organizational scar-tissue thing definitely feels like it's pointing at a similar thing.

So8res

this basically sounds to me like the theory "maybe we'll get not only a warning shot, but enough warning shots that people decide to shut down and abandon ship (after centralizing things such that that matters)", which... i wouldn't rule out but i don't have like 20% on

habryka

I was more giving my two cents on my general model of ignoring scary/bad things. I think most of my hope for the relevant people is more "they are already actually convinced and terrified in the relevant ways".

So8res

...part of my skepticism here is that i'm just... having a really hard time coming up with cases where people are like "haha yeah nope nevermind" and then shut down a whole-ass technology. nuclear power i guess maybe kinda?

but that's a quite different case, right; it's a case of society first deciding to regulate a thing, and then private people plowing ahead and screwing some pooches, and regulators then being like "this is now de-facto banned (despite still being nominally legal)"

i still don't feel like i can recall any case of a team working on a project deciding to back down from it out of some sort of fear.

maybe i wouldn't've heard of biolabs being like "and then halfway through the smallpox synthesis and gain-of-function research, we decided to back down". i have instead heard only the stories of the BSL-4 leaks or whatever.

habryka

I think the nuclear thing is because nuclear weapons are viscerally terrifying.

So8res

i guess i maybe heard a rumor once of a team that started making reverse-chiral bacteria and was like "uhh on second thought that was a terrible plan"?

but tbc i have much higher probability on "earth puts up some derpy regulations, and then later after some bad shit happens those regulations get re-interpreted as a de-facto ban" than i have on "earth coordinates to have a single unified project, which then decides it's better to shut down".

habryka

Yeah, I also have substantially higher probability on the regulations -> ban situation. And I guess I feel kind of optimistic about people making scary demos that facilitate this happening. 

I guess I also kind of don't believe that scary lab demos can do it, but feel a bit confused about it (like, maybe people just actually have to get hurt for people to believe that things are bad)

Closing thoughts

habryka

I am quite interested in, in the future, digging more into the "I think maybe the people who have run into the arguments are just pretty deeply terrified in a way that will cause them to take reasonable action here" disagreement.

So8res

there we may disagree both about "deeply terrified" (nate suspects oli is typical-minding :-p) and about "would act reasonably" (nate again suspects typical-minding; elon sure was terrified back in the day but afaict mostly attached it to "demis bad" and then latched onto "openness" and etc.; nate thinks it takes... mental skills most lack, to reliably channel such fear productively) but... ok i guess i was trying to dive into those rather than just acknowledge them

So8res

spicy closing take that i'm slightly chewing on here, relevant to recent events, namely the SBF trial: i was struck in part by... not sure how well i'm gonna be able to articulate this, but i was struck in part by a sense of internal flexibility to sam, that allowed him to "not recall" a bunch of things rather than providing honest takes; that allowed him to do a bunch of wizard-lying... not quite sure what i'm trying to say here. (i do note that i used to think wizard-lying was ok and now don't think that, though i do think that lying to hide jews in your attic is ok and basically just subscribe to yudkowsky-style meta-honesty etc.), but...

what am i trying to say, something like: i suspect that a bunch of this hope that people are terrified in potentially productive ways is relying on imagining a type of internal fortitude and carefulness that most people just straight-up lack, and that's hard to cultivate.

no reply necessary; mostly that was me attempting to articulate a thought for my own sake, that it seemed somewhat valuable for me to attempt to force into words.

habryka

Interesting. I think I am somewhat failing to see the connection from before the triple-dot to after the triple-dot, but I think I am puzzling something together a bit

So8res

connection is something like: if you're running a big old ai project and start to suspect it's dangerous, there's all sorts of internal pressures to keep pushing forward because otherwise your past assurances that this was fine were false, and to double-down on your alignment ideas because otherwise you will have been wrong the whole time and otherwise your political fights are not fights you should have won and otherwise it turns out you've been endangering people the whole time or etc. etc.

habryka

Yeah, OK, that resonates with me a bunch, and connects with some related thoughts of things I would really like to fix about the current AI situation, and I might be able to

So8res

...well, godspeed. (i suspect you might need to fix things about earth's overall cultural situation, but, godspeed.)

habryka

Cool, thank you Nate. I enjoyed this. Hope you have a good night.

Comments
Akash

Thanks for this dialogue. I find Nate and Oliver's "here's what I think will actually happen" thoughts useful.

I also think I'd find it useful for Nate to spell out "conditional on good things happening, here's what I think the steps look like, and here's the kind of work that I think people should be doing right now. To be clear, I think this is all doomed, and I'm only saying this because Akash directly asked me to condition on worlds where things go well, so here's my best shot."

To be clear, I think some people do too much "play to your outs" reasoning. In excess, this can lead to people just being like "well maybe all we need to do is beat China" or "maybe alignment will be way easier than we feared" or "maybe we just need to bet on worlds where we get a fire alarm for AGI."

I'm particularly curious to see what happens if Nate tries to reason in this frame, especially since I expect his "play to your outs" reasoning/conclusions might look fairly different from that of others in the community.

Some examples of questions for Nate (and others who have written more about what they actually expect to happen and less about what happens if we condition on things going well):

  • Condition on the worlds in which we see substantial progress in the next 6 months. What are some things that have happened in those worlds? What does progress look like?
  • Condition on worlds in which the actions of the AIS community end up having a strong positive influence in the next 6 months. What are some wins that the AIS community (or specific actors within it) achieve?
  • Suppose for the sake of this conversation that you are fully adopting a "play to your outs" mentality. What outs do you see? Regardless of the absolute probabilities you assign, which of these outs seem most likely and most promising?
  • All things considered, what do you currently see as the most impactful ways you can spend your time?
  • All things considered, what do you currently see as the most impactful ways that "highly talented comms/governance/policy people can be spending their time?" (can divide into more specific subgroups if useful). 

I'll also note that I'd be open to having a dialogue about this with Nate (and possibly other "doomy" people who have not written up their "play to your outs" thoughts).

aysja

Even if humanity isn't like, having a huge mood shift, I do still expect the next 10 years to have a lot more people working on stuff that actually helps than the previous 10 years.

What kinds of things are you imagining, here? I'm worried that on the current margin people coming into safety will predominately go into interpretability/evals/etc because that's the professional/legible thing we have on offer, even though by my lights the rate of progress and the methods/aims/etc of these fields are not nearly enough to get us to alignment in ~10 years (in worlds where alignment is not trivially easy, which is also the world I suspect we're in). My own hope for another ten years is more like "that gives us some space and time to develop a proper science here," which at the current stage doesn't feel very bottlenecked by number of people. But I'm curious what your thoughts are on the "adding more people pushes us closer to alignment" question. 


The ousting of Sam Altman by a Board with 3 EA people could be the strongest public move so far.

On "lab leaders would choose to stop if given the coordination-guaranteed button" vs "big ol' global mood shift", I think the mood shift is way more likely (relatively) for two reasons.

One of these was argued about directionally, and I want to capture more crisply the way I see it; the other I didn't see mentioned and might be a helpful model to consider.

  1. The "inverse scaling law" for human intelligence vs rationality. "AI arguments are pretty hard" for "folks like scott and sam and elon and dario" because it's very easy for intelligent people to wade into the thing overconfidently and tie themselves into knots of rationalization (amplified by incentives, "commitment and consistency" as Nate mentioned re: SBF, etc). Whereas for most people (and this, afaict, was a big part of Eliezer's update on communicating AGI Ruin to a general audience), it's a straightforward "looks very dangerous, let's not do this."

  2. The "agency bias" (?): lab leaders et al think they can and should fix things. Not just point out problems, but save the day with positive action. ("I'm not going to oppose Big Oil, I'm going to build Tesla.") "I'm the smart, careful one, I have a plan (to make the current thing be ok-actually to be doing; to salvage it; to do the different probably-wrong thing, etc.)" Most people don't give themselves that "hero license" and even oppose others having it, which is one of those "almost always wrong but in this case right actually" things with AI.

So getting a vast number of humans to "big ol' global mood shift" into "let's stop those hubristically-agentic people from getting everyone killed which is obviously bad" seems more likely to me than getting the small number of the latter into "our plans suck actually, including mine and any I could still come up with, so we should stop."