All of Jeffrey Ladish's Comments + Replies

Just donated 2k. Thanks for all you’re doing Lightcone Team!

+1 on this, and also I think Anthropic should get some credit for not hyping things like Claude when they definitely could have (and I think received some tangible benefit from doing so).

See: https://www.lesswrong.com/posts/xhKr5KtvdJRssMeJ3/anthropic-s-core-views-on-ai-safety?commentId=9xe2j2Edy6zuzHvP9, and also some discussion between me and Oli about whether this was good / what parts of it were good.
 

5Rob Bensinger
Agreed!

@Daniel_Eth  asked me why I chose 1:1 offsets. The answer is that I did not have a principled reason for doing so, and do not think there's anything special about 1:1 offsets except that they're a decent Schelling point. I think any offsets are better than no offsets here. I don't feel like BOTECs of harm caused as a way to calculate offsets are likely to be particularly useful here, but I'd be interested in arguments to this effect if people have them. 

an agent will aim its capabilities towards its current goals including by reshaping itself and its context to make itself better-targeted at those goals, creating a virtuous cycle wherein increased capabilities lock in & robustify initial alignment, so long as that initial alignment was in a "basin of attraction", so to speak

Yeah, I think if you nail initial alignment and have a system that has developed the instrumental drive for goal-content integrity, you're in a really good position. That's what I mean by "getting alignment to generalize in a robus... (read more)

It seems nice to have these in one place but I'd love it if someone highlighted a top 10 or something.

Yeah, I agree with all of this, seems worth saying. Now to figure out the object level... 🤔

3Eli Tyre
That's the hard part. My guess is that training cutting edge models, and not releasing them is a pretty good play, or would have been, if there wasn't huge AGI hype.  As it is, information about your models is going to leak, and in most cases the fact that something is possible is most of the secret to reverse engineering it (note: this might be true in the regime of transformer models, but it might not be true for other tasks or sub-problems).  But on the other hand, given the hype, people are going to try to do the things that you're doing anyway, so maybe leaks about your capabilities don't make that much difference?  This does point out an important consideration, which is "how much information needs to leak from your lab to enable someone else to replicate your results?" It seems like, in many cases, there's an obvious way to do some task, and the mere fact that you succeeded is enough info to recreate your result. But presumably there are cases, where you figure out a clever trick, and even if the evidence of your model's performance leaks, that doesn't tell the world how to do it (though it does cause maybe hundreds of smart people to start looking for how you did it, trying to discover how to do it themselves). I think I should regard the situation differently depending on the status of that axis.

Yeah that last quote is pretty worrying. If the alignment team doesn't have the political capital / support of leadership within the org to have people stop doing particular projects or development pathways, I am even more pessimistic about OpenAI's trajectory. I hope that changes!

Yeah I think we should all be scared of the incentives here.

Yeah I think it can both be true that OpenAI felt more pressure to release products faster due to perceived competition risk from Anthropic, and also that Anthropic showed restraint in not trying to race them to get public demos or a product out. In terms of speeding up AI development, not building anything > building something and keeping it completely secret > building something that your competitors learn about > building something and generating public hype about it via demos > building something with hype and publicly releasing it to users... (read more)

Eli Tyre*119

In terms of speeding up AI development, not building anything > building something and keeping it completely secret > building something that your competitors learn about > building something and generating public hype about it via demos > building something with hype and publicly releasing it to users & customers.

I think it is very helpful, and healthy for the discourse, to make this distinction. I agree that many of these things might get lumped together.

But also, I want to flag the possibility that something can be very very bad to do, e... (read more)

Jeffrey LadishΩ193526

I both agree that the race dynamic is concerning (and would like to see Anthropic address them explicitly), and also think that Anthropic should get a fair bit of credit for not releasing Claude before ChatGPT, a thing they could have done and probably gained a lot of investment / hype over.  I think Anthropic's "let's not contribute to AI hype" strategy is good in the same way that OpenAI's "let's generate massive hype" strategy is bad.

Like definitely I'm worried about the incentive to stay competitive, especially in the product space. But I think it's worth ... (read more)

habryka214

I both agree that the race dynamic is concerning (and would like to see Anthropic address them explicitly), and also think that Anthropic should get a fair bit of credit for not releasing Claude before ChatGPT, a thing they could have done and probably gained a lot of investment / hype over.

I mean, didn't the capabilities of Claude leak specifically to OpenAI employees, so that it's pretty unclear that not releasing actually had much of an effect on preventing racing? My current best guess, though I am only like 30% on this hypothesis since there are many ... (read more)

RaemonΩ122219

Yeah I agree with this.

To be clear, I think Anthropic has done a pretty admirable job of showing some restraint here. It is objectively quite impressive. My wariness is "Man, I think the task here is really hard and even a very admirably executed company may not be sufficient." 

Thanks Buck, btw the second link was broken for me but this link works: https://cepr.org/voxeu/columns/ai-and-paperclip-problem Relevant section:

Computer scientists, however, believe that self-improvement will be recursive. In effect, to improve, an AI has to rewrite its code to become a new AI. That AI retains its single-minded goal but it will also need, to work efficiently, sub-goals. If the sub-goal is finding better ways to make paperclips, that is one matter. If, on the other hand, the goal is to acquire power, that is another.

The insight from econo

... (read more)
2Buck
Yeah, I agree copies are easier to work with; this is why I think that their situation is very analogous to new brain uploads.
2Viliam
This was actually used in a SF novel (don't click if you don't want to know which) to explain why the setting had a few superhuman AIs and yet they did not rule the world.

Yeah it seems possible that some AGI systems would be willing to risk value drift, or just not care that much. In theory you could have an agent that didn't care if its goals changed, right? Shoshannah pointed out to me recently that humans have a lot of variance in how much they care if their goals are changed. Some people are super opposed to wireheading, some think it would be great. So it's not obvious to me how much ML-based AGI systems of around human-level intelligence would care about this. Like maybe this kind of system converges pretty quickly to coherent goals, or maybe it's the kind of system that can get quite a bit more powerful than humans before converging; I don't know how to guess at that.

I think that would be a really good thing to have! I don't know if anything like that exists, but I would love to see one

I think the AI situation is pretty dire right now.  And at the same time, I feel pretty motivated to pull together and go out there and fight for a good world / galaxy / universe.

Nate Soares has a great post called "detach the grim-o-meter", where he recommends not feeling obligated to feel more grim when you realize the world is in deep trouble.

It turns out feeling grim isn't a very useful response, because your grim-o-meter is a tool evolved for you to use to respond to things being harder *in your local environment* rather than the global state of thin... (read more)

I sort of agree with this abstractly and disagree in practice. I think we're just very limited in what kinds of circumstances we can reasonably estimate / guess at. Even the above claim, "a big proportion of worlds where we survived, AGI probably gets delayed", is hard to reason about.

But I do kind of need to know the timescale I'm operating in when thinking about health and money and skill investments, etc., so I think you need to reason about it somehow.

3[anonymous]
It's like writing a clickbait title -- they add clutter and noise for no benefit, and I want to discourage them.

And then on top of that there are significant other risks from the transition to AI. Maybe a total of more like 40% total existential risk from AI this century? With extinction risk more like half of that, and more uncertain since I've thought less about it.

40% total existential risk, and extinction risk half of that? Does that mean the other half is some kind of existential catastrophe / bad values lock-in but where humans do survive?

6evhub
Fwiw, I would put non-extinction existential risk at ~80% of all existential risk from AI. So maybe my extinction numbers are actually not too different than Paul's (seems like we're both ~20% on extinction specifically).

This is a temporary short form, so I can link people to Scott Alexander's book review post. I'm putting it here because Substack is down, and I'll take it down / replace it with a Substack link once it's back up. (also it hasn't been archived by Waybackmachine yet, I checked)

The spice must flow.

Edit: It's back up, link: https://astralcodexten.substack.com/p/book-review-what-we-owe-the-future

2Vladimir_Nesov
It isn't down for me right now. It was already on archive.today and on Internet Archive.

Thanks for the reply!

I hope to write a longer response later, but wanted to address what might be my main criticism, the lack of clarity about how big of a deal it is to break your pledge, or how "ironclad" the pledge is intended to be.

I think the biggest easy improvement would be amending the FAQ (or preferably something called "pledge details" or similar) to present the default norms for pledge withdrawal. People could still choose different norms if they preferred, but it would make it clearer what people were agreeing to, and how strong the commitment was intended to be, without adding more text to the main pledge.

 

2lukefreeman
Thanks! Will have a think about how any existing language could be updated or what should be added.

The TL;DR version of my ideal is that people take it seriously enough, with enough foresight, that there is a small (5-20%) chance of withdrawal (it's best to keep promises), but not so seriously that they wouldn't take it (worried that there's some tiny chance they'll break it) or wouldn't resign if it were truly the best thing for the world (including not just their direct impact but also the impact on their own wellbeing and the impact their resignation/follow-through has on the norm).

I see a few problems with having default norms for withdrawal. It can (a) be hard to universalise; (b) provide licence for some; (c) devalue the efforts of others. For the purpose of looking at (a), (b) and (c), let's imagine we made an explicit default norm of developing a chronic health issue (something that I can imagine being a good reason to withdraw for some people after careful consideration of their exact circumstance):

(a) The types of chronic health issues can vary significantly in how much they'd change someone's ability to follow through; and the places in which someone lives (e.g. public/private healthcare) and their employment situation can also change that. For example, I've had chronic back pain and headaches/migraines since I was 14 years old, but I don't see that as a dealbreaker.

(c) We all fall prey to motivated reasoning, and pre-commitment is meant to help you avoid that to some extent. If chronic health issues were listed as a norm I might have looked at the pledge and thought "Oh, that's for healthy people, I'm all good, I should keep 100% of my money." Or say I took the pledge before my chronic pain started, and then at some point I reflected and thought "I guess I'll just stop giving because that's just for the healthy people."

(b) If some people with the same situation have worked through it then it can devalue their efforts and/or make th

I'm a little surprised that I don't see more discussion of ways that higher-bandwidth brain-computer interfaces might help, e.g. Neuralink or equivalent. It sounds difficult, but do people feel really confident it won't work? It seems like if it could work, it might be achievable on much faster timescales than superbabies.

Oh cool. I was thinking about writing some things about private non-ironclad commitments but this covers most of what I wanted to write. :) 

I cannot recommend this approach on the grounds of either integrity or safety 😅

9Elizabeth
it worked for Jon Snow, I don't know what your problem is. 

Yeah, I think it's somewhat boring without more. Solving the current problems seems very desirable to me, very good, and also really not complete / compelling / interesting. That's what I'm intending to try to get at in part II. I think it's the harder part.

Rot13: Ab vg'f Jbegu gur Pnaqyr

1UnderTruth
Rot13: V gubhtug vg jbhyq or Znaan ol Znefunyy Oenva

This could mitigate financial risk to the company but I don't think anyone will sell existential risk insurance, or that it would be effective if they did

I think that's a legit concern. One mitigating factor is that people who seem inclined to rash destructive plans tend to be pretty bad at execution, e.g. Aum Shinrikyo

Recently Eliezer has used the dying with dignity frame a lot outside his April 1st day post. So while some parts of that post may have been a joke, the dying with dignity part was not. For example: https://docs.google.com/document/d/11AY2jUu7X2wJj8cqdA_Ri78y2MU5LS0dT5QrhO2jhzQ/edit?usp=drivesdk

If you have specific examples where you think I took something too seriously that was meant to be a joke, I'd be curious to see those.

  1. Recently Eliezer has used the dying with dignity frame a lot outside his April 1st day post. So while some parts of that post may have been a joke, the dying with dignity part was not. For example: https://docs.google.com/document/d/11AY2jUu7X2wJj8cqdA_Ri78y2MU5LS0dT5QrhO2jhzQ/edit?usp=drivesdk

  2. I think you're right that dying with dignity is a better frame specifically for recommending against doing unethical stuff. I agree with everything he said about not doing unethical stuff, and tried to point to that (maybe if I have time I will add some more em

... (read more)
2johnlawrenceaspden
Not clear to me. Why not?
3Adam Zerner
That makes sense. And thank you for emphasizing this. I think both of our points stand. My point is about the title of this specific April Fools Day post. If it's gonna be an April Fools Day post, "playing to your outs" isn't very April Fools-y. And your point stands I think as well, if I'm interpreting you correctly, that he's chosen the messaging of "death with dignity" outside of the context of April Fools Day as well, in which case "it's an April Fools Day post" isn't part of the explanation. I hear ya for sure. I'm not sure what to think about how necessary it is either. The heuristic of "be more cynical about humans" comes to mind though, and I lean moderately strongly towards thinking it is a good idea.

Just a note on confidence, which seems especially important since I'm making a kind of normative claim:

I'm very confident "dying with dignity" is a counterproductive frame for me. I'm somewhat confident that "playing to your outs" is a really useful frame for me and people like me. I'm not very confident "playing to your outs" is a good replacement to "dying with dignity" in general, because I don't know how much people will respond to it like I do. Seeing people's comments here is helpful.

Vaniver*160

So, in my mind, the thing that "dying with dignity" is supposed to do is that when you look at plan A and B, you ask yourself: "which of these is more dignified?" instead of "which of these is less likely to lead to death?", because your ability to detect dignity is more sensitive than your ability to detect likelihood of leading to death on the present margin. [This is, I think, the crux; if you don't buy this then I agree the framing doesn't seem sensible.]

This lets you still do effective actions (that, in conjunction with lots of other things, can still... (read more)

"It also seems to encourage #3 (and again the vague admonishment to "not do that" doesn't seem that reassuring to me.)"

I just pointed to Eliezer's warning, which I thought was sufficient. I could write more about why I think it's not a good idea, but I currently think a bigger portion of the problem is people not trying to come up with good plans rather than people coming up with dangerous plans, which is why my emphasis is where it is.

Eliezer is great at red teaming people's plans. This is great for finding ways plans don't work, and I think it's very impor... (read more)

4Joe Collman
I largely agree with that, but I think there's an important asymmetry here: it's much easier to come up with a plan that will 'successfully' do huge damage, than to come up with a plan that will successfully solve the problem. So to have positive expected impact you need a high ratio of [people persuaded to come up with good plans] to [people persuaded that crazy dangerous plans are necessary]. I'd expect your post to push a large majority of readers in a positive direction (I think it does for me - particularly combined with Eliezer's take). My worry isn't that many go the other way, but that it doesn't take many.

I currently don't know of any outs. But I think I know some things that outs might require and am working on those, while hoping someone comes up with some good outs - and occasionally taking a stab at them myself.

I think the main problem is the first point and not the second point:

  • Do NOT assume that what you think is an out is certainly an out.
  • Do NOT assume that the potential outs you're aware of are a significant proportion of all outs.

The current problem, if Eliezer is right, is basically that we have 0 outs. Not that the ones we have might be less... (read more)

3Joe Collman
Oh sure - I don't mean to imply there's no upside in this framing, or that I don't see a downside in Eliezer's. However, whether you know of outs depends on what you see as an out. E.g. buying much more time to come up with a solution could be seen as an out by some people. It's easy to imagine many bad plans to do that, with potentially hugely negative side-effects. Some of those bad plans would look rational, conditional on an assumption that there was no other way to avoid losing the future. Of course making such an assumption is poor reasoning, but the trouble is that it happens implicitly: nobody needs to say to themselves "...and here I assume that no-one on earth has or will come up with approaches I've missed", they only need to fail to ask themselves the right questions. Conditional on being very clear on not knowing the outs, I think this framing may well be a good one for many people - but I'm serious about the mental exercise.

I agree finding your outs is very hard, but I don't think this is actually a different challenge than increasing "dignity". If you don't have a map to victory, then you probably lose. I expect that in most worlds where we win, some people figured out some outs and played to them.

I donated:

$100 to Zvi Mowshowitz for his post "Covid-19: My Current Model", but really for all his posts. I appreciated how Zvi kept posting Covid updates long after I had the energy to do my own research on this topic. I also appreciate how he called the Omicron wave pretty well.

$100 to Duncan Sabien for his post "CFAR Participant Handbook now available to all". I'm glad CFAR decided to make it public, both because I have been curious for a while what was in it and because in general I think it's pretty good practice for orgs like CFAR to publish more of what they do. So thanks for doing that!

I've edited the original to add "some" so it reads "I'm confident that some nuclear war planners have..."

It wouldn't surprise me if some nuclear war planners had dismissed these risks while others had thought them important.

I'm fairly confident that at least some nuclear war planners have thought deeply about the risks of climate change from nuclear war because I've talked to a researcher at RAND who basically told me as much, plus the group at Los Alamos who published papers about it, both of which seem like strong evidence that some nuclear war planners have taken it seriously. Reisner et al., "Climate Impact of a Regional Nuclear Weapons Exchange: An Improved Assessment Based On Detailed Source Calculations" is mostly Los Alamos scientists I believe.

Just because some of t... (read more)

Thanks this is helpful! I'd be very curious to see where Paul agreed / disagree with the summary / implications of his view here.

4Rob Bensinger
(I'll emphasize again, by the way, that this is a relative comparison of my model of Paul vs. Eliezer. If Paul and Eliezer's views on some topic are pretty close in absolute terms, the above might misleadingly suggest more disagreement than there in fact is.)

After reading these two Eliezer <> Paul discussions, I realize I'm confused about what the importance of their disagreement is.

It's very clear to me why Richard & Eliezer's disagreement is important. Alignment being extremely hard suggests AI companies should work a lot harder to avoid accidentally destroying the world, and suggests alignment researchers should be wary of easy-seeming alignment approaches.

But it seems like Paul & Eliezer basically agree about all of that. They disagree about... what the world looks like shortly before the end... (read more)

I would frame the question more as 'Is this question important for the entire chain of actions humanity needs to select in order to steer to good outcomes?', rather than 'Is there a specific thing Paul or Eliezer personally should do differently tomorrow if they update to the other's view?' (though the latter is an interesting question too).

Some implications of having a more Eliezer-ish view include:

  • In the Eliezer-world, humanity's task is more foresight-loaded. You don't get a long period of time in advance of AGI where the path to AGI is clear; nor do yo
... (read more)

Another way to run this would be to have a period of time before launches are possible for people to negotiate, and then to not allow retracting nukes after that point. And I think next time I would make it so that the total of no-nukes would be greater than the total if only one side nuked, though I did like this time that people had the option of a creative solution that "nuked" a side but led to higher EV for both parties than not nuking. 

1Idan Arye
You also need to only permit people who took part in the negotiations to launch nukes. Otherwise newcomers could just nuke without anyone having a chance to establish a precommitment to retaliate against them.

I think the fungibility is a good point, but it seems like the randomizer solution is strictly better than this. Otherwise one side clearly gets less value, even if they are better off than they would have been had the game not happened. It's still a mixed motive conflict!

I'm not sure that anyone exercised restraint in not responding to the last attack, as I don't have any evidence that anyone saw the last response. It's quite possible people did see it and didn't respond, but I have no way to know that.

Oh, I should have specified that I would consider the coin flip to be a cooperative solution! It seems obviously better to me than any other solution.

I think there are a lot of dynamics present here that aren't present in the classic prisoner's dilemma, and some dynamics that are present (and some that are present in various iterated prisoner's dilemmas). The prize might be different for different actors, since actors place different value on "cooperative" outcomes. If you can trust people's precommitments, I think there is a race to commit OR precommit to an action. 

E.g. if I wanted the game to settle with no nukes launched, then I could pre-commit to launching a retaliatory strike to either side if an attack was launched.

I sort of disagree. Not necessarily that it was the wrong choice to invest your security resources elsewhere--I think your threat model is approximately correct--but I disagree that it's wrong to invest in that part of your stack.

My argument here is that following best practices is a good principle, and that you can and should make exceptions sometimes, but Zack is right to point it out as a vulnerability. Security best practices exist to help you reduce attack surface without having to be aware of every attack vector. You might look at this instance and r... (read more)

6habryka
I do not know what the difference here is. Presumably one implies the other?
Taleuntum*220

Furthermore, it is also not inconceivable to me that an adversary might be able to use the hash itself without cracking it. For example, the sha256 hash of some information is commonly used to prove that someone has that information without revealing it. So an adversary, using the hash, could credibly lie that he already possesses a launch code and, in a possible counterfactual world where no one found out about the client side leaking the hash except this adversary, use this lie to acquire an actual code with some social engineering.

Like:

"Attention Lesswrong! ... (read more)
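The commitment trick Taleuntum describes can be sketched in a few lines of Python (a hypothetical illustration of hash commitments in general, not the site's actual scheme; the function names and the example code are mine):

```python
import hashlib

def commit(secret: str) -> str:
    # Publishing this hash claims knowledge of `secret` without revealing it.
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()

def verify(secret: str, commitment: str) -> bool:
    # Anyone can later check a revealed secret against the published hash.
    return commit(secret) == commitment

launch_code = "correct horse battery staple"  # hypothetical code
c = commit(launch_code)
assert verify(launch_code, c)
assert not verify("wrong guess", c)
```

This is also why a leaked hash is itself sensitive: whoever holds it can present it as "proof" of knowing the code, which is exactly the social-engineering angle described above.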

I agree that this is a correct application of security mindset; exposures like these can compound with, for example, someone's automatic search of the 100 most common ways to screw up secure random number generation such as by using the current time as a seed. Deep security is about reducing the amount of thinking you have to do and your exposure to wrong models and stuff you didn't think of.
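One of those classic screw-ups can be made concrete (a minimal sketch; the six-digit "code" format is an assumption for illustration, not anything from the actual site):

```python
import random
import secrets
import time

def insecure_code() -> str:
    # BAD: seeding a non-cryptographic PRNG with the current time.
    # An attacker who knows roughly when this ran can brute-force the
    # small seed space and reproduce the "random" output exactly.
    random.seed(int(time.time()))
    return f"{random.randrange(10**6):06d}"

def secure_code() -> str:
    # Better: an OS-backed cryptographic source with no guessable seed.
    return f"{secrets.randbelow(10**6):06d}"
```

A Mersenne Twister seeded with a timestamp is fully determined by that seed, so the output is only as unpredictable as the attacker's uncertainty about the clock.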

I don't think it's super clear, but I do think it's the clearest that we are likely to get that's more than 10% likely. I disagree that SARS could take 15 years, or at least I think that one could have been called within a year or two. My previous attempt to operationalize a bet had the bet resolve if, within two years, a mutually agreed upon third party updated to believe that there is >90% probability that an identified intermediate host or bat species was the origin point of the pandemic, and that this was not a lab escape. 

Now that I'm writing this ... (read more)

1Donald Gislason
Given that full transparency from Chinese authorities is unlikely, assessing the probabilities is the best we can do. Fortunately, that has been done with impressive scientific rigour by DRASTIC member Dr. Steven Quay MD, PhD in his technically detailed 193-page Bayesian analysis of 26 known facts about the outbreak: https://zenodo.org/record/4477081#.YNAFry0ZNE4 which he explains in layman's terms in his interview with Julius Killerby (cited in my comment above).

The advantage of this approach is that it follows the scientific method: laying out clearly its premises and calculations so that they can be challenged and tested by experts in the field. The evidence is so convincing that, along with his influential piece in the Wall Street Journal (co-authored by astrophysicist Richard Muller) https://www.wsj.com/articles/the-science-suggests-a-wuhan-lab-leak-11622995184 his Bayesian analysis -- made available to both the WHO and the Biden administration -- likely represents 'the writing on the wall' for public decision-makers. It was the 'nudge' indicating that keeping the story low-key was no longer an option, given the amount of technical expertise weighing in on the subject in public discussion.

In my view, given the dramatic quality of the statistical evidence, the Biden administration now finds itself the dog that caught the car. The three-month time period for a report from the intelligence community is likely only a breather to assess how to handle the truth of the matter politically with China, and no longer an attempt to establish what is actually true.
2ChristianKl
Having looked more into it, it's quite plausible that we will have confirmation that it's a lab leak in a few months or years. The US intelligence community is currently tasked with looking for evidence, and it's quite plausible that someone in China actually knows that it's a lab leak and the US intelligence community manages to intercept clearcut information that goes beyond the reduced cell phone traffic and possible road closures around the WIV in October 2019 and the 3 researchers from the WIV who went to the hospital with symptoms matching flu and COVID-19 in November 2019. 

It seems like an interesting hypothesis but I don't think it's particularly likely. I've never heard of other viruses becoming well adapted to humans within a single host. Though, I do think that's the explanation for how several variants evolved (since some of them emerged with a bunch of functional mutations rather than just one or two). I'd be interested to see more research into the evolution of viruses within human hosts, and what degree of change is possible & how this relates to spillover events.

Thanks! I'm still wrapping my mind around a lot of this, but this gives me some new directions to think about.

I have an intuition that this might have implications for the Orthogonality Thesis, but I'm quite unsure. To restate the Orthogonality Thesis in the terms above, "any combination of intelligence level and model of the world, M2". This feels different than my intuition that advanced intelligences will tend to converge upon a shared model / encoding of the world even if they have different goals. Does this make sense? Is there a way to reconcile these intuitions?

4johnswentworth
Important point: neither of the models  in this post are really "the optimizer's model of the world". M1 is an observer's model of the world (or the "God's-eye view"); the world "is being optimized" according to that model, and there isn't even necessarily "an optimizer" involved. M2 says what the world is being-optimized-toward. To bring "an optimizer" into the picture, we'd probably want to say that there's some subsystem which "chooses"/determines θ′, in such a way that E[−logP[X|M2]|M1(θ′)]≤E[−logP[X|M2]|M1(θ)], compared to some other θ-values. We might also want to require this to work robustly, across a range of environments, although the expectation does that to some extent already. Then the interesting hypothesis is that there's probably a limit to how low such a subsystem can make the expected-description-length without making θ′ depend on other variables in the environment. To get past that limit, the subsystem needs things like "knowledge" and a "model" of its own - the basic purpose of knowledge/models for an optimizer is to make the output depend on the environment. And it's that model/knowledge which seems likely to converge on a similar shared model/encoding of the world.
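For readability, the optimization condition in the reply above can be typeset (same symbols as the comment: θ′ is the subsystem's choice, M1 the observer's model, M2 the target model):

```latex
\mathbb{E}\!\left[-\log P[X \mid M_2] \;\middle|\; M_1(\theta')\right]
\;\le\;
\mathbb{E}\!\left[-\log P[X \mid M_2] \;\middle|\; M_1(\theta)\right]
```

That is, the subsystem chooses θ′ so that, under the observer's model, the world's expected description length relative to the target model M2 weakly decreases.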