I think the problem is that it's not actually a good analogy, and EY made an error in using the current event to amplify his message. AFAIK, there's never been anything that has to be perfect the very first time, and pointing out all the times we chose iteration over perfection isn't evidence for that thesis.
The fact that there are ZERO good past analogies may be evidence that EY is wrong, or it may not be. But Matt Levine definitely has an advantage in communication that he can pick a new example (or at least a new aspect of it) every day for the 10 or so themes he repeats over and over. EY has no such source of repeated stories.
> I think the problem is that it’s not actually a good analogy, and EY made an error in using the current event to amplify his message. AFAIK, there’s never been anything that has to be perfect the very first time, and pointing out all the times we chose iteration over perfection isn’t evidence for that thesis.
Well, yes. EY says that AGI is a unique threat that has never happened before...and also that it's analogous to other things.
I think Eliezer's tweet is wrong even if you grant the rocket <> alignment analogy (unless you grant some much more extreme background views about AI alignment).
Assume that "deploy powerful AI with no takeover" is exactly as hard as "build a rocket that flies correctly the first time even though it has 2x more thrust than anything anyone as tested before." Assume further that an organization is able to do one of those tasks if and only if it can do the other.
Granting the analogy, the relevant question is how much harder it would be to successfully launch and land a rocket the first time without doing tests of any similarly-large rockets. If you tell me it increases costs by 10% I'm like "that's real but manageable." If you tell me it doubles the cost, that's a problem but not anywhere close to doom. If it 10x's the cost then you'd have to solve a hard political problem.
The fact that SpaceX failed probably tells us that it costs at least 1%, or maybe even 10%, more to develop Starship without ever failing. It doesn't really tell us much beyond that. I don't see any indication that they were surprised this failed or that they took significant pains to avoid a failure. The main thing commenters have pushed back on is that this isn't a mistake in SpaceX's case, so it's not helpful evidence about the difficulty of doing something right the first time.
(In fact I'd guess that never doing a test costs much more than 10% extra, but this launch isn't a meaningful part of the evidence for that.)
Granting the analogy, Eliezer could help himself to a much weaker conclusion:
> The fastest and easiest possible way for Elon Musk to build an AI would lead to an AI takeover. He's not so good at science that "trial and error," on the actual problem you care about rather than analogies and warmups, doesn't significantly reduce costs.
But saying "the fastest possible way for Bob to make an AI would lead to an AI takeover" does not imply that "Bob is not qualified to run an AGI company." Instead it just means that Bob shouldn't rely on his company doing the fastest and easiest thing and having it turn out fine. Instead Bob should expect to make sacrifices, either burning down a technical lead or operating in (or helping create) a regulatory environment where the fastest and easiest option isn't allowed.
I suspect what's really happening is that Eliezer thinks AGI alignment is much harder than successfully launching and landing a rocket the first time. So if getting it right the first time increases costs by 10% for a rocket, it will increase costs by 1,000% for an AGI.
But if that's the case then the key claim isn't "solving problems is hard when you can't iterate." The key claim is that solving alignment (and learning from safe scientific experiments) is much harder than in other domains, so much harder that a society that can solve alignment will never need to learn from experience for any normal "easy" engineering problem like building rockets. I think that's conceivable but I'd bet against it. Either way, it's not surprising that people will reject the analogy since it's based on a strong implicit claim about alignment that most people find outlandish.
Assume that "deploy powerful AI with no takeover" is exactly as hard as "build a rocket that flies correctly the first time even though it has 2x more thrust than anything anyone as tested before."
I think you are way underestimating. A more reasonable guess is that expected odds of the first Starship launch failure go down logarithmically with budget and time. Even if you grant a linear relationship, reducing the odds of failure from 10% to 1% means 10x the budget and time. If you want to never fail, you need an infinite budget and time. If the failure results in an extinction event, then you are SOL.
> A more reasonable guess is that expected odds of the first Starship launch failure go down logarithmically with budget and time.
That's like saying that it takes 10 people to get 90% reliability, 100 people to get to 99% reliability, and a hundred million people to get to 99.99% reliability. I don't think it's a reasonable model though I'm certainly interested in examples of problems that have worked out that way.
Linear is a more reasonable best guess. I have quibbles, but I don't think it's super relevant to this discussion. I expect the starship first failure probability was >>90%, and we're talking about the difficulty of getting out of that regime.
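To make concrete how far apart these two readings are, here is a minimal sketch under purely illustrative assumptions (a notional budget of 10 units buying a 10% failure probability); neither curve is offered as anyone's actual model of Starship costs:

```python
# Minimal sketch of the two readings above; all constants are illustrative.
#   'linear' reading:      p(B) = P0 * B0 / B   -> 10x budget buys a 10x cut in failure odds
#   'logarithmic' reading: p(B) = C / ln(B)     -> each 10x cut in failure odds
#                                                  requires exponentiating the budget
import math

B0, P0 = 10.0, 0.10           # assumed: a budget of 10 units buys a 10% failure probability
C = P0 * math.log(B0)         # calibrate the logarithmic reading to the same starting point

def budget_linear(p):         # invert p = P0 * B0 / B
    return P0 * B0 / p

def budget_log(p):            # invert p = C / ln(B)
    return math.exp(C / p)

for p in (0.10, 0.01, 0.001):
    print(f"p_fail={p:.3f}  linear: {budget_linear(p):6.0f} units   log: {budget_log(p):.1e} units")
# linear: 10, 100, 1000 units; log: about 1e1, 1e10, 1e100 units.
```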
> That's like saying that it takes 10 people to get 90% reliability, 100 people to get to 99% reliability, and a hundred million people to get to 99.99% reliability. I don't think it's a reasonable model though I'm certainly interested in examples of problems that have worked out that way.
Conditional on it being a novel and complicated design. I routinely churn out six-sigma code when I know what I am doing, and so do most engineers. But almost never on the first try! The feedback loop is vital, even if it is slow and inefficient. For anything new you are fighting not so much the designs, but human fallibility. Eliezer's point is that if you have only one try to succeed, you are hooped. I do not subscribe to the first part, I think we have plenty of opportunities to iterate as LLM capabilities ramp up, but, conditional on "perfect first try or extinction", our odds of survival are negligible. There might be alignment by default, or some other way out, but conditional on that one assumption, we have no chance in hell.
It seems to me that you disagree with that point, somehow. That by pouring more resources upfront into something novel, we have good odds of succeeding on the first try, open loop. That is not a tenable assumption, so I assume I misunderstood something.
I agree you need feedback from the world; you need to do experiments. If you wanted to get a 50% chance of launching a rocket successfully on the first time (at any reasonable cost) you would need to do experiments.
The equivocation between "no opportunity to experiment" and "can't retry if you fail" is doing all the work in this argument.
> Instead it just means that Bob shouldn't rely on his company doing the fastest and easiest thing and having it turn out fine. Instead Bob should expect to make sacrifices, either burning down a technical lead or operating in (or helping create) a regulatory environment where the fastest and easiest option isn't allowed.
The above feels so bizarre that I wonder if you're trying to reach Elon Musk personally. If so, just reach out to him. If we assume there's no self-reference paradox involved, we can safely reject your proposed alternatives as obviously impossible; they would have zero credibility even if AI companies weren't in an arms race, which appears impossible to stop from the inside unless all the CEOs involved can meet at Bohemian Grove.
Perhaps I'm missing something obvious, and just continuing the misunderstanding, but...
It seems to me that if you're the sort of thing capable of one-shotting Starship launches, you don't just hang around doing so. You tackle harder problems. The basic Umeshism: if you're not failing sometimes, you're not trying hard enough problems.
Even the "existential" risk of SpaceX getting permanently and entirely shut down, or just Starship getting shut down, is much closer in magnitude to the payoff than is the case in AI risk scenarios.
Some problems are well calibrated to our difficulties, because we basically understand them and there's a feedback loop providing at least rough calibration. AI is not such a problem, rockets are, and so the analogy is a bad analogy. The problem isn't just one of communication, the analogy breaks for important and relevant reasons.
This is extremely true for hypercompetitive domains like writing tweets that do well.
> The basic Umeshism: if you're not failing sometimes, you're not trying hard enough problems.
Well, or you're trying problems that you can't afford to fail at. If a trapeze artist doesn't fall during 50% of their no-net performances, should they try a harder routine?
That's the point. SpaceX can afford to fail at this; the decision makers know it. Eliezer can afford to fail at tweet writing and knows it. So they naturally ratchet up the difficulty of the problem until they're working on problems that maximize their expected return (in utility, not necessarily dollars). At least approximately. And then fail sometimes.
Or, for the trapeze artist... how long do they keep practicing? Do they attempt the no-net routine when they estimate their odds of failure are 1/100? 1/10,000? 1e-6? They don't push the odds to zero; at some point they make a call, accept the risk, and go.
Why should it be any different for an entity that can one-shot those problems? Why would they wait until they had invested enough effort to one-shot it, and then do so? When instead they could just... invest less effort, attempt it earlier, take some risk of failure, and reap a greater expected reward?
The analogy suggests that entities capable of one-shotting problem X (presumably, by putting in a lot of preparatory effort, running analysis, and so on) will do so. I don't think that's true.
(And I think the tweet writing problem is actually an especially strong example of this -- hypercompetitive social environments absolutely produce problems calibrated to be barely-solvable and that scale with ability, assuming your capability is in line with the other participants, which I assert is the case for Eliezer. He might be smarter / better at writing tweets than most, but he's not that far ahead.)
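A toy version of that ratcheting argument, under entirely made-up payoff and success-probability curves (nothing here is calibrated to trapeze artists, rockets, or tweets), shows the expected-value-maximizing difficulty landing well short of "never fail":

```python
# Toy model of the 'ratchet up the difficulty' argument; the curves are made up.
# An agent picks a difficulty d: harder problems pay more but succeed less often.
# If failure only wastes the attempt, the expected-value-maximizing difficulty
# sits at a success probability well below 100%.
import math

def p_success(d):             # assumed: success odds decay with difficulty
    return math.exp(-d)

def payoff(d):                # assumed: reward grows with difficulty
    return d ** 2

ev, d_star = max((p_success(d / 100) * payoff(d / 100), d / 100) for d in range(1, 1001))
print(f"optimal difficulty d* = {d_star:.2f}, success probability = {p_success(d_star):.0%}")
# -> d* = 2.00, with roughly an 86% chance of failure per attempt.
```

The calculation changes completely once failure carries a catastrophic cost rather than merely a foregone payoff, which is where the disanalogy with AGI comes back in.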
> SpaceX can afford to fail at this; the decision makers know it.
Well, to be fair, the post is making the point that perhaps they can afford less than they thought. They completely ignored the effects their failure would have on the surrounding communities (which reeks highly of conceit on their part) and now they're paying the price with the risk of a disproportionate crackdown. It'll cost them more than they expected for sure.
> The analogy suggests that entities capable of one-shotting problem X (presumably, by putting in a lot of preparatory effort, running analysis, and so on) will do so. I don't think that's true.
You're right, but I think the analogy is also saying that if we were capable enough to one-shot AGI (which according to EY we need to be), then we would surely be capable enough to also very cheaply one-shot a Starship launch, because it's a simpler problem. Failure may be a good teacher, but it's not a free one. If you're competent enough to one-shot things with only a tiny bit of additional effort, you do it. Having this failure rate instead shows that you're already straining at the very limit of what's possible, and the very limit is apparently... launching big rockets. Which, while awesome in a general sense, is really, really child's play compared to getting superhuman AGI right, and on that estimate I do agree with Yud.
I would add that a huge part of solving alignment requires being keenly aware of and caring about human values in general, and in that sense, the sort of mindset that leads to not foreseeing or giving a damn about how pissed off people would be by clouds of launchpad dust in their towns really isn't the culture you want to bring into AGI creation.
Perhaps I'm missing some obvious failing that is well known, but wouldn't an isolated VR environment allow failed first tries without putting the world at risk? We probably don't have sufficiently advanced environments currently, and we don't have any guarantee that everyone developing AGI would actually limit their efforts to such environments.
But I don't think I've ever seen such an approach suggested. Is there some failure point I'm missing?
Of course such approaches are suggested, for example LOVE in a simbox is all you need. The main argument has been whether the simulation can be realistic, and whether it can be secure.
Thanks. I'm surprised there are not more obvious/visible efforts, and results/findings, along that line of approach.
I would say a sandbox is probably not the environment I would choose. I would suggest, at least once someone thinks they might actually be testing a true AGI, a physically isolated system, 100% self-contained and disconnected from all power and communications networks in the real world.
> Perhaps it won’t be, and SpaceX will never fly again
I think you made a typo: what is grounded, which is consistent with the articles you link to, is only Starship. According to Wikipedia, three Falcon 9s launched on April 27, April 28, and May 1, so SpaceX obviously keeps flying.
One thing, though, is that the reason there is an investigation into the SpaceX launch is that it vastly exceeded the estimate of possible damage. While no one was hurt directly, the cloud of debris from the pulverized launchpad apparently reached much further than projected, including inhabited areas. That means, at best, people having to clean their cars and windows (which is only an annoyance, but still one that didn't need to happen, though it could be easily fixed by SpaceX paying for the cleaning crews) and, at worst, health issues due to the dust and any possibly toxic components within it.
So that is, straight up, SpaceX underestimating a risk and underestimating second-order effects (such as: if your supposedly innocuous experimental launch, which you openly flaunt as following a "fail fast and learn fast" methodology, happens to cause trouble for people who have nothing to do with you, those people will be annoyed and will get back at you), so that the resulting mistake may indeed cost them way more than anticipated. Which is interesting in the framework of the analogy, because while you probably can't send an experimental rocket straight into orbit on the first try, you probably can at least do basic engineering to ensure it doesn't blow up its own launchpad; this was simply deemed unnecessary in the name of iterating quicker and testing multiple uncertain things at once.
Previously (Eliezer Yudkowsky): The Rocket Alignment Problem.
Recently we had a failure to launch, and a failure to communicate around that failure to launch. This post explores that failure to communicate, and the attempted message.
Some Basic Facts about the Failed Launch
Elon Musk’s SpaceX launched a rocket. Unfortunately, the rocket blew up, and failed to reach orbit. SpaceX will need to try again, once the launch pad is repaired.
There was various property damage, but from what I have seen no one was hurt.
I’ve heard people say the whole launch was a s***show and the grounding was ‘well earned.’ How the things that went wrong were absurd, SpaceX is the worst, and so on.
The government response? SpaceX Starship Grounded Indefinitely By FAA.
Perhaps this will be a standard investigation, and several months later everything will be fine. Perhaps it won’t be, and SpaceX will never fly again because those in power dislike Elon Musk and want to seize this opportunity.
There are also many who would be happy that humans won’t get to go into space, if in exchange we get to make Elon Musk suffer, perhaps including those with power. Other signs point to the relationships with regulators remaining strong, yet in the wake of the explosion the future of Starship is for now out of SpaceX’s hands.
A Failure to Communicate
In light of these developments, before we knew the magnitude or duration of the grounding, Eliezer wrote the following, which very much failed in its communication.
Eliezer has been using the rocket metaphor for AI alignment for a while, see The Rocket Alignment Problem.
I knew instantly both what the true and important point was here, and also the way in which most people would misunderstand.
The idea is that in order to solve AGI alignment, you need to get it right on the first try. If you create an AGI and fail at its alignment, you do not get to scrap the experiment and learn from what happened. You do not get to try, try again until you succeed, the way we do for things like rocket launches.
That is because you created an unaligned AGI. Which kills you.
Eliezer’s point here was to say that the equivalent difficulty level and problem configuration to aligning an AGI successfully would be if Musk stuck the landing on Starship on the first try. His first attempt to launch a rocket would need to end up safely back on the launching pad.
The problem is that the rocket blowing up need not even get one person killed, let alone kill everyone. The rocket blowing up caused a bunch of property damage. Why Play in Hard Mode (or Impossible Mode) when you only need to Play in Easy Mode?
Here were two smart people pointing out exactly this issue.
And Paul Graham.
Even if Elon could have done enough extra work, such that he stuck the landing the first time reliably, that doesn’t mean he should have spent the time and effort to do that.
The question is whether this is an illustration that we can’t solve something like this, or merely that we choose not to, or perhaps didn’t realize we needed to?
Eliezer’s intended point was not that Elon should have gotten this right on the first try, it was that if Elon had to get it right on the first try, that is not the type of thing humans are capable of doing.
Clearly, the communication attempt failed. Even knowing what Eliezer intended to say, I still primarily experienced the same reaction as Paul and Jeffrey, although they’d already pointed it out so I didn’t have to say anything. Eliezer post-mortems:
Yeah, sadly, that simply is failing at knowing How the Internet Works.
Perhaps Getting it Right The First Time is Underrated
What if that’s also not how government works? Oh no.
If you don’t get your rocket right on the first try, you see, the FAA will, at a minimum, ground you until they’ve done a complete investigation. The future is, in an important sense, potentially out of your hands.
Some people interpreted or framed this as “Biden Administration considering the unprecedented step of grounding Starship indefinitely,” citing previous Democratic attacks on Elon Musk. That appears not to be the case, as Manifold Markets still has Starship at 73% to reach orbit this year.
Given that the risk of another launch failure has to account for a lot of that 27%, that is high confidence that the FAA will act reasonably.
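As a back-of-the-envelope check on that reading (the 80% figure below is an illustrative assumption, not a sourced estimate), the market price puts a floor under the implied probability that the FAA clears another launch this year:

```python
# Back-of-the-envelope reading of the Manifold price. The 0.80 conditional
# success probability is an assumption for illustration, not a sourced estimate.
#   P(orbit this year) = P(FAA clears another launch) * P(reaches orbit | cleared)
p_market = 0.73                  # Manifold: Starship reaches orbit this year
p_success_if_cleared = 0.80      # assumed
p_faa_clears = p_market / p_success_if_cleared
print(f"implied P(FAA clears a launch this year) ~ {p_faa_clears:.0%}")
# -> about 91%; a lower conditional success chance would push it higher still.
```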
In the absence of considering the possibility of a hostile US government using this to kill the whole program, everyone agreed that it was perfectly reasonable to risk a substantial chance that the unmanned rocket would blow up. Benefits exceed costs.
However, there existed an existential risk. If you don’t get things to go right on the first try, an entity far more powerful than you are might emerge, that has goals not all that well aligned with human values, and that does not respect your property rights or the things that have value in the universe, and you might lose control of the future to it, destroying all your hopes.
The entity in question, of course, is the Federal Government. Not AGI.
It seems not to be happening in this case, yet it is not hard to imagine it as a potential outcome, and thus a substantial risk.
Thus, while the costs of failure were not existential to Musk, let alone to the world, they could have been existential to the project. There were indeed quite large incentives to get this right on the first try.
Instead, as I understand what happened, multiple important things went wrong. Most importantly, the launch went off without the proper intended launch pad, purely because no one involved wanted to wait for the right launch pad to be ready.
That’s without being in much of a race with anyone.
The Performance of an Impossibility
Eliezer also writes:
It is central to Eliezer Yudkowsky’s model that we need to solve AGI alignment on the first try, in the sense that:
If one of these four claims is false, you have a much much easier problem, one that Eliezer himself thinks becomes eminently solvable.
There are a number of other potential ‘ways out’ of this problem as well. The most hopeful one, perhaps, is: Perhaps we have existing aligned systems sufficiently close in power to combat the first AGI where our previous alignment techniques fail, so we can have a successful failure rather than an existentially bad failure. In a sense, this too would be solving the alignment problem on the first try – we’ve got sufficiently aligned sufficiently powerful systems, passing their first test. Still does feel importantly different and perhaps easier.
Takeaways
I don’t know enough to say to what extent SpaceX (or the FAA?) was too reckless or incompetent or irresponsible with regard to the launch. Hopefully everything still works out fine, the FAA lets them launch again and the next one succeeds. The incident does provide some additional evidence that there will be that much more pressure to launch new AI and even AGI systems before they are fully ready and fully tested. We have seen this with existing systems, where there were real and important safety precautions taken towards some risks, but in important senses the safeguards against existential concerns and large sudden jumps in capabilities were effectively fake – we did not need them this time, but if we had, they would have failed.
What about the case that Eliezer was trying to make about AI?
The important takeaway here does not require Eliezer’s level of confidence in the existential costs of failure. All that is required is to understand this, which I strongly believe to be true:
What can we learn from the failure to communicate? As usual, that it is good if the literal parsing of one’s words results in a true statement, but that is insufficient for good communication. One must ask what reaction a person reading will have to the thing you have written, whether that reaction is fair or logical or otherwise, and adjust until that reaction is reliably something you want – saying ‘your reaction is not logical’ is Straw Vulcan territory.
Also, one must spell out far more than one realizes, especially on Twitter and especially when discussing such topics. Even with all that I write, I worry I don’t do enough of this. When I compare to the GOAT of columnists, Matt Levine, I notice him day in and day out patiently explaining things over and over. After many years I find it frustrating, yet I would never advise him to change.
Oh, and Stable Diffusion really didn’t want to let me have a picture of a rocket launch that was visibly misaligned. Wonder if it is trying to tell me something.