Garrett Baker

Independent alignment researcher

Comments

Does the possibility of China or Russia being able to steal advanced AI from labs increase or decrease the chances of great power conflict?

An argument against: it counter-intuitively decreases the chances, for the same reason that a functioning US ICBM defense system would be a destabilizing influence on the MAD equilibrium. In the ICBM case, once the shield is up, America's enemies would have no credible threat of retaliation if the US launched a first strike. There would therefore be no geopolitical reason for America not to launch a first strike, and quite a strong reason to launch one: the shield definitely works against the present crop of ICBMs, but may not work against future ICBMs. So America's enemies will assume that once the shield is up, America will launch a first strike, and will seek to gain the advantage while they still have a chance by launching a pre-emptive first strike of their own.

The same logic works in reverse: if Russia were building an ICBM defense shield, and would likely complete it within the year, we would be very scared of what would happen once that shield was up.

And the same logic works for other irrecoverably large technological leaps in war. If the US is on the brink of developing highly militarily capable AIs, China will fear what the US will do with them (imagine the tables were turned: would you feel safe with Anthropic & OpenAI in China, and DeepMind in Russia?). So if they can't get their own versions, they'll feel mounting pressure to secure their geopolitical objectives while they still can, or otherwise make themselves less subject to the threat of AI (would you not wish the US would sabotage a Chinese Anthropic & OpenAI by whatever means, if China seemed on the brink?). The faster the development, the quicker the pressure will mount, and the sloppier & rasher China's responses will be. If it's easy for China to copy our AI technology, the pressure mounts much more slowly.

This seems contrary to how much of science works. I expect that if people stopped talking publicly about what they're working on in alignment, we'd make much less progress, while capabilities would run business as usual.

The sort of reasoning you use here, and the fact that my only response to it basically amounts to "well, no, I think you're wrong; this proposal will slow down alignment too much", is why I think we need numbers to ground us.

Yeah, there are reasons for caution. I think it makes sense for both the concerned and the unconcerned to make numerical forecasts about the costs & benefits of such questions, rather than the current state of affairs, where everyone just compares their vibes against each other. This generalizes to other questions, like the benefits of interpretability, advances in safety fine-tuning, deep learning science, and agent foundations.

Obviously such numbers wouldn't be the end of the line, and, as in biorisk, sometimes they themselves should be kept secret. But they would still seem a great advance.

If anyone would like to collaborate on such a project, my DMs are open (not to say the topic is thereby covered; this isn't exactly my main wheelhouse).

I don't really know what people mean when they try to compare "capabilities advancements" to "safety advancements". In one sense, it's pretty clear: the common units are amounts of time, so we should compare the marginal (probabilistic) difference between time-to-alignment and time-to-doom. But in practice I think people just go on vibes.
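To make that unit concrete, here's a minimal sketch of what such a comparison could look like. This is my own toy model, not anything from the comment above, and every number in it is made up: put distributions on time-to-alignment and time-to-doom, shift each by an intervention's estimated marginal effect, and check how P(alignment before doom) moves.

```python
# Toy model: compare P(alignment arrives before doom) with and without
# a hypothetical intervention. Exponential distributions and all
# parameter values are illustrative assumptions, not real forecasts.
import random

def p_alignment_first(align_mean: float, doom_mean: float, n: int = 100_000) -> float:
    """Estimate P(time-to-alignment < time-to-doom), with both times
    drawn from exponential distributions with the given means (years)."""
    wins = 0
    for _ in range(n):
        t_align = random.expovariate(1 / align_mean)
        t_doom = random.expovariate(1 / doom_mean)
        if t_align < t_doom:
            wins += 1
    return wins / n

baseline = p_alignment_first(align_mean=15, doom_mean=20)
# Hypothetical intervention (e.g. an open-source release) estimated to
# speed alignment by 1 year and doom by 0.2 years -- made-up numbers.
intervention = p_alignment_first(align_mean=14, doom_mean=19.8)
print(f"P(alignment first), baseline:     {baseline:.3f}")
print(f"P(alignment first), intervention: {intervention:.3f}")
```

Even a crude model like this at least forces the disagreement into explicit parameters people can argue about, rather than vibes.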

For example, if someone releases a new open-source model, people say that's a capabilities advance and should not have been done. Yet I think there's a pretty good case that more well-trained open-source models are better for time-to-alignment than for time-to-doom, since much alignment work ends up being done with them, and the marginal capabilities advance here is zero: such work builds on the public state of the art, not the private state of the art, which is probably far more advanced.

I also don't often see people making estimates of the time-wise differential impacts here. Maybe people think such things would be exfo/info-hazardous, but nobody even claims to have estimates when the topic comes up (even in private, though people are glad to talk about their hunches for what AI will look like in 5 years, or the types of advancements necessary for AGI), despite all the work on timelines. It's difficult to do this for a marginal advance, but not so much for larger research priorities, which are the sorts of things people should be focusing on anyway.

There's also the problem of: what do you mean by "the human"? If you make an empowerment calculus that works for humans who are atomic & ideal agents, it probably breaks once you get a superintelligence that can likely mind-hack you into valuing only power. Such a calculus never forces you to abstain from giving up power, since you remain perfectly capable of making different decisions; you just don't make them.

Another problem, which I like to think of as the "control panel of the universe" problem, is where the AI gives you the "control panel of the universe", but you aren't smart enough to operate it, in the sense that you have the information necessary to operate it, but not the intelligence. So while you can technically do anything you want (you have maximal power/empowerment), the super-majority of buttons and button combinations you are likely to push result in increasing the number of paperclips.

I agree people shouldn't use the word "cruxy". But I think they should instead just directly say whether a consideration is a crux for them, i.e. whether a proposition, if false, would change their mind.

Edit: Given the confusion, what I mean is that people often use "cruxy" in a looser sense than "crux", labeling statements that are merely similar to statements that would be a crux, but are not themselves cruxes, as "cruxy". I claim people should stick to the strict meaning.

I agree that contrarians 'round these parts are wrong more often than the academic consensus, but the success of their predictions about AI, crypto, and COVID proves to me it's still worth listening to them, trying to be able to think like them, and probably taking their investment advice. That is, when they're right, they're right big-time.

[Edit: never mind, I see you already know about the following quotes. There's other evidence of the influence in Sedley's book, which I link below.]

In De Rerum Natura, around line 716:

Add, too, whoever make the primal stuff
Twofold, by joining air to fire, and earth
To water; add who deem that things can grow
Out of the four- fire, earth, and breath, and rain;
As first Empedocles of Acragas,
Whom that three-cornered isle of all the lands
Bore on her coasts, around which flows and flows
In mighty bend and bay the Ionic seas,
Splashing the brine from off their gray-green waves.
Here, billowing onward through the narrow straits,
Swift ocean cuts her boundaries from the shores
Of the Italic mainland. Here the waste
Charybdis; and here Aetna rumbles threats
To gather anew such furies of its flames
As with its force anew to vomit fires,
Belched from its throat, and skyward bear anew
Its lightnings' flash. And though for much she seem
The mighty and the wondrous isle to men,
Most rich in all good things, and fortified
With generous strength of heroes, she hath ne'er
Possessed within her aught of more renown,
Nor aught more holy, wonderful, and dear
Than this true man. Nay, ever so far and pure
The lofty music of his breast divine
Lifts up its voice and tells of glories found,
That scarce he seems of human stock create.

Or, for a more modern translation, from Sedley's Lucretius and the Transformation of Greek Wisdom:

Of these [sc. the four-element theorists] the foremost is Empedocles of Acragas, born within the three-cornered terrestrial coasts of the island [Sicily] around which the Ionian Sea, flowing with its great windings, sprays the brine from its green waves, and from whose boundaries the rushing sea with its narrow strait divides the coasts of the Aeolian land with its waves. Here is destructive Charybdis, and here the rumblings of Etna give warning that they are once more gathering the wrath of their flames so that her violence may again spew out the fire flung from her jaws and hurl once more to the sky the lightning flashes of flame. Although this great region seems in many ways worthy of admiration by the human races, and is said to deserve visiting for its wealth of good things and the great stock of men that fortify it, yet it appears to have had in it nothing more illustrious than this man, nor more holy, admirable, and precious. What is more, the poems sprung from his godlike mind call out and expound his illustrious discoveries, so that he scarcely seems to be born of mortal stock.

I find this very hard to believe. Shouldn't Chinese merchants, traveling long distances using maps, eventually have figured out that the Earth was a sphere? I wonder whether the "scholars" of ancient China actually represented the state-of-the-art practical knowledge the Chinese had.

Nevertheless, I don't think this is all that counterfactual. If you're obsessed with measuring everything and like to travel (like the Greeks), I think you'll eventually have to discover this fact.
