simonsimonsimon — LessWrong

We need to align the performance of some large task, a 'pivotal act' that prevents other people from building an unaligned AGI that destroys the world.

What is the argument for why it's not worth pursuing a pivotal act without our own AGI? I certainly would not say it was likely that current human actors could pull it off, but if we are in a "dying with more dignity" context anyway, it doesn't seem like the odds are zero.

My idea, which I'll include more as a demonstration of what I mean than a real proposal, would be to develop a "cause area" for influencing military/political institutions as quickly as possible. Yes, I know this sounds too slow and too hard and a mismatch with the community's skills, but consider:

Militaries/governments are "where the money is": they probably do have the coercive power necessary to perform a pivotal act, or at least buy a lot of time. If the PRC is able to completely lock down its giant sophisticated cities, it could probably halt domestic AI research. The West hasn't really attempted extreme control in a while, for various good reasons, but (just e.g.) the WW2 war economy was awfully tightly managed. We are good at slowing stuff down with crazy red tape. Also, there are a lot of nukes.
- Yes, there are lots of reasons this is hard, but remember we're looking for hail marys.
"The other guy might develop massive offensive capability soon" is an extremely compelling narrative to normal people, and the culture definitely possesses the meme of "mad scientists have a crazy new weapon". Convincing some generals that we need to shut down TSMC or else China will build terminators might be easier than convincing ML researchers they are doing evil.
- Sure, if this narrative became super salient, it could possibly lead to a quicker technological arms-race dynamic, but there are other possible dynamics it might lead to, such as (just e.g.) urgency on non-proliferation, or urgency for preemptive military victory using current (non-AGI) tools.
- I know attempts to get normal people to agree with EA-type thinking have been pretty dispiriting, but I'm not sure how much real energy has gone into making a truly adequate effort, and I think the "military threat" angle might be a lot catchier to the right folks. The "they'll take our jobs" narrative also has a lot of appeal.
- Importantly, even if convincing people is impossible now, we could prepare for a future regime where we've gotten lucky and some giant smoke alarm event has happened without killing us. You can even imagine both white-hat and black-hat ways of making such an alarm more likely, which might be very high value.
- Again, remember we're looking for hail marys. When all you have is an out-of-the-money call option, more volatility is good.
The rationalist community's libertarian bent might create a blind spot here. Yes, governments and militaries are incredibly dumb, but they do occasionally muddle their way into giant intentional actions.
Also with respect to biases, it smells a little bit like we are looking for an "AI-shaped key to unlock an AI-shaped lock", so we should make sure we are putting enough effort into non-AI pivotal actions even if my proposal here is wrong.

The Geometric Expectation

simonsimonsimon 3y 20

it's not intuitive to me when it's reasonable to apply geometric rationality in an arbitrary context.

e.g. if i offered you a coin flip where i give you $0.01 with p=50%, and $100 with q=50%, i get G = $\sqrt{0.01} \sqrt{100}$ = $1, which like, obviously you would go bankrupt really fast valuing things this way.

in kelly logic, i'm instead supposed to take the geometric average of my entire wealth in each scenario, so if i start with $1000, I'm supposed to take $\sqrt{1000.01} \sqrt{1100}$ = $1048.81, which does the nice, intuitive thing of penalizing me a little vs. linear expectation for the added volatility.

but... what's the actual rule for knowing the first approach is wrong?
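the two calculations above can be checked numerically. this is only a minimal sketch: the helper names `geometric_expectation` and `arithmetic_expectation` are made up for illustration, and the $1000 starting wealth is the figure from the example, not a recommendation.

```python
import math

def geometric_expectation(outcomes):
    """Probability-weighted geometric mean: exp(sum(p * log(x))).

    `outcomes` is a list of (probability, value) pairs with positive values.
    """
    return math.exp(sum(p * math.log(x) for p, x in outcomes))

def arithmetic_expectation(outcomes):
    """Ordinary linear expectation: sum(p * x)."""
    return sum(p * x for p, x in outcomes)

# The coin flip from the comment: $0.01 or $100, each with probability 0.5.
payoffs = [(0.5, 0.01), (0.5, 100.0)]
wealth = 1000.0  # starting wealth assumed in the comment

# First approach: geometric mean of the payoffs alone.
g_payoff = geometric_expectation(payoffs)

# Kelly-style approach: geometric mean of total wealth in each scenario.
wealth_outcomes = [(p, wealth + x) for p, x in payoffs]
g_wealth = geometric_expectation(wealth_outcomes)
a_wealth = arithmetic_expectation(wealth_outcomes)

print(f"geometric mean of payoffs alone: {g_payoff:.2f}")
print(f"geometric mean of total wealth:  {g_wealth:.2f}")
print(f"linear expectation of wealth:    {a_wealth:.3f}")
```

the payoff-only figure comes out near $1, while the wealth-based figure sits a little below the linear expectation, which is the small volatility penalty described above.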

AGI Ruin: A List of Lethalities

simonsimonsimon 4y 119

The prototypical catastrophic AI action is getting root access to its datacenter

simonsimonsimon 4y 70

On the off chance we spend some time in a regime where preventable+detectable catastrophic actions might be attempted, it might be a good idea to somehow encourage the creation of a Giant Alarm which will alert previously skeptical experts that a catastrophe almost occurred and hopefully freak the right people out.

On saving one's world

simonsimonsimon 4y 10

The steelman version of flailing, I think, is being willing to throw a "hail mary" when you're about to lose anyway. If the expected outcome is already that you die, sometimes an action with naively negative value but fat tails can improve your position.

If different hail mary options are mutually exclusive, you definitely want to coordinate to pick the right one and execute it the best you can, but you also need to be willing to go for it at some point.
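A toy calculation can make the "negative value but wider tails" point concrete. This is a sketch under loud assumptions: outcomes are modeled as normal distributions (which are not actually fat-tailed, just a convenient stand-in for "more spread"), and the threshold, means, and standard deviations are all hypothetical numbers chosen for illustration.

```python
import math

def prob_at_least(threshold, mean, sd):
    """P(X >= threshold) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical: we only survive if the outcome reaches this level.
SURVIVAL_THRESHOLD = 3.0

# Default path: higher mean, low spread (assumed numbers).
p_default = prob_at_least(SURVIVAL_THRESHOLD, mean=0.0, sd=1.0)

# Hail mary: lower mean (naively negative value), much wider spread (assumed numbers).
p_hail_mary = prob_at_least(SURVIVAL_THRESHOLD, mean=-1.0, sd=3.0)

print(f"P(survive | default path): {p_default:.4f}")
print(f"P(survive | hail mary):    {p_hail_mary:.4f}")
```

With these made-up numbers the hail mary has the worse average outcome but the better chance of clearing the threshold, which is the situation the comment describes.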
