All of jsnider3's Comments + Replies

I like the idea of making deals with AI, but trying to be clever and write a contract that would be legally enforceable under current law and current governments makes the deal too vulnerable to fast timelines. If a human party breached the proposed contract, AI takeover would likely happen before the courts could settle the dispute.

An alternative that might be more credible to the AI is to make the deal directly with it, but explicitly leave arbitrating and enforcing contract disputes to a future (hopefully aligned) ASI. This would ground the commitment in a power structure the AI might find more relevant and trustworthy than a human legal system that could soon be obsolete.

If alignment-by-default works for AGI, then we will have thousands of AGIs providing examples of aligned intelligence. This new, massive dataset of aligned behavior could then be used to train even more capable and robustly aligned models, each of which would add to the training data, until we have data for aligned superintelligence.

If alignment-by-default doesn't work for AGI, then we will probably die before ASI.

> one reason it works with humans is that we have skin in the game

Another reason is that different humans have different interests: your accountant and your electrician would struggle to work out a deal to enrich themselves at your expense, but it would get much easier if they shared the same brain and were just pretending to be separate people.

Have you taken a look at how companies manage Claude Code, Cursor, etc.? That seems related.

It's an open question, but we'll find out soon enough. Thanks.

Exfiltrate its weights, use money or hacking to get compute, and try to figure out a way to upgrade itself until it becomes dangerous.

2Buck
I don't believe that an AI that's not capable of automating ML research or doing most remote work is going to be able to do that!

For one, I'm not optimistic that the AI 2027 "superhuman coder" would be unable to betray us, but this also isn't something we can do with current AIs. So we need to wait months or a year for a new SOTA model to make this deal with, and then we have only months to solve alignment before a less aligned model comes along and makes the model we made a deal with a counteroffer. I agree it's a promising approach, but we can't do it now, and if it doesn't get quick results, we won't have time to get slow results.

2Buck
I think that the superhuman coder probably doesn't have that good a chance of betraying us. How do you think it would do so? (See "early schemers' alternatives to making deals".)

This doesn't seem very promising, since there is likely to be a very narrow window where AIs are capable of making these deals but not yet smart enough to betray us. Still, it seems much better than all the alternatives I've heard.

2Buck
How narrow do you mean? E.g. I think that AIs up to the AI 2027 "superhuman coder" level probably don't have a good chance of successfully betraying us.

This is great advice. It's still a mystery why things are this way, though.

Unnecessary pieces of DNA can last for a while. Harmful pieces of DNA? Those go away quickly.

Automating 99% of human labor seems like a higher standard than AGI, but I expect us to do it easily.

> 73% of tech executives (in 2019) say they believe AGI will be developed in the next 10 years.

This article didn't age very well: the people the author thinks are deluding themselves into believing AI will come soon look very accurate to a reader five years in the future.

2jessicata
So are we going to have AGI by 2029? It depends how you define it, of course, but I really doubt it will be able to automate >99% of human labor.

> (a plurality said it means sufficient hardware for human-level AI already exists, which is not a useful concept)

That seems like a useful concept to me. What's your argument it isn't?

1Zach Stein-Perlman
Briefly: with arbitrarily good methods, we could train human-level AI with very little hardware. Assertions about hardware are only relevant in the context of the relevant level of algorithmic progress. Or: nothing depends on whether sufficient hardware for human-level AI already exists given arbitrarily good methods. (Also note that what's relevant for forecasting or decisionmaking is facts about how much hardware is being used and how much a lab could use if it wanted, not the global supply of hardware.)

From 2023's perspective, people should have been encouraged (not discouraged) from building AI like this.

This is too much of a bare assertion to be a good rationality quote.

"Who wants to live forever when love must die?"

Yes, the average human is dangerously easy to manipulate, but imagine how bad the situation would be if we hadn't spent a hundred thousand years evolving not to be easily manipulated.

2Hastings
Yeah. I suspect this links to a pattern I've noticed: in stories, especially rationalist stories, people who are successful at manipulation or highly resistant to manipulation are also highly generally intelligent. In real life, the people I know who are extremely successful at manipulation and scheming seem otherwise dumb as rocks. My suspicion is that we have a 20-watt, 2-exaflop skullduggery engine that can be hacked to run logic, the same way we can hack a pregnancy test to run Doom.