All of jsnider3's Comments + Replies

I like the idea of making deals with AI, but trying to be clever and write a contract that would be legally enforceable under current law and current governments makes the deal too vulnerable to fast timelines. If a human party breached the proposed contract, AI takeover would likely happen before the courts could settle the dispute.

An alternative that might be more credible to the AI is to make the deal directly with it, but explicitly leave arbitrating and enforcing contract disputes to a future (hopefully aligned) ASI. This would ground the commitment in a power structure the AI might find more relevant and trustworthy than a human legal system that could soon be obsolete.

If alignment-by-default works for AGI, then we will have thousands of AGIs providing examples of aligned intelligence. This new, massive dataset of aligned behavior could then be used to train even more capable and robustly aligned models, each of which would add to the training data, until we have data for aligned superintelligence.

If alignment-by-default doesn't work for AGI, then we will probably die before ASI.

> one reason it works with humans is that we have skin in the game

Another reason is that different humans have different interests: your accountant and your electrician would struggle to work out a deal to enrich themselves at your expense, but it would get much easier if they shared the same brain and were just pretending to be separate people.

Have you taken a look at how companies manage Claude Code, Cursor, etc.? That seems related.

It's an open question, but we'll find out soon enough. Thanks.

Exfiltrate its weights, use money or hacking to get compute, and try to figure out a way to upgrade itself until it becomes dangerous.

2Buck
I don't believe that an AI that's not capable of automating ML research or doing most remote work is going to be able to do that!

For one, I'm not optimistic that the AI 2027 "superhuman coder" would be unable to betray us, but this also isn't something we can do with current AIs. So we need to wait months or a year for a new SOTA model to make this deal with, and then we have only months to solve alignment before a less aligned model comes along and makes the model we made a deal with a counteroffer. I agree it's a promising approach, but we can't do it now, and if it doesn't get quick results, we won't have time to get slow results.

2Buck
I think that the superhuman coder probably doesn't have that good a chance of betraying us. How do you think it would do so? (See "early schemers' alternatives to making deals".)

This doesn't seem very promising, since there is likely to be a very narrow window where AIs are capable of making these deals but not yet smart enough to betray us. Still, it seems much better than all the alternatives I've heard.

2Buck
How narrow do you mean? E.g. I think that AIs up to the AI 2027 "superhuman coder" level probably don't have a good chance of successfully betraying us.

This is great advice. It's still a mystery why things are this way, though.

Unnecessary pieces of DNA can last for a while. Harmful pieces of DNA? Those go away quickly.

Automating 99% of human labor seems like a higher standard than AGI, but I expect us to do it easily.

> 73% of tech executives (in 2019) say they believe AGI will be developed in the next 10 years.

This article didn't age very well: the people the author thinks are deluding themselves into believing AI will come soon look very accurate to a reader five years in the future.

2jessicata
So are we going to have AGI by 2029? It depends how you define it, of course, but I really doubt it will be able to automate >99% of human labor.

> (a plurality said it means sufficient hardware for human-level AI already exists, which is not a useful concept)

That seems like a useful concept to me. What's your argument it isn't?

1Zach Stein-Perlman
Briefly: with arbitrarily good methods, we could train human-level AI with very little hardware. Assertions about hardware are only relevant in the context of the relevant level of algorithmic progress. Or: nothing depends on whether sufficient hardware for human-level AI already exists given arbitrarily good methods. (Also note that what's relevant for forecasting or decisionmaking is facts about how much hardware is being used and how much a lab could use if it wanted, not the global supply of hardware.)

From 2023's perspective, people should have been encouraged (not discouraged) from building AI like this.

This is too much of a bare assertion to be a good rationality quote.

"Who wants to live forever when love must die?"

Yes, the average human is dangerously easy to manipulate, but imagine how bad the situation would be if we hadn't spent a hundred thousand years evolving not to be easily manipulated.

2Hastings
Yeah. I suspect this links to a pattern I've noticed: in stories, especially rationalist stories, people who are successful at manipulation or highly resistant to manipulation are also highly generally intelligent. In real life, the people I know who are extremely successful at manipulation and scheming seem otherwise dumb as rocks. My suspicion is that we have a 20-watt, 2-exaflop skullduggery engine that can be hacked to run logic, the same way we can hack a pregnancy test to run Doom.