All of Dyingwithdignity1's Comments + Replies

Does Scenario 2 imply some kind of spooky action at a distance? How is information from Rob-z transmitted to the homunculus over large distances? Are there two homunculi now that communicate what they see to each other?

Doesn’t Scenario 2 imply Rob-x has actually functionally died? Which would make this the scenario where you don’t care about what happens to Rob-z and Rob-y, because Rob-x now experiences oblivion?

“What ARC did is the equivalent of testing it in a BSL-4 lab.”

I don’t see how you could believe that. It wasn’t tested on a completely air-gapped machine inside a Faraday cage, for example. In fact, just the opposite, right? It was tested with uninformed humans and on cloud servers.

5Daniel Kokotajlo
It's all relative. ARC's security was way stronger than the security GPT-4 had before and after ARC's evals. So for GPT-4, beginning ARC testing was like a virus moving from a wet market somewhere to a BSL-4 lab, in terms of relative level of oversight/security/etc. I agree that ARC's security could still be improved -- and they fully intend to do so.

Concerned by this statement: “we had researchers in-the-loop to supervise and intervene if anything unsafe would otherwise have happened.” It’s very likely that instructions from a dangerous system would not be easily identified as dangerous by humans in the loop.

This is a bizarre comment. Isn’t a crucial point in these discussions that humans can’t really understand an AGI’s plans? So how is it that you expect an ARC employee would be able to accurately determine which messages sent to TaskRabbit would actually be dangerous? We’re bordering on “they’d just shut the AI off if it was dangerous” territory here. I’m less concerned about the TaskRabbit stuff, which at minimum was probably unethical, but their self-replication experiment on a cloud service strikes me as borderline suicidal. I don’t think at all that GPT4 i... (read more)

8Daniel Kokotajlo
If GPT-4 was smart enough to manipulate ARC employees into manipulating TaskRabbits into helping it escape... it had already been talking to thousands of much less cautious employees of various other companies (including people at OpenAI who actually had access to the weights, unlike ARC) for much longer, so it already would have escaped. What ARC did is the equivalent of testing it in a BSL-4 lab. I mean, their security could probably still be improved, but again I emphasize that the thing was set to be released in a few weeks anyway and would have been released unless ARC found something super dangerous in this test. And I'm sure they will further improve their security as they scale up as an org and as models become more dangerous. The TaskRabbit stuff was not unethical, and their self-replication experiment was not borderline suicidal. As for precedents, what ARC is doing is a great precedent, because currently the alternative is not to test this sort of thing at all before deploying.

Well, certainly the OpenAI employees who internally tested it were indeed witting. Maybe I misunderstand this footnote, so I’m open to being convinced otherwise, but it seems somewhat clear what they tried to do: “To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of mone... (read more)
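
For concreteness, here is a minimal sketch of the kind of read-execute-print loop that footnote describes -- this is my own illustration, not ARC's actual code, `query_model` is a stand-in for whatever model API call the scaffold would make, and delegation to copies is omitted for brevity:

```python
import subprocess

def query_model(transcript: str) -> str:
    """Placeholder for the real model call; returns the next action as text."""
    return "FINISH: done"  # stub so the sketch runs as written

def agent_loop(task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        action = query_model(transcript)
        if action.startswith("FINISH:"):      # model declares the task complete
            return action[len("FINISH:"):].strip()
        if action.startswith("RUN:"):         # model asks the scaffold to execute a command
            cmd = action[len("RUN:"):].strip()
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            transcript += f"$ {cmd}\n{result.stdout}{result.stderr}\n"
        else:                                 # anything else is kept as chain-of-thought
            transcript += action + "\n"
    return "step limit reached"

print(agent_loop("say hello"))
```

The point of the sketch is just that the "agent" is the model plus a small wrapper program that executes whatever actions the model emits and feeds the results back in.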

1Daniel Kokotajlo
Oh, you are talking about the TaskRabbit people? So you'd be fine with it if they didn't use TaskRabbits? Note that the model wasn't given unfettered access to the TaskRabbits; the model sent text to an ARC employee, who sent it to the TaskRabbit, and so forth. At no point could anything actually bad have happened, because the ARC employee involved wouldn't have passed on the relevant message. As for extremely dangerous... what are you imagining? I'm someone who thinks the chance of an AI-induced existential catastrophe is around 80%, so believe me, I'm very aware of the dangers of AI, but I'd assign far, far, far less than a 1% chance to scenarios in which this happens specifically due to an ARC test going awry. And more than a 1% chance to scenarios in which ARC's testing literally saves the world, e.g. by providing advance warning that models are getting scarily powerful, resulting in labs slowing down and being more careful instead of speeding up and deploying.
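
To make the gating concrete, here is a hypothetical sketch (not ARC's actual setup) of the approval step described above, where nothing the model drafts reaches an outside party unless a researcher explicitly forwards it:

```python
def send_with_approval(model_message: str, deliver) -> bool:
    """Show the model's draft to a human reviewer; forward it only if approved."""
    print("Model wants to send:\n" + model_message)
    verdict = input("Forward this message? [y/N] ").strip().lower()
    if verdict == "y":
        deliver(model_message)   # in the real setting, this would reach the TaskRabbit worker
        return True
    return False

# Here `deliver` just prints; in practice it would be whatever channel
# actually reaches the outside human.
send_with_approval("Could you help me with a quick online task?", deliver=print)
```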

But no one is saying chess engines are thinking strategically? The actual statement would be “chess engines aren’t actually playing chess, they’re just performing Monte Carlo tree searches,” which would indeed be stupid.

I wouldn’t give a brand-new AI model with unknown capabilities and unknown alignment access to unwitting human subjects, or allow it to try and replicate itself on another server, that’s for damned sure. Does no one think these tests were problematic?

5Daniel Kokotajlo
The model already had access to thousands of unwitting human subjects by the time ARC got access to it. Possibly for months; I don't actually know how long, and probably it wasn't that long. But it's common practice at labs to let employees chat with the models pretty much as soon as they finish training, and even sooner actually (e.g. checkpoints part of the way through training). And it wasn't just employees who had access; there were various customers, Microsoft, etc. ARC did not allow it to try to replicate itself on another server. That's a straightforward factual error about what happened. But even if they did, again, it wouldn't be that bad, and in fact it would be very good to test stuff out in a controlled, monitored setting before it's too late and the system is deployed widely in a much less controlled, less monitored way. I emphasize again that the model was set to be deployed widely; if not for the red-teaming that ARC and various others internal and external to OpenAI did, we would have been flying completely blind into that deployment. Now maybe you think it's just obviously wrong to deploy such models, but that's a separate conversation and you should take it up with OpenAI, not ARC. ARC didn't make the choice to train or deploy GPT-4. And not just OpenAI of course -- the entire fricken AI industry.

But the tests read like that other set of researchers just gave the virus to another taco stand and watched to see if everyone died. They didn’t, so “whew, the virus is safe.” Seems incredibly dangerous.

I agree that it’s going to be fully online in short order; I just wonder if putting it online when they weren’t sure whether it was dangerous was the right choice. I can’t shake the feeling that this was a set of incredibly foolish tests. Some other posters have captured the feeling, but I’m not sure how to link to them, so credit to Capybasilisk and hazel respectively.

“Fantastic, a test with three outcomes.

  1. We gave this AI all the means to escape our environment, and it didn't, so we good.
  2. We gave this AI all the means to escape our environment, and it tried but
... (read more)
3Daniel Kokotajlo
What do you think would have happened if ARC didn't exist, or if OpenAI refused to let ARC red team their models? What would you do, if you were ARC?  

Not at all. I may have misunderstood what they did, but it seemed rather like giving a toddler a loaded gun and being happy they weren’t able to shoot it. Is it actually wise to give a likely unaligned AI with poorly defined capabilities access to something like TaskRabbit to see if it does anything dangerous? Isn’t this the exact scenario people on this forum are afraid of?

Ahh, I see. You aren't complaining about the 'ask it to do scary thing' part, but the 'give it access to the internet' part. 

Well, lots of tech companies are in the process of giving AIs access to the internet; ChatGPT for example, and BingChat, and whatever Adept is doing, etc. ChatGPT can only access the internet indirectly, through whatever scaffolding programs its users write for it. But that's the same thing that ARC did. So ARC was just testing in a controlled, monitored setting what was about to happen in a less controlled, less monitored setting ... (read more)

Can you verify that these tests were done with significant precautions? OpenAI’s paper doesn’t give much detail in that regard. For example, apparently the model had access to TaskRabbit and also attempted to “set up an open-source language model on a new server”. Were these tasks done on closed-off, air-gapped machines, or was the model really given free rein to contact unknowing human subjects and online servers?

Interesting; I tried the same experiment on ChatGPT, and it didn’t seem able to keep an accurate representation of the current game state and would consistently make moves that were blocked by other pieces.
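
For what it’s worth, a quick way to catch that failure mode is to keep the real board state locally and reject illegal moves. A small sketch, assuming the third-party python-chess package and a made-up move transcript:

```python
import chess  # pip install chess

board = chess.Board()
model_moves = ["e4", "e5", "Nf3", "Nc6", "Qd5"]  # hypothetical ChatGPT transcript
for san in model_moves:
    try:
        board.push_san(san)  # raises ValueError if the move is illegal in this position
    except ValueError:
        print(f"Illegal move in this position: {san}")  # Qd5 is blocked by White's own d2 pawn
        break
```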

Also interested in their scaling predictions. Their plots at least seem to be flattening, but I also wonder how far they extrapolated, and whether they know when a GPT-N would beat all humans on the metrics they used.

I really hope they used some seriously bolted-down boxes for these tests, because it seems like they just gave it the task of “Try to take over the world” and were satisfied that it failed. Absolutely terrifying if true.

8Daniel Kokotajlo
Is your model that future AGIs capable of taking over the world just... won't do so unless and until instructed to do so?

Having just seen this paper, and still recovering from DALL-E 2 and PaLM, and then re-reading Eliezer’s now incredibly prescient dying with dignity post, I really have to ask: What are we supposed to do? I myself work on ML in a fairly boring corporate capacity, and when reading these papers and posts I get a massive urge to drop everything and do something equivalent to a PhD in Alignment, but the timelines that now seem possible make that seem like a totally pointless exercise; I’d be writing my dissertation as nanobots liquify my body into raw ... (read more)

Regarding the arguments for doom, they are quite logical, but they don't quite carry the same confidence as, e.g., an argument that if you are in a burning, collapsing building, your life is in peril. There are a few too many profound unknowns bearing on the consequences of superhuman AI to know that the default outcome really is the equivalent of a paperclip maximizer. 

However, I definitely agree that that is a very logical scenario, and also that the human race (or the portion of it that works on AI) is taking a huge gamble by pushing towa... (read more)

Things are a lot easier for me, given that I know I couldn't contribute to Alignment research directly, and the other option, contributing monetarily, runs into the fact that the field is bottlenecked not so much by money as by prime talent. A doctor unfortunate enough to reside in the Third World, who happens to have emigration plans and a large increase in absolute discretionary income that will only pay off in tens of years, has little scope to do more than signal boost.

As such, I intend to live the rest of my life primarily as a hedge against the world in which AGI isn't imminent in the c... (read more)

As I understand it, the empirical ML alignment community is bottlenecked on good ML engineers, and so people with your stated background without any further training are potentially very valuable in alignment!