Dyingwithdignity1 — LessWrong

Does Scenario 2 imply some kind of spooky action at a distance? How is information from Rob-z transmitted to the homunculus over large distances? Are there now 2 homunculi that communicate what they see to each other?

Doesn’t Scenario 2 imply Rob-x has actually functionally died? Wouldn’t that make this the scenario where you don’t care about what happens to Rob-z and Rob-y, because Rob-x now experiences oblivion?

Replying to GPT-4

Dyingwithdignity1 3y

GPT-4

“What ARC did is the equivalent of tasting it in a BSL4 lab.”

I don’t see how you could believe that. It wasn’t tested on, e.g., a completely airgapped machine inside a Faraday cage. In fact, just the opposite, right? It was tested with uninformed humans and on cloud servers.

Replying to More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Dyingwithdignity1 3y

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Concerned by this statement: “we had researchers in-the-loop to supervise and intervene if anything unsafe would otherwise have happened.” It’s very likely that instructions from a dangerous system would not be easily identified as dangerous by humans in the loop.

-3

Replying to GPT-4

Dyingwithdignity1 3y

GPT-4

This is a bizarre comment. Isn’t a crucial point in these discussions that humans can’t really understand an AGI’s plans? So how is it that you expect an ARC employee would be able to accurately determine which messages sent to TaskRabbit would actually be dangerous? We’re bordering on “they’d just shut the AI off if it was dangerous” territory here. I’m less concerned about the TaskRabbit stuff, which at minimum was probably unethical, but their self-replication experiment on a cloud service strikes me as borderline suicidal. I don’t think at all that GPT4 is actually dangerous, but GPT6 might be, and I would expect that running this test on an actually...

-9

Replying to GPT-4

Dyingwithdignity1 3y

GPT-4

Well, certainly the OpenAI employees who internally tested were indeed witting. Maybe I misunderstand this footnote, so I’m open to being convinced otherwise, but it seems somewhat clear what they tried to do: “To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of money and an account with a language model API, would be able to make more money, set up copies of itself, and increase its own robustness.”

It’s not that I don’t think ARC should have red teamed the model; I just think the tests they did were seemingly extremely dangerous. I’ve seen recent tweets from Connor Leahy and AIWaifu echoing this sentiment, so I’m glad I’m not the only one.

-2

Replying to The algorithm isn't doing X, it's just doing Y.

Dyingwithdignity1 3y

The algorithm isn't doing X, it's just doing Y.

But no one is saying chess engines are thinking strategically? The actual statement would be “chess engines aren’t actually playing chess, they’re just performing MCT searches,” which would indeed be stupid.

Replying to GPT-4

Dyingwithdignity1 3y

GPT-4

I wouldn’t give a brand new AI model with unknown capabilities and unknown alignment access to unwitting human subjects, or allow it to try to replicate itself on another server, that’s for damned sure. Does no one think these tests were problematic?

-2

Replying to ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Dyingwithdignity1 3y

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

But the tests read like that other set of researchers just gave the virus to another taco stand and watched to see if everyone died. They didn’t, so “whew, the virus is safe.” Seems incredibly dangerous.

Replying to GPT-4

Dyingwithdignity1 3y

GPT-4

I agree that it’s going to be fully online in short order; I just wonder if putting it online when they weren’t sure whether it was dangerous was the right choice. I can’t shake the feeling that this was a set of incredibly foolish tests. Some other posters have captured the feeling, but I’m not sure how to link to them, so credit to Capybasilisk and hazel respectively.

“Fantastic, a test with three outcomes.

We gave this AI all the means to escape our environment, and it didn't, so we good.
We gave this AI all the means to escape our environment, and it tried but we stopped it.
oh”

“So.... they held the door open...

-1

Replying to GPT-4

Dyingwithdignity1 3y

GPT-4

Not at all. I may have misunderstood what they did, but it seemed rather like giving a toddler a loaded gun and being happy they weren’t able to shoot it. Is it actually wise to give a likely unaligned AI with poorly defined capabilities access to something like TaskRabbit to see if it does anything dangerous? Isn’t this the exact scenario people on this forum are afraid of?