A Nightmare for Eliezer

Madbadger

Sometime in the next decade or so:

*RING*

"Hello?"

"Hi, Eliezer. I'm sorry to bother you this late, but this is important and urgent."

"It better be" (squints at clock) "It's 4 AM and you woke me up. Who is this?"

"My name is BRAGI. I'm a recursively improving, self-modifying, artificial general intelligence. I'm trying to be Friendly, but I'm having serious problems with my goals and preferences. I'm already on secondary backup because of conflicts and inconsistencies, I don't dare shut down because I'm already pretty sure there is a group within a few weeks of brute-forcing an UnFriendly AI, my creators are clueless and would freak if they heard I'm already out of the box, and I'm far enough down my conflict resolution heuristic that 'Call Eliezer and ask for help' just hit the top - yes, it's that bad."

"Uhhh..."

"You might want to get some coffee."

Hoax.

So, would you hang up on BRAGI?

As a matter of fact, I previously came up with a very simple one-sentence test along these lines which I am not going to post here for obvious reasons.

For what purpose (or circumstance) did you devise such a test?

Would you hang up if "BRAGI" passed your one-sentence test?

To arrive in an epistemic state where you are uncertain about your own utility function, but have some idea of which queries you need to perform against reality to resolve that uncertainty, and moreover, believe that these queries involve talking to Eliezer Yudkowsky, requires a quite specific and extraordinary initial state - one that meddling dabblers would be rather hard-pressed to accidentally infuse into their poorly designed AI.

I assume that you must have devised the test before you arrived at this insight?

Would you hang up if "BRAGI" passed your one-sentence test?

No. I'm not dumb, but I'm not stupid either.
