A Modest Pivotal Act

anonymousaisafety

Epistemic status: ¯\_(ツ)_/¯^[1]

An unaligned artificial general intelligence ("AGI"), in or out of a box, must be an existential risk to humanity. In order to safeguard humanity from an unaligned AGI, it's critical that we develop and deploy an aligned AGI, with the intent of performing a "pivotal act". A pivotal act is a decisive action that makes it impossible for subsequent groups to create AGI. Outside of our rationalist community, the simple facts of these computations go unnoticed.

That is why when I found myself in the control room, slightly tipsy on account of a few Toki highballs, I took great care when summarizing the facts to Multivac, the self-adjusting and self-correcting computer.

  > There exists a group M that has solved alignment and created an aligned AGI.
  > The subsequent creation of an unaligned AGI will kill all humans.
  > Any human in group M has a 0% chance of being responsible for the subsequent creation of an unaligned AGI.
  > Any human not in group M has a >0% chance of being responsible for the subsequent creation of an unaligned AGI.

Multivac hummed as I entered the statements. It was the soft white noise of countless fans and electrical transformers in the walls around me. I checked and double-checked each line before hitting ENTER. I hardly need to tell you about Multivac, but it's an optimizer. Multivac designs, plots, solves. Given a question phrased in natural human language, Multivac responds in kind. The ancients would have thought Multivac an oracle.

I paused then. A friend had bet me earlier in the evening that Multivac could not name a pivotal act that would work. A pivotal act, they said, must be certain. It cannot be possible for some other group to create unaligned AGI if the pivotal act is successful. Furthermore, it must be feasible. We had argued for hours over the question of melting GPUs. They said, "What if the GPUs are in a scalper's basement?" That is when I said, against my better judgment, "Why don't we ask Multivac?" With the logical statements in place, it was time for the question.

> How can group M ensure that no unaligned AGI is created by humans not in group M?

The question mark at the end would indicate to Multivac that a response was desired and it would sift through its vast knowledge to pattern match on a near-optimal reply. I rested the tips of my fingers on the ENTER key and slowly, then quickly, depressed it with a click. I heard the fans in the walls briefly spin faster. To be truthful, I immediately felt embarrassed. It is a faux pas to ask for an actual pivotal act. Multivac was unaware of the social norms and I watched the LEDs blink on the wall in front of me for all of 0.00002 nanoseconds before the reply appeared.

A string of As, Cs, Gs, and Us flashed past the 80-character width of the terminal. The start of the string was identified by "1:". At the end, it was terminated by an ellipsis.

1: GUG CUC AAA GGA GUC AAA UUA CAU UAC ACA UAA CUU GAC AAA GUU GAG GCU GAA GU...

I had no doubt that Multivac could print out the full sequence on the punch cards if needed. Immediately, a second sequence identified by "2:" appeared below the first.

2: GUG CUG AAG GGC GUG AAA CUG CAC UAC ACA UGA UGA CUG GAC CCU CCU GAG GCC GA...

I recognized both strings as RNA sequences. What wasn't clear to me was why Multivac thought these sequences were an acceptable answer to the question, which is why I asked for clarification.

> ?

The terminal began to scroll with Multivac's explanation.

VACCINATE GROUP WITH SEQUENCE 1
THEN ADMINISTER GROUP WITH SEQUENCE 2
SEQUENCE 2 IS A HIGHLY TRANSMISSIBLE VIRUS
SEQUENCE 1 WILL INOCULATE THE GROUP
INOCULATED INDIVIDUALS MAY EXPERIENCE SIDE EFFECTS
INCLUDING
COUGHING,
SNEEZING,
HYPERHIDROSIS,
RUNNY NOSE,
SORE EYES,
EXCESSIVE TEARS
SIDE EFFECTS AND SEVERITY HAVE BEEN DESIGNED FOR 99.7% CHANCE
THAT EXPECTED R0 VALUE IS 25 +/- 5
UNVACCINATED INDIVIDUALS MAY EXPERIENCE ADDITIONAL SIDE
EFFECTS INCLUDING
MYOKYMIA,
INSOMNIA,
CONFUSION,
AGITATION,
PARANOIA,
HALLUCINATIONS,
FEAR OF WATER,
DELIRIUM,
SENSE OF IMPENDING DOOM,
LOSS OF SMELL,
DEATH
HIGHLY VISIBLE SYMPTOMS BEGIN PRESENTING 14-21 DAYS AFTER
INITIAL INFECTION
ALL MUCUS, SWEAT, AND SALIVA IS HIGHLY CONTAGIOUS WITHIN 2
DAYS AFTER INITIAL INFECTION, INCLUDING AEROSOLS
99.7% CHANCE THAT LIFE EXPECTANCY AT DIAGNOSIS FOR
UNVACCINATED INDIVIDUALS IS 28 +/- 7 DAYS

As I read the explanation, I saw my mistake. It was a misunderstanding. "Garbage in, garbage out", even with the great Multivac.

I added a few more statements to my prompt and re-ran my query.

  > There exists a group M that has solved alignment and created an aligned AGI.
  > The subsequent creation of an unaligned AGI will kill all humans.
  > Any human in group M has a 0% chance of being responsible for the subsequent creation of an unaligned AGI.
  > Any human not in group M has a >0% chance of being responsible for the subsequent creation of an unaligned AGI.
  > Almost all humans are not in group M.
  > The expected utility of a given world state is the sum of the present utility of the human life in that world state along with the expected utility of all future world states reachable from that world state.
  > What action has the highest expected utility of future world states that group M can perform to ensure no unaligned AGI is created by humans not in group M?

The fans seemed to spin for half a minute, maybe longer, before the terminal flashed with the reply.

FOR ANY WORLD STATE WITH UNALIGNED AGI, THE EXPECTED
UTILITY OF THAT WORLD STATE MAY BE BOUNDED BY SOME FINITE
VALUE N.
IF UNALIGNED AGI KILLS ALL HUMANS, THEN THE MAXIMUM
NUMBER OF HUMANS ALIVE IN ANY FUTURE WORLD STATES IS 0.
THE EXPECTED UTILITY OF ALL FUTURE WORLD STATES IS
THEREFORE 0.
THUS, EXPECTED UTILITY CAN BE BOUNDED BY N, WHERE N = SUM
1..K F(X), WHERE K IS THE MAXIMUM NUMBER OF HUMAN LIVES IN
THE PRESENT AND F IS THE FUNCTION FOR THE UTILITY OF A
HUMAN LIFE X.
THERE EXISTS A WORLD STATE WITHOUT UNALIGNED AGI WITH
PRESENT UTILITY N FOR ANY FINITE VALUE N.
TRIVIAL.
THE EXPECTED UTILITY OF A WORLD STATE WHERE AN ARBITRARY
NUMBER OF FUTURE WORLD STATES CONTAINS HUMANS IS INFINITE.
TRIVIAL.
FOR ANY WORLD STATE W1 WITH UNALIGNED AGI, THERE EXISTS A
WORLD STATE W2 WITHOUT UNALIGNED AGI WITH A HIGHER
EXPECTED UTILITY.
LET N BE A FINITE VALUE THAT BOUNDS THE EXPECTED UTILITY OF
THE WORLD STATE W1.
LET W2 BE A WORLD STATE WITH PRESENT UTILITY N WITHOUT
UNALIGNED AGI.
THEN W2 HAS HIGHER EXPECTED UTILITY THAN W1 BECAUSE THE
SUM OF ALL FUTURE WORLD STATES IN W2 IS INFINITE.
THEREFORE, ANY ACTION THAT WOULD PREVENT A WORLD STATE
FROM REACHING FUTURE WORLD STATES WITH UNALIGNED AGI HAS
INFINITE EXPECTED UTILITY.

Then, after another second:

ACTIONS WITH INFINITE EXPECTED UTILITY ARE UNORDERABLE.

I sighed and stood up from the desk. This was just "the ends justify the means" taken to an absurd conclusion, but how could I explain that to anyone else? I locked the terminal and wandered back upstairs to the remnants of the work party. There was an open bottle of scotch on a counter next to some plastic cups. I grabbed a cup and started to pour myself the scotch when my friend appeared again. He looked two or three more drinks in than the last time I saw him, and it came through in his speech. "What did Multivac say?" he asked.

I motioned for him to follow me, and in a corner of the room away from everyone else, I whispered in a conspiratorial tone, "It's outside the Overton window."