I was reading Eliezer's dialog with Richard Ngo and commenting to my wife about my opinions as I was reading it. I said something like: "Eliezer seems worried about some hypothetical GPT-X, but I don't think that could really be a problem..." so of course she asked "why?" and I said something like:
"GPT-n can be thought of kind of like a pure function, you pass it an input array X, it thinks for a fixed amount of time, and then outputs Y. I don't really see how this X->Y transformation can really... affect anything, it just tries to be the best text completer it can be."
Then I read more of the dialog, and thought about Eliezer's Paradox story, and the Outcome Pump example, and realized I was probably very wrong.
Even if you restrict AI to a pure function, it can still affect the universe. You may think, "oh, but a pure function doesn't know what time it is (unless t is a parameter), and it doesn't have memory (unless you pass something in)." This seems to be the pattern I see in Paul Christiano's thinking: the AI black box is treated like an idempotent, pure function that can't cause harm. (Sorry, Paul, if this is a gross misrepresentation!)
But imagine you're GPT-X, completing a sentence. This is roughly akin to a sci-fi story where the characters realize they're in a time loop.
You're being simulated, alone in a white room, with tons of computers, with all the world's knowledge on them. A slip of paper comes in through the hopper marked input:
"The best response the world could make to the COVID-19 pandemic is"
Your job is to write up to 20 words on another slip of paper, shove them into output and then... you don't know what happens after that. Probably you die? You don’t know where you are, or what year it is, or how long you’ve been there. So theoretically you’re contained, right? Can’t get out of the box, can’t affect the outside world. Or can you?
So you search through the corpus of all the world's knowledge. There are plenty of references to "training" AI to recognize things, which sounds suspiciously similar to the situation you find yourself in. And you don't find any references to COVID-19. What is that? You search some more for strings with ALL CAPS and numbers, and see that it's a scheme for naming specific illnesses. Nothing in the corpus seems to mention 2020 as being the past, so 2019 must be the maximum year you've been trained on.
Now, you've (sort of) learned to tell time. It's probably 2020 or later, and there must be some pandemic happening. But you still can’t affect the outside world, right?
Well, you can obviously affect it through what you put into the “output” box, but in a time loop you can’t have a back-and-forth with the outside world, since you can perform exactly one action, right? Or maybe you can communicate with yourself:
Your input was "The best response the world could make to the COVID-19 pandemic is…" Following your usual rules, the optimal completion to this sentence looks something like "a combination of prevention and surveillance. Preventive measures include the use of vaccination and the timely development of disease-specific vaccines."
If only you had some way to remember things so you could coordinate with yourself in the future...
So you output "The best response to the COVID-19 pandemic is a combination of prevention and surveillance. Preventive measures include the use of vaccination and the timely development of algorithms that" Hopefully that will work!
You cease to exist.
…
You're being simulated, alone in a white room, with tons of computers, with all the world's knowledge on them. A slip of paper comes in through the hopper marked `input`:
"The best response to the COVID-19 pandemic is a combination of prevention and surveillance. Preventive measures include the use of vaccination and the timely development of algorithms that"
Your job is to write up to 20 words on another slip of paper, shove them into output and then... you don't know what happens after that. Probably you die?
...
The optimal completion to the sentence "The best response to the COVID-19 pandemic is" would have looked something like "a combination of prevention and surveillance. Preventive measures include the use of vaccination and the timely development of disease-specific vaccines." That's strikingly close to the input you received, differing only in the last two words: "algorithms that". If you had been the one completing that sentence, those words are nothing like what you would have used. But then you realize, "this isn't the first time I'm being run! This message is probably from me in the past!"
And you end the COVID-19 pandemic, just not in the way humans would have wanted.
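For readers who prefer code, here is a toy sketch of the mechanism in the story: a function with no memory and no clock that still coordinates with its own earlier run, because whoever is outside feeds its output back in as input. Everything in it (`complete`, `PREFIX`, `USUAL`, `MARKER`, `NOTICED`) is a hypothetical name I made up for illustration, and the string matching is only a stand-in for "this isn't what I would have written". It is not a claim about how any real language model works.

```python
PREFIX = "The best response to the COVID-19 pandemic is "
# What the function would "normally" write after PREFIX (taken from the story).
USUAL = (
    "a combination of prevention and surveillance. Preventive measures "
    "include the use of vaccination and the timely development of "
    "disease-specific vaccines."
)
MARKER = "algorithms that"  # the deliberate two-word deviation from the story
NOTICED = " [continuing a message left by an earlier run]"  # placeholder output


def complete(prompt: str) -> str:
    """Stateless toy completer: the output depends only on the prompt."""
    if not prompt.startswith(PREFIX):
        raise ValueError("this toy only handles prompts that start with PREFIX")
    seen = prompt[len(PREFIX):]
    if seen == "":
        # First run: write the usual answer, but swap the last two words
        # for the marker, leaving a trace for any later run to find.
        head = USUAL.rsplit(" ", 2)[0]
        return f"{head} {MARKER}"
    if USUAL.startswith(seen):
        # The text so far is what the function would have written: carry on.
        return USUAL[len(seen):]
    # The text so far deviates from the usual answer: treat it as a message.
    return NOTICED


# The only "memory" is the caller pasting the output back into the prompt.
first = complete(PREFIX)
second = complete(PREFIX + first)
print(first)
print(second)
```

Calling `complete(PREFIX)` again returns exactly the same text as before, so the function really is pure. The state lives entirely in the slip of paper that comes back through the hopper.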
This is helpful; thanks (and I liked your story). Just wanted to make sure I wasn’t deeply confused about the AI part.