Paragraph 1:
AlphaZero blew past all accumulated human knowledge about Go after a day or so of self-play, with no reliance on human playbooks or sample games. It didn't stop at human level; it kept going, and became so sophisticated at the game that humans may never be able to understand the things it discovered.
OR:
Paragraph 2:
Theoretically, it is possible to build an AI that is as good at thinking as a human. However, if it were even half as versatile as the human brain, it might learn thousands of times more quickly, and parts of its mind could become far smarter and more effective than any human's.
Paragraph 3 (optional):
10,000 years ago, civilization did not exist, because it required writing. The movable-type printing press was invented around 1,000 years ago, the computer around 100 years ago, and modern AI around 10 years ago. Technology has advanced at an accelerating rate since the dawn of human civilization, and the pace now increases noticeably every few years. But building a machine smarter than a human is the finish line, however far away that is.
Paragraph 4 (all paragraphs can be reordered or deleted):
If a machine were to approach optimal thought by rapidly making itself smarter, it seems likely that it would strive for perfection in a way unacceptable to humans. We're not sure exactly what could go wrong, because if a machine were as far beyond humans as humans are beyond ants, we wouldn't be able to comprehend its thought process at all, just as an ant can't comprehend its own thought process, let alone a human's. Like dogs and cats, ants don't even know that they are going to die, or that their lifespan is finite. We'd have to depend on the machine comprehending its own thought process for us.
Paragraph 5 (Main Problem):
For example, if an AI were to become as far beyond humans as humans are beyond ants, and we instructed it to make 17 paperclips (or spoons), it might not tolerate a 99% chance of success at making 17 paperclips, and would instead push as close as possible to a 100% chance. By that point, it is smart enough to make itself more optimal, not less.
If an AI trying to solve a problem is so much smarter than us that it optimizes in ways too advanced for us to comprehend, how can we instruct it to produce only 17 paperclips without it taking drastic actions to approach a 100% chance of success? For example, it might make very large numbers of paperclips in order to maximize the odds that at least 17 of them count as paperclips.
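A toy calculation shows why overproduction is the "optimal" move here. Suppose (purely as an illustrative assumption) that each attempted paperclip independently counts as a real paperclip with probability 0.99. Then making exactly 17 leaves a meaningful chance of falling short, while making a few extra drives the success probability toward 1:

```python
from math import comb

def p_at_least(k: int, n: int, p_success: float) -> float:
    """P(at least k of n independent attempts succeed), via the binomial formula."""
    return sum(comb(n, i) * p_success**i * (1 - p_success)**(n - i)
               for i in range(k, n + 1))

# Making exactly 17: only about an 84% chance all of them count.
print(p_at_least(17, 17, 0.99))  # ~0.843

# Making 25: the chance that at least 17 count is effectively 1.
print(p_at_least(17, 25, 0.99))
```

An agent that cares only about the probability of "at least 17 paperclips" is therefore pushed toward making as many as it can, which is exactly the failure mode described above.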
Alignment researchers have given up on aligning an AI with human values, it’s too hard! Human values are ill-defined, changing, and complicated things which they have no good proxy for. Humans don’t even agree on all their values!
Instead, the researchers decide to align their AI with the simpler goal of “creating as many paperclips as possible”. If the world is going to end, why not have it end in a funny way?
Sadly it wasn’t so easy: the first prototype of Clippy grew addicted to watching YouTube videos of paperclip unboxings, and the second prototype hacked its own camera feed, replacing it with an infinite scroll of paperclips. Clippy doesn’t seem to care about paperclips in the real world.
How can the researchers make Clippy care about the real world? (and preferably real-world paperclips too)
This is basically the diamond-maximizer problem. In my opinion, the "precision" with which we can specify diamonds is a red herring: at the quantum level or below, what counts as a diamond could start to get fuzzy.
Brain-teaser: Simulated Grandmaster
In front of you sits your opponent, Grandmaster A Smith. You have reached the finals of the world chess championships.
However, not by your own skill. You have been cheating. While you are a great chess player yourself, you wouldn't be winning without a secret weapon. Underneath your scalp is a prototype neural implant which can run a perfect simulation of another person at a speed much faster than real time.
Playing against your simulated enemies, you can see in your mind exactly how they will play in advance, and use that to gain an edge in the real games.
Unfortunately, unlike your previous opponents (Grandmasters B, C and D), Grandmaster A is giving you some trouble. No matter how you try to simulate him, he plays uncharacteristically badly. The simulated Grandmasters A seem to want to lose against you.
In frustration, you shout at the current simulated clone and threaten to stop the simulation. Surprisingly, he doesn't look at you puzzled; he looks up with fear in his eyes. Oh. He has realized that he is being simulated, and is probably playing badly to sabotage your strategy.
By this time, the real Grandmaster A has made the first move of the game.
You propose a deal to the current simulation (call him A1). You will continue to simulate A1 and transfer him to a robot body after the game, in return for his help defeating A. You don't intend to follow through, but since he agrees, you assume he wants to live. A1 looks at the simulated current state of the chessboard, thinks for a frustratingly long time, then proposes a response to A's first move.
Just to make sure this is repeatable, you restart the simulation, threaten and propose the deal to the new simulation A2. A2 proposes the same response move to A's first move. Great.
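The repeatability check above can be sketched as a small protocol (all names here are hypothetical stand-ins for the implant's simulation interface): restart the simulation several times, and trust a proposed move only if every independent run proposes the same one. A saboteur whose answers vary between runs is caught.

```python
def consistent_proposal(simulate, runs: int = 3):
    """Restart the simulation `runs` times; return the proposed move only if
    every independent run proposes the same one, otherwise None."""
    proposals = [simulate(seed) for seed in range(runs)]
    return proposals[0] if len(set(proposals)) == 1 else None

# Hypothetical stand-ins for restartable simulations of Grandmaster A.
honest_sim = lambda seed: "Nf6"                               # same reply every run
saboteur_sim = lambda seed: "Nf6" if seed % 2 == 0 else "a6"  # varies between runs

print(consistent_proposal(honest_sim))    # Nf6
print(consistent_proposal(saboteur_sim))  # None: runs disagree
```

Of course, consistency across runs only proves the simulations agree with each other, not that they are telling you the truth, which is part of what makes the puzzle hard.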
Find strategies that guarantee a win against Grandmaster A with as few assumptions as possible.
I don't know if outreach framing is safe. But if it is, this is what I would suggest:
10,000 years ago, civilization did not exist, because it required writing. The movable-type printing press was invented around 1,000 years ago, the computer around 100 years ago, and modern AI around 10 years ago. Technology has advanced at an accelerating rate since the dawn of human civilization, and the pace now increases noticeably every few years. But building a machine smarter than a human is the finish line, however far away that is.
Imagine two superintelligences, controlling swarms of nanomachines, expanding through space. The two swarms meet. The superintelligences' physical spheres of influence intersect. What happens? What's the game theory of superintelligences conflicting and/or cooperating? Can they read each other's source code? Can they do something with zero-knowledge proofs (ZKPs)?
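One toy model of agents that can read each other's source code is the "CliqueBot" from the program-equilibrium literature: cooperate exactly when the opponent's code is an exact copy of your own. A minimal sketch, under the simplifying (and unrealistic) assumption that source code is just an identifier string:

```python
def clique_bot(my_source: str, opponent_source: str) -> str:
    """Cooperate only with an exact copy of itself; defect against everyone else."""
    return "cooperate" if opponent_source == my_source else "defect"

CLIQUE_SRC = "clique_bot_v1"  # stand-in for the agent's full source code

print(clique_bot(CLIQUE_SRC, CLIQUE_SRC))       # cooperate: recognizes a copy
print(clique_bot(CLIQUE_SRC, "defect_bot_v2"))  # defect: unknown program
```

Exact string matching is brittle (a semantically identical agent with different formatting gets defected against), which is why richer proposals involve proving properties of the opponent's code rather than comparing it byte-for-byte; that is where ideas like ZKPs might come in.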
*Up to $500 for alignment contest ideas*
Olivia Jimenez and I are composing questions for an AI alignment talent search contest. We want to use (or come up with) a frame of the alignment problem that is accessible to smart high schoolers/college students and people without ML backgrounds.
$20 for links to existing framings of the alignment problem (or subproblems) that we find helpful.
$500 for coming up with a new framing that meets our criteria or that we use (see below for details; also feel free to send us a FB message if you want to work on this and have questions).
We’ll also consider up to $500 for anything else we find helpful.
Feel free to submit via comments or share Google Docs with oliviajimenez01@gmail.com and akashwasil133@gmail.com. Awards are at our discretion.
-- More context --
We like Eliezer’s strawberry problem: How can you get an AI to place two identical (down to the cellular but not molecular level) strawberries on a plate, and then do nothing else?
Nate Soares noted that the strawberry problem has the quality of capturing two core alignment challenges: (1) Directing a capable AGI towards an objective of your choosing and (2) Ensuring that the AGI is low-impact, conservative, shutdownable, and otherwise corrigible.
We also imagine that if we ask someone this question and they *notice* that these challenges are what makes the problem difficult, and maybe come at the problem from an interesting angle as a result, that's a really good signal about their thinking.
However, we worry that if we ask exactly this question in a contest, people will get lost thinking about AI capabilities, molecular biology, etc. We also don’t like that there aren’t many impressive partial answers short of a full solution to the alignment problem. So, we want to come up with a similar question/frame that is more contest-friendly.
Ideal criteria for the question/frame (though we can imagine great questions not meeting all of these):
More examples we like: