The next best thing to have after a reliable ally is a predictable enemy.
-- Sam Starfall, FreeFall #1516
The evaluator, which determines the meaning of expressions in a program, is just another program.
Hi, I always like to be less wrong and try to verify and falsify my own take on philosophical Modernism, which I have developed since my student days. (FYI, I am twice that age now.) I believe we should all have opinions about everything and look for independent confirmation, rational or emotional, both for the give and for the take, when earned, to find truth. When it doesn't happen, I try to improve on my theory. I have done so online for years at http://crpa.co. I would like you to try and find any or all flaws and make a case out of it. Thanks in advance. I will be critical too of your work if you would like me to. Best regards ~Ron dW.
I've been trying very hard to read the paper at that link for a while now, but honestly I can't figure it out. I can't even find anything content-wise to criticize because I don't understand what you're trying to claim in the first place. Something about the distinction between map and territory? But what the heck does that have to do with ethics and economics? And why the (seeming?) presumption of Christianity? And what does any of that have to do with this graph-making software you're trying to sell?
It would really help me if you could do the following:
I am dubious about any definition of "puzzle" for which the claim "This puzzle is not fun" is tautologically false, regardless of either the speaker or the puzzle in question.
Good point, probably the title should be "What is a good puzzle?" then.
I disagree about #2, incidentally.
It's a puzzle if I'm having fun trying to solve it.
That's interesting! I've had very different experiences:
When I'm trying to solve a puzzle and learn that it had no good answer (i.e. was just nonsense, not even rising to the level of trick question), it's very frustrating. It retroactively makes me unhappy about having spent all that time on it, even though I was enjoying myself at the time.
Scott Kim, What is a Puzzle?
Ok, so let's say the AI can parse natural language, and we tell it, "Make humans happy." What happens? Well, it parses the instruction and decides to implement a Dopamine Drip setup.
That's not very realistic. If you trained AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to. If the AI interpreted something in a way that was technically correct, but not what you wanted, you would not reward it, you would punish it, and you would be doing that from the very beginning, well before the AI could even be considered intelligent. Even the thoroughly mediocre AI that currently exists tries to guess what you mean, e.g. by giving you directions to the closest Taco Bell, or guessing whether you mean AM or PM. This is not anthropomorphism: doing what we want is a sine qua non for AI to prosper.
Suppose that you ask me to knit you a sweater. I could take the instruction literally and knit a mini-sweater, reasoning that this minimizes the amount of expended yarn. I would be quite happy with myself too, but when I give it to you, you're probably going to chew me out. I technically did what I was asked to, but that doesn't matter, because you expected more from me than just following instructions to the letter: you expected me to figure out that you wanted a sweater that you could wear. The same goes for AI: before it can even understand the nuances of human happiness, it should be good enough to knit sweaters. Alas, the AI you describe would make the same mistake I made in my example: it would knit you the smallest possible sweater. How do you reckon such AI would make it to superintelligence status before being scrapped? It would barely be fit for clerk duty.
My answer: who knows? We've given it a deliberately vague goal statement (even more vague than the last one), we've given it lots of admittedly contradictory literature, and we've given it plenty of time to self-modify before giving it the goal of self-modifying to be Friendly.
Realistically, AI would be constantly drilled to ask for clarification when a statement is vague. Again, before the AI is asked to make us happy, it will likely be asked other things, like building houses. If you ask it: "build me a house", it's going to draw a plan and show it to you before it actually starts building, even if you didn't ask for one. It's not in the business of surprises: never, in its whole training history, from baby to superintelligence, would it have been rewarded for causing "surprises" -- even the instruction "surprise me" only calls for a limited range of shenanigans. If you ask it "make humans happy", it won't do jack. It will ask you what the hell you mean by that, it will show you plans and whenever it needs to do something which it has reasons to think people would not like, it will ask for permission. It will do that as part of standard procedure.
To put it simply, an AI which messes up "make humans happy" is liable to mess up pretty much every other instruction. Since "make humans happy" is arguably the last of a very large number of instructions, it is quite unlikely that an AI which makes it this far would handle it wrongly. Otherwise it would have been thrown out a long time ago, whether for interpreting too literally or for causing surprises. Again: an AI couldn't make it to superintelligence status with warts that would doom AI with subhuman intelligence.
Why does the hard takeoff point have to be after the point at which an AI is as good as a typical human at understanding semantic subtlety? In order to do a hard takeoff, the AI needs to be good at a very different class of tasks than those required for understanding humans that well.
So let's suppose that the AI is as good as a human at understanding the implications of natural-language requests. Would you trust a human not to screw up a goal like "make humans happy" if they were given effective omnipotence? The human would probably do about as well as people in the past have at imagining utopias: really badly.
Mr. Turing's Computer
Computers in the past could only do one kind of thing at a time. One computer could add some numbers together, but nothing else. Another could find the smallest of some numbers, but nothing else. You could give them different numbers to work with, but the computer would always do the same kind of thing with them.
To make the computer do something else, you had to open it up and put all its pieces back in a different way. This was very hard and slow!
So a man named Mr. Babbage thought: what if some of the numbers you gave the computer were what told it what to do? That way you could have just one computer, and you could quickly make it be a number-adding computer, or a smallest-number-finding computer, or any kind of computer you wanted, just by giving it different numbers. But although Mr. Babbage and his friend Ms. Lovelace tried very hard to make a computer like that, they could not do it.
But later a man named Mr. Turing thought up a way to make that computer. He imagined a long piece of paper with numbers written on it, and imagined a computer moving left and right along that paper and reading the numbers on it, and sometimes changing the numbers. This computer could only see one number on the paper at a time, and also only remember one thing at a time, but that was enough for the computer to know what to do next. Everyone was amazed that such a simple computer could do anything that any other computer then could do; all you had to do was put the right numbers on the paper first, and then the computer could do something different! Mr. Turing's idea was enough to let people build computers that finally acted like Mr. Babbage's and Ms. Lovelace's dream computer.
Even though Mr. Turing's computer sounds way too simple when you think about our computers today, our computers can't do anything that Mr. Turing's imagined computer can't. Our computers can look at many many numbers and remember many many things at the same time, but this only makes them faster than Mr. Turing's computer, not actually different in any important way. (Though of course being fast is very important if you want to have any fun or do any real work on a computer!)
So what is Mr. Turing's computer like? It has these parts:
- A long paper with numbers written on it.
- A head that sits over one number on the paper, reads it, and can write a new number over it.
- A state, which is the one thing the computer remembers at a time.
- A table that tells the computer what to do next.
Looking closer, each line in the table has five parts, which are:
- The state the computer is in.
- The number the head sees.
- The state to change to.
- The number to write over the one the head sees.
- Which way to move the head.
Here's a simple table:
State  Sees  New state  Writes  Move
Happy  1     Happy      1       Right
Happy  2     Happy      1       Right
Happy  3     Sad        3       Right
Sad    1     Sad        2       Right
Sad    2     Sad        2       Right
Sad    3     Stop
Okay, so let's say that we have one of Mr. Turing's computers built with that table. It starts out in the Happy state, and its head is on the first number of a paper like this:
1 2 1 1 2 1 3 1 2 1 2 2 1 1 2 3
What will the paper look like after the computer is done? Try pretending you are the computer and see what you do! The answer is at the end.
So you can see now that the table is the plan for what the computer should do. But we still have not fixed Mr. Babbage's problem! To make the computer do different things, we have to open it up and change the table. Since the "table" in any real computer will be made of very many parts put together very carefully, this is not a good way to do it!
So here is the amazing part that surprised everyone: you can make a great table that can act like any other table if you give it the right numbers on the paper. Some of the numbers on the paper tell the computer about a table for adding, and the rest of the numbers are to be added. The person who made the great table did not even have to know anything about adding, as long as the person who wrote the first half of the paper did.
Our computers today have tables like this great table, and so almost everything fun or important that they do is given to them long after they are built, and it is easy to change what they do.
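The "great table" idea can be sketched in a few lines of Python. The encoding here is invented for illustration (it is not how any real computer numbers its rules): the first number on the paper says how many rules follow, each rule is five numbers matching the five parts of a table line (with 1 meaning Right, 0 meaning Left, and a new state of 0 meaning Stop), and everything after the rules is the input to work on. The one fixed function plays the role of the great table: it never changes, yet different papers make it behave as different machines.

```python
# A sketch of the "great table": the front of the paper describes a table
# as numbers, and the rest of the paper is the input to work on.
# Encoding (invented for this sketch), one rule per five numbers:
#   state, sees, new state, writes, move  (move: 1 = Right, 0 = Left)
# A new state of 0 means Stop. The computer starts in state 1.

def great_table(paper):
    """Read a table off the front of the paper, then run it on the rest."""
    n = paper[0]                       # how many rules follow
    rules = {}
    for i in range(n):
        state, sees, new_state, writes, move = paper[1 + 5 * i : 6 + 5 * i]
        rules[(state, sees)] = (new_state, writes, move)
    data = list(paper[1 + 5 * n:])     # the rest of the paper is the input
    state, head = 1, 0
    while 0 <= head < len(data) and (state, data[head]) in rules:
        new_state, writes, move = rules[(state, data[head])]
        if new_state == 0:             # this rule means Stop
            break
        data[head] = writes
        state = new_state
        head += 1 if move == 1 else -1
    return data

# A paper whose table says: turn every 7 into an 8, moving right; stop on a 9.
paper = [2,
         1, 7, 1, 8, 1,    # (state 1, sees 7) -> (state 1, write 8, Right)
         1, 9, 0, 9, 1,    # (state 1, sees 9) -> Stop
         7, 7, 9, 7]       # the input to work on
print(great_table(paper))  # -> [8, 8, 9, 7]
```

Changing only the rule numbers at the front of the paper makes the same `great_table` function do a completely different job, which is the whole point: the behavior is data, not wiring.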
By the way, here is how the paper from before will look after a computer with our simple table is done with it:
1 1 1 1 1 1 3 2 2 2 2 2 2 2 2 3
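You can check that answer without pretending to be the computer by simulating the machine directly. This is a minimal sketch in Python: the table is the example table from the text written as plain data, and the function name `run` is just for illustration.

```python
# A minimal sketch of Mr. Turing's computer, using the story's table.
# (state, number seen) -> (new state, number to write, which way to move)
# "Stop" means the computer halts and leaves the number alone.
table = {
    ("Happy", 1): ("Happy", 1, "Right"),
    ("Happy", 2): ("Happy", 1, "Right"),
    ("Happy", 3): ("Sad",   3, "Right"),
    ("Sad",   1): ("Sad",   2, "Right"),
    ("Sad",   2): ("Sad",   2, "Right"),
    ("Sad",   3): "Stop",
}

def run(table, paper, state="Happy"):
    """Run the machine until its table says Stop or the head leaves the paper."""
    paper = list(paper)
    head = 0
    while 0 <= head < len(paper):
        action = table[(state, paper[head])]
        if action == "Stop":
            break
        state, paper[head], move = action
        head += 1 if move == "Right" else -1
    return paper

paper = [1, 2, 1, 1, 2, 1, 3, 1, 2, 1, 2, 2, 1, 1, 2, 3]
print(run(table, paper))
# -> [1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 2, 2, 2, 3]
```

Note that the table here is ordinary data handed to a fixed function, which is exactly the move the text attributes to Mr. Babbage: the numbers you give the computer tell it what to do.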
Here's a hacky solution. I suspect that it is actually not even a valid solution since I'm not very familiar with the subject matter, but I'm interested in finding out why.
The relationship between one's map and the territory is much easier to explain from outside than from inside. Hypotheses about another entity's map can be complete hypotheses about the territory if they make predictions based on that entity's physical responses.
Therefore: can't we sidestep the problem by having the AI consider its future map state as a step in the middle of its hypothetical explanation of how some other AI in the territory would react to a given territory state? The hacky part then is to just hard-wire the AI to consider any such hypotheses as being potentially about itself, to be confirmed or disconfirmed by reflecting on its own output (perhaps via some kind of loopback).
AIUI this should allow the AI to consider any hypothesis about its own operation without requiring that it be able to deeply reflect on its own map as part of the territory, which seems to be the source of the trouble.