Stuart_Armstrong comments on Siren worlds and the perils of over-optimised search - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (411)
I could be wrong, but I believe this argument relies on an inconsistent pair of assumptions: that we have solved the problem of creating an infinitely powerful AI, but have not solved the problem of operationally defining commonplace English words (words that hundreds of millions of people successfully understand) in a way that lets a computer perform operations with them.
It seems to me that the strong AI problem is many orders of magnitude more difficult than the problem of rigorously defining terms like "liberty". A relatively small part of the processing power of one human brain seems sufficient to perform operations on terms like "liberty" or "paternalism" and to use them meaningfully, so defining them is a much, much smaller problem than creating even a single human-level AI, let alone a vastly superhuman one.
If in our imaginary scenario we can't even define "liberty" in such a way that a computer can use the term, it doesn't seem very likely that we can build any kind of AI at all.
Yes. Here's another brute force approach: upload a brain (without understanding it), run it very fast with simulated external memory, subject it to evolutionary pressure. All this can be done with little philosophical and conceptual understanding, and certainly without any understanding of something as complex as liberty.
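The "run it very fast under evolutionary pressure" procedure is essentially greedy mutate-and-select. As a toy illustration only (nothing here models an actual brain upload; the vector of parameters and the fitness function are invented stand-ins), a minimal sketch in Python:

```python
import random

def evolve(candidate, fitness, mutate, generations=1000, seed=0):
    """Toy 'evolutionary pressure' loop: keep a mutation only if
    it does not lower fitness (greedy hill-climbing)."""
    rng = random.Random(seed)
    best, best_score = candidate, fitness(candidate)
    for _ in range(generations):
        variant = mutate(best, rng)
        score = fitness(variant)
        if score >= best_score:
            best, best_score = variant, score
    return best, best_score

# Hypothetical stand-in for the thing being evolved: a parameter
# vector whose (invented) fitness peaks at the all-ones point.
fitness = lambda v: -sum((x - 1.0) ** 2 for x in v)
mutate = lambda v, rng: [x + rng.gauss(0, 0.1) for x in v]

best, score = evolve([0.0] * 5, fitness, mutate)
print(score)  # close to the maximum of 0
```

The point of the sketch is that the loop needs no understanding of what is being evolved, only a way to mutate it and a way to score it, which is exactly the "little philosophical and conceptual understanding" property claimed above.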
If you can do that, then you can just find someone who you think understands what we mean by "liberty" (ideally someone with a reasonable familiarity with Kant, Mill, Dworkin and other relevant writers), upload their brain without understanding it, and ask the uploaded brain to judge the matter.
(Off-topic: I suspect that you cannot actually get a markedly superhuman AI that way, because the human brain could well be at or near a peak in the evolutionary landscape, so that there is no evolutionary pathway from a current human brain to a vastly superhuman brain. Nothing I am aware of in the laws of physics or biology says that there must be any such pathway, and since evolution is purposeless it would be an amazing lucky break if it turned out that we were on the slope of the highest peak there is, and that the peak extends to God-like heights. That would be like putting evolutionary pressure on a cheetah and discovering that we can evolve a cheetah that runs at a significant fraction of c.
However, I believe my argument still works even if I accept for the sake of argument that we are on such a peak in the evolutionary landscape, and that creating God-like AI is just a matter of running a simulated human brain under evolutionary pressure for a few billion simulated years. If we have that capability, then we must also be able to run a simulated philosopher who knows what "liberty" refers to.)
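The "peak in the evolutionary landscape" worry can be made concrete with a toy example: greedy mutate-and-select on an invented one-dimensional landscape with a low peak near the starting point and a much higher peak further away. With only small mutations, the search climbs the nearby peak and never crosses the valley to the higher one (all numbers here are made up purely for illustration):

```python
import random

# Invented landscape: a local peak at x = 1 (height 1) and a much
# higher peak at x = 10 (height 100), separated by a flat valley.
def fitness(x):
    return max(1 - (x - 1) ** 2, 100 - 10 * (x - 10) ** 2, 0)

rng = random.Random(1)
x = 0.0  # start near the low peak
for _ in range(10_000):
    step = x + rng.gauss(0, 0.1)  # small mutations only
    if fitness(step) >= fitness(x):
        x = step

print(x)  # ends up trapped near the local peak at x = 1
```

Once the search reaches the lower peak, every mutation that would descend into the valley is rejected, so the far higher peak is unreachable: there is no monotonically improving pathway to it, which is the claim being made about brains.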
EDIT: Downvoting this without explaining why you disagree doesn't help me understand why you disagree.
And would their understanding of liberty remain stable under evolutionary pressure? That seems unlikely.
Have not been downvoting it.
I didn't think we needed to put the uploaded philosopher under billions of years of evolutionary pressure. We would put your hypothetical pre-God-like AI in one bin and evolve it under pressure until it becomes God-like, and then we would upload the philosopher separately and use them as a consultant.
(As before I think that the evolutionary landscape is unlikely to allow a smooth upward path from modern primate to God-like AI, but I'm assuming such a path exists for the sake of the argument).
And then we have to ensure the AI follows the consultant (probably doable) and define what querying process is acceptable (very hard).
But your solution (which is close to Paul Christiano's) works whatever the AI is; we just need to be able to upload a human. My point, that we could conceivably create an AI without understanding any of the hard problems, still stands. If you want, I can refine it to allow partial uploads: we can upload brains, but they don't function as stable humans, as we haven't mapped all the fine details we need to. However, we can use these imperfect uploads, plus a bit of evolution, to produce AIs. And here we have no understanding of how to control their motivations at all.
I won't argue against the claim that we could conceivably create an AI without knowing anything about how to create an AI. It's trivially true in the same way that we could conceivably turn a monkey loose on a typewriter and get strong AI.
I also agree with you that if we got an AI that way we'd have no idea how to get it to do any one thing rather than another and no reason to trust it.
I don't currently agree that we could make such an AI using a non-functioning brain model plus "a bit of evolution". I am open to argument on the topic but currently it seems to me that you might as well say "magic" instead of "evolution" and it would be an equivalent claim.
Why are you confident that an AI that we do develop will not have these traits? You agree the mindspace is large, and you agree we can develop some cognitive abilities without understanding them. If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments, and that the AI will likely be developed for a military or commercial purpose, I don't see why you would have high confidence that they will converge on a safe design.
Why do you think such an AI wouldn't just fail at being powerful, rather than being powerful in a catastrophic way?
If programs fail in the real world then they are not working well. You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.
If it fails at being powerful, we don't have to worry about it, so I feel free to ignore those probabilities.
But you might come across a program motivated to eliminate all humans if you designed it to optimise the economy...
(EDIT: See below.) I'm afraid that I am now confused. I'm not clear on what you mean by "these traits", so I don't know what you think I am being confident about. You seem to think I'm arguing that AIs will converge on a safe design and I don't remember saying anything remotely resembling that.
EDIT: I think I figured it out on the second or third attempt. I'm not 100% committed to the proposition that if we make an AI and know how we did so, we can definitely make sure it's fun and friendly, as opposed to fundamentally uncontrollable and unknowable. However, it seems virtually certain to me that we will figure out a significant amount about designing AIs to do what we want in the process of developing them. People who subscribe to various "FOOM" theories about AI coming out of nowhere will probably disagree with this, as is their right, but I don't find any of those theories plausible.
I also hope I didn't give the impression that I thought it was meaningfully possible to create a God-like AI without understanding how to make AI. It's conceivable in that such a creation story is not a logical contradiction like a square circle or a colourless green dream sleeping furiously, but that is all. I think it is actually staggeringly unlikely that we will make an AI without either knowing how to make an AI, or knowing how to upload people who can then make an AI and tell us how they did it.
Significant is not the same as sufficient. How low do you think the probability of negative AI outcomes is, and what are your reasons for being confident in that estimate?
For the same reason a jet engine doesn't have comfy chairs: with all machines, you develop the core physical and mathematical principles first, and then add human comforts.
The core mathematical and physical principles behind AI are believed, not without reason, to be efficient cross-domain optimization. There is no reason for an arbitrarily-developed Really Powerful Optimization Process to have anything in its utility function dealing with human morality; in order for it to be so, you need your AI developers to be deliberately aiming at Friendly AI, and they need to actually know something about how to do it.
And then, if they don't know enough, you need to get very, very, very lucky.
That's what happens when Friendly is used to mean both Fun and Safe.
Early jets didn't have comfy chairs, but they did have ejector seats. Safety was a concern.
If an AI researcher feels their AI might kill them, they will have every motivation to build in safety features.
That has nothing to do with making an AI Your Plastic Pal Who's Fun To Be With.
MIRI's arguments aren't about deliberate weaponisation; they are about the inadvertent creation of dangerous AI by competent and well-intentioned people.
The weaponisation of AI has almost happened already in the form of Stuxnet, and it is significant that there were a lot of safeguards built into it. AI researchers seem to be aware enough.
I have no idea why the querying process would have to be hard. Is David Frost some super genius?
"Defining what querying process is acceptable" is the hard part.
The justification of which is?
That no one has come close to providing a successful approach on how to do this, and that each proposal fails in very similar ways. There is no ontologically fundamental difference between an acceptable and an unacceptable query, and drawing a practical boundary has so far proved to be impossible.
If you have a solution to that, then I advise you to analyse it carefully, and then put it up as a top-level post. Since it would half-solve the whole FAI problem, it would garner great interest.