Because an AI built as a utility-maximizer will consider any rules restricting its ability to maximize its utility as obstacles to be overcome. If an AI is sufficiently smart, it will figure out a way to overcome those obstacles. If an AI is superintelligent, it will figure out ways to overcome those obstacles which humans cannot predict even in theory and so cannot prevent even with multiple well-phrased fail-safes.
A paperclip maximizer with a built-in rule "Only create 10,000 paperclips per day" will still want to maximize paperclips. It can do this by deleting the offending fail-safe, or by creating other paperclip maximizers without the fail-safe, or by creating giant paperclips which break up into millions of smaller paperclips of their own accord, or by connecting the Earth to a giant motor which spins it at near-light speed and changes the length of a day to a fraction of a second.
Unless you feel confident you can think of every way it will get around the rule and block it off, and think of every way it could get around those rules and block them off, and so on ad infinitum, the best thing to do is to build the AI so it doesn't want to break the rules - that is, Friendly AI. That way you have the AI cooperating with you instead of trying to thwart you at every turn.
Related: Hidden Complexity of Wishes
True, but irrelevant: humanity has never produced a provably correct software project anywhere near as complex as an AI would be, and we probably never will. Even if we had a mathematical proof, it still wouldn't be a complete guarantee of safety, because the proof might contain errors and might not cover every case we care about.
The right question to ask is not, "will this safeguard make my AI 100% safe?", but rather "will this safeguard reduce, increase, or have no effect on the probability of disaster, and by how much?" (And then separately, at some point, "what is the probability of disaster now and what is the EV of launching vs. waiting?" That will depend on a lot of things that can't be predicted yet.)
What is the difference between "a rule" and "what it wants"?
I'm interpreting this as the same question you wrote below as "What is the difference between a constraint and what is optimized?". Dave gave one example but a slightly different metaphor comes to my mind.
Imagine an amoral businessman in a country that takes half his earnings as tax. The businessman wants to maximize money, but faces the constraint that half his earnings get taken as tax. So in order to achieve his goal of maximizing money, the businessman sets up some legally permissible deal with a foreign tax shelter, or funnels his earnings into holding corporations, or something else to avoid taxes. Doing this is the natural result of his money-maximization goal, and it still satisfies the "pay taxes" constraint.
Contrast this to a second, more patriotic businessman who loved paying taxes because it helped his country, and so didn't bother setting up tax shelters at all.
The first businessman has the motive "maximize money" and the constraint "pay taxes"; the second businessman has the motive "maximize money and pay taxes".
From the viewpoint of the government, the first businessman is an unFriendly agent with a constraint, and the second businessman is a Friendly agent.
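The two businessmen can be made concrete with a toy optimizer. This is purely illustrative code (all the names and numbers are made up for the example): both agents search the same strategy space, and the only difference is whether "pay taxes" lives in the constraint check or in the objective function.

```python
# Toy illustration: a constraint filters strategies; a value changes the ranking.

strategies = {
    # name: (pre-tax profit, taxes actually paid)
    "plain_business":   (100, 50),   # pay the full 50% tax
    "offshore_shelter": (100, 1),    # legal loophole: almost no tax
}

def satisfies_tax_law(profit, taxes):
    # The constraint only checks legality, not spirit: shelters count as compliant.
    return taxes >= 0

# Agent 1: maximize after-tax money, subject to the legality constraint.
constrained = max(
    (s for s in strategies if satisfies_tax_law(*strategies[s])),
    key=lambda s: strategies[s][0] - strategies[s][1],
)

# Agent 2: taxes paid are part of what it values, not a rule imposed on it.
friendly = max(
    strategies,
    key=lambda s: (strategies[s][0] - strategies[s][1]) + 2 * strategies[s][1],
)

print(constrained)  # offshore_shelter -- the constraint gets routed around
print(friendly)     # plain_business -- paying taxes is part of what it wants
```

The point of the sketch: the first agent routes around the rule as a side effect of ordinary optimization, while the second never looks for the loophole, because avoiding taxes would lower its own score.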
Does that help answer your question?
Asking what it really values is anthropomorphic. It's not coming up with loopholes around the "don't murder people" constraint because it doesn't really value it, or because the paperclip part is its "real" motive.
It will probably come up with loopholes around the "maximize paperclips" constraint too - for example, if "paperclip" is defined by something paperclip-shaped, it will probably create atomic-scale nanoclips because these are easier to build than full-scale human-sized ones, much to the annoyance of the office-supply company that built it.
But paperclips are pretty simple. Add a few extra constraints and you can probably specify "paperclip" to a degree that makes them useful for office supplies.
Human values are really complex. "Don't murder" doesn't capture human values at all - if Clippy encases us in carbonite so that we're still technically alive but not around to interfere with paperclip production, ve has fulfilled the "don't murder" imperative, but we would count this as a fail. This is not Clippy's "fault" for deliberately trying to "get around" the anti-murder constraint, it's ...
The space of possible AI behaviours is large; you can't succeed by ruling parts of it out. It would be like a cake recipe that went:
- Don't use avocados.
- Don't use a toaster.
- Don't use vegetables. ...
Clearly the list can never be long enough. Chefs have instead settled on the technique of actually specifying what to do. (Of course the analogy doesn't stretch very far; AI is less like trying to bake a cake, and more like trying to build a chef.)
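The recipe argument can be sketched numerically: over even a modestly sized action space, a hand-written blacklist rules out a vanishing fraction of behaviours. This toy example (the "actions" and blacklist entries are stand-ins, not a real proposal) uses short strings as a proxy for the space of possible behaviours.

```python
# Toy sketch of why blacklists fail in a large action space (illustrative only).
import itertools
import string

# "Action space": every lowercase string up to length 3 stands in for the
# vastly larger space of possible AI behaviours.
actions = ["".join(p) for n in range(1, 4)
           for p in itertools.product(string.ascii_lowercase, repeat=n)]

# Hand-written "don't do X" rules.
blacklist = {"bad", "axe", "war"}

allowed = [a for a in actions if a not in blacklist]

# The blacklist removes three actions out of 18,278; nearly every
# unanticipated action -- good or catastrophic -- remains permitted.
print(len(actions), len(allowed))  # 18278 18275
```

And the real action space isn't length-3 strings; it grows exponentially with the complexity of the behaviour, while the blacklist grows only as fast as humans can think of failure modes.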
A huge problem with failsafes is that a failsafe you hardcode into the seed AI is not likely to be reproduced in the next iteration that is built by the seed AI, which has, but does not care about, the failsafe. Even if some are left in as a result of the seed reusing its own source code, they are not likely to survive many iterations.
Does anyone who proposes failsafes have an argument for why their proposed failsafes would be persistent over many iterations of recursive self-improvement?
I believe that failsafes are necessary and desirable, but not sufficient. Thinking you can solve the friendly AI problem just by defining failsafe rules is dangerously naive. You not only need to guarantee that the AI correctly reimplements the safeguards in all its successors, you also need to guarantee that the safeguards themselves don't have bugs that cause disaster.
It is not necessarily safe or acceptable for the AI to shut itself down after it's been running for a while, and there is not necessarily a clear line between the AI itself and the AI's tech...
I remember reading the argument in one of the sequence articles, but I'm not sure which one. The essential idea is that any such rules just become a problem to solve for the AI, so relying on a superintelligent, recursively self-improving machine to be unable to solve a problem is not a very good idea (unless the failsafe mechanism was provably impossible to solve reliably, I suppose. But here we're pitting human intelligence against superintelligence, and I, for one, wouldn't bet on the humans). The more robust approach seems to be to make the AI motivated to not want to do whatever the failsafe was designed to prevent it from doing in the first place, i.e. Friendliness.
This came up in the latest London Meetup, where I voiced a thought I've been having for a while. What if we created an epistemic containment area, effectively a simulated universe that contains the problem that we want solved? The AI will not even know anything else outside that universe exists and will have no way of gaining information about it. I think ciphergoth mentioned this is David Chalmers' proposal too? In any case, I suspect we could prove containment within such a space, with us having read-only access to the results of the process.
This is a very timely question for me. I asked something very similar of Michael Vassar last week. He pointed me to Eliezer's "Creating Friendly AI 1.0" paper and, like you, I didn't find the answer there.
I've wondered if the Field of Law has been considered as a template for a solution to FAI--something along the lines of maintaining a constantly-updating body of law/ethics on a chip. I've started calling it "Asimov's Laws++." Here's a proposal I made on the AGI discussion list in December 2009:
"We all agree that a few simple laws...
Where are the arguments concerning this suggestion?
I once tried to fathom the arguments, I'm curious to hear your take on it.
You can't be serious. Human lawyers find massive logical loopholes in the law all the time, and at least their clients aren't capable of immediately taking over the world given the opportunity.
"We all agree that a few simple laws (a la Asimov) are inadequate for guiding AGI behavior. Why not require all AGIs be linked to a SINGLE large database of law--legislation, orders, case law, pending decisions--to account for the constant shifts [in what's prohibited and what's allowed]? Such a corpus would be ever-changing and reflect up-to-the-minute legislation and decisions on all matters man and machine. Presumably there would be some high-level guiding laws, like the US Constitution and Bill of Rights, to inform the sub-nanosecond decisions. And when an AGI has milliseconds to act, it can inform its action using analysis of the deeper corpus. Surely a 200-volume set of international law would be a cakewalk for an AGI. The latest version of the corpus could be stored locally in most AGIs, and just key parts locally in low-end models--with all being promptly and wirelessly updated as appropriate.
This seems like a reasonable solution given the need to navigate in a complex, ever changing, context-dependent universe."
Given this approach, AIs' goals and motivations might be mostly decoupled from an ethics module. An AI could make plans and set goals using any cognitive processes it deems fit. However, before taking actions, the AI must check the corpus to make sure its desired actions are legal. If they are not legal, the AI must consider other actions or suffer the wrath of law enforcement (from fines to rehabilitation). This legal system of the future would be similar to what we're familiar with today, including being managed as a collaborative process between lots of agents (human and machine citizens, legislators, judges, and enforcers). Unlike current legal systems, however, it could hopefully be more nimble, fair, and effective given emerging computer-related technologies and methods (e.g., AI, WiFi, ubiquitous sensors, cheap/powerful processors, decision theory, Computational Law, ...).
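The decoupling being proposed might be sketched like this (every function and field name here is a hypothetical placeholder, not a real API): planning is unconstrained, and a legality filter sits between the planner and the actuators.

```python
# Sketch of the proposed decoupling: plan freely, but check every candidate
# action against the legal corpus before executing it. Illustrative only.

def is_legal(action_name, corpus):
    # Stand-in for querying the up-to-date body of law.
    return action_name not in corpus["prohibited"]

def choose_action(candidates, corpus):
    # The AI ranks plans however it likes, then filters by legality.
    for action in sorted(candidates, key=lambda a: a["utility"], reverse=True):
        if is_legal(action["name"], corpus):
            return action["name"]
    return "no_op"  # nothing legal found; refrain from acting

corpus = {"prohibited": {"seize_power", "strip_mine_earth"}}
plans = [
    {"name": "strip_mine_earth", "utility": 100},
    {"name": "make_paperclips", "utility": 10},
]
print(choose_action(plans, corpus))  # make_paperclips
```

Note that the whole scheme rests on the coverage of `is_legal`: the objections raised in the replies below amount to saying that a superintelligent planner will find high-utility actions the prohibited list never anticipated.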
This seems like a potentially practical, flexible, and effective approach given its long history of human precedent. AIs could even refer to the appropriate corpus when traveling in different jurisdictions (e.g., Western Law, Islamic Law, Chinese Law) in advance of more universal laws/ethics that might emerge in the future.
This approach should make most runaway paperclip-production scenarios off-limits. Such behavior would seem to violate a myriad of laws (human welfare, property rights, speeding (?)) and would be dealt with harshly.
Perhaps this might be seen as a kind of practical implementation of CEV?
Complex problems require complex solutions.
Comments? Pointers?
Surely a 200 volume set of international law would be a cakewalk for an AGI.
It seems like an applause light to invoke international law as a solution to almost anything, particularly this problem. What aspect of having rules made in a compromise of politicking makes it less likely to have exploitable loopholes than any other system?
If they are not legal, the AI must consider other actions or suffer the wrath of law enforcement (from fines to rehabilitation).
Fines? The misdoing we're worried about is seizing power. Fines would require power sufficient...
Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?