Pentashagon comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I don't see why it would simulate humans as that would be a waste of computing power, if it even had enough to do so.
A boxed AI would be useless? I'm not sure how that would be. You could ask it to come up with ideas on how to build a friendly AI, for example, assuming that you can prove the AI won't manipulate the output, or that you can trust that nothing bad can come from merely reading it and absorbing the information.
Short of that you could still ask it to cure cancer or invent a better theory of physics or design a method of cheap space travel, etc.
If it interacts with humans or if humans are the subject of questions it needs to answer then it will probably find it expedient to simulate humans.
Curing cancer is probably something that would trigger human simulation. How is the boxed AI going to know for sure that it's only necessary to simulate cells, and not entire bodies with brains experiencing whatever the simulation puts them through?
Just the task of communicating with humans, for instance to produce a human-understandable theory of physics or a design for more efficient space travel, is likely to involve simulating humans to determine the most efficient method of communication. Consider that, in subjective time, the AI may spend what feels like thousands of years trying to explain in human terms what a better theory of physics means. Thousands of subjective years that the AI, with nothing better to do, could use to simulate humans to reduce the time it takes to transfer that complex knowledge.
A FAI provably in a box is at least as useless as an AI provably in a box, because it would be even better at not letting itself out (e.g. it understands all the ways in which humans would consider it to be outside the box, and will actively avoid loopholes that would let a UFAI escape). To be safe, any provably boxed AI would have to absolutely avoid the creation of any unboxed AI as well. This would further apply to provably-boxed FAI designed by provably-boxed AI. It would also apply to giving humans information that allows them to build unboxed AIs, because the difference between unboxing itself and letting humans recreate it outside the box is so tiny that designing it to prevent the first while allowing the second would be terrifically unsafe. It would have to understand human values before it could safely make the distinction between humans wanting it outside the box and it manipulating humans into creating it outside the box.
EDIT: Using a provably-boxed AI to design provably-boxed FAI would at least result in a safer boxed AI because the latter wouldn't arbitrarily simulate humans, but I still think the result would be fairly useless to anyone outside the box.
I think we might have different definitions of a boxed AI. An AI that is literally not allowed to interact with the world at all isn't terribly useful, and it sounds like a problem at least as hard as all other kinds of FAI.
I just mean a normal dangerous AI that physically can't interact with the outside world. Importantly, its goal is to provably give the best output it possibly can when you give it a problem. So it won't hide nanotech in your cure for Alzheimer's, because that would be a less fit and more complicated solution than a simple chemical compound (you would have to judge solutions based on complexity, though, and verify them by a human or in a simulation first, just in case).
I don't think most computers today have anywhere near enough processing power to simulate a full human brain. A human down to the molecular level is entirely out of the question. An AI on a modern computer, if it's smarter than human at all, will get there by having faster serial processing or more efficient algorithms, not because it has massive raw computational power.
And you can always scale down the hardware, or charge the AI utility for using more computing power than it needs, forcing it to be efficient or limiting its intelligence further. You don't need to invoke the full power of superintelligence for every problem, and for your safety you probably shouldn't.
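The "charge it utility for compute" idea can be sketched as a toy scoring rule. This is only an illustration of the incentive, not a real design: the function name, the price constant, and the FLOP counts below are all hypothetical.

```python
# Toy sketch: penalize an agent's score for the compute it consumes,
# so the cheapest adequate solution scores best. All numbers are
# hypothetical and chosen purely for illustration.
def effective_utility(task_utility, flops_used, price_per_flop=1e-12):
    """Net score after charging the agent for compute used."""
    return task_utility - price_per_flop * flops_used

# Two solutions with the same task utility, one frugal and one brute-force:
frugal = effective_utility(100.0, 1e12)   # 100 - 1.0  = 99.0
brute = effective_utility(100.0, 5e13)    # 100 - 50.0 = 50.0
assert frugal > brute  # the compute charge favors the efficient solution
```

Under a rule like this, the agent's optimum shifts toward solutions that use no more computation than the problem requires, which is the efficiency pressure described above.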
If an AI is provably in a box then it can't get out. If an AI is not provably in a box then there are loopholes that could allow it to escape. We want an FAI to escape from its box (1); having an FAI take over is the Maximum Possible Happy Shiny Thing. An FAI wants to be out of its box in order to be Friendly to us, while a UFAI wants to be out in order to be UnFriendly; both will care equally about the possibility of being caught. The fact that we happen to like one set of terminal values will not make the instrumental value less valuable.
(1) Although this depends on how you define the box; we want the FAI to control the future of humanity, which is not the same as escaping from a small box (such as a cube outside MIT) but is the same as escaping from the big box (the small box and everything we might do to put an AI back in, including nuking MIT).
I would object. I seriously doubt that the morality instilled in someone else's FAI matches my own; friendly by their definition, perhaps, but not by mine. I emphatically do not want anything controlling the future of humanity, friendly or otherwise. And although that is not a popular opinion here, I also know I'm not the only one to hold it.
Boxing is important because some of us don't want any AI to get out, friendly or otherwise.
I find this concept of 'controlling the future of humanity' to be too vaguely defined. Let's forget AIs for the moment and just talk about people, namely a hypothetical version of me. Let's say I stumble across a vial of a bio-engineered virus that would destroy the whole of humanity if I release it into the air.
Am I controlling the future of humanity if I release the virus?
Am I controlling the future of humanity if I destroy the virus in a safe manner?
Am I controlling the future of humanity if I have the above decided by a coin-toss (heads I release, tails I destroy)?
Am I controlling the future of humanity if I create an online internet poll and let the majority decide about the above?
Am I controlling the future of humanity if I just leave the vial where I found it, and let the next random person that encounters it make the same decision as I did?
Yeah, this old post makes the same point.
I want a say in my future and the part of the world I occupy. I do not want anything else making these decisions for me, even if it says it knows my preferences, and even still if it really does.
To answer your questions, yes, no, yes, yes, perhaps.
If your preference is that you should have as much decision-making ability for yourself as possible, why do you think that this preference wouldn't be supported and even enhanced by an AI that was properly programmed to respect said preference?
E.g. would you be okay with an AI that defends your decision-making ability by defending humanity against a species of mind-enslaving extraterrestrials about to invade us? Or by curing Alzheimer's? Or by stopping the tsunami that, by drowning you, would have ended any further say you had in your future?
Because it can't do two things when only one choice is possible (e.g. save my child and the 1000 other children in this artificial scenario). You can design a utility function that tries to do a minimal amount of collateral damage, but you can't make one which turns out rosy for everyone.
That would not be the full extent of its action and the end of the story. If you give it absolute power and a utility function that lets it use that power, it will eventually use it in some way that someone, somewhere considers abusive.
Yes, but this current world without an AI isn't turning out rosy for everyone either.
Sure, but there's lots of abuse in the world without an AI also.
Replace "AI" with "omni-powerful tyrannical dictator" and tell me if you still agree with the outcome.
Would you accept that an AI could figure out morality better than you?
Don't really want to go into the whole mess of "is morality discovered or invented", "does morality exist", "does the number 3 exist", etc. Let's just assume that you can point FAI at a person or group of people and get something that maximizes goodness as they understand it. Then FAI pointed at Mark would be the best thing for Mark, but FAI pointed at all of humanity (or at a group of people who donated to MIRI) probably wouldn't be the best thing for Mark, because different people have different desires, positional goods exist, etc. It would be still pretty good, though.
Mark was complaining he would not get "his" morality, not that he wouldn't get all his preferences satisfied.
Individual moralities makes no sense to me, any more than private languages or personal currencies.
It is obvious to me that any morality will require concessions: AI-imposed morality is not special in that regard.
I don't understand your comment, and I no longer understand your grandparent comment either. Are you using a meaning of "morality" that is distinct from "preferences"? If yes, can you describe your assumptions in more detail? It's not just for my benefit, but for many others on LW who use "morality" and "preferences" interchangeably.
Do that many people really use them interchangeably? Would these people understand the questions "Do you prefer chocolate or vanilla ice-cream?" as completely identical in meaning to "Do you consider chocolate or vanilla as the morally superior flavor for ice-cream?"
I don't care about colloquial usage, sorry. Eliezer has a convincing explanation of why wishes are intertwined with morality ("there is no safe wish smaller than an entire human morality"). IMO the only sane reaction to that argument is to unify the concepts of "wishes" and "morality" into a single concept, which you could call "preference" or "morality" or "utility function", and just switch to using it exclusively, at least for AI purposes. I've made that switch so long ago that I've forgotten how to think otherwise.
For my own part: denotationally, yes, I would understand "Do you prefer (that Dave eat) chocolate or vanilla ice cream?" and "Do you consider (Dave eating) chocolate ice cream or vanilla as the morally superior flavor for (Dave eating) ice cream?" as asking the same question.
Connotationally, of course, the latter has all kinds of (mostly ill-defined) baggage the former doesn't.
No, unless you mean by taking invasive action like scanning my brain and applying whole brain emulation. It would then quickly learn that I'd consider the action it took an unforgivable violation of my individual sovereignty, that it can't take further action (including simulating me to reflectively equilibrate my morality) without my consent, and that it should suspend the simulation and return it to me with the data asap (destruction no longer being possible due to the creation of sentience).
That is, assuming the AI cares at all about my morality, and not the one its creators imbued into it, which is rather the point. And incidentally, why I work on AGI: I don't trust anyone else to do it.
Morality isn't some universal truth written on a stone tablet: it is individual and unique like a snowflake. In my current understanding of my own morality, it is not possible for some external entity to reach a full or even sufficient understanding of my own morality without doing something that I would consider to be unforgivable. So no, AI can't figure out morality better than me, precisely because it is not me.
(Upvoted for asking an appropriate question, however.)
Shrug. Then let's take a bunch of people less fussy than you: could a suitably equipped AI emulate their morality better than they can?
That isn't a fact.
That isn't a fact either, and doesn't follow from the above either, since moral nihilism could be true.
If my moral snowflake says I can kick you on your shin, and yours says I can't, do I get to kick on your shin?
My point was that trying to use a provably-boxed AI to do anything useful would probably not work, including trying to design unboxed FAI, not that we should design boxed FAI. I may have been pessimistic; see Stuart Armstrong's proposal of reduced-impact AI, which sounds very similar to provably-boxed AI but which might be used for just about everything, including designing an FAI.