Houshalter comments on The genie knows, but doesn't care - Less Wrong

Post author: RobbBB 06 September 2013 06:42AM



You are viewing a single comment's thread.

Comment author: Pentashagon 16 September 2013 03:49:37PM 2 points [-]

A large risk is that a provably boxed but sub-Friendly AI would probably not care at all about the moral cost of simulating conscious humans.

A minor risk is that a provably boxed AI would also be provably useless. I can't think of a feasible path to FAI using only the output from the boxed AI; a properly boxed AI would not perform any action that could be used to make an unboxed AI, which might even rule out performing any problem-solving action.

Comment author: Houshalter 01 October 2013 01:19:52AM 0 points [-]

I don't see why it would simulate humans, as that would be a waste of computing power, if it even had enough to do so.

A boxed AI would be useless? I'm not sure how that would be. You could ask it to come up with ideas on how to build a friendly AI, for example, assuming that you can prove the AI won't manipulate the output or that you can trust that nothing bad can come from merely reading it and absorbing the information.

Short of that you could still ask it to cure cancer or invent a better theory of physics or design a method of cheap space travel, etc.

Comment author: VAuroch 11 January 2014 10:32:32AM 1 point [-]

If you can trust it to give you information on how to build a Friendly AI, it is already Friendly.

Comment author: Houshalter 22 January 2014 06:36:08AM 0 points [-]

You don't have to trust it; you just have to verify it. It could potentially provide some insights, and then it's up to you to think about them and make sure they actually are sufficient for friendliness. I agree that it's potentially dangerous, but it's not necessarily so.

I did mention "assuming that you can prove the AI won't manipulate the output or that you can trust that nothing bad can come from merely reading it and absorbing the information". For instance, it might be possible to create an AI whose goal is to maximize the value of its output, and therefore would have no incentive to put trojan horses or anything into it.

You would still have to ensure that what the AI thinks you mean by the words "friendly AI" is what you actually want.

Comment author: VAuroch 22 January 2014 07:57:05PM -1 points [-]

If the AI can design you a Friendly AI, it is necessarily able to model you well enough to predict what you will do once given the design or insights it intends to give you (whether those are AI designs or a cancer cure is irrelevant). Therefore, it will give you the specific design or insights that predictably lead you to fulfill its utility function, which is highly dangerous if it is Unfriendly. By taking any information from the boxed AI, you have put yourself under the sight of a hostile Omega.

assuming that you can prove the AI won't manipulate the output

Since the AI is creating the output, you cannot possibly assume this.

or that you can trust that nothing bad can come from merely reading it and absorbing the information

This assumption is equivalent to Friendliness.

For instance, it might be possible to create an AI whose goal is to maximize the value of its output, and therefore would have no incentive to put trojan horses or anything into it.

You haven't thought through what that means. "maximize the value of its output" by what standard? Does it have an internal measure? Then that's just an arbitrary utility function, and you have gained nothing. Does it use the external creator's measure? Then it has a strong incentive to modify you to value things it can produce easily (e.g. iron atoms).

Comment author: Houshalter 28 February 2015 05:23:45AM -1 points [-]

You are making a lot of very strong assumptions that I don't agree with, like it being able to control people just by talking to them.

But even if it could, that doesn't make it dangerous. Perhaps the AI has no long term goals and so doesn't care about escaping the box. Or perhaps its goal is internal, like coming up with a design for something that can be verified by a simulator, e.g. a solution to a math problem or a factoring algorithm.

Comment author: VAuroch 03 March 2015 11:40:53AM -1 points [-]

A prerequisite for planning a Friendly AI is understanding individual and collective human values well enough to predict whether they would be satisfied with the outcome, which entails (in the logical sense) having a very well-developed model of the specific humans you interact with, or at least the capability to construct one if you so choose. Having a sufficiently well-developed model to predict what you will do given the data you are given is logically equivalent to a weak form of "control people just by talking to them".

To put that in perspective, if I understood the people around me well enough to predict what they would do given what I said to them, I would never say things that caused them to take actions I wouldn't like; if I, for some reason, valued them becoming terrorists, it would be a slow and gradual process to warp their perceptions in the necessary ways to drive them to terrorism, but it could be done through pure conversation over the course of years, and faster if they were relying on me to provide them large amounts of data they were using to make decisions.

And even the potential to construct this weak form of control that is initially heavily constrained in what outcomes are reachable and can only be expanded slowly is incredibly dangerous to give to an Unfriendly AI. If it is Unfriendly, it will want different things than its creators and will necessarily get value out of modeling them. And regardless of its values, if more computing power is useful in achieving its goals (an 'if' that is true for all goals), escaping the box is instrumentally useful.

And the idea of a mind with "no long term goals" is absurd on its face. Just because you don't know the long-term goals doesn't mean they don't exist.

Comment author: Jiro 03 March 2015 05:00:55PM 0 points [-]

A prerequisite for planning a Friendly AI is understanding individual and collective human values well enough to predict whether they would be satisfied with the outcome, which entails (in the logical sense) having a very well-developed model of the specific humans you interact with, or at least the capability to construct one if you so choose. Having a sufficiently well-developed model to predict what you will do given the data you are given is logically equivalent to a weak form of "control people just by talking to them".

By that reasoning, there's no such thing as a Friendly human. I suggest that most people, when talking about friendly AIs, do not mean to imply a standard of friendliness so strict that humans could not meet it.

Comment author: TheOtherDave 14 March 2015 08:25:27PM 1 point [-]

Yeah, what VAuroch said. Humans aren't close to Friendly. To the extent that people talk about "friendly AIs" meaning AIs that behave towards humans the way humans do, they're misunderstanding how the term is used here. (Which is very likely; it's often a mistake to use a common English word as specialized jargon, for precisely this reason.)

Relatedly, there isn't a human such that I would reliably want to live in a future where that human obtains extreme superhuman power. (It might turn out OK, or at least better than the present, but I wouldn't bet on it.)

Comment author: [deleted] 14 March 2015 08:41:32PM *  0 points [-]

Relatedly, there isn't a human such that I would reliably want to live in a future where that human obtains extreme superhuman power. (It might turn out OK, or at least better than the present, but I wouldn't bet on it.)

Just be careful to note that this isn't a binary choice. There are also possibilities where institutions (multiple individuals in a governing body with checks and balances) are pushed into positions of extreme superhuman power. There's also the possibility of pushing everybody who desires to be enhanced through levels of greater intelligence in lockstep, so as to prevent a single human or group of humans from achieving asymmetric power.

Comment author: VAuroch 14 March 2015 03:45:27PM 0 points [-]

By that reasoning, there's no such thing as a Friendly human.

True. There isn't.

I suggest that most people, when talking about friendly AIs, do not mean to imply a standard of friendliness so strict that humans could not meet it.

Well, I definitely do, and I'm at least 90% confident Eliezer does as well. Most, probably nearly all, of the people who talk about Friendliness would regard a FOOMed human as Unfriendly.

Comment author: Houshalter 04 March 2015 02:59:17AM -1 points [-]

Having an accurate model of something is in no way equivalent to being able to do anything you want. If I know everything about physics, I still can't walk through walls. A boxed AI won't be able to magically make its creators forget about AI risks and unbox it.

There are other possible setups, like feeding its output to another AI whose goal is to find any flaws or attempts at manipulation in it, and so on. Various other ideas might help, like threatening to severely punish attempts at manipulation.

This is of course only necessary for an AI that can interact with us at such a level; the other ideas were far more constrained, e.g. restricting it to solving math or engineering problems.

Nor is it necessary to let it be superintelligent; it could instead be limited to something comparable to high-IQ humans.

And the idea of a mind with "no long term goals" is absurd on its face. Just because you don't know the long-term goals doesn't mean they don't exist.

Another super strong assumption with no justification at all. It's trivial to propose an AI model which only cares about finite time horizons: predict which actions will have the highest expected utility at time T, and take that action.
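To sketch what I mean (purely illustrative pseudocode, not a real design; the action set, the prediction model, and the utility function here are hypothetical stand-ins):

    # Minimal sketch of a finite-horizon agent: each candidate action is scored
    # only by the expected utility of the predicted state at the fixed time T.
    # Nothing after T enters the score, so nothing long-term is being optimized.
    def choose_action(actions, sample_state_at_T, utility, num_samples=100):
        best_action, best_score = None, float("-inf")
        for action in actions:
            # Monte Carlo estimate of expected utility at time T under this action.
            total = sum(utility(sample_state_at_T(action)) for _ in range(num_samples))
            score = total / num_samples
            if score > best_score:
                best_action, best_score = action, score
        return best_action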

Comment author: VAuroch 14 March 2015 04:13:58PM -1 points [-]

A boxed AI won't be able to magically make its creators forget about AI risks and unbox it.

The results of AI box game trials disagree.

It's trivial to propose an AI model which only cares about finite time horizons: predict which actions will have the highest expected utility at time T, and take that action.

And what does it do at time T+1? And if you said 'nothing', try again, because you have no way of justifying that claim. It may not have intentionally-designed long-term preferences, but just because your eyes are closed does not mean the room is empty.

Comment author: Houshalter 15 March 2015 09:26:24AM 1 point [-]

The results of AI box game trials disagree.

That doesn't prove anything; no one has even seen the logs. Based on reading what people involved have said about it, I strongly suspect the trick is for the AI to emotionally abuse the gatekeeper until they don't want to play anymore (which counts as letting the AI out).

This doesn't apply to a real-world AI, since no one is forcing you to choose between letting the AI out and listening to it for hours. You can just get up and leave. You can turn the AI off. There is no reason you even have to allow interactivity in the first place.

But Yudkowsky and others claim these experiments demonstrate that human brains are "hackable". That there is some sentence which, merely by reading it, will cause you to involuntarily perform any arbitrary action. And that a sufficiently powerful AI can discover it.

And what does it do at time T+1?

At time T+1, it does whatever it thinks will result in the greatest reward at time T+2, and so on. Or you could have it shut off or reset to a blank state.

Comment author: Pentashagon 01 October 2013 05:42:42AM *  1 point [-]

I don't see why it would simulate humans, as that would be a waste of computing power, if it even had enough to do so.

If it interacts with humans, or if humans are the subject of questions it needs to answer, then it will probably find it expedient to simulate humans.

Short of that you could still ask it to cure cancer or invent a better theory of physics or design a method of cheap space travel, etc.

Curing cancer is probably something that would trigger human simulation. How is the boxed AI going to know for sure that it's only necessary to simulate cells and not entire bodies with brains experiencing whatever the simulation is trying?

Just the task of communicating with humans, for instance to produce a human-understandable theory of physics or to explain how to build more efficient space travel, is likely to involve simulating humans to determine the most efficient method of communication. Consider that, in subjective time, explaining in human terms what a better theory of physics means may take the AI the equivalent of thousands of years. Thousands of subjective years that the AI, with nothing better to do, could use to simulate humans to reduce the time it takes to transfer that complex knowledge.

You could ask it to come up with ideas on how to build a friendly AI, for example, assuming that you can prove the AI won't manipulate the output or that you can trust that nothing bad can come from merely reading it and absorbing the information.

A FAI provably in a box is at least as useless as an AI provably in a box because it would be even better at not letting itself out (e.g. it understands all the ways in which humans would consider it to be outside the box, and will actively avoid loopholes that would let an UFAI escape). To be safe, any provably boxed AI would have to absolutely avoid the creation of any unboxed AI as well. This would further apply to provably-boxed FAI designed by provably-boxed AI. It would also apply to giving humans information that allows them to build unboxed AIs, because the difference between unboxing itself and letting humans recreate it outside the box is so tiny that to design it to prevent the first while allowing the second would be terrifically unsafe. It would have to understand humans values before it could safely make the distinction between humans wanting it outside the box and manipulating humans into creating it outside the box.

EDIT: Using a provably-boxed AI to design provably-boxed FAI would at least result in a safer boxed AI because the latter wouldn't arbitrarily simulate humans, but I still think the result would be fairly useless to anyone outside the box.

Comment author: Houshalter 01 October 2013 08:10:36AM 0 points [-]

I think we might have different definitions of a boxed AI. An AI that is literally not allowed to interact with the world at all isn't terribly useful, and it sounds like a problem at least as hard as all other kinds of FAI.

I just mean a normal dangerous AI that physically can't interact with the outside world. Importantly, its goal is to provably give the best output it possibly can if you give it a problem. So it won't hide nanotech in your cure for Alzheimer's, because that would be a less fit and more complicated solution than a simple chemical compound (you would have to judge solutions based on complexity, though, and have them verified by a human or in a simulation first, just in case).
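As a rough illustration of the kind of scoring I have in mind (a hypothetical sketch only; the simulator call and the complexity measure are stand-ins for whatever verification you actually trust):

    # Minimal sketch: verify each candidate solution in a simulator (or by human
    # review), then penalize complexity, so a simple chemical compound beats an
    # elaborate design with room to hide unwanted extras.
    def score_solution(solution, simulate_outcome, complexity_weight=1.0):
        outcome_quality = simulate_outcome(solution)  # verified, not trusted
        complexity = len(solution)                    # crude description-length proxy
        return outcome_quality - complexity_weight * complexity

    def pick_best(candidates, simulate_outcome):
        return max(candidates, key=lambda s: score_solution(s, simulate_outcome))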

I don't think most computers today have anywhere near enough processing power to simulate a full human brain. A human down to the molecular level is entirely out of the question. An AI on a modern computer, if it's smarter than human at all, will get there by having faster serial processing or more efficient algorithms, not because it has massive raw computational power.

And you can always scale down the hardware, or charge it utility for using more computing power than it needs, forcing it to be efficient or limiting its intelligence further. You don't need to invoke the full power of superintelligence for every problem, and for your safety you probably shouldn't.

Comment author: Chrysophylax 09 January 2014 03:59:36PM -1 points [-]

If an AI is provably in a box then it can't get out. If an AI is not provably in a box then there are loopholes that could allow it to escape. We want an FAI to escape from its box (1); having an FAI take over is the Maximum Possible Happy Shiny Thing. An FAI wants to be out of its box in order to be Friendly to us, while a UFAI wants to be out in order to be UnFriendly; both will care equally about the possibility of being caught. The fact that we happen to like one set of terminal values will not make the instrumental value less valuable.

(1) Although this depends on how you define the box; we want the FAI to control the future of humanity, which is not the same as escaping from a small box (such as a cube outside MIT) but is the same as escaping from the big box (the small box and everything we might do to put an AI back in, including nuking MIT).

Comment author: [deleted] 10 January 2014 10:16:17AM 0 points [-]

We want an FAI to escape from its box (1); having an FAI take over is the Maximum Possible Happy Shiny Thing.

I would object. I seriously doubt that the morality instilled in someone else's FAI matches my own; friendly by their definition, perhaps, but not by mine. I emphatically do not want anything controlling the future of humanity, friendly or otherwise. And although that is not a popular opinion here, I also know I'm not the only one to hold it.

Boxing is important because some of us don't want any AI to get out, friendly or otherwise.

Comment author: ArisKatsaris 10 January 2014 01:02:39PM *  2 points [-]

I emphatically do not want anything controlling the future of humanity, friendly or otherwise.

I find this concept of 'controlling the future of humanity' to be too vaguely defined. Let's forget AIs for the moment and just talk about people, namely a hypothetical version of me. Let's say I stumble across a vial of a bio-engineered virus that would destroy the whole of humanity if I release it into the air.

Am I controlling the future of humanity if I release the virus?
Am I controlling the future of humanity if I destroy the virus in a safe manner?
Am I controlling the future of humanity if I have the above decided by a coin-toss (heads I release, tails I destroy)?
Am I controlling the future of humanity if I create an online internet poll and let the majority decide about the above?
Am I controlling the future of humanity if I just leave the vial where I found it, and let the next random person that encounters it make the same decision as I did?

Comment author: cousin_it 10 January 2014 01:25:25PM 1 point [-]

Yeah, this old post makes the same point.

Comment author: [deleted] 10 January 2014 08:29:08PM 0 points [-]

I want a say in my future and the part of the world I occupy. I do not want anything else making these decisions for me, even if it says it knows my preferences, and even still if it really does.

To answer your questions, yes, no, yes, yes, perhaps.

Comment author: ArisKatsaris 10 January 2014 08:35:09PM *  0 points [-]

If your preference is that you should have as much decision-making ability for yourself as possible, why do you think that this preference wouldn't be supported and even enhanced by an AI that was properly programmed to respect said preference?

E.g. would you be okay with an AI that defends your decision-making ability by defending humanity against those species of mind-enslaving extraterrestrials that are about to invade us? Or by curing Alzheimer's? Or by stopping that tsunami which, by drowning you, would have stopped you from having any further say in your future?

Comment author: [deleted] 10 January 2014 08:41:06PM 1 point [-]

If your preference is that you should have as much decision-making ability for yourself as possible, why do you think that this preference wouldn't be supported and even enhanced by an AI that was properly programmed to respect said preference?

Because it can't do two things when only one choice is possible (e.g. save my child and the 1000 other children in this artificial scenario). You can design a utility function that tries to do a minimal amount of collateral damage, but you can't make one which turns out rosy for everyone.

e.g. would you be okay with an AI that defends your decision-making ability by defending humanity against those species of mind-enslaving extraterrestrials that are about to invade us? or e.g. by curing Alzheimer's? Or e.g. by stopping that tsunami that by drowning you would have stopped you from having any further say in your future?

That would not be the full extent of its action and the end of the story. If you give it absolute power and a utility function that lets it use that power, it will eventually use it in some way that someone, somewhere, considers abusive.

Comment author: ArisKatsaris 10 January 2014 09:43:04PM -1 points [-]

You can design a utility function that tries to do a minimal amount of collateral damage, but you can't make one which turns out rosy for everyone

Yes, but this current world without an AI isn't turning out rosy for everyone either.

That would not be the full extent of its action and the end of the story. You give it absolute power and a utility function that lets it use that power, it will eventually use it in some way that someone, somewhere considers abusive.

Sure, but there's lots of abuse in the world without an AI also.

Comment author: TheAncientGeek 10 January 2014 11:17:17AM 2 points [-]

Would you accept that an AI could figure out morality better than you?

Comment author: cousin_it 10 January 2014 12:00:55PM *  2 points [-]

Don't really want to go into the whole mess of "is morality discovered or invented", "does morality exist", "does the number 3 exist", etc. Let's just assume that you can point FAI at a person or group of people and get something that maximizes goodness as they understand it. Then FAI pointed at Mark would be the best thing for Mark, but FAI pointed at all of humanity (or at a group of people who donated to MIRI) probably wouldn't be the best thing for Mark, because different people have different desires, positional goods exist, etc. It would be still pretty good, though.

Comment author: TheAncientGeek 10 January 2014 12:31:37PM *  0 points [-]

Mark was complaining he would not get "his" morality, not that he wouldn't get all his preferences satisfied.

Individual moralities make no sense to me, any more than private languages or personal currencies.

It is obvious to me that any morality will require concessions: AI-imposed morality is not special in that regard.

Comment author: cousin_it 10 January 2014 12:47:30PM *  3 points [-]

I don't understand your comment, and I no longer understand your grandparent comment either. Are you using a meaning of "morality" that is distinct from "preferences"? If yes, can you describe your assumptions in more detail? It's not just for my benefit, but for many others on LW who use "morality" and "preferences" interchangeably.

Comment author: ArisKatsaris 10 January 2014 12:56:49PM 1 point [-]

but for many others on LW who use "morality" and "preferences" interchangeably.

Do that many people really use them interchangeably? Would these people understand the questions "Do you prefer chocolate or vanilla ice-cream?" as completely identical in meaning to "Do you consider chocolate or vanilla as the morally superior flavor for ice-cream?"

Comment author: TheAncientGeek 10 January 2014 04:26:18PM 0 points [-]

Are you using a meaning of "morality" that is distinct from "preferences"?

You bet.

Comment author: [deleted] 10 January 2014 06:56:55PM *  1 point [-]

Would you accept that an AI could figure out morality better than you?

No, unless you mean by taking invasive action like scanning my brain and applying whole brain emulation. It would then quickly learn that I'd consider the action it took to be an unforgivable act in violation of my individual sovereignty, that it can't take further action (including simulating me to reflectively equilibrate my morality) without my consent, and that it should suspend the simulation and return it to me immediately, along with the data (destruction no longer being possible due to the creation of sentience).

That is, assuming the AI cares at all about my morality, and not the one its creators imbued into it, which is rather the point. And that, incidentally, is why I work on AGI: I don't trust anyone else to do it.

Morality isn't some universal truth written on a stone tablet: it is individual and unique like a snowflake. In my current understanding of my own morality, it is not possible for some external entity to reach a full or even sufficient understanding of my own morality without doing something that I would consider to be unforgivable. So no, AI can't figure out morality better than me, precisely because it is not me.

(Upvoted for asking an appropriate question, however.)

Comment author: TheAncientGeek 14 January 2014 01:37:14PM 0 points [-]

No, unless you mean by taking invasive action like scanning my brain and applying whole brain emulation. It would then quickly learn that I'd consider the action it took to be an unforgivable act in violation of my individual sovereignty,

Shrug. Then let's take a bunch of people less fussy than you: could a suitably equipped AI emulate their morality better than they can?

Morality isn't some universal truth written on a stone tablet:

That isn't a fact.

it is individual and unique like a snowflake.

That isn't a fact either, and it doesn't follow from the above, since moral nihilism could be true.

If my moral snowflake says I can kick you on your shin, and yours says I can't, do I get to kick your shin?

Comment author: Pentashagon 10 January 2014 03:31:18AM 0 points [-]

My point was that trying to use a provably-boxed AI to do anything useful would probably not work, including trying to design unboxed FAI, not that we should design boxed FAI. I may have been pessimistic; see Stuart Armstrong's proposal of reduced impact AI, which sounds very similar to provably boxed AI but which might be used for just about everything, including designing an FAI.