Comment author: ChristianKl 29 August 2016 03:17:57PM 1 point [-]

(1) Given: AI risk comes primarily from AI optimizing for things besides human values.

I don't think that's a good description of the orthogonality thesis. An AI that optimizes for a single human value, like purity, could still produce huge problems.

Given: humans already are optimizing for things besides human values.

Humans don't effectively self-modify to achieve specific objectives in the way an AGI could.

(6) Given: Partial optimization for human values is easier than total optimization. (Where "partial optimization" is at least close enough to achieve an okay outcome.)

Why do you believe that?

Comment author: WhySpace 29 August 2016 10:12:41PM 0 points [-]

I don't think that's a good description of the orthogonality thesis.

Probably not, but it highlights the relevant (or at least related) portion. I suppose I could have been more precise by specifying terminal values, since things like paperclips are obviously instrumental values, at least for us.

Humans don't effectively self-modify

Agreed, except in the trivial case where we can condition ourselves to have different emotional responses. That's substantially less dangerous, though.

Partial optimization for human values is easier than total optimization.

Why do you believe that?

I'm not sure I do, in the sense that I wouldn't assign the proposition >50% probability. However, I might put the odds at around 25% for a Reduced Impact AI architecture providing a useful amount of shortcuts.

That seems like decent odds of significantly boosting expected utility. If such an AI would be faster to develop by even just a couple of years, that could make the difference between winning and losing an AI arms race. Sure, it'd be at the cost of a utopia, but if it boosted the odds of success enough, it'd still have enough expected utility to compensate.

In response to The call of the void
Comment author: WhySpace 28 August 2016 09:38:50PM 1 point [-]

I occasionally experience this, but I've never assigned it strong positive or negative affect/valence. I'm high in openness to experience, so I just kinda thought it was an academically interesting phenomenon, and haven't thought much of it, much less lost any sleep over it. It's just interesting in the same way that this is interesting.

If anyone takes it too seriously, I recommend this approach. Just clicking the xkcd link will help your brain more strongly associate the humorous and fascinating bits about it with the call of the void. I recommend trying to set up a Trigger Action Plan, so that you think of the xkcd any time you experience the call of the void.

Comment author: turchin 28 August 2016 10:34:44AM 1 point [-]

I only said that it would reduce the chance of stupid decisions resulting from not understanding basic human words and values. But it would not reduce the chances of a deliberately malicious AI.

There are (at least) two different types of UFAI: real UFAI and failed FAI. A failed FAI wanted to be good but failed; the best example is a smile maximizer which would cover the whole Solar system with smiles. (A paperclip maximizer is also a form of failed FAI, as its initial goal was positive: produce many paperclips.)

So it is not a full recipe for real FAI, but just one approach to value learning.

Comment author: WhySpace 28 August 2016 05:15:33PM 1 point [-]

I'm still not sure I understand you correctly. I suspect that if we follow this to the end, we will discover that we are only arguing semantics, and don't actually disagree over anything tangible. If that's your impression too, please say so, and we'll both save ourselves some time.

I only said that it would reduce the chance of stupid decisions

I wouldn't disagree that having such an operator is better than not having one. I am questioning the value of having the operator uploaded. Why would programming an AI to care about the operator's values and not manipulate the operator be easier if the operator is uploaded? Wouldn't the operator just be manipulated even faster?

The only answer I see to that is that the uploading part is just to provide a faster and better user interface. If value loading were done via a game of 20 billion questions, for example, it would take an impractically long time. (Thousands of years, if just one person at a time is answering questions.) The same goes if the AI learns values via machine learning, using rewards and punishments given out by the operator, although you'd still have to keep it from wireheading by manipulating the operator. Also, as an interesting aside, it may be easier to pull values directly out of someone's brain.
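To make the query-budget point concrete, here's a minimal sketch of that reward-and-punishment variant. Everything in it is hypothetical (toy features, a toy operator, no real framework). The thing to notice is that what gets learned is a model of the operator's *responses*, so manipulating the operator would score just as well as learning real values:

```python
# Toy sketch of value loading from operator feedback. All names hypothetical.
import random

NUM_FEATURES = 5
HIDDEN_VALUES = [0.5, -0.2, 0.9, 0.1, -0.7]  # the operator's "true" values (made up)

def operator_score(outcome):
    """Stand-in for a single human judgment of a proposed outcome."""
    return sum(v * x for v, x in zip(HIDDEN_VALUES, outcome))

def learn_reward_model(num_queries, lr=0.05):
    """Fit weights to operator feedback, one query (one judgment) at a time."""
    weights = [0.0] * NUM_FEATURES
    for _ in range(num_queries):
        outcome = [random.uniform(-1, 1) for _ in range(NUM_FEATURES)]
        target = operator_score(outcome)             # one human answer
        prediction = sum(w * x for w, x in zip(weights, outcome))
        error = target - prediction
        for i in range(NUM_FEATURES):                # stochastic gradient step
            weights[i] += lr * error * outcome[i]
    return weights

print(learn_reward_model(10000))  # even a 5-dimensional toy takes thousands of queries
```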

If we're only arguing about semantics, however, I have a guess at the source:

I understand "failed FAI" to be something like a pure smile maximizer, which has just as much incentive to route around human operators as a paperclip maximizer or suffering maximizer. It wouldn't care about our values any more than we care about what sorts of values evolution tried to give us. The unstated assumption here is that value uploading failed or never happened, and the AI is no longer trying to load values, but only implement the values it has. I believe this is what you're gesturing toward with "real UFAI".

Do you understand "failed FAI" to be one which simply misunderstood our values, like a smile maximizer, but which never exited the value-loading phase? This sort of AI might have some sort of "uncertainty" about its utility function. If so, it might still care about what values we intended to give it.

Comment author: turchin 27 August 2016 07:39:20PM 1 point [-]

Ask the AI to scan at least one human brain (someone with a Ph.D. in ethics, over the age of 40) and let the AI run that person in a simulation to pass judgment on all of the AI's decisions. It would dramatically reduce the chances of many obviously stupid decisions.
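A minimal sketch of the shape of that loop, with every name hypothetical and the simulated ethicist reduced to a one-line predicate:

```python
from typing import Callable

def run_with_judge(propose: Callable[[], str],
                   judge: Callable[[str], bool],
                   execute: Callable[[str], None],
                   max_steps: int) -> None:
    """Gate every proposed action behind the simulated judge's approval."""
    for _ in range(max_steps):
        action = propose()
        if judge(action):   # the uploaded ethicist passes judgment
            execute(action)
        # vetoed actions are simply dropped

# Toy demonstration: the "judge" here stands in for a whole simulated person.
demo_actions = iter(["cure disease", "tile the Solar system with smiles"])
run_with_judge(propose=lambda: next(demo_actions),
               judge=lambda a: "smiles" not in a,
               execute=print,
               max_steps=2)
```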

Comment author: WhySpace 28 August 2016 03:52:46AM 1 point [-]

I don't mean to sound dismissive, but is that any better than any other boxing technique, like requiring it to ask verbal permission of a physical operator?

Unless I'm missing something, the AI would still have all the same incentives to influence the operator's answers, and solving those problems would be just as difficult for a digital operator as a physical one.

Comment author: WhySpace 28 August 2016 03:44:40AM 1 point [-]

I actually brought up a similar question in the open thread, but it didn't really go very far. It may or may not be worth reading, but it's still not clear to me whether such a thing is even practical. It's likely that all substantially easier AIs are too far from FAI to still be a net good.

I've come a little closer to answering my questions by stumbling on this Future of Humanity Institute video on "Reduced Impact AI". Apparently that's the technical term for it. I haven't had a chance to look for papers on the subject, but perhaps some exist. No hits on Google Scholar, but a quick search shows a couple of mentions on LW and MIRI's website.

Comment author: pcm 24 August 2016 03:29:20PM 0 points [-]

I expect that MIRI would mostly disagree with claim 6.

Can you suggest something specific that MIRI should change about their agenda?

When I try to imagine problems for which imperfect value loading suggests different plans from perfectionist value loading, I come up with things like "don't worry about whether we use the right set of beings when creating a CEV". But MIRI gives that kind of problem low enough priority that they're acting as if they agreed with imperfect value loading.

Comment author: WhySpace 24 August 2016 04:35:59PM *  0 points [-]

I'm pretty sure I also mostly disagree with claim 6. (See my other reply below.)

The only specific, concrete change that comes to mind is that it may be easier to take one person's CEV than to aggregate everyone's CEV. However, this is likely to be trivially true if the aggregation method is something like averaging.
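For concreteness, here's a minimal sketch of that trivial case, assuming each person's extrapolated values could be boiled down to a weight vector (a huge assumption; producing those vectors is the hard part, not averaging them):

```python
def aggregate_cev(value_vectors):
    """Average many people's value weights into one aggregate vector."""
    n = len(value_vectors)
    return [sum(person[i] for person in value_vectors) / n
            for i in range(len(value_vectors[0]))]

one_person = [[0.9, -0.1, 0.3]]  # hypothetical extrapolated weights
everyone = [[0.9, -0.1, 0.3], [0.2, 0.8, -0.5], [0.4, 0.1, 0.6]]
print(aggregate_cev(one_person))  # single-person "CEV": just that person's weights
print(aggregate_cev(everyone))    # the aggregate: literally a few more lines of code
```

In this toy form the aggregate really is only a couple of lines more than the single-person version, which is exactly the spherical-cow scenario discounted below.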

If that's 1 or 2 more lines of code, then obviously it doesn't really make sense to try to put those lines in last to get FAI 10 seconds sooner, except in a spherical-cow-in-a-vacuum sense. However, if "solving the aggregation problem" is a couple of years' worth of work, maybe it does make sense to prioritize other things first in order to get FAI a little sooner. This is especially true in the event of an AI arms race.

I'm especially curious whether anyone else can come up with scenarios where a maxipok strategy might actually be useful. For instance, is there any work being done on CEV which is purely about the extrapolation procedure, or about procedures for determining coherence? It seems like if only half our values can easily be made coherent, and we can load them into an AI, that might generate an okay outcome.

Comment author: Manfred 23 August 2016 11:29:08PM 0 points [-]

1) Sure.
2) Okay.
3) Yup.
4) This is weaselly. Sure, 1-3 are enough to establish that an okay outcome is possible, but they don't really say anything about probability. You also don't talk about how good an optimization process is trying to optimize these values.

5) Willing to assume for the sake of argument.
6) Certainly true but not certainly useful.
7) Doesn't follow, unless you read 6 in a way that makes it potentially untrue.

All of this would make more sense if you tried to put probabilities to how likely you think certain outcomes are.

Comment author: WhySpace 24 August 2016 01:40:53PM *  0 points [-]

That was pretty much my take. I get the feeling that "okay" outcomes are a vanishingly small portion of probability space. This suggests to me that the marginal effort saved by stipulating "okay" outcomes instead of perfect CEV is extremely small, if not negative. (By negative, I mean that it would actually take additional effort to program an AI to maximize for "okay" outcomes instead of CEV.)

However, I didn't want to ask a leading question, so I left it in its original form. It's perhaps academically interesting that the desirability of outcomes, as a function of "similarity to CEV", is a continuous curve rather than a binary good/bad step function. However, I couldn't really see any way of taking advantage of this. I posted mainly to see if others might spot potential low-hanging fruit.
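To pin down that distinction, here's a toy comparison (the 0.99 threshold and the quadratic shape are entirely made up):

```python
def step_desirability(similarity):
    """Binary good/bad: anything short of near-perfect CEV counts as a loss."""
    return 1.0 if similarity > 0.99 else 0.0

def continuous_desirability(similarity):
    """Partial credit: closer to CEV is better, with 'okay' outcomes in between."""
    return similarity ** 2  # arbitrary shape; the point is it isn't a step

for s in (0.5, 0.9, 0.999):
    print(s, step_desirability(s), continuous_desirability(s))
```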

I guess the interesting follow-up questions are these: Is there any chance that humans are sufficiently adaptable that human values are more than just an infinitesimally small sliver of the set of all possible values? If so, is there any chance this enables an easier alternative version of the control problem? It would be nice to have a plan B.

Comment author: WhySpace 23 August 2016 06:26:08PM *  2 points [-]

(1) Given: AI risk comes primarily from AI optimizing for things besides human values.

(2) Given: humans already are optimizing for things besides human values. (or, at least besides our Coherent Extrapolated Volition)

(3) Given: Our world is okay.^[CITATION NEEDED!]

(4) Therefore, imperfect value loading can still result in an okay outcome.

This is, of course, not necessarily always the case for any given imperfect value loading. However, our world serves as a single counterexample to the rule that all imperfect optimization will be disastrous.

(5) Given: A maxipok strategy is optimal. ("Maximize the probability of an okay outcome.")

(6) Given: Partial optimization for human values is easier than total optimization. (Where "partial optimization" is at least close enough to achieve an okay outcome.)

(7) ∴ MIRI should focus on imperfect value loading.

Note that I'm not convinced of several of the givens, so I'm not certain of the conclusion. However, the argument itself looks valid to me. I've also chosen to leave assumptions like "imperfect value loading results in partial optimization" unstated, as part of the definitions of those two terms. However, I'll try to add details to any specific areas, if questioned.

Comment author: AlexMennen 17 August 2016 04:23:28PM 0 points [-]

People who worry about FAI are likely to also be enthusiastic about uploading, but I'm not sure if the average person who is enthusiastic about uploading is worried about FAI.

Right, that's why I said it would probably be a smaller source of selection, but the correlation is still strong, and goes in the preferred direction.

Comment author: WhySpace 17 August 2016 07:45:00PM 0 points [-]

Ah, understood. We're on the same page, then.

In response to comment by WhySpace on Identity map
Comment author: turchin 17 August 2016 06:00:34PM 0 points [-]

I don't think it is true; I just said that it follows from a definition of identity based on the ability to remember past moments.

In real life we use a more complex understanding of identity, where it is constantly verified by multiple independent channels (I am overstretching here). If I don't remember what I did yesterday, there are still my unchanging attributes, like my name, and also the causal continuity of my experiences.

In response to comment by turchin on Identity map
Comment author: WhySpace 17 August 2016 07:34:34PM 1 point [-]

Ah, thanks for the clarification. I interpreted it as a reductio ad absurdum.
