Comment author: orthonormal 29 December 2015 09:01:09PM 1 point

I fear this misses an important reason why new work is needed on concept learning for superintelligent agents: straightforward clustering is not necessarily a good tool for concept learning when the space of possible actions is very large, and the examples and counterexamples cannot cover most of it.

To take a toy example from this post, imagine that we have built an AI with superhuman engineering ability, and we would like to set it the task of making us a burrito. We first present the AI with millions of acceptable burritos, along with millions of unacceptable burritos and objects that are not burritos at all. We then ask it to build us things that are more like the positive examples than like the negative examples.

I claim that this is likely to fail disastrously if it evaluates likeness by straightforward clustering in the space of observables it can scan about the examples. All our examples and counterexamples lie on the submanifold of "things we (and previous natural processes) are able to build", which has high codimension in the manifold of "things the AI is able to build".

A burrito with a tiny self-replicating nanobot inside, for instance, would cluster closer to the positive examples than to the negative ones, since there are no tiny self-replicating nanobots in any of the examples or counterexamples, and in all other respects it matches the positive examples well. (Or a toxic molecule that has never before occurred in nature or been built by humans, etc.)

The sense in which those would be poor attempts to learn the concept is simply not captured by straightforward clustering, and it's not enough to say that we should try non-parametric models; we would need to think about how a non-parametric model might do this well. (Here's an example of a parametric learner which tries to confront this problem.)
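A minimal sketch of the failure mode described above, assuming a nearest-centroid classifier over a made-up feature space (the names, numbers, and the "nanobot" coordinate are purely illustrative; the point is only that a dimension which is zero in every example and counterexample barely moves the distances):

    import numpy as np

    # Toy features: [tortilla quality, filling quality, temperature, nanobot count].
    # The last dimension is zero in every example and counterexample.
    positives = np.array([[0.9, 0.9, 0.8, 0.0],
                          [1.0, 0.8, 0.9, 0.0],
                          [0.8, 1.0, 0.7, 0.0]])
    negatives = np.array([[0.1, 0.2, 0.3, 0.0],   # stale, cold burrito
                          [0.2, 0.0, 0.1, 0.0]])  # not a burrito at all

    pos_centroid = positives.mean(axis=0)
    neg_centroid = negatives.mean(axis=0)

    def closer_to_positives(x):
        """Nearest-centroid 'concept membership' in the observed feature space."""
        return np.linalg.norm(x - pos_centroid) < np.linalg.norm(x - neg_centroid)

    # A perfect-looking burrito that also contains a self-replicating nanobot.
    candidate = np.array([0.95, 0.90, 0.85, 1.0])
    print(closer_to_positives(candidate))  # True: the novel dimension adds little
                                           # distance, so clustering accepts it.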

Comment author: orthonormal 27 April 2015 11:56:32PM 1 point

Can you put in a summary break, please? Some of us like to scroll through the r/all/new feed.

Comment author: Kaj_Sotala 17 April 2015 07:42:07AM 0 points

Hmm. In a future post, I'm hoping to get to the question of "suppose that an AI could expand the way it has defined its existing concepts by including additional dimensions which humans are incapable of conceptualizing, and this led its values to diverge from human ones", and I agree that this post is not yet sufficient to solve that one. I think that's the same problem you're talking about (if your concepts previously had N dimensions and now they have N+1, you could find something that fulfilled all the previous criteria while still being different from what we'd prefer if we knew about the (N+1)th dimension), but I'm not entirely sure?

Comment author: orthonormal 21 April 2015 12:12:02AM 2 points

Yes, except I'm much more pessimistic about reinforcement learning sufficing, since I expect that a superhuman-engineering-capability AI would have not just a few additional degrees of freedom but incredibly many. And then it would not suffice for the AI to make its best guess about how to extrapolate human values to a world with nanotech and meme-hacking and (whatever else); that would almost surely lead to disaster.

Comment author: orthonormal 16 April 2015 09:20:23PM 3 points

I'm glad that you're thinking about these things, but this misses what I think is the hard part of the problem: truly out-of-sample cases. The thing that I'm worried about isn't that a superhuman AI will map (human beings suffering in a currently understood way) to the concept "good", but that it will have a lot of freedom in where to map (thing that is only possible with nanotech, which human brains aren't capable of fully understanding) or (general strategy for meme-hacking human brains, which human brains aren't able to conceptualize), etc., and that a process of picking the best action may be likely to pick up one of these edge cases that would differ from our extrapolated volitions.

Basically, I don't see how we can be confident yet that this continues to work once the AI is able to come up with creative edge cases that our brains aren't explicitly able to encompass or classify the way our extrapolated volitions would want. As an example of progress that might help with this, I'd hope there's a clever way to regularize model selection so that the selected models don't include edge cases of this sort, but I've not seen anything of that type.
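As a rough, hedged illustration of the kind of regularizer meant here (not a proposal from the thread): penalize candidates by their distance to the nearest labelled example, so the optimizer can't freely exploit dimensions the examples never varied. The feature vectors and the names novelty, conservative_score, and lam are invented for this sketch, and it plainly doesn't settle the deeper out-of-sample problem:

    import numpy as np

    # The same toy burrito features as in the earlier sketch; the last
    # dimension is the one no example or counterexample ever varied.
    examples = np.array([[0.9, 0.9, 0.8, 0.0], [1.0, 0.8, 0.9, 0.0],
                         [0.8, 1.0, 0.7, 0.0],                         # positives
                         [0.1, 0.2, 0.3, 0.0], [0.2, 0.0, 0.1, 0.0]])  # negatives
    pos_centroid = examples[:3].mean(axis=0)
    neg_centroid = examples[3:].mean(axis=0)

    def novelty(x):
        """Distance from a candidate to its nearest labelled example of either kind."""
        return np.min(np.linalg.norm(examples - x, axis=1))

    def conservative_score(x, lam=10.0):
        """Cluster 'fit' minus a heavy penalty for leaving the region the
        examples actually cover; lam is an arbitrary knob for illustration."""
        fit = np.linalg.norm(x - neg_centroid) - np.linalg.norm(x - pos_centroid)
        return fit - lam * novelty(x)

    ordinary = np.array([0.92, 0.90, 0.82, 0.0])  # in-distribution burrito
    nanobot  = np.array([0.95, 0.90, 0.85, 1.0])  # exploits the unseen dimension
    print(conservative_score(ordinary) > conservative_score(nanobot))  # True

The obvious cost is that the same penalty also rules out genuinely good candidates that merely look unfamiliar, which is part of why this doesn't count as a solution.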

Comment author: Benito 26 March 2015 12:20:42AM 4 points

Right! So you're trying to get ahold of the idea of an intelligent computational agent, in clear formalisms, and trying to solve the basic issues that arise there. And often, the issues you discover at the fundamental mathematical level work their way through to the highly applied level.

That makes sense. I feel like this is the most direct answer to the question

... are you curious why MIRI does so much with mathematical logic, and why people on Less Wrong keep referring to Löb's Theorem?

Comment author: orthonormal 26 March 2015 06:36:48PM 4 points

Thanks! I'll try and work that into the introduction.

In response to comment by Venu on Selling Nonapples
Comment author: pnrjulius 27 April 2012 11:08:11PM 0 points

Isn't supervised learning the current method of achieving friendly natural intelligence?

(Most insane psychopaths had bad parents, didn't they?)

Comment author: orthonormal 26 March 2015 06:33:58PM 1 point

Isn't supervised learning the current method of achieving friendly natural intelligence?

Yes, because we get to start with a prefabricated Friendliness-compatible architecture.

(Most insane psychopaths had bad parents, didn't they?)

Probably yes, but that doesn't distinguish "bad parenting" from "psychopath genes".

Comment author: dankane 26 March 2015 05:16:49AM 0 points

Yes, obviously. We solve the Lobstacle by not ourselves running on formal systems and sometimes accepting axioms that we were not born with (things like PA). Allowing the AI to only do things that it can prove will have good consequences using a specific formal system would make it dumber than us.

Comment author: orthonormal 26 March 2015 06:29:04PM 2 points

I think, rather, that humans solve decision problems that involve predicting other human deductive processes by means of some evolved heuristics for social reasoning that we don't yet fully understand on a formal level. "Not running on formal systems" isn't a helpful answer for how to make good decisions.

Comment author: dankane 26 March 2015 04:46:41AM 3 points

Actually, why is it that when the Löbian obstacle is discussed, it always seems to be in reference to an AI trying to determine whether a successor AI is safe, and not an AI trying to determine whether it, itself, is safe?

Comment author: orthonormal 26 March 2015 06:23:31PM 2 points

Because we're talking about criteria for action, not epistemology. The heart of the Lobstacle problem is that straightforward ways of evaluating the consequences of actions start to break down when those consequences involve the outcomes of deductive processes as powerful as, or more powerful than, the one brought to bear.
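For reference, the theorem behind the "Lobstacle" (a standard statement, not specific to this exchange): for a theory T extending PA and any sentence P, Löb's theorem says

    % Löb's theorem, writing \Box_T P for "T proves P":
    T \vdash (\Box_T P \rightarrow P) \;\Longrightarrow\; T \vdash P

so T can endorse the soundness schema "if T proves P, then P" only for sentences it already proves, which is what blocks an agent from naively trusting the conclusions of a deductive process as strong as its own.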

Comment author: dankane 26 March 2015 04:11:59AM 2 points

Question: If we do manage to build a strong AI, why not just let it figure this problem out on its own when trying to construct a successor? Almost definitionally, it will do a better job of it than we will.

Comment author: orthonormal 26 March 2015 06:19:32PM 3 points

The biggest problem with deferring the Lobstacle to the AI is that you could have a roughly human-comparable AI that solves the Lobstacle in a hacky way, changing the value system it hands to its successor; the successor is then intelligent enough to solve the Lobstacle perfectly and preserve that new value system. So now you've got a superintelligent AI locked in on the wrong target.
