Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: orthonormal 25 May 2017 10:13:07PM 19 points [-]

In the spirit of Murphyjitsu, the most obvious failure mode that you didn't mention is that I expect you to burn out dramatically after a few weeks, from exhaustion or the psychological strain of trying to optimize the experiences of N people. The bootcamp phase is not analogous to anything I've heard of you doing sustainably for an extended period of time.

So, do you expect Dragon Army Barracks to work if Eli has to take over for you in Week Four?

Comment author: orthonormal 10 January 2017 11:49:12PM 1 point [-]

How did this post get attributed to [deleted] instead of to Eliezer? I'm 99% sure this post was by him, and the comments seem to bear it out.

Comment author: orthonormal 27 December 2016 07:11:42PM *  2 points [-]

This sweeps some of the essential problems under the rug; if you formalize it a bit more, you'll see them.

It's not an artificial restriction, for instance, that a Solomonoff Induction oracle machine doesn't include things like itself in its own hypothesis class, since the question of "whether a given oracle machine matches the observed data" is a question that sometimes cannot be answered by an oracle machine of equivalent power. (There are bounded versions of this obstacle as well.)

Now, there are some ways around this problem (all of them, so far as I know, found by MIRI): modal agents, reflective oracle machines and logical inductors manage to reason about hypothesis classes that include objects like themselves. Outside of MIRI, people working on multiagent systems make do with agents that each assume the other is smaller/simpler/less meta than itself (so at least one of those agents is going to be wrong).

But this entire problem is hidden in your assertion that the agent, which is a Turing machine, "models the entire wrold, including the agent it self, as one unknown, output only Turing machine". The only way to find the other problems swept under the rug here is to formalize or otherwise unpack your proposal.

Comment author: orthonormal 03 December 2016 05:58:39AM 12 points [-]

If CFAR will be discontinuing/de-emphasizing rationality workshops for the general educated public, then I'd like to see someone else take up that mantle, and I'd hope that CFAR would make it easy for such a startup to build on what they've learned so far.

Comment author: orthonormal 29 December 2015 09:01:09PM 1 point [-]

I fear this misses an important reason why new work is needed on concept learning for superintelligent agents: straightforward clustering is not necessarily a good tool for concept learning when the space of possible actions is very large, and the examples and counterexamples cannot cover most of it.

To take a toy example from this post, imagine that we have built an AI with superhuman engineering ability, and we would like to set it the task of making us a burrito. We first present the AI with millions of acceptable burritos, along with millions of unacceptable burritos and objects that are not burritos at all. We then ask it to build us things that are more like the positive examples than like the negative examples.

I claim that this is likely to fail disastrously if it evaluates likeness by straightforward clustering in the space of observables it can scan about the examples. All our examples and counterexamples lie on the submanifold of "things we (and previous natural processes) are able to build", which has high codimension in the manifold of "things the AI is able to build".

A burrito with a tiny self-replicator nanobot inside, for instance, would cluster closer to all of the positive examples than to all of the negative examples, since there are no tiny self-replicating nanobots in any of the examples or counterexamples, and in all other respects it matches the examples better. (Or a toxic molecule that has never before occurred in nature or been built by humans, etc.)
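A minimal sketch of this failure, assuming a nearest-centroid classifier stands in for "straightforward clustering". The three features and all of the numbers are made up for illustration, and `nanobot_count` is zero in every example and counterexample:

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest_centroid_label(x, positives, negatives):
    """Label x by whichever class centroid is closer in Euclidean distance."""
    d_pos = math.dist(x, centroid(positives))
    d_neg = math.dist(x, centroid(negatives))
    return "acceptable" if d_pos < d_neg else "unacceptable"

# Hypothetical features: (tortilla_quality, filling_quality, nanobot_count).
# No training example, positive or negative, contains a nanobot.
positives = [(0.9, 0.8, 0.0), (0.8, 0.9, 0.0), (1.0, 0.9, 0.0)]
negatives = [(0.1, 0.2, 0.0), (0.2, 0.1, 0.0), (0.0, 0.3, 0.0)]

# An excellent burrito on the observed axes that also contains a nanobot.
candidate = (0.9, 0.9, 0.5)

label = nearest_centroid_label(candidate, positives, negatives)
print(label)
```

The nanobot axis pushes the candidate equally far from both centroids, so the other two features decide the label and the candidate lands in the positive cluster. Nothing in the data tells the learner that the third axis matters.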

The sense in which those would be poor attempts to learn the concept is simply not captured by straightforward clustering. It's not enough to say that we should try non-parametric models; we would need to think about how a non-parametric model might do this well. (Here's an example of a parametric learner which tries to confront this problem.)

Comment author: orthonormal 09 May 2015 12:10:41PM 6 points [-]
Comment author: orthonormal 27 April 2015 11:56:32PM 1 point [-]

Can you put in a summary break, please? Some of us like to scroll through the r/all/new feed.

Comment author: Kaj_Sotala 17 April 2015 07:42:07AM 0 points [-]

Hmm. In a future post, I'm hoping to get to the question of "suppose that an AI could expand the way it has defined its existing concepts by including additional dimensions which humans are incapable of conceptualizing, and this led its values to diverge from human ones", and I agree that this post is not yet sufficient to solve that one. I think that's the same problem as you're talking about (if previously your concepts had N dimensions and now they have N+1, you could find something that fulfilled all the previous criteria while still being different from what we'd prefer if we knew about the N+1th dimension), but I'm not entirely sure?

Comment author: orthonormal 21 April 2015 12:12:02AM 2 points [-]

Yes, except I'm much more pessimistic about reinforcement learning sufficing, since I expect that a superhuman-engineering-capability AI would have, not just a few additional degrees of freedom, but incredibly many. And then it would not suffice for the AI to make its best guess about how to extrapolate human values to a world with nanotech and memehacking and (whatever else); that would almost surely lead to disaster.

Comment author: orthonormal 16 April 2015 09:20:23PM 3 points [-]

I'm glad that you're thinking about these things, but this misses what I think is the hard part of the problem: truly out-of-sample cases. The thing that I'm worried about isn't that a superhuman AI will map (human beings suffering in a currently understood way) to the concept "good", but that it will have a lot of degrees of freedom in where to map (thing that is only possible with nanotech, which human brains aren't capable of fully understanding) or (general strategy for meme-hacking human brains, which human brains aren't able to conceptualize), etc., and that a process of picking the best action may well pick one of these edge cases that would differ from our extrapolated volitions.

Basically, I don't see how we can be confident yet that this continues to work once the AI is able to come up with creative edge cases that our brains aren't explicitly able to encompass or classify the way our extrapolated volitions would want. For an example of progress that might help with this, I might hope there's a clever way to regularize model selection so that the selected models don't include edge cases of this sort, but I've not seen anything of that type.

Comment author: Benito 26 March 2015 12:20:42AM 4 points [-]

Right! So you're trying to get ahold of the idea of an intelligent computational agent, in clear formalisms, and trying to solve the basic issues that arise there. And often, the issues you discover at the fundamental mathematical level work their way through to the highly applied level.

That makes sense. I feel like this is the most direct answer to the question:

... are you curious why MIRI does so much with mathematical logic, and why people on Less Wrong keep referring to Löb's Theorem?

Comment author: orthonormal 26 March 2015 06:36:48PM 4 points [-]

Thanks! I'll try and work that into the introduction.
