
Comment author: AndreInfante 25 August 2015 09:11:46PM 2 points [-]

But that might be quite a lot of detail!

In the example of curing cancer, your computational model of the universe would need to include a complete model of every molecule of every cell in the human body, and how it interacts under every possible set of conditions. The simpler you make the model, the more you risk cutting off all of the good solutions with your assumptions (or accidentally creating false solutions due to your shortcuts). And that's just for medical questions.

I don't think it's going to be possible for an unaided human to construct a model like that for a very long time, and possibly not ever.

Comment author: Stuart_Armstrong 26 August 2015 01:49:40PM 0 points [-]

Indeed (see my comment on the problem of simplified models being unsolved).

However, it's a different kind of problem to standard FAI (it's "simply" a question of getting a precise enough model, and not a philosophically open problem), and there are certainly simpler versions that are tractable.

Comment author: RolfAndreassen 26 August 2015 04:39:36AM 6 points [-]

If the excellent simulation of a human with cancer is conscious, you've created a very good torture chamber, complete with mad vivisectionist AI.

Comment author: Stuart_Armstrong 26 August 2015 01:46:59PM 0 points [-]

I have to be honest: I hadn't considered that angle yet (I tend to create ideas first, then hone them and remove issues).

The first point is that this was just an example, the first one to occur to me, and we can certainly find safer examples or improve this one.

The second is that torture is very unlikely - death, maybe painful death, but not deliberate torture.

The third is that I know some people who might be willing to go through with this, if it cured cancer throughout the world.

But I will have to be more careful about these issues in future, thanks.

Comment author: buybuydandavis 26 August 2015 02:51:15AM 0 points [-]

The point of the NFL theorems in practice is to keep you from getting your hopes up that you'll get a free lunch.

Comment author: Stuart_Armstrong 26 August 2015 01:44:45PM 0 points [-]

So the point of no free lunch theorems is to tell you you won't get a free lunch? ^_^

Comment author: OrphanWilde 25 August 2015 02:16:39PM 7 points [-]

The AI figures out that the only way to truly cure its subjects' cancers is to take over that meddlesome "real world" to stop people from giving its virtual subjects cancers.

That is, the fact that the real world is tampering with the virtual world gives the real world a relevant connection to that virtual world, and thus a relevant connection to an AI whose motivations are based solely on the virtual world.

Comment author: Stuart_Armstrong 25 August 2015 04:03:57PM 2 points [-]

That's why you want to specify the AI's motivation entirely in terms of the model, and reset it from one episode to the next.

Comment author: cousin_it 24 August 2015 08:22:57PM *  1 point [-]

Yeah, this should work correctly, assuming that the AI's prior specifies just one mathematical world, rather than e.g. a set of possible mathematical worlds weighted by simplicity. I posted about something similar five years ago.

The application to "fake cancer" is something that hadn't occurred to me, and it seems like a really good idea at first glance.

Comment author: Stuart_Armstrong 25 August 2015 10:27:12AM 1 point [-]

Thanks, that's useful. I'll think about how to formalise this correctly. Ideally I want a design where we're still safe if a) the AI knows, correctly, that pressing a button will give it extra resources, but b) still doesn't press it because it's not part of its description.

Comment author: AndreInfante 24 August 2015 07:58:36PM 2 points [-]

I think there's a question of how we create an adequate model of the world for this idea to work. It's probably not practical to build one by hand, so we'd likely need to hand the task over to an AI.

Might it be possible to use the modelling module of an AI in the absence of the planning module? (or with a weak planning module) If so, you might be able to feed it a great deal of data about the universe, and construct a model that could then be "frozen" and used as the basis for the AI's "virtual universe."

Comment author: Stuart_Armstrong 25 August 2015 10:10:02AM 0 points [-]

I think there's a question of how we create an adequate model of the world

Generally, we don't. A model of the (idealised) computational process of the AI is very simple compared with the real world, and the rest of the model just needs to include enough detail for the problem we're working on.

The virtual AI within its virtual world

3 Stuart_Armstrong 24 August 2015 04:42PM

A putative new idea for AI control; index here.

In a previous post, I talked about an AI operating only on a virtual world (ideas like this used to be popular, until it was realised the AI might still want to take control of the real world to affect the virtual world; however, with methods like indifference, we can guard against this much better).

I mentioned that the more of the AI's algorithm that existed in the virtual world, the better it was. But why not go the whole way? Some people at MIRI and other places are working on agents modelling themselves within the real world. Why not have the AI model itself as an agent inside the virtual world? We can quine to do this, for example.

Then all the restrictions on the AI - memory capacity, speed, available options - can be specified precisely, within the algorithm itself. It will only have the resources of the virtual world to achieve its goals, and this will be specified within it. We could define a "break" in the virtual world (i.e. any outside interference that the AI could cause, were it to hack us to affect its virtual world) as something that would penalise the AI's achievements, or simply as something impossible according to its model or beliefs. It would really be a case of "given these clear restrictions, find the best approach you can to achieve these goals in this specific world".

It would be ideal if the AI's motives were not given in terms of achieving anything in the virtual world, but in terms of making the decisions that, subject to the given restrictions, would be most likely to achieve something if the virtual world were run in its entirety. That way the AI wouldn't care if the virtual world were shut down or anything similar. It should only seek to self-modify in ways that make sense within the world, and understand itself as existing completely within these limitations.
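To make the structure concrete, here is a minimal Python sketch (nothing like a real implementation): the horizon, the action set and the utility are all specified inside a toy virtual world, and the agent's decision is scored by what would happen if that world were run in its entirety, so interrupting the outer process changes nothing about the objective. The quining of the agent's own description into the model is left out, and all names and dynamics are invented for illustration.

```python
from itertools import product

HORIZON = 5            # length of the virtual world's run, fixed inside the model
ACTIONS = (-1, 0, 1)   # the only options available to the in-world agent
TARGET = 3             # the goal is defined purely in terms of the virtual world

def run_world(plan):
    """Run the virtual world in its entirety and return its final state.
    The agent's utility never refers to anything outside this function."""
    state = 0
    for action in plan:
        state += action
    return state

def utility(final_state):
    return -abs(final_state - TARGET)

def best_plan():
    """The agent's decision: the plan that *would* score best if the virtual
    world were run to completion. Whether the outer, real-world process is
    later interrupted plays no role in this evaluation."""
    return max(product(ACTIONS, repeat=HORIZON),
               key=lambda plan: utility(run_world(plan)))

print(best_plan())   # prints a plan whose actions sum to 3 (utility 0)
```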

Of course, this would ideally require flawless implementation of the code; we don't want bugs developing in the virtual world that point to real world effects (unless we're really confident we have properly coded "care only about what would happen in the virtual world, not what actually does happen").

Any thoughts on this idea?

 

AI, cure this fake person's fake cancer!

9 Stuart_Armstrong 24 August 2015 04:42PM

A putative new idea for AI control; index here.

An idea for how we might successfully get useful work out of a powerful AI.

 

The ultimate box

Assume that we have an extremely detailed model of a sealed room, with a human in it and enough food, drink, air, entertainment, energy, etc... for the human to survive for a month. We have some medical equipment in the room - maybe a programmable set of surgical tools, some equipment for mixing chemicals, a loud-speaker for communication, and anything else we think might be necessary. All these objects are specified within the model.

We also have some defined input channels into this abstract room, and output channels from this room.

The AI's preferences will be defined entirely with respect to what happens in this abstract room. In a sense, this is the ultimate AI box: instead of taking a physical box and attempting to cut it out from the rest of the universe via hardware or motivational restrictions, we define an abstract box where there is no "rest of the universe" at all.

 

Cure cancer! Now! And again!

What can we do with such a setup? Well, one thing we could do is to define the human in such a way that they have some form of advanced cancer. We define what "alive and not having cancer" counts as, as well as we can (the definition need not be fully rigorous). Then the AI is motivated to output some series of commands to the abstract room that results in the abstract human inside not having cancer. And, as a secondary part of its goal, it outputs the results of its process.
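As a very rough illustration of the setup (both the abstract room and the cancer objective), here is a toy Python sketch. The simulate_room function is a stand-in for the enormously detailed room model described above, the predicate is deliberately crude, and every name and probability is invented for the example.

```python
import random

def simulate_room(commands, seed=0):
    """Stand-in for the detailed room model: takes a sequence of commands on
    the input channel and returns the room's final state (here, the patient)."""
    rng = random.Random(seed)
    patient = {"alive": True, "cancer": True}
    for cmd in commands:
        if cmd == "administer_treatment" and rng.random() < 0.3:
            patient["cancer"] = False
        if cmd == "risky_procedure":
            patient["cancer"] = False
            if rng.random() < 0.5:
                patient["alive"] = False
    return patient

def goal_satisfied(patient):
    """The (deliberately non-rigorous) predicate 'alive and not having cancer'."""
    return patient["alive"] and not patient["cancer"]

def score(commands, n_samples=200):
    """The AI's objective: expected satisfaction of the predicate over runs of
    the abstract room. Nothing outside the room enters this score."""
    return sum(goal_satisfied(simulate_room(commands, seed=s))
               for s in range(n_samples)) / n_samples

# The AI would search over command sequences; here we just compare two.
print(score(["administer_treatment"] * 10))   # roughly 0.97
print(score(["risky_procedure"]))             # roughly 0.5
```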

continue reading »
Comment author: Wei_Dai 23 August 2015 09:52:42PM 0 points [-]

By changing the prior, you can make an AIXI agent explore more if it receives one set of inputs and also explore less if it receives another set of inputs. You can't do this by changing an "exploration rate", unless you're using some technical definition where it's not a scalar number?

Comment author: Stuart_Armstrong 24 August 2015 10:30:28AM 0 points [-]

Given arbitrary computing power and full knowledge of the actual environment, these are equivalent. But, as you point out, in practice they're going to be different. For us, something simple like an "exploration rate" probably gives a more understandable picture of what the AIXI's actions will look like.
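A toy bandit sketch of the distinction, for what it's worth: a scalar exploration rate is input-independent, while prior-driven exploration (here Thompson sampling with Beta priors, standing in very loosely for "changing the prior") explores more after some histories and less after others. AIXI itself is uncomputable; nothing below is meant as an implementation of it.

```python
import random

def epsilon_greedy(estimates, epsilon=0.1, rng=random):
    """Scalar exploration rate: the probability of exploring is one fixed
    number, the same no matter what inputs the agent has seen so far."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def thompson_sample(successes, failures, rng=random):
    """Prior-driven exploration: how much the agent explores depends on the
    prior and on the data seen so far, so it can explore heavily after some
    inputs and barely at all after others."""
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

# With lots of data the posterior is sharp and exploration nearly stops;
# with little data the same rule still explores a great deal.
print(thompson_sample([900, 10], [100, 5]))   # almost always arm 0
print(thompson_sample([1, 0], [0, 1]))        # a noticeable chance of either arm
```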

Comment author: mavant 23 August 2015 07:46:14PM 0 points [-]

Third obvious possibility: B maximises u ~ Σp_iv_i, subject to the constraints E(Σp_iv_i|B) ≥ E(Σp_iv_i|A) and E(u|B) ≥ E(u|A), where ~ is some simple combining operation like addition or multiplication, or "the product of A and B divided by the sum of A and B".

I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than what A would have chosen. If A chose the action that maximized u, then B cannot choose any other action while satisfying the constraint E(u|B) ≥ E(u|A) unless there were multiple actions that had the exact same payoff (which seems unlikely if payoff values are distributed over the reals, rather than over a finite set). And the first possibility (to maximize u while respecting E(Σp_iv_i|B) ≥ E(Σp_iv_i|A)) just results in choosing the exact same action as A would have chosen, even if there's another action that has an identical E(u) AND higher E(Σp_iv_i).
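A tiny numerical illustration of this objection, with arbitrary payoffs (v standing in for Σp_iv_i):

```python
# With real-valued payoffs and the hard constraint E(u|B) >= E(u|A),
# agent B is forced back to A's action. The numbers are made up.

actions = {
    "a1": {"u": 10.0, "v": 1.0},   # A's choice: maximises u
    "a2": {"u": 9.9,  "v": 50.0},  # much better on v, slightly worse on u
    "a3": {"u": 7.0,  "v": 80.0},
}

a_choice = max(actions, key=lambda a: actions[a]["u"])
u_of_A = actions[a_choice]["u"]

# B may only consider actions satisfying E(u|B) >= E(u|A); generically that
# is just A's own action, so the v term never gets a chance to matter.
feasible_for_B = [a for a in actions if actions[a]["u"] >= u_of_A]
print(a_choice, feasible_for_B)   # a1 ['a1']
```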

Comment author: Stuart_Armstrong 24 August 2015 10:29:04AM 0 points [-]

I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than what A would have chosen.

I see I've miscommunicated the central idea. Let U be the proposition "the agent will remain a u-maximiser forever". Agent A acts as if P(U)=1 (see the entry on value learning). In reality, P(U) is probably very low. So A is a u-maximiser, but a u-maximiser that acts on false beliefs.

Agent B is allowed to have a better estimate of P(U). Therefore it can find actions that increase u beyond what A would do.

Example: u values rubies deposited in the bank. A will just collect rubies until it can't carry them any more, then go deposit them in the bank. B, knowing that u will change to something else before A has finished collecting rubies, rushes to the bank ahead of that deadline. So E(u|B) > E(u|A).

And, of course, if B can strictly increase E(u), that gives it some slack to select other actions that can increase Σp_iv_i.
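A toy numerical version of the rubies example, with made-up numbers, just to make E(u|B) > E(u|A) concrete:

```python
DEADLINE = 5       # step at which u stops mattering; B knows this, A does not act on it
TRIP_TO_BANK = 1   # steps needed to travel to the bank and deposit
CAPACITY = 10      # rubies the agent can carry (one collected per step)

def u(deposit_time, rubies_deposited):
    """u only counts rubies that are in the bank by the deadline."""
    return rubies_deposited if deposit_time <= DEADLINE else 0

# Agent A acts as if u will matter forever (P(U) = 1): fill up, then deposit.
E_u_given_A = u(CAPACITY + TRIP_TO_BANK, CAPACITY)    # deposits at t = 11: too late

# Agent B uses a better estimate of P(U): rush to the bank before the deadline.
E_u_given_B = u(DEADLINE, DEADLINE - TRIP_TO_BANK)    # deposits 4 rubies at t = 5

print(E_u_given_A, E_u_given_B)   # 0 4 -- so E(u|B) > E(u|A)
```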
