The virtual AI within its virtual world

Stuart_Armstrong

A putative new idea for AI control; index here.

In a previous post, I talked about an AI operating only on a virtual world (ideas like this used to be popular, until it was realised the AI might still want to take control of the real world to affect the virtual world; however, with methods like indifference, we can guard against this much better).

I mentioned that the more of the AI's algorithm that existed in the virtual world, the better it was. But why not go the whole way? Some people at MIRI and other places are working on agents modelling themselves within the real world. Why not have the AI model itself as an agent inside the virtual world? We can quine to do this, for example.

Then all the restrictions on the AI - memory capacity, speed, available options - can be specified precisely, within the algorithm itself. It will only have the resources of the virtual world to achieve its goals, and this will be specified within it. We could define a "break" in the virtual world (i.e. any outside interference that the AI could cause, were it to hack us to affect its virtual world) as something that would penalise the AI's achievements, or simply as something impossible according to its model or beliefs. It would really be a case of "given these clear restrictions, find the best approach you can to achieve these goals in this specific world".
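
To make the "restrictions specified within the algorithm" idea concrete, here is a minimal toy sketch. Everything in it is an illustrative assumption rather than part of the proposal: the names `Limits`, `score_plan` and `BREAK_PENALTY`, the choice to treat over-budget plans as impossible, and the choice to penalise a "break" rather than rule it out.

```python
from dataclasses import dataclass

# Hypothetical penalty for a "break"; the value is arbitrary.
BREAK_PENALTY = -1000.0


@dataclass(frozen=True)
class Limits:
    """Restrictions that are part of the agent's own specification."""
    max_steps: int          # speed / time budget, counted in world ticks
    max_memory_cells: int   # memory capacity available inside the world
    actions: frozenset      # the only options that exist in the world


def score_plan(plan, limits, reward, memory_needed):
    """Score a plan strictly inside the virtual world's limits.

    Plans that exceed the step or memory budget are treated as impossible
    (score -inf). Plans that use an action outside the world's action set
    count as a "break" and receive BREAK_PENALTY.
    """
    if len(plan) > limits.max_steps or memory_needed > limits.max_memory_cells:
        return float("-inf")
    total = 0.0
    for action in plan:
        if action not in limits.actions:
            return BREAK_PENALTY
        total += reward(action)
    return total


limits = Limits(max_steps=3, max_memory_cells=4,
                actions=frozenset({"left", "right"}))
inside = score_plan(["left", "right"], limits, lambda a: 1.0, memory_needed=2)
broken = score_plan(["left", "hack_operator"], limits, lambda a: 1.0, memory_needed=2)
```

The two options in the text correspond to the two failure branches: returning `BREAK_PENALTY` is the "penalise" version, while returning `-inf` is closer to "impossible according to its model".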

It would be ideal if the AI's motives were not given in terms of achieving anything in the virtual world, but in terms of making the decisions that, subject to the given restrictions, were most likely to achieve something if the virtual world were run in its entirety. That way the AI wouldn't care if the virtual world were shut down or anything similar. It should only seek to self-modify in ways that make sense within the world, and should understand itself as existing completely within these limitations.
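
As a rough illustration of "decisions judged by what would happen if the world were run in its entirety", the sketch below scores each plan by simulating a model of the world to completion. It is a toy under loud assumptions: the world is deterministic, plans are enumerated by brute force, and the names `run_to_completion`, `best_plan`, `step` and `goal_value` are invented for this example.

```python
from itertools import product


def run_to_completion(step, start, plan):
    """Apply the world's transition function for every action in the plan."""
    state = start
    for action in plan:
        state = step(state, action)
    return state


def best_plan(step, start, actions, horizon, goal_value):
    """Pick the plan whose hypothetical full run scores highest.

    The choice is a pure function of the world model. Whether the actual
    run is interrupted or shut down is not an input, so it cannot change
    which plan is selected.
    """
    candidates = product(sorted(actions), repeat=horizon)
    return max(
        candidates,
        key=lambda plan: goal_value(run_to_completion(step, start, plan)),
    )


# Toy world: the state is an integer; "up" adds 1 and "down" subtracts 1.
def toy_step(state, action):
    return state + (1 if action == "up" else -1)


chosen = best_plan(toy_step, start=0, actions={"up", "down"},
                   horizon=3, goal_value=lambda s: s)
```

Nothing here addresses the hard part, which is getting a real agent to hold this counterfactual objective; the sketch only shows what the objective looks like when written down.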

Of course, this would ideally require flawless implementation of the code; we don't want bugs developing in the virtual world that point to real world effects (unless we're really confident we have properly coded the "care only about what would happen in the virtual world, not what actually does happen" part).

Any thoughts on this idea?

I like your second argument better. The first, I think, holds no water.

There are basically 2 explanations of morality, the pragmatic and the moral.

By pragmatic I mean the explanation that "moral" acts ultimately are a subset of the acts that increase our utility function. This includes evolutionary psychology, kin selection, and group selection explanations of morality. It also includes most pre-modern in-group/out-group moralities, like Athenian or Roman morality, and Nietzsche's consequentialist "master morality". A key problem with this approach is that if you say something like, "These African slaves seem to be humans rather like me, and we should treat them better," that is a malfunctioning of your morality program that will decrease your genetic utility.

The moral explanation posits that there's a "should" out there in the universe. This includes most modern religious morality, though many old (and contemporary) tribal religions were pragmatic and made practical claims (don't do this or the gods will be angry), not moral ones.

Modern Western humanistic morality can be interpreted either way. You can say the rule not to hurt people is moral, or you can say it's an evolved trait that gives higher genetic payoff.

The idea that we give moral standing to things like humans doesn't work in either approach. If morality is in truth pragmatic, then you'll assign them moral standing if they have enough power for it to be beneficial for you to do so, and otherwise not, regardless of whether they're like humans or not. (Whether or not you know that's what you're doing.) The pragmatic explanation of morality easily explains the popularity of slavery.

"Moral" morality, from where I stand, seems incompatible with the idea that we assign moral standing to things for looking or thinking like us. I feel no "oughtness" to "we should treat agents different from us like objects." For one thing, it implies racism is morally right, and probably an obligation. For another, it's pretty much exactly what most "moral leaders" have been trying to overcome for the past 2000 years.

It feels to me like what you're doing is starting out by positing morality is pragmatic, and so we expect by default to assign moral status to things like us because that's always a pragmatic thing to do and we've never had to admit moral status to things not like us. Then you extrapolate it into this novel circumstance, in which it might be beneficial to mutually agree with AIs that each of us has moral status. You've already agreed that morals are pragmatic at root, but you are consciously following your own evolved pragmatic programming, which tells you to accept as moral agents things that look like you. So you say, "Okay, I'll just apply my evolved morality program, which I know is just a set of heuristics for increasing my genetic fitness and has no compelling oughtness to it, in this new situation, regardless of the outcome." So you're self-consciously trying to act like an animal that doesn't know its evolved moral program has no oughtness to it. That's really strange.

If you mean that humans are stupid and they'll just apply that evolved heuristic without thinking about it, then that makes sense. But then you're being descriptive. I assumed you were being prescriptive, though that's based on my priors rather than on what you said.

That's... an odd way of thinking about morality.

I value other human beings, because I value the processes that go on inside my own head, and can recognize the same processes at work in others, thanks to my in-built empathy and theory of mind. As such, I prefer that good things happen to them rather than bad. There isn't any universal 'shouldness' to it, it's just the way that I'd rather things be. And, since most other humans have similar values, we can work together, arm in arm. Our values converge rather than diverge. That's morality.

I extend that ...
