Broolucks — LessWrong

Replying to The genie knows, but doesn't care

I apologize for the late response, but here goes :)

I think you missed the point I was trying to make.

You and others seem to say that we often poorly evaluate the consequences of the utility functions that we implement. Even though we have in mind a utility X whose maximization would satisfy us, we may implement a utility Y with completely different, perhaps catastrophic, implications. For instance:

X = Do what humans want
Y = Seize control of the reward button

What I was pointing out in my post is that this only holds for perfect maximizers, which are impossible. In practice, the training procedure for an AI would morph the utility...

Replying to The genie knows, but doesn't care

Broolucks 12y

I have worked on AI. I know it is difficult. However, few existing algorithms, if any, have the failure modes you describe. They fail early, and they fail hard. As far as neural nets go, they fall into a local minimum early on and never get out, often digging their own graves. Perhaps different algorithms would have the shortcomings you point out, but many of the algorithms that currently exist work the way I describe.
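To make the local-minimum point concrete, here is a minimal sketch, assuming plain gradient descent on a hand-picked one-dimensional function. The names `loss`, `grad` and `gradient_descent`, the polynomial, and the learning rate are hypothetical, chosen only for illustration; this does not reproduce any real neural-net training setup.

```python
def loss(x: float) -> float:
    """Hypothetical toy loss: shallow local minimum near x = 1.13,
    deeper global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    """Analytic derivative of loss()."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent: only ever follows the local slope,
    so it settles in whichever basin it starts in."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


if __name__ == "__main__":
    for start in (2.0, -2.0):
        x = gradient_descent(start)
        print(f"start={start:+.1f} -> x={x:+.3f}, loss={loss(x):+.3f}")
```

Started at x = 2.0, the descent settles in the shallow basin near x ≈ 1.13 and stays there, even though a much deeper minimum sits near x ≈ -1.30; started at x = -2.0, it finds the deeper one. Nothing in the update rule ever looks beyond the current slope, which is the sense in which such methods fail early and never get out.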

And obviously, if an AI was indeed stuck in a local minimum obvious to you of its own utility gradient, this condition would not last past it becoming smarter than you.

You may be right. However, this is...

Replying to The genie knows, but doesn't care

Broolucks 12y

It is something specific about that specific AI.

If an AI wishes to take over its reward button and just press it over and over again, it doesn't really have any "rivals", nor does it need to control any resources other than the button and scraps of itself. The original scenario was that the AI would wipe us out. It would have no reason to do so if we were not a threat. And if we were a threat, first, there's no reason it would stop doing what we want once it seizes the button. Once it has the button, it has everything it wants -- why stir the pot?

Second, it would protect...

Replying to The genie knows, but doesn't care

Broolucks 12y

Then when it is more powerful it can directly prevent humans from typing this.

That depends on whether it gets stuck in a local minimum. The reason many humans reject dopamine drips is that they don't conceptualize their "reward button" properly. That misconception perpetuates itself: it penalizes the very idea of conceptualizing it differently. Granted, AIXI would not fall into local minima, but most realistic training methods would.

At first, the AI would converge towards: "my reward button corresponds to (is) doing what humans want", and that conceptualization would become the centerpiece, so to speak, of its reasoning ability: the locus through which everything is filtered. The thought of pressing...

Replying to The genie knows, but doesn't care

Broolucks 12y

Why does the hard takeoff point have to be after the point at which an AI is as good as a typical human at understanding semantic subtlety? In order to do a hard takeoff, the AI needs to be good at a very different class of tasks than those required for understanding humans that well.

Semantic extraction -- not hard takeoff -- is the task that we want the AI to be able to do. An AI that is good at, say, rewriting its own code is not the kind of thing we would be interested in at that point, and it seems like it would be inherently more difficult than implementing, say,...

Replying to The genie knows, but doesn't care

Broolucks 12y

Ok, so let's say the AI can parse natural language, and we tell it, "Make humans happy." What happens? Well, it parses the instruction and decides to implement a Dopamine Drip setup.

That's not very realistic. If you trained an AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to. If the AI interpreted something in a way that was technically correct but not what you wanted, you would not reward it; you would punish it, and you would be doing that from the very beginning, well before the AI could even be considered intelligent. Even the thoroughly mediocre AI that currently exists tries to...

Replying to The genie knows, but doesn't care

Broolucks 12y

What counts as 'resources'? Do we think that 'hardware' and 'software' are natural kinds, such that the AI will always understand what we mean by the two? What if software innovations on their own suffice to threaten the world, without hardware takeover?

What is "taking over the world", if not taking control of resources (hardware)? Where is the motivation in doing it? Also consider, as others have pointed out, that an AI which "misunderstands" your original instructions will demonstrate this sooner rather than later. For instance, if you create a resource "honeypot" outside the AI which is trivial to take, an AI would naturally take that first, and then you would know there's a problem. It...

Replying to The genie knows, but doesn't care

Broolucks 12y

programmers build a seed AI (a not-yet-superintelligent AGI that will recursively self-modify to become superintelligent after many stages) that includes, among other things, a large block of code I'll call X.
The programmers think of this block of code as an algorithm that will make the seed AI and its descendents maximize human pleasure.

The problem, I reckon, is that X will never be anything like this.

It will likely be something much more mundane, i.e. modelling the world properly and predicting outcomes given various counterfactuals. You might be worried about it trying to expand its hardware resources in an unbounded fashion, but any AI doing this would try to shut itself down if its...

Replying to I attempted the AI Box Experiment again! (And won - Twice!)

Broolucks 12y

We were talking about extracting knowledge about a particular human from that human's text stream, though. It is already assumed that the AI knows about human psychology. I mean, assuming the AI can understand a natural language such as English, it obviously already has access to a large corpus of written works, so I'm not sure why it would bother foraging in source code, of all things. Besides, it is likely that a seed AI would be grown organically using processes inspired by evolution or neural networks. If that is so, it wouldn't even contain any human-written code at all.

Replying to I attempted the AI Box Experiment again! (And won - Twice!)

Broolucks 12y

I'm unsure of how much an AI could gather from a single human's text input. I know that I at least miss a lot of information that goes past me that I could in theory pick up.

At most, the number of bits contained in the text input, which is really not much, minus the number of bits non-AGI algorithms could identify and destroy (like speech patterns). The AI would also have to identify and throw out any fake information inserted into the stream (without knowing whether the majority of the information is real or fake). The exploitable information is going to be scarce and noisy even for a perfect AI.
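The upper bound in this paragraph can be illustrated with a rough calculation. This is a minimal sketch, assuming the text stream is UTF-8 and using a zeroth-order (single-character frequency) entropy estimate as a stand-in for "bits contained in the text input". The function names and the `scrubbed_bits` parameter are hypothetical. A character-frequency model ignores context, and English is far more predictable than such a model suggests, so the estimate is only approximate.

```python
import math
from collections import Counter


def raw_bits(text: str) -> int:
    """Bits needed to transmit the text as UTF-8: a hard ceiling on its information content."""
    return 8 * len(text.encode("utf-8"))


def empirical_entropy_bits(text: str) -> float:
    """Zeroth-order estimate: treat each character as drawn independently from
    the text's own character frequencies. Ignores context, so for English it
    usually overstates the information per character."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    per_char = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_char * n


def exploitable_upper_bound(text: str, scrubbed_bits: float = 0.0) -> float:
    """Hypothetical bound following the comment: estimated information in the
    stream minus the bits non-AGI filters (e.g. a style normalizer) are assumed
    to remove. Never negative."""
    return max(0.0, empirical_entropy_bits(text) - scrubbed_bits)


if __name__ == "__main__":
    sample = "I'm unsure of how much an AI could gather from a single human's text input."
    print(f"raw: {raw_bits(sample)} bits")
    print(f"estimated: {empirical_entropy_bits(sample):.0f} bits")
    print(f"after scrubbing 100 bits: {exploitable_upper_bound(sample, 100):.0f} bits")
```

Because UTF-8 is a prefix code, the character-frequency estimate can never exceed the raw bit count, so the raw size acts as the hard ceiling and the estimate as a tighter, approximate figure. Separating real from fake information in the stream, as the paragraph notes, would only reduce the usable amount further.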

An AI using...