mgnb — LessWrong

I also disagree with that false conclusion, but I would probably say that 'goals are dangerous' is the false premise. Goals are dangerous when, well, they actually are dangerous (to my life or yours), and when they are attached to sufficient optimising power, as you get at in your last paragraph.

I think the line of argumentation Bostrom is taking here is that superintelligence by definition has a huge amount of optimisation power, so whether it is dangerous to us is reduced to whether its goals are dangerous to us.

(Happy New Year!)

Superintelligence 16: Tool AIs

mgnb11y50

Okay, that's fair enough.

In the context of Superintelligence, though, in Table 11 a Tool AI is defined thusly: 'Tool: A system not designed to exhibit goal-directed behaviour.' I am responding directly to that. But it sounds as though you would object to Bostrom's characterisation of tool proposals.

In Bostrom's parlance, I think your proposals for Tool AI would be described as (1) Oracle AI + stunting and (2) Oracle AI + boxing—the energy thing is interesting. I'm hopeful they would be safe, but I'm not convinced it would take much energy to pose an existential threat.

Superintelligence 16: Tool AIs

mgnb11y20

1) I must admit that I'm a little sad that this came across as tacit: that was in part the point I was trying to make! I don't feel totally comfortable with the distinction between tools and agents because I think it mostly, and physically, vanishes when you press the start button on the tool, which is much the same as booting the agent. In practice, I can see that something that always pauses and waits for the next input might be understood as not an agent. Is that something you might agree with?

My understanding of [at least one variant of] the tool argument is more that a) software tools can be designed that do not exhibit goal-based behaviour, which b) would be good because the instrumental values argument for deadliness would no longer apply. But since anything can be described as having goals (they are just a model of behaviour), the task of evaluating the friendliness of those 'goals' would remain. Reducing this just to 'programs that always pause before the next input' or somesuch doesn't seem to match the tool arguments I've read. Note: I would be very pleased to have my understanding of this challenged.

Mostly, I am trying to pin down my own confusion about what it means for a physical phenomenon to 'have a goal', firstly, because goal-directedness is so central to the argument that superintelligence is dangerous, and secondly, because the tool AI objection was the first that came to mind for me.

2) Hmm, this might be splitting hairs, but I think I would prefer to say that a nuclear bomb's 'goals' are limited to a relatively small subset of the world state, which is why it's much less dangerous than an AI at the existential level. The lack of adaptive planning of a nuclear bomb seems less relevant than its blast radius in evaluating the danger it poses!

EDIT: reading some of your other comments here, I can see that you have given a definition for an agent roughly matching what I said. Sorry that I missed that! I would still be interested in your response if you have one :)

Superintelligence 16: Tool AIs

mgnb11y50

All this seems to be more or less well explained under Optimization process and Really powerful optimization process, but I'll give my take on it, heavily borrowed from those and related readings.

I went around in circles on 'goals' until I decided to be rigorous in thinking naturalistically rather than anthropomorphically, or mentalistically, for want of a better term. It seems to me that a goal ought to correspond to a set of world states, and then, naturalistically, the 'goal' of a process might be a set of world states that the process tends to modify the world towards: a random walk would have no goal, or alternatively, its goal would be any possible world. My goals involve world states where my body is comfortable, I am happy, etc.
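To make that naturalistic reading a bit more concrete, here is a minimal toy sketch, not anything from Bostrom or the linked pages. It assumes a tiny 'world' of 21 positions on a line, and the names `random_walk`, `seeker`, and `apparent_goal` are all hypothetical. The 'goal' of a process is estimated as the smallest set of end states that covers most of its runs, which is only one possible way of cashing out 'tends to modify the world towards'.

```python
import random
from collections import Counter

# Toy world (an assumption for illustration): positions -10..10 on a line.
LOW, HIGH = -10, 10

def random_walk(state, rng):
    """Steps left or right at random, clamped to the world's edges."""
    return max(LOW, min(HIGH, state + rng.choice([-1, 1])))

def seeker(state, rng, target=3):
    """Moves one step towards a fixed target and stays there once reached."""
    if state < target:
        return state + 1
    if state > target:
        return state - 1
    return state

def end_states(process, steps=50, trials=200, seed=0):
    """Run the process from random starting states and count where it ends."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        state = rng.randint(LOW, HIGH)
        for _ in range(steps):
            state = process(state, rng)
        counts[state] += 1
    return counts

def apparent_goal(process, coverage=0.9):
    """Smallest set of end states accounting for `coverage` of all runs."""
    counts = end_states(process)
    total = sum(counts.values())
    goal, covered = set(), 0
    for state, count in counts.most_common():
        goal.add(state)
        covered += count
        if covered / total >= coverage:
            break
    return goal

if __name__ == "__main__":
    print("seeker:", sorted(apparent_goal(seeker)))
    print("random walk:", sorted(apparent_goal(random_walk)))
```

In this sketch the seeker's apparent goal collapses to a single state, while the random walk's spreads over many states, which is the 'no goal, or any possible world' case in miniature.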

It depends on what Bostrom means by a search process, but, taking a stab, in this context it would not really be distinct from a goal provided it had an objective. In this framework, Google Maps can be described as having a goal, but it's pretty prosaic: manipulate the pixels on the user's screen in a way that represents the shortest route given the inputs. It's hugely 'indifferent' between world states that do not involve changes to those pixels.

I'm not too keen on the distinction between agents and tools made by Holden because, as he says, any process can be described as having a goal—a nuclear explosion can probably be described this way—but in this context a Tool AI would possibly be described as one that is similarly hugely 'indifferent' between world states in that it has no tendency to optimise towards them (I'm not that confident that others would be happy with that description).

([Almost] unrelated pedant note: I don't think utility functions are expressive enough to capture all potentially relevant behaviours and would suggest it's better to talk more generally of goals: it's more naturalistic, and makes fewer assumptions about consistency and rationality.)
