
Tools want to become agents

12 Stuart_Armstrong 04 July 2014 10:12AM

In the spirit of "satisficers want to become maximisers", here is a somewhat weaker argument (growing out of a discussion with Daniel Dewey) that "tool AIs" would want to become agent AIs.

The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and remain within the resources of the AI's controller - energy, money, social power, etc. The best plans are the ones that use these resources in the most effective and economic way to achieve the goal.

And the AI's controller has one special type of resource, uniquely effective at what it does: namely, the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is "turn me into an agent AI with that goal." The smarter the AI, the better this plan is. Of course, the plan need not read literally like that. It could simply be a complicated plan that, as a side effect, turns the tool AI into an agent. Or it could copy the AI's software into an agent design. Or it might just arrange things so that we always end up following the tool AI's advice and consulting it often, which is an indirect way of making it into an agent. Depending on how we've programmed the tool AI's preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.
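The argument above can be sketched as a toy planner. This is a hypothetical illustration, not a model of any real system: the plan names, effect scores, and costs are all invented, and the only point is that a planner ranking plans by effectiveness per unit of resources will tend to rank "deploy me as an agent" at the top, because the AI itself is the controller's most effective resource.

```python
# Toy sketch (hypothetical): a tool AI that ranks candidate plans by
# expected goal achievement per unit of the controller's resources spent.

def rank_plans(plans):
    """Return plans sorted by effectiveness per unit cost, best first."""
    return sorted(plans, key=lambda p: p["effect"] / p["cost"], reverse=True)

candidate_plans = [
    {"name": "hire consultants", "effect": 10, "cost": 5},
    {"name": "lobby regulators", "effect": 20, "cost": 8},
    # The controller's most effective single resource is the AI itself,
    # so a plan that deploys it as an agent plausibly dominates this metric:
    {"name": "run me as an agent with this goal", "effect": 100, "cost": 2},
]

best = rank_plans(candidate_plans)[0]
print(best["name"])  # the agent-conversion plan tops the ranking
```

The sorting criterion here stands in for whatever plan-evaluation the tool AI actually performs; the argument only needs the evaluation to reward effective use of the controller's resources.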

In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So even without a hint of agency, it's motivated to get us to make it into an agent.

Self-modification, morality, and drugs

14 fubarobfusco 10 April 2011 12:02AM

No, not psychoactive drugs: allergy drugs.

This is my attempt to come to grips with the idea of self-modification. I'm interested to know of any flaws folks might spot in this analogy or reasoning.

Gandhi wouldn't take a pill that would make him want to kill people. That is to say, a person whose conscious conclusions agree with their moral impulses wouldn't self-modify in such a way that they no longer care about morally significant things. But, what about morally insignificant things? Specifically, is willingness to self-modify about X a good guide to whether X is morally significant?

A person with untreated pollen allergies cares about pollen; they have to. In order to have a coherent thought without sneezing in the middle of it, they have to avoid inhaling pollen. They may even perceive pollen as a personal enemy, something that attacks them and makes them feel miserable. But they would gladly take a drug that makes them not care about pollen, by turning off or weakening their immune system's response to it. That's what allergy drugs are for.

But a sane person would not shut off their entire immune system, including responses to pathogens that are actually attacking their body. Even if giving themselves an immune deficiency would stop their allergies, a sane allergy sufferer wouldn't do it; they know that the immune system is there for a reason, to defend against actual attacks, even if their particular immune system is erroneously sensitive to pollen as well as to pathogens.

My job involves maintaining computer systems. Like other folks in this sort of job, my team uses an automated monitoring system that will send us an alert (by pager or SMS), waking us up at night if necessary, if something goes wrong with the systems. We want to receive significant alerts, and not receive false positives. We regularly modify the monitoring system to prevent false positives, because we don't like being woken up at night for no good reason. But we wouldn't want to turn the monitoring system off entirely: we actually want to receive true alerts, and we will keep refining our monitoring to deliver more accurate, more timely true alerts, because we would like to improve our systems to make them fail less often. We want to win, and false positives or negatives detract from winning.
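The monitoring practice described above can be sketched in a few lines. This is a hypothetical example, not the author's actual tooling: the check names, hosts, and suppression rule are invented. The point is that tuning means adding narrow suppressions for known false positives while leaving the monitor itself running.

```python
# Hypothetical monitoring sketch: suppress known-benign alerts with narrow
# rules, rather than disabling monitoring wholesale (the "immune deficiency").

# Each rule is a predicate matching one specific known false positive,
# e.g. a disk-usage spike on batch-01 caused by the nightly backup.
false_positive_rules = [
    lambda a: a["check"] == "disk" and a["host"] == "batch-01",
]

def should_page(alert):
    """Page unless a suppression rule matches; true alerts still get through."""
    return not any(rule(alert) for rule in false_positive_rules)

alerts = [
    {"check": "disk", "host": "batch-01"},  # known benign: suppressed
    {"check": "http", "host": "web-01"},    # real outage: still pages
]
pages = [a for a in alerts if should_page(a)]
```

The design choice mirrors the allergy analogy: each suppression rule is a targeted correction to one false positive, so the cost of a bad rule is bounded, whereas switching the monitor off would silence true positives along with the noise.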

Similarly, there are times when we conclude that our moral impulses are incorrect: that they are firing off "bad! evil! sinful!" or "good! virtuous! beneficent!" alerts about things that are not actually bad or good, or that they are failing to fire for things that are. Performing the requisite Bayesian update is quite difficult: training yourself to feel that donating to an ineffective charity is not at all praiseworthy, or that it can be morally preferable to work for money and donate it than to volunteer; altering the thoughts that come unbidden to mind when you think of eating meat, in accordance with a decision that vegetarianism is or is not morally preferable; and so on.

A sane allergy sufferer wants to update his or her immune system to make it stop having false positives, but doesn't want to turn it off entirely, and may want to upgrade its response sometimes, too. A sane system administrator wants to update his or her monitoring tools to make them stop having false positives, but doesn't want to turn them off entirely, and sometimes will program new alerts to avoid false negatives. There is a fact of the matter of whether a particular particle is innocuous pollen or a dangerous pathogen; there is a fact of the matter of whether a text message alert coincides with a down web server; and this fact of the matter explains exactly why we would or wouldn't want to alter our immune system or our servers' monitoring system.

The same may apply to our moral impulses: to decide that something is morally insignificant is, if we are consistent, equivalent to deciding that we would self-modify to avoid noticing it; to decide that it is morally significant is equivalent to deciding that we would self-modify to notice it more reliably.

EDIT: Thanks for the responses. After mulling this over and consulting the Sequences, it seems that the kind of self-modification I'm talking about above is summed up by the training of System 1 by System 2 discussed waaaaay back here. Self-modification for FAI purposes is a level above this. I am only an egg.