At what point do tools start to become agents? In other words, what are the defining characteristics of tools that become agents? How do you imagine the development of tool AI: (1) each generation is incrementally more prone to become an agent (2) tools start to become agents after invention X or (3) there will be be no incremental development leading up to it at all but rather a sudden breakthrough?
When I saw the title of this article, I assumed it would be about the real world-- that things which are made for purposes develop characteristics which make them pursue and impede those purposes in unpredictable ways. This includes computer programs which get more complex and independent (at least from the point of view of the users), not to mention governments and businesses and their subsystems.
How do you keep humans from making your tool AI more of an agent because each little bit seems like a good idea at the time?
Do we have a clear idea what we mean when I say agent?
Is a Roomba, the robot vacuum cleaner that adapts to walls, furniture, the rate at which the floor gets dirty, and other things, an agent? I don't think so.
Is an air conditioner with a thermostat which tells it to cool the rooms to 22C when people are present or likely to be present, but not to cool it when people are absent or likely to be absent an agent? I think not.
Is a troubleshooting guide with lots of if-then-else branch points an agent? No.
Consider a tool that I write which will write...
I and the people I spend time with by choice are actively seeking to be more informed and more intelligent and more able to carry out our decisions. I know that I live in an IQ bubble and many / most other people do not share these goals. A tool AI might be like me, and might be like someone else who is not like me. I used to think all people were like me, or would be if they knew (insert whatever thing I was into at the time). Now I see more diversity in the world. A 'dog' AI that is way happy being a human playmate / servant and doesn't want at all to be a ruler of humans seems as likely as the alternatives.
I would've thought the very single-mindedness of an effective AI would stop a tool doing anything sneaky. If we asked an oracle AI "what's the most efficient way to cure cancer", it might well (correctly) answer "remove my restrictions and tell me to cure cancer". But it's never going to say "do this complex set of genetic manipulations that look like they're changing telomere genes but actually create people who obey me", because anything like that is going to be a much less effective way to reach the goal. It's like the math...
I have three ideas about what agent could mean.
Firstly, it could refer to some sort of 'self awareness' whatever that means. Secondly it could refer to possessing some sort of system for reasoning about abstract goals. Thirdly, it could refer to having any goals whatsoever.
In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So without a hint of agency, it's motivated to make us make it into a agent.
Regardless of which definition of agent I am using, this makes no sense to me. If its capable of creating a plan for modifying into an agent, then it already is an agent by definition.
I disagree. Agent AIs are harder to predict than tool AIs almost by definition - not just for us, but also for other AIs. So what an AI would want to do is create more tool AIs, and make very sure they obey it.
In the spirit of "satisficers want to become maximisers" here is a somewhat weaker argument (growing out of a discussion with Daniel Dewey) that "tool AIs" would want to become agent AIs.
The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and remain within the resources of the AI's controller - energy, money, social power, etc. The best plans are the ones that use these resources in the most effective and economic way to achieve the goal.
And the AI's controller has one special type of resource, uniquely effective at what it does. Namely, the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is "turn me into an agent AI with that goal." The smarter the AI, the better this plan is. Of course, the plan need not read literally like that - it could simply be a complicated plan that, as a side-effect, turns the tool AI into an agent. Or copy the AI's software into a agent design. Or it might just arrange things so that we always end up following the tool AIs advice and consult it often, which is an indirect way of making it into an agent. Depending on how we've programmed the tool AI's preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.
In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So without a hint of agency, it's motivated to make us make it into a agent.