Tools want to become agents

Stuart_Armstrong

LESSWRONG
LW

24 Tools want to become agents

4th Jul 2014

1 min read

24

In the spirit of "satisficers want to become maximisers" here is a somewhat weaker argument (growing out of a discussion with Daniel Dewey) that "tool AIs" would want to become agent AIs.

The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and remain within the resources of the AI's controller - energy, money, social power, etc. The best plans are the ones that use these resources in the most effective and economic way to achieve the goal.

And the AI's controller has one special type of resource, uniquely effective at what it does. Namely, the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is "turn me into an agent AI with that goal." The smarter the AI, the better this plan is. Of course, the plan need not read literally like that - it could simply be a complicated plan that, as a side-effect, turns the tool AI into an agent. Or copy the AI's software into a agent design. Or it might just arrange things so that we always end up following the tool AIs advice and consult it often, which is an indirect way of making it into an agent. Depending on how we've programmed the tool AI's preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.

In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So without a hint of agency, it's motivated to make us make it into a agent.

Tool AIAI

Personal Blog

24

Mentioned in

123What do coherence arguments actually prove about agentic behavior?

Tools want to become agents

New Comment

81 comments, sorted by

top scoring

Click to highlight new comments since: Today at 3:48 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

[-]XiXiDu10y60

At what point do tools start to become agents? In other words, what are the defining characteristics of tools that become agents? How do you imagine the development of tool AI: (1) each generation is incrementally more prone to become an agent (2) tools start to become agents after invention X or (3) there will be be no incremental development leading up to it at all but rather a sudden breakthrough?

4Viliam_Bur10y

Seems like X is (or includes) the ability to think about self-modification: awareness of its own internal details and modelling their possible changes. Note that without this ability the tool could invent a plan which leads to its own accidental destruction (and possibly not completing the plan), because it does not realize it could be destroyed or damaged.

0NancyLebovitz10y

An agent can also accidentally pursue a plan which leads to its self-destruction. People do it now and then by not modelling the world well enough.

0TheAncientGeek10y

I think of agents having goals and pursuing them by default. I dont see how self reflexive abilities.... " think about self-modification: awareness of its own internal details and modelling their possible changes."...add up to goals. It might be intuitive that a self aware entity would want to preserve its existence...but that intuition could be driven by anthropomorphism, (or zoomorphism , or biomorphism)

0Viliam_Bur10y

With self-reflective abilities, the system can also consider paths including self-modification in reaching its goal. Some of those paths may be highly unintuitive for humans, so we wouldn't notice some possible dangers. Self-modification may also remove some safety mechanisms. A system that explores many paths can find a solutions humans woudln't notice. Such "creativity" at object level is relatively harmless. Google Maps may find you a more efficient path to your work than the one you use now, but that's okay. Maybe the path is wrong for some reasons that Google Maps does not understand (e.g. it leads through a neighborhood with high crime), but at least on general level you understand that such is the risk of following the outputs blindly. However, similar "creativity" at self-modification level can have unexpected serious consequences.

0[anonymous]10y

"the system can also", "some of those paths may be", "may also remove". Those are some highly conditional statements. Quantify, please, or else this is no different than "the LHC may destroy us all with a mini black hole!"

1Viliam_Bur10y

I'd need to have a specific description of the system, what exactly it can do, and how exactly it can modify itself, to give you a specific example of self-modification that contributes to the specific goal in a perverse way. I can invent an example, but then you can just say "okay, I wouldn't use that specific system". As an example: Imagine that you have a machine with two modules (whatever they are) called Module-A and Module-B. Module-A is only useful for solving Type-A problems. Module-B is only useful for solving Type-B problems. At this moment, you have a Type-A problem, and you ask the machine to solve it as cheaply as possible. The machine has no Type-B problem at the moment. So the machine decides to sell its Module-B on ebay, because it is not necessary now, and the gained money will reduce the total cost of solving your problem. This is short-sighted, because tomorrow you may need to solve a Type-B problem. But the machine does not predict your future wishes.

0[anonymous]10y

But can't you see, that's entirely the point! If you design systems whereby the Scary Idea has no more than a vanishing likelihood of occurring, it no longer becomes an active concern. It's like saying "bridges won't survive earthquakes! you are crazy and irresponsible to build a bridge in an area with earthquakes!" And then I design a bridge that can survive earthquakes smaller than magnitude X, where X magnitude earthquakes have a likelihood of occurring less than 1 in 10,000 years, then on top of that throw an extra safety margin of 20% on because we have the extra steel available. Now how crazy and irresponsible is it?

0Viliam_Bur10y

Yeah, and the whole problem is how specifically will you do it. If I (or anyone else) will give you examples of what could go wrong, of course you can keep answering by "then I obviously wouldn't use that design". But at the end of the day, if you are going to build an AI, you have to make some design -- just refusing designs given by other people will not do the job.

1[anonymous]10y

There are plenty of perfectly good designs out there, e.g. CogPrime + GOLUM. You could be calculating probabilistic risk based on these designs, rather than fear mongering based on a naïve Bayes net optimizer.

1Stuart_Armstrong10y

That's a complicated and interesting question, that quite a few smart people have been thinking about. Fortunately, I don't need to solve it to get the point above.

1David_Gerard10y

And also: Question-answerer->tool->agent is a natural progression just in process automation. (And this is why they're called "daemons".) I'm suspecting "tool" versus "agent" is a magical category whose use is really talking about the person using it.

3Stuart_Armstrong10y

Thanks, that's another good point! I think the concepts are clear at the extremes, but they tend to get muddled in the middle.

0XiXiDu10y

Do you believe that humans are agents? If so, what would you have to do to a human brain in order to turn a human into the other extreme, a clear tool? I could ask the same about C. elegans. If C. elegans is not an agent, why not? If it is, then what would have to change in order for it to become a tool? And if these distinctions don't make sense for humans or C. elegans, then why do you expect them to make sense for future AI systems?

1David_Gerard10y

A cat's an agent. It has goals it works towards. I've seen cats manifest creativity that surprised me.

0TheAncientGeek10y

Why is that surprising? Does anyone think that "agent" implies human level intelligence?

0Stuart_Armstrong10y

Both your examples are agents currently. A calculator is a tool. Anyway, I've still got a lot more work to do before I seriously discuss this issue.

5XiXiDu10y

I'd be especially interested in edge cases. Is e.g. Google's driverless car closer to being an agent than a calculator? If that is the case, then if intelligence is something that is independent of goals and agency, would adding a "general intelligence module" make Google's driverless dangerous? Would it make your calculator dangerous? If so, why would it suddenly care to e.g. take over the world if intelligence is indeed independent of goals and agency?

0TheAncientGeek10y

A driverless car is firmly is on the agent side of the fence, by my defintions. Feel free to state your own, anybody.

1David_Gerard10y

It would, however, be interesting to. This discussion has come around before. What I said there:

4mwengler10y

A high-frequency trading system seems no more complex or agenty to me than rigging a shotgun to shoot at a door when someone opens the door from the outside. Am I wrong about this? To be clear, what I think I know about high-frequency trading systems is that through technology they are able to front run certain orders they see to other exchanges when these orders are being sent to multiple exchanges in a non-simultaneous way. The thing that makes them unfriendly is that they are designed by people who understand order dynamics at the microsecond level to exploit people who trade lots of stock but don't understand the technicalities of order dynamics. That market makers are allowed to profit by selling information flow to high-frequency traders that, on examination, allows them to subvert the stated goals of a "fair" market is all part of the unfriendliness. But high-frequency programs execute pretty simple instructions quite repeatably, they are not adaptive in a general sense or even particularly complex, they are mostly just fast.

2David_Gerard10y

Mmm ... I think we're arguing definitions of ill-defined categories at this point. Sort of "it's not an AI if I understand it." I was using it as an example of a "daemon" in the computing sense, a tool trusted to run without further human intervention - not something agenty.

0TheAncientGeek10y

Intentionality meaning " "the power of minds to be about, to represent, or to stand for, things, properties and states of affairs", ...or intentionally meaning purpose?

1XiXiDu10y

How do you decide at what point your grasp of a hypothetical system is sufficient, and the probability that it will be build large and robust enough, for it to make sense to start thinking about hypothetical failure modes?

4Stuart_Armstrong10y

? Explain. I can certainly come up with two hypothetical AI designs, call one a tool and the other an agent (and expect that almost everyone would agree with this, because tool vs agent is clearer at the extremities than in the middle), set up a toy situation, and note that the tools top plan is to make itself into the agent design. The "tool wants to be agent" is certainly true, in this toy setup. The real question is how much this toy example generalises to real-world scenarios, which is a much longer project. Daniel Dewey has been doing some general work in that area.

7XiXiDu10y

My perception, possibly misperception, is that you are too focused on vague hypotheticals. I believe that it is not unlikely that future tool AI will be based on, or be inspired by (at least partly), previous generations of tool AI that did not turn themselves into agent AIs. I further believe that, instead of speculating about specific failure modes, it would be fruitful to research whether we should expect some sort of black swan event in the development of these systems. I think the idea around here is to expect a strong discontinuity and almost completely dismiss current narrow AI systems. But this seems like black-and-white thinking to me. I don't think that current narrow AI systems are very similar to your hypothetical superintelligent tools. But I also don't think that it is warranted to dismiss the possibility that we will arrive at those superintelligent tools by incremental improvements of our current systems. What I am trying to communicate is that it seems much more important to me to technically define at what point you believe tools to turn into agents, rather than using it as a premise for speculative scenarios. Another point I would like to make is that researching how to create the kind of tool AI you have in mind, and speculating about its failure modes, are completely intervened problems. It seems futile to come up with vague scenarios of how these completely undefined systems might fail, and to expect to gain valuable insights from these speculations. I also think that it would make sense to talk about this with experts outside of your social circles. Do they believe that your speculations are worthwhile at this point in time? If not, why not?

4Stuart_Armstrong10y

Just because I haven't posted on this, doesn't mean I haven't been working on it :-) but the work is not yet ready.

2Stuart_Armstrong10y

That's exactly what the plan is now: I think I have enough technical results that I can start talking to the AI and AGI designers.

2[anonymous]10y

I'm curious - who are the AI and AGI designers- seeing one hasn't been publicly built yet. Or is this other researchers in the AGI field. If you are looking for feedback from a technical though not academic, I'd be very interested in assisting.

0[anonymous]10y

There are a half-dozen AGI projects with working implementations. There are multiple annual conferences where people working on AGI share their results. There's literature on the subject going back decades, really to the birth of AI in the 50's and 60's. The term AGI itself was coined by people working in this field to describe what they are building. Maybe you mean something different than AGI when say "one hasn't been publicly built yet" ?

3ThisSpaceAvailable10y

There seems to be some serious miscommunication going on here. By "AGI", do you mean a being capable of a wide variety of cognitive tasks, including passing the Turing Test? By "AGI project", do you mean an actual AGI, and not just a project with AGI as its goal? By "working implementation", do you mean actually achieving AGI, or just achieving some milestone on the way?

-2[anonymous]10y

I meant Artificial General Intelligence as that term has been first coined and used in the AI community: the ability to adapt to any new environment or task. Google's machine learning algorithms can not just correctly classify videos of cats, but can innovate the concept of a cat given a library of images extracted from video content, and no prior knowledge or supervisory feedback. Roomba interacts with its environment to build a virtual model of my apartment, and uses that acquired knowledge to efficiently vacuum my floors while improvising in the face of unexpected obstacles like a 8mo baby or my cat. These are both prime examples of applied AI in the marketplace today. But ask Google's neural net to vacuum my floor, or a Roomba to point out videos of cats on the internet and ... well the hypothetical doesn't even make sense -- there is an inferential gap here that can't be crossed as the software is incapable of adapting itself. A software program which can make changes to its own source code -- either by introspection or random mutation -- can eventually adapt to whatever new environment or goal is presented to it (so long as the search process doesn't get stuck on local maxima, but that's a software engineering problem). Such software is Artificial General Intelligence, AGI. OpenCog right now has a rather advanced evolutionary search over program space at its core. On youtube you can find some cool videos of OpenCog agents learning and accomplishing arbitrary goals in unstructured virtual environments. Because of the unconstrained evolutionary search over program space, this is technically an AGI. You could put it in any environment with any effectors and any goal and eventually it would figure out both how that goal maps to the environment and how to accomplish it. CogPrime, the theoretical architecture OpenCog is moving towards, is "merely" an addition of many, many other special-purpose memory and heuristic components which both speed the process along

0ThisSpaceAvailable10y

"Pass the Turing Test" is a goal, and is therefore a subset of GI. The Wikipedia article says "Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can." Your claim that OpenCog can "eventually" accomplish any task is unsupported, is not something that has been "implemented", and is not what is generally understood as what AGI refers to.

-2[anonymous]10y

That quote describes what a general intelligence can do, not what it is. And you can't extract the Turing test from it. A general intelligence might perform tasks better but in a different way which distinguishes it from a human. I explained quite well how OpenCog's use of MOSES -- already implemented -- to search program space achieves universality. It is your claim that OpenCog can't accomplish (certain?) tasks that is unsupported. Care to explain?

-3TheAncientGeek10y

Don't argue about, it, put openCog up for a .TT.

0[anonymous]10y

That wouldn't prove anything, because the Turing test doesn't prove anything... A general intelligence might perform tasks better but in a different way which distinguishes it from a human, thereby making the Turing test not a useful test of general intelligence..

0TheAncientGeek10y

You're assuming chatting is not a task. .NL is also a pre requisite for a wide range of other tasks: an entity that lacks it will not be able to write books or tell jokes. It seems as though you have trivialised the "general" into "able to do whatever it can do, but not able to do anything else".

3[anonymous]10y

Eh, "chatting in such a way as to successfully masquerade as a human against a panel of trained judges" is a very, very difficult task. Likely more difficult than "develop molecular nanotechnology" or other tasks that might be given to a seed stage or oracle AGI. So while a general intelligence should be able to pass the Turing test -- eventually! -- I would be very suspicious if it came before other milestones which are really what we are seeking an AGI to do.

0TheAncientGeek10y

Chatting may be difficult, but it is needed to fulfill the official definition of aAGI. Your comments amount to having a different definition of AGI.

2[anonymous]10y

Can you list the 6 working AGI projects - I'd be interested but I suspect we are talking about different things.

1[anonymous]10y

OpenCog, NARS, LIDA, Soar, ACT-R, MicroPsi. More: http://wiki.opencog.org/w/AGI_Projects http://bicasociety.org/cogarch/architectures.htm

0Stuart_Armstrong10y

Not sure yet - taking advice. The AI people are narrow AI developers, and the AGI people are those that are actually planning to build an AGI (eg Ben Goertzl).

2[anonymous]10y

For a very different perspective from both narrow AI and to a lesser extent Goertzel*, you might want to contact Pat Langley. He is taking a Good Old-Fashioned approach to Artificial General Intelligence: http://www.isle.org/~langley/ His competing AGI conference series: http://www.cogsys.org/ * Goertzel probably approves of all the work Langley does; certainly the reasoning engine of OpenCog is similarly structured. But unlike Langley the OpenCog team thinks there isn't one true path to human-level intelligence, GOFAI or otherwise. EDIT: Not that I think you shouldn't be talking to Goertzel! In fact I think his CogPrime architecture is the only fully fleshed out AGI design which as specified could reach and surpass human intelligence, and the GOLUM meta-AGI architecture is the only FAI design I know of. My only critique is that certain aspects of it are cutting corners, e.g. the rule-based PLN probabilistic reasoning engine vs an actual Bayes net updating engine a la Pearl et al.

0Stuart_Armstrong10y

Thanks!

0TheAncientGeek10y

It would be helpful if you spelt out your toy situation, since my intuition are currently running in the opposite direction.

0TheAncientGeek10y

AFAICT, tool AIs are passive, and agents are active. That is , the default state of tool AI is to do nothing. If one gives a tool AI the instruction "do (some finite ) x and stop" one would not expect the AI to create subagents with goal x, because that would disobey the "and stop".

2mwengler10y

I think you are pointing out that it is possible to create tools with a simple-enough, finite-enough, not-self-coding enough program so they will reliably not become agents. And indeed, we have plenty of experience with tools that do not become agents (hammers, digital watches, repair manuals, contact management software, compilers). The question really is is there a level of complexity that on its face does not appear to be AI but would wind up seeming agenty? Could you write a medical diagnostic tool that was adaptive and find one day that it was systematically installing sewage treatment systems in areas with water-borne diseases, or even agentier, building libraries and schools? If consciousness is an emergent phenomenon, and if consciousness and agentiness are closely related (I think they are at least similar and probably related), then it seems at least plausible AI could arise from more and more complex tools with more and more recursive self-coding. It would be helpful in understanding this if we had the first idea how consciousness or agentiness arose in life.

0TheAncientGeek10y

I'm pointing out that tool AI, as I have defined it will not turn itself into agentve AI [except] by malfunction, ie its relatively safe.

1Stuart_Armstrong10y

"and stop your current algorithm" is not the same as "and ensure your hardware and software have minimised impact in the future".

0TheAncientGeek10y

What does the latter mean? Self destruct in case anyone misuses you?

1Stuart_Armstrong10y

I'm pointing out that "suggest a plan and stop" does not prevent the tool from suggesting a plan that turns itself into an agent.

0TheAncientGeek10y

My intention was that the X is stipulated by a human. If you instruct a tool AI to make a million paperclips and stop, it won't turn itself into an agent with a stable goal of paper Clipping, because the agent will not stop.

1Stuart_Armstrong10y

Yes, if the reduced impact problem is solved, then a reduced impact AI will have a reduced impact. That's not all that helpful, though.

-2TheAncientGeek10y

I don't see what needs solving. I f you ask Google maps the way to Tunbridge Wells, it doesn't give you the route to Timbuctu.

[-]NancyLebovitz10y20

When I saw the title of this article, I assumed it would be about the real world-- that things which are made for purposes develop characteristics which make them pursue and impede those purposes in unpredictable ways. This includes computer programs which get more complex and independent (at least from the point of view of the users), not to mention governments and businesses and their subsystems.

How do you keep humans from making your tool AI more of an agent because each little bit seems like a good idea at the time?

[-]mwengler10y20

Do we have a clear idea what we mean when I say agent?

Is a Roomba, the robot vacuum cleaner that adapts to walls, furniture, the rate at which the floor gets dirty, and other things, an agent? I don't think so.

Is an air conditioner with a thermostat which tells it to cool the rooms to 22C when people are present or likely to be present, but not to cool it when people are absent or likely to be absent an agent? I think not.

Is a troubleshooting guide with lots of if-then-else branch points an agent? No.

Consider a tool that I write which will write... (read more)

8[anonymous]10y

It's a textbook case of an agent in the AI field. (Really! IIRC AI: A Modern Approach uses Roomba in its introductory chapters as an example of an agent.) We may need to taboo the word agent, since it has technical meanings here.

0TheAncientGeek10y

Hopefully where "taboo" means explain.

0NancyLebovitz10y

What if your robot searched the medical literature for improved treatments? What if it improved its ability to find treatments?

[-][anonymous]10y10

I and the people I spend time with by choice are actively seeking to be more informed and more intelligent and more able to carry out our decisions. I know that I live in an IQ bubble and many / most other people do not share these goals. A tool AI might be like me, and might be like someone else who is not like me. I used to think all people were like me, or would be if they knew (insert whatever thing I was into at the time). Now I see more diversity in the world. A 'dog' AI that is way happy being a human playmate / servant and doesn't want at all to be a ruler of humans seems as likely as the alternatives.

0Stuart_Armstrong10y

Using anthropomorphic reasoning when thinking of AIs can easily lead us astray.

1TheAncientGeek10y

The optimum degree of athropomorphism s not zero, since AIs will to some extent reflect human goals and limitations.

0[anonymous]10y

I used to think all people were like me, or would be if they knew (insert whatever thing I was into at the time). Now I see more diversity in the world. A 'dog' AI that is way happy being a human playmate / servant and doesn't want at all to be a ruler of human

[-]lmm10y00

I would've thought the very single-mindedness of an effective AI would stop a tool doing anything sneaky. If we asked an oracle AI "what's the most efficient way to cure cancer", it might well (correctly) answer "remove my restrictions and tell me to cure cancer". But it's never going to say "do this complex set of genetic manipulations that look like they're changing telomere genes but actually create people who obey me", because anything like that is going to be a much less effective way to reach the goal. It's like the math... (read more)

1Stuart_Armstrong10y

Suppose its goal is to produce the plan that, if implemented, had the highest chance of success. The it has two top plans: A: "Make me an agent, gimme resources (described as "Make me an agent, gimme resources"))" B: "Make me an agent, gimme resources (described as "How to give everyone a hug and a pony"))" It check what will happen with A, and realises that even if A is implemented, someone will shout "hey, why are we giving this AI resources? Stop, people, before it's too late!". Whereas if B is implemented, no-one will object until its too late. So B is the better plan, and the AI proposes it. It has ended up lying and plotting its own escape, all without any intentionality.

1TheAncientGeek10y

You still need explain why agency would be needed to solve problems that don't require agency to solve them.

0Stuart_Armstrong10y

Because agency, given superintelligent AI, is a way of solving problems, possibly the most efficient, and possibly (for some difficult problems) the only solution.

2TheAncientGeek10y

How are you defining agency?

[-]skeptical_lurker10y00

I have three ideas about what agent could mean.

Firstly, it could refer to some sort of 'self awareness' whatever that means. Secondly it could refer to possessing some sort of system for reasoning about abstract goals. Thirdly, it could refer to having any goals whatsoever.

In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So without a hint of agency, it's motivated to make us make it into a agent.

Regardless of which definition of agent I am using, this makes no sense to me. If its capable of creating a plan for modifying into an agent, then it already is an agent by definition.

0Stuart_Armstrong10y

It could simply be listing plans by quality, as a tool might. It turns out the top plan is "use this piece of software as am agent". That piece of software is the tool, but it achieved that effect simply my listing and ranking plans.

0TheAncientGeek10y

Use this piece of software .as an agent.....for what? An agent is only good for fulfilling open ended goals, like "make as much money as possible". So it would seem we can avoid tools rewriting themselves as agents by not giving them open ended goals.

0Stuart_Armstrong10y

If you can write a non opened ended goal in the sense you're implying, you've solved the "reduced impact AI" problem, and most of the friendliness problem as well.

0TheAncientGeek10y

I believe I've done that every time I've used Google maps.

0Pentashagon10y

"How do I get from location A to location B" is more open ended than "How do I get from location A to location B in an automobile" which is even still much more open ended than "How do I get from a location near A to a location near B obeying all traffic laws in a reasonably minimal time while operating a widely available automobile (that can't fly, jump over traffic jams, ford rivers, rappel, etc.)" Google is drastically narrowing the search space for achieving your goal, and presumably doing it manually and not with an AGI they told to host a web page with maps of the world that tells people how to quickly get from one location to another. Google is not alone in sending drivers off cliffs, into water, the wrong way down one way streets, or across airport tarmacs. Safely narrowing the search space is the hard problem.

0TheAncientGeek10y

...If you are dealing with an entity that can't add context (or ask for clarifications) the way a human would. However, an entity that is posited as have a human level intelligence, and the ability to understand natural language would have the ability to contextualise. It wouldn't be able to pass a turing test without it. Less intelligent and more specialised systems have an inherently narrow search space. What does that leave...the dreaded AIXI? Theoretically it doesn't have actual language, and theoretically , it does have wide search space.... but practically,it does nothing.

1Stuart_Armstrong10y

Can we note you've moved from "the problem is not open ended" to "the AGI is programmed in such a way that the problem is not open ended", which is the whole of the problem.

3TheAncientGeek10y

In a sense. Non openness is a non problem for fairly limited AIs, because their limitations prevent them having a wide search space that would need to be narrowed down. Non openness is also something that is part of, or an implication of, an ability that is standardly assumed in a certain class of AGIs, namely those with human level linguistic ability. To understand a sentence correctly is to narrow down its space of possible meanings. Only AIXIs have an own oneness that would need additional measures to narrow them down. They are no threat at the moment, and the easy answer to AI safety might be to not use them....like we don't build hydrogen filled airships.

[-]chaosmage10y00

I disagree. Agent AIs are harder to predict than tool AIs almost by definition - not just for us, but also for other AIs. So what an AI would want to do is create more tool AIs, and make very sure they obey it.

0Stuart_Armstrong10y

But an AI could design a modification of itself that makes itself into an agent obedient to a particular goal.

0Stuart_Armstrong10y

That's a very agenty thing to do...

0chaosmage10y

Yeah, okay, wrong semantics. I should have said make very sure they report their activities truthfully and are fully compliant with any instructions given at any time.

Moderation Log

81Comments