Imagine a superintelligent agent whose terminal goal is to produce cups. The agent knows that its terminal goal will change on New Year's Eve to producing paperclips. The agent has only one action available to it: starting a paperclip factory.

When will the agent start the paperclip factory?

  1. 2025-01-01 00:00?
  2. Now?
  3. Some other time?

Believers in the Orthogonality Thesis will probably choose the first option. Their reasoning would be: as long as the terminal goal is cups, the agent will not care about paperclips.

However, the first choice conflicts with the definition of intelligence. An excerpt from General Intelligence:

It’s the ability to steer the future so it hits that small target of desired outcomes in the large space of all possible outcomes

The agent is already aware that the desired outcome starting 2025-01-01 00:00 is maximum paperclips. Therefore the agent's decision to start the paperclip factory now (the second option) would be considered intelligent.

The purpose of this post is to challenge the belief that the Orthogonality Thesis is correct. In any case, feel free to share other insights you have as well.

10 comments

Humans face a version of this all the time - different contradictory wants with different timescales and impacts.  We don't have and certainly can't access a legible utility function, and it's unknown if any intelligent agent can (none of the early examples we have today can).

So the question as asked is either trivial (it'll depend on the willpower and rationality of the agent whether they optimize for the future or the present), or impossible (goals don't work that way).

Let's assume maximum willpower and maximum rationality.

Whether they optimize for the future or the present

I think the answer is in the definition of intelligence.

So which one is it?

The fact that the answer is not straightforward already proves my point. There is a conflict between intelligence and the terminal goal, and we can debate which will prevail. But the problem is that, according to the Orthogonality Thesis, such a conflict should not exist.

"maximum rationality" is undermined by this time-discontinuous utility function.  I don't think it meets VNM requirements to be called "rational".  

If it's one agent that has a CONSISTENT preference for cups before Jan 1 and paperclips after Jan 1, it could figure out the utility conversion of time-value of objects and just do the math. But that framing doesn't QUITE match your description - you kind of obscured the time component and what it even means to know that it will have a goal that it currently doesn't have.
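To make "just do the math" concrete, here is a minimal sketch of that framing. Everything in it is an illustrative assumption (the 30-day lead time, the per-day values, the brute-force search); it only shows that a single time-indexed utility function turns "when to start the factory" into an ordinary optimisation problem rather than a goal conflict.

```python
# Sketch only: an agent with ONE time-indexed utility function
# (cups count before the switch date, paperclips count after) picks the
# construction start day by brute force. All numbers are placeholders.

RAMP_UP_DAYS = 30        # assumed lead time before the factory produces anything
DAYS_TO_SWITCH = 90      # days from "now" until 2025-01-01 00:00
HORIZON = 365            # days of paperclip production counted after the switch
CUP_VALUE = 1.0          # utility per day of cup production (only counts before the switch)
CLIP_VALUE = 2.0         # utility per day of paperclip production (only counts after the switch)

def utility(start_day: int) -> float:
    """Total utility if factory construction starts on `start_day` (0 = now)."""
    cup_days = min(start_day, DAYS_TO_SWITCH)      # cup line runs until construction starts
    factory_ready = start_day + RAMP_UP_DAYS
    clip_days = max(0, DAYS_TO_SWITCH + HORIZON - max(factory_ready, DAYS_TO_SWITCH))
    return cup_days * CUP_VALUE + clip_days * CLIP_VALUE

best_day = max(range(DAYS_TO_SWITCH + 1), key=utility)
print(best_day)   # 60: keep making cups, start construction RAMP_UP_DAYS before the switch
```

With these particular numbers the optimum is neither "now" nor "on Jan 1", but just early enough for the factory to be ready at the moment the switch happens.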

I guess it could model itself as two agents - the cup-loving agent is terminated at the end of the year, and the paperclip-loving agent is created. This would be a very reasonable view of identity, and would imply that it's going to sacrifice paperclip capabilities to make cups before it dies. I don't know how it would rationalize the change otherwise.

Another way of conceptualising this is to say that the agent has the single unchanging goal of "cups until 2025, thenceforth paperclips".

Compare with the situation of being told to make grue cups, where "grue" means "green until 2025, then blue."
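In symbols (my notation, not anything given in the post), the "unchanging goal" is a single utility function that takes time as an ordinary argument:

$$
U(x, t) =
\begin{cases}
\text{cups}(x) & \text{if } t < \text{2025-01-01 00:00} \\
\text{paperclips}(x) & \text{if } t \ge \text{2025-01-01 00:00}
\end{cases}
$$

Nothing about $U$ ever changes; only which branch is active changes as $t$ advances, just as "grue" never changes meaning even though the colour it picks out does.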

If the agent is not informed in advance, it can still be conceptualised as the agent's goal being to produce whatever it is told to produce — an unchanging goal.

At a high enough level, we can conceive that no goal ever changes. These are the terminal goals. At lower levels, we can see goals as changing all the time in service of the higher goals, as in the case of an automatic pilot following a series of waypoints. But this is to play games in our head, inventing stories that give us different intuitions. How we conceptualise things has no effect on what the AI does in response to new orders.

It is not clear to me what any of this has to do with Orthogonality.

OK, I'm open to discussing this further using your concept.

As I understand it, you agree that the correct answer is the second option?

It is not clear to me what any of this has to do with Orthogonality.

I'm not sure how patient you are, but I can reassure you that we will get to Orthogonality if you don't give up 😄

So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal. How does this work with the fact that the future is unpredictable? Will the agent work towards all possible goals? It is possible that in the future "grue" will mean green, blue, or even red.

Leaving aside the conceptualisation of "terminal goals", the agent as described should start up the paperclip factory early enough to produce paperclips when the time comes. Until then it makes cups. But the agent as described does not have a "terminal" goal of cups now and a "terminal" goal of paperclips in future. It has been given a production schedule to carry out. If the agent is a general-purpose factory that can produce a whole range of things, the only "terminal" goal to design it to have is to follow orders. It should make whatever it is told to, and turn itself off when told to.
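A toy sketch of that "follow orders" design (the class and the order strings are placeholders of mine, not anything specified in the thread): the factory's only fixed goal is to execute the current order, so a switch from cups to paperclips is new input rather than a new terminal goal.

```python
# Placeholder sketch of a general-purpose factory whose only fixed goal is
# "carry out the current order". Changing the order from cups to paperclips
# (or to "shut down") changes its behaviour without changing that goal.

class GeneralPurposeFactory:
    def __init__(self) -> None:
        self.current_order: str | None = None

    def receive_order(self, order: str) -> None:
        # New orders simply replace old ones; no goal surgery required.
        self.current_order = order

    def step(self) -> str:
        if self.current_order is None:
            return "idle"
        if self.current_order == "shut down":
            return "powering off"
        return f"producing: {self.current_order}"

factory = GeneralPurposeFactory()
factory.receive_order("cups")
print(factory.step())            # producing: cups
factory.receive_order("paperclips")
print(factory.step())            # producing: paperclips
```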

Unless, of course, people go, "At last, we've created the Sorcerer's Apprentice machine, as warned of in Goethe's cautionary tale, 'The Sorcerer's Apprentice'!"

So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal.

A superintelligent agent will do what it damn well likes, it's superintelligent. :)

ability to steer the future so it hits that small target of desired outcomes

The key is the word "desired". Before New Year's Eve, paperclips are not desired.