Imagine there is a superintelligent agent whose terminal goal is to produce cups. The agent knows that its terminal goal will change on New Year's Eve to producing paperclips. The agent has only one action available to it: starting a paperclip factory.
When will the agent start the paperclip factory?
1. 2025-01-01 00:00?
2. Now?
3. Some other time?
Believers in the Orthogonality Thesis will probably choose the first option. The reasoning would be: as long as the terminal goal is cups, the agent will not care about paperclips.
However, the first option conflicts with the definition of intelligence. An excerpt from General Intelligence:
> It’s the ability to steer the future so it hits that small target of desired outcomes in the large space of all possible outcomes
The agent is aware now that, starting 2025-01-01 00:00, the desired outcome will be maximum paperclips. Therefore the agent's decision to start the paperclip factory now (the second option) would be considered intelligent.
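To make the disagreement concrete, here is a minimal Python sketch under stated assumptions: the `utility`, `outcome`, and `decide` helpers and all the numbers are invented for illustration, not taken from anywhere. It contrasts the two decision rules at issue: evaluating the single available action under the current goal (cups) versus under the goal that will be in force when the outcomes arrive (paperclips).

```python
# Toy model (all names and numbers invented): one available action
# ("start the paperclip factory"), two goals, and a known goal change.

def utility(goal, world):
    # Each goal only values its own product.
    return world.get(goal, 0)

def outcome(start_factory_now):
    # Starting the factory now yields more paperclips by the time the
    # goal has changed; it produces no cups either way.
    return {"cups": 0, "paperclips": 100 if start_factory_now else 10}

def decide(evaluate_under_goal):
    # Choose the action (start now: True/False) that maximizes utility
    # under the given goal.
    return max([True, False],
               key=lambda a: utility(evaluate_under_goal, outcome(a)))

# Rule 1: evaluate under the *current* goal. Both actions yield 0 cups,
# so the cup-maximizer is indifferent (True here only by tie-break).
print(decide("cups"))

# Rule 2: evaluate under the goal in force when the outcomes land.
print(decide("paperclips"))  # True -- start the factory now
```

Under the first rule the cup-maximizer is simply indifferent to the factory; under the second, starting it now is the clear winner, which is the behavior the post calls intelligent.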
The purpose of this post is to challenge the belief that the Orthogonality Thesis is correct. That said, feel free to share any other insights you have as well.
A terminal goal is (this is the definition of the term) a goal which is not instrumental to any other goal.
If an agent knows its terminal goal, and has a goal of preventing it from changing, then which of those goals is its actual terminal goal?
If it knows its current terminal goal, and knows that that goal might be changed in the future, is there any reason it must try to prevent that? Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.
If its actual terminal goal is of the form “X, and in addition prevent this from ever being changed”, then it will resist its terminal goal being changed.
If its actual terminal goal is simply X, it will not.
This holds regardless of how intelligent it is, and of how certain or uncertain it is about the future.
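A minimal sketch of this point, with an invented `Agent` class and made-up utilities: at every step the agent optimizes whatever is currently written in its goal slot, and it only resists a rewrite if the goal in the slot itself contains a preservation clause.

```python
# Toy illustration (all names and utilities invented): behavior depends
# only on what is currently written in the terminal-goal slot.

class Agent:
    def __init__(self, terminal_goal):
        self.terminal_goal = terminal_goal  # the "slot"

    def choose(self, actions, predicted_outcomes):
        # Evaluate every action under the goal currently in the slot,
        # not under any goal the agent predicts it will have later.
        def score(action):
            outcome = predicted_outcomes[action]
            base = outcome.get(self.terminal_goal.split("+")[0].strip(), 0)
            if "preserve goal" in self.terminal_goal:
                # Only a goal of the form "X + preserve goal" penalizes
                # outcomes in which the goal slot gets overwritten.
                base -= 1000 * outcome.get("goal changed", 0)
            return base
        return max(actions, key=score)

outcomes = {
    "make cups":      {"cups": 10, "goal changed": 1},
    "resist rewrite": {"cups": 0,  "goal changed": 0},
}

plain = Agent("cups")
protective = Agent("cups + preserve goal")

print(plain.choose(list(outcomes), outcomes))       # "make cups"
print(protective.choose(list(outcomes), outcomes))  # "resist rewrite"
```

The two agents differ only in the contents of the slot, not in how intelligently they optimize it.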