Imagine a superintelligent agent whose terminal goal is to produce cups. The agent knows that its terminal goal will change on New Year's Eve to producing paperclips. The agent has only one action available to it: starting a paperclip factory.
When will the agent start the paperclip factory?
1. At 2025-01-01 00:00?
2. Now?
3. At some other time?
Believers in the Orthogonality Thesis will probably choose the first option. The reasoning would be: as long as the terminal goal is cups, the agent will not care about paperclips.
However, the first choice conflicts with the definition of intelligence. An excerpt from General Intelligence:
> It’s the ability to steer the future so it hits that small target of desired outcomes in the large space of all possible outcomes
The agent is aware now that the desired outcome starting 2025-01-01 00:00 is maximum paperclips. Therefore the agent's decision to start the paperclip factory now (the second option) would be considered intelligent.
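To make the two readings concrete, here is a minimal Python sketch. It is purely illustrative: the names `myopic_agent` and `future_steering_agent`, and the 90-day factory lead time, are assumptions of mine rather than part of the thought experiment. It contrasts an agent that acts only on the goal currently in force with one that steers toward the outcomes it knows will be desired after the switch.

```python
from datetime import datetime

SWITCH_TIME = datetime(2025, 1, 1)   # moment the terminal goal changes
FACTORY_LEAD_TIME_DAYS = 90          # assumed time needed to get a factory producing

def current_goal(t: datetime) -> str:
    """Return the terminal goal in force at time t."""
    return "cups" if t < SWITCH_TIME else "paperclips"

def myopic_agent(now: datetime) -> bool:
    """Acts only on the goal currently in force: never starts the factory
    while the goal is still cups."""
    return current_goal(now) == "paperclips"

def future_steering_agent(now: datetime) -> bool:
    """Steers toward outcomes it knows will be desired: starts the factory
    early enough that paperclip output is maximal once the goal switches."""
    days_until_switch = (SWITCH_TIME - now).days
    return days_until_switch <= FACTORY_LEAD_TIME_DAYS

now = datetime(2024, 11, 15)
print(myopic_agent(now))           # False: waits for 2025-01-01 00:00
print(future_steering_agent(now))  # True: starts the factory now
```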
The purpose of this post is to challenge the belief that the Orthogonality Thesis is correct. Feel free to share any other insights you have as well.
Another way of conceptualising this is to say that the agent has the single unchanging goal of "cups until 2025, thenceforth paperclips".
Compare with the situation of being told to make grue cups, where "grue" means "green until 2025, then blue."
If the agent is not informed in advance, it can still be conceptualised as the agent's goal being to produce whatever it is told to produce — an unchanging goal.
At a high enough level, we can conceive that no goal ever changes. These are the terminal goals. At lower levels, we can see goals as changing all the time in service of the higher goals, as in the case of an automatic pilot following a series of waypoints. But this is to play games in our heads, inventing stories that give us different intuitions. How we conceptualise things has no effect on what the AI does in response to new orders.
It is not clear to me what any of this has to do with Orthogonality.
No. Orthogonality is about the agent following any given goal, not about when the goal is given. And as my thought experiment shows, it is not intelligent to blindly follow a given goal.