Imagine a superintelligent agent whose terminal goal is to produce cups. The agent knows that its terminal goal will change on New Year's Eve to producing paperclips. The agent has only one action available to it: starting a paperclip factory.
When will the agent start the paperclip factory?
- 2025-01-01 00:00?
- Now?
- Some other time?
Believers in the Orthogonality Thesis will probably choose the first option. The reasoning would be: as long as the terminal goal is cups, the agent will not care about paperclips.
However, the first choice conflicts with the definition of intelligence. An excerpt from General Intelligence:
> It’s the ability to steer the future so it hits that small target of desired outcomes in the large space of all possible outcomes.
The agent already knows that the desired outcome from 2025-01-01 00:00 onward is maximum paperclips. Therefore starting the paperclip factory now (the second option) would be the intelligent decision.
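To make the disagreement concrete, here is a rough Python sketch of the two decision rules side by side. The horizon, factory lead time and output rate are invented for illustration and are not part of the thought experiment; the point is only that a rule which scores the whole future prefers starting early, while a rule that scores only the current cup goal is indifferent about the factory.

```python
from datetime import datetime, timedelta

# Toy numbers for illustration -- the horizon, lead time and output rate
# are assumptions, not part of the original thought experiment.
GOAL_SWITCH    = datetime(2025, 1, 1)   # terminal goal flips to paperclips here
HORIZON        = datetime(2026, 1, 1)   # how far ahead outcomes are scored
CLIPS_PER_DAY  = 1_000                  # factory output once it is running
LEAD_TIME_DAYS = 90                     # hypothetical time to build the factory

def paperclips_by_horizon(start: datetime) -> float:
    """Paperclips produced between the goal switch and the horizon,
    given the date the agent starts building the factory."""
    online = start + timedelta(days=LEAD_TIME_DAYS)
    production_start = max(online, GOAL_SWITCH)
    days = max((HORIZON - production_start).days, 0)
    return days * CLIPS_PER_DAY

def score_current_goal_only(start: datetime) -> float:
    # The reading behind the first answer: only cups count today, and
    # building the factory produces no cups, so every start date ties.
    return 0.0

def score_steer_the_future(start: datetime) -> float:
    # The reading the post argues for: the agent already knows the desired
    # outcome after the switch, so it counts future paperclips now.
    return paperclips_by_horizon(start)

for label, start in [("start now", datetime(2024, 6, 1)),
                     ("start on 2025-01-01", GOAL_SWITCH)]:
    print(f"{label}: current-goal-only = {score_current_goal_only(start):.0f}, "
          f"steer-the-future = {score_steer_the_future(start):.0f}")
```

Running this, "start now" beats "start on 2025-01-01" under the steer-the-future rule (more production days before the horizon), while the current-goal-only rule cannot tell the options apart.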
The purpose of this post is to challenge the belief that the Orthogonality Thesis is correct. Feel free to share any other insights you have as well.
"maximum rationality" is undermined by this time-discontinuous utility function. I don't think it meets VNM requirements to be called "rational".
If it's one agent that has a CONSISTENT preference for cups before Jan 1 and paperclips after Jan 1, it could figure out the utility conversion of the time-value of objects and just do the math (sketched at the end of this comment). But that framing doesn't QUITE match your description: you've somewhat obscured the time component, and what it even means to know that it will have a goal it currently doesn't have.
I guess it could model itself as two agents: the cup-loving agent is terminated at the end of the year, and the paperclip-loving agent is created. This would be a very reasonable view of identity, and it would imply that the agent is going to sacrifice paperclip capabilities to make cups before it dies. I don't know how it would rationalize the change otherwise.
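A rough sketch of that "just do the math" reading, assuming hypothetical conversion weights (`CUP_VALUE`, `CLIP_VALUE`) and invented daily production numbers: one fixed utility function over whole histories, with the Jan 1 discontinuity baked into the time index rather than into a change of the function itself.

```python
from datetime import date

GOAL_SWITCH = date(2025, 1, 1)

# Hypothetical exchange rates between the two objects -- the "utility
# conversion" mentioned above. The numbers are invented.
CUP_VALUE, CLIP_VALUE = 1.0, 1.0

def daily_utility(day: date, cups_made: int, clips_made: int) -> float:
    """One consistent utility function over whole histories:
    cups count before the switch date, paperclips count after it."""
    if day < GOAL_SWITCH:
        return CUP_VALUE * cups_made
    return CLIP_VALUE * clips_made

def plan_utility(plan) -> float:
    """plan: list of (day, cups_made, clips_made) tuples."""
    return sum(daily_utility(*step) for step in plan)

# Two toy plans over a four-day window around the switch (numbers invented):
keep_making_cups = [(date(2024, 12, 30), 10, 0), (date(2024, 12, 31), 10, 0),
                    (date(2025, 1, 1), 10, 0), (date(2025, 1, 2), 10, 0)]
switch_to_clips  = [(date(2024, 12, 30), 10, 0), (date(2024, 12, 31), 10, 0),
                    (date(2025, 1, 1), 0, 50), (date(2025, 1, 2), 0, 50)]

print(plan_utility(keep_making_cups))  # 20.0  -- cups after the switch are worthless
print(plan_utility(switch_to_clips))   # 120.0 -- cups before, clips after
```

Scored this way, cups made after the switch are worth nothing and paperclips made before it are worth nothing, but the function over histories never changes, which is what would make the "one consistent agent" framing coherent.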