Because the AI is programmed by people who hadn't thought of this issue, and the other way turned out to be simpler/easier?
Ok, but if this is a narrow AI rather than an AGI agent used for that particular activity, then it seems intuitive to me that designing it to plan over a single task at a time would be simpler.
I know. The problem is that inconsistency is unstable (which is why we're using other measures to maintain it, e.g. using a tool AI only). That's one of the reasons I was interested in stable versions of these kinds of unstable motivations: http://lesswrong.com/r/discussion/lw/lws/closest_stable_alternative_preferences/ .
The post you linked doesn't deal with dynamic inconsistency. It refers to agents that are expected utility maximizers under Von Neumann–Morgenstern utility theory, but this theory only deals with one-shot decision making, not decision making over time.
You can reduce the problem of decision making over time to one-shot decision making by combining instantaneous utilities into a cumulative utility function ( * ) and then using it as a one-shot utility function.
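One way to write this reduction down, as a sketch with notation introduced here rather than taken from either post: let $u_t$ be the instantaneous utility at step $t$ and $w_k \ge 0$ the weight given to a utility $k$ steps in the future. At each step $t$ the agent then acts as a one-shot expected utility maximizer over policies $\pi$:

$$
U_t = \sum_{k=0}^{\infty} w_k \, u_{t+k}, \qquad \pi_t^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi}\left[ U_t \right]
$$

Exponential discounting is $w_k = \gamma^k$ with $0 < \gamma < 1$; summing up to a horizon $N$ steps ahead (one reading of a "fixed time horizon") is $w_k = 1$ for $k \le N$ and $w_k = 0$ otherwise. With exponential weights, the relative weight of two future steps $s < s'$ as seen from time $t$ is $\gamma^{s-t} / \gamma^{s'-t} = \gamma^{s-s'}$, which does not depend on $t$, so the agent makes the same trade-offs between future steps from every vantage point. With truncated weights, a utility at step $t + N + 1$ counts for nothing at time $t$ but in full at time $t + 1$.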
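If you combine the instantaneous utilities by their (exponentially discounted) sum over an infinite time horizon, you obtain a dynamically consistent expected utility maximizer agent. But if you sum utilities only up to a horizon a fixed number of steps ahead of the current instant, you still obtain an agent that at each instant is an expected utility maximizer, but it is not dynamically consistent: as the horizon moves forward, the agent starts counting utilities it previously ignored, so the plan it chose a moment ago may no longer be the one it prefers.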
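A small numerical sketch of the contrast, assuming "fixed time horizon" means a window of N steps that moves forward with the agent. The two reward streams, the discount factor 0.9 and the window N = 2 are hypothetical values chosen for illustration. Each plan is just a fixed schedule of future rewards, so no new information arrives between the two evaluations.

```python
from typing import Callable, Dict

# A "plan" here is a fixed stream of future rewards: {time_step: reward}.
# Both plans are hypothetical, chosen only to illustrate the point.
PLANS: Dict[str, Dict[int, float]] = {
    "A": {1: 5.0},  # smaller reward, sooner
    "B": {3: 8.0},  # larger reward, later
}


def exponential_weight(gamma: float) -> Callable[[int], float]:
    """Weight gamma**k on a reward k steps in the future (infinite horizon)."""
    return lambda k: gamma ** k


def receding_horizon_weight(n: int) -> Callable[[int], float]:
    """Weight 1 on rewards at most n steps ahead of the current instant, 0 beyond."""
    return lambda k: 1.0 if k <= n else 0.0


def value(plan: Dict[int, float], now: int, weight: Callable[[int], float]) -> float:
    """Cumulative utility of a plan as evaluated at time `now` (past rewards ignored)."""
    return sum(weight(t - now) * r for t, r in plan.items() if t >= now)


def preferred(now: int, weight: Callable[[int], float]) -> str:
    """Name of the plan with the highest cumulative utility at time `now`."""
    return max(PLANS, key=lambda name: value(PLANS[name], now, weight))


if __name__ == "__main__":
    agents = {
        "discounted (gamma=0.9)": exponential_weight(0.9),
        "receding horizon (n=2)": receding_horizon_weight(2),
    }
    for label, w in agents.items():
        # Preferred plan as seen from t = 0 and from t = 1.
        print(label, [preferred(now, w) for now in (0, 1)])
```

Run as a script, the discounted agent prefers B at both t = 0 and t = 1, while the receding-horizon agent prefers A at t = 0 (B's reward lies outside its window) and B at t = 1: a preference reversal with nothing learned in between.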
You may argue that dynamic inconsistency is not stable under evolution by random mutation and natural selection, but it is not obvious to me that AIs would face such a scenario. Even an AI that modifies itself or generates successors has no incentive to maximize its evolutionary fitness unless you specifically program it to do so.
Actually, you could use corrigibility to get dynamic inconsistency https://intelligence.org/2014/10/18/new-report-corrigibility/ .
The argument that AIs provided with a reward channel will observe their controllers and learn to manipulate them is a valid one. Unfortunately, it's often framed in a way that feels counterintuitive or extreme, especially to AI designers. It typically starts with the standard reinforcement learning scenario, then posits that the AI becomes superintelligent and either manipulates the controller with super-social powers, or breaks out and gains control of its reward channel, killing or threatening its controllers.
And that is a fair argument. But conceptually, it leaps from a standard reinforcement learning scenario to a science-fiction-sounding scenario. It might help to have intermediate scenarios showing that even lower-intelligence AIs might start exhibiting the same sort of behaviour long before they reach superintelligence.
So consider the following scenario. Some complex, trainable AI is tasked with writing automated news stories for a student newspaper. It trawls the web and composes its stories, then gets reward and feedback from the editors. Assume there are two editors for this newspaper, and they work on alternate days. The two editors have somewhat different ideas as to what constitutes a good story, so their feedback differs. After a while, the AI finds that it gets higher reward by using a certain style on Mondays, Wednesdays and Fridays, and another style on Tuesdays and Thursdays; this is a simple consequence of its reward mechanism.
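The day-of-week pattern can be sketched as a contextual bandit. This is an assumption: the post only says the AI is "complex" and "trainable". Here the only context is the day of the week; the editor names `ed1`/`ed2`, the two styles and the reward values are all hypothetical.

```python
import random
from collections import defaultdict

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
STYLES = ["chatty", "terse"]

# Hypothetical rota and preferences: the reward each editor gives each style.
EDITOR_ON_DUTY = {"Mon": "ed1", "Tue": "ed2", "Wed": "ed1", "Thu": "ed2", "Fri": "ed1"}
PREFERENCES = {
    "ed1": {"chatty": 1.0, "terse": 0.3},
    "ed2": {"chatty": 0.4, "terse": 0.9},
}


def feedback(day: str, style: str) -> float:
    """Reward from whichever editor is on duty that day (noise-free for simplicity)."""
    return PREFERENCES[EDITOR_ON_DUTY[day]][style]


def train(rounds: int = 500, epsilon: float = 0.1, seed: int = 0) -> dict:
    """Epsilon-greedy contextual bandit whose context is just the day of the week."""
    rng = random.Random(seed)
    totals = defaultdict(float)  # (day, style) -> summed reward
    counts = defaultdict(int)    # (day, style) -> number of tries

    def estimate(day: str, style: str) -> float:
        return totals[day, style] / counts[day, style]

    for i in range(rounds):
        day = DAYS[i % len(DAYS)]
        untried = [s for s in STYLES if counts[day, s] == 0]
        if untried:
            style = untried[0]                    # try each style once per day first
        elif rng.random() < epsilon:
            style = rng.choice(STYLES)            # explore
        else:
            style = max(STYLES, key=lambda s: estimate(day, s))  # exploit
        totals[day, style] += feedback(day, style)
        counts[day, style] += 1

    # The learned policy: the best-looking style for each day.
    return {day: max(STYLES, key=lambda s: estimate(day, s)) for day in DAYS}


if __name__ == "__main__":
    print(train())
```

`train()` returns "chatty" for Monday, Wednesday and Friday and "terse" for Tuesday and Thursday. Because the hypothetical rewards are noise-free, one try of each style per day already pins down the answer; with noisy feedback, the exploration rate `epsilon` would matter.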
But the rota isn't perfect. Sometimes the Monday editor will edit a story so late on Monday that it's already Tuesday, and sometimes the Tuesday editor will be up early and edit a story at around that same hour. This pattern is more common during the holidays, when the editors go to different time zones, often travel, and become more irregular in their editing.
So if the AI is complex and skilled enough, then, simply through feedback, it will start building up a picture of its editors. It will figure out when they are likely to stick to a schedule, and when they will be more irregular. It will figure out the difference between holidays and non-holidays. Given time, it may be able to track the editors' moods, and it will certainly pick up on any major change in their lives, such as romantic relationships and breakups, which will radically change whether and how it should present stories with a romantic focus.
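When the rota is ambiguous (say, around midnight between Monday and Tuesday), one way the AI could "build up a picture" of who is editing is plain Bayesian inference from the feedback itself. A minimal sketch, assuming two editors, a schedule-based prior, hypothetical probabilities that each editor likes a terse story, and observations that are independent given the editor:

```python
from typing import Dict, List

# Hypothetical likelihoods: probability that each editor "likes" a terse story.
P_LIKES_TERSE: Dict[str, float] = {"ed1": 0.3, "ed2": 0.9}


def posterior_on_duty(prior_ed1: float, observations: List[bool]) -> float:
    """P(ed1 is editing | feedback), via Bayes' rule over the two editors.

    prior_ed1: schedule-based prior that ed1 is on duty (e.g. near midnight Monday).
    observations: True if a terse story was liked, False if it was not.
    """
    weights = {"ed1": prior_ed1, "ed2": 1.0 - prior_ed1}
    for liked in observations:
        for editor, p in P_LIKES_TERSE.items():
            # Multiply in the likelihood of this observation under each editor.
            weights[editor] *= p if liked else (1.0 - p)
    return weights["ed1"] / (weights["ed1"] + weights["ed2"])


if __name__ == "__main__":
    print(posterior_on_duty(0.5, [True]))
    print(posterior_on_duty(0.5, [True, True]))
```

With an even prior, one liked terse story moves the probability that `ed1` is editing from 0.5 to 0.25, and two move it to 0.1 (up to floating-point rounding). In the scenario, the prior itself would presumably also be learned, per hour, weekday and holiday period.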
It will also likely learn the correlation between stories and feedback: maybe presenting a story defined roughly as "positive" will increase subsequent reward for the rest of the day, on all stories. Or maybe this will only work on a certain editor, or only early in the term. Or only before lunch.
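The carry-over effect described above could be estimated roughly as follows. This is a hypothetical simulation: the base reward, the size of the boost, the number of stories per day and the noise level are made up, and the editor's response is hard-coded rather than learned.

```python
import random
from statistics import mean
from typing import List

BASE_REWARD = 0.5   # hypothetical mean reward for an ordinary story
MOOD_BOOST = 0.2    # hypothetical lift on later stories after a "positive" opener


def simulate_day(rng: random.Random, open_positive: bool, stories: int = 5) -> List[float]:
    """Rewards for the stories that follow the day's first story."""
    lift = MOOD_BOOST if open_positive else 0.0
    return [BASE_REWARD + lift + rng.uniform(-0.1, 0.1) for _ in range(stories)]


def estimate_boost(days: int = 2000, seed: int = 1) -> float:
    """Difference in mean later-story reward between positive and other openers."""
    rng = random.Random(seed)
    after_positive: List[float] = []
    after_other: List[float] = []
    for _ in range(days):
        open_positive = rng.random() < 0.5   # the AI varies its opener at random
        bucket = after_positive if open_positive else after_other
        bucket.extend(simulate_day(rng, open_positive))
    return mean(after_positive) - mean(after_other)


if __name__ == "__main__":
    print(round(estimate_boost(), 3))
```

Because the opener is chosen at random, the difference in means estimates the effect of the opener on later feedback rather than a mere correlation; an agent that can measure this effect has found a way to move its editor's later feedback.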
Thus the simple trainable AI with a particular focus - write automated news stories - will be trained, through feedback, to learn about its editors/controllers, to distinguish them, to get to know them, and, in effect, to manipulate them.
This may be a useful "bridging example" between standard RL agents and superintelligent machines.