Definitely makes some sense.
But I didn't understand what you meant in the paragraph starting with "What stops us from just saying..." What does stop us from just saying this, and how come some desires successfully result in action and others result in wishful thinking? Can you predict when wishful thinking would be more likely to occur?
On a similar note, if "the final goal of a plan is a belief", would you expect me to be indifferent between saving the world and taking a pill that caused me to believe that the world was saved, or is that confusing levels?
The algorithm you use to build your plan won't let you believe a step in the plan is successful until you can satisfy its preconditions. The problem is that "satisfy its preconditions" can be done in a one-sided, non-Bayesian manner, which doesn't work as well for inference as for action.
Re. the pill - that's a good question. To avoid taking the pill, you'd need to have a representation that distinguishes between causing X and causing believes(X), from the viewpoint of an outside observer. What I said in the post needs to be revised or clarifi...
Wishful thinking - believing things that make you happy - may be a result of adapting an old cognitive mechanism to new content.
Obvious, well-known stuff
The world is a complicated place. When we first arrive, we don't understand it at all; we can't even recognize objects or move our arms and legs reliably. Gradually, we make sense of it by building categories of perceptions and objects and events and feelings that resemble each other. Then, instead of processing every detail of a new situation, we just have to decide which category it's closest to, and what we do with things in that category. Most, possibly all, categories can be built using unsupervised learning, just by noting statistical regularities and clustering.
If we want to be more than finite-state automata, we also need to learn how to notice which things and events might be useful or dangerous, and make predictions, and form plans. There are logic-based ways of doing this, and there are also statistical methods. There's good evidence that the human dopaminergic system uses one of these statistical methods, temporal difference learning (TD). TD is a backchaining method: first it learns what state or action G_(n-1) usually comes just before reaching a goal G_n, and then what G_(n-2) usually comes just before G_(n-1), and so on. Many other learning methods use backchaining, including backpropagation, bucket brigade, and spreading activation. These learning methods need a label or signal, during or after some series of events, saying whether the result was good or bad.
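A minimal sketch of that backward propagation, assuming a toy world in which five states are always visited in the same order and the only "good" signal arrives on reaching the last one. The function name td0_chain, the learning rate, and the discount factor are illustrative assumptions; this is plain tabular TD(0), not a model of the dopaminergic system.

```python
def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """Run TD(0) on a fixed walk 0 -> 1 -> ... -> goal and record the values.

    All names and parameter values here are illustrative assumptions.
    """
    goal = n_states - 1
    values = [0.0] * n_states  # the terminal goal state keeps value 0
    history = []
    for _ in range(episodes):
        for state in range(goal):
            nxt = state + 1
            # The only reward signal is delivered on entering the goal.
            reward = 1.0 if nxt == goal else 0.0
            # TD(0) update: move the estimate toward reward + discounted next value.
            target = reward + gamma * values[nxt]
            values[state] += alpha * (target - values[state])
        history.append(list(values))
    return values, history


values, history = td0_chain()
for episode in (0, 1, 2):
    print("after episode", episode + 1, [round(v, 4) for v in history[episode]])
print("final", [round(v, 3) for v in values])
```

In this toy run, only the state just before the goal has a nonzero value after the first episode; the state before that picks up value in the second episode, and so on, one step further back per episode. That is the backchaining pattern in miniature.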
I don't know why we have consciousness, and I don't know what determines which kinds of learning require conscious attention. For those that do, the signals produce some variety of pleasure or pain. We learn to pay attention to things associated with pleasure or pain, and for planning, we may use TD to build something analogous to a Markov process (sorry, I found no good link; and Wikipedia's entry on Markov chain is not what you want) where, given a sequence of the previous n states or actions (A_1, A_2, ..., A_n), the probability of taking action A is proportional to the expected (pleasure - pain) for the sequence (A_1, ..., A_n, A). In short, we learn to do things that make us feel better.
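One way to picture that selection rule is the sketch below. The table contents and the function name action_probabilities are hypothetical. Treating actions with negative net value as having probability zero is an added assumption, since "proportional to" is not defined for negative numbers and the rule above does not say how to handle them.

```python
def action_probabilities(net_value, context):
    """Return P(action | context), proportional to expected (pleasure - pain).

    net_value maps (context, action) -> expected pleasure minus pain.
    Assumption (not from the text): negative net values are clipped to 0.
    """
    scores = {
        action: max(value, 0.0)
        for (ctx, action), value in net_value.items()
        if ctx == context
    }
    total = sum(scores.values())
    if total == 0.0:
        # Nothing looks good: fall back to a uniform choice (also an assumption).
        return {action: 1.0 / len(scores) for action in scores}
    return {action: score / total for action, score in scores.items()}


# Hypothetical learned values for one context of previous actions.
context = ("find_stick", "approach_mound")
net_value = {
    (context, "insert_stick"): 3.0,
    (context, "walk_away"): 1.0,
    (context, "kick_mound"): -2.0,
}
probs = action_probabilities(net_value, context)
print(probs)
```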
Less-obvious stuff
Here's a key point which is overlooked (or specifically denied) by most AI architectures: Believing is an action. Building an inference chain is not just like constructing a plan; it's the same thing, probably done by the same algorithm. Constructing a plan includes inferential steps, and inference often inserts action steps to make observations and reduce our uncertainty.
Actions, including the "believe" action, have preconditions. When building a plan, you need to find actions that achieve those preconditions. You don't need to look for things that defeat them. With actions, this isn't much of a problem, because actions are pretty reliable. If you put a rock in the fire, you don't need to weigh the evidence for and against the proposition that the rock is now in the fire. If you put a stick in a termite mound, it may or may not come out covered in termites. You don't need to compute the odds that the stick was inserted correctly, or the expected number of termites; you pull it out and look at the stick. If you can find things that cause it not to be covered in termites, such as being the wrong sort of stick, it's probably a simple enough cause that you can enumerate it in your preconditions for next time.
You don't need to consider all the ways that your actions could be thwarted until you start doing adversarial planning, which can't happen until you've already started incorporating belief actions into your planning. (A tiger needs to consider which ways a wildebeest might run to avoid it, but probably doesn't need to model the wildebeest's beliefs and use minimax - at least, not to any significant depth. Some mammals do some adversarial planning and modelling of belief states; I wouldn't be surprised if squirrels avoid burying their nuts when other squirrels are looking. But the domains and actors are simpler, so the process shouldn't break down as often as it does in humans.)
When we evolved the ability to make extensive use of belief actions, we probably took our existing plan-construction mechanism, and added belief actions. But an inference is a lot less certain than an action. You're allowed to insert a "believe" act into your plan if you're able to find just one thing, belief or action, that plausibly satisfies its preconditions. You're not required to spend any time looking for things that refute that belief. Your mind doesn't know that beliefs are fundamentally different from actions. The truth-values of the propositions describing the expected effects of your possible actions are strongly, causally correlated with whether you execute the action. The truth-values of your possible belief-actions are not; they can be made true or false by many other factors.
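The one-sidedness can be sketched with a toy backward-chaining planner. This is an illustrative assumption about how such a planner might look, not a claim about the actual cognitive algorithm; the rule tables and proposition names are made up. The point is that the planner accepts the first set of preconditions it can satisfy and never consults the table of defeaters.

```python
def backchain(goal, supporters, facts, depth=10):
    """Return the steps that 'establish' goal, or None if no support is found.

    supporters maps a proposition to alternative lists of preconditions.
    The first list that can be satisfied is accepted; counter-evidence is
    never looked up. All names are hypothetical.
    """
    if goal in facts:
        return [goal]
    if depth == 0:
        return None
    for preconditions in supporters.get(goal, []):
        plan = []
        for precondition in preconditions:
            sub_plan = backchain(precondition, supporters, facts, depth - 1)
            if sub_plan is None:
                break  # this way of supporting the goal fails; try the next
            plan.extend(sub_plan)
        else:
            return plan + [goal]  # one satisfied set of preconditions is enough
    return None


supporters = {
    "project_will_succeed": [["team_is_talented"]],
    "team_is_talented": [["boss_praised_team"]],
}
defeaters = {"project_will_succeed": ["three_deadlines_missed"]}
facts = {"boss_praised_team", "three_deadlines_missed"}

plan = backchain("project_will_succeed", supporters, facts)
ignored = [d for d in defeaters["project_will_succeed"] if d in facts]
print("belief chain:", plan)
print("known but never consulted:", ignored)
```

The planner happily concludes that the project will succeed even though a known fact counts against it, because nothing in the search ever asks for refuting evidence.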
You can string a long series of actions together into a plan. If an action fails, you'll usually notice, and you can stop and retry or replan. Similarly, you can string a long series of belief actions together, even if the probability of each one is only a little above 0.5, and your planning algorithm won't complain, because stringing a long series of actions together has worked pretty well in your evolutionary past. But you don't usually get immediate feedback after believing something that tells you whether believing "succeeded" (deposited something in your mind that successfully matches the real world), so errors go unnoticed and compound along the chain, and the method stops working.
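A rough calculation shows how large the difference can be. It assumes, purely for illustration, that the links in a chain are independent, that every link has the same probability p of being right, and that a failed action can be noticed and retried a fixed number of times. None of these figures come from the argument above.

```python
def chain_probability(p, n):
    """Probability that all n independent links hold, with no feedback."""
    return p ** n


def chain_with_retries(p, n, tries):
    """Probability that all n steps succeed when each failed step is noticed
    and may be attempted up to `tries` times (attempts assumed independent)."""
    step_success = 1.0 - (1.0 - p) ** tries
    return step_success ** n


# Ten steps, each only a little better than a coin flip.
beliefs = chain_probability(0.6, 10)
actions = chain_with_retries(0.6, 10, tries=5)
print(f"ten unchecked beliefs all true: {beliefs:.4f}")
print(f"ten actions with up to 5 tries: {actions:.4f}")
```

Under these assumptions, the unchecked chain of beliefs is almost certainly wrong somewhere, while the same chain built from actions with feedback and retries usually goes through.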
The old way of backchaining, by just trying to satisfy preconditions, doesn't work well with our new mental content. But we haven't evolved anything better yet. If we had, chess would seem easy.
Summary
Wishful thinking is a state-space-reduction heuristic. Your ancestors' minds searched for actions that would enable actions that would make them feel good. Your mind, therefore, searches for beliefs that will enable beliefs that will make you feel good. It doesn't search for beliefs that will refute them.
(A forward-chaining planner wouldn't suffer this bias. It probably wouldn't get anything done, either, as its search space would be vast.)