I might have misunderstood your question. Let me restate how I understood it: In the original post you said...
I would optimize myself to maximize my reward, not whatever current behavior triggers the reward.
I intended to give a counterexample: Here is humanity, and we're optimizing behaviors which once triggered the original rewarded action (replication) rather than the rewarded action itself.
We didn't end up "short circuiting" into directly fulfilling the reward, as you had described. We care about whatever "current behavior triggers the reward", such as not hurting each other and so on. In other words, we did precisely what you said you wouldn't do.
(Also, sorry, I tried to ninja edit everything into a much more concise statement, so the parent comment is now different from what you saw. The conversation as a whole still makes sense though.)
We don't have the ability to directly fulfil the reward center. I think narcotics are the closest we've got now, and lots of people try to mash that button to the detriment of everything else. I just think it's a kind of crude button, and it doesn't work as well as the direct ability to fully understand and control your own brain.
Part 1 was previously posted and it seemed that people liked it, so I figured that I should post part 2: http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html