"Given that, as in the naive reinforcement learning framework (which can approximate some more complex notions of value), the value is in the environment"
I'm confused about what this means.
It means that the agent maximizes the cumulative sum of a function of the environment's states, and that this function is revealed to the agent only for the states it actually visits.
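As a concrete illustration, here is a minimal sketch of that setting (my own construction; the toy chain of states, the value table, and the random-walk policy are illustrative assumptions, not part of the original text): the environment carries a value for every state, the agent only learns a state's value when it visits that state, and the objective is the cumulative sum of the values it collects.

# Minimal sketch of the "value is in the environment" setting described above.
# The state values, episode length, and random-walk policy are illustrative assumptions.
import random

# The environment holds a value (reward) for every state; the agent cannot
# read this table directly.
TRUE_STATE_VALUE = {0: 0.0, 1: 1.0, 2: -0.5, 3: 5.0, 4: 0.2}

def step(state, action):
    """Move left (-1) or right (+1) on a small chain of states 0..4."""
    next_state = max(0, min(4, state + action))
    # A state's value is revealed to the agent only when it visits that state.
    reward = TRUE_STATE_VALUE[next_state]
    return next_state, reward

def run_episode(horizon=10):
    """The agent's objective: maximize the cumulative sum of revealed rewards."""
    state, total = 0, 0.0
    for _ in range(horizon):
        action = random.choice([-1, +1])  # placeholder policy
        state, reward = step(state, action)
        total += reward
    return total

if __name__ == "__main__":
    print("cumulative reward of one random episode:", run_episode())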
A comment to http://singinst.org/blog/2010/10/27/presentation-by-joshua-foxcarl-shulman-at-ecap-2010-super-intelligence-does-not-imply-benevolence/: Given that, as in the naive reinforcement learning framework (which can approximate some more complex notions of value), the value is in the environment, you don't want to be too hasty with the environment, lest you destroy a higher value you haven't yet discovered! So you especially wouldn't replace high-complexity systems like humans with low-entropy systems like computer chips without first analyzing them.
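A toy sketch of that argument, under assumptions of my own (the hidden-value distribution and the chip value are made-up numbers, not anything from the comment): because a state's value is revealed only on inspection, an agent that irreversibly converts a system before analyzing it forgoes whatever value the system held, so on average it does worse than an agent that analyzes first.

# Toy illustration of "analyze before you destroy" under reward-in-the-environment.
# The specific numbers are illustrative assumptions.
import random

def sample_hidden_value():
    # Unknown value of an unexamined high-complexity system; occasionally very large.
    return random.choice([0.0, 0.0, 0.0, 100.0])

CHIP_VALUE = 1.0  # value of the low-entropy replacement

def hasty_agent():
    # Destroys the system immediately; its hidden value is never revealed or collected.
    return CHIP_VALUE

def careful_agent():
    # Analyzes first, then keeps whichever is worth more.
    hidden = sample_hidden_value()
    return max(hidden, CHIP_VALUE)

if __name__ == "__main__":
    trials = 10_000
    hasty = sum(hasty_agent() for _ in range(trials)) / trials
    careful = sum(careful_agent() for _ in range(trials)) / trials
    print(f"average value, hasty agent:   {hasty:.2f}")
    print(f"average value, careful agent: {careful:.2f}")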