You are spot on, though you provided more context than can be traced directly from the cited sentence. When I referred to naive RL, I had in mind (PO)MDPs with an unknown reward function. The reward of an unseen state can be predicted only in the sense of Occam's razor-type induction.
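To make "Occam's razor-type induction" concrete, here is a minimal toy sketch. It assumes a finite, hand-picked set of candidate reward functions over integer states, each tagged with a made-up description length in bits. Hypotheses contradicted by the rewards observed so far are discarded, and the survivors vote on the reward of an unseen state with prior weight 2^-length. The hypothesis names, bit lengths, and `predict_reward` are all hypothetical. This is a crude finite stand-in for Solomonoff-style induction, not a method from the original discussion.

```python
from fractions import Fraction

# Hypothetical toy setting: states are integers, and each hypothesis is a
# candidate reward function with an assumed description length in bits.
HYPOTHESES = [
    ("constant 0", 2, lambda s: 0),
    ("constant 1", 2, lambda s: 1),
    ("parity", 4, lambda s: s % 2),
    ("is zero", 5, lambda s: 1 if s == 0 else 0),
    ("lookup table", 12, lambda s: {0: 0, 1: 1, 2: 0, 3: 0}.get(s, 0)),
]


def predict_reward(observed, unseen_state):
    """Occam-weighted prediction of the reward at an unseen state.

    observed maps already-visited states to the rewards seen there.
    """
    total = Fraction(0)
    weighted = Fraction(0)
    for _name, bits, reward in HYPOTHESES:
        # Keep only hypotheses that agree with every observed reward.
        if all(reward(s) == r for s, r in observed.items()):
            prior = Fraction(1, 2 ** bits)  # shorter description -> larger prior
            total += prior
            weighted += prior * reward(unseen_state)
    if total == 0:
        raise ValueError("no hypothesis is consistent with the observations")
    return weighted / total
```

For example, after seeing rewards 0, 1, 0 at states 0, 1, 2, both "parity" and the longer "lookup table" survive, but they disagree about state 3. The shorter hypothesis dominates, so `print(predict_reward({0: 0, 1: 1, 2: 0}, 3))` prints `256/257`. The prediction is only as good as the assumed simplicity prior, which is the point: nothing else constrains the reward at a state never visited.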
A comment on http://singinst.org/blog/2010/10/27/presentation-by-joshua-foxcarl-shulman-at-ecap-2010-super-intelligence-does-not-imply-benevolence/: Given that, as in the naive reinforcement learning framework (which can approximate some more complex notions of value), value resides in the environment, you don't want to be too hasty with the environment, lest you destroy a higher value you haven't yet discovered! In particular, you wouldn't replace high-complexity systems like humans with low-entropy systems like computer chips without first analyzing them.