Good point. I think that if you couple the answers of an oracle to reality by some random process, then you are probably fine.
However, many people will want to use the outputs of the oracle in very direct ways. For instance, you ask it what code you should put into your robot, and then you just put that code into the robot.
Could we have an oracle (i.e. a system trained according to some Truth criterion) that, when used in this straightforward way, still exerts optimization pressure on the world?
It is good to notice the spectrum above. Likely, for a fixed amount of compute/effort, one extreme of this spectrum produces much less agency than the other extreme. Call that the direct effect.
Are there other direct effects? For instance, do you get the same ability to "cure cancer" for a fixed amount of compute/effort across the spectrum? Agency seems useful, so the ability you get per unit of compute is probably correlated with agency across this spectrum.
If we are in a setting where an outside force demands you reach a given ability level, then this second effect matters indirectly: it means you will have to use a larger amount of compute, and more compute buys more agency.
[optional] To illustrate this problem, consider something that I don't think anyone considers safer: instead of using gradient descent, just sample the weights of the neural net at random until you get a low loss. (I am not trying to make an analogy here.)
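A minimal sketch of that procedure, on a deliberately tiny toy problem (a one-parameter 'network' on synthetic data; every name here is illustrative), since genuine random search over a realistic network's weights would essentially never terminate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn y = 2x from a handful of points.
X = np.linspace(-1.0, 1.0, 16)
y = 2.0 * X

def loss(w):
    """Mean squared error of the one-parameter 'network' y_hat = w * x."""
    return np.mean((w * X - y) ** 2)

# Random search: instead of following a gradient, keep sampling
# weights until one happens to achieve a low enough loss.
threshold = 1e-3
for attempt in range(1, 100_001):
    w = rng.normal(scale=3.0)
    if loss(w) < threshold:
        print(f"found w={w:.4f} after {attempt} samples, loss={loss(w):.6f}")
        break
else:
    print("no sufficiently low-loss weights found within the sample budget")
```

The selection criterion (low loss) is the same one gradient descent optimizes; only the search process differs, which is why this is a direct comparison rather than an analogy.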
It would be great if someone had a way to compute the "net" effect on agency across the spectrum, taking into account the indirect path: more compute needed -> more compute = more agency, anywhere on the spectrum. I suspect the answer might depend on which ability level you need to reach, and we may or may not be able to figure it out without experiments.
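One crude way to put this net effect into a formula (a toy model, assuming agency and ability both scale linearly with compute, which is a strong assumption): let $s$ index position on the spectrum, $a(s)$ be agency per unit of compute, and $b(s)$ be ability per unit of compute. Reaching a demanded ability level $B$ then costs compute $C(s)$ and yields net agency $A(s)$:

$$C(s) = \frac{B}{b(s)}, \qquad A(s) = a(s)\,C(s) = \frac{a(s)}{b(s)}\,B.$$

The direct effect is a smaller $a(s)$ at the oracle-like end; the indirect effect is a smaller $b(s)$ there, hence a larger $C(s)$; they trade off through the ratio $a(s)/b(s)$. Notably, in this linear model $B$ only scales the answer and does not change which end of the spectrum has less net agency, so any dependence on the required ability level would have to come from nonlinear returns to compute.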