I'm not TheAncientGeek, but I'm also a proponent of tool / oracle AI, so maybe I can speak to that. The proposals I've seen basically break down into two categories:
(1) Assuming the problem of steadfast goals has been solved -- what MIRI refers to as highly reliable agents -- you build an agent which provides (partial) answers to questions while obeying fixed constraints. The easiest to analyze example would be "Give me a solution to problem X, in the process consuming no more than Y megajoules of energy, then halt." In this case the AI simply doesn't have the energy budget to figure out how to trick us into achieving evil goal Z.
(This energy-constrained agent is not the typical example given in arguments for tool AI. More often the constraints are whack-a-mole things like "don't make irreversible changes to your environment" or "don't try to increase your hardware capacity". IMHO this all too often clouds the issue, because it just generates a response of "what about situation Z", or "what if the AI does blah".)
(2) Build an agent inside of a box, and watch that box very carefully. E.g. this could be the situation in (1), but with an ammeter attached to a circuit breaker to enforce the energy constraint (among many, many other devices used to secure and observe the AI). This approach sidesteps the issue of friendliness / alignment entirely. The AI may be unfriendly, but impotent.
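To make the energy-budget idea concrete, here is a toy sketch of the "solve X within Y joules, then halt" pattern. Everything in it is hypothetical and for illustration only: `EnergyMeter` stands in for the ammeter and circuit breaker, `bounded_search` stands in for the agent, and the per-check cost is a made-up number. In a real box the enforcement would be physical and outside the agent's control, which a software simulation cannot capture.

```python
class BudgetExceeded(Exception):
    """Raised by the meter when a step would overspend the budget."""


class EnergyMeter:
    """Hypothetical stand-in for the ammeter plus circuit breaker."""

    def __init__(self, budget_joules):
        self.remaining = budget_joules

    def charge(self, joules):
        # Trip the breaker before the step runs if it would overspend.
        if joules > self.remaining:
            raise BudgetExceeded()
        self.remaining -= joules


def bounded_search(candidates, is_solution, meter, cost_per_check=1.0):
    """Return the first candidate that passes is_solution,
    or None if the budget runs out (or candidates are exhausted) first."""
    for candidate in candidates:
        try:
            meter.charge(cost_per_check)
        except BudgetExceeded:
            return None  # halt: no budget left to keep working
        if is_solution(candidate):
            return candidate
    return None
```

The design point is that the solver never decides for itself whether to keep going: the meter does, and the solver's only options are to answer within budget or halt.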
As far as I can tell, the issue of "has a goal" vs "does not have a goal" does not enter into the above proposals at all. The only people I've seen making that distinction are arguing against tool AI but missing the point. Of course agents have goals, and of course oracle AI has goals -- just a more specific range of answer-the-question goals. The point is that oracle / tool AIs have imposed constraints which limit their capability to do harm. They're safe not because they provably work in our interests, but because they are too unempowered to do us harm.
Yes, I take a tool to be something that always waits, that defaults to doing nothing.
If everything has a goal, and goals are dangerous, everything is dangerous. Which is a false conclusion. So there must be a false assumption leading to it. Such as all systems having goals.
The kinds of danger MIRI is worrying about come from the way goals are achieved, e.g. from instrumental convergence, so MIRI shouldn't be worrying about goals absent adaptive strategies for achieving them, and in fact it is hard to see what is gained from talking in those terms.
I also disagree with that false conclusion, but I would probably say that 'goals are dangerous' is the false premise. Goals are dangerous when, well, they actually are dangerous (to my life or yours), and when they are attached to sufficient optimising power, as you get at in your last paragraph.
I think the line of argumentation Bostrom is taking here is that superintelligence by definition has a huge amount of optimisation power, so whether it is dangerous to us is reduced to whether its goals are dangerous to us.
(Happy New Year!)