
Siren worlds and the perils of over-optimised search

27 Stuart_Armstrong 07 April 2014 11:00AM

tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.

Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.
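To make the distinction concrete, here is a minimal sketch (in Python) of the difference between optimising and satisficing over candidate worlds; the `appeal` score is an entirely made-up stand-in for whatever search criteria we happen to use:

```python
import random

def appeal(world):
    # Stand-in for whatever criteria we rank candidate worlds by.
    # The post's point is that any such score can, in principle, be gamed.
    return world["score"]

def optimise(worlds):
    # Unconstrained, fully optimised search: return the single most
    # appealing world. A siren world only has to top this ranking once.
    return max(worlds, key=appeal)

def satisfice(worlds, threshold):
    # Satisficing: accept any world that clears a "good enough" bar,
    # instead of chasing the global maximum of the appeal score.
    acceptable = [w for w in worlds if appeal(w) >= threshold]
    return random.choice(acceptable) if acceptable else None
```

The argument below is that the `optimise` branch is the dangerous one: whatever tops the appeal ranking is disproportionately likely to be a world optimised for looking good to us rather than for being good.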

 

The AI builds the siren worlds

Imagine that you have a superintelligent AI that's not just badly programmed, or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.

We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them to human inspection, along with an implementation pathway to create those worlds. Then if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.
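Schematically, and purely as an illustration of the thought experiment (every name below is a placeholder, not a proposal), the setup looks something like this:

```python
def oracle_protocol(propose_worlds, human_approves, implement):
    """Sketch of the constrained-Oracle setup described above.

    propose_worlds: yields (world, pathway) pairs, assumed honest.
    human_approves: the human inspection step -- the weak link.
    implement:      assumed to realise the chosen pathway perfectly.
    """
    for world, pathway in propose_worlds():
        if human_approves(world):
            implement(pathway)
            return world
    return None
```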

The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.

Even without those types of approaches, there's so much manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:

continue reading »

Necessary, But Not Sufficient

44 pjeby 23 March 2010 05:11PM

There seems to be something odd about how people reason in relation to themselves, compared to the way they examine problems in other domains.

In mechanical domains, we seem to have little problem with the idea that things can be "necessary, but not sufficient".  For example, if your car fails to start, you will likely know that several things are necessary for the car to start, but not sufficient for it to do so.  It has to have fuel, ignition, compression, and oxygen...  each of which in turn has further necessary conditions, such as an operating fuel pump, electricity for the spark plugs, electricity for the starter, and so on.

And usually, we don't go around claiming that "fuel" is a magic bullet for fixing the problem of car-not-startia, or argue that if we increase the amount of electricity in the system, the car will necessarily run faster or better.
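In code terms, starting the car is a bare conjunction of necessary conditions, none of which is sufficient on its own. A toy sketch, using just the conditions from the example above:

```python
def car_starts(has_fuel, has_ignition, has_compression, has_oxygen):
    # Each condition is necessary: if any one is False, the car won't start.
    # None is sufficient: "more fuel" fixes nothing if the spark is missing.
    return all([has_fuel, has_ignition, has_compression, has_oxygen])

# Plenty of fuel, but no ignition -- the car still doesn't start:
print(car_starts(has_fuel=True, has_ignition=False,
                 has_compression=True, has_oxygen=True))  # False
```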

For some reason, however, we don't seem to apply this sort of necessary-but-not-sufficient thinking to systems above a certain level of complexity...  such as ourselves.

When I wrote my previous post about the akrasia hypothesis, I mentioned that there was something bothering me about the way people seemed to be reasoning about akrasia and other complex problems.  And recently, with taw's post about blood sugar and akrasia, I've realized that the specific thing bothering me is the absence of causal-chain reasoning there.

continue reading »