
Siren worlds and the perils of over-optimised search

27 Stuart_Armstrong 07 April 2014 11:00AM

tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.

Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.
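To make the distinction concrete, here is a minimal sketch (in Python; the `apparent_quality` scoring and all the numbers are invented for illustration, not part of the original argument). An optimiser ranks every candidate world and returns the global maximum, so a single siren world that games the scoring criteria wins with certainty; a satisficer accepts any world that clears a threshold, so a rare siren world is only picked if it happens to be examined before an ordinary good-enough world.

```python
import random

def optimise(worlds, score):
    """Unconstrained search: return the single highest-scoring world.
    If any siren world games the scoring criteria, it wins with certainty."""
    return max(worlds, key=score)

def satisfice(worlds, score, threshold):
    """Under-optimised search: examine worlds in a random order and accept
    the first one that clears the threshold."""
    candidates = list(worlds)
    random.shuffle(candidates)
    for world in candidates:
        if score(world) >= threshold:
            return world
    return None  # nothing acceptable found

# Toy data: one siren world that maxes out the *apparent* score,
# plus many ordinary, decent worlds.
worlds = [{"name": "siren", "apparent_quality": 100}] + [
    {"name": f"ordinary_{i}", "apparent_quality": random.uniform(50, 90)}
    for i in range(10_000)
]

def score(world):
    return world["apparent_quality"]

print(optimise(worlds, score)["name"])       # always "siren"
print(satisfice(worlds, score, 60)["name"])  # almost always an ordinary world
```

In this toy model the optimiser is seduced every time, while the satisficer is only fooled if the siren happens to come up before an acceptable ordinary world.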

 

The AI builds the siren worlds

Imagine that you have a superintelligent AI that's not just badly programmed or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.

We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them to human inspection, along with an implementation pathway to create those worlds. Then, if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.
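As a rough sketch of this setup (the names `design_worlds`, `human_approves` and `implement` are invented here, and the honesty constraint is simply assumed to hold), the AI's only degree of freedom is which candidate worlds it puts in front of the inspector:

```python
def oracle_protocol(ai, human_approves, implement):
    """Sketch of the constrained-Oracle setup described above.  The honesty
    constraint is assumed: the AI really does report each world and its
    implementation pathway accurately.  Its only freedom is *which*
    (world, pathway) pairs it chooses to present."""
    for world, pathway in ai.design_worlds():
        if human_approves(world):      # human inspection is the only filter
            return implement(pathway)  # deterministic implementation, by assumption
    return None                        # nothing approved, nothing built
```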

The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.

Even without those types of approaches, there's so much manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:

continue reading »

Don't teach people how to reach the top of a hill

30 PhilGoetz 04 March 2014 09:38PM

When is it faster to rediscover something on your own than to learn it from someone who already knows it?

Sometimes it's faster to re-derive a proof or algorithm than to look it up. Keith Lynch re-invented the fast Fourier transform because he was too lazy to walk all the way to the library to get a book on it, although that's an extreme example. But if you have a complicated proof already laid out before you, and you are not Marc Drexler, it's generally faster to read it than to derive a new one. Yet I found a knowledge-intensive task where it would have been much faster to tell someone nothing at all than to tell them how to do it.

continue reading »

Exploring the Idea Space Efficiently

22 Elithrion 08 April 2012 04:28AM

Simon is writing a calculus textbook. Since there are a lot of textbooks on the market, he wants to make his distinctive by including a lot of original examples. To do this, he decides to first check what sorts of examples are in some of the other books, and then make sure to avoid those. Unfortunately, after skimming through several other books, he finds himself completely unable to think of original examples—his mind keeps returning to the examples he's just read instead of coming up with new ones.

What he's experiencing here is another aspect of priming or anchoring. The way it appears to happen in my brain is that it decides to anchor on the examples it's already seen and explore the idea-space from there, moving from an idea only to ideas that are closely related to it (much like a depth-first search).

At first, this search strategy might not seem so bad—in fact, it's ideal if there is one best solution and the closer you get to it the better. For example, if you were shooting arrows at a target, all you'd need to consider is how close to the center you can hit. Where we run into problems, however, is when we try to come up with multiple solutions (such as multiple examples of the applications of calculus), or to find the best solution when there are many plausible ones. In these cases, our brain's default search algorithm will often grab the first idea it can think of and try to refine it, even if what we really need is a completely different idea.
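One way to picture the contrast is the following sketch (the `neighbours` and `random_idea` functions are hypothetical stand-ins for however we generate related and unrelated ideas): refining from an anchor explores a single basin of closely related ideas, while random restarts sample distant parts of the idea space.

```python
import random

def refine_from_anchor(anchor, neighbours, score, steps=100):
    """The brain's default, roughly: start from the anchoring example and only
    move to closely related ideas, keeping whichever scores best so far."""
    best = anchor
    for _ in range(steps):
        candidate = random.choice(neighbours(best))  # only nearby ideas
        if score(candidate) > score(best):
            best = candidate
    return best

def random_restarts(random_idea, score, restarts=100):
    """An alternative: repeatedly jump to an unrelated starting point, sampling
    distant regions of the idea space instead of a single basin."""
    return max((random_idea() for _ in range(restarts)), key=score)
```

The first strategy is exactly right when there is one best solution and closeness to it is all that matters; the second is what you want when good ideas live in many separate regions, as with Simon's calculus examples.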

continue reading »