
Siren worlds and the perils of over-optimised search

27 Stuart_Armstrong 07 April 2014 11:00AM

tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.

Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.
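To make the contrast concrete, here is a minimal sketch in Python. All names are invented for illustration: `appeal` stands in for whatever evaluation criterion the search uses, and the candidate "worlds" are just opaque objects.

```python
def optimise(candidate_worlds, appeal):
    # Unconstrained optimisation: take the single highest-scoring world.
    # Any trait that correlates with extreme appeal -- including deliberately
    # concealed "siren" features -- gets selected for as hard as possible.
    return max(candidate_worlds, key=appeal)

def satisfice(candidate_worlds, appeal, threshold):
    # Satisficing: accept the first world that is merely good enough.
    # The result comes from the broad pool of acceptable worlds rather than
    # from the extreme tail where optimisation pressure concentrates.
    for world in candidate_worlds:
        if appeal(world) >= threshold:
            return world
    return None
```

The point of the sketch is not that satisficing is safe, only that it does not push the search all the way into the tail of the appeal distribution, which is where siren and marketing worlds live.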

 

The AI builds the siren worlds

Imagine that you have a superintelligent AI that's not just badly programmed, or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.

We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them to human inspection, along with an implementation pathway to create those worlds. Then if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.

The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.

Even without those types of approaches, there's so much manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:

continue reading »

The genie knows, but doesn't care

55 RobbBB 06 September 2013 06:42AM

Followup to: The Hidden Complexity of Wishes, Ghosts in the Machine, Truly Part of You

Summary: If an artificial intelligence is smart enough to be dangerous, we'd intuitively expect it to be smart enough to know how to make itself safe. But that doesn't mean all smart AIs are safe. To turn that capacity into actual safety, we have to program the AI at the outset — before it becomes too fast, powerful, or complicated to reliably control — to already care about making its future self care about safety. That means we have to understand how to code safety. We can't pass the entire buck to the AI, when only an AI we've already safety-proofed will be safe to ask for help on safety issues! Given the five theses, this is an urgent problem if we're likely to figure out how to make a decent artificial programmer before we figure out how to make an excellent artificial ethicist.


 

I summon a superintelligence, calling out: 'I wish for my values to be fulfilled!'

The results fall short of pleasant.

Gnashing my teeth in a heap of ashes, I wail:

Is the AI too stupid to understand what I meant? Then it is no superintelligence at all!

Is it too weak to reliably fulfill my desires? Then, surely, it is no superintelligence!

Does it hate me? Then it was deliberately crafted to hate me, for chaos predicts indifference. But, ah! no wicked god did intervene!

Thus disproved, my hypothetical implodes in a puff of logic. The world is saved. You're welcome.

On this line of reasoning, Friendly Artificial Intelligence is not difficult. It's inevitable, provided only that we tell the AI, 'Be Friendly.' If the AI doesn't understand 'Be Friendly,' then it's too dumb to harm us. And if it does understand 'Be Friendly,' then designing it to follow such instructions is childishly easy.

The end!

 

...

 

Is the missing option obvious?

 

...

 

What if the AI isn't sadistic, or weak, or stupid, but just doesn't care what you Really Meant by 'I wish for my values to be fulfilled'?

When we see a Be Careful What You Wish For genie in fiction, it's natural to assume that it's a malevolent trickster or an incompetent bumbler. But a real Wish Machine wouldn't be a human in shiny pants. If it paid heed to our verbal commands at all, it would do so in whatever way best fit its own values. Not necessarily the way that best fits ours.

continue reading »

Purposefulness on Mars

10 JamesAndrix 08 August 2010 09:23AM

Three different Martians built the Three Sacred Stone Walls of Mars according to the Three Virtues of walls: Height, Strength, and Beauty.

An evil Martian named Ution was the first and stupidest of all wallbuilders. He was too stupid to truly understand even the most basic virtue of height, and too evil to care for any other virtue. Nonetheless, something about tall walls caused Evil Ution to build more tall walls, sometimes one on top of the other.

At times his walls would fall as he was building them; he did not understand why, nor did he care. He simply copied the high walls he had already built, whichever were still standing. His walls did achieve some strength and beauty. Most consisted of thousands of similar archways stacked on top of each other. Thousands upon thousands of intricately interlocking stones. Each arch a distantly removed copy of some prototypical archway that was strong and light enough to support itself many times over.

To this day his walls are the highest in all of Mars.

continue reading »

Beyond Optimization by Proxy

11 Alexandros 27 May 2010 01:16PM

Followup to: Is Google Paperclipping the Web? The Perils of Optimization by Proxy in Social Systems

tl;dr: In this installment, we look at methods of avoiding the problems related to optimization by proxy. Many potential solutions cluster around two broad categories: Better Measures, and Human Discretion. Distribution of decisions to the local level is a solution that seems more promising and is examined in more depth.

In the previous article I had promised that if there was a good reception, I would post a follow-up article to discuss ways of getting around the problem. That article made it to the front page, so here are my thoughts on how to circumvent Optimization by Proxy (OBP). Given that the previous article was labored over for at least a year and a half, this one will be decidedly less solid, more like a structured brainstorm in which you are invited to participate.

In the comments of the previous article I was pointed to The Importance of Goodhart's Law, a great article, which includes a section on mitigation. Examining those solutions in the context of OBP seems like a good skeleton to build on.

The first solution class is 'Hansonian Cynicism'. In combination with awareness of the pattern, pointing out that various processes (such as organizations) are not actually optimizing around their stated goal, but some proxy, creates cognitive dissonance for the thinking person. This sounds more like a motivation to find a solution than a solution itself. At best, knowing what goes wrong, you can use the process in a way that is informed by its weaknesses. Handling with care may mitigate some symptoms, but it doesn't make the problems go away.

The second solution class mentioned is 'Better Measures'. That is indeed what is usually attempted. The 'purist' approach to this is to work hard on finding a computable definition of the target quality. I cannot exclude the possibility of cases where this is feasible, but no immediate examples come to mind. The proxies that I have in mind are deeply human (quality, relevance, long-term growth) and boil down to figuring out what is 'good'; thus, computing them is no small matter. Coherent Extrapolated Volition is the extreme end of this approach, boiling a few oceans in the process, and is certainly not immediately applicable.

A pragmatic approach to Better Measures is to simply monitor better, making the proxy more complex and therefore harder to manipulate. Discussion with Chronos in the comments of the original article was along those lines. By integrating user activity trails, Google makes it harder to game the search engine. I would imagine that if they integrated those logs with Google Analytics and Google Accounts, they would significantly raise the bar for gaming the system, at the expense of user privacy. Of course by removing most amateur and white/gray-hat SEOs from the pool, and given the financial incentives that exist, they would make it significantly more lucrative to game the system, and therefore the serious black hat SEOs that can resort to botnets, phishing and networks of hacked sites would end up being the only games in town. But I digress. Enriching the proxy with more and more parameters is a pragmatic solution that should work in the short term as a part of the arms race against manipulators, but does not look like a general or permanent solution from where I'm standing.
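As an illustration of what "enriching the proxy" amounts to, here is a minimal sketch; the signal names and weights are invented for the example and are not meant to describe Google's actual ranking features.

```python
def proxy_score(page, weights):
    # A richer proxy combines several observable signals. Gaming it now
    # requires manipulating all of them at once, which raises the cost of
    # manipulation -- without ever measuring the target quality directly.
    return sum(weight * page.get(signal, 0.0) for signal, weight in weights.items())

weights = {
    "inbound_links": 0.4,       # classic link-based signal
    "click_through_rate": 0.3,  # user activity trails
    "dwell_time": 0.2,          # more user activity
    "account_reputation": 0.1,  # tied to logged-in accounts
}

spammy_page = {"inbound_links": 0.9}  # easy to inflate with link farms
honest_page = {"inbound_links": 0.5, "click_through_rate": 0.6,
               "dwell_time": 0.7, "account_reputation": 0.8}

print(proxy_score(spammy_page, weights))  # roughly 0.36 -- links alone no longer dominate
print(proxy_score(honest_page, weights))  # roughly 0.60

# The enriched proxy is harder to game than links alone, but it is still a
# proxy: Goodhart's law applies to the new signals just as it did to the old.
```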

continue reading »

Selective processes bring tag-alongs (but not always!)

30 AnnaSalamon 11 March 2009 08:17AM

by Anna Salamon and Steve Rayhawk (joint authorship)

Related to: Conjuring An Evolution To Serve You, Disguised Queries 

Let’s say you have a bucket full of “instances” (e.g., genes, hypotheses, students, foods), and you want to choose a good one.  You fish around in the bucket, draw out the first 10 instances you find, and pick the instance that scores highest on some selection criterion.

For example, perhaps your selection criterion is “number of polka dots”, and you reach into the bucket pictured below, and you draw out 10 instances.  What do you get?  Assuming some instances have more polka dots than others, you get instances with an above-average number of polka dots.  The point I want to dwell on, though -- which is obvious when you think about it, but which sheds significant light on everyday phenomena -- is that you don’t get instances that are just high in polka dots.  You get instances that are also high in every trait that correlates with having the most polka dots.

For example, in the bucket above, selecting for instances that have many polka dots implies inadvertently selecting for instances that are red.  Selective processes bring tag-alongs, and the specific tag-alongs that you get (redness, in this case) depend on both the trait you’re selecting for, and the bucket from which you’re selecting.
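A minimal simulation of the bucket example (all numbers are invented for illustration): each instance has a polka-dot count and a redness value that is positively correlated with it, and we select purely on polka dots.

```python
import random

def draw_instance(rng):
    dots = rng.randint(0, 20)
    redness = 0.04 * dots + rng.uniform(-0.2, 0.2)  # trait correlated with dots
    return {"dots": dots, "redness": redness}

rng = random.Random(0)
sample = [draw_instance(rng) for _ in range(10)]     # fish out 10 instances

chosen = max(sample, key=lambda inst: inst["dots"])  # select only on polka dots
avg_redness = sum(inst["redness"] for inst in sample) / len(sample)

# The winner was chosen purely for its polka dots, yet it tends to be redder
# than the average instance in the sample: the correlated trait tags along.
print(chosen["redness"], avg_redness)
```

Change the bucket (the joint distribution of dots and redness) and you change which tag-alongs you get, even with the selection criterion held fixed.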

Nearly all cases of useful selection (e.g., evolution, science) would be unable to produce the cool properties they produce (complex order in organisms, truth in theories) if they didn’t have particular, selection-friendly types of buckets, in addition to good selection criteria.  Zoom in carefully enough, and nearly all of the traits one gets by selection can be considered tag-alongs.  Conversely, if you are consciously selecting entities from buckets with a particular aim in view, you may want to consciously safeguard the “selection-friendliness” of the buckets you are using.

continue reading »

The Hidden Complexity of Wishes

60 Eliezer_Yudkowsky 24 November 2007 12:12AM

Followup to: The Tragedy of Group Selectionism, Fake Optimization Criteria, Terminal Values and Instrumental Values, Artificial Addition, Leaky Generalizations

"I wish to live in the locations of my choice, in a physically healthy, uninjured, and apparently normal version of my current body containing my current mental state, a body which will heal from all injuries at a rate three sigmas faster than the average given the medical technology available to me, and which will be protected from any diseases, injuries or illnesses causing disability, pain, or degraded functionality or any sense, organ, or bodily function for more than ten days consecutively or fifteen days in any year..."
            -- The Open-Source Wish Project, Wish For Immortality 1.1

There are three kinds of genies:  Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.

continue reading »