Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

A toy model of the treacherous turn

13 Stuart_Armstrong 08 January 2016 12:58PM

Jaan Tallinn has suggested creating a toy model of the various common AI arguments, so that they can be analysed without loaded concepts like "autonomy", "consciousness", or "intentionality". Here is a simple attempt for the "treacherous turn"; posted here for comments and suggestions.

Meet agent L. This agent is a reinforcement-based agent, rewarded/motivated by hearts (and some small time penalty each turn it doesn't get a heart):
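The reward scheme described above can be sketched in a few lines. This is a minimal illustration only; the specific reward and penalty values are invented assumptions, not taken from the post:

```python
# Sketch of agent L's reward scheme: a reward for collecting a heart,
# a small time penalty on every turn without one.
# HEART_REWARD and TIME_PENALTY values are illustrative assumptions.

HEART_REWARD = 1.0
TIME_PENALTY = -0.05  # small per-turn cost when no heart is collected

def step_reward(collected_heart: bool) -> float:
    """Reward the reinforcement-based agent receives on one turn."""
    return HEART_REWARD if collected_heart else TIME_PENALTY

# Cumulative reward over a short example episode
# (True = heart collected that turn):
episode = [False, False, True, False, True]
total = sum(step_reward(h) for h in episode)  # 2 hearts, 3 penalized turns
```

The time penalty is what gives the agent a motive to act quickly rather than idle, which matters for the treacherous-turn dynamics the post goes on to analyse.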


Siren worlds and the perils of over-optimised search

27 Stuart_Armstrong 07 April 2014 11:00AM

tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.

Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.
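The structural difference between an optimising search and a satisficing one can be sketched as follows. The candidate worlds and their apparent scores are invented purely for illustration; the point is only that an argmax is reliably captured by whatever scores highest, while a satisficer is not:

```python
import random

# Each candidate world has an "apparent score": how good it looks
# under our inspection criteria. A siren world is engineered to sit
# at the top of any apparent metric while hiding negative features.
# These worlds and scores are illustrative assumptions.
worlds = {
    "ordinary_good_world": 0.80,
    "great_world": 0.90,
    "siren_world": 0.99,  # seems irresistibly attractive
}

def optimise(candidates):
    """Unconstrained search: always returns the apparent maximum."""
    return max(candidates, key=candidates.get)

def satisfice(candidates, threshold, rng):
    """Pick any world whose apparent score clears a 'good enough' bar."""
    acceptable = [w for w, s in candidates.items() if s >= threshold]
    return rng.choice(acceptable)

# The optimiser always lands on the siren world; the satisficer
# only picks it about one time in three with this threshold.
rng = random.Random(0)
picks = [satisfice(worlds, 0.75, rng) for _ in range(1000)]
```

The satisficer still picks the siren world sometimes, which matches the post's "non-zero chance" framing: satisficing reduces but does not eliminate the exposure, whereas optimisation guarantees it.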


The AI builds the siren worlds

Imagine that you have a superintelligent AI that's not just badly programmed, or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.

We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them to human inspection, along with an implementation pathway to create those worlds. Then if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.

The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.

Even without those types of approaches, there is still plenty of manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:


The Dark Arts - Preamble

44 Aurini 11 October 2010 02:01PM

I’d like to tell you all a story.

Once upon a time I was working for a charity – a major charity – going door-to-door to raise money while pretending it wasn’t sales.

This story happened on my last day working there.  I didn’t know that at the time; I wouldn’t find out until the following morning when my boss called me up to fire me, but I knew it was coming.  For weeks I’d been fed up with the job, milking it for the last few dollars I could pull out, hating every minute of it but needing the money.  The Sudden Career Readjustment would come as a relief.

So on that day, my last day, I was moving slowly.  I knocked on one particular door and there was no response.  I had little desire to walk to the next one, however, and there was an interesting spider who’d built its web below the doorbell.  I tapped its belly with the tip of my pen, and it reacted with aggression – trying to envenom and ensnare the tip of my ballpoint.  I must have been playing with it for a good minute or so when the door suddenly opened.

A distraught woman stood before me.  After a brief period of Relating I launched into my pitch.

