New(ish) AI control ideas

24 Stuart_Armstrong 05 March 2015 05:03PM

EDIT: this post is no longer being maintained; it has been replaced by this new one.

 

I recently went on a two-day intense solitary "AI control retreat", with the aim of generating new ideas for making safe AI. The "retreat" format wasn't really a success ("focused uninterrupted thought" was the main gain, not "two days of solitude" - it would have been more effective in three-hour sessions), but I did manage to generate a lot of new ideas. These ideas will now go before the baying bloodthirsty audience (that's you, folks) to test them for viability.

A central thread running through them could be: if you want something, you have to define it, then code it, rather than assuming you can get it for free through some other approach.

To provide inspiration and direction to my thought process, I first listed all the easy responses that we generally give to most proposals for AI control. If someone comes up with a new/old brilliant idea for AI control, it can normally be dismissed by appealing to one of these responses:

  1. The AI is much smarter than us.
  2. It’s not well defined.
  3. The setup can be hacked.
    • By the agent.
    • By outsiders, including other AI.
    • Adding restrictions encourages the AI to hack them, not obey them.
  4. The agent will resist changes.
  5. Humans can be manipulated, hacked, or seduced.
  6. The design is not stable.
    • Under self-modification.
    • Under subagent creation.
  7. Unrestricted search is dangerous.
  8. The agent has, or will develop, dangerous goals.

Harry Potter and the Methods of Rationality Podcast

38 Eneasz 13 April 2011 05:09PM

Have you ever thought “I’d love to read Harry Potter and the Methods of Rationality, but I just don’t have the spare time. I wish it was available in audio format.” Fret no more! I present to you the HPMoR Podcast! First chapter out now, and another one added every Wednesday.

http://itunes.apple.com/us/podcast/harry-potter-methods-rationality/id431784580


Being a teacher

51 Swimmer963 14 March 2011 08:03PM

A few weeks ago, while giving unofficial swimming lessons to an acquaintance about my age, I had an insight. It was that before you can teach something, you have to realize it’s hard.

I don’t think I noticed this before, because I thought it was obvious. Of course someone who doesn’t know how to swim isn’t going to learn perfect front crawl just by looking at yours. If I was told to watch someone else swimming a brand-new stroke that I’d never seen before, I could imitate it pretty easily, because to me it’s a trivial skill. But to someone who has nothing to refer to, it’s hard.

“You’re like the fifth person who’s tried to teach me how to swim,” my acquaintance said as I led her into the shallow end holding a foam noodle. “People just tell me to move my arms and legs, and they didn’t seem to understand why I couldn’t do it.”

There are, needless to say, a lot of different ways to move your arms and legs. Some of them resemble swimming. A subset of those actually work to keep someone’s head at the surface, and an even smaller subset of those are effective enough that they have names, like front crawl. To me, this is obvious, because I’ve watched hundreds of children in my classes flail and struggle in their front crawl, or lift their head to breathe, or turn their toes inwards in whip kick, and make the same mistakes persistently even when I corrected them, both verbally and by literally grabbing their arms/legs and moving them. I know it’s hard.

I went through this flailing/struggling phase too and have no memory of it whatsoever, having been three at the time. This is probably true of most good swimmers; the procedural memory is so embedded that it makes sense to say “move your arms and legs” because that’s all you think about consciously; you forget how many other things you’re doing just to stay afloat. (Poor swimmers might have a different perspective, but they aren’t likely to use that perspective to try to teach other people how to swim.)
