
An Oracle standard trick

4 Stuart_Armstrong 03 June 2015 02:17PM

A putative new idea for AI control; index here.

EDIT: To remind everyone, this method does not entail the Oracle having false beliefs, just behaving as if it did; see here and here.

This is an idea I thought I'd been mentioning to everyone, but a recent conversation revealed that I haven't been assiduous about it.

It's quite simple: whenever designing an Oracle, you should, as a default, run its output channel through a probabilistic process akin to the false thermodynamic miracle, in order to make the Oracle act as if it believed its message will never be read.

This reduces the possibility of the Oracle manipulating us through message content, because it's acting as if that content will never be seen by anyone.
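
To make the mechanism concrete, here is a minimal toy sketch. It assumes (this is one reading, not something spelled out in this post) that the effect is obtained by making the Oracle's utility constant in every world where the message is actually read, so that only the rare "message erased" worlds influence its choice. The erasure probability, the candidate messages and their utilities are all made-up illustrative values.

```python
# Toy model: every name and number here is an illustrative assumption.
ERASE_PROB = 1e-6  # chance the output is destroyed before anyone reads it

# For each candidate message: (utility if never read, utility if read).
# "manipulative" scores highly only through its effect on readers.
CANDIDATES = {
    "honest": (1.0, 1.0),
    "manipulative": (0.2, 10.0),
}


def naive_expected_utility(message):
    """Ordinary expected utility: the 'read' worlds dominate."""
    unread, read = CANDIDATES[message]
    return ERASE_PROB * unread + (1 - ERASE_PROB) * read


def erasure_conditioned_utility(message):
    """Utility set to a constant (0) in every world where the message is read.

    Only the erased worlds can distinguish one message from another,
    so the Oracle chooses as if nobody will ever see its output.
    """
    unread, _read = CANDIDATES[message]
    return ERASE_PROB * unread + (1 - ERASE_PROB) * 0.0


def best_message(expected_utility):
    """Return the candidate message that maximises the given utility."""
    return max(CANDIDATES, key=expected_utility)


if __name__ == "__main__":
    print("naive Oracle picks:      ", best_message(naive_expected_utility))
    print("conditioned Oracle picks:", best_message(erasure_conditioned_utility))
```

In this toy setup the naive Oracle prefers the manipulative message, while the erasure-conditioned one prefers the honest message, even though the erasure event almost never happens and the message is in practice almost always read.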

Now, some Oracle designs can't use that (e.g. if accuracy is defined in terms of the reaction of the people who read its output). But in general, if your design allows such a precaution, there's no reason not to include it, so it should be the default in Oracle design.

Even if the Oracle design precludes this directly, some version of it can often be used. For instance, if accuracy is defined in terms of the reaction of the first person to read the output, and that person is isolated from the rest of the world, then we can get the Oracle to act as if it believed a nuclear bomb was due to go off before the person could communicate with the rest of the world.

False thermodynamic miracles

13 Stuart_Armstrong 05 March 2015 05:04PM

A putative new idea for AI control; index here. See also Utility vs Probability: idea synthesis.

Ok, here is the problem:

  • You have to create an AI that believes (or acts as if it believed) that event X is almost certain, while you believe that X is almost impossible. Furthermore, you have to be right. To make things more interesting, the AI is much smarter than you, knows everything that you do (and more), and has to react sensibly when event X doesn't happen.

Answers will be graded on mathematics, style, colours of ink, and compatibility with the laws of physics. Also, penmanship. How could you achieve this?
