Houshalter comments on Chatbots or set answers, not WBEs - Less Wrong

5 Post author: Stuart_Armstrong 08 September 2015 05:17PM


Comment author: Houshalter 09 September 2015 05:07:38AM 1 point

But a chatbot is very unlikely to produce coherent strings of text, let alone something that has value. So, in order to make it probable that a chatbot could have produced the output, your AI system will be too constrained and will produce terrible output.

Or it will produce something so improbable that it was definitely produced by an AI, and probably dangerous, since you've allowed a lot of optimization.

Why not just use humans? You don't need whole brain emulation. Just put a human in a room for 3 weeks and have them think about a problem. At the end of 3 weeks, they write down the answer on a sheet of paper. The AI also writes down its answer. A different AI reads both and tries to guess which was produced by the AI and which by the human.

The first AI optimizes for fooling the judge, i.e. for having its paper be indistinguishable from the human's. But it does so conditional on the human solving the problem: if the human doesn't solve it, the AI wins automatically. So the AI needs to come up with a correct solution, but also one that looks like a human produced it.
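The conditional win rule described above can be made concrete with a toy scoring function. This is a minimal sketch, not anything proposed in the thread; the function and argument names (`run_trial`, `human_solved`, `judge_guesses_ai_correctly`) are hypothetical labels for the two events the rule depends on:

```python
def run_trial(human_solved: bool, judge_guesses_ai_correctly: bool) -> bool:
    """Return True if the AI 'wins' this trial under the proposed rule.

    The AI wins if either:
      * the human failed to solve the problem (automatic win), or
      * the human solved it, but the judge could not pick out
        which answer the AI wrote.
    """
    if not human_solved:
        # Conditioning on the human succeeding: if they fail,
        # the AI wins regardless of what it wrote.
        return True
    return not judge_guesses_ai_correctly
```

The point of the automatic-win branch is that the AI gains nothing from scenarios where the human fails, so its optimization pressure is spent entirely on the case where a correct, human-looking answer exists.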

You may not even need to use real humans or a real AI judge at all; just assume the whole setup as a counterfactual. Alternatively, you may need to run it many times, both to get training data and to show the AI that the test is real.

Comment author: Stuart_Armstrong 09 September 2015 09:40:21AM 0 points

The problem is that the human will know their answer, and could communicate it later if they're let out of the box. Maybe we could get online users to submit answers to the question. Then the AI selects the best answer from the choices. And if the AI is not turned on, a random answer is selected.
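The crowd-sourced variant above can be sketched the same way. This is a hypothetical illustration, assuming only that the AI ranks submissions by some internal score and that, when the AI is not turned on, an answer is drawn uniformly at random (so the output alone doesn't reveal whether the AI ran):

```python
import random

def select_answer(ai_on: bool, submissions: list, ai_score) -> str:
    """Pick one answer from user submissions.

    If the AI is running, it returns the submission it scores highest;
    otherwise a submission is chosen uniformly at random as a fallback.
    """
    if ai_on:
        return max(submissions, key=ai_score)
    return random.choice(submissions)
```

Because the AI only chooses among answers humans already wrote, it never gets to author free-form text, which is one way to limit the optimization it can exert on the reader.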

Comment author: Houshalter 09 September 2015 10:02:13PM 1 point

In my version, the AI only cares about what the AI judge thinks of its answer. It doesn't care about anything that happens afterwards. However, it wants to look as human as possible, so it can't put infohazards into its answer.

Comment author: Stuart_Armstrong 10 September 2015 08:49:43AM 1 point

Interesting. I'll think on that.