Wes_W comments on Boxing an AI? - Less Wrong Discussion
Comments (39)
Respectfully, I think you're just shoving all your complexity under the rug here. Unless you have a concrete proposal on how to actually do this, just asserting that your box won't be figure-out-able is dodging the question.
At first glance, I was also skeptical of tailcalled's idea, but now I find I'm starting to warm up to it. Since you didn't ask for a practical proposal, just a concrete one, I give you this:
The problem with this is that even if you can determine with certainty that an AI is friendly, there is no certainty that it will stay that way. There could be a series of errors as it goes about its daily operation, each acting as a mutation, gradually evolving the "Friendly" AI into a less friendly one.
Hm. That does sound more workable than I had thought.
I would probably only include it as part of a batch of tests and proofs. It would be pretty foolish to rely on a single method to verify something that will destroy the world if it doesn't work correctly.
Yes, I agree with you on that. (Step 5 was intended as a joke/reference.)
Pick or design a game that contains some aspect of reality that you care about in terms of AI. All games have some element of learning, many have an element of planning, and some even involve varying degrees of programming.
As an example, I will pick Factorio, a game that involves learning, planning and logistics. Wire up the AI to this game, with appropriate reward channels, etc., etc. Now you can test how good the AI is at getting stuff done: producing goods, killing aliens (which isn't morally problematic, as the aliens don't act as personlike, morally relevant things) and generally learning about the universe.
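To make the "wire up the AI to this game, with appropriate reward channels" step a bit more concrete, here is a minimal Python sketch of such an evaluation loop. Everything in it is a hypothetical illustration: `ToyFactory` is a drastically simplified stand-in for a production game (it is not Factorio's actual interface), and the names `evaluate`, `planner` and `make_random_policy`, as well as the scoring rule (1 reward per finished good), are assumptions made up for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical toy stand-in for a production game; NOT Factorio's real API.
@dataclass
class ToyFactory:
    ore: int = 0    # raw resources gathered so far
    goods: int = 0  # finished products, which the reward channel pays for

    def observe(self) -> Dict[str, int]:
        """What the AI is allowed to see of the game state."""
        return {"ore": self.ore, "goods": self.goods}

    def step(self, action: str) -> float:
        """Apply one action and return the reward emitted on the reward channel."""
        if action == "mine":
            self.ore += 1
            return 0.0
        if action == "craft" and self.ore >= 2:
            # Assumed recipe: crafting consumes 2 ore and yields 1 rewarded good.
            self.ore -= 2
            self.goods += 1
            return 1.0
        return 0.0  # idling, or crafting without enough ore, earns nothing


# A policy maps an observation to an action; the AI under test plugs in here.
Policy = Callable[[Dict[str, int]], str]


def evaluate(policy: Policy, episodes: int = 10, steps: int = 100) -> float:
    """Run the policy in fresh environments and return mean reward per episode."""
    total = 0.0
    for _ in range(episodes):
        env = ToyFactory()
        for _ in range(steps):
            total += env.step(policy(env.observe()))
    return total / episodes


def planner(obs: Dict[str, int]) -> str:
    """A simple planning baseline: mine until there is enough ore, then craft."""
    return "craft" if obs["ore"] >= 2 else "mine"


def make_random_policy(seed: int = 0) -> Policy:
    """A non-planning baseline that picks actions uniformly at random."""
    rng = random.Random(seed)
    return lambda obs: rng.choice(["mine", "craft", "idle"])


if __name__ == "__main__":
    print("planner:", evaluate(planner))
    print("random: ", evaluate(make_random_policy(0)))
```

With the default 100 steps, the planner completes a mine-mine-craft cycle every 3 steps and scores 33 goods per episode, while the random baseline scores less. Hooking a real game up in place of `ToyFactory` would require an adapter around whatever interface that game actually exposes; no such interface is assumed here.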
The morality step depends on how the AI is designed. If it's designed to use heuristics to identify a group of entities as humans and help them, you might get away with throwing it into a procedurally generated RPG. If it uses more general, actually morally relevant criteria (such as intelligence, self-awareness, etc.), you might need a very different setup.
However, speculating about exactly what setup is needed for testing morality is probably very unproductive until we decide how we're actually going to implement morality.