The point is that it's ridiculous to say that human beings are 'universal learning machines'
No - it is not. See the article for the in-depth argument and the citations backing up this statement.
you can just raise any learning algorithm as a human child and it'll turn out fine.
Well, almost - a ULM also requires a utility function or reward circuitry with some initial complexity, but we can use the same universal learning algorithms to learn that component. It is just another circuit, and we can learn any circuit that evolution learned.
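To make the "just another circuit" claim concrete, here is a minimal sketch (my own illustration, not anything from the article): a small network is fit by ordinary gradient descent to imitate a stand-in "evolved" reward function from observed (state, reward) pairs. The target function, architecture, and hyperparameters are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "evolved" reward circuit we want to learn: maps a
# 4-dim state to a scalar reward. Illustrative stand-in only.
def evolved_reward(s):
    return np.tanh(s[:, 0] - 0.5 * s[:, 1] ** 2 + s[:, 2] * s[:, 3])

# Training data: observed (state, reward) pairs.
X = rng.normal(size=(1000, 4))
y = evolved_reward(X)

# A small two-layer network -- the "learned circuit".
W1 = rng.normal(scale=0.5, size=(4, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    # Backward pass for mean squared error.
    g_pred = (2 / len(X)) * err[:, None]
    g_W2 = h.T @ g_pred; g_b2 = g_pred.sum(0)
    g_h = g_pred @ W2.T * (1 - h ** 2)
    g_W1 = X.T @ g_h; g_b1 = g_h.sum(0)
    for p, g in ((W1, g_W1), (b1, g_b1), (W2, g_W2), (b2, g_b2)):
        p -= lr * g  # plain gradient descent update

print("final MSE:", np.mean(err ** 2))
```

The same generic learner that acquires perception or motor skills can fit the reward circuit, which is the whole point: nothing about that component demands special machinery.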
And that's all it takes to make them consistently UnFriendly, regardless of how well they're raised.
Sure - which is why I discussed sim sandbox testing. Did you read that part? We test designs in a safe sandboxed sim, and we don't copy the sociopaths.
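For concreteness, a toy sketch of the sandbox gate I have in mind. Everything here is a placeholder - the metrics, thresholds, and simulation are invented for illustration; the point is the gating protocol, not the specific tests:

```python
import random

def run_in_sandbox(design_seed, episodes=100):
    """Run a candidate agent design in a sealed sim and return
    behavioral scores. Placeholder: a real screen would use rich
    social scenarios, not random draws."""
    rng = random.Random(design_seed)
    # Invented metrics in [0, 1]: how often the agent cooperates,
    # and how often it deceives or harms other sim agents.
    cooperation = sum(rng.random() > 0.3 for _ in range(episodes)) / episodes
    harm = sum(rng.random() > 0.9 for _ in range(episodes)) / episodes
    return {"cooperation": cooperation, "harm": harm}

def passes_screen(scores, min_coop=0.6, max_harm=0.05):
    """Gate: only designs that behave prosocially in the sim are
    copied out; 'sociopathic' designs are discarded, never deployed."""
    return scores["cooperation"] >= min_coop and scores["harm"] <= max_harm

# Screen a population of candidate designs; deploy only the survivors.
approved = [d for d in range(20) if passes_screen(run_in_sandbox(d))]
print(f"{len(approved)}/20 designs passed the sandbox screen")
```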
Obviously, AIs are going to be more different from us than that.
No, this isn't obvious at all. AGI is going to be built from the same principles as the brain, because the brain is a universal learning machine. The AGI's mind structure will be learned from training and experiential data, such that the AI learns how to think like a human and how to be human - just as humans do. Human minds are software constructs; without that software we would just be animals (consider feral humans). An artificial brain is just another computer that can run the human mind software.
That hard-coding is not going to be present in an arbitrary AI, which means we have to go and duplicate it out of a human brain. Which is HARD.
Yes, but that circuitry is only a small part of the brain and a fraction of its complexity, so obviously it can't be harder than reverse-engineering the whole brain.
A ULM also requires a utility function or reward circuitry with some initial complexity, but we can use the same universal learning algorithms to learn that component. It is just another circuit, and we can learn any circuit that evolution learned.
Okay, so we just have to determine human terminal values in detail, and plug them into a powerful maximizer. I'm not sure I see how that's different from the standard problem statement for Friendly AI. Learning values by observing people is exactly what MIRI is working on, and it's not a trivial problem.
At some point soon, I'm going to attempt to steelman the position of those who reject the AI risk thesis, to see if it can be made solid. Here, I'm just asking if people can link to the most convincing arguments they've found against AI risk.
EDIT: Thanks for all the contributions! Keep them coming...