MattMahoney comments on Reply to Holden on 'Tool AI' - Less Wrong
If we were smart enough to understand its policy, then it would not be smart enough to be dangerous.
That doesn't seem true. A simple policy can still be dangerous, and an agent following one can be more powerful than I am.
To steelman the parent argument a bit: a simple policy can be dangerous, but if an agent proposed a simple, dangerous policy to us, we would probably not implement it, since we could see that it was dangerous; thus the agent itself would not be dangerous to us.
If the agent were to propose a policy that, as far as we could tell, appeared safe but was in fact dangerous, then simultaneously: