ChristianKl comments on MIRI strategy - Less Wrong
How do you decide whether some interaction of a complex neural net is friendly or unfriendly?
It's very hard to tell what a neural net or complex algorithm is doing even if you have logs.
Don't use a neural net (or variants like deep belief networks). The field has advanced quite a bit since the 1960s, and since the late 1980s there have been machine learning and knowledge representation structures that are comprehensible to a human and/or an auditor, such as probabilistic graphical models. If you are using auditing as a confinement mechanism, these would have to be first-class types of the virtual machine that implements the AGI. But that's not really a restriction, as many AI techniques are already phrased in terms of these models (including Eliezer's own TDT, for example), and others have simple adaptations.
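
To make "auditor comprehensible" a bit more concrete, here is a minimal sketch of a probabilistic graphical model (a tiny Bayesian network) in Python. Everything in it is an assumption for illustration: the variables (Rain, Sprinkler, WetGrass), the probabilities, and the function names `joint` and `posterior` are all made up, and the inference is brute-force enumeration rather than anything a real system would use. The point is only that every number feeding a conclusion sits in a named table that a person can read and check by hand.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# Each variable lists its parents explicitly, so the structure is inspectable.
PARENTS = {
    "Rain": (),
    "Sprinkler": (),
    "WetGrass": ("Rain", "Sprinkler"),
}

# Conditional probability tables: parent values -> P(variable is True).
# All numbers are invented for illustration.
CPT = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint(assignment):
    """Probability of one full assignment: the product of one CPT entry per variable."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query is True | evidence), by summing the joint over the hidden variables."""
    hidden = [v for v in PARENTS if v != query and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for q in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query] = q
            assignment.update(zip(hidden, values))
            totals[q] += joint(assignment)
    # Assumes the evidence has nonzero probability under the model.
    return totals[True] / (totals[True] + totals[False])


p_rain = posterior("Rain", {"WetGrass": True})
print(f"P(Rain | WetGrass) = {p_rain:.4f}")
```

An auditor reading a log of this computation can trace the answer back to specific table entries and ask whether each one is reasonable, which is much harder to do with the weights of a neural net.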