RichardKennaway comments on Work harder on tabooing "Friendly AI" - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (51)
But that is exactly what this wrapping in a post-hoc utility function doesn't do. The model first picks an action in whatever way it does, then labels that with utility 1.