SteveG comments on Recent AI safety work - Less Wrong Discussion
Technology which can predict whether an action would be approved by a person or by an organization is:
-Practical to create, first applied to test cases, then to limited circumstances, then in more general cases.
-For the test cases and for the limited circumstances, it can be created using some existing machine learning technology without deploying full-scale natural language processing.
-Approval/disapproval is a binary value, so appropriate machine learning approaches would include logistic regression or tree-based methods such as decision trees and random forests. We create a model using training data, and the model can output P(approval | conditions). The model is not that different from one used to predict a purchase or a variety of other online behaviors.
-A system which could forecast approval and disapproval would be useful to PEOPLE, well before it became useful as a basis for selecting AI motivations.
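To make the P(approval | conditions) idea concrete, here is a minimal sketch of a logistic regression fit by plain gradient descent. Everything specific in it is an assumption made for illustration: the two features (`cost` and `reversible`), the toy training rows, and the function names are hypothetical, and a real project would more likely use an existing library and far richer descriptions of each action.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def p_approval(w, b, conditions):
    """Return the model's estimate of P(approval | conditions)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, conditions)) + b)

# Hypothetical toy data: each row is [cost (0 to 1), reversible (0 or 1)].
# Labels: 1 = the person approved the action, 0 = disapproved.
X = [[0.1, 1], [0.2, 1], [0.3, 1], [0.2, 0],
     [0.8, 0], [0.9, 0], [0.7, 0], [0.9, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)

cheap_reversible = p_approval(w, b, [0.1, 1])
costly_irreversible = p_approval(w, b, [0.9, 0])
print("P(approval | cheap, reversible)     =", round(cheap_reversible, 3))
print("P(approval | costly, irreversible)  =", round(costly_irreversible, 3))
```

The output is a probability rather than a hard yes/no, so a caller could pick its own threshold, for example acting only when the predicted approval is very high.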
Predicting whether people would approve of a particular action is something that we could use machine learning for now.
These approaches advance the idea from a theoretical construct to an actual, implementable project.
Thanks to Paul for the seed insight.