Technology which can predict whether an action would be approved by a person or by an organization is:
-Practical to create: it can be applied first to test cases, then to limited circumstances, then to more general cases.
-For the test cases and the limited circumstances, buildable with existing machine learning technology, without deploying full-scale natural language processing.
-Approval/disapproval is a binary value, so appropriate machine learning approaches would include logistic regression or tree-based methods such as decision trees and random forests. We create a model using training data, and the model may output P(approval | conditions). The model is not that different from one used to predict a purchase or a variety of other online behaviors.
-A system which could forecast approval and disapproval would be useful to PEOPLE, well before it became useful as a basis for selecting AI motivations.
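As a rough illustration of the logistic regression option above, here is a minimal Python sketch using only the standard library. The two features (`cost` and `benefit`, each scaled to [0, 1]), the synthetic labels, and the function names `train_logistic` and `predict_approval` are all hypothetical choices made for this example; real training data would come from recorded approval decisions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_approval(w, b, x):
    """Return the model's estimate of P(approval | conditions)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic stand-in for real approval records (an assumption of this sketch):
# an action tends to be approved when its benefit exceeds its cost, plus noise.
random.seed(0)
X, y = [], []
for _ in range(200):
    cost, benefit = random.random(), random.random()
    approved = 1 if benefit - cost + random.gauss(0, 0.1) > 0 else 0
    X.append([cost, benefit])
    y.append(approved)

w, b = train_logistic(X, y)

# Conditions are given as [cost, benefit].
p_good = predict_approval(w, b, [0.1, 0.9])  # low cost, high benefit
p_bad = predict_approval(w, b, [0.9, 0.1])   # high cost, low benefit
print(f"P(approval | low cost, high benefit) = {p_good:.2f}")
print(f"P(approval | high cost, low benefit) = {p_bad:.2f}")
```

A tree-based model could be swapped in behind the same `predict_approval` interface; the important part is that the output is a probability rather than a hard yes/no.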
Predicting whether people would approve of a particular action is something we could already use machine learning for today.
These approaches advance the idea from a theoretical construct to an actual, implementable project.
Thanks to Paul for the seed insight.
(Crossposted from ordinary ideas).
I’ve recently been thinking about AI safety, and some of the writeups might be interesting to some LWers:
I’m excited about a few possible next steps: