Kyre comments on Concept Safety: Producing similar AI-human concept spaces - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (45)
The pictures and videos of torture in the training set that are labelled "bad".
It is not perfect, but I think the idea is that with a large and diverse training set the hope is that it alternative models of "good/bad" become extremely contrived, and the human one you are aiming for becomes the simplest model.
I found the material in the post very interesting. It holds out hope that after training your world model, it might not be as opaque as people fear.