This is a proposal for a reinforcement learning model with an additional, conflicting dynamic between the optimized data and the loss function. That conflict is intended to reduce RL’s intrinsic Omohundro x-risk dynamic. It is thus also an attempt to produce, though without details, a good idea for AI safety, and so a test of the meta-usefulness of my posting here. The idea follows.
The Idea
A modification of reinforcement learning: have the learning agent develop, on its own, a function for optimization that is to supplant the loss function, once the loss function has been reduced by a given amount, through the action of the learning process itself. That is, if we correlate the loss function not with arbitrary data to...
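To make one possible reading of this concrete, here is a minimal sketch of a training loop in which an agent-developed objective gradually supplants the original loss once that loss has fallen by a given amount. Everything here is an assumption layered on the post's short description: the quadratic stand-in losses, the fixed agent-developed target phi, the handoff_threshold, and the blending rule are illustrative choices, not the author's specification.

```python
# Illustrative sketch only: one reading of "a learned objective that supplants
# the loss function once the loss has been reduced by a given amount".
# base_loss, learned_objective, phi, handoff_threshold, and the blend weight w
# are all assumptions made for this example.

import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=2)          # toy parameters the "agent" optimizes

def base_loss(theta):
    # Stand-in for the ordinary RL loss / negative reward.
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

def learned_objective(theta, phi):
    # Stand-in for the agent-developed objective; phi would itself be learned
    # in the full idea, but is fixed here for simplicity.
    return float(np.sum((theta - phi) ** 2))

phi = np.array([0.5, -1.0])         # assumed agent-developed target
handoff_threshold = 0.1             # "given reduction of the loss function":
                                    # hand off once loss is below 10% of its start
initial_loss = base_loss(theta)
lr = 0.05

def numerical_grad(f, x, eps=1e-5):
    # Simple central-difference gradient so the sketch stays self-contained.
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

for step in range(500):
    # Blend weight: 0 while the base loss is still high, rising toward 1
    # once the base loss has dropped below the handoff threshold.
    progress = 1.0 - base_loss(theta) / initial_loss
    w = 0.0 if progress < (1.0 - handoff_threshold) else progress

    combined = lambda t: (1 - w) * base_loss(t) + w * learned_objective(t, phi)
    theta -= lr * numerical_grad(combined, theta)

print("final theta:", theta, "base loss:", base_loss(theta), "blend weight:", w)
```

The design choice worth noting is the handoff rule: the learned objective only starts to carry weight after the original loss has already been driven down, which is the "conflicting dynamic" the post gestures at, rather than a fixed weighted sum of the two objectives from the start.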