Ah, sorry: the way I used it in the paper, it's my own coinage, meant to evoke the traditional usage. When I say there is a large "mathematical body of work," I mean abstract algebra for symmetry, classical machine learning for the usual meaning of "regularizer," and indirectly the work on complex systems theory, attractor theory, control theory, etc. I created my own meaning of the word "regularizer" because I have a philosophical intuition that the concept in traditional machine learning is generalizable, perhaps by someth...
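For concreteness, here is a minimal sketch of the traditional usage I'm evoking: in classical machine learning, a "regularizer" is an extra penalty term added to the training loss that biases the learned parameters toward simpler or more structured solutions. The names below (`regularized_loss`, `lambda_reg`, etc.) are just illustrative, not anything from my paper.

```python
import numpy as np

def regularized_loss(weights, X, y, lambda_reg=0.1):
    """Mean-squared-error data-fit term plus an L2 (ridge) regularizer."""
    predictions = X @ weights
    mse = np.mean((predictions - y) ** 2)            # data-fit term
    l2_penalty = lambda_reg * np.sum(weights ** 2)   # regularizer: penalizes large weights
    return mse + l2_penalty

# Toy usage: random data, random weights
X, y, w = np.random.randn(20, 3), np.random.randn(20), np.random.randn(3)
print(regularized_loss(w, X, y))
```

The intuition I want to generalize is that the penalty term is structural: it shapes what kinds of solutions the learner can settle into, independently of the particular data.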
I sort of agree with your criticism: I wish I'd had more time to clarify my approach and make it more mathematically precise, but I only decided at the last minute to even try to submit to the alignment competition, so I was scrambling to get down a minimal version of the idea. The call for the prize lists "philosophical" as one possible type of entry, so I kept everything verbal, trying to be just precise enough to point the way to a true formalization.
I do understand the problem of grasping slippery things as pointed out in the linked LessWrong...
I emailed my submission, but for the sake of redundancy, I'll submit it here too:
"The Regularizing-Reducing Model"
https://www.lesserwrong.com/posts/36umH9qtfwoQkkLTp/the-regularizing-reducing-model
Thanks! I knew people had essentially devised these ideas before (and if they had instantly worked we would have solved FAI already), but I think there is something to be gained via a reinterpretation of the ideas in the RRM. For example, if the human value function derives from discoverable symmetries of neural structure and external environment, then we can do the work to discover these and directly impose them in the agent architecture. And I think the statement I just made is not trivially equivalent to telling people “find human rewards and put them in ...
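To gesture at the distinction: one hedged, hypothetical way "directly imposing a discovered symmetry in the agent architecture" could look, as opposed to just handing the agent a learned reward, is a penalty on the agent's value function whenever it fails to respect a symmetry of the environment. Everything here (the transformation list, the `value_fn` signature, the penalty weight) is a stand-in for illustration, not a formalization from the RRM post.

```python
import numpy as np

def symmetry_regularizer(value_fn, states, transformations, weight=1.0):
    """Penalize deviations from invariance: value_fn(T(s)) should match value_fn(s)
    for every transformation T in the (assumed already discovered) symmetry set."""
    penalty = 0.0
    for T in transformations:
        penalty += np.mean((value_fn(T(states)) - value_fn(states)) ** 2)
    return weight * penalty

# Toy usage: a value estimate that should be invariant under reflection of the state.
value_fn = lambda s: (s ** 2).sum(axis=-1)   # toy value estimate
states = np.random.randn(32, 4)              # batch of toy states
reflect = lambda s: -s                       # one candidate symmetry
print(symmetry_regularizer(value_fn, states, [reflect]))  # ~0: invariance holds
```

The point of the sketch is only that the constraint lives in the architecture/training objective itself, rather than in a hand-specified reward.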