... Or more specifically, a post on how, and why, to encode emotions to find out more about goals, rationality and safe alignment in general.
If one naively tries to map reinforcement learning’s reward functions back onto the human mind, the closest equivalent one may find is emotions. If one delves deeper into the topic, though, one finds a mishmash of other “reward signals” and auxiliary mechanisms in the human brain (such as the face-tracking reflex, which aids our social development), and ends up hearing about affects, the official term in the study of emotions.
At least, that is approximately what happened to me.
Affects and reward functions seem...
I feel that the social instincts link to the learned-from-scratch world-model via a chain of guided development windows.
The individual links in the chain are stacks of affective mechanisms: a trigger that detects the environmental stimulus (for ducklings, a large moving object), a response (follow that object), and an affect (emotion) that ties the instinct to the learned model by emitting a reward signal that strengthens the association (a feeling of safety).
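To make the trigger, response, and affect stack a bit more concrete, here is a minimal Python sketch of a single link in the chain, using the duckling example. Everything in it is an illustrative assumption rather than a claim about how brains implement this: the `AffectiveMechanism` class, the `size` and `motion` features, the 0.5 thresholds, the reward value, and the simple delta-rule update are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AffectiveMechanism:
    """One hypothetical link in the chain: trigger -> response -> affect."""
    name: str
    trigger: Callable[[Dict[str, float]], bool]  # innate stimulus detector
    response: str                                # innate behaviour
    affect_reward: float                         # reward emitted by the affect


def update_association(strength: float, reward: float, lr: float = 0.5) -> float:
    """Move a learned association toward the reward (assumed delta rule)."""
    return strength + lr * (reward - strength)


# Assumed innate mechanism: fires on any large, moving object.
imprinting = AffectiveMechanism(
    name="imprinting",
    trigger=lambda s: s["size"] > 0.5 and s["motion"] > 0.5,
    response="follow",
    affect_reward=1.0,  # stands in for the "feeling of safety"
)

# Learned world-model, reduced here to one association strength per object.
associations: Dict[str, float] = {"mother_duck": 0.0, "pebble": 0.0}

# Made-up stimulus features for two objects the duckling encounters.
stimuli = {
    "mother_duck": {"size": 0.9, "motion": 0.8},
    "pebble": {"size": 0.1, "motion": 0.0},
}

# Three exposures: only objects that fire the trigger receive the reward.
for _ in range(3):
    for obj, stimulus in stimuli.items():
        if imprinting.trigger(stimulus):
            associations[obj] = update_association(
                associations[obj], imprinting.affect_reward
            )

print(associations)  # {'mother_duck': 0.875, 'pebble': 0.0}
```

The point of the sketch is only that the innate part (the trigger and the reward) never needs to know what a "mother duck" is: it rewards whatever the learned model happens to associate with the stimulus.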
As it would be nearly impossible for DNA to encode a concept like "Rita won a trophy" as the trigger, the system would first have to "teach" the model simpler concepts, and then tag onto those via the affect to be able to...