Recreating the caring drive
TL;DR: This post is about the value of recreating a "caring drive" similar to the one some animals have, and why it might be useful for the AI Alignment field in general. Finding and understanding the right combination of training data, loss function, architecture, etc. that allows gradient descent to robustly find or create agents that care about other agents with different goals could be very useful for understanding the bigger problem. While this drive is neither perfect nor universally present, if we can understand, replicate, and modify it in AI systems, it could provide a hint toward an alignment solution in which the AGI "cares" for humans.

Disclaimers: I'm not saying that "we can raise AI like a child to make it friendly" or that "people are aligned to evolution". I find both of these claims to be obvious errors. Also, I will often write about evolution as if it were an agentic entity that "will do this or that", not because I think it is agentic, but because it's easier to write this way. I also think that GPT-4 has some form of world model, and will refer to it a couple of times.

Nature's Example of a "Caring Drive"

Certain animals, notably humans, display a strong urge to care for their offspring. I think that part of one possible "alignment solution" will look like the right set of training data plus training loss that allows gradient descent to robustly find something like a "caring drive" that we can then study, recreate, and repurpose for ourselves. And I think we already have some rare examples of this in nature.

Some animals, especially humans, will kind-of-align themselves to their presumed offspring. They will want to make their offspring's life easier and better, to the best of their capabilities and knowledge. Not because they are "aligned to evolution" and want to increase the frequency of their genes, but because of a strange internal drive created by evolution. A set of triggers, tuned by evolution and activated by events associated with birth, will awaken this mechanism. It will re-aim the parent's goals toward the newborn.
In the next few hours we’ll get to noticeable flames [...] Some number of hours after that, the fires are going to start connecting to each other, probably in a way that we can’t understand, and collectively their heat [...] is going to rise very rapidly. My retort to that is, do you know what we’re going to do in that scenario? We’re going to unkindle them all.