Shard Theory - is it true for humans?
And is it a good model for value learning in AI? (Read on Substack: https://recursingreflections.substack.com/p/shard-theory-is-it-true-for-humans)

TLDR: Shard theory proposes a view of value formation where experiences lead to the creation of context-based ‘shards’ that determine behaviour. Here, we go over psychological and neuroscientific views of learning, and find that while...
Thanks for the response, Martin. I'd like to try to get to the heart of what we disagree on. Do you agree that an agent with a sufficiently different architecture - e.g. a human who somehow had a dog's brain implanted - would grow to have different values in some respects? For example, you mention arguing persuasively. Argument is a pretty specific ability, but we can widen our field to language in general - the human brain has fairly specific circuitry for that. A dog's brain, lacking the appropriate language centers, would likely never learn to speak, let alone argue persuasively.
I want to point out again that the disagreement is just a matter of degree. I do think basic RL agents can learn relatively similar values from similar experiences; I just want to caution that for most human and animal examples, architecture may matter more than you might expect.
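To make that concrete, here is a minimal sketch (my illustration, not from the post or the comment above): a toy contextual-bandit setup where two learners receive an *identical* experience stream but differ in representational capacity. The reward table, learning rate, and the "pooled" architecture are all hypothetical choices for illustration, standing in for a brain that lacks the circuitry to represent a distinction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy world: 4 states, 2 actions, fixed reward table.
# Both agents see the *same* experiences; they differ only in how
# they represent state (their "architecture").
N_STATES, N_ACTIONS = 4, 2
true_reward = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 0.0],
                        [0.0, 1.0]])

# Shared experience stream: random (state, action) pairs with noisy reward.
experiences = [
    (s, a, true_reward[s, a] + rng.normal(0, 0.1))
    for s, a in zip(rng.integers(0, N_STATES, 5000),
                    rng.integers(0, N_ACTIONS, 5000))
]

# Agent A: tabular -- can represent every (state, action) value exactly.
q_tabular = np.zeros((N_STATES, N_ACTIONS))

# Agent B: impoverished architecture -- it cannot distinguish states
# (a crude stand-in for a brain lacking the relevant circuitry), so it
# learns one value per action, pooled across all states.
q_pooled = np.zeros(N_ACTIONS)

alpha = 0.05  # learning rate, chosen arbitrarily
for s, a, r in experiences:
    q_tabular[s, a] += alpha * (r - q_tabular[s, a])
    q_pooled[a] += alpha * (r - q_pooled[a])

print("Tabular agent's values:\n", q_tabular.round(2))
print("Pooled agent's values:  ", q_pooled.round(2))
# Identical experiences, different architectures: the pooled agent
# converges to ~0.5 for both actions and can never learn the
# state-contingent values the tabular agent represents easily.
```

The point of the sketch is only that when architectures are close enough (two tabular agents, say), identical experiences yield near-identical learned values; when they differ in what they can represent, the same experiences produce systematically different values, which is the sense in which architecture can dominate.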