All of Rishika's Comments + Replies

Rishika

Thanks for the response, Martin. I'd like to try to get to the heart of what we disagree on. Do you agree that someone with a sufficiently different architecture - e.g. a human who somehow had a dog's brain implanted - would grow to have different values in some respect? For example, you mention arguing persuasively. Argument is a pretty specific ability, but we can widen our field to language in general - the human brain has pretty specific circuitry for that. A dog's brain that lacks the appropriate language centers would likely never learn to speak, let alone...

Martin Randall
Thanks for trying to get to the heart of it. Yes, according to shard theory. And thank you for switching to a more coherent hypothetical. Two reasons are different reinforcement signals and different capabilities.

Reinforcement signals

Brain -> Reinforcement Signals -> Reinforcement Events -> Value Formation

The main way that brain architecture influences values, according to Quintin Pope and Alex Turner, is via reinforcement signals. Dogs and humans have different ideal diets, and dogs have a better sense of smell than humans, so it is likely that dogs and humans have different hard-coded reinforcement signals around the taste and smell of food. This is part of why dog treats smell and taste different from human treats.

Human children often develop strong values around certain foods. Our hypothetical dog-human child would likely also develop strong values around food, but predictably different foods. As human children grow up, they slowly develop values around healthy eating. Our hypothetical dog-human child would do the same. Because they have a human body, these healthy-eating values would be more human-like.

Capabilities

Brain -> Capabilities -> Reinforcement Events -> Value Formation

Let's extend the hypothetical further. Suppose that we have a hybrid dog-human with a human body and human reinforcement learning signals, but otherwise a dog brain. Now does shard theory claim they will have the same values as a human? No. While shard theory is based on the theory of "learning from scratch" in the brain, "Learning-from-scratch is NOT blank-slate". So it's reasonable to suppose, as you do, that the hybrid dog-human will have many cognitive weaknesses compared to humans, especially given its smaller cortex. These cognitive weaknesses will naturally lead to different experiences and different reinforcement events. Shard theory claims that "reinforcement events shape human value shards". Accordingly, shard theory predicts different values. Perhaps the dog-human ne...
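To make that first chain concrete, here is a toy Python sketch (my own illustration, not code from Pope and Turner; the foods, numbers, and names are all invented): two learners share the same environment and the same learning rule, but their hard-coded reinforcement signals differ, so they predictably form different food values.

```python
import random

FOODS = ["chocolate", "steak", "salad", "kibble"]

# Hypothetical hard-coded reinforcement signals: the only difference
# between the two learners. Every number here is made up.
REWARD = {
    "human":     {"chocolate": 1.0, "steak": 0.8, "salad": 0.3, "kibble": -0.5},
    "dog_human": {"chocolate": -0.5, "steak": 1.0, "salad": 0.1, "kibble": 0.8},
}

def learn_food_values(agent: str, episodes: int = 5000, lr: float = 0.1) -> dict:
    """Tabular value learning: each eating event nudges the learned value
    of that food toward the innate reinforcement signal."""
    values = {food: 0.0 for food in FOODS}  # values learned from scratch
    for _ in range(episodes):
        food = random.choice(FOODS)    # the same experiences are available to both
        reward = REWARD[agent][food]   # but the innate signal differs
        values[food] += lr * (reward - values[food])
    return values

for agent in REWARD:
    learned = learn_food_values(agent)
    favourite = max(learned, key=learned.get)
    print(agent, "->", favourite, {f: round(v, 2) for f, v in learned.items()})
```

Same learning process, different innate signals, predictably different learned values.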
Rishika

Hi Martin, thanks a lot for reading and for your comment! I think what I was trying to express is actually quite similar to what you write here. 

'If we did they would still have different experiences, notably the experience of having a brain architecture ill-suited to operating their body.' - I agree. If I understand shard theory right, it claims that underlying brain architecture doesn't make much difference, and that e.g. the experience of trying to walk in different ways, failing at some and succeeding at others, would be enough to lead to success. ...

Martin Randall
Pointing to the variety of brains and values in animals doesn't persuade me, because they also have a wide variety of environments and experiences. Shard theory predicts a wide variety of values as a result (tempered by convergent evolution).

One distinctive prediction is in cases where the brain is the same but the experiences are different. You agree that "between humans, one person may value very different things from the next". I agree, and would point to humans throughout history who had very different experiences and values. I think the example of football vs reading understates the differences, which include slavery vs cannibalism.

The other distinctive prediction is in cases where the brain is different but the experiences are the same. For example, consider humans who grow up unable to walk, either due to a different brain or due to a different body. Shard theory predicts similar values despite these different causes.

The shard theory claim here is as quoted: "value formation is ... relatively architecture independent". This is not a claim about skill formation, e.g. learning to walk. It's also not a claim that architecture can never be causally upstream of values.

I see shard theory as a correction to Godshatter theory and its "thousand shards of desire". Yudkowsky writes:

Arguing persuasively is a common human value, but shard theory claims that it's not encoded into brain architecture. Instead it's dependent on the experience of arguing persuasively and having that experience reinforced. This can be the common experience of a child persuading their parent to give them another cookie. There is that basic dependence of having a reinforcement learning system that is triggered by fat and sugar. But it's a long way from there to here.
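As a toy sketch of those two predictions (my own construction, with made-up activities; real value formation is obviously far richer), treat "values" as nothing more than reinforcement counts over life events:

```python
import random
from collections import Counter

ACTIVITIES = ["walking", "reading", "football", "argument"]

def life_events(can_walk: bool, environment: list[str], n: int = 1000,
                seed: int = 0) -> list[str]:
    """Sample reinforcement events; activities that require walking are
    unavailable to an agent that cannot walk, whatever the cause."""
    rng = random.Random(seed)
    available = [a for a in environment
                 if can_walk or a not in ("walking", "football")]
    return [rng.choice(available) for _ in range(n)]

def values(events: list[str]) -> Counter:
    # Crude stand-in for value formation: reinforcement counts per activity.
    return Counter(events)

# Same brain (same learning rule), different environments -> different values.
bookish = values(life_events(can_walk=True, environment=["reading", "argument"]))
sporty = values(life_events(can_walk=True, environment=["football", "walking"]))
print(bookish.most_common(1), sporty.most_common(1))

# Different cause (a brain difference vs a body difference) producing the
# same stream of experiences -> the same values.
brain_cause = values(life_events(can_walk=False, environment=ACTIVITIES))
body_cause = values(life_events(can_walk=False, environment=ACTIVITIES))
assert brain_cause == body_cause
```

The same learning rule diverges under different environments, while a brain-caused and a body-caused inability to walk that yield the same event stream converge on the same values.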
Rishika

Thanks, I really appreciate that! I've just finished an undergrad in cognitive science, so I'm glad that I didn't make any egregious mistakes, at least.

"AGI won't be just an RL system ... It will need to have explicit goals": I agree that this if very likely. In fact, the theory of 'instrumental convergence' often discussed here is an example of how an RL system could go from being comprised of low-level shards to having higher-level goals (such as power-seeking) that have top-down influence. I think Shard Theory is correct about how very basic RL systems ... (read more)

"Something about being watched makes us more responsible ... In a pinch, placebo-ing yourself with a huge fake pair of eyes might also help."

There are 'Study with me'/'Work with me' videos on YouTube, which are usually just a few hours of someone working silently at a desk or in a library. I sometimes turn one of those on to give me the feeling that I'm not alone in the room, raising accountability.

Great post!

I don't think people focus on language and vision because they're less boring than things like decision trees; they focus on them because the domains of language and vision are much broader than the domains that decision trees, etc., are applied to. If you train a decision tree model to predict the price of a house, it will do just that, whereas if you train a language model to write poetry, it could conceivably write about various topics such as math, politics and even itself (since poetry is a broad scope). This is (possibly) a step towards general intelligence ...
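A minimal sketch of that narrow-domain point, using scikit-learn's DecisionTreeRegressor (my own example; the housing numbers are invented):

```python
# A decision tree trained on house features can only ever map those
# features to a price; its whole domain is fixed at training time.
from sklearn.tree import DecisionTreeRegressor

# Toy data: [square_metres, bedrooms] -> price. All numbers made up.
X = [[50, 1], [80, 2], [120, 3], [200, 4]]
y = [150_000, 220_000, 310_000, 480_000]

model = DecisionTreeRegressor(random_state=0).fit(X, y)
print(model.predict([[100, 2]]))  # a price estimate, and nothing else
```

A language model, by contrast, maps text to text, and text covers math, politics, and even the model itself - the breadth is in the domain, not in the architecture being less boring.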

George3d6
Hmh, I didn't want to give the impression that I'm discounting particular architectures; I just gave the boosting example to help outline the target class of problems.

It's great to see Brittany's response was so positive, but could you still clarify whether you explicitly told her you would help her learn how to cook, and/or whether she asked you to do so? Or did you just infer that it's something she would enjoy, and proceed without making it explicit?

Again, I'm happy for Tiffany's newfound cooking abilities - congratulations to her!