Shard Theory - is it true for humans?
And is it a good model for value learning in AI? (Read on Substack: https://recursingreflections.substack.com/p/shard-theory-is-it-true-for-humans)

TLDR: Shard theory proposes a view of value formation where experiences lead to the creation of context-based ‘shards’ that determine behaviour. Here, we go over psychological and neuroscientific views of learning, and find that while...
Thanks for the response, Martin. I'd like to try to get to the heart of what we disagree on. Do you agree that an agent with a sufficiently different architecture - e.g. a human who somehow had a dog's brain implanted - would grow to have different values in some respects? For example, you mention arguing persuasively. Argument is a pretty specific ability, but we can widen our field to language in general - the human brain has fairly specific circuitry for that. A dog's brain, lacking the appropriate language centers, would likely never learn to speak, let alone argue persuasively.
I want to point out again that the disagreement is just a matter of degree. I do think basic RL agents can learn relatively similar values from similar experiences; I just want to caution that for most human and animal examples, architecture may matter more than you might expect.
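To make that concrete, here is a minimal sketch (my illustration, not from the post or the comment above): a toy contextual-bandit setup where two learners receive an *identical* experience stream but differ in representational capacity. The reward table, learning rate, and the "pooled" architecture are all hypothetical choices for illustration, standing in for a brain that lacks the circuitry to represent a distinction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy world: 4 states, 2 actions, fixed reward table.
# Both agents see the *same* experiences; they differ only in how
# they represent state (their "architecture").
N_STATES, N_ACTIONS = 4, 2
true_reward = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 0.0],
                        [0.0, 1.0]])

# Shared experience stream: random (state, action) pairs with noisy reward.
experiences = [
    (s, a, true_reward[s, a] + rng.normal(0, 0.1))
    for s, a in zip(rng.integers(0, N_STATES, 5000),
                    rng.integers(0, N_ACTIONS, 5000))
]

# Agent A: tabular -- can represent every (state, action) value exactly.
q_tabular = np.zeros((N_STATES, N_ACTIONS))

# Agent B: impoverished architecture -- it cannot distinguish states
# (a crude stand-in for a brain lacking the relevant circuitry), so it
# learns one value per action, pooled across all states.
q_pooled = np.zeros(N_ACTIONS)

alpha = 0.05  # learning rate, chosen arbitrarily
for s, a, r in experiences:
    q_tabular[s, a] += alpha * (r - q_tabular[s, a])
    q_pooled[a] += alpha * (r - q_pooled[a])

print("Tabular agent's values:\n", q_tabular.round(2))
print("Pooled agent's values:  ", q_pooled.round(2))
# Identical experiences, different architectures: the pooled agent
# converges to ~0.5 for both actions and can never learn the
# state-contingent values the tabular agent represents easily.
```

The point of the sketch is only that when architectures are close enough (two tabular agents, say), identical experiences yield near-identical learned values; when they differ in what they can represent, the same experiences produce systematically different values, which is the sense in which architecture can dominate.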