
Comments

Oxidize

Because Rationalists overemphasize rationality while underemphasizing other important aspects of reality like strategy, persuasion, and meta-goal-directed-rationality. (By meta-goal-directed-rationality I mean, in this context, the degree to which an individual's actions are actually rational relative to a stated or internal goal, as opposed to merely conforming to an externalized system that is only minimally goal-aligned in form or function.)

Oxidize

These are six sample titles I'm considering using. Do any thoughts come to mind?

  1. AI-like reward functioning in humans. (Comprehensive model)
  2. Agency in humans
  3. Agency in humans | comprehensive model of why humans do what they do
  4. EA should focus less on AI alignment, more on human alignment
  5. EA's AI focus will be the end of us all.
  6. EA's AI alignment focus will be the end of us all. We should focus on human alignment instead
Oxidize

Thanks for this. Do you have any suggestions for what terminology I should use if I mean models used to predict reward in human contexts?

Oxidize

I'd say that the 80/20 of the concept is how reward & punishment affect human behavior.

Is it about which forces?
I would say I'm referring to a combination of instinct, innate attraction/aversion, previous experience, decision-making, attention, and how they relate to each other in an everyday practical context. 

Seems to me that "genetics"
I would say your disentanglement is right on the money. Rather than making an analysis for LLMs, I'm particularly interested in fleshing out the interrelations between these concepts as they relate to the human brain.

Do you want a similar analysis for LLMs?
I mean it from a high-level agency perspective, rather than in a specific AI or machine-learning context.

Goal?
My goal is to learn more about what information LessWrongers use and are interested in, so that I can create a better post for the community.


Adjacent concepts

  • Self-discipline
  • Positive psychology
  • Systems & patterns thinking
  • Maybe reward-functions?
Oxidize

Thank you so much for the reply. You prevented me from making a pretty big mistake.

I'm defining reward-modelling as the manipulation of the direction of an agent's intelligence, from a goal-directed perspective.

So the reward-modelling of an AI might be the weights used, its training environment, mesa-optimization structure, inner-alignment structure, etc.

Or for a human, it might be genetics, pleasure, and pain.
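
To make the analogy concrete, here is a minimal toy sketch. The actions, numbers, and function names are invented purely for illustration and aren't taken from any real system; the point is only that the same decision procedure picks different actions depending on which reward-modelling it is handed.

```python
# Toy illustration (hypothetical): "reward-modelling" as the thing that steers
# which action an otherwise fixed agent picks.

def choose_action(actions, reward_model):
    """A minimal agent: same decision procedure, behavior steered entirely
    by whichever reward model it is handed."""
    return max(actions, key=reward_model)

actions = ["eat", "study", "sleep"]

# An "AI-like" reward model: a crude stand-in for weights / training signal.
ai_like_reward = {"eat": 0.1, "study": 0.9, "sleep": 0.2}.get

# A "human-like" reward model: crude stand-ins for genetics, pleasure, and pain.
def human_like_reward(action):
    innate = {"eat": 0.8, "study": 0.2, "sleep": 0.6}[action]    # "genetics"
    pleasure = {"eat": 0.7, "study": 0.1, "sleep": 0.5}[action]
    pain = {"eat": 0.0, "study": 0.4, "sleep": 0.0}[action]
    return innate + pleasure - pain

print(choose_action(actions, ai_like_reward))     # -> "study"
print(choose_action(actions, human_like_reward))  # -> "eat"
```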

Is there a better word I can use for this concept? Or maybe I should just make up a word?

Answer by Oxidize

This is actually my primary focus. I believe it can be done through a complicated process that targets human psychology, but to explain it simply:

- Spread satisfaction & end suffering.
- Spread rational decision-making

To simplify further: if everyone were like us, and no one were on the chopping block if AGI doesn't get created, then the incentive to create AGI ceases, and we effectively secure decades for AI-safety efforts.

This is a post I made on the subject.

https://www.lesswrong.com/posts/GzMteAGbf8h5oWkow/breaking-beliefs-about-saving-the-world
