
Comments

Oxidize

Because Rationalists overemphasize rationality while underemphasizing other important aspects of reality like strategy, persuasion, and meta-goal-directed-rationality. (By meta-goal-directed-rationality I mean, in this context, the degree to which an individual's actions are actually rational relative to a stated or internal goal, as opposed to merely conforming to an externalized system that is only minimally goal-aligned in form or function.)

Oxidize

These are six sample titles I'm considering using. Do any thoughts come to mind?

  1. AI-like reward functioning in humans. (Comprehensive model)
  2. Agency in humans
  3. Agency in humans | comprehensive model of why humans do what they do
  4. EA should focus less on AI alignment, more on human alignment
  5. EA's AI focus will be the end of us all.
  6. EA's AI alignment focus will be the end of us all. We should focus on human alignment instead
Oxidize

Thanks for this. Do you have any suggestions for what terminology I should use if I mean models used to predict reward in human contexts?

Oxidize

I'd say that the 80/20 of the concept is how reward & punishment affect human behavior.

Is it about which forces?
I would say I'm referring to a combination of instinct, innate attraction/aversion, previous experience, decision-making, attention, and how they relate to each other in an everyday practical context. 

Seems to me that "genetics"
I would say your disentanglement is right on the money. Rather than making an analysis for LLMs, I'm particularly interested in fleshing out the interrelations between these concepts as they relate to the human brain.

Do you want a similar analysis for LLMs?
I mean it from a high-level agency perspective, rather than in a specific AI or machine-learning context.

Goal?
My goal is to learn more about what information LessWrongers use and are interested in, so that I can create a better post for the community.


Adjacent concepts

  • Self-discipline
  • Positive psychology
  • Systems & patterns thinking
  • Maybe reward-functions?
Oxidize

Thank you so much for the reply. You prevented me from making a pretty big mistake.

I'm defining reward-modelling as the manipulation of the direction of an agent's intelligence, from a goal-directed perspective.

So the reward-modelling of an AI might be the weights used, its training environment, mesa-optimization structure, inner-alignment structure, etc.

Or for a human, it might be genetics, pleasure, and pain.
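
To make the analogy concrete, here is a minimal toy sketch. The actions, numbers, and function names are invented purely for illustration and aren't taken from any real system; the point is only that the same decision procedure picks different actions depending on which reward-modelling it is handed.

```python
# Toy illustration (hypothetical): "reward-modelling" as the thing that steers
# which action an otherwise fixed agent picks.

def choose_action(actions, reward_model):
    """A minimal agent: same decision procedure, behavior steered entirely
    by whichever reward model it is handed."""
    return max(actions, key=reward_model)

actions = ["eat", "study", "sleep"]

# An "AI-like" reward model: a crude stand-in for weights / training signal.
ai_like_reward = {"eat": 0.1, "study": 0.9, "sleep": 0.2}.get

# A "human-like" reward model: crude stand-ins for genetics, pleasure, and pain.
def human_like_reward(action):
    innate = {"eat": 0.8, "study": 0.2, "sleep": 0.6}[action]    # "genetics"
    pleasure = {"eat": 0.7, "study": 0.1, "sleep": 0.5}[action]
    pain = {"eat": 0.0, "study": 0.4, "sleep": 0.0}[action]
    return innate + pleasure - pain

print(choose_action(actions, ai_like_reward))     # -> "study"
print(choose_action(actions, human_like_reward))  # -> "eat"
```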

Is there a better word I can use for this concept? Or maybe I should just make up a word?

Answer by Oxidize

This is actually my primary focus. I believe it can be done through a complicated process that targets human psychology, but to explain it simply:

- Spread satisfaction & end suffering.
- Spread rational decision-making

To simplify further: if everyone were like us, and no one were on the chopping block if AGI doesn't get created, then the incentive to create AGI ceases, and we effectively secure decades for AI-safety efforts.

This is a post I made on the subject.

https://www.lesswrong.com/posts/GzMteAGbf8h5oWkow/breaking-beliefs-about-saving-the-world
