I think there might be a broad set of values that emerge around group survival, essentially game-theoretic or evolutionary pressures that lead to cooperation. But I think the details beyond that are likely to be incredibly specific. I'd point to the "preference construction" literature as a more realistic account of how humans make choices, one that doesn't assume an underlying consistent preference structure.
This is quite interesting. It strikes me as perhaps a first-principles derivation of the theory of constructed preferences in behavioral economics.
Compare your
A shard of value refers to the contextually activated computations which are downstream of similar historical reinforcement events … We think that simple reward circuitry leads to different cognition activating in different circumstances. Different circumstances can activate cognition that implements different values, and this can lead to inconsistent or biased behavior. We conjecture that many biases are convergent artifacts of the human training process and internal shard dynamics. People aren’t just randomly/hardcoded to be more or less “rational” in different situations.
to Bernheim’s
According to this view, I aggregate the many diverse aspects of my experience only when called upon to do so for a given purpose, such as making a choice or answering a question about my well-being. … To answer a question about my overall welfare, or to choose between alternatives without deploying a previously constructed rule of thumb, I must weigh the positives against the negatives and construct an answer de novo. …This perspective potentially attributes particular choice anomalies to the vagaries of aggregation. In particular, when I deliberate and aggregate, the weights I attach to the various dimensions of my subjective experience may be sensitive to context.
Values are closely related to preferences, and preferences have been extensively studied in behavioral econ. I've written more on the connection between AI and behavioral econ here.
Very interesting. I would love to see this worked out in a toy example, where you can see that an RL agent in a grid world does not in general maximize reward, but is able to reason its way to… something else. That's the part I have the hardest time translating into a simulation: what does it mean that the agent is "thinking" about outcomes, if that is something different from running an RL algorithm?
But the essential point that humans choose not to wirehead — or in general to delay or avoid gratification — is a good one. Why do they do this? Is there any RL algorithm that would do this? If not, what sort of algorithm would?
Perhaps the clearest point here is that RL maximizes reward subject to the exploration policy. Under random exploration, an RL agent is perhaps (on average) a reward-maximizing agent, but it seems likely that no successful learning organism explores randomly.
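Here is a minimal sketch of the kind of toy example I have in mind (the states, rewards, and hyperparameters are my own illustrative assumptions, not anything from the post under discussion): a tiny 1-D gridworld with a small, easy-to-find "snack" reward next to the start and a larger "wirehead" reward a few cells away. Tabular epsilon-greedy Q-learning with epsilon = 0 locks onto the snack and never even discovers the wirehead cell, while the same algorithm with random exploration eventually routes to the wirehead. That is one concrete sense in which the learned policy is "reward maximization subject to the exploration policy" rather than reward maximization per se.

```python
# Illustrative sketch only: a 1-D gridworld with a nearby small reward ("snack")
# and a distant large reward ("wirehead"). All states, rewards, and
# hyperparameters are assumptions of mine, not taken from the original post.
import random

N_STATES = 5            # cells 0..4 in a line
SNACK = 1               # cell 1 pays a small, easy-to-find reward
WIREHEAD = 4            # cell 4 pays a much larger reward
START = 2
ACTIONS = [-1, +1]      # move left / move right

def step(state, action):
    """One environment transition; the episode ends when any reward is collected."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == WIREHEAD:
        return next_state, 10.0, True
    if next_state == SNACK:
        return next_state, 1.0, True
    return next_state, 0.0, False

def train(epsilon, episodes=3000, alpha=0.1, gamma=0.95, max_steps=50):
    """Tabular epsilon-greedy Q-learning; returns the greedy action for each state."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done, t = START, False, 0
        while not done and t < max_steps:
            if random.random() < epsilon:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(q[(s2, act)] for act in ACTIONS))
            q[(s, a)] += alpha * (target - q[(s, a)])
            s, t = s2, t + 1
    return {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)}

if __name__ == "__main__":
    random.seed(0)
    # With epsilon = 0 the greedy action at the start state stays "left" (snack);
    # with random exploration it flips to "right" (wirehead) at states 2 and 3.
    print("epsilon = 0.0:", train(epsilon=0.0))
    print("epsilon = 0.3:", train(epsilon=0.3))
```

Of course this doesn't capture the "thinking about outcomes" part of the question at all; it only makes the exploration-dependence concrete.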
Very interesting post. I think exploring the limits of our standard models of rationality is very worthwhile. IMO the models used in AI tend to be far too abstract, and don't engage enough with situatedness, unclear ontologies, and the fundamental weirdness of the open world.
One strand of critique of rationality that I really appreciate is David Chapman's "meta-rationality," which he defines as "evaluating, choosing, combining, modifying, discovering, and creating [rational] systems":
https://meaningness.com/metablog/meta-rationality-curriculum
Domain: Philosophy of science
Link: Philosophical Psychology 1989 course lectures
Person: Paul Meehl
Background: Deep introduction to 20th-century philosophy of science, using psychology rather than physics as the model science -- because it's harder!
Why: Meehl was a philosopher of science, a statistician, and a lifelong clinical psychologist. He wrote a book in 1954 showing that statistical prediction usually beats clinical judgement, and a paper in 1978 on the replication crisis in psychology. He personally knew people like Popper, Kuhn, Lakatos, and Feyerabend, and he brings their insights to life in these course lectures.