All of Jonathan Stray's Comments + Replies

Domain: Philosophy of science

Link: Philosophical Psychology 1989 course lectures

Person: Paul Meehl

Background: Deep introduction to 20c philosophy of science, using psychology rather than physics as the model science -- because it's harder!

Why: Meehl was a philosopher of science, a statistician, and a lifelong clinical psychologist. In 1954 he wrote a book showing that statistical prediction usually beats clinical judgement, and in 1978 a paper on the replication crisis in psychology. He personally knew people like Popper, Kuhn, Lakatos, Feyerabend, etc. a...

Parker Conley
Thanks! Added. Relevant note from the entry:

I think there might be a broad set of values that emerge around group survival, essentially game-theoretic or evolutionary pressures that lead to cooperation. But I think the details beyond that are likely to be incredibly specific. I'd point to the "preference construction" literature as a more realistic account of how humans make choices, without assuming an underlying consistent preference structure.

This is quite interesting. It strikes me as perhaps a first-principles derivation of the theory of constructed preferences in behavioral economics.

Compare your
 

A shard of value refers to the contextually activated computations which are downstream of similar historical reinforcement events … We think that simple reward circuitry leads to different cognition activating in different circumstances. Different circumstances can activate cognition that implements different values, and this can lead to inconsistent or biased behavior. We conjecture that many

...
TurnTrout
Thanks for the reference (and sorry for just now getting around to replying). I think Bernheim's paper is somewhat related to the shard theory of human values. There are several commonalities, including

* Rejecting the idea that humans secretly have "true preferences" or "utility functions"
* Taking a stand against ad hoc / patchwork / case-by-case explanations of welfare-related decisions
* Recognizing the influence of context on decision-making; via "frames" (this work) or "shard activation contexts" (shard theory)

However, I think that shard theory is not a rederivation of this work, or other work mentioned in this paper:

* This paper presents a framework for locating decision-making contexts in which people are making ~informed decisions (at a gloss), gathering data within those contexts, and then making inferences about that person's preferences.
* Shard theory aims to predict what kinds of neural circuits get formed given certain initial conditions (like local random initialization of the cortex and certain reward circuitry), and to then draw conclusions about the choices of that learned policy.

That doesn't mean these works are unrelated. If you want to deeply understand welfare and "idealized preferences" / what people "should" choose, I think that we should understand more about how people make choices, via what neural circuits. This is a question of neuroscience and reinforcement learning theory. The shard theory of human values aims to contribute to that question.

As you pointed out in private correspondence, the shard theory of human values can be viewed as a hypothesis about where the context-sensitive preferences come from.

Very interesting. I would love to see this worked out in a toy example, where you can see that an RL agent in a grid world does not in general maximize reward, but is able to reason to do… something else. That’s the part I have the hardest time translating into a simulation: what does it mean that the agent is “thinking” about outcomes, if that is something different than running an RL algorithm?

But the essential point that humans choose not to wirehead — or in general to delay or avoid gratification — is a good one. Why do they do this? Is there any RL al...

Very interesting post. I think exploring the limits of our standard models of rationality is very worthwhile. IMO the models used in AI tend to be far too abstract, and don't engage enough with situatedness, unclear ontologies, and the fundamental weirdness of the open world.

One strand of critique of rationality that I really appreciate is David Chapman's "meta-rationality," which he defines as "evaluating, choosing, combining, modifying, discovering, and creating [rational] systems"

https://meaningness.com/metablog/meta-rationality-curriculum