johnswentworth

Sequences

From Atoms To Agents
"Why Not Just..."
Basic Foundations for Agent Models
Framing Practicum
Gears Which Turn The World
Abstraction 2020
Gears of Aging
Model Comparison


Comments

Nitpick: you're talking about the discovery of the structure of DNA; IIRC it was already known at that time to be the molecule which mediates inheritance.

I don't buy mathematical equivalence as an argument against, in this case, since the whole point of the path integral formulation is that it's mathematically equivalent but far simpler conceptually and computationally.

Man, that top one was a mess. Fixed now, thank you!

Answer by johnswentworth · Apr 23, 2024

Here are some candidates from Claude and Gemini (Claude Opus seemed considerably better than Gemini Pro for this task). Unfortunately they are quite unreliable: I've already removed many examples from this list which I knew to have multiple independent discoverers (e.g. CRISPR and general relativity). If you're familiar enough with the history of any of these to say that they clearly were/weren't very counterfactual, please leave a comment.

  • Noether's Theorem
  • Mendel's Laws of Inheritance
  • Gödel's First Incompleteness Theorem (Claude mentions von Neumann as an independent discoverer for the Second Incompleteness Theorem)
  • Feynman's path integral formulation of quantum mechanics
  • Onnes' discovery of superconductivity
  • Pauling's discovery of the alpha helix structure in proteins
  • McClintock's work on transposons
  • Observation of the cosmic microwave background
  • Lorenz's work on deterministic chaos
  • Prusiner's discovery of prions
  • Yamanaka factors for inducing pluripotency
  • Langmuir's adsorption isotherm (I have no idea what this is)

> I somehow missed that John Wentworth and David Lorell are also in the middle of a sequence on this same topic here.

Yeah, uh... hopefully nobody's holding their breath waiting for the rest of that sequence. That was the original motivator, but we only wrote the one post and don't have any more in development yet.

Point is: please do write a good stat mech sequence; David and I are not really "on that ball" at the moment.

(Didn't read most of the dialogue, sorry if this was covered.)

> But the way transformers work is they greedily think about the very next token, and predict that one, even if by conditioning on it you shoot yourself in the foot for the task at hand.

That depends on how we sample from the LLM. If, at each "timestep", we take the most-probable token, then yes that's right.

But an LLM gives a distribution over tokens at each timestep, i.e. $P[x_{t+1} \mid x_{1:t}]$. If we sample from that distribution, rather than take the most-probable token at each timestep, then that's equivalent to sampling non-greedily from the learned distribution over text. It's the chain rule:

$$P[x_{1:T}] = \prod_{t=1}^{T} P[x_t \mid x_{1:t-1}]$$
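To make that concrete, here's a minimal sketch of greedy decoding versus ancestral sampling. Everything here is a stand-in: `next_token_dist` is a hypothetical function faking the model's conditional with pseudo-logits, not any real library's API.

```python
import numpy as np

VOCAB_SIZE = 50
rng = np.random.default_rng(0)  # used only for the sampling branch

def next_token_dist(tokens):
    # Hypothetical stand-in for an LLM forward pass: returns the
    # conditional distribution P[x_{t+1} | x_{1:t}]. A real model would
    # run a transformer here; we fake it with pseudo-logits that depend
    # deterministically on the prefix.
    seed = hash(tuple(tokens)) % (2 ** 32)
    logits = np.random.default_rng(seed).normal(size=VOCAB_SIZE)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prefix, steps, greedy):
    tokens = list(prefix)
    for _ in range(steps):
        p = next_token_dist(tokens)
        if greedy:
            tokens.append(int(np.argmax(p)))                 # mode of each conditional
        else:
            tokens.append(int(rng.choice(VOCAB_SIZE, p=p)))  # ancestral sampling
    return tokens

# Greedy decoding maximizes each conditional separately, which need not
# maximize the joint P[x_{1:T}]. Ancestral sampling draws from each
# conditional, which by the chain rule is an exact sample from the
# learned joint distribution over text.
print(generate([1, 2, 3], steps=5, greedy=True))
print(generate([1, 2, 3], steps=5, greedy=False))
```

The point of the chain rule identity above is exactly that the sampling branch, despite only ever looking one token ahead, draws an exact sample from the joint distribution.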

Writing collaboratively is definitely something David and I have been trying to figure out how to do productively.

> How sure are we that models will keep tracking Bayesian belief states, and so allow this inverse reasoning to be used, when they don't have enough space and compute to actually track a distribution over latent states?

One obvious guess there would be that the factorization structure is exploited, e.g. independence and especially conditional independence/DAG structure. And then a big question is how distributions of conditionally independent latents in particular end up embedded.
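As a toy illustration of why that factorization matters (my own construction, not something from the post): if $n$ binary latents are conditionally independent given a single parent, the belief state needs linearly many parameters instead of an exponentially large joint table.

```python
import numpy as np

n = 20  # number of binary latents

# Naive belief state: a full joint over 2**n outcomes -- infeasible to
# track explicitly for large n.
# joint = np.zeros(2 ** n)  # ~8 MB at n=20, doubling with each latent

# Factored belief state, assuming latents z_1..z_n are conditionally
# independent given a single binary parent h:
#   P[h, z] = P[h] * prod_i P[z_i | h]
p_h = np.array([0.3, 0.7])                                    # P[h]
p_z_given_h = np.random.default_rng(0).uniform(size=(n, 2))   # P[z_i = 1 | h]

def joint_prob(h, z):
    # Probability of one full configuration, computed from the factored
    # form without ever materializing the 2**n-entry joint table.
    pz = np.where(np.asarray(z) == 1, p_z_given_h[:, h], 1 - p_z_given_h[:, h])
    return p_h[h] * pz.prod()

# Storage: 2 + 2n numbers instead of 2**(n+1) -- the conditional
# independence / DAG structure is what makes tracking this belief state
# tractable in bounded space.
print(joint_prob(1, [1] * n))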
