We’ll start with Richard Hamming’s original question: what are the most important problems in your field?

(At this point, you should grab pencil and paper, or open a text file, or whatever. Set a timer for at least two minutes - or five to ten if you want to do a longer version of this exercise - and write down the most important problems in your field. The rest of the questions will be variations on this first one, all intended to come at it from different directions; I recommend setting the timer for each of them.)

Perfection

Imagine that your field achieved perfection - the ultimate theory, perfect understanding, building The Thing.

What has been achieved in the idealized version of the field, which has not yet been achieved today? What are the main barriers between here and there?

Measurement

Often, in hindsight, a field turns out to have been bottlenecked on the development of some new measurement method, ranging from physical devices like the thermometer to abstract ideas like Shannon’s entropy and information channel capacity.
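
As a quick reminder of how compact these abstract “measuring sticks” turned out to be (standard statements, using the usual notation for a source X and a channel with input X and output Y):

H(X) = -\sum_x p(x) \log p(x), \qquad C = \max_{p(x)} I(X;Y)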

In what places does it look like your field is bottlenecked on the ability to usefully measure something? What are the main barriers to usefully measuring those things?

Framing

The difficult thing, in most pre-paradigmatic and confused problems at the beginning of some Science, is not coming up with the right complicated long sentence in a language you already know.  It's breaking out of the language in which every hypothesis you can write is false. [...] The warning sign that you need to 'jump-out-of-the-system' is the feeling [of] frustration, flailing around in the dark, trying desperate wild ideas and getting unhelpful results one after another.  When you feel like that, you're probably thinking in the wrong language, or missing something fundamental, or trying to do something that is in fact impossible.  Or impossible using the tools you have. - Mad Investor Chaos

What are the places where your field is flailing around in the dark, trying desperate ideas and getting unhelpful results one after another? What are the places where it feels like the problem is formulated in the wrong language, and a shift to another frame might be required to ask the right question or state the right hypothesis?

Unification

Sometimes, we have a few different models, each of which works really well in different places. Maybe it feels like there should be some model which unifies them all, which could neatly account for all these phenomena at once - like the unification of electricity, magnetism and optics in the 19th century.
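
For concreteness, that unification is compactly summarized by Maxwell’s equations (standard vacuum form with sources), from which light falls out as an electromagnetic wave with speed c = 1/\sqrt{\mu_0 \varepsilon_0}:

\nabla \cdot \mathbf{E} = \rho/\varepsilon_0, \quad \nabla \cdot \mathbf{B} = 0, \quad \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \quad \nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}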

Are there different models in your field which feel like they point to a not-yet-known unified model?

Incompatible Assumptions

One of the main ways we notice (usually implicit) false assumptions in our models is when they come into conflict with some other results, patterns or constraints. This may look like multiple models which cannot all be true simultaneously, or it may look like one model which looks like it cannot be true at all yet nonetheless keeps matching reality quite well. This is a hint to reexamine the assumptions under which the models are supposedly incompatible/impossible, and especially look for any hidden assumptions in that impossibility argument.

Are there places in your field where a few models look incompatible, or one model looks impossible, yet nonetheless the models match reality quite well?

Giant Search Space

The space of possible physical laws or theorems or principles is exponentially vast. Sometimes, the hard part is to figure out what the relevant factors are at all. For instance, to figure out how to reproducibly culture a certain type of cell, a biologist might need to provide a few specific signal molecules, a physical medium with the right elasticity or density, a particular temperature range, and/or some other factors which nobody even thought to test yet.

Are there places in your field where nobody even knows what key factors must be controlled for some important outcome to robustly occur?

Finding The True Name

Sometimes, most people in the field have an intuition that some concept is important, but it’s not clear how to formulate the concept in a way that makes it robustly and generalizably useful. “Causality” was a good example of this, prior to Judea Pearl & co. Once we can pin down the right formulation of the idea, we can see arguments/theorems which follow the idea, and apply them in the wild. But before we have the right formulation, we have to make do with ad-hoc proxies, “leaky abstractions” which don’t quite consistently generalize in the ways we intuitively want/expect.
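
To make “the right formulation” concrete for the causality example: Pearl’s move was to distinguish observing X from intervening on X, so that a claim like P(Y \mid do(X=x)) is defined by surgery on a causal graph rather than by conditioning, and (when a set Z satisfies the back-door criterion) can still be computed from observational data:

P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)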

Are there places in your field where some concept seems very central to understanding, but nobody knows its True Name yet?

Meta

Sometimes social problems in a field prevent the most important problems from being addressed - e.g. bad incentives, key ideas reaching too few people, people with complementary skillsets not running into each other, different groups using different language and tools, etc.

At the social level, what are the barriers to solving the main problems in the previous two questions? Why aren’t they already solved? Why isn’t progress being made, or made faster?

Pica

There’s a condition called pica, where someone has a nutrient deficiency (e.g. too little iron), and they feel strong cravings for some food which does not contain that nutrient (e.g. ice). The brain just doesn’t always manage to crave things which will actually address the real problem; for some reason things like ice will “look like” they address the problem, to the relevant part of the brain.

Are there places where your field as a whole, or you personally, pursue things which won’t really help with the main problems (but might kind of “look like” they address the problems)?

Other People’s Answers

Pick someone you know, or a few people, who are smart and have good judgment. What would their answers to these questions be?

Closing

Hamming’s original question was not just “What are the most important problems in your field?”. He had two follow-up questions:

I started asking, "What are the important problems of your field?" And after a week or so, "What important problems are you working on?" And after some more time I came in one day and said, "If what you are doing is not important, and if you don't think it is going to lead to something important, why are you at Bell Labs working on it?"

To further quote Hamming: if you do not work on important problems, it's unlikely you'll do important work.

Comments

One of my favorite things about you, John, is that you are excellent at prompting me to direct my attention towards important questions which lead to promising insights. Thanks for that!

I answered your questions, originally in my private notes, but partway through I decided to post them as a comment.

Imagine that your field achieved perfection - the ultimate theory, perfect understanding, building The Thing.

What has been achieved in the idealized version of the field, which has not yet been achieved today? What are the main barriers between here and there?

  • Detailed understanding of what it means for one agent to be aligned with another agent, or a group of agents.
  • We can easily check the functional properties of a causal process and argue they satisfy theorems saying that, with high probability, it veers towards highly desirable states.
    • Like, I could point out the ways in which deep RL fails at a glance, without appealing to instrumental convergence in particular?
      • Or maybe that’s part of the theory?
  • These theorems are intuitively obviously correct and correspond to original-intuitive-reality.
  • They are so correct that it’s easy to persuade people to use the aligned approach.
  • We understand what agents are, and how people fit into that picture, and the theory retrodicts all past problems with governments, with corporations, and with principal-agent relationships.
  • We know how to take any reasonable training process and make it an aligned training process, with minimal alignment tax (<5%)

Barriers:

  • We don’t know what agents are.
  • We don’t know what alignment means
  • We don’t know how to prove the right kinds of theorems
  • We don’t know if our concepts are even on the right track
    • They probably aren’t, except insofar as they spur the right language. It feels more like “how do I relate KL divergence and optimality probability” than “let’s prove theorems about retargetability”.

Often, in hindsight, a field turns out to have been bottlenecked on the development of some new measurement method, ranging from physical devices like the thermometer to abstract ideas like Shannon’s entropy and information channel capacity.

In what places does it look like your field is bottlenecked on the ability to usefully measure something? What are the main barriers to usefully measuring those things?

  • Bottlenecks
    • Abstraction
      • Uninterpretable
      • Not sure where to draw “category boundaries”
    • Alignment
      • Don’t know what alignment really is
      • Or how to measure ground truth
      • Needs vocab of concepts we don’t have yet
    • Power-seeking
      • Unclear what gets trained and how to measure according to distributions
      • Would need current agent beliefs to use current formalism
      • Also current formalism doesn’t capture game theoretic aspects of logical blackmail etc
      • Intent seems more important
        • But this is bottlenecked on “what is going on in the agent’s mind”
    • Interpretability
      • I’m not sure. Networks are big and messy.
    • Capability
      • Actually I bet this is natural in terms of the right alignment language
    • Compute efficiency in terms of ability to bring about cognitively important outcomes
      • Seems strictly harder than “capability”
    • Time remaining until TAI
      • Uncertainty about how AI works, how minds work, what the weighted-edge distance is on lattice of AI discoveries
  • Barriers
    • Conceptual roadblocks
      • But what?
      • (Filled in above)

What are the places where your field is flailing around in the dark, trying desperate ideas and getting unhelpful results one after another? What are the places where it feels like the problem is formulated in the wrong language, and a shift to another frame might be required to ask the right question or state the right hypothesis?

  • Flailing:
    • IDA (iterated distillation and amplification)
    • ELK (eliciting latent knowledge)
    • Everything theoretical feels formulated wrong, except maybe logical induction / finite factored sets / John’s work
      • This is important!
      • (Also I wouldn’t be surprised if I’d say Vanessa’s work is not flailing, if I could understand it)
    • Retargetability → instrumental convergence seems like an important piece but not the whole thing; part of it is phrased correctly
    • AUP was flailing
      • Did I get impact wrong? Or is reward maximization wrong?
        • I think I got impact right philosophically, but not the structure of how to get one agent to properly care about impact on other agents.
          • I just found a good trick (penalize the agent for impact on other goals it could have and pursue) which works really well in a range of cognitively available situations (physical proximity), but which breaks down under tons of optimization pressure
            • And the “don’t gain power for your own goal” idea seems like it should be specifiable and non-hacky, but I don’t actually see how to do it right.
        • But note that getting impact right for-real wouldn’t save the world AFAICT
    • Basically everything else
  • What happened when I shifted to the retargetability frame?
    • I don’t think I did that until recently, actually; the original post was too anchored on instrumental convergence over outcome sets, missing the elegant functional statement
    • and my shift to this frame still feels incomplete.
  • Corrigibility still feels like it should work in the right language and grounding

Sometimes, we have a few different models, each of which works really well in different places. Maybe it feels like there should be some model which unifies them all, which could neatly account for all these phenomena at once - like the unification of electricity, magnetism and optics in the 19th century.

Are there different models in your field which feel like they point to a not-yet-known unified model?

  • I guess different decision theories? Not super familiar
  • Not coming up with as many thoughts here, because I feel like our “partial models” are already contradicted and falsified on their putative domains of applicability, so what good would a unification do? More concisely stated wrongness?

One of the main ways we notice (usually implicit) false assumptions in our models is when they come into conflict with some other results, patterns or constraints. This may look like multiple models which cannot all be true simultaneously, or it may look like one model which looks like it cannot be true at all yet nonetheless keeps matching reality quite well. This is a hint to reexamine the assumptions under which the models are supposedly incompatible/impossible, and especially look for any hidden assumptions in that impossibility argument.

Are there places in your field where a few models look incompatible, or one model looks impossible, yet nonetheless the models match reality quite well?

  • Tempted to say “no”, because the last phrase of that question (“the models match reality quite well”) doesn’t seem true.
  • Here was one. The instrumental convergence theorems required a rather precise environmental symmetry, which seemed weird. But now I have a new theory which relates abstract functional properties of how events come about, to those events coming out similarly for most initial conditions. And that doesn’t have anything to do with environmental / outcome-level symmetries. So at first the theory was right in its domain of applicability, but the domain seemed absurdly narrow on a few dimensions.
    • it was narrow not because I missed how to prove sufficiently broad theorems about MDPs, but because I was focusing on the wrong details and missing the broader concept underlying everything I’d observed.
  • I guess the impossibility of value learning seems correct but spiritually inapplicable to the problem we want to solve, though I don’t quite know how to articulate that.
  • A few months back I wrote about how corrigibility is often impossible under reward maximization. But reward maximization seems pretty useful for motivating agents. But it’s so so so broken for nontrivial kinds of motivation.

The space of possible physical laws or theorems or principles is exponentially vast. Sometimes, the hard part is to figure out what the relevant factors are at all. For instance, to figure out how to reproducibly culture a certain type of cell, a biologist might need to provide a few specific signal molecules, a physical medium with the right elasticity or density, a particular temperature range, and/or some other factors which nobody even thought to test yet.

Are there places in your field where nobody even knows what key factors must be controlled for some important outcome to robustly occur?

  • In a robotics task, how would we ensure test-time agents did anywhere between 2 and 10 jumping jacks in an episode?
    • What factors would you control there? We don’t know how to “target” along these dimensions, or at least it would take more effort than I think it should.

Are there places in your field where some concept seems very central to understanding, but nobody knows its True Name yet?

  • Corrigibility
    • Corrigibility
    • Corrigibility
  • Alignment
  • I remember thinking there was a concept like this related to logical time, but I forget what it was

At the social level, what are the barriers to solving the main problems in the previous two questions? Why aren’t they already solved? Why isn’t progress being made, or made faster?

  • Bad incentives: I think people working on e.g. value learning should not do that
    • I think the marginal return from more researcher hours on deconfusion outweighs the return from empirical wisdom in applying known-broken paradigms (gut instinct)
  • People don’t know the feel of a True Name-having object, or forget how important the name discovery is.
    • I do this sometimes.
  • Or it’s really hard
  • Or we need more kinds of minds looking at this problem
    • But I don’t know what this means

Are there places where your field as a whole, or you personally, pursue things which won’t really help with the main problems (but might kind of “look like” they address the problems)?

  • Field as a whole
    • Value learning seems doomed; why are people still working on it?
    • There are non-core researchers working in ways which don’t make sense to me, but I don’t remember who they are or what they were working on
      • They’re more field-adjacent
    • I used to be more excited about IDA-style insights but now I feel queasy about skipping over the hard parts of a problem without really getting insights about how alignment works
      • This is a lesson which I took too long to learn, where I was too tolerant of finding “clever” ways to box my uncertainty. There’s a Sequences article about this, but I don’t want to go find it right now.
        • What kind of person can I become who would notice this error on my own, before making it, before hearing that this is a failure mode?
        • Anyways.
  • Me
    • I think impact measurement is doomed at least in the current paradigm
      • In the hindsight of a perfected field, I think impact regularization would be a thing you can do robustly, but not to do a decisive act
      • I’m basically done working on impact measurement though
    • I’m finishing up my PhD right now, but I think I’m doing pretty well on this axis now. I used to be more prone to pica.

Pick someone you know, or a few people, who are smart and have good judgment. What would their answers to these questions be?

John, I think you would not strongly disagree with most anything I said, but I feel like you would say that corrigibility isn’t as pragmatically important to understand. Or, you might say that True-Name-corrigibility is actually downstream of the True-Name-alignment concepts we need, and it’s the epiphenomenon. I don’t know. This prediction is uncertain and felt more like I queried my John model to give a mental speech which is syntactically similar to a real-John-speech, rather than my best prediction of what you would say.

I think it's important here to quote Hamming defining "important problem":

I'm not talking about ordinary run-of-the-mill research; I'm talking about great research. I'll occasionally say Nobel-Prize type of work. It doesn't have to gain the Nobel Prize, but I mean those kinds of things which we perceive are significant [e.g. Relativity, Shannon's information theory, etc.] ...

Let me warn you, "important problem" must be phrased carefully. The three outstanding problems in physics, in a certain sense, were never worked on while I was at Bell Labs. By important I mean guaranteed a Nobel Prize and any sum of money you want to mention. We didn't work on (1) time travel, (2) teleportation, and (3) antigravity. They are not important problems because we do not have an attack. It's not the consequence that makes a problem important, it is that you have a reasonable attack.

This suggests to me that e.g. "AI alignment is an important problem" holds, rather than "this particular approach to alignment is an important problem". The latter is too small; it can be good work and impactful work, but not great work in the sense of relativity or information theory or causality. (I'd love to be proven wrong!)

I have this particular approach to alignment head-chunked as the reasonable attack, under Hamming’s model. It looks like if corrigibility or agent-foundations do not count as reasonable attacks, then Hamming would not think alignment is an important problem.

Speaking of which, I haven't read or seen it discussed anywhere whether he addresses the part about generating reasonable attacks.

Yes, I think that's the right chunking - and broadly agree, though Hamming's schema is not quite applicable to pre-paradigmatic fields. For reasonable-attack generation, I'll just quote him again:

One of the characteristics of successful scientists is having courage. ... [Shannon] wants to create a method of coding, but he doesn't know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, "What would the average random code do?" He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. [Great scientists] go forward under incredible circumstances; they think and continue to think.

I give you a story from my own private life. Early on it became evident to me that Bell Laboratories was not going to give me the conventional acre of programming people to program computing machines in absolute binary. ... I finally said to myself, "Hamming, you think the machines can do practically everything. Why can't you make them write programs?" What appeared at first to me as a defect forced me into automatic programming very early.

And there are many other stories of the same kind; Grace Hopper has similar ones. I think that if you look carefully you will see that often the great scientists, by turning the problem around a bit, changed a defect to an asset. For example, many scientists when they found they couldn't do a problem finally began to study why not. They then turned it around the other way and said, "But of course, this is what it is" and got an important result. So ideal working conditions are very strange. The ones you want aren't always the best ones for you.
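
The move Hamming is pointing at in the Shannon story is the probabilistic method: bound the average over random codebooks, then conclude that some particular codebook does at least that well, since a random variable cannot everywhere exceed its mean. Schematically:

\mathbb{E}_{\mathcal{C}}\left[ P_e(\mathcal{C}) \right] \le \varepsilon \;\Longrightarrow\; \exists\, \mathcal{C}^{*} \text{ with } P_e(\mathcal{C}^{*}) \le \varepsilon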

Another technique I've seen in pre-paradigmatic research is to pick something that would be easy if you actually understood what was going on, and then try to solve it. The point isn't to get a solution (though it's nice if you do); the point is learning through lots of concretely-motivated contact with the territory. Agent foundations and efforts to align language models both seem to fit this pattern, for example.

Shmi:

In foundations of physics, a Hamming problem would be measuring gravity from entangled states, so-called "gravcats". It is one area where QM does not make obvious definite predictions, as the details depend on the quantum nature of gravity, if any.

In contrast, the prediction is obvious and clear for QM+Newtonian gravity: any measurement of gravitational effects results in rapid entanglement with the measuring apparatus and decoherence of the entangled state. 

Sadly, the limiting factor right now is the accuracy of the measurement: we currently cannot measure gravity from objects below the Planck mass (about 20 micrograms), or put large enough objects into a superposition (the current limit is the equivalent of tens of thousands of hydrogen atoms, a 16-order-of-magnitude difference).
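
For reference, the Planck mass quoted above comes from combining the fundamental constants:

m_P = \sqrt{\hbar c / G} \approx 2.2 \times 10^{-8}\ \text{kg} \approx 22\ \mu\text{g}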

rdb:

A few for cell biology:

  • How do we measure what proteins, and other biomolecules, are actually doing in a native context? For tracking where proteins are in the cell: sure, we can pop GFP onto anything we'd like, but now we have a protein that's lugging around a ~27 kilodalton barrel on its back. That's got to be skewing that protein's behavior somewhat.
  • Cell signalling seems to need a True Name. Biologists talk about cell signalling all the time, but the physical mechanisms by which information is propagated in a cell seem incredibly variable. In a lot of mechanistic-style biology, the processes by which signals are transduced are generally handwaved away. I haven't seen a lot of useful explanations on why certain modes of signalling are selected for in different contexts.
  • I think biologists who are working on ways to characterize what the cell is like as a physical environment are going to make a lot of progress on some of the intense variability we see in various phenotypes. Topics like biological noise, phase behaviors within cells, weird poly-/omni-genic phenotypes, all seem really promising to me.

For social science, here are some I'd throw in:

  • Are there questions nobody is asking?
  • Is there a real world phenomenon that nothing in the field addresses?

The best social science follows George Orwell's dictum: it takes huge effort to see what is in front of your face.