I have a number of thoughts here that I haven't gotten around to writing up, and sharing via a public conversation seems like a decent way to try.
A few opening thoughts:
Happy to launch off in any direction that seems interesting or productive to you.
- I used to think the goal was to "become paradigmatic". I now think a better goal is to become the kind of field that can go through cycles of paradigm creation, paradigm crisis, and then recreation of a new paradigm.
Is there a way in which trying to become paradigmatic doesn't implicitly lead to that cycle? After the first paradigm, it seems to me that later paradigms come about more naturally because of insufficiencies in the original paradigm, i.e. someone discovers an edge case that isn't handled correctly, or even just less gracefully than seems intuitively obvious.
I'm not sure what's empirically the case. I could see it being easier to go from one paradigm to the next because of clear insufficiencies, but also if there's an existing paradigm, people might be more resistant to changing than if they were less anchored on anything.
The update for me had a couple of pieces. Before, I at least implicitly thought that most of the work happens in the paradigmatic phase, so to be productive, we should get into it. (I think it appears way more productive and can be scaled up more.) Now I think that getting the paradigm right is at least as much of the work, and you want to get it right.
The other piece of it is that I think pre-paradigmatic and paradigmatic work both rely on a community reaching consensus about what counts as progress in their field, so rather than worrying about "phase", the better thing is to worry about how good you are at reaching consensus with each other (and with reality).
if there's an existing paradigm, people might be more resistant to changing than if they were less anchored on anything
This seems true historically, but I think you're probably more interested in the question of how to steer existing (or new) research efforts. Therefore,
Now I think that getting the paradigm right is at least as much of the work, and you want to get it right.
Because if you don't get it right initially, you risk wasting time stuck in a bad paradigm? I think I am pretty skeptical of our ability to do any sort of meaningful aiming here, at least in a top-down way. Apart from the underlying reality they're attempting to model, paradigms seem driven by:
But as far as interventions go, those don't suggest that we have a lot of direct control over what paradigms we land on.
I think we're on very similar pages here, actually: both on the difficulty of doing it top-down, and on social dynamics being a point of intervention.
I model it like this: you can think of a research field as a "collective mind that thinks and makes intellectual progress" that is made out of individual researchers. You can make the collective smarter by making the individuals smarter, but you can also make the collective smarter by combining individuals of fixed intelligence in more effective ways.
Journals, conferences, social networks, memes, forums, etc. are ways the individuals get connected together, and by improving those connections/infra, you can make for a smarter collective. I've been thinking in this frame for many years, but recently had a breakthrough.
Level 0 thinking about collective intellectual progress:
You think about helping information propagate. You get stuff written down, archived, distilled, tagged, made searchable, you get people in the same room.
Level 1 thinking about collective intellectual progress:
For individual thinkers to combine as a collective intelligence, they need to end up pointing in enough of the same direction to be doing the same thing such that their efforts stack. Pointing in the same direction involves a lot of agreeing on "this is progress" vs "this is not progress". (There is also an object-level of "and whatever you agree is progress points in the direction of reality itself")
Being paradigmatic means you've achieved a lot of agreement on what is progress, but what matters is being a collective where you can more efficiently get in agreement with each other (and with reality), e.g. a good process of communally evaluating things. I think that happens via journals, conferences, etc., and we can more explicitly aim to help people not just share info, but evaluate it too.
(There is also an object-level of "and whatever you agree is progress points in the direction of reality itself")
+1
Being paradigmatic means you've achieved a lot of agreement on what is progress, but what matters is being a collective where you can more efficiently get in agreement with each other (and with reality), e.g. a good process of communally evaluating things. I think that happens via journals, conferences, etc., and we can more explicitly aim to help people not just share info, but evaluate it too.
Can you go into more detail about the mechanisms you think might be promising here? In the past I've been skeptical of various "eigen-evaluation" proposals, but you might have something different in mind.
For people reading this, "eigen-evaluation" is a term I coined to describe the way in which research communities decide which research/researchers are good, i.e. how people come to agree that "X is a good researcher", especially in AI Alignment, where no one has ever successfully built an aligned superintelligence.
I think it happens via a social process with the same mechanism as EigenKarma, PageRank, and Eigenmorality:
...you can imagine an iterative process where each ~~web page~~ researcher starts out with the same hub/authority “starting credits,” but then in each round, the ~~pages~~ researchers distribute their credits among their neighbors, so that the most popular ~~pages~~ researchers get more credits, which they can then, in turn, distribute to their neighbors by ~~linking to~~ praising them – Eigenmorality
For lack of a better name, I call this process "eigen-evaluation" (of research/researchers).
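The shared mechanism behind EigenKarma, PageRank, and Eigenmorality can be sketched as a power iteration over a "who praises whom" graph. A minimal sketch, where the praise matrix, the damping factor, and the function name are all my illustrative assumptions rather than anything specified by those systems:

```python
import numpy as np

# Hypothetical praise graph: praise[i][j] = 1 means researcher i praises researcher j.
# The particular matrix is made up for illustration.
praise = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def eigen_credit(adj, damping=0.85, iters=100):
    """PageRank-style power iteration: each researcher splits one unit of
    credit among everyone they praise each round; the damping factor keeps
    the process well-defined even if someone praises no one."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-normalize; researchers who praise no one spread credit uniformly.
    transition = np.where(out > 0, adj / np.where(out == 0, 1, out), 1.0 / n)
    credit = np.full(n, 1.0 / n)
    for _ in range(iters):
        credit = (1 - damping) / n + damping * credit @ transition
    return credit

scores = eigen_credit(praise)
# Researchers praised by highly-credited researchers end up with more credit.
```

The point of the sketch is just the fixed-point structure: "good researcher" is defined in terms of being praised by good researchers, which the iteration resolves into a stable score vector.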
Can you go into more detail about the mechanisms you think might be promising here?
Meta: I don't think many people are thinking explicitly about how this happens right now. Most researchers are focused on the object-level of their research. And I think there are often gains to be had if you're explicitly and consciously tackling a problem.
Object-level:
Improving existing stuff
In-progress stuff
Possibly upcoming
I also think there's a thing here where, in the absence of sufficient empirical pressure (e.g. literally building stuff that does things), evaluation of which Alignment research is good will be distorted a lot by standard human political popularity pressures. Pushing things in the direction of explicit evaluation via arguments might help with that.
Improving existing stuff
- Improve LessWrong's karma system, e.g. altering strong-vote strength, fixing how lots of low-quality engagement can still make you a high-karma user, maybe switching to eigenkarma
This is level 0, right?
- Make the comments section a better experience for top users to show up and evaluate research, e.g. prevent threads getting derailed by users who "don't get it"
- Improvements to LessWrong's Annual Review
(not sufficiently concrete)
In-progress stuff
- Having a Dialog Feature (like this one) that facilitates two people surfacing actually-held arguments in various directions (more so than happens if they write posts alone)
Also seems to be level 0.
Possibly upcoming
- Oli's "review token system" where people are allocated tokens that they can spend of having a post having a 10-15 hour deep-dive review performed on it.
Also level 0.
- A monthly or quarterly journal for AI Alignment research where some editor(s) + process selects research for greater promotion to attention and significance.
Level 1! Also the one I'm most skeptical of. Coincidence? 🤔
- Aiming for "four levels of conversation" on topics where you get argument, counter-argument, counter-counter-argument, and counter-counter-counter-argument.
- Distillation of arguments and positions (e.g. Alignment argument wiki) that makes it easy to find arguments and make the case against them.
Level 0, probably.
I think my takeaway is that I'm pretty wary of a process which is deliberately aimed at providing some kind of legible metric/ranking/evaluation, where that metric is explicitly attempting to aggregate information with the Social type signature. I expect that kind of process to produce worse outcomes than not having any such process at all, because it will make it much cheaper for people to goodhart on it, and also provoke a bunch of bandwagoning which might not have happened if people needed to "make up their own mind" and perform their own aggregation step, ideally while being in touch with reality (rather than social consensus).
I think I didn't convey well what I meant by Level 0 and Level 1.
Level 0 is a model whereby progress happens because you accumulate model/evidence/ideas. New stuff builds on old stuff.
Level 1 realizes that progress requires evaluating content. New content filters old content. This is all about responses, about content that's evaluative of other content. Voting (karma) is evaluative. Comments can build on content, but are often evaluative. A dialogue where I share an idea and you critique it is evaluative – much more so than if I just wrote a post.
Voting (karma) is evaluative
But importantly, it's an evaluation that's effectively decoupled from object-level reality, right? The claim I'm making is that we should be extremely careful about actively promoting signals that are further removed from reality, rather than signals that strive to be as close to reality as possible.
I think I only have a low-resolution sense of your meaning. Can you give me some examples / sketch the spectrum of "signals that are further removed from reality vs signals that strive to be as close to reality as possible"?
But in general I'd say it's important to realize we (and anyone doing any inquiry in any domain) have no direct coupling to reality as far as collaborative intellectual progress goes. It's all socially mediated.
I'm very pleased to see that the LessWrong team is thinking about these kinds of topics.
I just wanted to add a few more thoughts on this topic myself.
I suspect that one important aspect of creating a new paradigm is characterising the previous paradigm and its underlying assumptions. Often once these assumptions are stated out loud, it becomes clearer where they might break down.
Another important aspect of allowing a new paradigm to form is having a space where it can form. This can often be quite difficult as many people may be mostly happy with the existing spaces that work within the paradigm or at least not unhappy enough to want to join something new.
There's also the problem that people who disagree with the paradigm might want to take it in all kinds of different directions, preventing any one direction from building critical mass. When an existing paradigm has many possible issues that you could focus on, there's something of an art in picking off an area that contains a group of sufficiently important and compelling differences, which also has a certain level of coherence, such that you can explain the value of what you're doing to other people without them being confused.
The Stanford Encyclopedia of Philosophy gives:
According to Kuhn the development of a science is not uniform but has alternating ‘normal’ and ‘revolutionary’ (or ‘extraordinary’) phases. The revolutionary phases are not merely periods of accelerated progress, but differ qualitatively from normal science. Normal science does resemble the standard cumulative picture of scientific progress, on the surface at least. Kuhn describes normal science as ‘puzzle-solving’ (1962/1970a, 35–42). While this term suggests that normal science is not dramatic, its main purpose is to convey the idea that like someone doing a crossword puzzle or a chess problem or a jigsaw, the puzzle-solver expects to have a reasonable chance of solving the puzzle, that his doing so will depend mainly on his own ability, and that the puzzle itself and its methods of solution will have a high degree of familiarity. A puzzle-solver is not entering completely uncharted territory... Revolutionary science, however, is not cumulative in that, according to Kuhn, scientific revolutions involve a revision to existing scientific belief or practice (1962/1970a, 92). Not all the achievements of the preceding period of normal science are preserved in a revolution, and indeed a later period of science may find itself without an explanation for a phenomenon that in an earlier period was held to be successfully explained...
Kuhn’s view is that during normal science scientists neither test nor seek to confirm the guiding theories of their disciplinary matrix. Nor do they regard anomalous results as falsifying those theories. (It is only speculative puzzle-solutions that can be falsified in a Popperian fashion during normal science (1970b, 19).) Rather, anomalies are ignored or explained away if at all possible. It is only the accumulation of particularly troublesome anomalies that poses a serious problem for the existing disciplinary matrix. A particularly troublesome anomaly is one that undermines the practice of normal science. For example, an anomaly might reveal inadequacies in some commonly used piece of equipment, perhaps by casting doubt on the underlying theory. If much of normal science relies upon this piece of equipment, normal science will find it difficult to continue with confidence until this anomaly is addressed. A widespread failure in such confidence Kuhn calls a ‘crisis’
Under this view, perhaps a certain set of interpretability techniques might emerge under a paradigm that makes certain assumptions (e.g., that ML kernels are "mostly" linear, that systems are "mostly" stateless, that exotic hacks of the underlying hardware aren't in play, etc.). If a series of anomalies were to accumulate that couldn't be explained within this matrix, you might expect to see a new paradigm needed.
how do people come to agree that "X is a good researcher"[?]
[...]
Pointing in the same direction involves a lot of agreeing on "this is progress" vs "this is not progress". (There is also an object-level of "and whatever you agree is progress points in the direction of reality itself")
To what extent do alignment researchers agree on who is a good researcher / what progress is? I'd guess there's a bunch of disagreement there, even amongst researchers who agree the problem is hard, e.g. Eliezer vs Paul vs Steven. And I can think of relatively few cases of progress on alignment in my view, let alone anyone else's (TurnTrout's work on power/instrumental convergence, and Stuart's work on value indifference, in case you're wondering). Likewise for what the hard parts of the problem are. That said, I'm not confident there'll be that much disagreement: a lot of disagreements look strong but aren't really. Say, whether some probability is 0.1 or 0.9 isn't that big a difference.
EXPERIMENT: To test whether consensus on progress points in the direction of reality, check what N year old results are most commonly considered progress now, and see how much researchers thought this result made progress M years ago. Of course you'd have to use proxy measures in most cases, e.g. karma and citations.
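One way to sketch the proxy version of this experiment is as a rank correlation between early community evaluation and later standing. Everything below is a hedged illustration: the karma and citation numbers are made up, not real data, and "citations M years later" is just one possible proxy:

```python
import numpy as np

# Made-up proxy data for a hypothetical set of alignment results:
# early_karma: community evaluation shortly after publication.
# later_citations: how much the result is built on M years later.
early_karma = np.array([120, 35, 80, 10, 200, 55])
later_citations = np.array([40, 5, 25, 2, 90, 12])

def spearman(x, y):
    """Spearman rank correlation (no ties): Pearson correlation of the
    ranks. Measures whether early consensus predicts later importance."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

rho = spearman(early_karma, later_citations)
# rho near 1 would suggest early consensus tracks what later counts as progress;
# rho near 0 would suggest early evaluation is mostly noise (or fashion).
```

With real data you'd also want to worry about survivorship (results nobody rated early may never accrue citations) and about karma and citations sharing a common popularity cause.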
Agree. It seems unlikely that the initial paradigm will get everything correct. It’s important to be able to tentatively set down some principles, explore their consequences, then after a while step back and discuss whether the field is headed in the right direction.