Relevance of prior Theoretical ML work to alignment, research on obfuscation in theoretical cryptography as it relates to interpretability, theory underlying various phenomena such as grokking. Disclaimer: This list is very partial and just thrown together.
From these vague terms it's a little hard to say what you have in mind; they sound pretty deep to me, though.
It seems your true rejection is not really about deep ideas per se, but rather the particular flavor of ideas popular on this website.
Perhaps it would be worth writing a post on why you are bullish on these research directions?
For what it's worth, my brain thinks of all of these as 'deep interesting ideas', which your post might intuitively have pushed me away from. Just noting that I'd be super careful not to use this idea as a curiosity-killer.
If there are two research projects that are roughly equivalent, but one seems deep while the other seems boring, the deep one will garner more attention and interest. The spread and discovery of research ideas thus has a bias towards profound ideas, as profundity is more memetically fit than its absence.
This is baffling; why is this a "bias"? Why wouldn't we expect the sense of deepness to strongly correlate with some relevant features of the projects? It seems good to strongly interrogate these intuitions, but it's not like they're meaningless intuitions that come from nowhere. If something seems deep, it touches on stuff that's important and general, which we would expect to be important for alignment.
I think researchers looking to start projects in theoretical alignment should keep these issues in mind, and not necessarily expect this status quo to change in the near future. It may be more promising to consider other directions.
So I would agree with this, but I would amend it to: it may be more promising to consider other directions, and to try to recover from the sense of deepness some pointers at what seemed deep in the research projects. Like, if there's a research project that seems deep, don't just ignore that sense of deepness, but also don't take it on faith that it's a good research project as-is; instead, interrogate the research project especially critically, looking for the core of what's actually deep and discarding the parts of the research project that were mistaken / irrelevant.
If something seems deep, it touches on stuff that's important and general, which we would expect to be important for alignment.
The specific scenario I talk about in the paragraph you're responding to is one where everything except for the sense of deepness is the same for both ideas, such that someone who doesn't have a sense of what ideas are deep or profound would find the ideas basically equivalent. In such a scenario my argument is that we should expect the deep idea to receive more attention, despite there not existing legible or well-grounded reasons for this. Some amount of preference for the deep idea might be justifiable on the grounds of trusting intuitive insight, but I don't think the track record of intuitive insight as to which ideas are good is actually very impressive - there are a huge number of ideas that sounded deep but didn't work out (see some philosophy, psychoanalysis, etc.) and very few that did work out[1].
try to recover from the sense of deepness some pointers at what seemed deep in the research projects
I think on the margin new theoretical alignment researchers should do less of this, as I think most deep-sounding ideas just genuinely aren't very productive to research and aren't amenable to being proven unproductive to work on - often the only evidence that a deep idea isn't productive to work on is that nothing concrete has come of it yet.
I don't have empirical analysis showing this - I would probably gesture to various prior alignment research projects to support this if I had to, though I worry that would devolve into arguing about what 'success' meant.
The specific scenario I talk about in the paragraph you're responding to is one where everything except for the sense of deepness is the same for both ideas, such that someone who doesn't have a sense of what ideas are deep or profound would find the ideas basically equivalent.
But if that's not what the distribution looks like - if instead there's a strong correlation - then it's not a bias, it's just following what the distribution says. Maybe to shore up / expand on your argument, you're talking about the optimizer's curse: https://www.lesswrong.com/posts/5gQLrJr2yhPzMCcni/the-optimizer-s-curse-and-how-to-beat-it So like, the most deep-seeming idea will tend to regress to the mean more than a random idea would. But this doesn't argue for not paying attention to things that seem deep. (It argues for a portfolio approach, but there are lots of arguments for a portfolio approach.)
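To make the regression-to-the-mean point concrete, here's a minimal simulation sketch of the optimizer's curse (the distributions and numbers are illustrative assumptions, not anything from the post or the linked essay):

```python
import numpy as np

# Sketch: each idea has a true value; our "seems deep" signal is a noisy estimate
# of it. Selecting the best-looking idea systematically overestimates its value
# (the optimizer's curse), while an arbitrarily chosen idea is estimated without bias.
rng = np.random.default_rng(0)
n_ideas, n_trials = 1000, 2000
gap_selected, gap_random = [], []

for _ in range(n_trials):
    true_value = rng.normal(0, 1, n_ideas)              # how good each idea actually is
    estimate = true_value + rng.normal(0, 1, n_ideas)   # noisy impression of each idea
    best = np.argmax(estimate)                          # the idea that looks most promising
    gap_selected.append(estimate[best] - true_value[best])
    gap_random.append(estimate[0] - true_value[0])

print("Mean overestimate of the best-looking idea:", round(float(np.mean(gap_selected)), 2))
print("Mean overestimate of a random idea:        ", round(float(np.mean(gap_random)), 2))
# The selected idea's estimate overshoots its true value on average,
# while the random idea's overshoot averages roughly zero.
```

The point of the sketch is only that selecting on a noisy signal inflates the winner's apparent value; it says nothing about how strong the underlying correlation between "seems deep" and "is valuable" actually is.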
Maybe another intuition you're drawing on is information cascades. If there are a lot of information cascades, then a lot of people are paying attention to a few very deep-seeming ideas, which we can agree is dumb.
I think on the margin new theoretical alignment researchers should do less of this, as I think most deep-sounding ideas just genuinely aren't very productive to research and aren't amenable to being proven unproductive to work on - often the only evidence that a deep idea isn't productive to work on is that nothing concrete has come of it yet.
I think this is pretty wrong, though it seems hard to resolve. I would guess that a lot of things that are later concretely productive started with someone hearing something that struck them as deep, and then chewing on it and transforming it.
I guess this is a concern, but I'm also concerned that if we don't invest enough in deep ideas, we'll later regret it. This seems less a matter of choosing between the two than of doing both, and of growing the number of folks working on alignment so we can tackle many potential solutions and find what works.
I think on the margin new alignment researchers should be more willing to work on ideas that seem less deep than they currently seem to me to be.
Working on a wide variety of deep ideas does sound better to me than working on a narrow set of them.
I wanna flag the distinction between "deep" and "profound". They might both be subject to the same bias you articulate here, but I think they have different connotations, and I think important ideas are systematically more likely to be "deep" than they are likely to be "profound." (i.e. deep ideas have a lot of implications and are entangled with more things than 'shallow' ideas. I think profound tends to imply something like 'changing your conception of something that was fairly important in your worldview.')
i.e. profound is maybe "deep + contrarian"
I mostly agree with this - deep ideas should get relatively less focus, but not stop getting funding / attention. See my EA forum post from last year, Interesting vs. Important Work - A Place EA is Prioritizing Poorly, which makes a related point.
Is it possible to accurately judge how profound an idea "actually is" merely from how profound it sounds? (Assuming that these two things are distinct, in general.)
Then besides that, if those two things are indeed distinct, are you proposing that we should be more skeptical of ideas that actually are profound, or of ideas that merely sound profound? (I imagine that you probably mean the latter, but from your writing you seem to be using the word to mean both.)
I think I was envisioning profoundness, as humans can observe it, to be primarily an aesthetic property, so I'm not sure I buy the concept of "actual" profoundness, though I don't have a confident opinion about this.
When discussing impactful research directions, it's tempting to get excited about ideas that seem deep and profoundly insightful. This seems especially true in areas that are theoretical and relatively new - such as AI Alignment Theory. Fascination with the concept of a research direction can leak into evaluations of its expected impact, most often through overestimating the likelihood of extremely impactful outcomes. As a result, we should a priori be more skeptical of research projects we encounter that sound insightful and deep than of those that sound boring and incremental.
This phenomenon can arise naturally from how ideas are generated and spread. If there are two research projects that are roughly equivalent, but one seems deep while the other seems boring, the deep one will garner more attention and interest. The spread and discovery of research ideas thus has a bias towards profound ideas, as profundity is more memetically fit than its absence. I believe that this bias is fairly strong in the AI alignment community, full as it is with researchers who love[1] interesting intellectual challenges and ideas.
Some researchers might think that profound ideas are likely necessary to solve AI Alignment. However, I'll note that even in such a scenario we should expect profound ideas to be given inordinate attention - as they will by default be selected over boring ideas that are as promising as the average profound approach to the problem. Unless exclusively profound ideas are promising, we should expect bias towards profound ideas to creep in.
Even in a world where profound ideas are absolutely required for AI Alignment research, we should still expect that any given profound idea is very unlikely to succeed. Profound ideas very rarely yield significant results and the importance of solving a given problem should not affect our expectation that any given idea will be successful. In such a world I think exploration is much more important than exploitation - as the chances of success in any one direction are low.
I'm particularly worried about profound research directions like Natural Abstractions or Heuristic Arguments being treated as more promising than they are and consuming a large amount of attention and resources. Both seem to have absorbed quite a lot of thought without yielding legible successes as of yet. Additionally, neither seems to me to be directed by feedback loops that rely on external validation of progress. I think researchers looking to start projects in theoretical alignment should keep these issues in mind, and not necessarily expect this status quo to change in the near future. It may be more promising to consider other directions.
I don't think the way to deal with this is to completely stop working on profound ideas in fields like AI Alignment, where we are often motivated by the expected impact of research. Instead, I think it's important to notice when a research direction seems deep and profound, acknowledge this, and maintain a healthy skepticism about whether expected impact is really what's motivating the excitement and attention around the idea - from both yourself and others.
It’s perfectly valid to research things because you enjoy them. I do still think that it’s useful to be able to notice when this is happening.