(I have a lot of disagreements with everyone lol, but I appreciate Ryan putting some money where his mouth is re/ blue sky alignment research as a broad category, and the acknowledgement of "rather than the ideal 12-24 months" re/ "connectors".)
[Edited a bit for clarity]
(To clarify: I co-founded and led PIBBSS since 2021, but stepped down from leadership in June this year to work with davidad on the Safeguarded AI programme. This means I'm no longer in charge of executive & day-to-day decisions at PIBBSS. As such, nothing of what I say below should be taken as an authoritative source about what PIBBSS is going to do. I do serve on the board.)
Ryan - I appreciate the donation, and in particular you sharing your reasoning here.
I agree with a lot of what you write. Especially "connectors" (point 2) and bringing in relatively more senior academics from non-CS/non-ML fields (point 3) are IMO things that are valuable and PIBBSS has a good track record for delivering on.
Regarding both your point 1 and reservation 1 (while trying to abstract a bit from the fact that terms like 'blue sky' research are somewhat fuzzy, and that I expect at least some parts of a disagreement here might turn out to disappear when considering concrete examples), I do think there has been some change in PIBBSS' research thinking & prioritization which has been unfolding in my head since summer 2023, and finding its way more concretely into PIBBSS' strategy since the start of 2024. Lucas is the best person to speak to this in more detail, but I'll still share a few thoughts that were on my mind back when I was still leading PIBBSS.
I continue to believe that there is a lot of value to be had in investigating the underlying principles of intelligent behaviour (what one might refer to as blue sky or basic research), and to do so with a good dose of epistemic pluralism (studying such intelligent behaviour from/across a range of different systems, substrates and perspectives). I think this is a or the core aspect of the PIBBSS spirit. However, after the first 2 years of PIBBSS, I also wasn't entirely happy with our concrete research outputs. I thought a lot about what's up here (first mainly by myself, and later together with Lucas) and about how to do better - all the while staying true to the roots/generators/principles of PIBBSS, as I see them.
One of the key axes of improvement we've grown increasingly confident about is what we sometimes refer to as "bridging the theory-practice gap". I'm pretty bullish on theory(!) -- but theory alone is not the answer. Theory on its own isn't in a great position to know where to go next/what to prioritise, or whether it's making progress at all. I have seen many theoretical threads that felt intriguing/compelling, but failed to bottom out in something tangibly productive because they were lacking feedback loops that help guide them, and that would force them to operationalise abstract notions into something of concrete empirical and interventionist value. (EDIT: 'interventionist' is me trying to point to when a theory is good enough to allow you to intervene in the world or design artifacts in such a way that they reliably lead to what you intended.)
This is not an argument against theory, in my eyes, but an argument that theorizing about intelligent behaviour (or any given phenomenon) will benefit a lot from having productive feedback loops with the empirical. As such, what we've come to want to foster at PIBBSS is (a) ambitious, 'theory-first' AI safety research, (b) with an interdisciplinary angle, (c) that is able to find meaningful and iterative empirical feedback loops. It's not so much that a theoretical project that doesn't yet have a meaningful way of making contact with reality should be disregarded -- and more that a key source of progress for said project will be to find ways of articulating/operationalising that theory, so that it starts making actual (testable) predictions about a real system and can come to inform design choices.
These updates in our thinking led to a few different downstream decisions. One of them was trying to have our fellowship cohort include some empirical ML profiles/projects (I endorse roughly a 10-15% fraction, similar to what Joseph suggests). Reasons for this are, both, that we think this work is likely to be useful, and also because it changes (and IMO improves) the cohort dynamics (compared to, say, 0-5% ML). That said, I agree that once going above 20%, I would start to worry that something essential about the PIBBSS spirit might get lost, and I'm not excited about that from an ecosystem perspective (given that e.g. MATS is doing what it's doing).
Another downstream implication (though it's somewhat earlier days for PIBBSS on that one) is that I've become pretty excited about trying to help move ambitious ideas from (what I call) an 'Idea Readiness Level' (IDL; borrowing from the notion of 'Technology Readiness Levels') 1-3, to an IDL of 5-7. My thinking here is that once an idea/research agenda is at IDLs 5-7, it typically has been able to enter the broader epistemic discourse, it has some initial legible evidence/output it can rely on to make its own case -- and at that point I would say it no longer is in the area where PIBBSS has the greatest comparative advantage to support it. On the other hand, I think there aren't many (if any) obvious places where IDL 1-3 ideas get a chance to be iterated on quickly and stress-tested to develop into a more mature & empirically grounded agenda. (I think the way we were able to support Adam Shai & co in developing the computational mechanics agenda is a pretty great example of this use case I'm painting here - though notably their ideas were already relatively mature compared to other things PIBBSS might ambitiously help to mature.)
I'd personally be very excited for a PIBBSS that becomes excellent at filling that gap, and think PIBBSS has a bunch of the necessary ingredients for that already. I see this as a potentially critical investment in medium term robustness of the research ecosystem, and into what I think is an undeniable need to come to base AI Safety on rigorous scientific understanding. (Though notably Lucas/Dusan, PIBBSS' new leadership, might disagree & should have a chance to speak for themselves here.)
Clem here - I was fellowship lead this year and have been a research affiliate and mentor for PIBBSS in the past. Thanks for posting this. As might be expected in my position, I'm much more bullish than you / most people on what is often called "blue sky" research. Breakthroughs in our fundamental understanding of agency, minds, learning etc. seem valuable in a range of scenarios, not just in worlds dominated by an "intelligence explosion". In particular, I think that this kind of work (a) benefits a huge amount from close engagement with empirical work, and (b) itself is very likely to inform near-future prosaic work. Furthermore, I feel that progress on these questions is genuinely possible, and is made significantly more likely with more people working on it from as many perspectives as possible.
That said, there are two things you say under "reservations" that I strongly agree with, and have some comments on.
> I encourage PIBBSS to “embrace the weird,” albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.
There are worlds where furthering our understanding of deeply confused basic concepts that underpin everything else we do isn't considered "the weird", but given that we're not in these worlds I have to agree. One big issue I see here is that doing this well requires marrying the better parts of academic culture with the better parts of tech / rationality culture (and yes, for this purpose I place those in the same bucket). Some of the places that I think do this best - e.g. Google's Paradigms of Intelligence team - have a culture / belief system somewhat incompatible with EA. It's worth noting that people often pursue basic questions for very different reasons.
> I strongly encourage PIBBSS to publicly post and seek feedback on their applicant selection and research prioritization processes, so that the AI safety ecosystem can offer useful insight (and benefit from this).
I think this is actually really important, and it's not something that I think PIBBSS does very well currently. One thing I would note is that, for reasons sketched above, I think it's important that the AI safety ecosystem aren't the only people interacting with this. One thing that's holding things back here is, in my view, a venue for this kind of research whose scope is, primarily, the basic research. This is not to say that the relevance and impact for safety shouldn't be a primary concern in research prioritisation - I think it very much should be - but I do think this can be done in a way that is more compatible with academic norms (at least those academic norms that are worth upholding).
Hey Ryan, thank you for your support and for the thoughtful write-up! It’s very useful for us to see what the alignment community at large, and our supporters specifically, think of our work. I’ll respond to the point on “pivoting away from blue sky research” here and let Dušan address the other reservations in a separate comment.
As Nora has already mentioned, different people hold different notions on what it means to “keep it weird” and conduct “blue sky” and/or “non-paradigmatic” research. But in as far as this cluster of terms is pointing at research which is (a) aimed at innovating novel conceptual frames and (b) free from compromising pressures of short-term applications, then I would say that this is still the central focus of PIBBSS and that recent developments should be seen as updates to the founding vision, as opposed to full on departures.
The main technical bet in my reading of the PIBBSS founding mission (which people are free to disagree with; I’m curious about the ways in which they do) is that one can overcome the problem of epistemic access by leveraging insights from present day physically instantiated proxies. Current day deep learning systems are impressive, and arguably stronger approximations to the kinds of AGI/ASI which we are concerned with, but they’re still proxies nonetheless and failing to treat them as such tends towards a set of associated failure cases.
Given both my personal experience with LLMs and my reading of the role that empirical engagement has historically played in non-paradigmatic research, I tend to advocate for a methodology which incorporates immediate feedback loops with present day deep learning systems over the classical "philosophy -> math -> engineering" deconfusion/agent foundations paradigm. This was most strongly reflected in the first iteration of the affiliateship cohort and is present in the language of the Manifund funding memo.
With that being said, given that PIBBSS, especially the fellowship, is largely a talent intervention aiming at providing a service to the field, I don’t believe its total portfolio should be confined to the limits of my research taste and experience. Especially after MIRI’s recent pivot, I think there’s a case to be made for PIBBSS to host research which doesn’t meet my personal preferences towards quick empirical engagement.
> Given both my personal experience with LLMs and my reading of the role that empirical engagement has historically played in non-paradigmatic research, I tend to advocate for a methodology which incorporates immediate feedback loops with present day deep learning systems over the classical "philosophy -> math -> engineering" deconfusion/agent foundations paradigm.
I'm curious what your read of the history is, here? My impression is that most important paradigm-forming work so far has involved empirical feedback somehow, but often in ways exceedingly dissimilar from/illegible to prevailing scientific and engineering practice.
I have a hard time imagining scientists like e.g. Darwin, Carnot, or Shannon describing their work as depending much on "immediate feedback loops with present day" systems. So I'm curious whether you think PIBBSS would admit researchers like these into your program, were they around and pursuing similar strategies today?
> I'm curious what your read of the history is, here? My impression is that most important paradigm-forming work so far has involved empirical feedback somehow, but often in ways exceedingly dissimilar from/illegible to prevailing scientific and engineering practice.
> I have a hard time imagining scientists like e.g. Darwin, Carnot, or Shannon describing their work as depending much on "immediate feedback loops with present day" systems.
Thanks for the comment @Adam Scholl and apologies for not addressing it sooner, it was on my list but then time flew. I think we're in qualitative agreement that non-paradigmatic research tends to have empirical feedback loops, and that the forms and methods of empirical engagement undergo qualitative changes in the formation of paradigms. I suspect we may have quantitative disagreements with how illegible these methods were to previous practitioners, but I don't expect that to be super cruxy.
The position which I would argue against is that the issue of empirical access to ASI necessitates long bouts of philosophical thinking prior to empirical engagement and theorization. The position which I would argue for is that there is significant (and depending on the crowd undervalued) benefit to be gained for conceptual innovation by having research communities which value quick and empirical feedback loops. I'm not an expert on either of these historical periods, but I would be surprised to hear that Carnot or Shannon did not meaningfully benefit from engaging with the practical industrial advancements of their day.
Giving my full models is out of scope for a comment and would take a sequence which I'll probably never write, but the 3 history and philosophy of science references which have had the greatest impact on my thinking around empiricism which I tend to point people towards would probably be Inventing Temperature, Exploratory Experiments, and Representing and Intervening.
> So I'm curious whether you think PIBBSS would admit researchers like these into your program, were they around and pursuing similar strategies today?
In short I would say yes, because I don't believe the criteria listed above exclude the researchers you called attention to. But independently of whether you buy into that claim, I would stress that different programs have different mechanisms of admission. The affiliateship as it's currently being run is designed for lower variance and is incidentally more tightly correlated with the research tastes of myself and the horizon scanning team given that these are the folks providing the support for it. The summer fellowship is designed for higher variance and goes through a longer admission process involving a selection committee, with the final decisions falling on mentors.
> It seems that PIBBSS might be pivoting away from higher variance blue sky research to focus on more mainstream AI interpretability. While this might create more opportunities for funding, I think this would be a mistake. The AI safety ecosystem needs a home for “weird ideas” and PIBBSS seems the most reputable, competent, EA-aligned place for this! I encourage PIBBSS to “embrace the weird,” albeit while maintaining high academic standards for basic research, modelled off the best basic science institutions.
I was a recent PIBBSS mentor, and am a mech interp person who is likely to be considered mainstream by many people and for this reason I wanted to push back on this concern.
A few thoughts:
Some final notes:
I'll share this in the PIBBSS Slack to see if others want to comment :)
Hi! Thanks for the kind words and for sharing your thought process so clearly! I am also quite happy to see discussions on PIBBSS' mission and place in the alignment ecosystem, as we have been rethinking PIBBSS outbound comms since the introduction of the board and executive team.
Regarding the application selection process:
Currently (scroll down to see stages 1-4), it comes down to having a group of people who understand PIBBSS (in addition to the Board, this would be alumni, mentors, and people who have worked with PIBBSS extensively before) looking through CVs, Letters of motivation, and later work trials in the form of research proposals and research consolidation. After that, we do interviews and mentor-matching and then make our final decision. This has so far worked for our scope (as we grew in popularity, we also raised our bar, so the number of people passing the first selection stage has stayed the same through the past two years). So, it works, but if we were to scale the Fellowship (not obvious if we would like to do so) this system would need to become more robust. For Affiliates, the selection process is different, focusing much more on a proven track record of excellent research, and due to very few positions we can offer, it is currently a combination of word-of-mouth recommendations, and very limited public rounds. This connects with the project we started internally, “Horizon Scanning”, which makes reports on different research agendas and finds interesting researchers in the field which may make for great Affiliates. The first report should be out in the next month, so we will see how this interacts and how useful the reports are to the community (and to the fields which we hope to bridge with AI Safety). Again, as we scale, this will require rethinking.
Thank you again for the write-up and your support! Huge thanks also to all the commenters here; we really appreciate the thoughtful discussion!
Nice post, glad you wrote up your thinking here.
I'm a bit skeptical of the "these are options that pay off if alignment is harder than my median" story. The way I currently see things going is:
I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff, and if takeoff is slow we're probably fine if it's fast we die.
Ways that could not happen:
Both strike me as pretty unlikely. TBC, this doesn't mean those types of work are bad; I'm saying low probability, not necessarily low margins.
Reminder that you have a moral obligation, every single time you're communicating an overall justification of alignment work premised on slow takeoff, in a context where you can spare two sentences without unreasonable cost, to say out loud something to the effect of "Oh and by the way, just so you know, the causal reason I'm talking about this work is that it seems tractable, and the causal reason is not that this work matters.". If you don't, you're spraying your [slipping sideways out of reality] on everyone else.
I'm on board with communicating the premises of the path to impact of your research when you can. I think more people doing that would've saved me a lot of confusion. I think your particular phrasing is a bit unfair to the slow takeoff camp but clearly you didn't mean it to read neutrally, which is a choice you're allowed to make.
I wouldn't describe my intention in this comment as communicating a justification of alignment work based on slow takeoff? I'm currently very uncertain about takeoff speeds and my work personally is in the weird limbo of not being premised on either fast or slow scenarios.
I think James was implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose. I agree that he could have made it clearer, but I think he's made it clear enough given the following line:
> I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff, and if takeoff is slow we’re probably fine if it’s fast we die.
And as for your last sentence:
> If you don’t, you’re spraying your [slipping sideways out of reality] on everyone else.
It depends on the intended audience of your communication. James here very likely implicitly modeled his audience as people who'd comprehend what he was pointing at without having to explicitly say the caveats you list.
I'd prefer you ask why people think the way they do instead of ranting to them about 'moral obligations' and insinuating that they are 'slipping sideways out of reality'.
IDK how to understand your comment as referring to mine. To clarify the "slipping sideways" thing, I'm alluding to "stepping sideways" described in Q2 here: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy#Q2___I_have_a_clever_scheme_for_saving_the_world___I_should_act_as_if_I_believe_it_will_work_and_save_everyone__right__even_if_there_s_arguments_that_it_s_almost_certainly_misguided_and_doomed___Because_if_those_arguments_are_correct_and_my_scheme_can_t_work__we_re_all_dead_anyways__right_
and from
> IDK how to understand your comment as referring to mine.
I'm familiar with how Eliezer uses the term. I was more pointing to the move of saying something like "You are [slipping sideways out of reality], and this is bad! Stop it!" I don't think this usually results in the person, especially confused people, reflecting and trying to be more skilled at epistemology and communication.
In fact, there's a loopy thing here where you expect someone who is 'slipping sideways out of reality' to caveat their communications with an explicit disclaimer that admits that they are doing so. It seems very unlikely to me that we'll see such behavior. Either the person has confusion and uncertainty and is usually trying to honestly communicate their uncertainty (which is different from 'slipping sideways'), or the person would disagree that they are 'slipping sideways' and claim (implicitly and explicitly) that what they are doing is tractable / matters.
> "You are [slipping sideways out of reality], and this is bad! Stop it!"
> who is 'slipping sideways out of reality' to caveat their communications with an explicit disclaimer that admits that they are doing so
Excuse me, none of that is in my comment.
I'm not sure exactly what mesa is saying here, but insofar as "implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose" means "intending to communicate from a position of uncertainty about takeoff speeds" I think he has me right.
I do think mesa is familiar enough with how I talk that the fact he found this unclear suggests it was my mistake. Good to know for future.
Cheers!
I think you might have implicitly assumed that my main crux here is whether or not take-off will be fast. I actually feel this is less decision-relevant for me than the other cruxes I listed, such as time-to-AGI or "sharp left turns." If take-off is fast, AI alignment/control does seem much harder and I'm honestly not sure what research is most effective; maybe attempts at reflectively stable or provable single-shot alignment seem crucial, or maybe we should just do the same stuff faster? I'm curious: what current AI safety research do you consider most impactful in fast take-off worlds?
To me, agent foundations research seems most useful in worlds where:
Ah, didn't mean to attribute the takeoff speed crux to you, that's my own opinion.
I'm not sure what's best in fast takeoff worlds. My message is mainly just that getting weak AGI to solve alignment for you doesn't work in a fast takeoff.
"AGI winter" and "overseeing alignment work done by AI" do both strike me as scenarios where agent foundations work is more useful than in the scenario I thought you were picturing. I think #1 still has a problem, but #2 is probably the argument for agent foundations work I currently find most persuasive.
In the moratorium case we suddenly get much more time than we thought we had, which enables longer payback time plans. Seems like we should hold off on working on the longer payback time plans until we know we have that time, not while it still seems likely that the decisive period is soon.
Having more human agent foundations expertise to better oversee agent foundations work done by AI seems good. How good it is depends on a few things. How much of the work that needs to be done is conceptual breakthroughs (tall) vs schlep with existing concepts (wide)? How quickly does our ability to oversee fall off for concepts more advanced than what we've developed so far? These seem to me like the main ones, and like very hard questions to get certainty on - I think that uncertainty makes me hesitant to bet on this value prop, but again, it's the one I think is best.
I just left a comment on PIBBSS' Manifund grant request (which I funded with $25k) that people might find interesting. PIBBSS needs more funding!