I am a Manifund Regrantor. In addition to general grantmaking, I have requests for proposals in the following areas:
Hourly stipends for AI safety fellowship programs, plus some reference points for comparison. The average AI safety program stipend is $26/h.
Edit: updated figure to include more programs.
Interesting, thanks! My guess is this doesn't include benefits like housing and travel costs? Some of these programs pay for those while others don't, which I think is a non-trivial difference (especially in the Bay Area).
I just left a comment on PIBBSS' Manifund grant proposal (which I funded $25k) that people might find interesting.
...Main points in favor of this grant
- My inside view is that PIBBSS mainly supports “blue sky” or “basic” research, some of which has a low chance of paying off, but might be critical in “worst case” alignment scenarios (e.g., where “alignment MVPs” don’t work, or “sharp left turns” and “intelligence explosions” are more likely than I expect). In contrast, of the technical research MATS supports, about half is basic research (e.g., interpretability, evals, agent foundations) and half is applied research (e.g., oversight + control, value alignment). I think the MATS portfolio is a better holistic strategy for furthering AI alignment. However, if one takes into account the research conducted at AI labs and supported by MATS, PIBBSS’ strategy makes a lot of sense: they are supporting a wide portfolio of blue sky research that is particularly neglected by existing institutions and might be very impactful in a range of possible “worst-case” AGI scenarios. I think this is a valid strategy in the current ecosystem/market and I support PIBBSS!
- In MATS’ recent post, “Talent Needs of
Why does the AI safety community need help founding projects?
This still reads to me as advocating for a jobs program for the benefit of MATS grads, not safety. My guess is you're aiming for something more like "there is talent that could do useful work under someone else's direction, but not on their own, and we can increase safety by utilizing it".
Main takeaways from a recent AI safety conference:
Crucial questions for AI safety field-builders:
It seems plausible to me that if AGI progress becomes strongly bottlenecked on architecture design or hyperparameter search, a more "genetic algorithm"-like approach will follow. Automated AI researchers could run and evaluate many small experiments in parallel, covering a vast hyperparameter space. If small experiments are generally predictive of larger experiments (and they seem to be, à la scaling laws) and model inference costs are cheap enough, this parallelized approach might be 1) computationally affordable and 2) successful at overcoming the architecture bottleneck.
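To make this concrete, here is a minimal Python sketch of the kind of loop I have in mind: many cheap, small-scale experiments evaluated in parallel, a crude scaling-law extrapolation used to rank candidates, and genetic-algorithm-style mutation of the survivors. Every constant, function, and the power-law exponent below is a placeholder made up for illustration, not a real training or extrapolation API.

```python
import random
from concurrent.futures import ProcessPoolExecutor

# Hypothetical search space; the values here and the power-law exponent below
# are made up purely for illustration.
SEARCH_SPACE = {
    "depth": [4, 8, 16, 32],
    "width": [256, 512, 1024],
    "lr": [1e-4, 3e-4, 1e-3],
}


def random_config(rng: random.Random) -> dict:
    """Sample one candidate architecture/hyperparameter configuration."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def mutate(config: dict, rng: random.Random) -> dict:
    """Genetic-algorithm-style mutation: resample one hyperparameter."""
    child = dict(config)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def run_small_experiment(config: dict) -> float:
    """Placeholder for a cheap, small-scale training run executed by an
    automated researcher; returns a proxy validation loss."""
    rng = random.Random(repr(sorted(config.items())))
    return rng.uniform(2.0, 4.0)


def predicted_large_scale_loss(small_loss: float, scale_factor: float = 100.0) -> float:
    """Placeholder scaling-law extrapolation from small- to large-scale runs."""
    return small_loss * scale_factor ** -0.05


if __name__ == "__main__":
    rng = random.Random(0)
    population = [random_config(rng) for _ in range(32)]
    with ProcessPoolExecutor() as pool:
        for generation in range(5):
            # Run the whole population of cheap experiments in parallel.
            losses = list(pool.map(run_small_experiment, population))
            # Rank candidates by their *predicted* performance at large scale.
            ranked = sorted(zip(losses, population),
                            key=lambda pair: predicted_large_scale_loss(pair[0]))
            survivors = [cfg for _, cfg in ranked[: len(ranked) // 4]]
            # Refill the population by mutating the survivors.
            population = survivors + [
                mutate(rng.choice(survivors), rng)
                for _ in range(len(population) - len(survivors))
            ]
    print("Best predicted config:", ranked[0][1])
```

The load-bearing assumptions are the two in the text: that small-scale results extrapolate reliably to large scale, and that running thousands of cheap experiments is affordable relative to a single large training run.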
An incomplete list of possibly useful AI safety research:
AI alignment threat models that are somewhat MECE (but not quite):
Reasons that scaling labs might be motivated to sign onto AI safety standards:
However, AI companies that don’t believe in AGI x-risk might tolerate higher x-risk than ideal safet...
How fast should the field of AI safety grow? An attempt at grounding this question in some predictions.
I appreciate the spirit of this type of calculation, but think that it's a bit too wacky to be that informative. I think that it's a bit of a stretch to string these numbers together. E.g. I think Ryan and Tom's predictions are inconsistent, and I think that it's weird to identify 100%-AI as the point where we need to have "solved the alignment problem", and I think that it's weird to use the Apollo/Manhattan program as an estimate of work required. (I also don't know what your Manhattan Project numbers mean: I thought there were more like 2.5k scientists/engineers at Los Alamos, and most of the people elsewhere were purifying nuclear material.)
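For readers following the shape of the argument, the underlying calculation is roughly: "given a total amount of work W needed by a deadline T years away and a current field size N₀, what exponential growth rate is required?" The sketch below shows only that structure; the numbers are placeholders, not the figures from the post or from Ryan's and Tom's predictions.

```python
import math

# Back-of-envelope structure of a "how fast should the field grow?" estimate.
# All numbers below are placeholders for illustration.
current_researchers = 300          # hypothetical current field size (FTEs)
required_person_years = 10_000     # hypothetical total work needed
years_until_deadline = 10          # hypothetical time remaining

# If the field grows exponentially at continuous rate g, total work done is
#   integral_0^T  N0 * exp(g*t) dt  =  N0 * (exp(g*T) - 1) / g.
def total_work(growth_rate: float) -> float:
    if abs(growth_rate) < 1e-9:
        return current_researchers * years_until_deadline
    return current_researchers * (math.exp(growth_rate * years_until_deadline) - 1) / growth_rate

# Solve for g by bisection (total_work is increasing in g).
lo, hi = 0.0, 2.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if total_work(mid) < required_person_years else (lo, mid)

# Report the equivalent annual growth rate.
print(f"Required growth rate: ~{math.exp((lo + hi) / 2) - 1:.0%} per year")
```

The disagreement above is about whether the inputs (W and T) can legitimately be read off the cited predictions and the Apollo/Manhattan analogy, not about this arithmetic.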
Types of organizations that conduct alignment research, differentiated by funding model and associated market forces:
Can the strategy of "using surrogate goals to deflect threats" be countered by an enemy agent that learns your true goals and credibly precommits to always defecting (i.e., Prisoner's Dilemma style) if you deploy an agent against it whose goals produce cooperative bargaining equilibria sufficiently different from those your true goals would produce?
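One way to make the question concrete is a toy 2x2 game between the defender (which agent to deploy) and the threatener (which commitment to make). The payoffs below are arbitrary placeholders chosen only to exhibit the structure of the question, not an argument for either answer.

```python
# Toy payoff matrix for the surrogate-goal question above. All numbers are
# arbitrary; they only encode the qualitative structure of the scenario.

DEFENDER_ACTIONS = ["true_goal_agent", "surrogate_goal_agent"]
THREATENER_ACTIONS = ["threaten_whatever_the_agent_values", "precommit_defect_if_surrogate"]

# (defender_true_utility, threatener_utility) for each action pair.
PAYOFFS = {
    # Threats land on the true goal, so the defender concedes in bargaining.
    ("true_goal_agent", "threaten_whatever_the_agent_values"): (3, 7),
    ("true_goal_agent", "precommit_defect_if_surrogate"): (3, 7),  # commitment never triggers
    # Threats land on the surrogate goal, which the defender doesn't truly
    # value, so the threat has little leverage and bargaining is closer to even.
    ("surrogate_goal_agent", "threaten_whatever_the_agent_values"): (5, 5),
    # The threatener has learned the true goal and carries out the real threat
    # (paying an execution cost) whenever a surrogate agent is deployed.
    ("surrogate_goal_agent", "precommit_defect_if_surrogate"): (0, 4),
}

for t_action in THREATENER_ACTIONS:
    best = max(DEFENDER_ACTIONS, key=lambda d: PAYOFFS[(d, t_action)][0])
    print(f"Against '{t_action}', the defender's best deployment is '{best}'.")
```

In this toy, the surrogate agent is the better deployment against a naive threatener, but the precommitted threatener flips that; and the precommitment only pays for the threatener if the defender's deployment choice responds to it. Whether such a precommitment can be made credible and observable before deployment is the open part of the question.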
MATS' goals:
"Why suicide doesn't seem reflectively rational, assuming my preferences are somewhat unknown to me," OR "Why me-CEV is probably not going to end itself":
Are these framings of gradient hacking, which I previously articulated here, a useful categorization?
...
- Masking: Introducing a countervailing, “artificial” performance penalty that “masks” the performance benefits of ML modifications that do well on the SGD objective, but not on the mesa-objective (a toy sketch follows this list);
- Spoofing: Withholding performance gains until the implementation of certain ML modifications that are desirable to the mesa-objective; and
- Steering: In a reinforcement learning context, selectively sampling environmental states that will either leave the mesa-objecti
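Below is a toy, purely numerical illustration of the "Masking" framing: a scalar parameter, a base loss, and a self-imposed penalty constructed to exactly cancel the gradient signal that would otherwise pull the parameter away from the mesa-preferred value. It is not a claim about how a real network could implement this; it only restates the definition in code.

```python
# Toy illustration of "masking". theta is a single scalar "parameter"; the
# base/SGD objective prefers theta = 1, while the hypothetical mesa-objective
# prefers theta = 0. Nothing here models a real network.

MESA_PREFERRED = 0.0
BASE_PREFERRED = 1.0

def base_loss(theta: float) -> float:
    """Loss as measured by the outer/SGD objective."""
    return (theta - BASE_PREFERRED) ** 2

def artificial_penalty(theta: float) -> float:
    """Self-imposed penalty constructed to cancel the benefit of moving theta
    toward the base optimum (i.e., away from the mesa-preferred value)."""
    return base_loss(MESA_PREFERRED) - base_loss(theta)

def observed_loss(theta: float) -> float:
    """What SGD actually 'sees': base loss plus the mask."""
    return base_loss(theta) + artificial_penalty(theta)

def grad(f, theta: float, eps: float = 1e-6) -> float:
    """Central finite-difference derivative."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta = 0.2  # current parameter, near the mesa-preferred value
print("gradient of base loss:    ", round(grad(base_loss, theta), 4))      # nonzero: SGD "wants" to move theta
print("gradient of observed loss:", round(grad(observed_loss, theta), 4))  # ~0: the benefit is masked
```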
How does the failure rate of a hierarchy of auditors scale with the hierarchy depth, if the auditors can inspect all auditors below their level?
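Under the simplest model I can think of, where each auditor independently fails (misses or colludes) with probability p, any level can catch failures at the levels below it, and the system fails only if every level fails, the answer is just p^d for depth d. The sketch below checks that closed form against a Monte Carlo simulation; the independence assumption is doing all the work, and correlated failures or collusion would scale very differently.

```python
import random

# Crude model of the question above: the system fails iff every one of the
# `depth` auditors independently fails (probability p each), since any
# non-failing auditor above would catch the failure below it.

def simulated_failure_rate(p: float, depth: int, trials: int = 200_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        if all(rng.random() < p for _ in range(depth)):
            failures += 1
    return failures / trials

p = 0.1
for depth in (1, 2, 3, 4):
    print(f"depth={depth}: analytic={p ** depth:.2e}, simulated={simulated_failure_rate(p, depth):.2e}")
```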