All of Matt Putz's Comments + Replies

I've updated toward the views Daniel expresses here and I'm now about halfway between Ajeya's views in this post and Daniel's (in geometric mean).
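(A quick aside on the phrasing, since "halfway in geometric mean" can read ambiguously: for two point estimates $a$ and $b$, the geometric midpoint is $\sqrt{ab}$ rather than the arithmetic midpoint $(a+b)/2$. The worked numbers below are purely illustrative, not the actual estimates referenced above.)

$$\sqrt{a \cdot b}\,, \qquad \text{e.g. } \sqrt{4\ \text{yr} \times 16\ \text{yr}} = 8\ \text{yr} \quad (\text{vs. } (4+16)/2 = 10\ \text{yr}).$$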

I'm curious what the biggest factors were that made you update?

ryan_greenblatt
Mostly faster benchmark performance than I expected (see Ajeya's comment here) and o3 (and o1) being evidence that RL training can scalably work and RL can plausibly scale very far.

Regarding our career development and transition funding (CDTF) program: 

  • The default expectation for CDTF grants is that they’re one-off grants. My impression is that this is currently clear to most CDTF grantees (e.g., I think most of them don't reapply after the end of their grant period, and the program title explicitly says that it’s “transition funding”).
    • (When funding independent research through this program, we sometimes explicitly clarify that we're unlikely to renew by default).
  • Most of the CDTF grants we make have grant periods that are short
... (read more)

Yeah, I was thinking of PhD programs as one of the most common longer-term grants. 

Agree that it's reasonable for a lot of this funding to be shorter, but also think that, given the shifting funding landscape where most good research by my lights can no longer get funding, I would be quite hesitant for people to substantially sacrifice career capital in the hopes of getting funding later (or more concretely, I think it's the right choice for people to choose a path where they end up with a lot of slack to think about what directions to pursue, instead ... (read more)

Just wanted to flag quickly that Open Philanthropy's GCR Capacity Building team (where I work) has a career development and transition funding program.

The program aims to provide support—in the form of funding for graduate study, unpaid internships, self-study, career transition and exploration periods, and other activities relevant to building career capital—for individuals at any career stage who want to pursue careers that could help reduce global catastrophic risks (esp. AI risks). It’s open globally and operates on a rolling basis.

I realize that this ... (read more)

I would mostly advise people against making large career transitions on the basis of Open Phil funding, or if you do, I would be very conservative with it. Like, don't quit your job because of a promise of 1 year of funding, because it is quite possible your second year will only be given conditional on you aligning with the political priorities of OP funders or OP reputational management, and career transitions usually take longer than a year. To be clear, I think it often makes sense to accept funding from almost anyone, but in the case of OP it is fundi... (read more)

Thanks for the feedback! I’ll forward it to our team.

I think I basically agree with you that from reading the RFP page, this project doesn’t seem like a central example of the projects we’re describing (and indeed, many of the projects we do fund through this RFP are more like the examples given on the RFP page). 

Some quick reactions:

  • FWIW, our team generally makes a lot of grants that are <$100k (much more so than other Open Phil teams).
  • I agree the application would probably take most people longer than the description that Gavin gave on Manifund.
... (read more)
Austin Chen
Thanks for forwarding my thoughts! I'm glad your team is equipped to do small, quick grants - from where I am on the outside, it's easy to accidentally think of OpenPhil as a single funding monolith, so I'm always grateful for directional updates that help the community understand how to better orient to y'all.

I agree that 3 months seems reasonable when $500k+ is at stake! (I think, just skimming the application, I mentally rounded off "3 months or less" to "about 3 months", as kind of a learned heuristic on how orgs relate to the timelines they publish.) As another data point from the Survival and Flourishing Fund, turnaround (from our application to decision) was about 5 months this year, for an ultimately $90k grant (we were applying for up to $1.2m). I think this year they were unusually slow due to changing over their processes; in past years it's been closer to 2-3 months.

Our own philosophy at Manifund does emphasize "moving money quickly", to almost a sacred level. This comes from watching programs like Fast Grants and Future Fund, and also our own lived experience as grantees. For grantees, knowing 1 month sooner that money is coming often means that one can start hiring and executing 1 month sooner - and the impact of executing even 1 day sooner can sometimes be immense (see: https://www.1daysooner.org/about/).

I work at Open Philanthropy, and I recently let Gavin know that Open Phil is planning to recommend a grant of $5k to Arb for the second project on your list: Overview of AI Safety in 2024 (they had already raised ~$10k by the time we came across it). Thanks for writing this post, Austin — it brought the funding opportunity to our attention.

Like other commenters on Manifund, I believe this kind of overview is a valuable reference for the field, especially for newcomers. 

I wanted to flag that this project would have been eligible for our RFP for work tha... (read more)

Austin Chen
@Matt Putz thanks for supporting Gavin's work and letting us know; I'm very happy to hear that my post helped you find this! I also encourage others to check out OP's RFPs.

I don't know about Gavin, but I was peripherally aware of this RFP, and it wasn't obvious to me that Gavin should have considered applying, for these reasons:

  1. Gavin's work seems aimed internally towards existing EA folks, while this RFP's media/comms examples (at a glance) seem to be aimed externally for public-facing outreach.
  2. I'm not sure what typical grant size the OP RFP is targeting, but my cached heuristic is that OP tends to fund projects looking for $100k+ and that smaller projects should look elsewhere (e.g. through EAIF or LTFF), due to grantmaker capacity constraints on OP's side.
  3. Relatedly, the idea of filling out an OP RFP seems somewhat time-consuming and burdensome (e.g. somewhere between 3 hours and 2 days), so I think many grantees might not consider doing so unless asking for large amounts.
  4. Also, the RFP form seems to indicate a turnaround time of 3 months, which might have seemed too slow for a project like Gavin's.

I'm evidently wrong on all these points given that OP is going to fund Gavin's project, which is great! So I'm listing these in the spirit of feedback. Some easy wins to encourage smaller projects to apply might be to update the RFP page to (1) list some example grants and grant sizes that were sourced through this, and (2) describe how much time you expect an applicant to take to fill out the form (something EA Funds does, which I appreciate, even if I invariably take much more time than they state).

There are two very similar pages. This one and https://www.lesswrong.com/tag/scoring-rules/

By "refining pure human feedback", do you mean refining RLHF ML techniques? 

I assume you still view enhancing human feedback as valuable? And also more straightforwardly just increasing the quality of the best human feedback?

Ajeya Cotra
I mean things like tricks to improve the sample efficiency of human feedback, doing more projects that are un-enhanced RLHF to learn things about how un-enhanced RLHF works, etc.

Amazing! Thanks so much for making this happen so quickly.

To anyone who's trying to figure out how to get it to work on Google Podcasts, here's what worked for me (searching the name didn't, maybe this will change?):

Go to the Libsyn link. Click the RSS symbol. Copy the link. Go to Google Podcasts. Click the Library tab (bottom right). Go to Subscriptions. In the upper right, click the symbol that looks like adding a link. Paste the link and confirm.

Kayden
Thanks for the suggestion, works great!

Hey Paul, thanks for taking the time to write that up, that's very helpful!

Hey Rohin, thanks a lot, that's genuinely super helpful. Drawing analogies to "normal science" seems both reasonable and like it clears the picture up a lot.

I would be interested to hear opinions: what fraction of people could possibly produce useful alignment work?

Ignoring the hurdle of "knowing about AI safety at all", i.e. assuming they took some time to engage with it (e.g. they took the AGI Safety Fundamentals course). Also assume they got some good mentorship (e.g. from one of you) and then decided to commit full-time (and got funding for that). The thing I'm trying to get at is more about having the mental horsepower + epistemics + creativity + whatever other qualities are useful, or likely being a... (read more)

"Possibly produce useful alignment work" is a really low bar, such that the answer is ~100%. Lots of things are possible. I'm going to instead answer "for what fraction of people would I think that the Long-Term Future Fund should fund them on the current margin".

If you imagine that the people are motivated to work on AI safety, get good mentorship, and are working full-time, then I think on my views most people who could get into an ML PhD in any university would qualify, and a similar number of other people as well (e.g. strong coders who are less good a... (read more)

mic
Anthropic says that they're looking for experienced engineers who are able to dive into an unfamiliar codebase and solve nasty bugs, and/or are able to handle interesting problems with distributed systems and parallel processing. I was personally surprised to get an internship offer from CHAI and expected the bar for getting an AI safety role to be much higher. I'd guess that the average person able to get a software engineering job at Facebook, Microsoft, Google, etc. (not that I've ever received an offer from any of those companies), or perhaps a broader category of people, could do useful direct work, especially if they committed time to gaining relevant skills if necessary. But I might be wrong. (This is all assuming that Anthropic, Redwood, CHAI, etc. are doing useful alignment work.)

(Off the cuff answer including some random guesses and estimates I won't stand behind, focused on the kind of theoretical alignment work I'm spending most of my days thinking about right now.)

Over the long run I would guess that alignment is broadly similar to other research areas, where a large/healthy field could support lots of work from lots of people, where some kinds of contributions are very heavy-tailed but there is a lot of complementarity and many researchers are having large overall marginal impacts.

Right now I think difficulties (at least for g... (read more)

That's a very detailed answer, thanks! I'll have a look at some of those tools. Currently I'm limiting my use to a particular 10-minute window per day with freedom.to + the app BlockSite. It often costs me way more than 10 minutes of focus though (checking links afterward, procrastinating beforehand...), so I might try to find an alternative.

Sorry for the tangent, but how do you recommend engaging with Twitter, without it being net bad?

Eli Tyre
Different setups for different people, but for me twitter is close to a write-only platform.

I use a roam extension that lets me write twitter threads from my roam, so that I can post things without needing to go to the site. (I have a special roam database for this, because otherwise I would be concerned about accidentally posting my personal journal entries to twitter.)

I have twitter blocked on my main machine. I have a separate chromebook that I use to browse twitter itself. Even on that twitter-specific chromebook, I've blocked the twitter feed, and I use Intention to limit my engagement to less than an hour a day, with "cool-downs". I've sometimes relaxed these constraints on my twitter laptop, but when I don't have Intention set up, for whatever reason, I'll often get sucked into 4-hour, engaging / interesting, but highly-inefficient twitter conversations.

Every few days, I'll check twitter on my twitter machine, mostly looking through and responding to my messages, and possibly looking at the pages of some of my favorite people on twitter.

All of this is to avoid the dopamine loops of twitter, which can suck up hours of my life like nothing else can.

The character of my personal setup makes me think that maybe it is unethical for me to use twitter at all. Posting, but mostly not reading other people's content, in particular, seems like maybe a defection, and I don't want to incentivize people to be on the platform. (My guess is that twitter is only a little worse for me than for the median twitter user, though there are also twitter users for which it just straight up provides a ton of value.)

To counter this, I add all my threads to Threadreader, so that people can read my content without needing to touch the attention-eating cesspool.
Matt Goldenberg
Find and follow people you actually want to be friends with and interact with them as you would actual friends. When you post, ask yourself if this is something that your friends would find fun or valuable or useful.

My advice: follow <50 people, maybe <15 people, and always have the setting on "Latest Tweets". That way, the algorithms don't have power over you, it's impossible to spend a lot of time (because there just aren't that many tweets), and since you filtered so hard, hopefully the mean level of interesting-ness is high.

Thanks! Great to hear that it's going well!

Maybe I'm being stupid here. On page 42 of the write-up, it says:
 

In order to ensure we learned the human simulator, we would need to change the training strategy to ensure that it contains sufficiently challenging inference problems, and that doing direct translation was a cost-effective way to improve speed (i.e. that there aren’t other changes to the human simulator that would save even more time). [emphasis mine]

Shouldn't that be:

In order to ensure we learned the direct translator, ...

ADifferentAnonymous
Turning this into the typo thread: on page 97 you have [...]. Pretty sure the bolded word should be "predictors".
paulfchristiano
Yes, thanks!

We’re planning to evaluate submissions as we receive them, between now and the end of January; we may end the contest earlier or later if we receive more or fewer submissions than we expect.

 

Just wanted to note that the "we may end the contest earlier" part here makes me significantly more hesitant about trying this. I will probably still at least have a look at it, but part of me is afraid that I'll invest a bunch of time and then the contest will be announced to be over before I got around to submitting. And I suspect Holden's endorsement may make t... (read more)

Mark Xu
Note that this has changed to February 15th.
paulfchristiano
We're going to accept submissions through February 10. (We actually ended up receiving more submissions than I expected but it seems valuable, and Mark has been handling all the reviews, so running for another 20 days seems worthwhile.)

Sorry for my very late reply!

Thanks for taking the time to answer, I now don't endorse most of what I wrote anymore. 

I think that if the AGI has a perfect motivation system then we win, there's no safety problem left to solve. (Well, assuming it also remains perfect over time, as it learns new things and thinks new thoughts.) (See here for the difference between motivation and reward.)

and from the post:

And if we get to a point where we can design reward signals that sculpt an AGI's motivation with surgical precision, that's fine!

This is mostly where I... (read more)

Interesting post!

Disclaimer: I have only read the abstract of the "Reward is enough" paper. Also, I don't have much experience in AI safety, but I'm considering changing that.

Here are a couple of my thoughts.

Your examples haven't entirely convinced me that reward isn't enough. Take the bird. As I see it, something like the following is going on:

Evolution chose to take a shortcut: maybe a bird with a very large brain and a lot of time would eventually figure out that singing is a smart thing to do if it received reward for singing well. But evolution being a rut... (read more)

Steven Byrnes
Thanks! In some cases yeah, but I was trying to give an example where the interpretation of the world itself impacts the reward. So it's more-or-less a version of wireheading. An agent cannot learn "Wireheading is bad" on the basis of a reward signal—wireheading by definition has a very high reward signal. So if you're going to disincentivize wireheading, you can't do it by finding the right reward function, but rather by finding the right cognitive architecture. (Or the right cognitive architecture and the right reward function, etc.) Right?

I don't follow your bird example. What are you assuming is the reward function in your example?

In this comment I gave an example of a thing you might want an agent to do which seems awfully hard to incentivize via a reward function, even if it's an AGI that (you might think) doesn't "die" and lose its (within-lifetime) memory like animals do.

I think that if the AGI has a perfect motivation system then we win, there's no safety problem left to solve. (Well, assuming it also remains perfect over time, as it learns new things and thinks new thoughts.) (See here for the difference between motivation and reward.)

I suspect that, in principle, any possible motivation system (compatible with the AGI's current knowledge / world-model) can be installed by some possible reward signal. But it might be a reward signal that we can't calculate in practice—in particular, it might involve things like "what exactly is the AGI thinking about right now", which require as-yet-unknown advances in interpretability and oversight. The best motivation-installation solution might involve both rewards and non-reward motivation-manipulation methods, maybe. I just think we should keep an open mind. And that we should be piling on many layers of safety.

Right at the start under "How to use this book", there is this paragraph:

If you have never been to a CFAR workshop, and don’t have any near-term plans to come to one, then you may be the kind of person who would love to set their eyes on a guide to improving one’s rationality, full of straightforward instructions and exercises on how to think more clearly, act more effectively, learn more from your experiences, make better decisions, and do more with your life. This book is not that guide (nor is the workshop itself, for that matter). I
... (read more)
habryka
I think it's a decent thing to do, and I did the same myself a few years ago before I attended a workshop. I think it was reasonably useful. I did also try to teach a lot of the material in the handbook and on LessWrong to my friends, which I found a lot more useful for actually understanding the stuff.