I've updated toward the views Daniel expresses here, and I'm now about halfway between Ajeya's views in this post and Daniel's (in geometric mean).
I'm curious what the biggest factors were that made you update?
Regarding our career development and transition funding (CDTF) program:
Just wanted to flag quickly that Open Philanthropy's GCR Capacity Building team (where I work) has a career development and transition funding program.
The program aims to provide support—in the form of funding for graduate study, unpaid internships, self-study, career transition and exploration periods, and other activities relevant to building career capital—for individuals at any career stage who want to pursue careers that could help reduce global catastrophic risks (esp. AI risks). It’s open globally and operates on a rolling basis.
I realize that this ...
Thanks for the feedback! I’ll forward it to our team.
I think I basically agree with you that from reading the RFP page, this project doesn’t seem like a central example of the projects we’re describing (and indeed, many of the projects we do fund through this RFP are more like the examples given on the RFP page).
Some quick reactions:
I work at Open Philanthropy, and I recently let Gavin know that Open Phil is planning to recommend a grant of $5k to Arb for the second project on your list: Overview of AI Safety in 2024 (they had already raised ~$10k by the time we came across it). Thanks for writing this post, Austin; it brought the funding opportunity to our attention.
Like other commenters on Manifund, I believe this kind of overview is a valuable reference for the field, especially for newcomers.
I wanted to flag that this project would have been eligible for our RFP for work tha...
One-paragraph summary: we (two recent graduates) spent about half of the summer exploring the idea of starting an organisation producing custom human-generated datasets for AI alignment research. Most of our time was spent on customer interviews with alignment researchers to determine if they have a pressing need for such a service. We decided not to continue with this idea, because there doesn’t seem to be a human-generated data niche (unfilled by existing services like Surge) that alignment teams would want outsourced.
In more detail: The idea of a human datasets organisation was one of the winners of the Future Fund project ideas competition, still figures on their list of project ideas, and had been advocated before then by some people, including Beth Barnes. Even though we ended up deciding against, we think...
By "refining pure human feedback", do you mean refining RLHF ML techniques?
I assume you still view enhancing human feedback as valuable? And also more straightforwardly just increasing the quality of the best human feedback?
Amazing! Thanks so much for making this happen so quickly.
To anyone who's trying to figure out how to get it to work on Google Podcasts, here's what worked for me (searching the name didn't, maybe this will change?):
Go to the Libsyn link. Click the RSS symbol. Copy the link. Go to Google Podcasts. Click the Library tab (bottom right). Go to Subscriptions. Click the symbol in the upper right that looks like adding a link. Paste the link and confirm.
Hey Paul, thanks for taking the time to write that up, that's very helpful!
Hey Rohin, thanks a lot, that's genuinely super helpful. Drawing analogies to "normal science" seems both reasonable and like it clears the picture up a lot.
I would be interested to hear opinions about what fraction of people could possibly produce useful alignment work?
Ignoring the hurdle of "knowing about AI safety at all", i.e. assuming they took some time to engage with it (e.g. they took the AGI Safety Fundamentals course). Also assume they got some good mentorship (e.g. from one of you) and then decided to commit full-time (and got funding for that). The thing I'm trying to get at is more about having the mental horsepower + epistemics + creativity + whatever other qualities are useful, or likely being a...
That's a very detailed answer, thanks! I'll have a look at some of those tools. Currently I'm limiting my use to a particular 10-minute window per day with freedom.to + the app BlockSite. It often costs me way more than 10 minutes (checking links after, procrastinating before...) of focus though, so I might try to find an alternative.
Sorry for the tangent, but how do you recommend engaging with Twitter, without it being net bad?
Thanks! Great to hear that it's going well!
Maybe I'm being stupid here. On page 42 of the write-up, it says:
In order to ensure we learned the human simulator, we would need to change the training strategy to ensure that it contains sufficiently challenging inference problems, and that doing direct translation was a cost-effective way to improve speed (i.e. that there aren’t other changes to the human simulator that would save even more time). [emphasis mine]
Shouldn't that be?
In order to ensure we learned the direct translator, ...
Relevant classic paper from Steven Levitt. Abstract [emphasis mine]: