Funding someone to set up more contact with whoever is working on AI risk in China. I recall Jeffrey Ding mentioned that Tencent has some people articulating concerns over AI in his paper Deciphering China's AI Dream. I've asked Dr. Ding about this; we'll see if he replies.
I'm also working on surveying the top 0.1% of mathematicians to find how much $ would be required to get them to work on AI safety happily for a while, or to join a workshop for a week. I think these questions are more likely to get a serious answer in person. This is obviously parallelizable, requires some consistency in the survey questions, and is worth co-ordinating on. I'd like to organise EAs on this topic, and maybe even pay people in local communities who might otherwise not be bothered to go in person. Better that than travelling there myself and trying to leverage the communities' connections to get some time with these mathematicians.
Also, the more we interview, the more people we can namedrop and hence the more respectable our endeavour will appear.
Do you have a survey or are you just doing them personally?
One concern is not having well-specified problems in their specific expertise (e.g. we don't have mesa-optimizers specified as a problem in number theory, and it may not actually be useful to do so), so there's an onboarding process. Or a level of vetting/trust that some of those picked can understand the core difficulty and go back and forth from formalization to actual-problem-in-reality.
Having both more ELK-like questions and a set of lit reviews for each subfield would help. It'd be even better if someone specifically formalized these problems in different math fields (where it made sense to do so), but that already requires someone knowledgeable in the field to do the formalizing. So a bit of iterative bootstrapping would be useful.
I'm devising the survey and thinking about how to approach these people. My questions would probably be of the form:
How much would it take for you to attend a technical workshop?
How much to take a sabbatical to work in a technical field?
How much for you to spend X amount of time on problem Y?
Yes, we do need something they can work on. That's part of what makes the survey tricky: I expect asking someone to "work on problem X which is relevant to your field" vs. "work on problem Y that you know nothing about, and attend a workshop to get you up to speed" would result in very different answers. And knowing which questions to ask requires a fair bit of background knowledge in both AI safety and the mathematician's subfield, which limits the pool of people who can sensibly work on this.
Which is why trying to parallelise things, and perhaps setting up a group where we can discuss targets and how best to approach them, would be useful. I'd also like to be able to ask AI safety folks who they think we should go after and which problems we should present, as well as perhaps organising some background reading for these mathematicians, since we want to get them up to speed as quickly as possible.
I just want to share that I've updated my view concerning Convincing Capability Research quite a bit since we spoke about it last week.
At the time my view was that you would get the best results (measured in probability of persuasion) from persuasion techniques that exploit people's biases (e.g. the stuff in Cialdini's Influence). Since then I've read Galef's Scout Mindset, and I now think aiming for mutual understanding is a more effective route to persuasion (both scout mindset and street epistemology promote this as the way to actually change people's minds).
In particular, I am now ranking you much higher in fitness for this task than I did last week.
Thanks! :)
I’ve recently talked to students at Harvard about convincing people about alignment (I’m imagining cs/math/physics majors) and how that’s hard because it’s a little inconvenient to be convinced. There were a couple of bottlenecks here:
For both, training people to do that and creating curriculums/workshops would be useful. I don’t think you can create this without going out and trying it out on real, intelligent people.
This could then be used for capability researchers in general.
Literature reviews for each AI Safety Research Agenda
Fundability
AI Safety Conference
We don't have a conference. Let's make one. One could argue that it's better to be connected to the broader AI/ML/etc. communities by publishing in their journals, but why not do both? I think this is possible as long as this conference doesn't have proceedings. From NeurIPS:
The value of making connections and discussing research is well worth it.
Fundability
I think this is a great idea/generally fundable, and one way to help signal competence is previous experience organizing conferences or EA Global.
Alignment Forum Commenters W/ Language Models
Fundability
Bounty for ELK-like Questions
Fundability
Convincing Capability Researchers
Fundability
Thoughts on Funding
Feedback