Alignment Megaprojects: You're Not Even Trying to Have Ideas

Nicholas / Heather Kross

55 Alignment Megaprojects: You're Not Even Trying to Have Ideas

by Nicholas / Heather Kross

12th Jul 2023

2 min read

55

Consider the state of funding for AI alignment.

Is the field more talent-constrained, or funding-constrained?

I think most existing researchers, if they take the AI-based extinction-risk seriously, think it's talent constrained.

I think the bar-for-useful-contribution could be so high, that we loop back around to "we need to spend more money (and effort) on finding (and making) more talent". And the programs to do those may themselves be more funding-constrained than talent-constrained.

Like, the 20th century had some really good mathematicians and physicists, and the US government spared little expense towards getting them what they needed, finding them, and so forth. Top basketball teams will "check up on anyone over 7 feet that’s breathing".

Consider how huge Von Neumann's expense account must've been, between all the consulting and flight tickets and car accidents. Now consider that we don't seem to have Von Neumanns anymore. There are caveats to at least that second point, but the overall problem structure still hasn't been "fixed".

Things an entity with absurdly-greater funding (e.g. ~~the US Department of Defense~~ the US deferal government in a non-military-unless-otherwise-stated capacity) could probably do, with their absurdly-greater funding and probably coordination power:

Indefinitely-long-timespan basic minimum income for everyone who is working on solely AI alignment.
Coordinating, possibly by force, every AI alignment researcher and aspiring alignment researcher on Earth to move to one place that doesn't have high rents like the Bay. Possibly up to and including creating that place and making rent free for those who are accepted in.
Enforce a global large-ML-training shutdown.
An entire school system (or at least an entire network of universities, with university-level funding) focused on Sequences-style rationality in general and AI alignment in particular.
Genetic engineering, focused-training-from-a-young-age, or other extreme "talent development" setups.
Deeper, higher-budget investigations into how "unteachable" things like security mindset really are, and how deeply / quickly you can teach them.
Any of the above ideas, but with a different tradeoff on the Goodharting-vs-missed-opportunities continuum.
All of these at once.

I think the big logistical barrier here is something like "LTFF is not the U,S government", or more precisely "nothing as crazy as these can be done 'on-the-margin' or with any less than the full funding". However, I think some of these could be scaled down into mere megaprojects or less. Like, if the training infrastructure is bottlenecked on trainers, then we need to fund indirect "training" work just to remove the bottleneck on the bottleneck of the problem. (Also, the bottleneck is going to move at least when you solve the current bottleneck, and also "on its own" as the entire world changes around you).

Also... this might be the first list of ideas-in-precisely-this-category, on all of LessWrong/the EA Forum. (By which I mean "technical AI alignment research projects that you could fund, without having to think about the alignment problem itself in much detail beyond agreeing with 'doom could actually happen in my lifetime', if funding really wasn't the constraint".)

AI Alignment FieldbuildingAI

Frontpage

55

Mentioned in

42Upgrading the AI Safety Community

36Why We Need More Shovel-Ready AI Notkilleveryoneism Megaproject Proposals

New Comment

32 comments, sorted by

top scoring

Click to highlight new comments since: Today at 12:43 AM

[-]Lao Mein2y245

I've heard that it is very difficult to get funding unless you have a paradigmatic idea, and you can't get a job without good technical AI skills. But many people who skill up to get a job in technical alignment end up doing capabilities work because they can't find employment in AI Safety, or the existing jobs don't pay enough. Apparently, this was true for both Sam Altman and Demis Hassabis? I've also experienced someone discouraging me from acquiring technical AI skills for the purpose of pursuing a career in technical alignment because they don't want me to contribute to capabilities down the line. They noted that most people who skill up to work on alignment end up working in capabilities instead, which is kinda crazy.

My thinking is that I am just built different and will come up with a fundable paradigmatic idea where most fail. But yeah, the lack of jobs heavily implies that the field is funding-constrained because talent wants to work on alignment.

[-]RobertM2y79

But many people who skill up to get a job in technical alignment end up doing capabilities work because they can't find employment in AI Safety, or the existing jobs don't pay enough. Apparently, this was true for both Sam Altman and Demis Hassabis?

This seems like it's probably a misunderstanding. With the exception of basically just MIRI, AI alignment didn't exist as a field when DeepMind was founded, and I doubt Sam Altman ever actively sought employment at an existing alignment organization before founding OpenAI.

But yeah, the lack of jobs heavily implies that the field is funding-constrained because talent wants to work on alignment.

I think the current position of most grantmakers is that they're bottlenecked on fundable +EV opportunities with respect to AI x-risk, not that they have a bunch of +EV opportunities that they aren't funding because they fall below some threshold (due to funding constraints). This is compatible with some people who want to work on AI x-risk not receiving funding - not all proposals will be +EV, and those which are +EV aren't necessarily so in a way which is legible to grantmakers.

Keep in mind that "will go on to do capabilities work" isn't the only -EV outcome; each time you add a person to the field you increase the size of the network, which always has costs and doesn't always have benefits.

[-]Lao Mein2y80

This seems like it's probably a misunderstanding. With the exception of basically just MIRI, AI alignment didn't exist as a field when DeepMind was founded, and I doubt Sam Altman ever actively sought employment at an existing alignment organization before founding OpenAI.

Yeah, in hindsight he probably meant that they got interested in AI because of AI safety ideas, then they decided to go into capabilities research after upskilling. Then again, how are you going to get funding otherwise, charity? It seems that a lot of alignment work, especially the conceptual kind we really need to make progress toward an alignment paradigm, is just a cost for an AI company with no immediate upside. So any AI alignment org would need to pivot to capabilities research if they wanted to scale their alignment efforts.

Keep in mind that "will go on to do capabilities work" isn't the only -EV outcome; each time you add a person to the field you increase the size of the network, which always has costs and doesn't always have benefits.

I strongly disagree. The field has a deficit of ideas and needs way more people. Of course inefficiencies will increase, but I can't think of any other field that progressed faster explicitly because members made an effort to limit recruitment. Note that even very inefficient fields like medicine make faster progress when more people are added to the network - it would be very hard to argue, for example, that a counterfactual world where no one in China did medical research would have made more progress. My personal hope is 1 million people working on technical alignment which implies $100 billion+ annual funding. 10x that would be better, but I don't think it's realistic.

[-]RHollerith4mo20

I’ve also experienced someone discouraging me from acquiring technical AI skills for the purpose of pursuing a career in technical alignment because they don’t want me to contribute to capabilities down the line. They noted that most people who skill up to work on alignment end up working in capabilities instead.

I agree with this. I.e., although it is possible for individual careers in technical alignment to help the situation, most such careers have the negative effect of speeding up the AI juggernaut without any offsetting positive effects. I.e., the fewer people trained in technical alignment, the better.

[-]Nicholas / Heather Kross2y1414

Much cheaper, though still hokey, ideas that you should have already thought of at some point:

A "formalization office" that checks and formalizes results by alignment researchers. It should not take months for a John Wentworth result to get formalized by someone else.
Mathopedia.
Alignment-specific outreach at campuses/conventions with top cybersecurity people.

[-]Stephen McAleese2y116

Thanks for the post. I think it's a valuable exercise to think about how AI safety could be accelerated with unlimited money.

I think the Manhattan Project idea is interesting but I see some problems with the analogy:

The Manhattan Project was originally a military project and to this day, the military is primarily funded and managed by the government. But most progress in AI today is made by companies such as OpenAI and Google and universities like the University of Toronto. I think a more relevant project is CERN because it's more recent and focused on the non-military development of science.
The Manhattan Project happened a long time ago and the world has changed a lot since then. The wealth and influence of tech companies and universities is probably much greater today than it was then.
It's not obvious that a highly centralized effort is needed. The Alignment Forum, open source developers, and the academic research community (e.g. the ML research community) are examples of decentralized research communities that seem to be highly effective at making progress. This probably wasn't possible in the past because the internet didn't exist.

I highly doubt that it's possible to recreate the Bay Area culture in a top-down way. I'm pretty sure China has tried this and I don't think they've succeeded.

Also, I think your description is overemphasizing the importance of geniuses like Von Neumann because 130,000 other people worked on the Manhattan Project too. I think something similar has happened at Google today where Jeff Dean is revered but in reality, I think most progress at Google is done by the tens of thousands of the smart but not genius dark matter developers there.

Anyway, let's assume that we have a giant AI alignment project that would cost billions. To fund this, we could:

Expand EA funding substantially using community building.
Ask the government to fund the project.

The government has a lot of money but it seems challenging to convince the government to fund AI alignment compared to getting funding from EA. So maybe some EAs with government expertise could work with the government to increase AI safety investment.

If the AI safety project gets EA funding, I think it needs to be cost-effective. The reality is that only ~12% of Open Phil's money is spent on AI safety. The reason why is that there is a triage situation with other cause areas like biosecurity, farm animal welfare, and global health and development so the goal is to find cost-effective ways to spend money on AI safety. The project needs to be competitive and has more value on the margin than other proposals.

In my opinion, the government projects that are most likely to succeed are those that build on or are similar to recent successful projects and are in the Overton window. For example:

AI Centres for Doctoral Training in the UK: funding PhD students in the UK to work on AI projects such as AI safety.
The NSF Safe Learning-Enabled Systems: US government funding for academic research groups and non-profits to work on AI safety.

My guess is that leveraging academia would be effective and scalable because you can build on the pre-existing talent, leadership, culture, and infrastructure. Alternatively, governments could create new regulations or laws to influence the behavior of companies (e.g. GDPR). Or they could found new think tanks or research institutes possibly in collaboration with universities or companies.

As for the school ideas, I've heard that Lee Sedol went to a Go school and as you mentioned, Soviet chess was fueled by Soviet chess programs. China has intensive sports schools but I doubt these kinds of schools would be considered acceptable in Western countries which is an important consideration given that most of AI safety work happens in Western countries like the US and UK.

In science fiction, there are even more extreme programs like the Spartan program in Halo where children were kidnapped and turned into super soldiers, or Star Wars where clone soldiers were grown and trained in special facilities.

I don't think these kinds of extreme programs would work. Advanced technologies like human cloning could take decades to develop and are illegal in many countries. Also, they sound highly unethical which is a major barrier to their success in modern developed countries like the US and especially EA-adjacent communities like AI safety.

I think a more realistic idea is something like the Atlas Fellowship or SERI MATS which are voluntary programs for aspiring researchers in their teens or twenties.

The geniuses I know of that were trained from an early age in Western-style countries are Mozart (music), Von Neumann (math), John Stuart Mill (philosophy), and Judit Polgár (chess). In all these cases, they were gifted children who lived in normal nuclear families and had ambitious parents and extra tutoring.

[-]Roman Leventov2y10

It's not obvious that a highly centralized effort is needed. The Alignment Forum, open source developers, and the academic research community (e.g. the ML research community) are examples of decentralized research communities that seem to be highly effective at making progress.

Open-source development is debatable, but the academic research community and especially the alignment forum are paradigmatic examples of ineffective forms of human organisation (if the goal is real progress). And in both these cases, most real progress happens at labs anyway, i.e., organised groups of people.

[-]Stephen Fowler2y*100

My experience has been that we are clearly funding constrained, particularly in a post FTX world. This makes it hard to recruit top talent and is likely driving people out of alignment research.

(All opinions expressed are my own. Details of examples have been changed. None of these stories directly concern experiences with my own funding. I currently have active grant applications which has incentivised me to make this comment less harsh than it probably should be.)

I'm aware of researchers who have turned down extremely prestigious careers to pursue alignment, been promised certain amounts of funding and then had that funding substantially delayed.

I'm aware of researchers with funding giving cash loans to other researchers who were experiencing funding delays.

Such an environment does not bode well for convincing people to stick to alignment. No adult with responsibilities feels secure working contract to contract.

Beyond that, alignment salaries are smaller than what you would make in tech. This makes it difficult to poach senior talent from FAAMG.

[-]Joachim Bartosik2y50

Indefinitely-long-timespan basic minimum income for everyone who

Looks like part of the sentence is missing

[-]Nicholas / Heather Kross2y20

Thank you, fixed.

[-]mishka2y54

move to one place

No, we need to maintain diversity and enough decentralization (e.g. because of the need for resilience against a single missile nuclear attack (a single super-valuable site is very vulnerable)).

Moreover, since "AI existential safety" is a preparadigmatic field, it is particularly important to be able to explore various non-standard ideas (government control, and, especially, military control is not conductive in this sense).

There is a much cheaper idea (which still needs some money and, more crucially, needs some extremely strong organizational skills): promote creation of small few-person short-term collaborations exploring various non-standard ideas in "AI existential safety" (basically, let's have a lot of brainstorms, let's have infrastructure to support those brainstorms and to support sharing their fruits, let's figure out how to really make rapid progress here in a fashion which is a network and not a hierarchy (the problem is probably too difficult to be rapidly solved by a hierarchy)).

That would generate novel ideas. (We just need to keep in mind that anything which actually works in "AI alignment" (or, more generally, in any approach to "AI existential safety", whether alignment-based or not) is highly likely to be dual use and a major capability booster. No one knows how to handle this correctly.)

we don't seem to have Von Neumanns anymore

Oh, but we do. For example (and despite my misgivings about OpenAI current approach to alignment), Ilya Sutskever is of that caliber (AlexNet, GPT-3, GPT-4, and a lot of other remarkable results speak for themselves). And now he will focus on alignment.

That being said, genius scientists are great for doing genius things, but not necessarily the best policy decision makers (e.g. von Neumann strongly advocated preventive nuclear attack against Soviet Union, and Oppenheimer after opposing thermonuclear weapons actually suggested to use an early prototype thermonuclear device in the Korean war).

So technical research and technical solutions is one thing, but decision-making and security-keeping is very different (seem to require very different people with very different skills).

[-]Nicholas / Heather Kross2y30

I agree that more diverse orgs is good, heck I'm trying to do that on at least 1-2 fronts rn.

I'm not as up-to-date on key AI-researcher figures as I prolly should be, but big-if-true is Ilya is really JVN-level and is doing alignment and works at OpenAI, that's a damn good combo for at least somebody to have.

[-]mishka2y32

Yes, assuming the first sentence of their overall approach is not excessively straightforward

We need scientific and technical breakthroughs to steer and control AI systems much smarter than us

It might be that a more subtle approach than "steer and control AI systems much smarter than us" is needed. (But also, they might be open to all kinds of pivoting on this.)

:-) Well, Ilya is not from Hungary, he has been born in Gorky, but otherwise he is a total and obvious first-rate Martian :-)

[-]Nicholas / Heather Kross2y20

Another project idea that should at least be written down somewhere Official, even if it's never implemented.

[-]TAG2y*00

You seem to be in the mindset that everything is as EY/LW says ... but there is precious little evidence for that outside the echo chamber.

An entire school system (or at least an entire network of universities, with university-level funding) focused on Sequences-style rationality in general and AI alignment in particular.

Is there evidence that extreme rationality works? (against ) Is there evidence that the people with real achivemnents -- LeCunn, Ng etc -- are actually crippled by lack of rationality? Can you teach alignment separately from AI? (cf can you teach safety engineering to someone who doesn't know engineering?)

Indefinitely-long-timespan basic minimum income for everyone who is working on solely AI alignment.

How do you separate people who are actually working on alignment from scammers? How do you motivate them to produce results with an unconditional, indefinite term payment? Would minimum income be enough to allow them to buy equipment and hire assistants? (Of course, all these problems are solved by conventional research grants).

Again, you seem to be making the High Rationalist assumption that alignment is a matter of some unqualified person sitting in a chair thinking, not of a qualified person doing practical work.

[-]Ulisse Mini2y*03

EDIT: I think this comment was overly harsh, leaving it below for reference. The harsh tone was contributed from being slightly burnt out from feeling like many people in EA were viewing me as their potential ender wiggin, and internalizing it.^[1]

The people who suggest schemes like what I'm criticizing are all great people who are genuinely trying to help, and likely are.

Sometimes being a child in the machine can be hard though, and while I think I was ~mature and emotionally robust enough to take the world on my shoulders, many others (including adults) aren't.

An entire school system (or at least an entire network of universities, with university-level funding) focused on Sequences-style rationality in general and AI alignment in particular.

[...]

Genetic engineering, focused-training-from-a-young-age, or other extreme "talent development" setups.

Please stop being a fucking coward speculating on the internet about how child soldiers could solve your problems for you. Enders game is fiction, it would not work in reality, and that isn't even considering the negative effects on the kids. You aren't smart enough for galaxy brained plans like this to cause anything other than disaster.

In general rationalists need to get over their fetish for innate intelligence and actually do something instead of making excuses all day. I've mingled with good alignment researchers, they aren't supergeniuses, but they did actually try.

(This whole comment applies to Rationalists generally, not just the OP.)

I should clarify this mostly wasn't stuff the atlas program contributed to. Most of the damage was done from my personality + heroic responsibility in rat fiction + dark arts of rationality + death with dignity post. Nor did atlas staff do much to extenuate this, seeing myself as one of the best they could find was most of it, cementing the deep "no one will save you or those you love" feeling. ↩︎

[-]Nicholas / Heather Kross2y50

I... didn't mention Ender's Game or military-setups-for-children. I'm sorry for not making that clearer and will fix in the main post. Also, I am try to do something instead of solely complaining (I've written more object-level posts and applied for technical-research grants for alignment).

There's also the other part that, actually, innate intelligence is real and important and should be acknowledged and (when possible) enhanced and extended, but also not used as a cudgel against others. I honestly think that most of the bad examples "in" the rationality community are on (unfortunately-)adjacent communities like TheMotte and sometimes HackerNews, not LessWrong/EA Forum proper.

[-]Ulisse Mini2y3-14

Sorry, I was more criticizing a pattern I see in the community rather than you specifically

However, basically everyone I know who takes innate intelligence as "real and important" is dumber for it. It is very liable to mode collapse into fixed mindsets, and I've seen this (imo) happen a lot in the rat community.

(When trying to criticize a vibe / communicate a feeling it's more easily done with extreme language, serializing loses information. sorry.)

[-]Noosphere892y2-2

However, basically everyone I know who takes innate intelligence as "real and important" is dumber for it. It is very liable to mode collapse into fixed mindsets, and I've seen this (imo) happen a lot in the rat community.

To the extent that that this is actually true, I suspect it comes down to underrating luck as a factor, which I could definitely see as a big problem, and not understanding that general innate intelligence isn't widely distributed (such that even selecting pretty hard for general innate intelligence will at best get you an OOM better than average, if a supergenius and a ridiculous outlier, with the real life attempts being at best 2-3x median human, and that's being generous.)

In essence, I think general, innate intelligence is real, it matters, but compared to luck or non-intelligence factors, it's essentially a drop in the ocean and rationalists overrate it a lot.

[-]Nicholas / Heather Kross2y20

I disagree quite a bit with the pattern of "there's this true thing, but everyone around me is rounding it off to something dumb and bad, so I'm just gonna shout that the original thing is not-true, in hopes people will stop rounding-it-off".

Like, it doesn't even sound like you think the "real and important" part is false? Maybe you'd disagree, which would obviously be the crux there, but if this describes you, keep reading:

I don't think it's remotely intractable to, say, write a LessWrong post that actually convinces lots of the community to actually change their mind/extrapolation/rounding-off of an idea. Yudkowsky did it (as a knowledge popularizer) by decoupling "rationality" from "cold" and "naive". Heck, part of my point was that SSC Scott has written multiple posts doing the exact thing for the "intelligence" topic at hand!

I get that there's people in the community, probably a lot, who are overly worried about their own IQ. So... we should have a norm of "just boringly send people links to posts about [the topic-and-hand] that we think are true"! I'm sure, if someone wrote or dug up a good post about [why not to be racist/dickish/TheMotte about innate intelligence], we should link the right people that, too.

In four words: "Just send people links."

[-]Nicholas / Heather Kross2y20

I agree with the meta-point that extreme language is sometimes necessary (the paradigmatic example imho being Chomsky's "justified authority" example of a parent yelling at their kid to get out of the road, assuming the yell and/or swear during it), good on you for making that decision explicit here.

[-]Elizabeth2y*21

I upvoted this to get it out of the negative, but also marked it as unnecessarily combative. I think a lot of the vitriol is deserved by the situation as a whole but not OP in particular.

[-]mako yass2y31

Vitriol isn't useful. Most of what they were saying was obviously mindkilled bullshit (accusation of cowardice, "fetish", "making excuses"). I encourage Ulisse to try to articulate their position again when they're in less of a flaming asshole mood.

[-]Ulisse Mini2y3-5

I wasn't in a flaming asshole mood, it was a deliberate choice. I think being mean is necessary to accurately communicate vibes & feelings here, I could serialize stuff as "I'm feeling XYZ and think this makes people feel ABC" but this level of serialization won't activate people's mirror neurons & have them actually internalize anything.

Unsure if this worked, it definitely increased controversy & engagement but that wasn't my goal. The goal was to shock one or two people out of bad patterns.

[-]Nicholas / Heather Kross2y20

I think there's probably something to the theory driving this, but 2 problems:

It seems half-baked, or half-operationalized. Like, "If I get them angry at my comment, then they'll really feel the anger that [person] feels when hearing about IQ!". No, that makes most people ignore you or dig in their heels. If I were using "mirror neurons, empathy, something..." to write a comment, it'd be like a POV story of being told "you're inherently inferior!" for the 100th time today. It'd probably be about as memetically-fit, more helpful, and even more fun to write!

Related story, not as central: I used to, and still sometimes do, have some kind of mental bias of "the angrier someone is while saying something, it must have more of The Truth" in it. The object-level problems with that should be pretty obvious, but the meta-level problem is that different angry people still disagree with each other. I think there is a sort of person on LessWrong who might try steelmanning your view. But... you don't give them much to go off of, not even linking to relevant posts against the idea that innate intelligence is real and important.

LessWrong as-a-whole is place where we ought to have, IMHO, norms of this place is okay to be honest in. You shouldn't start a LessWrong comment by putting on your social-engineer hat and saying "Hmmm, what levers should I pull to get the sheep to feel me?". And, as noted in (1), this precise example probably didn't work, and shouldn't be the kind of thing that works on LessWrong.

[Less central: In general, I think that paying attention to vibes is considerate and good for lots of circumstances, but that truth-seeking requires decoupling, and that LessWrong should at-its-core be about truth-seeking. If I changed my mind on this within about a week, I would probably change the latter belief, but not the former.]

I admire your honesty (plain intention-stating in these contexts is rare!), and hope this feedback helps you and/or others persuade better.

(I also have angrier vibes I could shout at you, but they're pretty predictable given what I'm arguing for, and basically boil down to "

[-]Noosphere892y2-7

To be fair here, part of the problem is more so that innate intelligence does exist, but is on a normal distribution, not a power law distribution, so you can't have massive differences in innate intelligence being the dominant factor for success.

[-]habryka2y3112

IQ is on a normal distribution because we force it to be normalized that way. Task performance tends to vary by large factors, resembling something closer to a log normal or exponential distribution, suggesting intelligence is indeed heavy tailed.

[-]Elizabeth2y102

I'd love to see a top level post laying this out, it seems like it's been a crux in a few recent discussions.

[-]habryka2y62

It sure has come up frequently enough that I've been thinking about writing this post. I hope I'll get around to it, but would also greatly appreciate anyone else familiar with the literature here to write something.

[-]Noosphere892y2-2

A crux here is that I think there are reasons beyond defining it to be normal that the normal distribution prevails, and the biggest reason for this is that I generally model the contributions of human intelligence as additive, not an AND function and in particular that they are independent, that is one gene for intelligence can do it's work without requiring any other genes. This basically lets us construct the normal distribution, and explains why it's useful to model it as a normal distribution.

As far as the result that task performance is heavy tailed, another consistent story is that what's going on is people get mostly lucky, and then post-hoc a story about how their innate intelligence/sheer willpower made them successful, and this is important, since I suspect it's the most accurate story given the divergence of us being normal, but the world is extreme.

[-]habryka2y140

A lot of genes have multiplicative effects instead of additive effects. E.g. vegetable size is surprisingly log-normally distributed, not normally distributed, so I don't think you should have a huge prior on normal here. See also one of my favorite papers of all time "Log-Normal Distributions Across The Sciences".

[-]Noosphere894mo20

In retrospect, I've come to agree more on this since we last debated, and I now think genetic effects are log-normally distributed, and I think you were directionally correct here (though I do still think that there's a significant chance that what's going on is people get mostly lucky, and then post-hoc a story about how their innate intelligence/sheer willpower made them successful, and this is important, because I do think the world in general is way more extreme than human genetics/traits.)

Thanks to @tailcalled for convincing me I was wrong here:

https://www.lesswrong.com/posts/yJEf2TpPJstfScSnt/ldsl-1-performance-optimization-as-a-metaphor-for-life#CSCLkNhzzc5hqYM3n

Moderation Log