Anthropic's approach doesn't seem to have panned out
Please don't take that tweet as evidence that mech interp is doomed! Much attention is on sparse autoencoders nowadays, which seem like a cool and promising approach.
Thanks! I will separately say that I disagree with the statement, regardless of whether you're treating my tweet as evidence.
In what sense do you consider the mech interp paradigm that originated with Olah to be working?
We are finding a bunch of insights about the internal features and circuits inside models that I believe to be true, and developing useful techniques, like sparse autoencoders and activation patching, that expand the space of what we can do. We're starting to see signs of life when it comes to actually doing things with mech interp, though it's early days. I think skepticism is reasonable, and we're still far from actually mattering for alignment, but I feel like the field is making real progress and is far from failed.
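(For readers unfamiliar with the technique: a sparse autoencoder here is trained to reconstruct a model's internal activations through an overcomplete hidden layer with a sparsity penalty, so that individual hidden units tend to align with interpretable features. A minimal sketch in PyTorch, where the dimensions, penalty weight, and random stand-in activations are illustrative assumptions, not any lab's actual setup:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 sparsity penalty on hidden activations."""
    def __init__(self, d_model=512, d_hidden=2048):  # d_hidden >> d_model (overcomplete)
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        h = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(h), h

# Illustrative training step: reconstruct activations, penalize dense codes.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                      # assumed sparsity weight

acts = torch.randn(64, 512)          # stand-in for residual-stream activations
recon, h = sae(acts)
loss = ((recon - acts) ** 2).mean() + l1_coeff * h.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The interpretability work then happens downstream of training: inspecting which inputs make each hidden unit fire, and whether that pattern corresponds to a human-legible feature.)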
I think Tsvi is quite mistaken about the speed at which AGI is likely to develop. I expect AGI by 2028 with ~95% probability. He does not. Maybe we should dialogue about this?
Sure, though if you're just going to say "I know how to do it! Also I won't tell you!" then it doesn't seem very pointful?
"Endpoints are easier to predict than trajectories"; eventual singularity is such an endpoint; on our current trajectory, the person who is going to do it does not necessarily know they are going to do it until it is done.
I don't know if you're a woman, but the women I know have had much more success in politics than the men I know.
Not a woman, sadly.
I believe it, especially if one takes a view of "success" that's about popularity rather than fiat power.
But FYI to future advisors: the thing I would want to prospectively optimize for along the gov path, when making this decision, is fiat power. I'm highly uncertain about whether viable paths exist from a standing start to [benevolent] bureaucratic fiat power over AI governance, and if so, where those viable paths originate.
If it were just about reach, I'd probably look for a columnist position instead.
I don't have much in the way of good ideas for you to try next. I will, however, link you to my viewpoint on what the next few years probably look like.
Hi! I'm Lorec, AKA Mack. I made this post 3 years ago:
Wanted: Foom-scared alignment research partner
I met some great people, but we never got much of anywhere.
Since then, technical alignment research in general also has not gotten much of anywhere [ counterexample; other strongish counterexamples I know of include the Visible Thoughts idea and Pliny's approach; Anthropic's approach doesn't seem to have panned out ] and AI doom aversion policy has become a thing.
I made a Discord a while ago for discussion of doom aversion methods. We were some of the first people [to my knowledge] talking positively about SB-1047. I consider it a failure: we were early and correct, but because we were not plugged into any network, nothing came of it.
I am indifferent between technical and policy work, except to the extent that [ the effectiveness factor over the risk factor of [technical work in general] ] differs from [ the effectiveness factor over the risk factor of [policy work in general] ].
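A minimal formalization of that criterion, assuming effectiveness and risk can each be scalarized ($E$ and $R$ are my shorthand, not standard terms):

$$\text{prefer technical work} \iff \frac{E_{\text{tech}}}{R_{\text{tech}}} > \frac{E_{\text{policy}}}{R_{\text{policy}}}$$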
Factors I consider, coming in, to be important contributors to the technical-versus-policy weigh-in:
Pro Technical
- Can potentially have low safety risks if the researcher knows exactly what they are doing and does not use their employer's money to contribute to capabilities
- Can potentially have high safety upsides if the researcher knows exactly what they are doing and is a paranoid saint and can work without ever posting their exciting intermediate results on social media [difficulty level: impossible]
- Technical experience lends [any] policy credibility, while policy experience does not lend technical credibility
Pro Policy
- Fairly safe, for people who have a reasonable level of knowing what they are doing
- Policy jobs [from my faraway position; this might be wrong] seem likely to be more fungible [with each other] than technical jobs, resulting in less risk of being locked into one employer whose mission I find myself disagreeing with
- I expect to have an easier time getting one of these kinds of jobs; while I consider myself decent enough at programming to be qualified in principle for such technical alignment research as is hiring, in practice I have no degree, job history, or portfolio, and am done wasting my time trying to acquire them, like, no, really, done. End of story.
Who should I talk to? What movements or orgs should I look into? Where are Things Happening the most? As stated in the title, all my spoons are available for this, provided I find something that's actually high prospective impact and low prospective risk.
I appreciate your time and consideration.