I often talk to developers who would prefer not to accidentally destroy the world (specifically, by accelerating AGI risk), but neither they nor I can tell whether specific companies fall into that category.

Could someone knowledgeable help? A few short replies could probably change someone's career decisions.

 

Can you help with future questions?

Please subscribe to this comment. I'll reply to it only when there's a new open question.

Thank you!

 

Adding: Reply anonymously here


I think the cool thing that could come out of this post is "a bit more consensus about the social value of working various places." For example, if it turns out that everyone thinks "probably you shouldn't work at a tobacco company", then maybe fewer people will go there accidentally, and the people who are at those companies might think harder about their choices.

But I worry that this post is set up in a way that leads to polarization instead of consensus; if it is cheap to say "working at company X is probably bad/evil/etc." and it is expensive to say the opposite (so people don't respond or get into the discussion), then we would end up with a false consensus among LW readers that various companies are bad, and a false consensus among people at various companies that LW isn't worth engaging with (and multiple disagreeing false consensuses seem like the typical form of polarization to me).

So I think it'd be worth putting some meta thought into the question of how we could make this post a place where people who are optimistic about any of the mentioned companies feel hopeful about sharing their opinions and reaching consensus.

[There is, of course, a background fact that it is generally harder to argue for incorrect positions than for correct ones, and someone's fear of getting into an argument about something may really be a fear that they're incorrect. I think it is a mistake to assume that unwillingness to engage is generally of that form, rather than first checking that you haven't in fact made any mistakes that could disconnect the flow of information or give people a sensible lack of hope in the process.]

I invite people to reply anonymously here

I'll add the link to the post

Anonymous replies can help, but also check out these two threads by lc and Logan Zoellner, where they have very different views and basically are replying to each other with "it seems like you're doing the thing that's the opposite of helping." What are their cruxes? How can we keep that line of communication open, instead of people getting nastier and more disconnected? [It seems to me like so far the two are 'making actual arguments', but also I have a suspicion that it will get 10-20% worse with each reply, and that will mean we don't actually have the space to get all the way to the ground, or seeing those threads as "how conversation will go" will cause other people to not start threads.]

In particular, seeing a comment that starts with "I suspect my advice is the exact opposite of the Less Wrong/EY consensus, so here goes:" downvoted to invisibility seems pretty terrible from the perspective of getting all arguments represented. It's probably worth turning on the feature that lets people vote 'agree' or 'disagree' separately from 'upvote' and 'downvote', so that we can separately track "how much voters agree on something" and "how much users should prioritize reading something."

[-]lc20

I removed my strong downvote, because you're right, but I'd like to register my highly sincere disagreement here.

Two-axis voting is now activated, thanks to habryka.

IMO, most of the jobs where the employer says the job is about AI alignment or AI safety, e.g., the alignment team at OpenAI, actually contribute to AI danger.

More generally, I humbly suggest erring on the side of safety: avoid contributing not just to AI itself (except for MIRI and maybe 1 or 2 other outfits) but also to anything AI needs, such as hardware, operating systems, compilers, version-control tools, issue trackers, and development methodologies -- but only if doing so does not put you at a significant personal disadvantage relative to people with your skills who do not care about AI safety.

If, for example, you have invested 3 years of your working life in getting good at compiler development specifically, and you are not yet bored with compiler development, then the right move is to continue working on compiler development if that is the most efficient way for you to earn money, because having a higher income makes a person more effective and influential, and we want the people concerned enough about AI safety to read this thread of conversation to be effective and influential.

215 years ago in France, if your father was not part of the nobility, there were few opportunities for a young man to achieve status and material security. For most young men, the best way to achieve those things was to join Napoleon's Army, but of course Napoleon's Army was spreading death and destruction across Europe. If you know or suspect that AI research (and research and development of any computing infrastructure that supports it) is today's version of Napoleon's Army and you care about the global consequences of your actions, the right move is not to join Napoleon's Army in the hopes that you will be able to use your insider position to limit the death and destruction. The right move is to choose not to join Napoleon's Army (and to avoid spending any of your time and energy acquiring skills and knowledge whose main market is Napoleon's Army -- and to consider joining the movement to get Napoleon's Army banned by the governments of the developed world if you think that movement has a chance of succeeding).

In contrast to 215 years ago in France, choosing not to join today's version of Napoleon's Army is not even much of a personal sacrifice because of the diversity and general abundance of today's economy.

Why do you think the alignment team at OpenAI is contributing on net to AI danger?

Maybe I don't know enough about OpenAI's alignment team to criticize it in public? I wanted to name one alignment outfit because I like to be as specific as possible in my writing. OpenAI popped into my head because of the reasons I describe below. I would be interested in your opinion. Maybe you'll change my mind.

I had severe doubts about the alignment project (the plan of creating an aligned superintelligence before any group manages an unaligned one) even before Eliezer went public with his grave doubts in the fall of last year. It's not that I consider the project impossible in principle, just that it is of such difficulty that it seems unlikely that we will accomplish it before the appearance of an unaligned intelligence that kills us all. In other words, I see it as humanly possible, but probably not humanly possible quickly enough. Anna Salamon was saying in 2010 or so that alignment research (called Friendliness research or Friendly-AI research in those days IIRC) was like trying to invent differential equations before the rest of the world invents elementary algebra, which is basically the same take unless I misinterpreted her. Since then of course there has been an alarming amount of progress towards inventing the elementary algebra of her analogy.

I have no objection to people's continuing to work on alignment, and I'd offer Scott Garrabrant as an example of someone doing good work on it, but it seems unlikely that anyone employed by an organization whose main plan for getting money is to sell AI capabilities would be able to sustainably do good work on it: humans are too easily influenced by their workplaces and by the source of their personal economic security. And OpenAI's main plan for getting money is to sell AI capabilities. (They were a non-profit at their founding, but switched to being a for-profit in 2019.)

Also, at OpenAI's founding, the main plan -- the main strategy proposed for ensuring that AI will turn out well for humanity -- was for OpenAI to publish all its research! Sam Altman has walked that plan back a little, but he didn't change the name of the organization, a name that is a very concise description of the original plan, which is a sign that he doesn't really get how misguided and (unintentionally) destructive the original plan was. It is a sign because it is not uncommon for an organization to change its name when it undergoes a true change in strategy or approach.

I used to continue to see doctors who would offer what I knew was bad advice. This was my policy for about 28 years. (And during that time I saw many doctors because I have chronic health conditions. I was already an adult at the start of the 28-year interval.) As long as the doctor did not cost me much and was generally willing to order a significant fraction of the tests and prescribe a significant fraction of the drugs I asked him to order and prescribe, I tended to continue to see that doctor. I stopped doing that because I had accumulated a lot of evidence that their bad advice tended to affect my behavior (to change it to conform to the advice) even though I recognized the advice as bad as soon as it was conveyed to me.

You (the reader, not just the person I am replying to, namely, Nisan) might not have my problem remaining uninfluenced by bad advice from authority figures. Maybe I'm more suggestible than you are. Do you know that for sure? If not, why not err on the side of caution? There are many employers in this world! Why not avoid working for any outfit in which a large fraction of the employees are embarked on a project that will eventually probably kill us all and who have significant career capital invested in that project?


Hmm. I know you (Nisan) work or used to work for Google. I notice that I don't object to that. I notice that I don't seem to object much to anyone's working for an outfit that does a lot of capability research if that is the most efficient way for them to provide for themselves or their family. I just don't like it as a plan for improving the world. If the best plan a person can come up with for improving the world involves working for an outfit that does a lot of capability research, well, I tend to think that that person should postpone their ambitions to improve the world and focus on becoming stronger (more rational) and making money to provide for themselves and their family until such time as they can think up a better plan!

Also, my non-objection to people's continuing to work for AI-capability outfits for personal economic reasons applies only to people who have already invested a lot of time and energy in learning to do that kind of work (through learning on the job or learning on one's own dime): it is a bad idea IMO for anyone not already on the capabilities-research career path to get on it. I know that many here would disagree, but IMO getting good at AI capabilities work very probably doesn't help much with AI alignment work. Look, for example, at the work of Scott Garrabrant (Cartesian frames, finite factored sets): very rarely, if at all, does it rely on the capabilities literature.

[-]Nisan15-2

Thanks for sharing your reasoning. For what it's worth, I worked on OpenAI's alignment team for two years and think they do good work :) I can't speak objectively, but I'd be happy to see talented people continue to join their team.

I think they're reducing AI x-risk in expectation because of the alignment research they publish (1 2 3 4). If anyone thinks that research or that kind of research is bad for the world, I'm happy to discuss.

Thanks for your constructive attitude to my words.

[-][anonymous]127

I have a different intuition here; I would much prefer the alignment team at e.g. DeepMind to be working at DeepMind as opposed to doing their work for some "alignment-only" outfit. My guess is that there is a non-negligible influence that an alignment team can have on a capabilities org in the form of:

  • The alignment team interacting with other staff either casually in the office or by e.g. running internal workshops open to all staff (like DeepMind apparently do)
  • The org consulting with the alignment team (e.g. before releasing models or starting dangerous projects)
  • Staff working on raw capabilities having somewhere easy to go if they want to shift to alignment work

I think the above benefits likely outweigh the impact of the influence in the other direction (such as the value drift from having economic or social incentives linked to capabilities work).

[-]lc30

My sense is that this "they'll encourage higher-ups to think what they're doing is safe" thing is a meme. Misaligned AI, for people like Yann LeCun, is not even a consideration; they think it's this stupid, uninformed fearmongering. We're not even near the point that Philip Morris is at, where tobacco execs have to plaster their webpage with "beyond tobacco" slogans to feel good about themselves - Demis Hassabis literally does not care, even a little bit, and adding alignment staff will not affect his decision-making whatsoever.

But shouldn't we just ask Rohin Shah?

Thank you,

Any opinions about chip production like this?

This is very concrete, since it's a major company that some of my friends are considering working at.

Your link goes to a page on why CEA Online doesn’t outsource more work.

Did you intend to link me to a page about Next Silicon instead?

Oops,

Yes, exactly, thank you

Fixed

[-]lc42

because having a higher income makes a person more effective and influential, and we want the people concerned enough about AI safety to read this thread of conversation to be effective and influential.

This seems like a copout; in order to be effective and influential, you have to be doing something to solve the problem. Instead of just saying "me making money shortening timelines is fine, because I'm one of the 'Good People' who is 'Aware of the Problem'", donate a chunk of your income to serious alignment research or AI governance outreach. If you're doing something as indirect as compiler development, then even a token 1% donation probably makes you net positive.

I suspect my advice is the exact opposite of the Less Wrong/EY consensus, so here goes:

Choose to work at whatever company will allow you personally to get as good at AI/Machine learning as possible.

This is a restatement of my advice at the end of my essay on AI Alignment. Specifically, the two strategies I am most optimistic about, Game Theory and The Plan, both depend on very smart people becoming as wise as possible before the Singularity comes.

From a game-theory point of view, advancing AI knowledge in general is a tragedy of the commons. It would require coordination from everyone, all at once, to stop advancing AI beyond the danger level (whatever that might be). And it isn't even possible to know whether a particular field (compilers, formal mathematical methods, hardware improvement, AI art) will be the one that puts us over the top. That means there is very little benefit to you personally in refusing to work on advancing AI (and it comes at a huge cost, since you basically have to give up on any career even tangentially related to technology).
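To make the incentive structure concrete, here is a toy payoff sketch. The numbers are made up purely for illustration and are not a claim about actual magnitudes; the point is only the shape of the payoffs.

```python
# Toy model of the commons dynamic described above. The numbers are entirely
# made up for illustration; nothing here is calibrated to reality.

N_OTHERS = 999        # other developers who might work on advancing AI
CAREER_GAIN = 1.0     # private career benefit from working on AI
SHARED_COST = 0.01    # risk cost each AI worker imposes on *everyone*

def payoff(i_work: bool, others_working: int) -> float:
    """One developer's payoff, given their choice and how many others work on AI."""
    total_workers = others_working + (1 if i_work else 0)
    private_gain = CAREER_GAIN if i_work else 0.0
    return private_gain - SHARED_COST * total_workers

for others in (0, 500, N_OTHERS):
    print(f"others working: {others:4d} | "
          f"you work: {payoff(True, others):6.2f} | "
          f"you abstain: {payoff(False, others):6.2f}")

# Whatever the others do, working beats abstaining by CAREER_GAIN - SHARED_COST,
# yet if all 1000 work, each ends up at 1.0 - 10.0 = -9.0, worse than the 0.0
# everyone would get if nobody worked on AI: the tragedy of the commons.
```

Individual incentives favor working on AI regardless of what others do, even though everyone working leaves everyone worse off than nobody working, which is why stopping would require coordination rather than individual restraint.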

On the other hand, AI Alignment is likely to be solved by a "small group of thoughtful individuals". Increasing your skills proportionally increases your chance of being a member of that group (since it seems like you already care about the topic).

One way to think about this advice is: every day, Google, OpenAI, Hugging Face, and 1000 other companies are hiring someone, and that someone will likely work to advance AI. Imagine the marginal case where a company is deciding between hiring you and someone slightly less concerned about AI alignment. Wouldn't you rather they hire you?

Note that this advice does not mean you get to leave your ethics at the door. Quite the opposite: if you are working somewhere and it turns out they are doing something egregiously stupid (like deploying a non-airgapped AI), it is your duty to do everything in your power to stop them. Complain to your boss, leak information to the press, chain yourself to the server. Whatever you do, do not become the engineer who warned about disaster but then quietly shrugged when pressured by management. But if you refuse to take any jobs related to AI, you won't even be in the room when the disaster is about to happen. And on the margin, you should assume that somebody worse will be.

Can someone who downvoted the agreement karma please enlighten me as to why they disagree? This really seems like the only way forward. (Trying to make my career choice right now, as I am beginning my master's research this year.)

I didn't downvote, but your suggestion seems obviously wrong to me, so: 

Working at one of those companies (assuming you add value to them) is a pretty high-confidence way to get unfriendly AGI faster.

If you want to build skills, there are lots of ways to do that without working at very dangerous companies.

It wasn't my suggestion; it was Logan Zoellner's post.

Hm, can we even reliably tell when the AI capabilities have reached the "danger level"?

[-]lc-21

I think this was worse than the worst advice I could have been asked to imagine. Lines like this:

One way to think about this advice is: every day, Google, OpenAI, Hugging Face, and 1000 other companies are hiring someone, and that someone will likely work to advance AI. Imagine the marginal case where a company is deciding between hiring you and someone slightly less concerned about AI alignment. Wouldn't you rather they hire you?

almost seem deliberately engineered, as if you're trying to use the questioner's biases against them. If OP is reading my comment, I'd like him to consider whether everyone doing what this commenter wants would result in anything different from the clusterfuck of a situation we currently have.

Imagine if someone were concerned about contributing to the Holocaust, and someone else told them that if they were really concerned, what they ought to do was try to reform the Schutzstaffel from the "inside". After all, they're going to hire someone, and it'd of course be better for them to hire you than some other guy. You're a good person, OP, aren't you? When you've transported all those prisoners, you can just choose to pointlessly get shot trying to defend them from all of the danger you put them in.

Imagine if someone were concerned about contributing to the Holocaust

 

This is an uncharitable characterization of my advice. AI is not literally the Holocaust. Like all technology, it is morally neutral. At worst it is a nuclear weapon. And at best, Aligned AI is an enormously positive good.

Mod note: I activated two-axis voting on this post, since I feel like that might help with this discussion going better.

[-]lc40

This discussion would have probably gone into the toilet without it, so thanks.

:(

I was especially hoping for replies to this, no idea what to do about it.

 

Still, this topic does seem to be important. I sometimes talk to people working on things that I'd consider dangerous, but I don't feel confident giving such advice myself. Having the community discuss this SOMEHOW (ideally about specific companies) would help.

I think decision-making relies on counterfactuals, and it also seems good to have a sense of what counterfactuals we're considering here, or what the underlying model of AGI production is.

I think the main thing I'm hoping for is a separation of 'purity ethics' from 'expected value maximization', or something, or that the purity ethics is explicitly grounded in an FDT-style "if people running good decision theory simply decline to coordinate with evil systems, those evil systems will be disadvantaged" rather than just a "this seems icky." For example, you might imagine researchers or engineers signing something like an "adequacy pledge" based off of Eliezer's guidelines, where they commit to only working for orgs that are adequate along all six dimensions (or for the top current org if no orgs are adequate, tho that version seems worse), and now orgs can see how much talent cares about that sort of thing.

I wrote some basic thoughts about this a while ago in response to a similar question; roughly speaking, I think the 'total effects' of interventions are not obvious even if the 'direct effects' are obvious. I think there's value in trying to figure out why people are motivated and what sorts of 'gains from trade' are possible.

As a specific example, DeepMind has made public claims of the form "once we get close enough to AGI, we'll slow down and move carefully." But imagine being the person at DeepMind in charge of making the lever such that DeepMind leadership can pull it to slow down without immediately shedding all of the employees who want to keep moving at full speed, or other sorts of evaporative cooling. What work can be done now to set that up? What work can be done now to figure out the benefits and costs of pulling that lever at various times? [And, as a check on whether people's words line up with their behaviors: at a company of DeepMind's size, there really needs to be at least one FTE preparing for a project of that size. Does such a person exist? If not, can we help DeepMind hire them?]

As a specific example, I want to separate out something like "working at OpenAI" and "founding OpenAI"; my sense is that EY and others think that founding OpenAI in the first place dealt a major blow to the feasibility of coordination between AGI projects, and thus was pretty tragic from the "will human civilization make it" perspective. But it's not obvious to me that choosing to get a job at OpenAI in 2022 is primarily connected to the question of whether or not OpenAI should have been founded in 2015; it is instead primarily connected to the question of what projects they will do in 2022-2025. [The linked writing in the parent comment is about OpenAI in 2020, before ARC and Anthropic split out of it; I don't have a great sense of the social value (or disvalue) of working at those three orgs today.]

DeepMind in general: wdyt?

Non-safety DeepMind seems like one of the worst places in the world to work. It is one of the few companies aiming directly at AGI, and it has some of the most substantial history of making progress towards AGI capabilities.

It seems like you are confident that the delta in capabilities would outweigh any delta in general alignment sympathy. Is this what you think?

May I ask what you are calling "general alignment sympathy"? Could you say it in other words or give some examples?

I was thinking of the possibility of affecting decision-making, either directly by rising through the ranks (not very likely) or indirectly by being an advocate for safety at an important time and pushing things into the Overton window within an organization.

I imagine Habryka would say that a significant possibility here is that joining an AGI lab will wrongly turn you into an AGI enthusiast. I think biasing effects like that are real, though I also think it's hard to tell, in cases like that, how much you are biased vs. updating correctly on new information, and one could make similar bias claims about the AI x-risk community (e.g., there is social pressure to be doomy; being exposed only to heuristic arguments for doom and few heuristic arguments for optimism will bias you to be doomier than you would be given more information).

DeepMind's safety team specifically: wdyt?

DeepMind's safety team actually seems like a pretty good place to work. They don't have a history of contributing much to commercialization, and the people working there seem to have quite a bit of freedom in what they choose to work on, while also having access to DeepMind resources.

The biggest risk from working there is just that making the safety team bigger makes more people think that DeepMind's AI development will be safe, which seems really very far from the truth, but I don't think this effect is that large.

The biggest risk from working there is just that making the safety team bigger makes more people think that DeepMind's AI development will be safe, which seems really very far from the truth

It is the capability researchers in particular and their managers and funders that I worry will be lulled into a false sense of security by the presence of the safety team, not onlookers in general. When you make driving safer, e.g., by putting guardrails on a road, or you make driving appear (to the driver) to be safer, drivers react by taking more risks.

[-]lc20

The worst thing you could possibly do is work for the capabilities section of an existing AGI enterprise like Google Brain, DeepMind, or OpenAI. This includes, obviously, the "AI alignment" companies that really just do capabilities research, and does not include the sections within these companies that do genuine alignment research. Dan Hendrycks has an excellent sequence here on how not to fuck this up. Use your critical thinking and ask simple questions to find out which position is which.

The second worst thing in terms of expected impact would be to work at or support pioneering ML research at a division of a more general company like Facebook, one that isn't necessarily explicitly trying to engineer AGI but where the day job effectively amounts to burning the capabilities commons.

Below that would be to work on straightforward ML tooling with generalist applications: things like frameworks (PyTorch, wandb.ai, etc.), computer hardware designed explicitly for ML, or companies like Scale.

Somewhere deep below that is making money for or investing in the parent companies that pioneer these things (Facebook, Microsoft, Google). Depending on specifics you can lump in certain more general types of computer engineering work here.

After that though, I think if you just donate a reasonable fraction of your income to charity, or AI alignment enterprises, you're probably net positive. It's really not that complicated: if you're making or contributing to research that pushes the boundary of artificial intelligence, then... stop doing that.

if you're making or contributing to research that pushes the boundary of artificial intelligence, then... stop doing that.

 

Given that we currently don't know how to build aligned AI, solving the AI Alignment problem by definition is going to require research that pushes the bounds of artificial intelligence.  The advice you're giving is basically that anyone concerned about AI Alignment should self-select out of doing that research.  Which seems like the opposite of help.

[-]lc00

Given that we currently don't know how to build aligned AI, solving the AI Alignment problem by definition is going to require research that pushes the bounds of artificial intelligence.

This is an extraordinarily vague statement that is technically true but doesn't imply what you seem to think it does. There's a fairly clear Venn diagram between alignment research and capabilities research. On one side of the diagram are most of the things that make OpenAI more money, and on the other side is Paul Christiano's transparency stuff.

The advice you're giving is basically that anyone concerned about AI Alignment should self-select out of doing that research.

If it's the research that burns the capabilities commons while there's lots of alignment tasks left to be done, or people to convince, then yes, that seems prudent.

There's a fairly clear Venn diagram between alignment research and capabilities research.

 

This appears to be the crux of our disagreement. I do not think the Venn diagram is clear at all. But if I had to guess, I think there is a large overlap between "make an AI that doesn't spew out racist garbage" and "make an AI that doesn't murder us all".

Subscribe to this comment to get notified about questions about other companies.

Do not reply to this comment, please.

Next Silicon: They make chips for supercomputers that are not optimized for neural networks.

OK, I'll answer, because I was asked directly.

Next Silicon's site gives no details on their plans, and they say right away on the linked page that they are "in stealth mode", so all I know about them is that they make chips for supercomputers that are not optimized for neural networks.

I'd guess that it is less risky for 40 people to go to work for Next Silicon than for one person to go into AI capability research. But it would be safer still if nobody went to work for either group.

There are computing jobs that lower x-risk. One such job is to make it easier for people to publish or access information (like the people who run this site do).

(Thanks!)

 

I am strongly considering working at arXiv, which would make it easier for people to publish or access information. 

Some say that if I make it too good, I could accidentally fix ML research, which would be bad.

Any opinions?