Beware safety-washing

Lizka

Tl;dr: don’t be fooled into thinking that some groups working on AI are taking “safety” concerns seriously (enough).^[1]

Note: I’m posting this in my personal capacity. All views expressed here are my own. I am also not (at all) an expert on the topic. [Also, I originally wrote this for the EA Forum and have not changed it for the cross-post. I don't know what that changes, but it might change something.]

Two non-AI examples

Greenwashing

Companies “greenwash” when they mislead people into incorrectly thinking that their products or practices are climate and environment-friendly (or that the company focuses on climate-friendly work).

Investopedia explains:

Greenwashing is an attempt to capitalize on the growing demand for environmentally sound products.
The term originated in the 1960s, when the hotel industry devised one of the most blatant examples of greenwashing. They placed notices in hotel rooms asking guests to reuse their towels to save the environment. The hotels enjoyed the benefit of lower laundry costs.
- Wikipedia: “[Jay Westerveld, the originator of the term] concluded that often the real objective was increased profit, and labeled this and other profitable-but-ineffective ‘environmentally-conscientious’ acts as greenwashing.” (Wikipedia also provides a long list of examples of the practice.)

I enjoy some of the parody/art (responding to things like this) that comes out of noticing the hypocrisy of the practice.

Humanewashing

A similar phenomenon is the “humanewashing” of animal products. There’s a Vox article that explains this phenomenon (as it happens in the US):

A carton of “all natural” eggs might bear an illustration of a rustic farm; packages of chicken meat are touted as “humanely raised.”
In a few cases, these sunny depictions are accurate. But far too often they mask the industrial conditions under which these animals were raised and slaughtered.
Animal welfare and consumer protection advocates have a name for such misleading labeling: “humanewashing.” And research suggests it’s having precisely the effect that meat producers intend it to. A recent national survey by C.O.nxt, a food marketing firm, found that animal welfare and “natural” claims on meat, dairy, and egg packaging increased the intent to purchase for over half of consumers.
[...]
...rather than engaging in the costly endeavor of actually changing their farming practices, far too many major meat producers are attempting to assuage consumer concerns by merely changing their packaging and advertising with claims of sustainable farms and humane treatment. These efforts mislead consumers, and undermine the small sliver of farmers who have put in the hard work to actually improve animal treatment.

If you want a resource on what food labels actually mean, here are some: one, two, three (these are most useful in the US). (If you know of a better one, please let me know. I’d especially love a resource that lists the estimated relative value of things like “free-range” vs. “cage-free,” etc., according to cited and reasonable sources.)

Definition of safety-washing

In brief, “safety-washing” is misleading people into thinking that some products or practices are “safe” or that safety is a big priority for a given company, when this is not the case.

An increasing number of people believe that developing powerful AI systems is very dangerous,^[2] so companies might want to show that they are being “safe” in their work on AI.^[3]

Being safe with AI is hard and potentially costly,^[4] so if you’re a company working on AI capabilities, you might want to overstate the extent to which you focus on “safety.”

So you might:

Pick a safety paradigm that is convenient for you, and focus on that
Talk about “safety” when you really mean other kinds of things the public might want an AI to be, like unbiased and not hateful
Start or grow a safety team, feature it in media about your work (or conversations with safety-oriented people), but not give it a lot of power
Promote the idea that AI safety concerns are crazy
And more

Some of these things might be better than doing nothing for safety concerns, but overall, (safety-)washing causes some problems (discussed in the next section), which in turn worsen the situation with risk from AI.

What are the harms?

I don’t have the time to write a careful report on the matter, but here are some issues that I think arise from greenwashing, humanewashing, and safety-washing:

Confusion: People working on the issue (and the general public) get confused about what really matters — terms lose their meanings, groups lose focus, etc.
- E.g. Some people who want to help the climate think that it’s important to encourage the reuse of towels instead of avoiding harmful products (or focusing on more effective methods for fighting climate change).
Accidental harm: People are misled about what companies are doing, which in turn leads to people doing directly harmful things they didn’t intend to do
- E.g. people work for harmful companies or projects, or support them financially, because they’re not aware of the harm those companies cause.
False security: Causes a false sense of safety/goodness/progress (which can lead to insufficient mitigation of the harm caused, a lack of other kinds of preparation, and other problems)
- E.g. someone who successfully convinces some groups to focus on “eating local” may think that the tide is turning on the environmental impacts from food, even though this is not the key issue (or the most effective area of work for fighting climate change).
Thwarted incentive: Reduces the incentive for companies to actually reduce the harm they (might) cause
- If you’re a company and you can get away with labeling your product as safe/green/humane, which gets you the benefit of consumer approval and a lack of hate, you don’t need to put in extra work to actually make your work safe/green/humane.
And more?

What can (and should) we do about this?

Some things that come to mind:

To counteract confusion, we can try to be more specific in explanations about “safety” or “humane conditions” or use more specific terms like “existential safety”
To counteract our own confusion, we could encourage (even) more distillation of content and external validation of work
Stare into the abyss about the possibility that our work is not useful (or is harmful), and seek external reviews and criticism
We could also create or support standards for safety or external validation systems (like Certified Humane), and evaluate projects against them (although versions of this might be gameable, and we should beware new “standards” for the usual reasons).
Call out safety-washing (and other kinds of washing).
Call out organizations doing things that are bad on their merits, and be clear about why what they showcase as safety-oriented work (or efforts to be more humane, etc.) insufficiently addresses the risks and harms of their work.

How important or promising is all of this as an approach or a type of work to focus on? I’m not sure — I’d guess that it’s not the most valuable thing to focus on for most people, but would be interested in other people’s thoughts. My main motivation for writing this was that I think the phenomenon of safety-washing exists and will become more prominent, and we should keep an eye out for it.

I'm a bit swamped and may not respond to comments, but will probably read them and will be very grateful for them (including for corrections and disagreements!).

"Safety-washing" might also be spelled "safetywashing." I don't know which is better or more common, and have gone with the former here.

^[1]
After I wrote a draft of this post, I noticed that there was a very similar post on LessWrong. I should have checked earlier, but I’m posting this anyway as it is slightly different (and somewhat more detailed) and because some Forum users may not have seen the LW version.
^[2]
Here are some resources you can explore on this topic if you want to learn more: one, two, three, four, five, six, seven.
^[3]
Safety isn’t the only ethical concern people have about AI, and it’s probably not the most popular one, but it is the focus of this post. Other concerns have been discussed elsewhere. For example, a (paywalled) Forbes article discusses “AI Ethics washing”: “AI Ethics washing entails giving lip service or window dressing to claimed caring concerns about AI Ethics precepts, including at times not only failing to especially abide by Ethical AI approaches but even going so far as to subvert or undercut AI Ethics approaches.” I only skimmed the article, but it seems to focus on self-driving cars as its motivating example. It separates “washers” into four groups: those who wash by ignorance, by good-motivations-stretched-or-slipped, by stretching-the-truth or spinning it, and those who brazenly lie. It also describes “Ethics Theatre” (making a big show of your ethics work), “Ethics shopping” (picking the guidelines that are easiest to adopt), “Ethics bashing” (e.g. insisting the guidelines are worthless or a cover-up), “Ethics Shielding” (I didn’t quite follow this one), and “Ethics Fairwashing” (specifically focusing on claims that an AI is fair when it isn’t).
^[4]
If you think that AI risk is not minuscule, then being safe (even if it means being slow) is also in your interests (see this section of “Let’s think about slowing down AI”). But maybe you think safety concerns are overblown, and you’re just viewing safety efforts as appeasement of the risk-concerned crowd. Or you have myopic incentives, etc. In that case, you might think that being safe just slows you down and wastes your resources.

TW123 (3y)

[Realized this is contained in a footnote, but leaving this comment here in case anyone missed it].

Jon Garcia (2y)

Well, if you could solve the problem of companies X-washing (persuading consumers to buy from them by only pretending to alleviate their concerns), then you would probably be able to solve deceptive alignment as well.
