Thanks. I think a bunch of discussions I've seen or been part of could have been more focused by establishing whether the crux was "1 is bad" vs "I think this is an instance of 3, not 1".
Do you think [playing in a rat race because it's the most locally optimal thing for an individual to do, while at the same time advocating for abolishing the rat race] is an example of reformative hypocrisy?
Or even more broadly, defecting in a prisoner's dilemma while exposing an interface that would allow cooperation with other like-minded players?
I've had this concept for many years and it hasn't occurred to me to give it a name (How Stupid Not To Have Thought Of That) but if I tried to give it a name, I definitely wouldn't call it a kind of hypocrisy.
It's better but still not quite. When you play on two levels, sometimes the best strategy involves a pair of (level 1 and 2) substrategies that are seemingly opposites of each other. I don't think there's anything hypocritical about that.
Similarly, hedging is not hypocrisy.
Thinking about this post shifted my view of Elon Musk a bit. He gets flak for calling for an AI pause and then going and starting an AGI lab, and I now think that's unfair.
I think his overall strategic takes are harmful, but I do credit him with being basically the only would-be AGI-builder who seems to me to be engaged in a reformative hypocrisy strategy. For one thing, it sounds like he went out of his way to try to get AI regulated (talking to congress, talking to the governors), and supported SB-1047.
I think it's actually not that unreasonable to shout "Yo! This is dangerous! This should be regulated, and controlled democratically!", see that that's not happening, and then go and try to do it in a way that you think is better.
That seems like possibly an example of "follower-conditional leadership." Taking real action to shift to the better equilibrium, failing, and then going back to the dominant strategy given the inadequate equilibrium that exists.
Obviously he has different beliefs than I do, and than my culture does, about what is required for a good outcome. I think he's still causing vast harms, but I think he doesn't deserve the eye-roll for founding another AGI lab after calling for everyone to stop.
Thanks for expressing this perspective.
I note Musk was the first one to start a competitor, which seems to me to be very costly.
I think that founding OpenAI could have been right if the non-profit structure was likely to work out. I don't know if that made sense at the time. Altman has overcome being fired by the board, removed parts of the board, and rumor has it he is moving to a for-profit structure, which is strong evidence against the non-profit being able to withstand the pressures that were coming. But even without Altman, I suspect OpenAI would still involve billions of dollars of funding, partnerships like the one with Microsoft, and other for-profit pressures to be the sort of player it is today. So I don't know that Musk's plan was viable at all.
Note that all of this happened before the scaling hypothesis was really formulated, much less made obvious.
We now know, with the benefit of hindsight, that developing AI and its precursors is extremely compute-intensive, which means capital-intensive. There was some reason to guess this might be true at the time, but it wasn't a foregone conclusion; it was still an open question whether the key to AGI would be mostly some technical innovation that hadn't been developed yet.
In case 1, if I don't know how to make a safe AGI while preventing an unsafe AGI, and no-one else does (i.e. the current state of the art), what regulations would I be calling for?
I agree with the overall message you're trying to convey, but I think you need a new name for the concept. None of the things you're pointing to are hypocrisies at all (and in fact the one thing you call "no hypocrisy" is actually a non sequitur). To give an analogue: the fact that someone advocates for higher taxes while not donating money to the government does not make them a hypocrite (much less a "dishonest hypocrite").
I disagree with your taxonomy and ranking (I find holier-than-thou sanctimony more loathsome than straightforward deception), but agree it's a shame that the same word is used both for one trying to better an imperfect world while living in it as well as for one criticizing others for actions that mutatis mutandis one engages in oneself.
People often attack frontier AI labs for "hypocrisy" when the labs admit publicly that AI is an extinction threat to humanity. Often these attacks ignore the difference between various kinds of hypocrisy, some of which are good, including what I'll call "reformative hypocrisy". Attacking good kinds of hypocrisy can be actively harmful for humanity's ability to survive, and as far as I can tell we (humans) usually shouldn't do that when our survival is on the line. Arguably, reformative hypocrisy shouldn't even be called hypocrisy, due to the negative connotations of "hypocrisy". That said, bad forms of hypocrisy can be disguised as the reformative kind for long periods, so it's important to pay enough attention to hypocrisy to actually figure out what kind it is.
Here's what I mean, by way of examples:
***
0. No Hypocrisy —
Lab: "Building AGI without regulation shouldn't be allowed. Since there's no AGI regulation, I'm not going to build AGI."
Meanwhile, the lab doesn't build AGI. This is a case of honest behavior, and what many would consider very high integrity. However, it's not obviously better, and arguably sometimes worse, than...
1. Reformative Hypocrisy:
Lab: "Absent adequate regulation, building AGI shouldn't be allowed at all, and right now there is no adequate regulation. Anyway, I'm building AGI, calling for regulation, and making lots of money as I go, which helps me prove the point that AGI is powerful and needs to be regulated."
Meanwhile, the lab builds AGI and calls for regulation. So, this is a case of honest hypocrisy. I think this is straightforwardly better than...
2. Erosive Hypocrisy:
Lab: "Building AGI without regulation shouldn't be allowed, but it is, so I'm going to build it anyway and see how that goes; the regulatory approach to safety is hopeless."
Meanwhile, the lab builds AGI and doesn't otherwise put effort into supporting regulation. This could also be a case of honest hypocrisy, but it erodes the norm that AGI should be regulated rather than supporting it.
Some even worse forms of hypocrisy include...
3. Dishonest Hypocrisy, which comes in at least two importantly distinct flavors:
a) feigning abstinence:
Lab: "AGI shouldn't be allowed."
Meanwhile, the lab secretly builds AGI, contrary to what one might guess from its stated stance that building AGI shouldn't be allowed.
b) feigning opposition:
Lab: "AGI should be regulated."
Meanwhile, the lab overtly builds AGI, while covertly trying to confuse and subvert regulatory efforts wherever possible.
***
It's important to remain aware that reformative hypocrisy can, on net, be better for the world than avoiding hypocrisy completely. It allows you to divert resources from the thing you think should be stopped, and to use those resources to help stop the thing. For mathy people, I'd say this is a way of diagonalizing against a potentially harmful thing: turning the thing against itself, or against the harmful aspects of itself. For life-sciencey people, I'd say this is how homeostasis is preserved, through negative feedback loops whereby bad stuff feeds mechanisms that reduce the bad stuff.
Of course, a strategy of feigning opposition (3b) can disguise itself as reformative hypocrisy, so the two can be hard to distinguish. For example, if a lab says for a long time that it's going to admit its hypocritical stance, and then never actually does, it turns out to be dishonest hypocrisy. On the other hand, if the dishonesty ever does finally end in a way that honestly calls for reform, it's good to reward the honest and reformative aspects of that behavior. Note also that, if it's not reformative, even honest hypocrisy can erode positive norms as in (2), by overtly denigrating the idea of even establishing norms. So the key is not just to avoid supporting dishonesty, but to specifically reward honesty that takes action in support of broader reform.
In summary, what I'm suggesting is to pay close attention to the different kinds of hypocrisy above, close enough attention to actually distinguish between them and treat them separately, without being fooled as to which one is which. This can be a lot of work, but it's important work that is necessary to create the right incentives when you are in the habit of criticizing people for hypocrisy. The key is to make sure that any hypocrisy you tolerate or reward is actively reformative. Otherwise, it's not part of a homeostatic loop, and hence not a positive contribution to a working survival strategy when the stakes are existential.
That's all for now. Happy Tuesday :)