> I agree that people should clearly state that they think there's a catastrophic risk, but I disagree that people should clearly state that they think we should pause.
If we take as a premise (as this post does) that the person we are talking about actually believes an international ban would be a great improvement over the current mad AI race, then the above quote seems wrong to me.
I agree that experts should not pretend they have more authority than they do in judging whether we should pause. But they could still say 1) that the race is insane, 2) that an international ban seems like a great improvement, 3) that if such a ban were proposed, they would not oppose it, and 4) that they would in fact support it. If not the experts, then who? To be clear, I don't think the experts within the labs racing to build the tech are necessary here (that is not what the post is about). There are also experts outside of the labs (and they don't have the [huge conflicts of interest]/pressure to filter(/falsify?) their speech). But if not the experts, then who would be better placed to say the above? If there is no one to say it, how does it get understood? If it doesn't get understood, coordination to actually move out of the status quo towards some kind of international agreement is much harder. The CEOs of some of the labs could say it, and that would definitely have an impact, but will they (lol)? Politicians could say it, but the backing of many experts would probably make it much easier for them to do so.
I think "there are catastrophic risks" is way too weak and doesn't substitute. Partly because "there are catastrophic risk, so please give more money to me/so put me in charge/so we must beat those less careful folks" are also possible readings. I also happen to have it on very good authority that some politicians, when informed that many experts recognize the risks of extinctions and told the reasons why we should stop the mad AI race, will ask "but do the experts support stopping?" with perhaps a side of ("or do they just want more money for their thing?")
I probably should have said "norm execution" (i.e., follow the norm). This might just be a cultural gap, but I think norm enforcement/execution/implementation works in many ways that are not threats. For instance: there is pizza at a conference, and there is a norm that you shouldn't take all the pizza if there is a big line behind you. Some people break this norm. What happens? Do they get threatened? No! They just get dirty looks and people talking behind their backs. Maybe they get a reputation as "the pizza taker". In fact, nobody necessarily told them beforehand that taking all the pizza would break the norm.
I think there is a strange presumption that one is owed my and others' maximum respect and friendship, and that anything less than that would be a "punishment". That is pretty strange. If I have money in my pocket but will only give some to you based on how many "good deeds" I have seen you do, this is not a threat. I guess that if you did not understand the motives, or if the motives were actually to get a specific person to do more "good deeds" (by telling them in advance what the reward would be), you could call it a bribe. But calling it a threat is obviously incorrect.
I think norm enforcement/execution/implementation can be, and in my case is, motivated by an aesthetic preference that the "points" which are person A's to give (such as respect and friendship) 1) not go to someone who does not deserve them (in my eyes) and instead 2) go to someone who does deserve them. It is not primarily driven by a consequentialist desire for more people to do respect-and-friendship-deserving things. It is primarily driven by a desire for the points to match reality, which then enables greater cooperation and further good things down the line.
I realized based on a few comments that the three norms I discuss in the post were seen by some as one giant strategy to produce more public stances from safety researchers. This is not the case. I am just talking to three different audiences, and for each of them I explain a norm that I think makes sense independently.
(optional: my other comment is more important imo)
> I'm not concerned about someone being fired for this kind of thing, that would be pretty unwise on the labs' part as you risk creating a martyr
I think you ascribe too much competence/foresight/focus/care to the labs. I'd be willing to bet that multiple (safety?) people have been fired from labs in ways that made the lab look pretty bad. Labs make tactical mistakes sometimes. Wasn't there a thing at OpenAI, for instance (lol)? Of course it is possible(/probable?) that they would not fire someone in a given case due to sufficient "wisdom", but we should not assign an extreme likelihood to that.
> Rather, I'm concerned about eg senior figures thinking worse of safety researchers as a whole because it causes a PR headache, eg viewing them as radical troublemakers, and this making theories of impact around influencing specific senior decision makers harder (and I'm more optimistic about those, personally)
Thank you Neel for stating this explicitly. I think this is very valuable information, and it matches what some of my friends have told me privately. I would appreciate it a lot if you could give a rough estimate of your confidence that this would happen (ideally a probability/percentage). Additionally, I would appreciate it if you could say whether you'd expect such a consequence to be legible/visible or illegible (once it had happened). Finally, are there legible reasons you could share for your estimated credence that this would happen?
(to be clear: I am sad that you are operating under such conditions. I consider this evidence against expecting meaningful impact from the inside at your lab.)
I conclude from this that you really do see this post as a threat (also, you acknowledged in your first comment that there is no threat, so this comment now seems contradictory/bad-faith).
Some thoughts:
- This isn't a threat by proxy and isn't a threat (though if it were a threat by proxy, then sure, it would be a threat).
- I am in the "others" group. I implement the norm I endorse in the post, and I am not threatening you. I don't want to sound dismissive, but you are not giving me a lot to work with here, and it sounds to me like either 1) you have a vague model of what a threat is that includes things that aren't threats, or 2) you are misunderstanding the post and our intent such that you model us as having made a threat.
> if it was genuinely all the authors of this post wanted then I suggest they write a different post
Leo's statement is quite good without being all we wanted. (Indeed, of the three things we wanted, one is about how we think it makes sense for others to relate to safety researchers based on what they say/[don't say] publicly, and one is about trying to shift the labs' behavior toward it being legibly safe for employees to say various things, which Leo's comment is not about.) I internally track a pretty crucial difference between what I want to happen in the world (i.e., that we shift from plan B to plan A somehow) and how I believe people ought to relate to the public stance/[lack thereof] of safety researchers within frontier labs. I think there are maybe stronger stances Leo could have taken, and weaker ones, and I endorse having the way I relate to/model/[act towards] Leo depend on which he takes. I think the public stance that would lead to me maximally relating well to a safety researcher is something like: "I think coordinating to stop the race (even if in the form of some ban whose exact details I won't get to choose) would be better than the current race to ever more capable AI. I would support such coordination. I am currently trying to make the situation better in case there is no such coordination, but I don't think the current situation is sufficiently promising to justify not coordinating. Also, there is a real threat of humanity's extinction if we don't coordinate." (or something to that effect)
Seeing the post as a threat misses the intended point. It is important to state explicitly: The goal of the three norms argued for in the post was never to force people to publicly support something they don't in fact believe in. It was also never to force people to be more honest about what they believe. The post explicitly says what we think you should be doing, so that there can be a discussion about it. But the norm enforcement part is about what we think others (who are not necessarily working at frontier labs) should be doing.
Separately, I am not sure I understood what you meant by "I aim to not give in to 'norm enforcement'", but it seems to me that there is a culture inside the labs that makes many people working there uncomfortable taking a public stance. To be more explicit: does that also activate your will to not give in to 'norm enforcement'? (If not, why not?)
> I think you may want to rethink your models of how norm enforcement works.
I didn't get what you were trying to communicate here. Continuing to (publicly) rethink my models of how norm enforcement works is part of why we wrote this post on LW.
Thank you. I endorse (and internally implement) the norm of respecting you based on this :)
One potential issue with "non-EA ideologies don’t even care about stars" is that in biological humans, ideologies don't get transmitted perfectly across generations.
It might matter (a lot) whether [the descendants of the humans currently subscribing to "non-EA ideologies" who end up caring about stars] feel trapped in an "unfair deal".
The above problem might be mitigated by allowing migration between the two zones (as long as the rules of the zones are respected). (I.e., the children of star-dwellers who want to come back can do so, unless they would break the invariants that allow earth-dwellers to be happy, with perhaps some extra leeway/accommodation beyond what is allowed for native earth-dwellers; and the children of earth-dwellers who want to start their own colony have some room to do so, reserved in the contract.)
One potential source of other people's disagreement is the following intuition: "surely once the star-dwellers expand, they will use their overwhelming power to conquer the earth." Related to this intuition is the fact that expansion which starts out exponential will eventually be bounded by cubic growth (and eventually quadratic, due to gravitational effects, etc.). Basically, a deal is struck now under conditions of plenty, but eventually resources will grow scarce, and by then the balance of power will have decayed to nothing.
This sounds to me like there would actually be specific opportunities to express some of your true beliefs where you wouldn't worry it would cost you a lot (and some other opportunities where you would worry, and so would not take them). Would you agree with that?