Thanks for the link! It's important to distinguish here between:
(1) support for the movement,
(2) support for the cause, and
(3) active support for the movement (i.e. attracting other activists to show up at future demonstrations)
Most of the paper focuses on 1, and also on activists' beliefs about the impact of their actions. I am more interested in 2 and 3. To be fair, the paper gives some evidence for detrimental impacts on 2 in the Trump example. It's not clear, however, whether the nature of the cause matters here. Support for Trump is highly polarized and entangled with culture, whereas global warming (Hallam's cause) and AI risk (PauseAI's) have relatively broad but frustratingly lukewarm public support. There are also many other factors when looking past short-term onlooker sentiment to the larger question of effecting social change, which the paper readily admits in its Discussion section. I'd list these points, but they largely overlap with the points I made in my post...though it was interesting to see how much was speculative. More research is needed.
In any case, I bring up the extreme case to illustrate that the issue is far more nuanced than "regular people get squeamish--net negative!" This is actually somewhat irrelevant to PauseAI in particular, because most of our actions are around public education and lobbying, and even the protests are legal and non-disruptive. I've been in two myself and have seen nothing but positive sentiment from onlookers (with the exception of the occasional "good luck with that!" snark). The hard part with all of these is getting people to show up. (This last paragraph is not a rebuttal to anything you have said; it's a reminder of context.)
My conclusion is an admittedly weaksauce non-argument, included primarily to prevent misinterpretation of my actual beliefs. I am working on a rebuttal, but it's taking longer than I planned. For now, see: Holly Elmore's case for AI Safety Advocacy to the Public.
I want to push harder on Q33: "Isn't goal agnosticism pretty fragile? Aren't there strong pressures pushing anything tool-like towards more direct agency?"
In particular, the answer: "Being unable to specify a sufficiently precise goal to get your desired behavior out of an optimizer isn't merely dangerous, it's useless!" seems true to some degree, but incomplete. Let's use a specific hypothetical of a stock-trading company employing an AI system to maximize profits. They want the system to be agentic because this takes the humans out of the loop on actually getting profits, but they also understand that there is a risk the system will discover unexpected/undesired methods of achieving its goal, such as insider trading. There are a couple of core problems:
1. Externalized Cost: if the system can cover its tracks well enough that the company doesn't suffer any legal consequences for its illegal behavior, then the effects of insider trading on the market are "somebody else's problem."
2. Irreversible Mistake: if the company is overly optimistic about its ability to control the system, doesn't understand the risks, etc., then it might deploy the system despite regretting that decision later. On a large scale, this might be self-correcting if some companies have problems with AI agents and this gives the agents a bad reputation, but that assumes there are lots of small problems before a big one.
Glad to hear it! If you want more detail, feel free to come by the Discord Server or send me a Direct Message. I run the welcome meetings for new members and am always happy to describe aspects of the org's methodology that aren't obvious from the outside and can also connect you with members who have done a lot more on-the-ground protesting and flyering than I have.
As someone who got into this without much prior experience in activism, I was surprised how much subtlety there is and how many of the best practices are counterintuitive, most of which is learned through direct experience combined with direct mentorship rather than written down and formalized. I made an attempt to synthesize many of the core ideas in this video--it's from a year ago, and looking over it there is quite a bit I would change (spend less time on some philosophical ideas, add more detail on specific methods), but it mostly holds up OK.
If you want to get an informed opinion on how the general public perceives PauseAI, get a t-shirt and hand out some flyers in a high foot-traffic public space. If you want to be formal about it, bring a clipboard, decide in advance what you want to track, and share your results. It might not be publishable on an academic forum, but you could do it next week.
Here's what I expect you to find, based on my own experience and the reports of basically everyone who has done this:
- No one likes flyers, but people get a lot more interested if you can catch their attention long enough to say it's about AI.
- Everyone hates AI.
- Your biggest initial skepticism will be from people who think you are in favor of AI.
- Your biggest actual pushback will be from people who think that social change is impossible.
- Roughly 1/4 to 1/2 of people are amenable to (or have already heard of!) x-risk; most of the rest won't actively disagree, but you can tell that particular message is not really "landing," and they pay a lot more attention if you talk about something else (unemployment, military applications, deepfakes, etc.)
- Bring a clipboard for signups. Even if recruitment isn't your goal, if you don't have one you'll feel unprepared when people ask about it.
Also, protests are about Overton-window shifting: making AI danger a thing that is acceptable to talk about. And even if a protest makes a specific org look "fringe" (not a given, as Holly has argued), that isn't necessarily a bad thing for the underlying cause. For example, if I see an XR protest, my thought is (well, was, before I knew the underlying methodology): "Ugh, those protestors...I mean, I like what they are fighting for, and more really needs to be done, but I don't like the way they go about it." Notice that middle part. Activation of a sympathetic but passive audience was the point. That's a win from their perspective. And the people who are put off by the methods then go on to (be more likely to) join allied organizations that believe the same things but use more moderate tactics. The even bigger win is when the enthusiasm catches the attention of people who want to be involved but are looking for orgs that are the "real deal," as measured by willingness to put effort behind their words.
Before jumping into critique, the good:
- Kudos to Ben Pace for seeking out and actively engaging with contrary viewpoints
- The outline of the x-risk argument and history of the AI safety movement seem generally factually accurate
The author of the article makes quite a few claims about the details of PauseAI's proposal, its political implications, and the motivations of its members and leaders...all without actually joining the public Discord server, participating in the open Q&A new-member welcome meetings (I know this because I host them), or even showing evidence of spending more than 10 minutes on the website. All of these basic research opportunities were readily available and would have taken far less time than was spent writing the article. This tells you everything you need to know about the author's integrity, motivations, and trustworthiness.
That said, the article raises an important question: "buy time for what?" The short answer is: "the real value of a Pause is the coordination we get along the way." Something as big as an international treaty doesn't just drop out of the sky because some powerful force emerged and made it happen against everyone else's will. Think about the end goal and work backwards:
1) An international treaty requires
2) Provisions for monitoring and enforcement,
3) Negotiated between nations,
4) Each of whom genuinely buys in to the underlying need
5) And is politically capable of acting on that interest because it represents the interests of their constituents
6) Because the general public understands AI and its implications enough to care about it
7) And feels empowered to express that concern through an accessible democratic process
8) And is correct in this sense of empowerment because their interests are not overridden by Big Tech lobbying
9) Or distracted into incoherence by internal divisions and polarization
An organization like PauseAI can only have one "banner" ask (1), but (2-9) are instrumentally necessary--and if those were in place, I don't think it's at all unreasonable to assume society would be in a better position to navigate AI risk.
Side note: my objection to the term "doomer" is that it implies a belief that humanity will fail to coordinate, solve alignment in time, or be saved by any other means, and thus will actually be killed off by AI--which seems like it deserves a distinct category from those who simply believe that the risk of extinction by default is real.
I'd like to attempt a compact way to describe the core dilemma being expressed here.
Consider the expression: y = x^a - x^b, where 'y' represents the impact of AI on the world (positive is good), 'x' represents the AI's capability, 'a' represents the rate at which the power of the control system scales, and 'b' represents the rate at which the surface area of the system that needs to be controlled (for it to stay safe) scales.
(Note that this is assuming somewhat ideal conditions, where we don't have to worry about humans directing AI towards destructive ends via selfishness, carelessness, malice, etc.)
If b > a, then as x increases past 1, y becomes increasingly negative; indeed, y can only be positive when x is less than 1. But that represents a severe limitation on capabilities--enough to prevent the AI from doing anything significant enough to keep the world on track towards a safe future, such as preventing other AIs from being developed.
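To make the sign behavior concrete, here is a minimal sketch of the toy model. The specific exponents (a = 1.5, b = 2.0) are arbitrary illustrative values of mine, chosen only to satisfy b > a; nothing in SNC pins down their actual values.

```python
# Toy illustration of y = x^a - x^b with b > a (illustrative exponents only).
# x = capability, a = scaling of the control system's power,
# b = scaling of the surface area that needs to be controlled.

def impact(x: float, a: float = 1.5, b: float = 2.0) -> float:
    """Net impact y for capability x under the toy model."""
    return x**a - x**b

for x in [0.5, 0.9, 1.0, 1.1, 2.0, 5.0]:
    print(f"x = {x:>4}: y = {impact(x):+.3f}")

# Output pattern: y > 0 only for x < 1, y = 0 at x = 1, and y grows
# increasingly negative as x rises above 1 -- the capability ceiling
# described above.
```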
There are two premises here, and thus two relevant lines of inquiry:
1) b > a, meaning that complexity scales faster than control.
2) When x < 1, AI can't accomplish anything significant enough to avert disaster.
Arguments and thought experiments where the AI builds powerful security systems can be categorized as challenges to premise 1; thought experiments where the AI limits its range of actions to prevent unwanted side effects--while simultaneously preventing destruction from other sources (including other AIs being built)--are challenges to premise 2.
Both of these premises seem like factual statements relating to how AI actually works. I am not sure what to look for in terms of proving them (I've seen some writing on this relating to control theory, but the logic was a bit too complex for me to follow at the time).
I actually don't think the disagreement here is one of definitions. Looking up Webster's definition of control, the most relevant meaning is: "a device or mechanism used to regulate or guide the operation of a machine, apparatus, or system." This seems...fine? Maybe we might differ on some nuances if we really drove down into the details, but I think the more significant difference here is the relevant context.
Absent some minor quibbles, I'd be willing to concede that an AI-powered HelperBot could control the placement of a chair, within reasonable bounds of precision, with a reasonably low failure rate. I'm not particularly worried about it, say, slamming the chair down too hard, causing a splinter to fly into its circuitry and transform it into MurderBot. Nor am I worried about the chair placement setting off some weird "butterfly effect" that somehow has the same result. I'm going to go out on a limb and just say that chair placement seems like a pretty safe activity, at least when considered in isolation.
The reason I used the analogy "I may well be able to learn the thing if I am smart enough, but I won't be able to control for the person I will become afterwards" is because that is an example of the reference class of contexts that SNC is concerned with. Another is: "what is the expected shift to the global equilibrium if I construct this new invention X to solve problem Y?" In your chair analogy, this would be like the process of learning to place the chair (rewiring some aspect of its thinking process), or inventing an upgraded chair and releasing this novel product into the economy (changing its environmental context). This is still a somewhat silly toy example, but hopefully you see the distinction between these types of processes and the relatively straightforward matter of placing a physical object. It isn't so much about straightforward mistakes (though those can be relevant) as it is about introducing changes to the environment that shift its point of equilibrium. Remember, AGI is a nontrivial thing that affects the world in nontrivial ways, so these ripple effects (including feedback loops that affect the AGI itself) need to be accounted for, even if that isn't a class of problem that today's engineers often bother with because it Isn't Their Job.
Re human-caused doom, I should clarify that the validity of SNC does not depend on humanity not self-destructing without AI. Granted, if people kill themselves off before AI gets the chance, SNC becomes irrelevant. Similarly, if the alignment problem as it is commonly understood by Yudkowsky et al. is not solved pre-AGI and a rogue AI turns the world into paperclips or whatever, that would not make SNC invalid, only irrelevant. By analogy, global warming isn't going to prevent the Sun from exploding, even though the former could very well affect how much people care about the latter.
Your second point about the relative strengths of the destructive forces is a relevant crux. Yes, values are an attractor force. Yes, an ASI could come up with some impressive security systems that would probably thwart human hackers. The core idea that I want readers to take from this sequence is recognition of the reference class of challenges that such a security system is up against. If you can see that, then questions of precisely how powerful various attractor states are and how these relative power levels scale with complexity can be investigated rigorously rather than assumed away.
Before responding substantively, I want to take a moment to step back and establish some context and pin down the goalposts.
On the Alignment Difficulty Scale, currently dominant approaches are in the 2-3 range, with 4-5 getting modest attention at best. If true alignment difficulty is 6+ and nothing radical changes in the governance space, humanity is NGMI. Conversations like this are about whether the true difficulty is 9 or 10, both of which are miles deep in the "shut it all down" category but differ regarding what happens next. Relatedly, your counterargument, if correct, is assuming wildly successful outcomes with respect to goal alignment--that developers have successfully made the AI love us, despite a lack of trying.
In a certain sense, this assumption is fair, since a claim of impossibility should be able to contend with the hardest possible case. In the context of SNC, the hardest possible case is where AGI is built in the best possible way, whether or not that is realistic in the current trajectory. Similarly, since my writing about SNC is to establish plausibility, I only need to show that certain critical trade-offs exist, not pinpoint exactly where they balance out. For a proof, which someone else is working on, pinning down such details will be necessary.
Neither of the above is a criticism of anything you've said; I just like to reality-check every once in a while as a general precautionary measure against getting nerd-sniped. Disclaimers aside, let the pontification recommence!
Your reference to using ASI for a pivotal act, helping to prevent ecological collapse, or preventing human extinction when the Sun explodes is significant, because it points to the reality that, if AGI is built, it will be because people want to use it for big things that would require significantly more effort to accomplish without AGI. This context sets a lower bound on the AI's capabilities and hence its complexity, which in turn sets a floor for the burden on the control system.
More fundamentally, if an AI is learning, then it is changing. If it is changing, then it is evolving. If it is evolving, then it cannot be predicted/controlled. This last point is fundamental to the nature of complex & chaotic systems. Complex systems can be modelled via simulation, but this requires sacrificing fidelity--and if the system is chaotic, any loss of fidelity rapidly compounds. So the problem is with learning itself...and if you get rid of that, you aren't left with much.
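To illustrate the fidelity point, here is a minimal sketch using the logistic map as a stand-in for a chaotic system; the example and numbers are mine, not anything specific to SNC. A one-part-in-a-billion error in the "simulation" swamps the prediction within a few dozen steps.

```python
# How a tiny loss of fidelity compounds in a chaotic system.
# The logistic map x_{n+1} = r * x_n * (1 - x_n) with r = 4.0 is chaotic:
# two trajectories that start almost identically diverge completely.

def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list[float]:
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

exact = logistic_trajectory(0.400000000, 50)
approx = logistic_trajectory(0.400000001, 50)  # model with a 1e-9 error

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:>2}: |error| = {abs(exact[n] - approx[n]):.6f}")

# The error starts at 1e-9 and grows to order 1 within a few dozen steps:
# the low-fidelity model stops telling you anything about the real system.
```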
As an analogy, if there is something I want to learn how to do, I may well be able to learn the thing if I am smart enough, but I won't be able to control for the person I will become afterwards. This points to a limitation of control, not to a weakness specific to me as a human.
One might object that the above reasoning applies just as well to current AI. The SNC answer is: yes, it does. The machine ecology already exists and is growing/evolving at the natural ecology's expense, but it is not yet an existential threat because AI is weak enough that humanity is still in control (in the sense of having the option to change course).
Selection bias. Those of us who were inclined to consider working on outreach and governance have joined groups like PauseAI, StopAI, and other orgs. A few of us reach back on occasion to say "Come on in, the water's fine!" The real head-scratcher for me is the lack of engagement on this topic. If one wants to deliberate on a much higher level of detail than the average person, cool--it takes all kinds to make a world. But come on, this is obviously high stakes enough to merit attention.