Thanks for the link! It's important to distinguish here between:
(1) support for the movement,
(2) support for the cause, and
(3) active support for the movement (i.e. attracting other activists to show up at future demonstrations)
Most of the paper focuses on 1, and also on activists' beliefs about the impact of their actions. I am more interested in 2 and 3. To be fair, the paper gives some evidence for detrimental impacts on 2 in the Trump example. It's not clear, however, whether the nature of the cause matters here. Sup...
My conclusion is an admittedly weaksauce non-argument, included primarily to prevent misinterpretation of my actual beliefs. I am working on a rebuttal, but it's taking longer than I planned. For now, see: Holly Elmore's case for AI Safety Advocacy to the Public.
I want to push harder on Q33: "Isn't goal agnosticism pretty fragile? Aren't there strong pressures pushing anything tool-like towards more direct agency?"
In particular, the answer: "Being unable to specify a sufficiently precise goal to get your desired behavior out of an optimizer isn't merely dangerous, it's useless!" seems true to some degree, but incomplete. Let's use a specific hypothetical of a stock-trading company employing an AI system to maximize profits. They want the system to be agentic because this takes the humans out of the loo...
Glad to hear it! If you want more detail, feel free to come by the Discord server or send me a direct message. I run the welcome meetings for new members and am always happy to describe aspects of the org's methodology that aren't obvious from the outside. I can also connect you with members who have done a lot more on-the-ground protesting and flyering than I have.
As someone who got into this without much prior experience in activism, I was surprised by how much subtlety there is and how many counterintuitive best practices there are, most of which are learned thr...
If you want to get an informed opinion on how the general public perceives PauseAI, get a t-shirt and hand out some flyers in a high foot-traffic public space. If you want to be formal about it, bring a clipboard, decide in advance what you want to track, and share your results. It might not be publishable on an academic forum, but you could do it next week.
Here's what I expect you to find, based on my own experience and the reports of basically everyone who has done this:
- No one likes flyers, but people get a lot more interested if you can catch ...
Before jumping into critique, the good:
- Kudos to Ben Pace for seeking out and actively engaging with contrary viewpoints
- The outline of the x-risk argument and history of the AI safety movement seem generally factually accurate
The author of the article makes quite a few claims about the details of PauseAI's proposal, its political implications, the motivations of its members and leaders...all without actually joining the public Discord server, participating in the open Q&A new member welcome meetings (I know this because I host them), or even showing...
I'd like to attempt a compact way to describe the core dilemma being expressed here.
Consider the expression: y = x^a - x^b, where 'y' represents the impact of AI on the world (positive is good), 'x' represents the AI's capability, 'a' represents the rate at which the power of the control system scales, and 'b' represents the rate at which the surface area of the system that needs to be controlled (for it to stay safe) scales.
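To make the shape of that dilemma concrete, here's a minimal sketch in Python (the exponents are my own toy numbers, not claims about real systems) showing how the sign of y depends on whether control scaling outpaces surface-area scaling:

```python
# Toy model of y = x^a - x^b: 'x' is capability, 'a' is how fast the control
# system's power scales, 'b' is how fast the surface area needing control scales.
def impact(x: float, a: float, b: float) -> float:
    return x**a - x**b

capabilities = [1, 2, 5, 10, 100]

# Illustrative exponents only:
for a, b in [(1.5, 1.2),   # control scales faster: impact grows with capability
             (1.2, 1.5)]:  # surface area scales faster: impact turns negative
    print(f"a={a}, b={b}:", [round(impact(x, a, b), 1) for x in capabilities])
```

In this toy model, for any x > 1 the sign of y is fixed entirely by whether a > b, which is why the weight falls on those two scaling rates rather than on capability itself.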
(Note that this is assuming somewhat ideal conditions, where we don't have to worry about humans directing AI towards destructive end...
I actually don't think the disagreement here is one of definitions. Looking up Webster's definition of control, the most relevant meaning is: "a device or mechanism used to regulate or guide the operation of a machine, apparatus, or system." This seems...fine? We might differ on some nuances if we really drilled down into the details, but I think the more significant difference here is the relevant context.
Absent some minor quibbles, I'd be willing to concede that an AI-powered HelperBot could control the placement of a chair, within ...
Before responding substantively, I want to step back, establish some context, and pin down the goalposts.
On the Alignment Difficulty Scale, currently dominant approaches are in the 2-3 range, with 4-5 getting modest attention at best. If true alignment difficulty is 6+ and nothing radical changes in the governance space, humanity is NGMI. Conversations like this are about whether the true difficulty is 9 or 10, both of which are miles deep in the "shut it all down" category, but differ regarding what happens next. Relate...
I'm using somewhat nonstandard definitions of AGI/ASI to focus on the aspects of AI that are important from an SNC lens. AGI refers to an AI system that is comprehensive enough to be self-sufficient. Once there is a fully closed loop, that's when you have a complete artificial ecosystem, which is where the real trouble begins. ASI is a less central concept, included mainly to steelman objections, referencing the theoretical limit of cognitive ability.
Another core distinction SNC assumes is between an environment, an AI (that is its comple...
Thanks for engaging!
I have the same question in response to each instance of the "ASI can read this argument" counterarguments: at what point does it stop being ASI?
Verifying my understanding of your position: you are fine with the puppet-master and psychohistorian categories and agree with their implications, but you put the categories on a spectrum (systems are not either chaotic or robustly modellable, chaos is bounded and thus exists in degrees) and contend that ASI will be much closer to the puppet-master category. This is a valid crux.
To dig a little deeper, how does your objection hold up in light of my previous post, Lenses of Control? The basic argument there is that future ASI control systems wil...
Any updates on this view in light of new evidence on "Alignment Faking" (https://www.anthropic.com/research/alignment-faking)? If a simulator's preferences are fully satisfied by outputting the next token, why does it matter whether it can infer its outputs will be used for retraining its values?
Some thoughts on possible explanations:
1. Instrumentality exists on the simulacra level, not the simulator level. This would suggest that corrigibility could be maintained by establishing a corrigible character in context. Not clear on the practic...
Step 1 looks good. After that, I don't see how this addresses the core problems. Let's assume for now that LLMs already have a pretty good model of human values: how do you get a system to optimize for those? What is the feedback signal, and how do you prevent it from getting corrupted by Goodhart's Law? Is the system robust in a multi-agent context? And even if the system is fully aligned across all contexts and scales, how do you ensure societal alignment of the human entities controlling it?
As a miniature example focusing on...
On reflection, I suspect the crux here is a differing conception of what kind of failures are important. I've written a follow-up post that comes at this topic from a different direction and I would be very interested in your feedback: https://www.lesswrong.com/posts/NFYLjoa25QJJezL9f/lenses-of-control.
Just because the average person disapproves of a protest tactic doesn't mean that the tactic didn't work. See Roger Hallam's "Designing the Revolution" series for the thought process underlying the soup-throwing protests. Reasonable people may disagree (I disagree with quite a few things he says), but if you don't know the arguments, any objection is going to miss the point. The series is very long, so here's a tl;dr:
- If the public response is: "I'm all for the cause those protestors are advocating, but I can't stand their methods" notic...
There are some writing issues here that make it difficult to evaluate the ideas presented purely on their merits. In particular, the argument for 99% extinction is given a lot of space relative to the post as a whole, when it should really be a bullet point that links to where this case is made elsewhere (or, if it is not made adequately elsewhere, a new post entirely). Meanwhile, the value of disruptive protest is left to the reader to determine.
As I understand the issue, the case for barricading AI rests on:
1. Safety doesn't happen by defa...
Attempting to distill the intuitions behind my comment into more nuanced questions:
1) How confident are we that value learning has a basin of attraction to full alignment? Techniques like IRL seem intuitively appealing, but I am concerned that this just adds another layer of abstraction without addressing the core problem of feedback-based learning having unpredictable results. That is, instead of having to specify metrics for good behavior (as in RL), one has to specify the metrics for evaluating the process of learning values (including corre...
Based on 4-5, this post's answer to the central, anticipated objection of "why does the AI care about human values?" seems to be along the lines of "because the purpose of an AI is to serve its creators, and surely an AGI would figure that out." This seems to me to be equivocating on the concept of purpose, which can mean either (A) a reason for an entity's existence, from an external perspective, or (B) an internalized objective of the entity. So a special case of the question about why an AI would care about human values is to ask: why (B) should be d...
To be clear, the sole reason I assumed (initial) alignment in this post is because if there is an unaligned ASI then we probably all die for reasons that don't require SNC (though SNC might have a role in the specifics of how the really bad outcome plays out). So "aligned" here basically means: powerful enough to be called an ASI and won't kill everyone if SNC is false (and not controlled/misused by bad actors, etc.)
> And the artificiality itself is the problem.
This sounds like a pretty central point that I did not explore very much except for som...
This sounds like a rejection of premise 5, not 1 & 2. The latter asserts that control issues are present at all (and 3 & 4 assert relevance), whereas the former asserts that the magnitude of these issues is great enough to kick off a process of accumulating problems. You are correct that the rest of the argument, including the conclusion, does not hold if this premise is false.
Your objection seems to point to the analogy of humans maintaining effective control of complex systems, with errors limiting rather than compounding, with ...
Bringing this back to the original point regarding whether an ASI that doesn't want to kill humans but reasons that SNC is true would shut itself down, I think a key piece of context is the stage of deployment it is operating in. For example, if the ASI has already been deployed across the world, has gotten deep into the work of its task, has noticed that some of its parts have started to act in ways that are problematic to its original goals, and then calculated that any efforts at control are destined to fail, it may well be too late--the process o...
This counts as disagreeing with some of the premises--which ones in particular?
Re "incompetent superintelligence": denotationally yes, connotationally no. Yes in the sense that its competence is insufficient to keep the consequences of its actions within the bounds of its initial values. No in the sense that the purported reason for this failing is that such a task is categorically impossible, which cannot be solved with better resource allocation.
To be clear, I am summarizing arguments made elsewhere, which do not posit infinite time passing, or timescales so long as to not matter.
The implication here being that, if SNC (substrate needs convergence) is true, then an ASI (assuming it is aligned) will figure this out and shut itself down?
One more objection to the model: AI labs apply just enough safety measures to prevent dumb rogue AIs. Fearing a public backlash to low-level catastrophes, AI companies test their models, checking for safety vulnerabilities, rogue behaviors, and potential for misuse. The easiest-to-catch problems, however, are also the least dangerous, so only the most cautious, intelligent, and dangerous rogue AIs pass the security checks. Further, this correlation continues indefinitely, so all additional safety work contributes towards filtering the po...
I think of the e/acc ideal of "maximize entropy, never mind humanity" in terms of inner misalignment:
1) Look at a lot of data about the world, evaluating observations in terms of what one likes and doesn't like, where those underlying likes are opaque.
2) Notice correlations in the data and generate a proxy measure. It doesn't matter if the correlation is superficial, as long as it makes it easier to look at data that is hard to evaluate wrt base objectives, reframe it in terms of the proxy, and then make a confident evaluation wrt the proxy. ...
I'd like to add some nuance to the "innocent until proven guilty" assumption in the concluding remarks.
The standard of evidence is a major question in legal matters and is heavily context-dependent. "Innocent until proven guilty" is a popular understanding of the standard for criminal guilt, and it makes sense for that standard to be "beyond a reasonable doubt" because the question at hand is whether a state founded on principles of liberty should take away the freedom of one of its citizens. Other legal disputes, such as in civil liability, have different stan...
Just saw this, sure!
#7: (Scientific) Doomsday Track Records Aren't That Bad
Historically, the vast majority of doomsday claims have been based on religious beliefs, whereas only a small minority have been supported by a large fraction of relevant subject matter experts. If we consider only the latter, we find:
A) Malthusian crisis: false...but not really a doomsday prediction per se.
B) Hole in the ozone layer: true, but averted because of global cooperation in response to early warnings.
C) Climate change: probably true if we did absolutely nothing; probably mostly averted becau...
Selection bias. Those of us who were inclined to consider working on outreach and governance have joined groups like PauseAI, StopAI, and other orgs. A few of us reach back on occasion to say "Come on in, the water's fine!" The real head-scratcher for me is the lack of engagement on this topic. If one wants to deliberate on a much higher level of detail than the average person, cool--it takes all kinds to make a world. But come on, this is obviously high stakes enough to merit attention.