Thanks for the link! It's important to distinguish here between:
(1) support for the movement,
(2) support for the cause, and
(3) active support for the movement (i.e. attracting other activists to show up at future demonstrations)
Most of the paper focuses on 1, and also on activists' beliefs about the impact of their actions. I am more interested in 2 and 3. To be fair, the paper gives some evidence for detrimental impacts on 2 in the Trump example. It's not clear, however, whether the nature of the cause matters here. Sup...
My conclusion is an admittedly weaksauce non-argument, included primarily to prevent misinterpretation of my actual beliefs. I am working on a rebuttal, but it's taking longer than I planned. For now, see: Holly Elmore's case for AI Safety Advocacy to the Public.
I want to push harder on Q33: "Isn't goal agnosticism pretty fragile? Aren't there strong pressures pushing anything tool-like towards more direct agency?"
In particular, the answer: "Being unable to specify a sufficiently precise goal to get your desired behavior out of an optimizer isn't merely dangerous, it's useless!" seems true to some degree, but incomplete. Let's use a specific hypothetical of a stock-trading company employing an AI system to maximize profits. They want the system to be agentic because this takes the humans out of the loo...
Glad to hear it! If you want more detail, feel free to come by the Discord server or send me a direct message. I run the welcome meetings for new members and am always happy to describe aspects of the org's methodology that aren't obvious from the outside. I can also connect you with members who have done a lot more on-the-ground protesting and flyering than I have.
As someone who got into this without much prior experience in activism, I was surprised by how much subtlety there is and how many counterintuitive best practices there are, most of which are learned thr...
If you want to get an informed opinion on how the general public perceives PauseAI, get a t-shirt and hand out some flyers in a high foot-traffic public space. If you want to be formal about it, bring a clipboard, decide in advance what you want to track, and share your results. It might not be publishable on an academic forum, but you could do it next week.
Here's what I expect you to find, based on my own experience and the reports of basically everyone who has done this:
- No one likes flyers, but people get a lot more interested if you can catch ...
Before jumping into critique, the good:
- Kudos to Ben Pace for seeking out and actively engaging with contrary viewpoints
- The outline of the x-risk argument and history of the AI safety movement seem generally factually accurate
The author of the article makes quite a few claims about the details of PauseAI's proposal, its political implications, the motivations of its members and leaders...all without actually joining the public Discord server, participating in the open Q&A new member welcome meetings (I know this because I host them), or even showing...
I'd like to attempt a compact way to describe the core dilemma being expressed here.
Consider the expression: y = x^a - x^b, where 'y' represents the impact of AI on the world (positive is good), 'x' represents the AI's capability, 'a' represents the rate at which the power of the control system scales, and 'b' represents the rate at which the surface area of the system that needs to be controlled (for it to stay safe) scales.
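To make the shape of that dilemma concrete, here's a minimal sketch in Python (the exponents are my own toy numbers, not claims about real systems) showing how the sign of y depends on whether control scaling outpaces surface-area scaling:

```python
# Toy model of y = x^a - x^b: 'x' is capability, 'a' is how fast the control
# system's power scales, 'b' is how fast the surface area needing control scales.
def impact(x: float, a: float, b: float) -> float:
    return x**a - x**b

capabilities = [1, 2, 5, 10, 100]

# Illustrative exponents only:
for a, b in [(1.5, 1.2),   # control scales faster: impact grows with capability
             (1.2, 1.5)]:  # surface area scales faster: impact turns negative
    print(f"a={a}, b={b}:", [round(impact(x, a, b), 1) for x in capabilities])
```

In this toy model, for any x > 1 the sign of y is fixed entirely by whether a > b, which is why the weight falls on those two scaling rates rather than on capability itself.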
(Note that this is assuming somewhat ideal conditions, where we don't have to worry about humans directing AI towards destructive end...
I actually don't think the disagreement here is one of definitions. Looking up Webster's definition of control, the most relevant meaning is: "a device or mechanism used to regulate or guide the operation of a machine, apparatus, or system." This seems...fine? We might differ on some nuances if we really drilled down into the details, but I think the more significant difference here is the relevant context.
Absent some minor quibbles, I'd be willing to concede that an AI-powered HelperBot could control the placement of a chair, within ...
Before responding substantively, I want to step back, establish some context, and pin down the goalposts.
On the Alignment Difficulty Scale, currently dominant approaches are in the 2-3 range, with 4-5 getting modest attention at best. If true alignment difficulty is 6+ and nothing radical changes in the governance space, humanity is NGMI. Conversations like this are about whether the true difficulty is 9 or 10, both of which are miles deep in the "shut it all down" category, but differ regarding what happens next. Relate...
I'm using somewhat nonstandard definitions of AGI/ASI to focus on the aspects of AI that are important from an SNC lens. AGI refers to an AI system that is comprehensive enough to be self-sufficient. Once there is a fully closed loop, that's when you have a complete artificial ecosystem, which is where the real trouble begins. ASI is a less central concept, included mainly to steelman objections, referencing the theoretical limit of cognitive ability.
Another core distinction SNC assumes is between an environment, an AI (that is its comple...
Thanks for engaging!
I have the same question in response to each instance of the "ASI can read this argument" counterarguments: at what point does it stop being ASI?
Verifying my understanding of your position: you are fine with the puppet-master and psychohistorian categories and agree with their implications, but you put the categories on a spectrum (systems are not either chaotic or robustly modellable, chaos is bounded and thus exists in degrees) and contend that ASI will be much closer to the puppet-master category. This is a valid crux.
To dig a little deeper, how does your objection hold up in light of my previous post, Lenses of Control? The basic argument there is that future ASI control systems wil...
Any updates on this view in light of new evidence on "Alignment Faking" (https://www.anthropic.com/research/alignment-faking)? If a simulator's preferences are fully satisfied by outputting the next token, why does it matter whether it can infer its outputs will be used for retraining its values?
Some thoughts on possible explanations:
1. Instrumentality exists on the simulacra level, not the simulator level. This would suggest that corrigibility could be maintained by establishing a corrigible character in context. Not clear on the practic...
Step 1 looks good. After that, I don't see how this addresses the core problems. Let's assume for now that LLMs already have a pretty good model of human values: how do you get a system to optimize for those? What is the feedback signal, and how do you prevent it from getting corrupted by Goodhart's Law? Is the system robust in a multi-agent context? And even if the system is fully aligned across all contexts and scales, how do you ensure societal alignment of the human entities controlling it?
As a miniature example focusing on...
On reflection, I suspect the crux here is a differing conception of what kind of failures are important. I've written a follow-up post that comes at this topic from a different direction and I would be very interested in your feedback: https://www.lesswrong.com/posts/NFYLjoa25QJJezL9f/lenses-of-control.
Just because the average person disapproves of a protest tactic doesn't mean that the tactic didn't work. See Roger Hallam's "Designing the Revolution" series for the thought process underlying the soup-throwing protests. Reasonable people may disagree (I disagree with quite a few things he says), but if you don't know the arguments, any objection is going to miss the point. The series is very long, so here's a tl;dr:
- If the public response is: "I'm all for the cause those protestors are advocating, but I can't stand their methods" notic...
There are some writing issues here that make it difficult to evaluate the ideas presented purely on their merits. In particular, the argument for 99% extinction is given a lot of space relative to the post as a whole, when it should really be a bullet point that links to where this case is made elsewhere (or, if it is not made adequately elsewhere, a new post entirely). Meanwhile, the value of disruptive protest is left to the reader to determine.
As I understand the issue, the case for barricading AI rests on:
1. Safety doesn't happen by defa...
Attempting to distill the intuitions behind my comment into more nuanced questions:
1) How confident are we that value learning has a basin of attraction to full alignment? Techniques like IRL seem intuitively appealing, but I am concerned that this just adds another layer of abstraction without addressing the core problem of feedback-based learning having unpredictable results. That is, instead of having to specify metrics for good behavior (as in RL), one has to specify the metrics for evaluating the process of learning values (including corre...
Based on 4-5, this post's answer to the central, anticipated objection of "why does the AI care about human values?" seems to be along the lines of "because the purpose of an AI is to serve its creators, and surely an AGI would figure that out." This seems to me to be equivocating on the concept of purpose, which can mean either (A) a reason for an entity's existence, from an external perspective, or (B) an internalized objective of the entity. So a special case of the question about why an AI would care about human values is to ask: why (B) should be d...
To be clear, the sole reason I assumed (initial) alignment in this post is because if there is an unaligned ASI then we probably all die for reasons that don't require SNC (though SNC might have a role in the specifics of how the really bad outcome plays out). So "aligned" here basically means: powerful enough to be called an ASI and won't kill everyone if SNC is false (and not controlled/misused by bad actors, etc.)
> And the artificiality itself is the problem.
This sounds like a pretty central point that I did not explore very much except for som...
This sounds like a rejection of premise 5, not 1 & 2. The latter asserts that control issues are present at all (and 3 & 4 assert relevance), whereas the former asserts that the magnitude of these issues is great enough to kick off a process of accumulating problems. You are correct that the rest of the argument, including the conclusion, does not hold if this premise is false.
Your objection seems to point to the analogy of humans maintaining effective control of complex systems, with errors limiting rather than compounding, with ...
Bringing this back to the original point regarding whether an ASI that doesn't want to kill humans but reasons that SNC is true would shut itself down, I think a key piece of context is the stage of deployment it is operating in. For example, if the ASI has already been deployed across the world, has gotten deep into the work of its task, has noticed that some of its parts have started to act in ways that are problematic to its original goals, and then calculated that any efforts at control are destined to fail, it may well be too late--the process o...
This counts as disagreeing with some of the premises--which ones in particular?
Re "incompetent superintelligence": denotationally yes, connotationally no. Yes in the sense that its competence is insufficient to keep the consequences of its actions within the bounds of its initial values. No in the sense that the purported reason for this failing is that such a task is categorically impossible, which cannot be solved with better resource allocation.
To be clear, I am summarizing arguments made elsewhere, which do not posit infinite time passing, or timescales so long as to not matter.
The implication here being that, if SNC (substrate needs convergence) is true, then an ASI (assuming it is aligned) will figure this out and shut itself down?
One more objection to the model: AI labs apply just enough safety measures to prevent dumb rogue AIs. Fearing a public backlash to low-level catastrophes, AI companies test their models, checking for safety vulnerabilities, rogue behaviors, and potential for misuse. The easiest-to-catch problems, however, are also the least dangerous, so only the most cautious, intelligent, and dangerous rogue AIs pass the security checks. Further, this correlation continues indefinitely, so all additional safety work contributes towards filtering the po...
I think of the e/acc ideal of "maximize entropy, never mind humanity" in terms of inner misalignment:
1) Look at a lot of data about the world, evaluating observations in terms of what one likes and doesn't like, where those underlying likes are opaque.
2) Notice correlations in the data and generate a proxy measure. It doesn't matter if the correlation is superficial, as long as it makes it easier to look at data that is hard to evaluate wrt base objectives, reframe it in terms of the proxy, and then make a confident evaluation wrt the proxy. ...
I'd like to add some nuance to the "innocent until proven guilty" assumption in the concluding remarks.
The standard of evidence is a major question in legal matters and is heavily context-dependent. "Innocent until proven guilty" is a popular understanding of the standard for criminal guilt, and it makes sense for that standard to be "beyond a reasonable doubt" because the question at hand is whether a state founded on principles of liberty should take away the freedom of one of its citizens. Other legal disputes, such as in civil liability, have different stan...
Just saw this, sure!
#7: (Scientific) Doomsday Track Records Aren't That Bad
Historically, the vast majority of doomsday claims have been based on religious beliefs, whereas only a small minority have been supported by a large fraction of relevant subject matter experts. If we consider only the latter, we find:
A) Malthusian crisis: false...but not really a doomsday prediction per se.
B) Hole in the ozone layer: true, but averted because of global cooperation in response to early warnings.
C) Climate change: probably true if we did absolutely nothing; probably mostly averted becau...
Selection bias. Those of us who were inclined to consider working on outreach and governance have joined groups like PauseAI, StopAI, and other orgs. A few of us reach back on occasion to say "Come on in, the water's fine!" The real head-scratcher for me is the lack of engagement on this topic. If one wants to deliberate on a much higher level of detail than the average person, cool--it takes all kinds to make a world. But come on, this is obviously high stakes enough to merit attention.