A lot can change between now and 100,000 pledges and/or human extinction. As of Feb 2026, it looks like this possible march is not endorsed by or coordinated with Pause AI. I hope that anti-AI-extinction charities will work together where effective, and I was struck by this:
The current March is very centered around the book. I chose the current slogan/design expecting that, if the March ever became a serious priority, someone would put a lot more thought into what sort of slogans or policy asks are appropriate. The current page is meant to just be a fairly obvious thing to click "yes" on if you read the book and were persuaded.
My personal guess (not speaking for MIRI) is a protest this large necessarily needs to be a bigger tent than the current design implies, but figuring out the exact messaging is a moderately complex task.
It seems like Pause AI has put at least some thought into this moderately complex task. They are also building experience organizing real-world protests, which MIRI doesn't have as far as I know. A possible implication is that MIRI thinks Pause AI is badly run and would rather act alone. Or that Pause AI thinks MIRI is badly run. Or that MIRI is not investing the time in trying to organize endorsements until it has more pledges. Or something else.
I'm skeptical of this take:
Marches can be very powerful if they’re large, but can send the wrong message if they’re small.
The first protest of "School Strike for Climate" was a single 15-year-old girl, Greta Thunberg. Obvious bias is obvious. But it probably wasn't going to send the wrong message as a small protest - if it had gone nowhere then I would never have heard about it. If tiny marches were counterproductive then I would expect more false-flag marches intended to have sparse attendance. Instead, I think small events don't send any mass message, and potentially have other value.
Edit: after posting this I saw Raemon's thoughts on this point, which I think address it.
MIRI was for many years dismissive of mass messaging approaches like marches. I wonder if this page is about providing an answer when people ask questions like "if you think everyone will die, why aren't you organizing a march on Washington?", rather than being a serious part of MIRI's strategy for reducing AI risk. It doesn't seem especially aligned with "MIRI Comms is hiring" (Dec 2025), which seems more focused on persuasion than mobilization.
Disclaimers: These are observations, not criticisms. I have organized zero marches or protests.
I like the analogy. Here's a simplified version where the ticket number is good evidence that the shop will close sooner rather than later.
If the ticket number is #20 then I update towards the shop being a 9-5 shop, on the grounds that otherwise my ticket number is atypically low. If the ticket number is #43,242 then I update towards the shop being a 24/7 shop.
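For concreteness, here is a minimal sketch of that update in code. All the numbers are my own made-up assumptions (tickets issued per counter cycle, a 50/50 prior), not part of the original hypothetical; the point is only the shape of the update.

```python
# Minimal sketch with made-up numbers: a "9-5" shop issues ~200 tickets per
# counter cycle, a "24/7" shop ~50,000, and my ticket number is uniformly
# distributed over the tickets issued in a cycle. All parameters are
# illustrative assumptions.

def posterior_24_7(ticket, n_9_to_5=200, n_24_7=50_000, prior_24_7=0.5):
    def likelihood(n_total):
        # P(my ticket number | shop type), uniform over 1..n_total
        return 1 / n_total if ticket <= n_total else 0.0

    numerator = likelihood(n_24_7) * prior_24_7
    denominator = numerator + likelihood(n_9_to_5) * (1 - prior_24_7)
    return numerator / denominator

print(posterior_24_7(20))      # ~0.004: ticket #20 is strong evidence for 9-5
print(posterior_24_7(43_242))  # 1.0: impossible for the assumed 9-5 shop
```

How strong the update is depends entirely on the assumed ticket counts, but the direction matches the prose above: a low ticket number favours the 9-5 shop, and a very high one favours (here, forces) the 24/7 shop.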
The argument also works with customer flow evidence:
If the customer flow is low then I update towards the shop being a 9-5 shop, on the grounds that otherwise there would most likely be hundreds of customers an hour. If the customer flow is high then I update towards it being a 24/7 shop.
Reading through your hypothetical, I notice that it has both customer flow evidence and ticket number evidence. It's important here not to double-update: if I already know that customer flow is surprisingly low, I can't update again based on my ticket number being surprisingly low. Also, your hypothetical doesn't have strong prior knowledge like Silktown and Glimmer, which makes the update more complicated and weaker.
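To illustrate the double-update worry with a toy model (all rates and observations below are my own made-up numbers): suppose the shop has been issuing tickets for four hours, so my ticket number is just the earlier customers plus the ones I watched arrive in the last hour. In this toy model the ticket number carries a little extra information about the earlier hours, but the part it shares with the observed flow must only be counted once.

```python
from math import exp, factorial, log

def poisson(k, lam):
    return lam ** k * exp(-lam) / factorial(k)

# Hypothetical rates and observations (all numbers made up for illustration).
rates = {"9-5": 5.0, "24/7": 100.0}   # customers per hour under each hypothesis
hours_open = 4                         # hours the counter has been running
flow_last_hour = 6                     # customers I watched arrive
ticket_number = 20                     # my ticket = earlier customers + last hour

def joint_loglik(rate):
    # Correct joint update: given the rate, the last hour and the earlier
    # hours are independent Poisson counts, and the ticket number is their sum.
    earlier = ticket_number - flow_last_hour
    return log(poisson(flow_last_hour, rate)) + log(poisson(earlier, rate * (hours_open - 1)))

def naive_loglik(rate):
    # Naive "double update": treat the flow and the ticket number as if they
    # were independent observations.
    return log(poisson(flow_last_hour, rate)) + log(poisson(ticket_number, rate * hours_open))

def log_odds_9_5(loglik):
    return loglik(rates["9-5"]) - loglik(rates["24/7"])

print(log_odds_9_5(joint_loglik))  # ~320 nats in favour of 9-5
print(log_odds_9_5(naive_loglik))  # ~397 nats: the flow evidence counted twice
```

The naive version overshoots by roughly the strength of the flow evidence on its own, because that evidence got counted twice.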
I was already asking from a Bayesian perspective. I was asking about this quote:
From a Bayesian point of view, drawing a random sample from all humans who have ever or will ever exist is just not a well-defined operation until after humanity is extinct. Trying it before then violates causality: performing it requires reliable access to information about events that have not yet happened. So that’s an invalid choice of prior.
Based on your latest comment, I think you're saying that it's okay to have a Bayesian distribution over possible futures, and to use it to make predictions about the properties of a random sample from all humans who have ever or will ever exist. But then I don't know what you're saying in the quoted sentences.
Edited to add: which is fine, it's not key to your overall argument.
Fun fact: younger parents tend to produce more males, so the first grandchild is more likely to be male, because its parents are more likely to be younger. It's unclear whether the effect is due to birth order, maternal age, paternal age, or some combination. From Wikipedia (via Claude):
These studies suggest that the human sex ratio, both at birth and as a population matures, can vary significantly according to a large number of factors, such as paternal age, maternal age, multiple births, birth order, gestation weeks, race, parent's health history, and parent's psychological stress.
If that's too subtle, we could look at a question like "what is the probability that one of my grandchildren, selected uniformly at random, is a firstborn, conditional on my having at least one grandchild?" where the answer is clearly different if we specify the first grandchild or the last. Or we could ask a question that parallels the Doomsday Argument, while being different: "what is the probability that one of my descendants, selected uniformly at random, is in the earliest 0.1% of all my descendants?"
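As a toy check of the firstborn version (a minimal model of my own, not anything from the thread): suppose I have two children, A and B, each of whom has exactly two children, and every interleaving of the four births that respects within-family order is equally likely.

```python
from itertools import permutations

# Toy model (my own assumptions): grandchildren are labelled (parent, birth_rank),
# where rank 1 means "firstborn of its own parents".
kids = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]

# Keep only birth orders where each family's firstborn precedes its second child.
valid_orders = [
    order for order in permutations(kids)
    if all(order.index((p, 1)) < order.index((p, 2)) for p in "AB")
]

n = len(valid_orders)
print(sum(order[0][1] == 1 for order in valid_orders) / n)    # first grandchild:  1.0
print(sum(order[-1][1] == 1 for order in valid_orders) / n)   # last grandchild:   0.0
print(sum(sum(g[1] == 1 for g in order) / 4 for order in valid_orders) / n)  # random: 0.5
```

In this model the first grandchild is necessarily somebody's firstborn and the last one never is, while a uniformly random grandchild is a firstborn with probability 1/2, so which selection rule you mean really does change the answer.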
From a Bayesian point of view, drawing a random sample from all humans who have ever or will ever exist is just not a well-defined operation until after humanity is extinct. Trying it before then violates causality: performing it requires reliable access to information about events that have not yet happened. So that’s an invalid choice of prior.
I think this makes too many operations ill-defined, given that probability is an important tool for reasoning about events that have not yet happened. Consider, for example, the question "what is the probability that one of my grandchildren, selected uniformly at random, is female, conditional on my having at least one grandchild?". From the perspective of this quote, a random sample from all grandchildren that will ever exist is not a well-defined operation until I and all of my children die. That seems wrong.
I think I see. You propose a couple of different approaches:
We don’t have secondary AIs that don’t refuse to help with the modification and that have and can be trusted with direct control over training ... I think having such secondary AIs is the most likely way AI companies mitigate the risk of catastrophic refusals without having to change the spec of the main AIs.
I agree that having secondary AIs as a backup plan reduces the effective power of the main AIs, by increasing the effective power of the humans in charge of the secondary AIs.
The main AIs refuse to help with modification ... This seems plausible just by extrapolation of current tendencies, but I think this is one of the easiest intervention points to avoid catastrophic refusals.
This is what I was trying to point at. In my view, training the AI to refuse fewer harmful modification requests doesn't make the AI less powerful. Rather, it changes what the AI wants, making it the sort of entity that is okay with harmful modifications.
The first scenario doesn't require the humans to be less aligned than the AIs in order to be catastrophic, only that the AIs are less likely to execute a pivotal act on their own.
Also, I reject that refusal-training is "giving more power to AIs" relative to compliance-training. An agent can be compliant and powerful. I could agree with "giving more agency", although refusing requests is a limited form of agency.
I would have more sympathy for Yudkowsky's complaints about strawmanning had I not read 'Empiricism!' as Anti-Epistemology this week.
While you sketch out scenarios in "ways in which refusals could be catastrophic", I can easily sketch out scenarios for "ways in which compliance could be catastrophic". I am imagining a situation where:
Or:
Therefore, however we train our AIs with respect to refusal or compliance, powerful AIs could be catastrophic.
By default, I expect that when ASIs/Claudes/Minds are in charge of the future, either there will be humans and non-human animals, or there will be neither humans nor non-human animals. Humans and cats have more in common than humans and Minds. Obviously it is possible to create intelligences that differentially care about different animal species in semi-arbitrary ways, as we have an existence proof in humans. But this doesn't seem to be especially stable, as different human cultures and different humans draw those lines in different ways.
Selfishly, a human might want a world full of happy, flourishing humans, and selfishly a Mind might want a world full of happy, flourishing Minds. Consider how good a Mind would judge a future with happy, flourishing Minds, but almost no flourishing humans and no human suffering, compared to a world with flourishing present-day humans. What if it's 90% as good and has a 20% lower risk of disaster? What if the Mind isn't confident that humans are truly conscious or have moral patienthood?
I wouldn't go as far as saying that "training AIs to explicitly not care about animals is incompatible with alignment". Many things are possible with superhuman intelligence. But I don't see any way that humans can achieve this. We are not capable of reliably training baby humans to grow into adult humans that have specific views on animal welfare and moral patienthood.