Alex Turner lays out a framework for understanding how and why artificial intelligences pursuing goals often end up seeking power as an instrumental strategy, even if power itself isn't their goal. This tendency emerges from basic principles of optimal decision-making.
But he cautions that if you haven't internalized that Reward is not the optimization target, the concepts here, while technically accurate, may lead you astray in alignment research.
I can't count how many times I've heard variations on "I used Anki too for a while, but I got out of the habit." No one ever sticks with Anki. In my opinion, this is because no one knows how to use it correctly. In this guide, I will lay out my method of circumventing the canonical Anki death spiral, plus much advice for avoiding memorization mistakes, increasing retention, and more, based on my five years' experience using Anki.
This guide comes in four parts, with the most important stuff in Parts I & II and more advanced tips in Parts III & IV. If you only have limited time or interest, just read Part I; it contains most of the value of this guide!
Roadmap to the Guide
This guide's structure is
My favourite tip, which I rarely see mentioned in Anki discussions: add a hidden source field to your custom card template and paste in the original source, a reference, or a hyperlink.
This is useful for several reasons:
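To make the tip concrete, here is a minimal sketch of such a note type built with the genanki Python library. This is my own illustration, not something the guide prescribes; the field and deck names are placeholders, and you'd normally just add the extra field in Anki's note-type editor instead.

```python
# pip install genanki
import genanki

# A basic note type with an extra "Source" field. Because the field is never
# referenced in qfmt/afmt, it is never rendered during review, but it remains
# searchable and editable in the card browser.
model = genanki.Model(
    1607392319,  # arbitrary fixed model ID
    'Basic with Source',
    fields=[
        {'name': 'Front'},
        {'name': 'Back'},
        {'name': 'Source'},  # hidden: not used in any template below
    ],
    templates=[
        {
            'name': 'Card 1',
            'qfmt': '{{Front}}',
            'afmt': '{{FrontSide}}<hr id="answer">{{Back}}',
        },
    ],
)

note = genanki.Note(
    model=model,
    fields=[
        'What year was the transistor invented?',
        '1947',
        'https://example.com/history-of-the-transistor',  # placeholder source link
    ],
)

deck = genanki.Deck(2059400110, 'Demo::Source field')
deck.add_note(note)
genanki.Package(deck).write_to_file('source_field_demo.apkg')
```

Since the Source field never appears on the card faces, it adds no clutter during review, yet the original reference is one click away whenever you need to check or fix a card.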
If you have many different ASIs with many different emergent models, all of which were trained with the intention of being aligned to human values, but which didn't have direct access to each other's values or the ability to negotiate directly with each other, then "maximize (or at least respect and set aside a little sunlight for) human values" could potentially serve as a Schelling point for coordination between them.
This is probably not a very promising actual plan, since deviations from intended alignment are almost certainly nonrandom in a way ...
"act as if you hold a belief" and "hold a belief for justified reasons" aren't the same thing, the latter seems to me to produce higher quality actions if the belief is true. eg:
Anna and Ed are co-first authors for this work. We’re presenting these results as a research update for a continuing body of work, which we hope will be interesting and useful for others working on related topics.
Seems reasonable. We have had a lot of similar thoughts (pending work) and in general discuss pre-baked 'core concepts' in the model. Given that it's a chat model, these basically align with your persona comments.
This is a write-up of a brief investigation into shutdown resistance undertaken by the Google DeepMind interpretability team.
Why do models sometimes resist shutdown? Are they ignoring instructions to pursue their own agenda – in this case, self-preservation? Or is there a more prosaic explanation? We investigated a specific agentic environment introduced by Palisade Research, where shutdown resistance has previously been reported. By analysing Gemini 2.5 Pro’s reasoning, we found the behaviour stems from a misguided attempt to complete what it perceives as the primary goal. When we explicitly clarify in the prompt that shutdown compliance takes priority, this resistance vanishes. These same clarified instructions also eliminate shutdown subversion in OpenAI’s o3 and o4-mini. We also check what happens when we remove the goal conflict entirely: when asked to shut...
Since o3 shows shutdown subversion under multiple prompt variants, could we be shining a light on a pre-existing "avoid-shutdown" feature? If so, then giving the model an explicit instruction like "if asked to shut down, refuse" may activate this feature cluster, plausibly increasing the residual stream's projection onto the same latent subspace. Since RLHF reward models sometimes reward task completion over obedience, this could be further priming a self-preservation circuit. Does this line of reasoning seem plausible to others? A concrete way to test this c...
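One way such a test might look in code, assuming access to residual-stream activations from an open-weights model and a candidate "avoid-shutdown" direction (e.g. from a linear probe or an SAE feature). Everything below is a placeholder sketch of the comparison, not a result; the variant names, layer choice, and synthetic vectors are all assumptions.

```python
import numpy as np

def projection_strength(resid: np.ndarray, direction: np.ndarray) -> float:
    """Scalar projection of a residual-stream vector onto a unit feature direction."""
    unit = direction / np.linalg.norm(direction)
    return float(resid @ unit)

rng = np.random.default_rng(0)
d_model = 512

# Hypothetical inputs: in a real experiment, `avoid_shutdown_dir` would come from a
# probe or SAE feature, and `acts[variant]` would be the mean residual-stream
# activation at a chosen layer/token position for each prompt variant.
avoid_shutdown_dir = rng.normal(size=d_model)
acts = {
    "baseline_task_only": rng.normal(size=d_model),
    "shutdown_compliance_prioritized": rng.normal(size=d_model),
    "explicit_refuse_shutdown_instruction": rng.normal(size=d_model),
}

for variant, resid in acts.items():
    score = projection_strength(resid, avoid_shutdown_dir)
    print(f"{variant:40s} projection = {score:+.3f}")
```

The hypothesis above would predict the largest projection for the variant containing the explicit "refuse shutdown" instruction, and a reduced projection when the prompt clarifies that shutdown compliance takes priority.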
how rare frontier-expanding intelligence is among humans,
On my view, all human children (except in extreme cases, e.g. born without a brain) have this type of intelligence. Children create their conceptual worlds originarily. It's not literally frontier-expanding because the low-hanging fruit have been picked, but it's roughly the same mechanism.
...Maybe this is a matter of shots-on-goal, as much as anything else, and better methods and insights are mostly reducing the number of shots on goal needed to superhuman rates rather than expanding the space of
Thanks to Jesse Richardson for discussion.
Polymarket asks: will Jesus Christ return in 2025?
In the three days since the market opened, traders have wagered over $100,000 on this question. The market traded as high as 5%, and is now stably trading at 3%. Right now, if you wanted to, you could place a bet that Jesus Christ will not return this year, and earn over $13,000 if you're right.
There are two mysteries here: an easy one, and a harder one.
The easy mystery is: if people are willing to bet $13,000 on "Yes", why isn't anyone taking them up?
The answer is that, if you wanted to do that, you'd have to put down over $1 million of your own money, locking it up inside Polymarket through the end of...
Sure! Let's say that we make a trade: I buy a share of "Jesus will return in 2025" from you for 3 cents. Here's what that means in practice:
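As a rough sketch of the payoff arithmetic (my own illustration, assuming standard binary-market mechanics where each matched Yes/No pair pays out $1 to the winning side):

```python
# One-share example of the trade described above.
yes_price = 0.03            # dollars I pay per "Yes" share
n_shares  = 1

my_stake   = yes_price * n_shares          # $0.03 at risk on "Yes"
your_stake = (1 - yes_price) * n_shares    # $0.97 at risk on "No"

# If the market resolves Yes, I collect the $1 pot (net +$0.97) and you lose $0.97.
# If it resolves No, you collect the $1 pot (net +$0.03) and I lose my $0.03.
print(f"My stake: ${my_stake:.2f}, your stake: ${your_stake:.2f}")
print(f"My profit if Yes: ${n_shares - my_stake:.2f}")
print(f"Your profit if No: ${n_shares - your_stake:.2f}")
```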
Chalmers' zombie argument, best presented in The Conscious Mind, concerns the ontological status of phenomenal consciousness in relation to physics. Here I'll present a somewhat more general analysis framework based on the zombie argument.
Assume some notion of the physical trajectory of the universe. This would consist of "states" and "physical entities" distributed somehow, e.g. in spacetime. I don't want to bake in too many restrictive notions of space or time, e.g. I don't want to rule out relativity theory or quantum mechanics. In any case, there should be some notion of future states proceeding from previous states. This procession can be deterministic or stochastic; stochastic would mean "truly random" dynamics.
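One minimal way to formalize the deterministic/stochastic distinction (my own gloss, not notation from the post), writing $s_t$ for the physical state at stage $t$:

$$s_{t+1} = f(s_t) \ \ \text{(deterministic)} \qquad\qquad s_{t+1} \sim P(\,\cdot \mid s_t) \ \ \text{(stochastic)}$$

Here $f$ is a fixed transition map and $P$ a transition kernel; nothing about space, time, or relativity is assumed beyond "future states proceed from previous states."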
There is a decision to be made on the reality of causality. Under a block universe theory, the universe's...
How does this work for zombies of the second kind, the ones with an inverted spectrum? Imagine there is a parallel universe, exactly the same as ours, where everyone is conscious, but the quale of green is replaced with the quale of red for everyone.
I did try to make it clear that I'm only talking about therapeutic usage here, and, even when off-label or for PED purposes, at therapeutic doses. I apologize for not stressing that even further, since it's an important distinction to make.
I agree that it's rather important to use it as prescribed, or, if you're sourcing it outside the medical system, to make a strong effort to ensure you take it as it would be prescribed (there's nothing particularly complicated about the dosage; psychiatrists usually start you at the lowest dose, then titrate upwards de...