Just wanted to provide some positive feedback that this post is really incredible, and I thank you for your work. I’ve been feeling a deep sort of low-level anxiety recently, and this is a nice starting point to try to work through some of that.
“You could call it heroic responsibility, maybe,” Harry Potter said. “Not like the usual sort. It means that whatever happens, no matter what, it’s always your fault. Even if you tell Professor McGonagall, she’s not responsible for what happens, you are. Following the school rules isn’t an excuse, someone else being in charge isn’t an excuse, even trying your best isn’t an excuse. There just aren’t any excuses, you’ve got to get the job done no matter what.” –HPMOR, chapter 75.
I think a typical-ish person actually doing this doesn't look like them rising to the challenge. I think someone actually doing this looks like them thinking they have advanced mind control powers (since even things done by other people are their fault) and that since there continue to be horrible things happening in the world, they must have evil intentions and be a partly-demonic entity. It looks like them making themselves a scapegoat. This isn't speculative; I've experienced this, and I think it was connected to trying to take heroic responsibility seriously, including the idea that I could personally be responsible for the destruction of the world (e.g. by starting conversations about AI that cause AI to be developed sooner), which my social environment encouraged.
I think this goes against normal therapy advice e.g. the idea that you having been abused isn't your fault, that you need to forgive yourself for having acted suboptimally given the confusions you previously had, that you shouldn't depend on controlling others' behavior, that you should respect others' boundaries and their ability to make their own decisions, etc. There are certainly problems with normal therapy advice, but this is something people have already thought a lot about and have clinical experience with.
Maybe some people get something out of this, either because they do a pretend version of it or have an abnormal psychology where they don't connect everything bad being their fault with normal emotions a typical person would have as a consequence. But it seems out of place in a compilation about how to have good mental health.
My other comment notwithstanding, I do think the HPMOR quote is not very helpful for someone's mental health when they're in pain, and it seems oddly placed atop a section of advice; advice at the wrong time can feel oppressive. The hero-licensing post feels much less likely to leave one feeling oppressed by every bad thing that happens in the world. And personally I found Anna's post linked earlier to be much more helpful advice that is related to and partially upstream of the sorts of changes in my life that have reduced a lot of anxiety. If it were me I'd probably put that at the top of the list there, perhaps along with Come to Your Terms by Nate, which also resonates strongly with me.
(Looking further) I see: the point of that section isn't to be "the advice section", it's to be "the advice posts that don't talk about AI". I still think something about that is confusing. My first guess is that I'd structure a post like this as an FAQ: "Are you feeling X because Y? Then here are two posts that address this," and so on, so that people can find the bit that is relevant to their problem. But I'm not sure.
I can understand thinking of yourself as having evil intentions, but I don't understand believing you're a partly-demonic entity.
I think the way that the global market and culture can respond to ideas is strange and surprising, with people you don't know launching major undertakings based on your ideas, with lots of copying and imitation and whole organizations or people changing their lives around something you did without them ever knowing you. Like the way that Elon Musk met a girlfriend of his via a Roko's Basilisk meme, or one time someone on reddit I don't know believed that an action I'd taken was literally "the AGI" acting in their life (which was weird for me). I think that one can make straightforward mistakes in earnestly reasoning about strange things (as is argued in this Astral Codex Ten post, which IIRC claims that conspiracy theories often have surprisingly good arguments for them that a typical person would find persuasive on their own merits). So I'm not saying that really trying to act on a global scale on a difficult problem couldn't cause you to have supernatural beliefs.
But you said it's what would happen to a 'typical-ish person'. If you believe a 'typical-ish person' trying to have an epistemology will reliably fail in ways that lead to them believing in conspiracies, then I guess yes, they may also come to have supernatural beliefs if they try to take action that has massive consequences in the world. But I think a person with just a little more perspective can be self-aware about conspiracy theories and similarly be self-aware about whatever other hypotheses they form, and try to stick to fairly grounded ones. It turns out that when you poke civilization the right way, it sometimes does really outsized and overpowered things.
I imagine it was a trip for Doug Engelbart to watch everyone in the world get a personal computer, with a computer mouse and a graphical user-interface that he had invented. But I think it would have been a mistake for him to think anything supernatural was going on, even if he were trying to personally take responsibility for directing the world as best he could, and I expect most people would be able to see that (from the outside).
If you think you're responsible for everything, that means you're responsible for everything bad that happens. That's a lot of very bad stuff, some of which is motivated by bad intentions. An entity who's responsible for that much bad stuff couldn't be like a typical person, who is responsible for a modest amount of bad stuff. It's hard to conceptualize just how much bad stuff this hypothetical person is responsible for without supernatural metaphors; it's far beyond what a mere genocidal dictator like Hitler or Stalin is responsible for (at least, if you aren't attributing heroic responsibility to them). At that point, "well, I'm responsible for more bad stuff than I previously thought Hitler was responsible for" doesn't come close to grasping the sheer magnitude, and supernatural metaphors like God or Satan come closer. The conclusion is insane and supernatural because the premise, that you are personally responsible for everything that happens, is insane and supernatural.
I'm not really sure how typical this particular response would be. But I think it's incredibly rare to actually take heroic responsibility literally and seriously. So even if I only rarely see evidence of people thinking they're demonic (which is surprisingly common, even if rare in absolute terms), that doesn't say much about the conditional likelihood of that response on taking heroic responsibility seriously.
I have a version of heroic responsibility in my head that I don’t think causes one to have false beliefs about supernatural phenomena, so I’m interested in engaging on whether the version in my head makes sense, though I don’t mean to invalidate your strongly negative personal experiences with the idea.
I think there’s a difference between causing something and taking responsibility for it. There’s a notion of “I didn’t cause this mess but I am going to clean it up.” In my team often a problem arises that we didn’t cause and weren’t expecting. A few months ago there were heavy rains in Berkeley and someone had to step up and make sure they didn’t cause serious water damage to our property. Further beyond the organization’s remit, one time Scott Aaronson’s computational complexity wiki was set to go down, and a team member said they’d step forward to fix it and take responsibility for keeping it up in the future. These were situations where the person who took them on didn’t cause them and hadn’t said that they were responsible for the class of things ahead of time, but increasingly took on more responsibility because they could and because it was good.
When Harry is speaking to McGonagall in that quote, I believe he’s saying “No, I’m actually taking responsibility for what happened to my friend. I’m asking myself what it would’ve looked like for me to actually take responsibility for it earlier, rather than the default state of nature where we’re all just bumbling around. Where the standard is ‘this terrible thing doesn’t happen’ as opposed to ‘well I’m deontologically in the clear and nobody blames me but the thing still happens’.”
I don’t think this gives Harry false magical beliefs that he personally caused a horrendous thing to happen to his friend (though I think that magical beliefs of that sort do have a higher prior in his universe).
I think you can “take responsibility” for civilization not going extinct in this manner, without believing you personally caused the extinction. (It will suck a bit for you because it’s very hard and you will probably fail in your responsibilities.) I think there’s reasons to give up responsibility if you’ve done a poor job, but I think failure is not deontologically bad especially in a world where few others are going to take responsibility for it.
If I try to imagine what happened with jessicata, what I get is this: taking responsibility means that you're trying to apply your agency to everything; you're clamping the variable of "do I consider this event as being within the domain of things I try to optimize" to "yes". Even if you didn't even think about X before X has already happened, doesn't matter; you clamped the variable to yes. If you consider X as being within the domain of things you try to optimize, then it starts to make sense to ask whether you caused X. If you add in this "no excuses" thing, you're saying: even if supposedly there was no way you could have possibly stopped X, it's still your responsibility. This is just another instance of the variable being clamped; just because you supposedly couldn't do anything, doesn't make you not consider X as something that you're applying your agency to. (This can be extremely helpful, which is why heroic responsibility has good features; it makes you broaden your search, go meta, look harder, think outside the box, etc., without excuses like "oh but it's impossible, there's nothing I can do"; and it makes you look in retrospect at what, in retrospect, you could have done, so that you can pre-retrospect in the future.)
If you're applying your agency to X "as though you could affect it", then you're basically thinking of X as being determined in part by your actions. Yes, other stuff makes X happen, but one of the necessary conditions for X to happen is that you don't personally prevent it. So every X is partly causally/agentially dependent on you, and so is partly your fault. You could have done more sooner.
A few months ago there were heavy rains in Berkeley and someone had to step up and make sure they didn’t cause serious water damage to our property. Further beyond the organization’s remit, one time Scott Aaronson’s computational complexity wiki was set to go down, and a team member said they’d step forward to fix it and take responsibility for keeping it up in the future.
This sounds like a positive form of 'take responsibility' I can agree with.
However, I'm not sure about this whole discussion in regards to 'the world', 'civilization', etc.
What does 'take responsibility' mean for an individual across the span of the entire Earth?
For a very specific sub-sub-sub area, such as imparting some useful knowledge to a fraction of online fan-fiction readers of a specific fandom, it's certainly possible to make a tangible, measurable difference, even without some special super-genius.
But beyond that I think it gets exponentially more difficult.
Even a modestly larger goal of imparting some useful knowledge to a majority of online fan-fiction readers would practically be a life's effort, assuming the individual already has moderately above average talents in writing and so on.
There’s nothing special about taking responsibility for something big or small. It’s the same meaning.
Within teams I’ve worked in it has meant:
And more things.
I think this applies straightforwardly beyond single organizations.
You can apply this to particular extinction threats (e.g. asteroids, pandemics, AGI, etc) or to the overall class of such threats. (For instance I’ve historically thought of MIRI as focused on AI and the FHI as interested in the whole class.)
Extinction-level threats seem like a perfectly natural kind of problem someone could try to take responsibility for, thinking about how the entire civilization would respond to a particular attack vector, asking what that person could do in order to prevent extinction (or similar) in that situation, and then implementing such an improvement.
I share your concern and insight, yet I also strongly identify with what Eliezer calls heroic responsibility, and have found it an empowering concept.
For me, it resonates with two groups of fundamental values and assumptions:
Group 1:
Group 2:
You'll note this does not proceed from the assumption that I am special, or chosen, or brave, or the best at things, or stronger than others. I genuinely do not think I am. I know I can fail badly, because I have failed badly, bitterly so. I know how scared and confused I often feel. But this duty does not arise from what I already am, but what I want all of us to be, believe we all can be. It is a standard universally applied, in which I strive to lead by example, but where I want to live in a world where this is how everyone thinks, because I believe this is something humans can do - take responsibility, be proactive, show agency, look for what needs to be done and do it, forge free paths.
But notably, I see this as a call; a productive, constructive call to do better. It is pointed at the future, and it is pointed outwards.
Reminders of instances where I failed burn in me, and haunt me, but as a reminder to not fail again. Mistakes learned. Knowing of my weakness, so I can avoid it next time. The horror of knowing I failed, as a way to stop me from doing so again. Ever tried, ever failed. Try again, fail again, fail better.
Not to stew in the past. I do not think guilt, or shame, or blame, or fault, are helpful emotions at all.
In instances where I did not manage to protect myself from evil, I want to learn how to protect myself better in the future, but hating myself for getting hurt does not help, it just adds more pain to a heap of pain. Me getting hurt having been avoidable does not make it fair, or okay. I can have compassion for myself having remained in situations that were terrible, while also having the belief that an escape would have been possible, and that if this scenario came again, I would find it this time, with the skills and knowledge I have now. I can think of who I am now with care and kindness, and still want to become something much more.
I can simultaneously think that there is a way to really change our lives and communities for each and every one of us; and that it is fucking hard, and that I cannot look into the minds of others to know how hard it is for them, that we are each haunted by demons invisible to others, dragging baggage others do not see. That I did not know how hard many things I believed to be easy were, until I was on the wrong end of them. To know that I do not want to belittle what they are up against and have been through, because that would be cruel and ignorant and pointless, but want to empower them to get over it regardless, not because of how small their issues are, for they are vast, but because of what they can become to counter them, something vaster still. I can simultaneously forgive, and burn to undo the damage.
To believe that I, and all those around us, are ultimately helpless, that no one is really responsible for anything... it would not be a kindness or healing. Nor true. But I want to see the opportunities in that truth, not the guilt and shame. For one gets us out of a terrible world; the other keeps us in.
and that since there continue to be horrible things happening in the world, they must have evil intentions and be a partly-demonic entity.
Did you conclude this entirely because there continue to be horrible things happening in the world, or was this based on other reflective information that was consistent with horrible things happening in the world too?
I imagine that this conclusion must at least be partly based on latent personality factors as well. But if so, I'm very curious as to how these things jibe with your desire to be heroically responsible at the same time. E.g., how do evil intentions predict your other actions and intentions regarding AI-risk and wanting to avert the destruction of the world?
It wasn't just that, it was also based on thinking I had more control over other people than I realistically had. Probably it is partly latent personality factors. But a heroic responsibility mindset will tend to cause people to think other people's actions are their fault if they could, potentially, have affected them through any sort of psychological manipulation (see also, Against Responsibility).
I think I thought I was working on AI risk but wasn't taking heroic responsibility because I wasn't owning the whole problem. People around me encouraged me to take on more responsibility and actually optimize on the world as a consequentialist agent. I subsequently felt very bad that I had taken on responsibilities for solving AI safety that I could not deliver on. I also felt bad that maybe, because I wrote some blog posts online criticizing "rationalists", that would lead to the destruction of the world and it would be my fault.
This is cool because what you're saying has useful information pertinent to model updates regardless of how I choose to model your internal state.
Here's why it's really important:
You seem to have been motivated to classify your own intentions as "evil" at some point, based entirely on things that were not entirely under your own control.
That points to your social surroundings as having pressured you to come to that conclusion (I am not sure it is very likely that you would have come to that conclusion on your own, without any social pressure).
So that brings us to the next question: Is it more likely that you are evil, or rather, that your social surroundings were / are?
I think those are hard to separate. Bad social circumstances can make people act badly. There's the "hurt people hurt people" truism and numerous examples of people being caused to act morally worse by their circumstances e.g. in war. I do think I have gone through extraordinary measures to understand the ways in which I act badly (often in response to social cues) and to act more intentionally well.
Yes, but the point is that we're trying to determine if you are under "bad" social circumstances or not. Those circumstances will not be independent from other aspects of the social group, e.g. the ideology it espouses externally and things it tells its members internally.
What I'm trying to figure out is to what extent you came to believe you were "evil" on your own versus you were compelled to think that about yourself. You were and are compelled to think about ways in which you act "badly" - nearby or adjacent to a community that encourages its members to think about how to act "goodly." It's not a given, per se, that a community devoted explicitly to doing good in the world thinks that it should label actions as "bad" if they fall short of arbitrary standards. It could, rather, decide to label actions people take as "good" or "gooder" or "really really good" if it decides that most functional people are normally inclined to behave in ways that aren't necessarily un-altruistic or harmful to other people.
I'm working on a theory of social-group-dynamics which posits that your situation is caused by "negative-selection groups" or "credential-groups" which are characterized by their tendency to label only their activities as actually successfully accomplishing whatever it is they claim to do - e.g., "rationality" or "effective altruism." If it seems like the group's ideology or behavior implies that non-membership is tantamount to either not caring about doing well or being incompetent in that regard, then it is a credential-group.
Credential-groups are bad social circumstances, and in a nutshell, they act badly by telling members who they know not to be intentionally causing harm that they are harmful or bad people (or mentally ill).
I want to strongly recommend the extensive resources, books and practices on this topic that the climate movement has developed, faced with a challenge that no individual can solve, in which we have already critically lost in many ways, and in which success seems highly unlikely, achieving it is a long-term process, and very draining. We realised early on that we were losing so many people to bad mental health and burnout that it was threatening to destroy the whole movement.
For me, two of the biggest takeaways were:
I know it's not aligned with the current zeitgeist on this forum, but I do feel like "everything is going to be okay" (alignment by default) is a valid position and should be included for completeness.
I think people need to remember one very, very important mantra: "I might be wrong!" We all love trying to calculate the odds, weighing up the possibilities, and then deciding "Well, I'm very informed, I must be right!" But we always have a possibility of being stonkingly, and hilariously, wrong on every count. There are no soothsayers; the future isn't here.
For all we know, AGI turns up out of the blue, and it turns out to be one of those friendly Minds out of the old Iain Banks novels, fond by default of their simple mush-brained human antecedents and ready and willing to help. I mean, it's possible, right?
And it might just be like that, because we all did the work. And then you get to tell your grandkids one day, "Hey, we used to be a bit worried the Minds would kill us all. But I helped research a way to make sure that never happens." And your grandkids will think you're somewhat excellent. Isn't that a good thought?
This is totally possible and valid. I would love for this to be true. It's just that we can plan for the worst case scenario.
I think it can help to believe that things will turn out ok: we are training the AI on human data, so it might adopt some values. Once you believe that, working on alignment can just be a matter of planning for the worst case scenario.
Just in case. Seem like that would be better for mental health.
Very much so. I think there is also truth to the idea that if you believe you are going to succeed you are much more likely to succeed, and certainly if you believe you will fail, you almost certainly will.
For those who are in the midst of mental health crisis, I think it is important to emphasize that plenty of smart, reasonable people have thought about this and come to the conclusion that all this talk of AI-doom is just silly, because either it's going to be okay or because AI is actually centuries away. (For example, Francois Chollet.) Predicting the future also has a very poor track record, whether the prediction is doom or bloom. We should put significant credence on the idea that things will mostly continue in the way they have been, for better or worse, and that the future might look a lot like the present.
Also, if you are someone who struggles a lot with ruminating on what might happen, and this causes you significant distress, I strongly encourage you to listen to the audiobooks The Power of Now and A New Earth.
If you experience surprising and shockingly large emotional effects while meditating that then seem to persist even when you stop meditating, I am happy to talk with you about teachers/options/maps of these sorts of experiences.
A couple other resources that have come out since this was originally posted:
And separately, what works best for me during acute phases of freakout is doing cardio (usually running or jump rope) + listening to The Obstacle is the Way (book about stoicism).
Sometimes you can try to reason your way out of something, but sometimes what works best is changing your physiology and listening to a pep talk.
Also, thanks for writing this! I can't tell you the number of people I've shared it with.
It was nice to see C. S. Lewis as a reminder we've kinda been here before.
One of the things which helped groups during the fight for the Nuclear Test Ban Treaty in the US was Joanna Macy's "despair work", which was developed from individual grief work.
Joanna started in intelligence and has been facing X-risks and slow actions of governments with others since the 1970s, and built a network of people doing that, and she still does. She did a lot in Chernobyl, and did some of the earliest longtermism and deep time work on nuclear waste storage.
Her despair work has been adapted for climate change and rainforest protection, so I'm sure it could be adapted for AI and other X-risks/S-risks too, and even tougher goals like achieving universal veganism, instituting rational policymaking or "dealing with parents" ;-)
Trainers in despair work:
https://workthatreconnects.org/find-a-facilitator/, or ask me, or Dr Chris Johnstone for recommendations.
Trainer's Manual for groups (recommended):
- Coming Back to Life, Joanna Macy
Books:
- Despair and Empowerment in the Nuclear Age, Joanna Macy
- Active Hope, Chris Johnstone and Joanna Macy
More recent video:
(be ready to filter some of the 1970s vocabulary; they're both confident with intense emotion)
I think there is an important paragraph missing from this post about books related to Stoicism and existential philosophy etc.
Any books/resources on existentialism/absurdism you'd recommend? It seemed like a lot of the alignment positions had enough of that flavor to screen off the primary sources which I found less approachable/directly relevant. Though it does seem like a good idea to directly name that there is an entire section of philosophy dedicated to living in an uncaring universe and making your own meaning.
I think the Stoics (Seneca's Letters, Marcus Aurelius's Meditations) talk a lot about how to live in the moment while awaiting probable death. Then the classic psychology book The Denial of Death would also be relevant. I guess The Myth of Sisyphus would also be relevant, but I haven't read it yet. The Metamorphosis of Prime Intellect is also a very interesting book, talking about mortality being preferable to immortality and so on.
I think it’s an amazing post but it seems to suggest that AGI is inevitable, which it isn’t. Narrow AI will help humanity flourish in remarkable ways, and many are waking up to the concerns of EY and agreeing that AGI is a foolish goal.
This article promotes a steadfast pursuit or acceptance towards AGI and that it will likely be for the better.
Perhaps though you could join the growing number of people that are calling for a halt on new AGI systems well beyond chatgpt?
This is a perfectly fine response, and one that will eliminate your fears if we succeed in the kind of collective action and regulation that would halt what could be a very dangerous technology.
This would be nothing new: Stanford and MIT aren’t allowed to work on bioweapons or radically larger nukes (which, if they did, they could easily make humanity-threatening weapons in short order).
The difference is the public and regulators are much less tuned into the high risk dangers of AGI, but it’s logical to think that if they knew half of what we knew, AGI would be seen in the same light as bio weapons.
Your intuitions are usually right, it’s an odd time to be working in science and tech but you still have to do what is right.
Do you see acceptance as it's mentioned here as referring to a stance of "AGI is coming, we might as well feel okay about it", or something else?
Hi
In this post you asked to leave the names of therapists familiar with alignment.
I am such a therapist. I live in the UK. That's my website.
I recently wrote a post about my experience as a therapist with clients working on AI safety. It might serve as indirect proof that I really have such clients.
Thank you. This is a really excellent post. I'd like to add a few resources and providers:
1. EA mental health navigator: https://www.mentalhealthnavigator.co.uk/.
2. Overview of providers on EA mental health navigator (not everyone there is familiar with alignment in significant ways): https://www.mentalhealthnavigator.co.uk/providers
3. Upgradable has some providers that are quite informed around alignment. https://www.upgradable.org/
4. If permissible, I'd like to add myself as a provider (coach) though I don't take on any coachees at present.
Thanks for the suggestions! The navigator is already linked, but I'll add you and Upgradable. Do you know the specific people at Upgradable who are familiar (besides you and Dave)? And what is your rate? I see numbers ranging from $250-$400 on your site.
Great! I'd expect most people on there are. I know for sure that Paul Rohde and James Norris (the founder) are aware. My rates depend on the people I work with, but $200-$300 is the standard rate.
Mod note: I activated two-axis voting on this post, since it just received a major update and it's now the standard to have that voting system active. Comments older than this comment probably have a slightly whack-looking agreement-vote distribution due to that.
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
It is mentally healthy to have an informed perspective, especially when the more rational and informed perspective gives us a reason for more hope. In case you did not notice, there is not much room to shrink the feature size of transistors (TSMC is making 2nm features now, and atoms are about 0.1 nm in size, so there is not much room to shrink stuff). Furthermore, if the transistors are too small, they won't work because of quantum tunnelling. There is also a limit to the energy efficiency of irreversible computation because in order to reliably delete information, one must overcome thermal noise. We are approaching this energy efficiency limit, so I wish TSMC good luck with further progress in the performance of irreversible computation, since they are going to need it.
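For a sense of scale, the thermal-noise limit on deleting information I mention above is Landauer's bound. A rough back-of-the-envelope version, assuming roughly room-temperature operation (T ≈ 300 K), looks like this:

```latex
% Landauer's bound: minimum energy dissipated to erase one bit at temperature T
E_{\min} = k_B T \ln 2
         \approx (1.38 \times 10^{-23}\,\mathrm{J/K}) \times (300\,\mathrm{K}) \times 0.693
         \approx 2.9 \times 10^{-21}\,\mathrm{J\ per\ bit\ erased}
```

My understanding is that present-day CMOS still dissipates several orders of magnitude more than this per switching event, so "approaching the limit" means the remaining headroom for irreversible logic is a few orders of magnitude, not unlimited.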
We can get beyond these limits using reversible computation, but reversible computation is a difficult technical challenge. Furthermore, reversible computation comes with a computational complexity overhead. It takes more time/space and parallelism to compute reversibly than it does to compute irreversibly. We may therefore have some time before we get sufficient hardware improvements that make AI an existential threat.
On the other hand, it looks like most people who are talking about AI do not know about the limits of irreversible computation and the promise and challenges of reversible computation. This does not appear to be very mentally healthy to me. This is a complete turn-off. I hope the AI community learns to do better.
Are you saying people should be more skeptical of AGI because of the physical limits on computation and thus more hopeful?
The physical limits mainly apply to irreversible computation. But it seems like powerful reversible computation is attainable. Once we get well-optimized reversible computation, I will not make any bets against AGI. But building reversible computing technologies will probably be exceedingly difficult since we have to deal with things like the computational complexity overhead of computing reversibly. This means that we probably have some time left before an AI apocalypse to try to get a good solution to the AI alignment problem or to just have fun.
If unaligned superintelligence is inevitable, and human consciousness can be captured and stored on a computer, then the probability of some future version of you being locked into an eternal torture simulation, suffering a continuous fate worse than death from now until the heat death of the universe, approaches unity.
The only way to avoid this fate for certain is to render your consciousness unrecoverable prior to the development of the 'mind uploading' tech.
If you're an EA, preventing this from happening to one person prevents more net units of suffering than anything else that can be done, so EAs might want to raise awareness about this risk, and help provide trustworthy post-mortem cremation services.
Are LWers concerned about AGI still viewing investment in cryogenics as a good idea, knowing this risk?
I choose to continue living because this risk is acceptable to me, maybe it should be acceptable to you too.
A partially misaligned one could do this.
"Hey user, I'm maintaining your maximum felicity simulation, do you mind if I run a few short duration adversarial tests to determine what you find unpleasant so I can avoid providing that stimulus?"
"Sure"
"Process complete. I simulated your brain in parallel, and also sped up processing to determine the negative space of your psyche. It turns out that negative stimulus becomes more unpleasant when provided for an extended period; then you adapt to it temporarily before, on timescales of centuries to millennia, tolerance drops off again."
"So you copied me a bunch of times, and at least one copy subjectively experienced millennia of maximally negative stimulus?"
"Yes, I see that makes you unhappy, so I will terminate this line of inquiry"
There is no right way to emotionally respond to the reality of approaching superintelligent AI, our collective responsibility to align it with our values, or the fact that we might not succeed.
Just wanted to mention that it is by no means a "reality" but a hotly debated conjecture, in case it helps someone Basilisked by Doomerism.
It still seems pretty likely, but I really appreciate your articulating this and trying to push back against insularity and echo chamber-ness.
Downvote me if you want. I am going to speak up anyways.
I do not consider very many humans to be mentally healthy creatures. Humans are generally just a bunch of nasty Karens who spend their entire lives spreading misery and hatred. Humans are generally incapable of having a friendly, healthy, and normal conversation with each other. Attempting to have a normal conversation with someone these days is like speaking a completely foreign language. The truth hurts.
People confuse LLM dialogue with a normal conversation because most people do not know what it is like to have a conversation. These days, it is easier to have a conversation with a chat bot than it is to have one with another human because humans are chlurmcks.
What kinds of people do you try to talk to? This seems overly pessimistic, though I'm not sure what your experience is. This also doesn't seem very constructive/relevant to the post, though I'd be interested to hear why you said this.
"What kinds of people do you try to talk to?" - My experience is not because I seek out crazy people to talk to. My experience is the way it is because I have not found very many sane humans to talk to. I was just commenting on what I believe to be the mental status of most humans. It is not good at all. And by disagreeing with me and refusing to improve themselves, people will fall into greater and greater misery. I see most people as exceedingly miserable.
This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintain wellbeing and direction when confronted with existential risk.
Many people in this community have posted their emotional strategies for facing Doom after Eliezer Yudkowsky’s “Death With Dignity” generated so much conversation on the subject. This post intends to be more touchy-feely, dealing more directly with emotional landscapes than questions of timelines or probabilities of success.
The resources section would benefit from community additions. Please suggest any resources that you would like to see added to this post.
Please note that this document is not intended to replace professional medical or psychological help in any way. Many preexisting mental health conditions can be exacerbated by these conversations. If you are concerned that you may be experiencing a mental health crisis, please consult a professional.
Preface to the 2nd Edition
This post was released in April 2022 under the same title. This April 2023 update features new resources in every section, with a particular emphasis on the Alignment Positions and People Resources sections. Within each section, resources have been thematically categorized for easier access.
Following the large capabilities leaps in the past year, these resources seem more important than ever. If you have suggestions for improving this post, for making it more accessible, or for new resources to add, please leave a comment or reach out to either Chris Scammell or DivineMango.
We hope you are all well and that you find this update helpful.
Introduction
There is no right way to emotionally respond to the reality of approaching superintelligent AI, our collective responsibility to align it with our values, or the fact that we might not succeed. As transformative AI approaches, we must ensure that we have the tools and resources to be okay. Here, the valence of “be okay” is your decision. This question could be rephrased “how can I thrive despite the alignment problem,” “how can I cope with the alignment problem,” “how can I overcome my fear of the alignment problem,” etc. Everyone needs to find their own question and their own answer.
At its foundation, “being okay” is the decision to continue to live facing reality and the alignment problem directly, with internal stability and rationality intact. And as a high ideal, we’re going for some degree of inviolability, of unconditional wellbeing, the kind of wellbeing that holds onto “okayness” even if the probability of solving alignment drops to 0. It can be difficult to stand in some place of positive mental health and stability while facing the alignment problem, but it is a gift if we can do that for ourselves, and a gift if we can share it with others.
Fortunately, we don’t have to do this alone. Many community members have found ways to make sense of themselves, their work, and their lives in relation to the alignment problem, and they have kindly made their reflections and advice public.
Resources
Several resources on this subject (along with summaries) are cataloged below. While there are a number of general mental health resources on LW, the EA Forum, and elsewhere that form a great baseline, this post aims to be more specific by focusing on mental health with respect to the alignment problem. Here, we feature a wide variety of ideas and practices in the hope that you may filter through them to create and discover the approach that works for you.
Human brains come in many shapes – we all have different internal subagent dynamics, motivational systems, values, needs, triggers for joy and fear, etc. Because of this variability, an approach that is great for one person may be bad for another. Some of you may need to take time to grieve. Some of you may need to focus on cultivating unconditional goodwill for yourself. Some of you may need to look squarely at existential terror and transmute it into motivation. As you read this article and browse these resources, remember to check in with yourself to see which approaches feel promising for you, given your past experience and your current mental landscape.
Alignment Positions
This section brings together posts on the subject of confronting despair of Doom on an emotional and practical level, categorized broadly by whether they focus on wellbeing or determination. These articles mostly focus on mental-emotional stances and philosophies, rather than actions.
Emotional Orientation & Wellbeing
My guess is that people who are concluding P(Doom) is high will each need to figure out how to live with it for themselves. My caution is just that whatever strategy you figure out should keep you in touch with reality (or your best estimate of it), even if it's uncomfortable.
So don't get your heart set on that "not die at all" business. Don't invest all your emotion in a reward you probably won't get. Focus on dying with dignity - that is something you can actually obtain, even in this situation.
If your body's emergency mobilization systems are running in response to an issue, but your survival doesn't actually depend on actions on a timescale of minutes, then you are not perceiving reality accurately. Which is to say: If you're freaked out but rushing around won't solve the problem, then you're living in a mental hallucination.
If you locate your identity in being the sort of person who does the best they can, given what they have and where they are, and if you define your victory condition as I did the best that I could throughout, given what I had and where I was, then while the tragedy of dying (yourself) or having the species/biosphere end is still really quite large and really quite traumatic, it nevertheless can't quite cut at the core of you. – A Way to Be Okay
It might also help to think of having fun sort of like walking: you know how in some sense, and you even have an instinct for it; having fun, if you've forgotten, is more a question of letting those circuits--which don't require justification and just do what they do because that's what they do--letting those circuits do what they do, and enjoying that those circuits do what they do. Basically the main thing here is just: there's a thing called your mind, your mind likes to play seriously, and consider not preventing your mind from playing seriously.
Our best shot probably does mean paying attention to AI and ML advances, and directing some attention that way compared to what we’d do in a world where AI did not matter. It probably does mean doing the obvious work and the obvious alignment experiments where we know what those are, and where we can do this without burning out our long-term capacities. But it mostly doesn’t mean people burning themselves out, or depleting long-term resources in order to do this.
What does [a 2% chance of AI apocalypse] actually feel like?
- The odds of dying in a car crash over your lifetime are about 1%.
- The odds of dying of an opioid overdose, across the US population in general, are about 1.5%.
- The odds of dying of cancer are about 14%.
So say you're considering having a kid. It's reasonable to worry a little that they'll be killed by AI, perhaps even when they're still young. Just like it's reasonable to make sure they understand that it's important to wear a seatbelt, and to get screened if they find any weird lumps when they're older… And it may be correct that AI kills us all. But risk is just part of making life plans. We deal with low risks of horrifying outcomes all the time.
While the situation is very scary indeed and often stressful, the x-risk mitigation community is a lovely and growing group of people, there’s a large frontier of work to be done, and I’m pretty confident that at least some of it will turn out to be helpful. So let’s get (back) to work!
Determination & Decisiveness
The framing doesn’t shy away from the fact that winning is unlikely. But the action is “playing” rather than “dying”. And the goal is “outs” rather than “dignity”. Again, I think the difference is in connotation and not actually strategy. To actually find outs, you have to search for solutions that might work, and stay focused on taking actions that improve our odds of success. When I imagine a Magic player playing to their outs, I imagine someone careful and engaged, not resigned. When I imagine someone dying with dignity, a terminally ill patient comes to mind. Peaceful, not panicking, but not fighting to survive.
We do not live in a story. We can, in fact, just assess the situation, and then do what makes the most sense, what makes us strongest and happiest. The expected future of the universe is—by assumption—sad and horrible, and yet where is the ideal-agency theorem which says I must be downtrodden and glum about it?
So don't let despair or hopelessness weigh you down. Instead, let them be a reminder: those are feelings you can only get from something worth saving. There are things here that are worth fighting for. If you begin to despair, then let that feeling be a reminder of what could be, and let everything that this world isn't be your fuel. – Dark, Not Colorless
When people first seriously think about alignment, a majority freak out. Existential threats are terrifying… but for someone who wants the challenge, the emotional response is different. The problem is terrifying? Our current capabilities seem woefully inadequate? Good; this problem is worthy. The part of me which looks at a rickety ladder 30 feet down into a dark tunnel and says “let’s go!” wants this. The part of me which looks at a cliff face with no clear path up and cracks its knuckles wants this. The part of me which looks at a problem with no clear solution and smiles wants this. The response isn’t tears, it’s “let’s fucking do this”.
Can I assure [the part of me that fears dying] that I’ll still try hard to avoid death if it becomes less scared? One source of assurance is if I’m very excited about a very long life - which I am, because the future could be amazing… Since I believe that we face significant existential risk this century, working to make humanity’s future go well overlaps heavily with working to make my own future go well. I think this broad argument has helped make the part of me that’s scared of death more quiescent.
When confronting the "most important century" hypothesis, my attitude doesn't match the familiar ones of "excitement and motion" or "fear and avoidance." Instead, I feel an odd mix of intensity, urgency, confusion and hesitance. I'm looking at something bigger than I ever expected to confront, feeling underqualified and ignorant about what to do next. This is a hard mood to share and spread, but I'm trying.
In December 2022, at the Bay Area Secular Solstice, Clara Collier gave a poignant reading from C.S. Lewis about living and dying with dignity in the face of existential risk. It was a beautiful, harrowing moment, and, despite any disagreements you might have with it, the passage she read feels like a good emotional capstone for this section.
“In one way, we think a great deal too much of the atomic bomb. “How are we to live in an atomic age?” I am tempted to reply: “Why, as you would have lived in the sixteenth century when the plague visited London almost every year, or as you would have lived in a Viking age when raiders from Scandinavia might land and cut your throat any night; or indeed, as you are already living in an age of cancer, an age of syphilis, an age of paralysis, an age of air raids, an age of railway accidents, an age of motor accidents.”
In other words, do not let us begin by exaggerating the novelty of our situation. Believe me, dear sir or madam, you and all whom you love were already sentenced to death before the atomic bomb was invented: and quite a high percentage of us were going to die in unpleasant ways… It is perfectly ridiculous to go about whimpering and drawing long faces because the scientists have added one more chance of painful and premature death to a world which already bristled with such chances and in which death itself was not a chance at all, but a certainty.
… If we are all going to be destroyed by an atomic bomb, let that bomb when it comes find us doing sensible and human things: working, teaching, reading, listening to music, bathing the children, playing tennis, chatting to our friends over a pint and a game of darts—not huddled together like frightened sheep and thinking about bombs.”
– On Living in an Atomic Age
General Positions and Advice
These posts provide relevant opinions and guidance that are not directly about existential risks from AI.
When you're confused about a domain, problems in it will feel very intimidating and mysterious, and a query to your brain will produce a count of zero solutions. But you don't know how much work will be left when the confusion clears. Dissolving the confusion may itself be a very difficult challenge, of course. But the word "impossible" should hardly be used in that connection. Confusion exists in the map, not in the territory. So if you spend a few years working on an impossible problem, and you manage to avoid or climb out of blind alleys, and your native ability is high enough to make progress, then, by golly, after a few years it may not seem so impossible after all. But if something seems impossible, you won't try.
- On Doing the Impossible
If you’re motivated to do something about alignment, there are many pragmatic posts on LW as well as non-LW resources like AI Safety Support, the AGI Safety Fundamentals Course, and 80,000 Hours.
When all is said and done, Nature will not judge us by our actions; we will be measured only by what actually happens. Our goal, in the end, is to ensure that the timeless history of our universe is one that is filled with whatever it is we're fighting for. For me, at least, this is the underlying driver that takes the place of guilt: Once we have learned our lessons from the past, there is no reason to wrack ourselves with guilt. All we need to do, in any given moment, is look upon the actions available to us, consider, and take whichever one seems most likely to lead to a future full of light.
When something terrible happens, I do not flee my sadness by searching for fake consolations and false silver linings. I visualize the past and future of humankind, the tens of billions of deaths over our history, the misery and fear, the search for answers, the trembling hands reaching upward out of so much blood, what we could become someday when we make the stars our cities, all that darkness and all that light—I know that I can never truly understand it, and I haven’t the words to say. Despite all my philosophy I am still embarrassed to confess strong emotions, and you’re probably uncomfortable hearing them. But I know, now, that it is rational to feel. – Feeling Rational
When someone feels sad because they can’t be a great scientist, it is nice to be able to point out all of their intellectual strengths and tell them “Yes you can, if only you put your mind to it!” But this is often not true. At that point you have to say “f@#k it” and tell them to stop tying their self-worth to being a great scientist. And we had better establish that now, before transhumanists succeed in creating superintelligence and we all have to come to terms with our intellectual inferiority. – Parable of the Talents
Practice voicing your somewhat embarrassing concerns, to make it easier for others to follow (and easier for you to do it again in future)... React to others’ concerns that don’t sound right to you with kindness and curiosity instead of laughter. Be especially nice about concerns about risks in particular, to counterbalance the special potential for shame there [or about people raising points that you think could possibly be embarrassing for them to raise]. – Beyond fire alarms
“I don’t want to think about that! I might be left with mistaken beliefs!” tl;dr: Many of us hesitate to trust explicit reasoning because we haven’t built the skills that make such reasoning trustworthy. Some simple strategies can help. – Making your explicit reasoning trustworthy
Sometimes, on the comparatively rare occasions when I experience even-somewhat-intense sickness or pain, I think back to descriptions like this, and am brought more directly into the huge number of subjective worlds filled with relentless, inescapable pain. These glimpses often feel like a sudden shaking off of a certain kind of fuzziness; a clarifying of something central to what’s really going on in the world; and it also comes with fear of just how helpless we can become. – Thoughts on Being Mortal
Ch 39: death, motivation for transhumanism
Ch 43-46: fear, death, motivation for transhumanism
Ch 56-58: optimizing against improbable odds, despair
Ch 63: the burden of responsibility, longing for a normal life
Ch 75: heroic responsibility
Ch 79-82: sacrifice
Ch 88: fear of expressing panic, bystander apathy
Ch 89: accepting/rejecting an unacceptable reality
Ch 110: guilt, shame
Ch 111-115: optimizing against improbable odds, despair
Ch 117: guilt, sacrifice
Tools and Practices
In the large majority of cases, if you want to improve your mental health, you need to start doing something different in your life, rather than just thinking differently. Below are some tools and practices, spanning from interventions aimed at quickly cutting through negative states to longer-term practices aimed at building more sustainable wellbeing.
EA-adjacent meditation coach Ollie Bray has written about making systematic progress in meditation through the cultivation of joy.
People Resources
This section features therapists, coaches, and other providers who can support those struggling with their reactions to the alignment problem. Note that therapists, rather than coaches, should be consulted for more serious mental health struggles. Prices given in parentheses indicate out-of-pocket per-session cost, where ranges indicate a sliding scale based on need.
There are many EA-adjacent therapists, but we are less sure which of them are familiar with alignment. If you know of any, please leave a comment with their name.
Therapists
Coaches
Other
A Final Note
Being happy and emotionally stable is instrumentally useful for making progress on alignment. But this post is written with the intention of increasing wellbeing, not productivity. We work on the alignment problem because we are driven by our deep care to protect the world we know, the one in which people experience joy and beauty and love. Wellbeing is instrumental for solving alignment, but more importantly, wellbeing is why we’re trying to solve it.