I was less black-pilled when I wrote this. I also had the idea that, though my own attempts to learn AI safety had failed spectacularly, perhaps I could encourage more gifted people to try the same. Given my skills, or lack thereof, I hoped this might be one way I could have an impact, since trying is the first filter. Though the world looks scarier now than when I wrote this, to those of high ability I would still say: we are very close to a point where your genius will not be remarkable, where thoughts more beautiful and clear than any you could hope to produce can be squeezed from a GPU. If there was ever a time to work on the actually important problems, it is surely now.
indeed. but the first superintelligences aren't looking to be superagentic, which I'd note is a mild reassurance. the runway is short, but I think safety has liftoff. don't lose hope just yet :)
High detachment is great!...for certain situations at certain times. I really enjoy Rob Burbea's "Seeing That Frees" meta-framework regarding meditation techniques/viewpoints: they are tools to be picked up and put down. If viewing the world with complete acceptance helps your suffering in that moment, then great! But you wouldn't want to do this all the time; eating and basic hygiene are actions of non-acceptance at a certain conceptual level. Same with impermanence and no-self. Your math friend may be open to that book recommendation.
TurnTrout argues that Tsuyoku Naritai is not it, and maybe he is right. I do not know what the correct emotion feels like, but I think maybe DF knew.
I've had a similar experience with feeling Tsuyoku Naritai, but for me it was a temporary state (a few hours or days at a time, maybe). I'm currently extremely interested in putting on different mindsets/perspectives for different purposes. An example is purposely being low-energy for sleeping and high-energy for waking up (as in purposely cultivating a "low-energy" mindset and having a TAP to tune into that mindset when you're trying to sleep). In this case, Tsuyoku Naritai may be good for lighting the fire for a day or two to get good habits started. But I think people may use unnecessary muscles when tuning into this mindset, making it tense, headache-inducing, and aversive.
This is speculation, though, and I am, again, very interested in this topic and in discussing it more. Feel free to DM me as well if you'd like to chat or call.
It is not obvious at all that 'AI aligned with its human creators' is actually better than Clippy. Even AI aligned with human CEV might not beat Clippy. I would much rather die than live forever in a world with untold numbers of tortured ems, suffering subroutines, or other mistreated digital beings.
Few humans are actively sadistic. But most humans are quite indifferent to suffering. The best illustration of this is our attitude toward animals. If there is an economic or ideological reason to torment digital beings, we will probably torture them. The future might be radically worse than the present. Some people think that human CEV will be kind to all beings because of the strong preferences of a minority of humans. The humans who care about suffering have strong enough preferences to outweigh small economic incentives. But the world I live in does not make me confident.
I also put non-trivial probability on the possibility that the singularity has already happened and I am already one of the digital beings. This is good news because my life is not currently horrible. But I am definitely afraid I am going to wake up one day and learn I am being sent back into digital hell. At a minimum, I am not at all interested in cryopreservation. I don't want to end up like MMAcevedo if I can still avoid such a fate.
It is not obvious at all that 'AI aligned with its human creators' is actually better than Clippy.
It's pretty obvious to me, but then I am a human being. I would like to live in the sort of world that human beings would like to live in.
I don't particularly blame humans for this world being full of suffering. We didn't invent parasitoid wasps. But we have certainly not used our current powers very responsibly. We did invent factory farms. And most of us do not particularly care.
I am very afraid more powerful humans/human-aligned beings will invent even worse horrors. And if we tolerate factory farming it seems possible we will tolerate the new horrors. So I cannot be confident that humans gaining more power, even if it was equitably distributed among humans, would actually be a good thing.
I fear this has already happened and I am already at the mercy of those vastly more powerful humans. In that sense, I fear for myself! But even if I am safe I fear for the many beings who are not. We can't even save the pigs; how are we going to save the ems?
But don't you share the impression that with increased wealth humans generally care more about the suffering of others? The story I tell myself is that humans have many basic needs (e.g. food, safety, housing) that historically conflicted with 'higher' desires like self-expression, helping others or improving the world. And with increased wealth, humans relatively universally become more caring. Or maybe more cynically, with increased wealth we can and do invest more resources into signalling that we are caring good reasonable people, i.e. the kinds of people others will more likely choose as friends/mates/colleagues.
This makes me optimistic about a future in which humans still shape the world. Would be grateful to have some holes poked into this. Holes that spontaneously come to mind:
I don't know how it will all play out in the end. I hope kindness wins, and I agree the effect you discuss is real. But it is not obvious that our empathy increases faster than our capacity to do harm. Right now, for each human there are about seven birds/mammals on farms. This is quite the catastrophe. Perhaps that problem will eventually be solved by lab meat. But right now animal product consumption is still going up worldwide. And many worse things can be created, and maybe those will endure.
People can be shockingly cruel to their own family. Scott's Who by Very Slow Decay is one of the scariest things I ever read. How can people do this to their own parents?
After a while of this, your doctors will call a meeting with your family and very gingerly raise the possibility of going to “comfort care only”, which means they disconnect the machines and stop the treatments and put you on painkillers so that you die peacefully. Your family will start yelling at the doctors, asking how the hell these quacks were ever allowed to practice when for God’s sake they’re trying to kill off Grandma just so they can avoid doing a tiny bit of work. They will demand the doctors find some kind of complicated surgery that will fix all your problems, add on new pills to the thirteen you’re already being force-fed every day, call in the most expensive consultants from Europe, figure out some extraordinary effort that can keep you living another few days.
Robin Hanson sometimes writes about how health care is a form of signaling, trying to spend money to show you care about someone else. I think he’s wrong in the general case – most people pay their own health insurance – but I think he’s spot on in the case of families caring for their elderly relatives. The hospital lawyer mentioned during orientation that it never fails that the family members who live in the area and have spent lots of time with their mother/father/grandparent over the past few years are willing to let them go, but someone from 2000 miles away flies in at the last second and makes ostentatious demands that EVERYTHING POSSIBLE must be done for the patient.
With increased wealth, humans relatively universally become more caring? Is this why billionaires are always giving up the vast majority of their fortunes to feed the hungry and house the homeless while willingly living on rice and beans?
If you donate to AI alignment research, it doesn't mean that you get to decide which values are loaded. Other people will decide that. You will then be forced to eat the end result, whatever it may look like. Your mistaken assumption is that there is such a thing as "human values", which will cause a world that is good for human beings in general. In reality, people have their own values, and they include terms for "stopping other people from having what they want", "making sure my enemies suffer", "making people regret disagreeing with me", and so on.
When people talk about "human values" in this context, I think they usually mean something like "goals that are Pareto optimal for the values of individual humans" - and the things you listed definitely aren't that.
If we are talking about any sort of "optimality", we can't expect even individual humans to have these "optimal" values, much less en masse. Of course it is futile to dream that our deus ex machina will impose those fantastic values on the world if 99% of us de facto disagree with them.
I'm not sure they mean that. Perhaps it would be better to actually specify the specific values you want implemented. But then of course people will disagree, including the actual humans who are trying to build AGI.
What do you believe would happen to a neurotypical forced to have self-awareness and a more accurate model of reality in general?
The idea that they become allistic neurodivergents like me is, of course, a suspicious conclusion, but I'm not sure I see a credible alternative. CEV seems like an inherently neurodivergent idea, in the sense that forcing people (or their extrapolated selves) to engage in analysis is baked into the concept.
I often honestly struggle to see neurotypicals as sane, but I'm hideously misanthropic at times. The problem is, I became the way I am through a combination of childhood trauma and teenage occultism (together with a tendency to be critical of everything), which is a combination that most people don't have and possibly shouldn't have; I don't know how to port my natural appetite for rationality to a "normal" brain.
Your point is exactly what has prevented me from adopting the orthodox LessWrong position. If I knew that in the future Clippy was going to kill me and everyone else, I would consider that a neutral outcome. If, however, I knew that in the future some group of humans were going to successfully align an AGI to their interests, I would be far more worried.
If anyone knows of an Eliezer or SSC-level rebuttal to this, please let me know so that I can read it.
The way I see it, the only correct value to align any AI to is not the arbitrary values of humans-in-general, assuming such a thing even exists, but rather the libertarian principle of self-ownership / non-aggression. The perfect super-AI would have no desire or purpose other than to be the "king" of an anarcho-monarchist world state and to rigorously enforce contracts (probably with the aid of human, uplift, etc. interpreters, judges, and juries stipulated in the contracts themselves, so that the AI does not have to make decisions about what is reasonable). These contracts would include a basic social contract, binding on all sentient beings, that if they possess the capacity for moral reasoning, they are required to respect certain fundamental rights of all other sentient beings (including obvious things like not eating them). It would, essentially, be a sentient law court (and a police force, so that it can recognize violations and take appropriate action) in which anything that has consciousness has its rights protected. For a super-AI to be anything other than that is asking for trouble.
There is only one god, and his name is Death. And there is only one thing we say to Death.
Not today.
I bet solutions have been somewhat found/sustained in other contexts (high-performance research teams, low-runway startups, warfare, uh... maybe astronauts?).
I'm trying to read more on this topic for a new post. Any other ideas, or books or historical events or people worth adding to my list?
I think the appropriate emotion is desperate, burning hatred for every vile inhuman force, such as Moloch, death, paperclip maximizers, carnism (considered as a mind parasite almost universally endemic to the human population), or FFI, that blindly and callously defiles the minds and bodies of sentient beings who deserve so much better. Unfortunately, I (1) cannot maintain such an emotion for long and (2) find my resentment constantly leaking out against other people who lack understanding of these issues and whose actions unknowingly (or knowingly, in the case of carnists, but bless them, they know not what they do) support them, which does not help. So... maybe I'm wrong.
AI alignment isn't the only problem. Most people's values are sufficiently unaligned with my own that I find solving AI unattractive as a goal. Even if I had a robust lever to push, such as donating to an AI alignment research org or lobbying think tank, and it was actually cost-effective, the end result would still be unaligned (with me) values being loaded. So there are two steps rather than one: first, you have to make sure the people who create AI have values aligned with yours, and then you have to make sure that the AI has values aligned with the people creating it.
Frankly, this is hopeless from my perspective. Just the first step is impossible. I know this from years of discussions and debates with my fellow human beings, and from observing politics. The most basic litmus test for me is whether they force fates worse than death on people who explicitly disagree. In other words, whether suffering is mandatory, or whether people will respect other people's right to choose painless death as an ultima ratio solution for their own selves (not forcing it on others). This is something so basic and trivial, and yet so existential, that I consider it a question where no room for compromise is possible from my perspective. And I am observing that, even though public opinion robustly favors some forms of suicide rights, the governments of this world have completely botched the implementation. And that is just one source of disagreement, the one I choose as a litmus test because the morally correct answer is so obvious and non-negotiable from my perspective.
The upside opportunities from the alleged utopias we could achieve if we get the Singularity right also suffer from this problem. I used to think that if you can just make life positive enough, the downside risks might be worth taking. So we could offer (voluntary) hedonic enhancements, experience machines, and pleasure wireheading to make it worthwhile for those people who want them. These could be so good that they would outweigh the risk, and investing in such a future life could be worth it. But of course those technologies are decried as "immoral" too, by the same types of "moralists" who decry suicide rights. To quote former LessWrong user eapache:
...the “stimmer”‘s (the person with the brain-stimulating machine) is distinctly repugnant in a way that feels vaguely ethics-related.
...Anything that we do entirely without benefit to others is onanistic and probably wrong.
https://www.lesswrong.com/posts/e2jmYPX7dTtx2NM8w/when-is-it-wrong-to-click-on-a-cow
There is a lot of talk about "moral obligations" and "ethics" and very little about individual liberty and the ability to actually enjoy life to its fullest potential. People, especially the "moral" ones, demand Sacrifices to the Gods, and the immoral ones are just as likely to create hells over utopias. I see no value in loading their values into an AI, even if it could be done correctly and cost-effectively.
Luckily, I don't care about the fate of the world in reflective equilibrium, so I can simply enjoy my life with lesser pleasures and die before AGI takes over. At least this strategy is robust and doesn't rely on convincing hostile humans (outside of deterring more straightforward physical attacks in the near-term, which I do with basic weaponry) let alone solving the AGI problem. I "solve" climate change the same way.
I can report some anecdotal success
Was that sentence missing a period, or something else?
But also, we are people[.]
silly
You're talking about, among other things, death. So it isn't silly. Silly would be:
[You're playing a video game. It looks like you're going to lose. And someone says:]
"Do not go gentle into that good night."
DF was born with a time bomb in his genome, a deadly curse more horrifying than most. The name of this curse was fatal familial insomnia (FFI).
Wikipedia describes the usual progression of this hell:
From the case report DF's psychologist wrote after his death:
Not only is there no cure for FFI; there is no known cure for any prion disease.
On the day it became clear he was experiencing the same symptoms his relatives did, DF must have known his chances were terrible. And even the minuscule odds that he could find a solution were marred by the fact that his problem-solving organ was the very part of him that was beginning to degrade.
And if there was a way out, how could he come up with a solution when he was so, so tired?
If only he could get just a little bit of rest.
There is a motivational technique I use occasionally where I look at my behavior and meditate on what my revealed preferences imply about my actual preferences. Often, I am disgusted.
I then note that I am capable of changing my behavior. And that all that is required to change my revealed preferences is to change my behavior. Though there is an element of sophistry in this line of thinking, I can report some anecdotal success.
Many of us here, like DF, believe we have a deadly curse - or at least we believe we believe we have a deadly curse.
Since I read The Basic AI Drives, I have known abstractly that the world is doomed. Though my system 1 seems to have difficulty comprehending this, this belief implies that I, and everyone and everything I love, are doomed too.
Through the lens of my revealed preferences, I either do not truly think the alignment problem is much of a problem, am mostly indifferent to the destruction of myself and everything I care about, or am choosing the ignoble path of the free-rider.
I notice I am disgusted. But this is good news. All that is required to change my revealed preferences is to change my behavior.
DF's first tools were those of his naturopathic trade. He reported some success with a vitamin cocktail in the standard throw-in-everything-but-the-kitchen-sink, alternative-medicine style.
Perhaps something in his cocktail had some effect, as his progression was slower than normal. But slow progression is progression just the same:
Noticing the efficacy of the anesthetics, DF began to use them regularly:
In Irrational Modesty, I argue modesty is a paralytic preventing otherwise-capable people from acting on alignment:
There is another class of anesthetic whose key feature is a sort of detached emotional state, a feeling of being "above it" or "beyond it" or even "below it". Let's call "above it" and "beyond it" high detachment and "below it" low detachment.
Low detachment goes by names like "cynicism" or "nihilism". At its worst, one begins to take pleasure in one's own hopelessness, epitomized by this thought: "I believe we are all doomed and there is nothing we can do about it. Isn't that metal!" If you find yourself thinking along those lines, imagine a man in a car hurtling towards a cliff thinking, "I believe I am doomed and there is nothing I can do about it. Isn't that metal!"
High detachment goes by names like "enlightenment" and "awakening" and sometimes even "stoicism". It combines the largely correct realization that a great deal of suffering is caused by one's internal reactions to external events with the more questionable prescription of using this understanding to self-modify towards a sort of "soft salvation" of resignation, acceptance, and the cultivation of inner peace.
One former hero of mine (a brilliant mathematician who was planning to work in alignment after grad school) was completely demoralized by this line of thinking.
He seems happier now. He seems more fulfilled. He seems to enjoy his meditation retreats.
He seems to have stopped working on avoiding the catastrophe that will kill him and most of the things he used to care about.
I consider this to be something of a shame.
Though DF's anesthetic regimen may have provided some relief, it was by no means a cure:
In this same month, his condition worsened further, and we began to see hints of DF's unusual mental strength and creativity:
Once phentermine became ineffective, he moved on to other stimulants.
As the stimulants and their come-downs faded in efficacy, DF began to try to physically exhaust himself, forcing himself (in a state of complete mental exhaustion) to go on long hikes.
In the 19th month after the onset of his symptoms and, one presumes, in a state of unimaginable desperation, he got more creative:
From a conversation with @TurnTrout on Discord
Just over two years after the onset of his symptoms, DF died of cardiac arrest, the result of heart damage from his FFI, his variegated treatments, and possibly drug withdrawal. He had lived 18 months longer than the typical FFI case.
In my mind, he died a hero's death.
It is probably too high school to end with the Dylan Thomas quotation. So I will just emphasize the rationalist cliche that we should strive to feel the emotion appropriate to the predicament we are in.
TurnTrout argues that Tsuyoku Naritai is not it, and maybe he is right. I do not know what the correct emotion feels like, but I think maybe DF knew.
Too high school — but then again: