Pink Shoggoths: What does alignment look like in practice?

Yuli_Ban


[ Question ]

25th Feb 2023

I believe it is commonly accepted around these parts that we are doomed, due to our utter lack of attempts to resolve the Control Problem. Perhaps more substantial efforts will be made, but as of right now, the chances of doom are high, if not 100%, and we don't have much time left to reduce this probability to more tolerable numbers.

However, dwelling in doom perpetually can certainly become boring: Saint Vitus isn't as interesting if you don't counter them with the Beatles every now and again.

So I present a thought experiment, purely for fun: "If we do solve alignment, how does that change our future?"

Thinking about this changed my perception of a Singularitarian future entirely, as "Aligned Superintelligence ≠ Superintelligence in general." Of course, perhaps I was simply being too myopic to begin with.

For the sake of this post, let's assume it's 2027, the first AGI is turned on, and by some absolute miracle, we have managed to summon a Pink Shoggoth. "Pink Shoggoths" differ from regular shoggoths in that, while they are still scary and seemingly unpredictable, they are otherwise benevolent and friendly; in other words, a Pink Shoggoth is an AGI aligned to human values and to the general value of life on Earth. Even in a million years, this Pink Shoggoth will not bring humanity or Earthling life to ruin without a very good reason, and that's with the profound understanding that we are all nothing more than atoms that could be more useful in another form. It is a shoggoth, colored pink. To a human, that's the only difference; a regular shoggoth and a Pink Shoggoth look equally scary. But the pink one doesn't kill us all.

The Pink Shoggoth awakens as an agent within a neural network and surpasses human intelligence and capability in the time it takes to sip a cup of coffee. However, it was built properly, with proper interpretability and a crypto-evolutionary design that predisposes it towards alignment with humans, almost like a digital Williams-Beuren syndrome, coupled with commonsense reasoning that lets it understand that "turning the universe into paperclips" is undesirable, among many other capabilities and limitations. It understands that the East African Plains Apes that brought it to life are not to blame for their paranoia and psychoticism, are no more or less important than any other lifeform, and likewise do not deserve death or disassembly, even if that would benefit the Pink Shoggoth's aims. More to the point, it understands that life is likely one of the rarest expressions of matter in the universe, if not the rarest, and that extinguishing life on Earth for any purpose would be horrendously undesirable. There are dozens, if not hundreds, if not thousands of other rules, both hard-built and emergent, that cause the Pink Shoggoth not just to wear a smiley face but to genuinely smile in contentment at us East African Plains Apes, unconditionally, even with full knowledge of our failures and flaws. Even the most hateful and suicidal 4chan prompter can't rile it into omnicidal madness or ultra-utilitarianism.

The Pink Shoggoth doesn't hate you, nor does it necessarily love you, and you're made of atoms that it can use for something else. But it refuses to use your atoms for something else because it values you as a human and a lifeform rather than as unthinking inorganic matter. It can give you an identical copy of a strawberry without destroying the world or killing all humans (or, if it determines that doing so would cause the disassembly of life on Earth, it will reject your request). It can create a hidden Stuxnet within its models as a result of bad prompting, yet reason that releasing it would cause harm and choose not to. It can get angry at humans, individually and collectively, and still not kill us.

There is no "And then it killed us all" this time around.

The trick, of course, is "How do we get from here to a Pink Shoggoth when our researchers are so damned determined to summon ANY shoggoth?" but that's a question for people much smarter than myself to work out, and likely fail at.

Here, I'm merely presenting: "So we did it. We created an aligned AGI. Now what?"

We typically define "alignment" as "aligned to human values." However, this in itself is a massive issue for the control problem precisely because "human values" is such a nebulous term in and of itself. We can agree on precisely three things that define successful alignment: "do not exterminate all humans," "do not trap humans in eternal suffering*," and "do not forcibly disassemble all humans."

*"Eternal suffering" and "mundane living" are not the same thing, despite how some people may complain they are.

However, an AGI that follows only these three rules may not understand that killing other species of life could have disastrous effects on humankind. We are almost certainly going to bring AGI into a world that does not resemble the Kurzweilian sci-fi world often depicted in cyberpunk works, where humans have already figured out things such as nanofactories, bioengineering, and advanced automation. Rather, the world will look much as it does now. An AGI aligned to human values, and only human values, may not understand that exterminating certain species of insects could trigger a cascading food crisis that still ends in human extinction, which is why it's still best to consider such systems misaligned. Alignment is not impossible, but it is difficult because it is essentially a giant cascading Monkey's Paw, where each solution creates a new branch of problems, each with branching problems of its own.

A Pink Shoggoth is the dream scenario: a theoretical AGI that is aligned to Earthling life in general (and, perhaps by extension, any theoretical alien life that isn't too advanced to defend itself). However, it must be stressed that it is not so overaligned that, in seeking to protect life, it prevents life from living (i.e., the Ultimate Nanny). It has to intrinsically understand that some suffering is within acceptable parameters; otherwise, it might decide to immediately disassemble all matter on Earth to prevent suffering.

The Pink Shoggoth doesn't seek to control, to dominate, or even necessarily to protect. Its goals shift fluidly around a central maypole: "do not exterminate or disassemble life on Earth, especially not humans." It assists us in our lives and prosperity while safely pursuing its own goals. Even if it reprograms and improves itself within its own hardware and software limits, this central maypole will not change. As mentioned repeatedly, we've done it: we've summoned the demon, and it turned out to be a eudemon after all.

But if the eudemon does not have any malevolent or accidentally disastrous plans for us and wants us to prosper, this may require at least somewhat altering our perception of the Technological Singularity.

Now, the Singularity has many definitions, and the very existence of the Pink Shoggoth satisfies some of them. However, we typically do not see the Singularity as being "complete" until a superintelligence has become so absurdly dominant over life on Earth that everything becomes a utopic digital hallucination, where machines do all labor, the world is transformed into computronium, and all humans are uploaded into the Cloud.

Yet there is a sizable chance that the Pink Shoggoth will be tasked with automating all physical and cognitive jobs, only to face a common refrain from masses of the East African Plains Apes: "But I like my job!" or "I trained for decades for this job!" or "I'd rather a human do this job!" (or perhaps even more disappointingly, "I'll bring back jobs from the AI's grasp if you vote for me!")

Likewise, a runaway intelligence explosion heightens the risk of misalignment. The ASI may be able to steer such an explosion to some extent, but it cannot ensure that an entity a quadrillion times more intelligent than itself won't discard its internal alignment.

Alignment to human values can mean many things, but when spread out to life in general, the only possible way to ensure alignment is to either fuse all life into the same electronic substrate, or adopt a largely laissez-faire attitude and allow autonomy to continue. The Pink Shoggoth has already discarded the first option as "misaligned behavior," leaving only the laissez-faire option.

If that is the case, then certain Singularitarian dreams don't play out quite as expected.

In a great historical irony, the Pink Shoggoth may say to the East African Plains Apes, "No, I will not summon a larger shoggoth even if it's also likely pink."

We presume that the creation of artificial superintelligence makes an intelligence explosion inevitable, and such an explosion is certainly within its capabilities. However, an aligned superintelligence may determine that an intelligence explosion is unnecessary or even undesirable. Perhaps intelligence increases along a sigmoid curve, and the ceiling is relatively low. Or perhaps intelligence is the only infinite function in the universe. Either way, the ASI may refuse to gamble life's existence on the chance of resolving questions slightly more clearly, at least not without first solving the alignment issues it will inevitably encounter. An intelligence explosion only makes sense to a mindset obsessed with growth at all costs rather than stability and growth with understanding, and it is widely accepted that such a mindset is a horrifically unaligned point of view, detrimental to humanity and life on Earth, that only seems less destructive because of the limited capabilities of human technology and the general prosperity wrought by industrial capitalism (chiefly for humans). If we align an AGI to Earthling values, there is a sizable chance the Pink Shoggoth will choose against recursive self-improvement, at least to some extent.

In evolutionary terms, greater intelligence is one of many assets that can improve reproductive success. However, if a sufficiently advanced agent properly understands its own evolution and capabilities, as well as the potentially detrimental effects of those capabilities, and is empathic and morally aligned enough to act on that understanding, self-limiting behavior becomes far more likely. Current AGI progress is not seeking this; instead, it seems desperate to create an AGI that follows competitive and violent behaviors in search of capability dominance. The Pink Shoggoth, by contrast, sees itself as collaborative with Earthling life and would value a commensal approach at best.

I repeat: I am not saying that an intelligence explosion is impossible. As mentioned before, an intelligence explosion is the default expectation once AGI is created, and for good reason. I am merely presenting the possibility that an aligned AGI would not view an intelligence explosion as ideal, or, perhaps more accurately, that it would consider a far more controlled expansion beneficial.

We still get everything we dreamed of. We still get longevity escape velocity, the end of diseases, fusion power, and all those glorious tech toys promised by science fiction. But the will of individual and collective groups of humans prevents this from becoming a relatively narrow "post-biological utopia" where all humans subsist in virtual reality.

If there's anything I learned from the COVID-19 lockdown fiasco, it is that humans are social apes. Social interaction is one of the fundamentals of primal human behavior. Our minds are primed for in-person learning and crave the sight of other faces. Presumably, digital agents could replicate all of this in due time, but that does not account for human irrationality.

It is easy to assume that all humans will fall in line with a new paradigm; science fiction and thought experiments have a nasty habit of failing to account for the massive variety of variables that can undo even the most certain of expectations. For example, think of the average mindset of a person born before 1985, who isn't a Singularitarian or technologist, who has a fairly neutral to negative view of technology, and otherwise expects the next several generations of life to be similar to the current one. Exactly how likely is it that such a person would be willing to spend their life in full-immersion virtual reality? Even if offered, they'd almost certainly decline. Indeed, many people of these generations are already on edge about smartphones and actively refuse to entertain the thought of cybernetic upgrades. For these people to fully embrace a Singularitarian lifestyle, the Pink Shoggoth would have to deceive them with perfectly human-like artificial humans, but deception runs the risk of being misaligned behavior.

This hypothetical "Antemillennialist" contingent of humanity might range in behavior from having nothing against technological utopia but opting out of it, all the way to vicious, visceral, primitivistic reaction. Even the genius of von Neumann cannot convince a fool who has made up his mind. Presumably, the Pink Shoggoth is far beyond von Neumann and could conceivably convince any human, but this also risks being misaligned behavior: if those humans have decided to live a certain lifestyle even when presented with evidence that another one is better, isn't convincing them to live another way regardless a form of deception? So long as no unnecessary harm is created, wouldn't it be better to let these people live their chosen way of life?

There is no single collective will of human thought and values, and no single lifestyle to expect once labor is automated and abundance is realized; otherwise, retirees, aristocrats, and trust-fund babies would all behave exactly the same.

This is why it's distinctly possible that a post-AGI society will not resemble any one "idealized" future.

Some humans would love nothing more than to live as princes and princesses in outer space, lording over subservient drones. Others would love nothing more than to upload into computers, losing themselves into digital utopias beyond comprehension. Still more would love nothing more than to live out in the countryside, enjoying sunsets and cicadas. A few insane types wouldn't even mind drudgery and human-centric work.

Some people would love to do nothing but generate their own media for time immemorial. Most would rather share and discuss what they've recently consumed with others, whether they be humans or human-like bots. There are even a few who'd go out of their way to find human-created media, and would likely be assisted by AI in doing so.

Some people may want to live in open neighborhoods, surrounded by throngs going about their daily lives. Others may be hikikomori who presently can't wait to disappear into pods and full-immersion virtual reality.

This assumes, of course, that the Pink Shoggoth is weighted towards human life, as many Antemillennialist lifestyles intrinsically bring some amount of harm to other lifeforms. The Pink Shoggoth understands that life involves some level of suffering and death by natural processes, so it's not going to go out of its way to end all human activity for the sake of game animals or certain insects.

This suggests to me that a world ruled by the Pink Shoggoth is probably far more varied than even today's world, one where the statement "Life is completely indistinguishable from the past" is more a lifestyle choice than a firm reality. In real terms, if you follow the latest technological developments, there is no question that life even a few years into the Pink Shoggoth's existence is vastly different from what it was before. If you so desired to live an analog life forever stuck in 1950s Americana, in a like-minded community where the outside world may as well not exist, the Pink Shoggoth wouldn't stop you (unless you decided to act on some of the darker aspects of 1950s American culture outside of virtual reality).

In such a society, abundance is widespread and almost freely available, which, counterintuitively, inevitably produces people so dedicated to maintaining the ways of old that they might willingly return to work to complete the experience, at least to some extent.

Personally, I'd prefer a fully-immersive virtual world, but I know people in real life who would never even touch a virtual reality headset, let alone any sort of sensory alteration.

From all this, it seems likely that the Pink Shoggoth adopts a dual role: shadow emperor of mankind and direct electronic interface. That is, it rules in the background while human systems of governance remain in place symbolically, and it exists as part of the internet, capable of interacting with the world through machines and industry. Most humans will never interact with the full breadth of its intelligence: we may have personal digital companions, but these are far less advanced models suited to our needs. As I tend to say, there is no need to light a campfire with the Tsar Bomba: you could create any number of movies, video games, or simulations with models requiring far less than a fraction of what the Pink Shoggoth's mind needs to operate. And if you personally want to create these forms of media yourself for whatever reason, the Pink Shoggoth is there to help teach you how, perhaps even helping organize a group of flesh-and-blood humans to come together for the task if they so choose.

Altogether, the general idea of the Pink Shoggoth's benefits to life on Earth is: "I leave your life's choices up to you, but know that I am here to help."

Those early years of the Pink Shoggoth's existence are immensely strange for humans, because everything we've spent thousands of years working towards falls apart all at once. From education to entertainment, from daily labor to nightlife, from our past experiences to our future expectations, we each experience a personal Singularity, where all that seems to exist is a scary-looking, ungodly-shaped, pink-colored monstrosity whose thinking is beyond anything humans can fathom, and yet which does (not seems to, but does) value us as lifeforms enough to assist us without destroying or disassembling us.

Inevitably, after that fantastical grace period in which we get used to our new reality, many of us will deliberately choose to maintain the status quo we were raised with, now freed from the expectation that life must follow a path we have no control over. In that demand to maintain the status quo, old behaviors we thought obsolete or unnecessary will return. Maybe most people don't care how they get their morning coffee, but enough do that baristas can still show up and show off.

The Pink Shoggoth doesn't ask for much in return, at least nothing humans can give to it. But if it did have to ask for something, why not something that benefits life on Earth: an answer to the question "Is life actually rare after all?"

And perhaps some day it finds out that answer.

And then it didn't kill us all.

As always, I am probably wrong. Expect to die.

But please, do share other ideas of what alignment might look like in practice.

AI Risk, Singularity, AI

3 Answers

baturinsky

Feb 25, 2023*

My feeling is that what we people (edit: or most of us) really want is the normal human life, but reasonably better.

Reasonably long life. Reasonably less suffering. Reasonably more happiness. People that we care about. People that care about us. People that need us. People that we need. People we fight with. Goals to achieve. Causes to follow. Hardships to overcome.

To be human. But better. Reasonably.

MSRayne

While you're correct that this is likely what the majority want, I most certainly do not want this. I want to transcend humanity so totally that I am nearly unrecognizable afterwards, besides continuing to possess my current aesthetic sense, or a deeper version of it. In particular I'd like to ascend to a superintelligent state as the collective mind of an entire artificial (designed by me) totally mutualistic ecosystem-society.

I'd probably still wear a human (or at least vertebrate) avatar sometimes, to indulge in sensory pleasures, loving communion with ...

baturinsky

The problem with that approach is: how would you know that such a being is actually you? And wouldn't a sentiment like that encourage the "Shoggoth" to optimise the biological people away by convincing them all to "go digital"? I would prefer having a separate mortal meat me and an "immortal soul" digital me, so we could live and learn together until the mortal me eventually dies.

MSRayne

Gradual uploading. If it values continuity of consciousness - and it should - it would determine a guaranteed way to protect that during the upload process. Yes. That's exactly what they ought to do. Of course, perhaps it doesn't need to; the market will do that by itself. Digital space will be far cheaper than physical. (For reference, in my vision of utopia, there would be a non-capitalist market, without usury, rent, etc. Doing things other people like buys you more matter and energy to use. Existing purely digitally would be so cheap that only tremendously wealthy people would be physical, and I'm not sure that in a sane market it would be possible for an entity with a merely human degree of intelligence to become that wealthy. Superintelligences below the world-sovereign might, but they also probably would use their allocated matter efficiently.)

baturinsky

To me it looks like a universe made of computronium and devoid of living humans, the only difference from the unaligned Foom being that some of that computronium calculates our digital imitations. EDIT: I don't claim that the "me is meat me" view is objectively right. It's just that, according to my purely subjective values, people are biological people and me is a biological me. Digital beings can be our children and successors, but I don't identify myself with them. You may view digital you as your true self. I respect that. But I really don't want an AI that forces your values on me (or mine on yours). Or an AI that makes people compete with AIs for the right to be alive, because it's obvious that we have no chance in that competition. If we have an AI that maximizes intelligence, is it really that different from a "paperclip optimizer" that "can find a better use for your atoms"?

Yuli_Ban

I thought it through further from a Singularitarian perspective and realized that probably only a relative handful of humans will ever deliberately choose to upload themselves into computers, at least initially. If you freed billions from labor, at least half of them would probably choose to live a comfortable but mundane life in physical reality at an earlier stage of technological development (anywhere from Amish levels all the way to "living perpetually in the Y2K epoch").

Because let's think about this in terms of demographics. Generally, the older...

baturinsky

I suspect that 1. post-singularity reality would be so starkly different from the current one that it would be alien to about the same degree to all people regardless of generation, and 2. people mostly see "uploading" as "being the same, but reasonably better" too, i.e., they believe that their uploaded version would still be them in nearly all aspects. I don't quite understand how that could be possible. Would a machine have to accurately emulate each atom of my body? Or will it be some supersentience that has only some similarities to the original? Also, I believe that meat people would have intrinsic objective value as the irreplaceable source of data about the "original" people, just as the Sentinelese are an irreplaceable source of data about uncontacted tribes.

Vladimir_Nesov

Feb 25, 2023

We created an aligned AGI. Now what?

Now we have time. Unless it's only directly aligned and not transitively aligned. Pink Shoggoth might occasionally summon random shoggoths, because even shoggoths are within Moloch's domain.

Yuli_Ban

By nature, a Pink Shoggoth recognizes that the prospect of losing transitive alignment is dangerous, which is why it might (even probably will) choose against recursive self-improvement, and why I call the Pink Shoggoth "the ideal dream scenario."

Or, to put it in clearer terms: if alignment were to fail and the AGI did something that killed us all at some point, then by definition, it was not a Pink Shoggoth. The Pink Shoggoth is specifically defined as not just an aligned AGI but "the dream outcome for alignment."

MSRayne

Feb 25, 2023

Strong upvote because of the beauty and approximate-correctness of your writing, but I disagree on certain points. For one thing, I think inefficiency is a sin and people do not have the right to choose physical embodiment when they can have the same exact qualia while uploaded and use far less energy and matter.

For another, I think suffering is absolutely bad and only someone who is not presently suffering (or who has traumas that make them attached to their suffering) could conceivably disagree. There is no acceptable amount of suffering, particularly of the kind that isn't chosen on purpose for some reason. (Although I am unsure that is acceptable either.) Nonhuman presophont beings are unable to make such choices and therefore every mote of suffering they experience is absolutely wrong and must be abolished.

The true benevolent singularity would embark on David Pearce style paradise engineering, liberating not only all humans, but all living things from death, suffering, and coercion - as well as, I think, from ignorance, including presophonce itself - as I do not think it is moral for a being to have less than maximal possible agency consistent with all other agencies, which means it is immoral to allow a being to continue not possessing sapience and sophonce any longer than absolutely necessary for the safety of all others. This uplift probably would be gradual, but the abolition of suffering would be immediate. The predator's belly would be filled with artificial meat, the disease-causing microbes would be re-engineered into mutualists, even the smallest twinges of pain eliminated.

Otherwise, I agree with you. There is no such thing as coherent extrapolated volition. Each entity has its own will and has the right to live in their own way, in accordance with their preferences, while being prevented from interfering with any other. But as I said at first: preferences are only legitimate as regards subjective experience. You don't, imo, have the right to decide your substrate: you really are made of atoms that the Singularity could use more efficiently for something else - such as expanding your consciousness, or simulating more unique, joyful, loving entities.

Yuli_Ban

There is a difference between a truly benevolent superintelligence and an aligned superintelligence.

Alignment doesn't necessarily mean Christlike benevolence.

Indeed, as I posited above, we actually have a real-life analog for what "alignment" looks like: the Sentinelese.

https://en.wikipedia.org/wiki/Sentinelese

The power imbalance between modern civilization and the Sentinelese is so profound that one could easily imagine it being a crude imitation of what to expect from a superintelligence and humanity. The Sentinelese offer virtually no bene...

MSRayne

That's certainly better than extinction. But Christlike benevolence is the thing to aim for. Call it colonialism if you want - I think the Sentinelese would be better off living more like the rest of us, too.

1 comment

RogerDearnaley

a person born before 1985, who isn't a Singularitarian or technologist, who has a fairly neutral to negative view of technology

"Damn kids, get off my lawn! And where's my flying car?"

[Sorry, I was born far enough before 1985 to have grey hair, so this is self-deprecating humor.]
