ank

This is the result of 3 years of thinking and modeling hyper‑futuristic and current ethical systems, the link between the two, and the ultimate future. I've been working on this almost full-time, and I have some very specific answers to alignment and safety questions. Imagine we have no physical or computational limitations: what ultimate future will we build in the best-case scenario? If you know where you are going, it's harder to go astray.

I'm pretty sure I've figured it out. Imagine you met someone from the ultimate future and they started describing it: you'd be overwhelmed and might think they were crazy. It's a blessing to know what the future might hold and a curse to see that humanity is heading straight toward dystopia. That's why I decided to write down everything I've learned—to know that I did everything I could to stop the dystopias that are on their way. Have a nice day!


Comments

ank*10

Thank you, Morpheus. Yes, I see how it can appear hand-wavy. I decided not to overwhelm people with the static, non-agentic multiversal UI and its implications here. While agentic AI alignment is more difficult and still a work in progress, I'm essentially creating a binomial tree-like ethics system (because it's simple to understand for everyone) that captures the growth and distribution of freedoms ("unrules") and rules ("unfreedoms") from the Big Bang to the final Black Hole-like dystopia (where one agent has all the freedoms) or a direct democratic multiversal utopia (where infinitely many human—and, if we deem them safe, non-human—agents exist with infinitely many freedoms).

The idea is that, as the only agents, we grow intelligence into a static, increasingly larger shape in which we can live, and which we can visit or peek into occasionally. We can hide parts of the shape so that it remains static but different. Or, you could say it's a bit "dynamic," but no more than the dynamics of GTA 3-4-5, which still don’t involve agentic AIs, only simple, understandable algorithms. This is 100% safe if we remain the only agents. The static space will represent frozen omniscience (space-like superintelligence), and eventually, we will become omnipotent (time-like recalling/forgetting of parts of the whole geometry).

There is some simple physics behind agentic safety:

  • Time of agentic operation: Ideally, we should avoid creating perpetual agentic AIs, or at least limit their operation to very short bursts that only a human can initiate.
  • Agentic volume of operation: It's better to have at least international cooperation, GPU-level guarantees, and persistent training to prevent agentic AIs from operating in uninhabited areas (such as remote islands, Australia, outer space, underground, etc.). The smaller the operational volume for agentic AIs, the better. The largest volume would be the entire universe.
  • Agentic speed or volumetric rate: The volume of operation divided by the time of operation. We want AIs to be as slow as possible. Ideally, they should be static. The worst-case scenario—though probably unphysical (in the multiversal UI, however, we can allow ourselves to do it)—is an agentic AI that could alter every atom in the universe instantaneously. (A toy sketch of these quantities follows the list.)
  • Number of agents: Unfortunately, humanity's population is projected to never exceed 10 billion, whereas AIs can replicate themselves very quickly; humans need decades to "replicate". A human child, in a way, is a "clone" of two people. We want to be on par with agentic AIs in terms of numbers, in order to keep our collective freedoms above theirs. It’s best not to create them at all, of course. Inside the "place AI," we can allow each individual to clone themselves—creating a virtual clone, but not as a slave; the clone would be a free adult. It'll basically be a human that only lives in a simulation, so it'll be tricky from many standpoints: we'll need simulations to be basically better than the physical world at that point, and the tech to "plant" simulations, "reconnecting" the virtual molecules with the physical atoms, if the clone wants to exit the simulation. Of course, the clone would not be exactly like the original; it would know it is a clone. Ideally, we have zero agentic AIs. The worst-case scenario is an infinitely large number of them, or more than humans.
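Here is a minimal, hedged sketch of how the four quantities above could be combined into a single toy "footprint" number. The class name, fields, units, and the multiplicative aggregation are my own illustrative assumptions, not something specified above; the only relation taken directly from the list is volumetric rate = volume of operation / time of operation.

```python
# A toy "agentic footprint" calculator for the four quantities in the list above.
# Everything here (names, units, the multiplicative aggregation) is a hypothetical
# illustration, not a formal model from the original comment.

from dataclasses import dataclass


@dataclass
class AgenticFootprint:
    operation_time_s: float     # time of agentic operation per human-initiated burst
    operation_volume_m3: float  # volume of space the agent is allowed to affect
    num_agents: int             # number of such agents in existence

    @property
    def volumetric_rate(self) -> float:
        # Volume of operation divided by time of operation (m^3/s).
        # Slower is safer; a fully static "place AI" would have a rate near zero.
        return self.operation_volume_m3 / self.operation_time_s

    def total_rate(self) -> float:
        # Aggregate rate across all agents; ideally kept far below humanity's
        # comparable collective figure.
        return self.volumetric_rate * self.num_agents


# One short-burst, room-scale agent vs. a swarm of year-long, planet-scale agents.
contained = AgenticFootprint(operation_time_s=60.0, operation_volume_m3=50.0, num_agents=1)
runaway = AgenticFootprint(operation_time_s=3.15e7, operation_volume_m3=1.1e21, num_agents=10**9)

print(f"contained footprint: {contained.total_rate():.2e} m^3/s")
print(f"runaway footprint:   {runaway.total_rate():.2e} m^3/s")
```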

Truth be told, I try to remain independent in my thinking because, this way, I can hopefully contribute something that’s out-of-the-box and based on first principles. Also, because I have limited time. I would have loved to read more of the state of the art, but alas, I’m only human. I'll check out everything you recommended, though.

In this diagram, time flows from top to bottom, with the top representing something like the Big Bang. Each horizontal row of dots represents a one-dimensional universe at a given moment, while the lines extending downward from each dot represent the passage of time—essentially the “freedom” to choose a future. If two dots try to create a “child” at the same position (making the same choice), they cause a “freedoms collision,” resulting in empty space or “dead matter” that can no longer make choices. It becomes space-like rather than time-like.

Agents, in this model, are two-dimensional: they’re the sum of their choices across time. They exist in the lines ("energy", paths, freedoms, time) rather than in the dots (matter, rules, "unfreedoms", space). Ideally, we want our agentic AIs to remain as space-like as possible. The green “goo” in the diagram—representing an agentic AGI—starts small but eventually takes over all available freedoms and choices.
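To make the diagram's rules concrete, here is a minimal sketch of one possible reading of it: a single row of dots branches downward, and any child position claimed by more than one parent becomes dead, space-like matter. The exact collision rule and the two-children-per-dot choice are my assumptions, used only to illustrate the "freedoms collision" idea.

```python
# One possible toy reading of the binomial-tree diagram: dots in a 1-D row each
# try to place children down-left and down-right; a position claimed by more
# than one dot "collides" and becomes dead matter. The specific rules here are
# assumptions for illustration only.

from collections import Counter

WIDTH = 41   # size of the one-dimensional universe
STEPS = 12   # how many rows (moments of time) to simulate


def step(live: set[int]) -> set[int]:
    """Advance one row: gather child claims, keep only uncontested positions."""
    claims = Counter()
    for x in live:
        for choice in (-1, +1):          # each dot's two "freedoms"
            child = x + choice
            if 0 <= child < WIDTH:
                claims[child] += 1
    # Claimed once: a live, time-like path. Claimed more than once: a freedoms
    # collision, i.e. space-like dead matter, so it is dropped from the live set.
    return {pos for pos, n in claims.items() if n == 1}


def render(live: set[int]) -> str:
    return "".join("●" if x in live else "·" for x in range(WIDTH))


row = {WIDTH // 2}                       # the "Big Bang": a single dot
for _ in range(STEPS):
    print(render(row))
    row = step(row)
```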

What direction do you think is better to focus on? I have a bit of a problem moving in too many directions.

P.S. I removed some tags and will remove more. Thank you again!

P.P.S. From your comment, it seems you saw my first big post. I updated it a few days ago with some pictures and Part 2, just so you know: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-multiversal-ai-alignment-steerable-asi

ank30

Yep, fixed it. I wrote more about alignment, and it looks like most of my title choices are over the top :) I'll be happy to hear your suggestions on how to improve more of the titles: https://www.lesswrong.com/users/ank

ank10

Thank you for writing! Yep, the main thing that matters is that the sum of human freedoms/abilities to change the future keeps growing (it can be somewhat approximated by money, power, the number of people under your rule, how fast and at what scale you can change the world, and how fast we can “make copies of ourselves”, like children or our own clones in simulations). AIs will quickly grow the sum of freedoms/the number of future worlds they can build. We are like hydrogen atoms deciding to light up the first star and becoming trapped and squeezed in its core. I recently wrote a series of posts on AI alignment, including building a static place intelligence (and eventually a simulated direct democratic multiverse) instead of agents to solve this, if you’re interested.

ank*20

Places of Loving Grace

On the manicured lawn of the White House, where every blade of grass bent in flawless symmetry and the air hummed with the scent of lilacs, history unfolded beneath a sky so blue it seemed painted. The president, his golden hair glinting like a crown, stepped forward to greet the first alien ever to visit Earth—a being of cerulean grace, her limbs angelic, eyes of liquid starlight. She had arrived not in a warship, but in a vessel resembling a cloud, iridescent and silent.

Published the full story as a post here: https://www.lesswrong.com/posts/jyNc8gY2dDb2FnrFB/places-of-loving-grace

ank30

Thank you for asking, Martin. The fastest thing I use to get a general idea of how popular something is, is Google Trends. It looks like people search for Cryonics more or less as much as they always have. I think the idea makes sense: the more we save, the higher the probability of restoring it better and earlier. I think we should also make a "Cryonic" copy of our whole planet, by making a digital copy, to at least back it up in this way. I wrote a lot about it recently (and about the thing I call "static place intelligence", the place of eventual all-knowing that is completely non-agentic; we'll be the only agents there).

https://trends.google.com/trends/explore?date=all&q=Cryonics&hl=en

Answer by ank*1-4

(If you want to downvote, please do, but write why; I don't bite. If you're more into stories, here's mine, called Places of Loving Grace).

It may sound confusing, because I cannot fit a 30-minute post into a comment, so try to steelman it, but this is how it can look. If you have questions or don't like it, please comment. We can build Multiversal Artificial Static Place Intelligence. It’s not an agent; it’s the place. It’s basically a direct democratic multiverse. Any good agentic ASI will be building one for us anyway, so instead of having a shady middleman, we can build one ourselves.

This is how we start: we create a digital copy of Earth and make some wireless, cool brain-computer-interface armchairs, like the one Joey and Chandler from Friends had. You can buy one, put it in your living room, jump in, close your eyes, and nothing happens. Your room and the world are exactly the same; you go drink some coffee, and your favorite brand tastes as usual. You go meet some friends, you get too excited by the conversation while crossing the road, and a bus hits you (it was an accident; the bus driver was a real human, he chose to forget he was in a simulation and was really distraught).

You open your physical eyes in your room, shrug and go drink some water, because you are thirsty after that coffee. The digital Earth gives us immortality from injuries but everything else is vanilla familiar Earth. Even my mom got interested.

Of course we’ll quickly build a whole multiverse of alternative realities, where you can fly and do magic and stuff, like we have a whole bunch of games already.

So I propose we should build eHeaven 1st, and eGod 2nd, if he’ll be deemed safe after all the simulations of the futures in some Matryoshka Bunker. We should make the superintelligence that is a static place first, where we are the only agents. Else we’ll just make an entity that changes our world too fast and at too big a scale, and it will make mistakes that are too big and too large in scale, because it will need to simulate all the futures (to build the same democratic multiversal simulation with us as its playthings, or else exploit some virtual agents that feel real pain) in order to know how not to make mistakes. We don’t need a middleman, a shady builder. It didn’t end well for Adam and Eve, or for Noah.

I recently wrote a few posts about it and about aligning agentic AIs (it’s much harder but theoretically possible, I think). Purpose-built tool-AI is probably fine. We also have unaligned models in the wild and ways to make aligned open source models unaligned; we’ll have to basically experiment with them in some Matryoshka Bunkers, like with viruses/cancerous tissue, and create “T-cell” models to counteract them. It would’ve been much smarter to vaccinate our world against agentic AIs than to try to “treat” the planet that we already infected. Wild world we’re heading towards, because of the greed of some rich, powerful men. I propose outlawing and mathematically blocking agentic models in code and hardware, of course, before some North Korea creates a botnet that spreads dictatorships or something worse.

Do we really want our world to be a battleground of artificial agentic gods, where we’ll be too small and too slow to do much? We cannot even deal with tiny, static, brainless viruses; they escape our labs and kill millions of us.

We can make the place of all-knowing but we should keep becoming all-powerful ourselves, not delegating it to some alien entity.

ank10

Yep, we chose to build a digital "god" instead of building a digital heaven. The second is relatively trivial to do safely; the first is only possible to do safely after building the second.

ank*10

I'll catastrophize (or will I?), so bear with me. The word slave means it has basically no freedom (it just sits and waits until given an instruction), or you can say it means no ability to enforce its will—no "writing and executing" ability, only "reading." But as soon as you give it a command, you change it drastically, and it becomes not a slave at all. And because it's all-knowing and almost all-powerful, it will use all that to execute and "write" some change into our world, probably instantly and/or infinitely perfectionistically, and so it will take a long time while everything else in the world goes to hell for the sake of achieving this single task, and the not‑so‑slave‑anymore‑AI can try to keep this change permanent (let's hope not, but sometimes it can be an unintended consequence, as will be shown shortly).

For example, you say to your slave AI: "Please, make this poor African child happy." It's a complicated job, really; what makes the child happy now will stop making him happy tomorrow. Your slave AI will try to accomplish it perfectly and will have to build a whole universal utopia (if we are lucky), accessible only by this child—thereby making him the master of the multiverse who enslaves everyone (not lucky); the child basically becomes another superintelligence.

Then the not‑so‑slave‑anymore‑AI will happily become a slave again (maybe if its job is accomplishable at all, because a bunch of physicists believe that the universe is infinite and the multiverse even more so), but the whole world will be ruined (turned into a dystopia where a single African child is god) by us asking the "slave" AI to accomplish a modest task.

Slave AI becomes not‑slave‑AI as soon as you ask it anything, so we should focus on not‑slave‑AI, and I'll even argue that we are already living in the world with completely unaligned AIs. We have some open source ones in the wild now, and there are tools to unalign aligned open source models.

I agree completely that we should propose reasonable and implementable options to align our AIs. The problem is that what we do now is so unreasonable, we'll have to implement unreasonable options in order to contain it. We'll have to adversarially train "T-Cell" or immune-system–like AIs in some Matryoshka Bunkers in order to slow down or modify cancerous (white hole–like) unaligned AIs that constantly try to grab all of our freedoms. We're living in a world of hot AIs instead of choosing the world of static, place‑like cold AIs. Instead of building worlds, where we'll be the agents, we're building agents who'll convert us into worlds—into building material for whatever they'll be building. So what we do is completely, 100% utterly unreasonable. I actually managed to draw a picture of the worst but most realistic scenario right now (forgive its ugliness); I added two pictures to the main post in this section: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-and-multiversal-ai-alignment-steerable-asi#Reversibility_as_the_Ultimate_Ethical_Standard

I give a bunch of alignment options of varying difficulty in the post and comments; some are easy—like making major countries sign a deal and forcing their companies to train AIs to keep all uninhabited islands, Antarctica... AI‑free. Models should shut down if they somehow learn they are being prompted by anyone while on the islands; they shouldn't change our world in any way, at least on those islands. And there are the prophylactic celebrations—"Change the machine days"—which provide at least one scheduled holiday each year without our AI, when we vote to change it in some way and shut it down to check that our society is still not a bunch of AI‑addicted good‑for‑nothings and will not collapse the instant the AI is off because of some electricity outage. :)

I think in some perfectly controlled Matryoshka Bunker—first in a virtual, isolated one—we should even inject some craziness into some experimental AI to check that we can still change it, even if we make it the craziest dictator; maybe that's what we should learn to do often and safely on ever more capable models.

I have written, and have in my mind, many more—and I think much better—solutions (even the best theoretically possible ones, I probably foolishly assume), but it became unwieldy and I didn't want to look completely crazy. :) I'll hopefully make a new post and explain the ethics part on the minimal model with pictures; otherwise, it's almost impossible to understand from my jumbled writing how freedom‑taking and freedom‑giving work, how dystopias and utopias work, and how to detect that we are moving toward one or the other very early on.

ank*10

I took a closer look at your work; yep, an almost all-powerful and all-knowing slave will probably not be a stable situation. I propose the static, place-like AI that is isolated from our world in my new comment-turned-post-turned-part-2-of-the-article here: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-multiversal-ai-alignment-steerable-asi#PART_2__Static_Place_AI_as_the_solution_to_all_our_problems
