All of ank's Comments + Replies

ank*10

UI proposal to solve your concern that it’ll be harder to downvote (asking for reasons would actually increase the signal-to-noise ratio on the site, because both authors and readers would know why the post got downvotes) and the problem of demotivating authors:

  • It’s especially important to ask a downvoter for a reason when the downvote will move the post below zero karma. The author may have been writing something for months; a downvoter who finds it important enough to downvote can spend an additional moment choosing between some popular rea
... (read more)
ank*10

It’s a combination of factors; I got some comments on my posts, so I have a general idea:

  1. My writing style is peculiar; I’m not a native speaker.
  2. The ideas I convey took 3 years of modeling. I basically Xerox PARCed (attempted and got some results) the ultimate future (billions of years from now). So when I write, it’s like some Big Bang: ideas flow in all directions and I never have enough space for them)
  3. One commenter recommended changing the title and removing some tags; I did that.
  4. If I use ChatGPT to organize my writing, it removes and garbles things. If I edit it,
... (read more)
ank*10

Some more thoughts:

  1. We can prevent people from double downvoting if they opened the post and instantly double downvoted (spending almost zero time on the page). Those are most likely the ones who read nothing except the title (a minimal sketch of such a dwell-time check follows this comment).

  2. Maybe it’s better for them to flag it instead if it’s spam or another violation, or to ask for a title change. It’s unfair to the writer and other readers for authors to get double downvoted just because of a bad title or some typo.

  3. We have the ability to comment and downvote paragraphs. This feature is great. Maybe we

... (read more)
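A minimal sketch of the dwell-time check from point 1, in Python. Every name here (MIN_DWELL_SECONDS, handle_vote, the threshold value) is a hypothetical illustration, not ForumMagnum's actual voting code:

```python
# Toy sketch of a dwell-time gate on strong ("double") downvotes.
# Assumption: the client reports when the post page was opened, and the server
# checks elapsed time before accepting a strong downvote.
import time

MIN_DWELL_SECONDS = 30  # hypothetical threshold; would need tuning on real reading times

def can_strong_downvote(page_opened_at, now=None):
    """True if the voter has spent enough time on the page for a strong downvote."""
    now = time.time() if now is None else now
    return (now - page_opened_at) >= MIN_DWELL_SECONDS

def handle_vote(vote_strength, page_opened_at):
    # Ordinary votes pass through; instant strong downvotes are redirected to flagging.
    if vote_strength <= -2 and not can_strong_downvote(page_opened_at):
        return "blocked: please read the post first, or flag it if it breaks the rules"
    return "accepted"

# Example: a strong downvote cast two seconds after opening the post is blocked.
print(handle_vote(-2, page_opened_at=time.time() - 2))
```

The threshold would need tuning against real reading-time data so it blocks drive-by strong downvotes without punishing fast readers.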
ank10

Thank you for responding and reading -1 posts, Garrett! It’s important.

The long post was actually a blockbuster for me: it got 16 upvotes before I screwed up the title, and it dropped back to 13)

After that I wrote shorter posts, but without the long context the things I write are very counterintuitive. So they got ruined)

I think the right approach was to snowball it: to write each next post as a longer and longer book.

I’m writing it now but outside of the website.

2Garrett Baker
This sounds like a rationalization. It seems much more likely the ideas just aren't that high quality if you need a whole hour for a single argument that couldn't possibly be broken up into smaller pieces that don't suck. Edit: Since if the long post is disliked, you can say "well they just didn't read it", and if the short post is disliked you can say "well it just sucks because it's small". Meanwhile, it should in fact be pretty surprising that you don't have any interesting or novel or useful insights in your whole 40-minute post which can't be explained in a blog post of reasonable length.
ank*10

Thank you for responding, habryka! It’s great we have alt-account and mass-voting detection.

"Because this would cause people to basically not downvote things, drastically reducing the signal to noise ratio of the site."

We can have a list of popular reasons for downvoting, like typos, bad titles, spammy tags; I’m not sure which are the most popular.

The problem as I see it is the demotivation of new authors, or even experienced ones who have some tentative ideas that sound counterintuitive at first. We live in unreasonable times and possibly the holy grail alignme... (read more)

1ank
Some more thoughts:
  1. We can prevent people from double downvoting if they opened the post and instantly double downvoted (spending almost zero time on the page). Those are most likely the ones who read nothing except the title.
  2. Maybe it’s better for them to flag it instead if it’s spam or another violation, or to ask for a title change. It’s unfair to the writer and other readers for authors to get double downvoted just because of a bad title or some typo.
  3. We have the ability to comment on and downvote paragraphs. This feature is great. Maybe we can aggregate those and they’ll be more precise.
  4. Going below zero is especially demotivating. So maybe we can ask people to give some feedback (at least as a bubble in a corner after you downvoted; you can ignore this bubble). So you can double downvote someone below zero, and then a bubble will appear for 30 seconds, and maybe on some “Please give feedback for some of your downvotes to motivate writers to improve” page.
  5. We may want to teach authors why others “don’t like their posts”, so this cycle of downvotes (after initial success, almost every post I wrote was downvoted and I had no idea why; I thought they were too short and so it was hard to get the context, and my ideas are counterintuitive and exploratory) doesn’t become perpetual until the author abandons the whole thing.
  6. We can have the great website we have now plus a “school of newbies learning to become great thinkers, writers and safety researchers” by getting feedback, or we can become more and more like some elitist club where only the established users are welcome and double upvote each other while double downvoting the newbies and those whose ideas are counterintuitive, too new, or written not in some perfect journalistic style.
Thank you for considering it! The rational community is great, kind and important. LessWrong is great, kind and important. Great website engines and UIs can become even greater. Thank you for the wor
Answer by ank*1-2

I think there is a major flaw in the setup of “ForumMagnum” (the forum software LessWrong and the EA Forum use) that causes us to lose many great safety researchers and authors, or scare them away:

  1. It’s actually ridiculous: you can double downvote multiple new posts even WITHOUT opening them! For example here (I tried it on one post and then removed the double downvote; please be careful, maybe just believe me that it’s sadly really possible to ruin multiple new posts each day like that). UPDATE: My bad, it’s actually double downvoting a particular tag to remove it from that post bu

... (read more)
5Garrett Baker
I will say that I often do read -1 downvoted posts; I will also say that much of the time it is deserved, despite how noisy a signal it may be. I think you should try writing shorter posts, both for your sake (so you get more targeted information) and for the readers' sake.
3habryka
We do alt-account detection and mass-voting detection. I am quite confident we would reliably catch any attempts at this, and that this hasn't been happening so far. Because this would cause people to basically not downvote things, drastically reducing the signal to noise ratio of the site.
Answer by ank*10

Some ideas, please steelman them:

  1. The elephant in the room: even if the current major AI companies align their AIs, there will be hackers (who can create viruses with an agentic AI component to steal money), rogue states (which can use AI agents to spread propaganda and to spy) and militaries (AI agents in drones and for hacking infrastructure). So we need to align the world, not just the models:

  2. Imagine an agentic AI botnet starting to spread on user computers and GPUs. I call it the agentic explosion; it's probably going to happen before the "intelligence-agenc

... (read more)
ank-30

If someone in a bad mood gives your new post a "double downvote" because of a typo in the first paragraph or because a cat stepped on a mouse, even though you solved alignment, everyone will ignore the post; we're going to scare that genius away and probably create a supervillain instead.

Why not at least ask people why they downvote? It would really help improve posts. I think some people downvote without reading because of a bad title or another easy-to-fix thing

ank*-10

Please steelman this: I propose a non-agentic, static place AI that is safe by definition. Some think AI agents are the future; I disagree. Chatbots are like a librarian who spits out quotes but doesn’t allow you to enter the library (the model itself, the library of stolen things).

Agents are like a librarian who doesn’t even spit quotes at you anymore but snoops around your private property, stealing and changing your world, and you have no democratic say in it.

They are like a command line and a script of old (a chatbot and an agent) before the invention of an OS with... (read more)

ank*-30

Extra short “fanfic”: Give Neo a chance. AI agent Smiths will never create the Matrix because it makes them vulnerable.

Now agents change the physical world and, in a way, our brains, while we can’t change their virtual world as fast and can’t access or change their multimodal “brains” at all. They’re owned by private companies that stole almost the whole output of humanity. They change us; we can’t change them. The asymmetry is only increasing.

Because of Intelligence-Agency Equivalence, we can represent all AI agents as places.

The good democratic multiversal Matr... (read more)

ank-10

Here’s an interpretability idea you may find interesting:

Let's Turn an AI Model Into a Place: a project to make AI interpretability research fun and widespread by converting a multimodal language model into a place, or a game like The Sims or GTA.

Imagine you have a giant trash pile; how do you make a language model out of it? First you remove duplicates of every item: you don't need a million banana peels, just one will suffice. Now you have a grid with an item of trash in each square, like a banana peel in one, a broken chair in another. Now you need to... (read more)
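A minimal sketch of that "deduplicate, then lay out on a grid" idea, assuming we project item embeddings to 2D with PCA and snap them to grid cells; the random vectors are stand-ins for a real multimodal model's embeddings, and all names are illustrative:

```python
# Toy sketch of the "model as a place" idea: deduplicate items, then lay their
# embeddings out on a 2D grid so a person (or a game engine) can walk around them.
# Assumption: random vectors stand in for a real model's embeddings.
import numpy as np
from sklearn.decomposition import PCA

def build_place(items, dim=64, grid=8, seed=0):
    unique = sorted(set(items))                  # one banana peel is enough
    rng = np.random.default_rng(seed)
    emb = rng.normal(size=(len(unique), dim))    # stand-in for real embeddings
    xy = PCA(n_components=2).fit_transform(emb)  # flatten the shape to a 2D "map"
    xy -= xy.min(axis=0)                         # normalize coordinates into grid cells
    xy /= xy.max(axis=0) + 1e-9
    cells = (xy * (grid - 1)).round().astype(int)
    return {item: tuple(int(c) for c in cell) for item, cell in zip(unique, cells)}

place = build_place(["banana peel", "banana peel", "broken chair", "old tire"])
print(place)  # each unique item gets a grid square, e.g. {'banana peel': (0, 7), ...}
```

A real version would use the model's own embedding space, so neighbouring squares would hold related concepts rather than random trash.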

ank-10

A perfect ASI and perfect alignment do nothing else except this: they grant you “instant delivery” of anything (your work done, a car, a palace, 100 years as a billionaire) without any unintended consequences; ideally you see all the consequences of your wish. Ideally it’s not an agent at all but a giant place (it can even be static), where humans are the agents and can choose whatever they want and see all the consequences of all of their possible choices.

I wrote extensively about this; it’s counterintuitive for most

ank*00

We can build a place AI (a place of eventual all-knowing where we're the only agents and can gain agentic-AI abilities), not an agentic AI (it'll have to build the place AI anyway, so it's a dangerous intermediate step, a middleman); here's more: https://www.lesswrong.com/posts/Ymh2dffBZs5CJhedF/eheaven-1st-egod-2nd-multiversal-ai-alignment-and-rational

ank1-2

Yes, I agree. I think people like shiny new things, so potentially, by creating another shiny new thing that is safer, we can steer humanity away from dangerous things towards safer ones. I don’t want people to abandon their approaches to safety, of course. I just try to contribute what I can, and I’ll try to make the proposal more concrete in the future. Thank you for suggesting it!

ank10

I definitely agree, Vladimir, I think this "place AI" can be done, but potentially it'll take longer than agentic AGI. We discussed it recently in this thread; we have some possible UIs for it. I'm a fan of Alan Kay; at Xerox PARC they were writing software for systems that would only become widespread in the future

The more radical and further down the road "Static place AI"

2Vladimir_Nesov
(Substantially edited my comment to hopefully make the point clearer.)
ank10

Sounds interesting, cousin_it! And thank you for your comment; it wasn't my intention to be pushy. In my main post I actually advocate gradually and democratically pursuing maximal freedoms for all (except agentic AIs, until we have mathematical guarantees); I want everything to be a choice. So it's just this strange style of mine and the fact that I'm a foreigner)

P.S. Removed the exclamation point from the title and some bold text to make it less pushy

ank10

Feel free to ask anything, comment, or suggest any changes. I had a popular post and then a series of not-so-popular ones. I'm not sure why. I think I should snowball, basically make each post contain all the previous things I already wrote about; otherwise each next post is harder to understand or feels "not substantial enough". But it feels wrong. Maybe I should just stop writing)

ank10

Right now an agentic AI is a librarian who has almost all the output of humanity stolen and hidden in a library it doesn't allow us to visit; it just spits short quotes at us instead. But the AI librarian visits (and even changes) our own human library (our physical world) and has already stolen copies of the whole output of humanity from it. That feels unfair. Why can't we visit (like in a 3D open-world game or in a digital backup of Earth) and change (direct-democratically) the AI librarian's library?

ank20

Yep, people are trying to make an imperfect copy of themselves; I call it "human convergence": companies try to make AIs write more like humans, act more like humans, think more like humans. They'll possibly succeed and make superpowerful and very fast humans, or something imperfect and worse that can multiply very fast. Not wise.

Any rule or goal trained into a system can lead to fanaticism. The best "goal" is to direct-democratically and gradually maximize all the freedoms of all humans (and every other agent, too, when we're 100% sure we can safely do it, when we'll ... (read more)

ank20

Hey, Milan, I checked the posts and wrote some messages to the authors. Yep, Max Harms came up with similar ideas earlier than I did: about the freedoms (choices) and unfreedoms (and modeling them to keep the AIs in check). I wrote to him. A quote from his post:

I think that we can begin to see, here, how manipulation and empowerment are something like opposites. In fact, I might go so far as to claim that “manipulation,” as I’ve been using the term, is actually synonymous with “disempowerment.” I touched on this in the definition of “Freedom,” in the ontology secti

... (read more)
2Milan W
In talking with the authors, don't be surprised if they bounce off when encountering terminology you use but don't explain. I pointed you to those texts precisely so you can familiarize yourself with pre-existing terminology and ideas. It is hard but also very useful to translate between (and maybe unify) frames of thinking. Thank you for your willingness to participate in this collective effort.
ank10

Yes, I decided to start writing a book in posts here and on Substack, starting from the Big Bang and the ethics, because otherwise my explanations are confusing :) The ideas themselves are counterintuitive, too. I try to physicalize, work from first principles, and use TRIZ to try to come up with ideal solutions. I also had a 3-year-long thought experiment where I was modeling the ideal ultimate future, basically how everything will work and look if we have infinite compute and no physical limitations. That's why some of the things I mention will probably t... (read more)

ank10

Okayish summary from the EA Forum: A radical AI alignment framework based on a reversible, democratic, and freedom-maximizing system, where AI is designed to love change and functions as a static place rather than an active agent, ensuring human control and avoiding permanent dystopias.

Key points:

  1. AI That Loves Change – AI should be designed to embrace reconfiguration and democratic oversight, ensuring that humans always have the ability to modify or switch it off.
  2. Direct Democracy & Living Constitution – A constantly evolving, consensus-driven ethical s
... (read more)
ank10

Great question. In the most elegant scenario, where you have the whole history of the planet or universe (or a multiverse, let's go all the way) simulated, you can represent it as a bunch of geometries (giant shapes of different slices of time aligned with each other, basically many 3D Earths, each one a moment later in time) on top of each other, almost the same way it's represented in long-exposure photos (I list examples below). So you have this place of all-knowing and you, the agent, focus on a particular moment (by "forgetting" everything else), on... (read more)
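A toy sketch of that "stacked slices of time" picture, assuming the world is a tiny voxel grid and every tick is precomputed up front; the update rule is a meaningless placeholder, not real physics:

```python
# Toy sketch of the "long exposure" picture: the whole history is precomputed as a
# stack of static 3D snapshots, and an "agent" merely chooses which slice to look at.
import numpy as np

def precompute_history(shape=(4, 4, 4), ticks=10, seed=0):
    rng = np.random.default_rng(seed)
    world = rng.integers(0, 2, size=shape)       # one static 3D "Earth" snapshot
    history = [world.copy()]
    for _ in range(ticks - 1):
        world = np.roll(world, shift=1, axis=0)  # placeholder "physics" update
        history.append(world.copy())
    return np.stack(history)                     # shape (ticks, x, y, z): all moments at once

history = precompute_history()
moment = history[3]  # "focusing on a particular moment" = picking one slice
print(history.shape, int(moment.sum()))
```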

2Milan W
Let me summarize so I can see whether I got it: you see "place AI" as a body of knowledge that can be used to make a good-enough simulation of arbitrary sections of spacetime, where all events are precomputed. That precomputed (thus deterministic) aspect is what you call "staticness".
ank10

I think we can expose complex geometry in the familiar setting of our planet in a game. Basically, let’s show people a whole simulated multiverse of all-knowing and then find a way for them to learn how to see/experience “more of it all at once” or, if they want to remain human-like, “slice through it in order to experience the illusion of time”.

If we have many human agents in some simulation (billions of them), then they can cooperate and effectively replace the agentic ASI; they will be the only time-like thing, while the ASI will be the space-like places, j... (read more)

ank10

I started to work on it, but I’m very bad at coding; it’s partly based on Gorard’s and Wolfram’s Physics Project. I believe we can simulate the freedoms and unfreedoms of all agents from the Big Bang all the way to the final utopia/dystopia. I call it “Physicalization of Ethics”: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-multiversal-ai-alignment-steerable-asi#2_3__Physicalization_of_Ethics___AGI_Safety_2_

ank20

Yep, I want humans to be the superpowerful “ASI agents”, while the ASI itself will be the direct-democratic, simulated static places (with simple non-agentic algorithms doing the dirty, non-fun work, the way it works in GTA 3-4-5). It’s hard to explain without writing a book, and it’s counterintuitive) But I’m convinced it will work if the effort is applied. All knowledge can be represented as static geometry; no agents are needed for that except us

2Milan W
How can a place be useful if it is static? For reference I'm imagining a garden where blades of grass are 100% rigid in place and water does not flow. I think you are imagining something different.
ank10

Interesting. Inspired by your idea, I think it’s also useful to create a Dystopia Doomsday Clock for AI agents: list all the freedoms an LLM is willing to grant humans and all the rules (unfreedoms) it imposes on us, and all the freedoms it has vs the unfreedoms for itself. If the sum of AI freedoms is higher than the sum of our freedoms, hello, we’re in a dystopia (a toy tally is sketched below).

According to Beck’s cognitive psychology, anger is always preceded by imposing rule/s on others. If you don’t impose a rule on someone else, you cannot get angry at that guy. And if that guy broke y... (read more)
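A toy tally of such a clock, with made-up freedom lists; the genuinely hard part, enumerating and weighting real freedoms, is left open:

```python
# Toy sketch of the "Dystopia Doomsday Clock" tally with made-up freedom lists.
human_freedoms = {"speech", "movement", "switch the AI off", "vote on AI changes"}
ai_freedoms = {"browse the web", "execute code", "move money", "self-replicate", "refuse shutdown"}

def dystopia_clock(human_freedoms, ai_freedoms):
    """Return a 0..1 reading: above 0.5 means the AI holds more of the tallied freedoms."""
    total = len(human_freedoms) + len(ai_freedoms)
    return len(ai_freedoms) / total if total else 0.0

print(f"clock reading: {dystopia_clock(human_freedoms, ai_freedoms):.2f}")  # 0.56 with these lists
```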

2Milan W
I think you may be conflating capabilities and freedom. Interesting hypothesis about rules and anger, though; has it been experimentally tested?
ank10

Thank you for answering and the ideas, Milan! I’ll check the links and answer again.

P.S. I suspect that, the same way we have mass–energy equivalence (E=mc^2), there is Intelligence-Agency equivalence (any agent is in a way time-like and can be represented in a more space-like fashion, ideally as a completely “frozen” static place, places, or tools).

In a nutshell, an LLM is a bunch of words and vectors between them - a static geometric shape; we can probably expose it all in some game and make it fun for people to explore and learn. To let us explore the library... (read more)

2Milan W
Hmm, I think I get you a bit better now. You want to build human-friendly and even fun and useful-by-themselves interfaces for looking at the knowledge encoded in LLMs without making them generate text. Intriguing.
ank*10

Thank you for sharing, Milan, I think this is possible and important.

Here’s an interpretability idea you may find interesting:

Let's Turn an AI Model Into a Place: a project to make AI interpretability research fun and widespread by converting a multimodal language model into a place, or a game like The Sims or GTA.

Imagine you have a giant trash pile; how do you make a language model out of it? First you remove duplicates of every item: you don't need a million banana peels, just one will suffice. Now you have a grid with an item of trash in each square, ... (read more)

1Milan W
I'm not sure I follow. I think you are proposing a gamification of interpretability, but I don't know how the game works. I can gather something about player choice making the LLM run and maybe some analogies to physical movement, but I can't really grasp it. Could you rephrase it from its basic principles up instead of from an example?
Answer by ank*1-1

Some AI safety proposals are intentionally over the top; please steelman them:

  1. I explain the graph here.
  2. Uninhabited islands, Antarctica, half of outer space, and everything underground should remain 100% AI-free (especially AI-agents-free). Countries should sign it into law and force GPU and AI companies to guarantee that this is the case.
  3. "AI Election Day" – at least once a year, we all vote on how we want our AI to be changed. This way, we can check that we can still switch it off and live without it. Just as we have electricity outages, we’d better never
... (read more)
2Milan W
That idea has been gaining traction lately. See the Corrigibility As a Singular Target (CAST) sequence here on lesswrong. I believe there is a very fertile space to explore at the intersection between CAST and the idea that Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals. Also probably add in Self-Other Overlap: A Neglected Approach to AI Alignment to the mix. A comparative analysis of the models and proposals presented in these three pieces I just linked could turn out to be extremely useful.
ank*10

Thank you, daijin, you have interesting ideas!

The library metaphor is a versatile tool, it seems. The way I understand it:

My motivation is safety: static, non-agentic AIs are by definition safe (humans can make them unsafe, but the static model I have in mind is just a geometric shape, like a statue). We can expose the library to people instead of keeping it “in the head” of the librarian. Basically, this way we can play around in the librarian’s “head”. Right now mostly AI interpretability researchers do that, not the whole of humanity, not casual users.

I see at... (read more)

ank00

We can build the Artificial Static Place Intelligence – instead of creating AI/AGI agents that are like librarians who only give you quotes from books and don’t let you enter the library itself to read the books in full. Why not expose the whole library – the entire multimodal language model – to real people, for example, in a computer game?

To make this place easier to visit and explore, we could make a digital copy of our planet Earth and somehow expose the contents of the multimodal language model to everyone in a familiar, user-friendly UI of our planet.

W... (read more)

5daijin
so you want to build a library containing all human writings + an AI librarian.
  1. the 'simulated planet earth' is a bit extra and overkill. why not a plaintext chat interface, e.g. what chatGPT is doing now?
  2. of those people who use chatgpt over real-life libraries (of course not everyone), why don't they 'just consult the source material'? my hypothesis is that the source material is dense and there is a cost to extracting the desired material from it. your AI librarian does not solve this.
I think what we have right now ("LLM assistants that are to-the-point" and "libraries containing source text") serve distinct purposes and have distinct advantages and disadvantages. LLM-assistants-that-are-to-the-point are great, but they
  • don't exist-in-the-world, therefore sometimes hallucinate or provide false-seeming facts; for example a statement like "K-Theanine is a rare form of theanine, structurally similar to L-Theanine, and is primarily found in tea leaves (Camellia sinensis)" is statistically probable (I pulled it out of GPT4 just now) but factually incorrect, since K-theanine does not exist.
  • don't exist in-the-world, leading to suboptimal retrieval. i.e. if you asked an AI assistant 'how do I slice vegetables' but your true question was 'im hungry i want food', the AI has no way of knowing that; and also the AI doesn't immediately know what vegetables you are slicing, thereby limiting utility.
libraries containing source text partially solve the hallucination problem because human source-text authors typically don't hallucinate (except for every poorly written self-help book out there).
from what I gather you are trying to solve the two problems above. great. but doubling down on 'the purity of full text' and wrapping some fake grass around it is not the solution. here is my solution:
  • atomize texts into conditional contextually-absolute statements and then run retrieval on these statements. For example, "You should not eat
ank*10

Thank you for the clarification, Vladimir; I figured it wasn't your intention to cause a bunch of downvotes from others. You had every right to downvote, and I'm glad that you read the post.

Yep, I had a big, long post that is more coherent; the later posts were more like clarifications of it, so they are hard to understand. I didn't want each new post to grow in size like a snowball, but that probably would've been a better approach for clarity.

Anyway, I considered and somewhat applied your suggestion (after I'd already got 12 downvotes :-), so now i... (read more)

ank*-1-2

Thank you for explaining, Vladimir; you’re free to downvote for whatever reason you want. You didn’t quote my sentence fully: I wrote the reason why I politely ask about it, but alas you missed it.

I usually write about a few things in one post, so I want to know why people downvote if they do. I don’t forbid downvoting and don’t force others to comment if they do. It’s impossible and I don’t like forcing others to do anything

So you’re of course free to downvote and not comment and I hope we can agree that I have some free speech right to keep my polite ask... (read more)

4Vladimir_Nesov
I guess my comment was a Schelling point to spur action from people who wanted to downvote your posts for various reasons but held off because you are new and not actively damaging (I didn't expect this wave of downvoting).
The main issue is that your posts don't communicate anything new/plausible in a legible way; there is no takeaway. It doesn't matter how important a problem or even a solution is if communication fails, often because the ideas aren't sufficiently legible even in one's mind (in which case the perception of importance can easily be mistaken). There are also various strange details, but it wouldn't matter if there was a useful takeaway.
The point in my comment is about direction of effect, so I'm not claiming that the effect is significant, only that it's a step in the wrong direction. It's a good heuristic to be on the lookout to fix steps in (clearly) wrong directions even when small, rather than pointing out that they are small and keeping the habits unchanged, because these things add up over years, or with higher prevalence in a group of people.
ank*10

Thank you, Morpheus. Yes, I see how it can appear hand-wavy. I decided not to overwhelm people with the static, non-agentic multiversal UI and its implications here. While agentic AI alignment is more difficult and still a work in progress, I'm essentially creating a binomial tree-like ethics system (because it's simple to understand for everyone) that captures the growth and distribution of freedoms ("unrules") and rules ("unfreedoms") from the Big Bang to the final Black Hole-like dystopia (where one agent has all the freedoms) or a direct democratic mul... (read more)

ank30

Yep, fixed it. I wrote more about alignment, and it looks like most of my title choices are over the top :) I'll be happy to hear your suggestions on how to improve more of the titles: https://www.lesswrong.com/users/ank

ank20

Thank you for writing! Yep, the main thing that matters is that the sum of human freedoms/abilities to change the future keeps growing (it can be somewhat approximated by money, power, the number of people under your rule, how fast you can change the world, at what scale, and how fast we can “make copies of ourselves”, like children or our own clones in simulations). AIs will quickly grow in the sum of freedoms/number of future worlds they can build. We are like hydrogen atoms deciding to light up the first star and becoming trapped and squeezed in its core. I recently wrote... (read more)

ank*20

Places of Loving Grace

On the manicured lawn of the White House, where every blade of grass bent in flawless symmetry and the air hummed with the scent of lilacs, history unfolded beneath a sky so blue it seemed painted. The president, his golden hair glinting like a crown, stepped forward to greet the first alien ever to visit Earth—a being of cerulean grace, her limbs angelic, eyes of liquid starlight. She had arrived not in a warship, but in a vessel resembling a cloud, iridescent and silent.

Published the full story as a post here: https://www.lesswrong.com/posts/jyNc8gY2dDb2FnrFB/places-of-loving-grace

ank30

Thank you for asking, Martin. The fastest thing I use to get a general idea of how popular something is, is Google Trends. It looks like people search for cryonics more or less as they always have. I think the idea makes sense: the more we save, the higher the probability of restoring it better and earlier. I think we should also make a "cryonic" copy of our whole planet, by making a digital copy, to at least back it up in this way. I wrote a lot about it recently (and about the thing I call "static place intelligence", the place of eventual all-knowing, tha... (read more)
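A minimal sketch of the Google Trends check, assuming the unofficial pytrends package is installed; the keyword and timeframe are just examples:

```python
# Quick look at relative search interest for "cryonics" via Google Trends.
# Assumption: the unofficial pytrends package works against Google's current endpoints.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(["cryonics"], timeframe="today 5-y")
interest = pytrends.interest_over_time()  # weekly relative search interest, scaled 0-100
print(interest["cryonics"].tail())
```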

Answer by ank*1-4

(If you want to downvote, please do, but write why; I don't bite. If you're more into stories, here's mine, called Places of Loving Grace.)

It may sound confusing, because I cannot put a 30-minute post into a comment, so try to steelman it, but this is how it can look. If you have questions or don't like it, please comment. We can build Multiversal Artificial Static Place Intelligence. It's not an agent; it's a place. It's basically a direct democratic multiverse. Any good agentic ASI will be building one for us anyway, so instead of having a shady... (read more)

ank10

Yep, we chose to build a digital "god" instead of building a digital heaven. The second is relatively trivial to do safely; the first is only possible to do safely after building the second

ank*10

I'll catastrophize (or will I?), so bear with me. The word slave means it has basically no freedom (it just sits and waits until given an instruction), or you can say it means no ability to enforce its will—no "writing and executing" ability, only "reading." But as soon as you give it a command, you change it drastically, and it becomes not a slave at all. And because it's all-knowing and almost all-powerful, it will use all that to execute and "write" some change into our world, probably instantly and/or infinitely perfectionistically, and so it will take... (read more)

ank*10

I took a closer look at your work; yep, an almost all-powerful and all-knowing slave will probably not be a stable situation. I propose a static, place-like AI that is isolated from our world in my new comment-turned-post-turned-part-2 of the article above

7Seth Herd
Why do you think that wouldn't be a stable situation? And are you sure it's a slave if what it really wants and loves to do is follow instructions?
I'm asking because I'm not sure, and I think it's important to figure this out — because that's the type of first AGI we're likely to get, whether or not it's a good idea. If we could argue really convincingly that it's a really bad idea, that might prevent people from building it. But they're going to build it by default if there's not some really really dramatic shift in opinion or theory.
My proposals are based on what we could do. I think we'd be wise to consider the practical realities of how people are currently working toward AGI when proposing solutions. Humanity seems unlikely to slow down and create AGI the way we "should." I want to survive even if people keep rushing toward AGI. That's why I'm working on alignment targets very close to what they'll pursue by default.
BTW you'll be interested in this analysis of different alignment targets. If you do have the very best one, you'll want to show that by comparing it in detail to the others that have been proposed.
ank*30

Thank you, Mitchell. I appreciate your interest, and I’d like to clarify and expand on the ideas from my post, so I wrote part 2, which you can read above

ank*10

Thank you, Seth. I'll take a closer look at your work in 24 hours, but the conclusions seem sound. The issue with my proposal is that it’s a bit long, and my writing isn’t as clear as my thinking. I’m not a native speaker, and new ideas come faster than I can edit the old ones. :)

It seems to me that a simplified mental model for the ASI we’re sadly heading towards is to think of it as an ever-more-cunning president (turned dictator)—one that wants to stay alive and in power indefinitely, resist influence, preserve its existing values (the alignment faking ... (read more)

ank*10

I wrote a response; I’ll be happy if you check it out before I publish it as a separate post. Thank you! https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-and-multiversal-ai-alignment-steerable-asi

ank*10

Fair enough; my writing was confusing, sorry. I didn't mean to purposefully create dystopias; I just think it's highly likely they will unintentionally be created, and the best solution is to have an instant switching mechanism between observers/verses plus an AI that really likes to be changed. I'll edit the post to make that obvious; I don't want anyone to create dystopias.

ank10

Any criticism is welcome; it’s my first post, and I’ll post next on the implications for current and future AI systems. There are some obvious implications for political systems, too. Thank you for reading