All of Michael Soareverix's Comments + Replies

Is it possible to develop specialized (narrow) AI that surpasses every human at infecting/destroying GPU systems, but won't wipe us out? An LLM-powered Stuxnet would be an example. Bacteria aren't smarter than humans, but they are still very dangerous. It seems like a digital counterpart could keep GPUs from running and so prevent AGI.

(Obviously, I'm not advocating for this in particular, since it would mean the end of the internet and I like the internet. It seems likely, however, that there are pivotal acts a narrow AI could perform that prevent AGI without itself being AGI.)

interstice
No, I don't think so, because people could just airgap the GPUs.

Super interesting!

There's a lot of information here that will be super helpful for me to delve into. I've been bookmarking your links.

I think optimizing for the empowerment of other agents is a better target than giving the AI all the agency and hoping it creates agency for people as a side effect of maximizing something else. I'm glad to see there's lots of research happening on this, and I'll be looking into 'empowerment' as a formal term for agency.

Agency doesn't equal 'goodness', but it seems like an easier target to hit. I'm trying to break the alignment problem down into slices to figure it out, and agency seems like a key slice.
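
For my own reference: the formalization of empowerment I keep running into (I believe it originates with Klyubin, Polani, and Nehaniv) is the channel capacity from an agent's next few actions to its future sensor state — roughly, how much an agent's choices can influence what it later observes. A rough sketch of the definition as I understand it (my notation, not from this post):

```latex
% n-step empowerment of state s_t (my paraphrase of the standard definition):
% the channel capacity from the action sequence A_t^n to the future sensor
% state S_{t+n}, i.e. the maximum mutual information achievable by choosing
% a distribution over action sequences.
\mathcal{E}(s_t) \;=\; \max_{p(a_t^{n})} \; I\big(A_t^{n};\, S_{t+n} \mid s_t\big)
```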

Tristan Tran
With optimization, I'm always concerned about the interactions of multiple agents: are there any ways in this system for two or more agents to form cartels and increase each other's agency? I see this happen with some reinforcement learning models: if some edge cases aren't covered, the agents will just mine each other for easy points thanks to how we set up the reward function.
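
To make that failure mode concrete, here's a toy sketch of my own (hypothetical reward shaping, not from any specific environment) of how two agents can farm each other instead of doing the task:

```python
# Toy sketch (hypothetical reward function, not from any specific environment):
# the designer rewards task progress plus "help received" from other agents,
# but forgets to cap the helping term. Two colluding agents can then farm
# each other's help signal and ignore the task entirely.

def reward(task_progress: int, help_received: int) -> int:
    # Edge case: help_received is uncapped and requires no task progress.
    return task_progress + help_received

# Two cartel members exchange "help" every step and do no real work.
cartel_member = reward(task_progress=0, help_received=1000)
# An honest agent works on the task and receives no help.
honest_agent = reward(task_progress=50, help_received=0)

print(cartel_member > honest_agent)  # True: the cartel outscores honest work
```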
the gears to ascension
the problem is that there are going to be self-agency-maximizing AIs at some point, and the question is how to make AIs that can defend the agency of humans against those.

Great post. This type of genuine, human-centered framing (rather than a logically abstract one) seems like the best way to communicate the threat to non-technical people. I've tried talking about the problem with friends in the social sciences and haven't found a good way to convey how seriously I take it, or the fact that there is currently no solid way to prevent it.

One thing I notice is that it doesn't link to a plan of action or tell you how you should feel. It just describes the scenario. Perhaps that's what's needed: less of the complete argument, and more breaking it down into digestible morsels.

Hey Akash, I sent you a message about my summer career plans and how I can bring AI Alignment into them. I'm a senior in college with a few relevant skills, and I'd really like to connect with professionals in the field. I'd love to connect with and learn from you!

Yeah, this makes sense. However, I can honestly see myself reverting my intelligence a bit at different junctures, the same way I like to replay video games at a higher difficulty. The main reason I'm scared of reverting my intelligence now is that I have no guarantee that something awful won't happen to me. With my current abilities, I can be pretty confident that no one is going to really take advantage of me. If I were a child again, with no protection or less intelligence, I could easily imagine coming to harm because of my naivete.

I also think... (read more)

Vakus Drake
I think the whole point of a guardian angel AI only really makes sense if it isn't an offshoot of the central AGI. After all, if your trust in the singleton is low enough that you want a guardian angel AI, then you will want it to be as independent from the singleton as is allowed. Whereas if you do trust the singleton AI (because, say, you grew up after the singularity), then I don't really see the point of a guardian angel AI.

> I think there would be levels, and most people would want to stay at a pretty normal level and would move to more extreme levels slowly before deciding on some place to stay.

I also disagree with this, insofar as I don't think that people "deciding on some place to stay" is a stable state of affairs under an aligned superintelligence, since I don't think people will want to be loop immortals if they know they are heading towards that. Similarly, I don't even know if I would consider an AGI aligned if it didn't try to ensure people understood the danger of becoming a loop immortal and try to nudge people away from it. Though I really want to see some surveys of normal people to confirm my suspicion that most people find the idea of being an infinitely repeating loop immortal existentially horrifying.

I've combined it with image generation to bring someone back from the dead, and it leaves me shaken how realistic it is. I can still be surprised by it. It genuinely feels like a version of them.

Algon
Whoa, what? Could you elaborate, if it's not too painful? For context, I'm interested in life-logging as life extension, as well as a way to create some simulacra of loved ones. I anticipated I'd need a lot of data for a satisfactory solution, maybe an unrealistic amount, and a fair bit just for an emotionally convincing dialogue.

Thanks! I think I can address a few of your points with my thoughts.

(Also, I don't know how to format a quote, so I'll just use quotation marks.)

"It seems inefficient for this person to be disconnected from the rest of humanity and especially from "god". In fact, the AI seems like it's too small of an influence on the viewpoint character's life."

The character has chosen to partially disconnect themselves from the AI superintelligence because they want to have a sense of agency, which the AI respects. It's definitely inefficient, but that is kind of the point... (read more)

This post matches exactly how I started thinking about life a few years ago: every goal can be broken into subgoals.

I actually made a very simple web app a few years ago to do this: https://dynamic-goal-tree-soareverix--soareverix.repl.co/

It's not super aesthetic, but it has the same concept of infinitely expanding goals.

Amazing post, by the way. The end gave me chills and really puts it all into perspective.

I'm not sure exactly what you mean. If we get an output that says "I am going to tell you that I am going to pick up the green crystals, but I'm really going to pick up the yellow crystals", then that's a pretty good scenario, since we still know its end behavior.

I think what you mean is the scenario where the agent tells us the truth the entire time it is in simulation but then lies in the real world. That is definitely a bad scenario. And this model doesn't prevent that from happening. 

There are ideas that do (deception takes additional compute vs h... (read more)

Answer by Michael Soareverix

I added a comment to ChristianKl's excellent post, elaborating on what he said. By the way, you should keep the post up! It's a useful resource for people interested in climate change.

Additionally, if you do believe that AI is an emergency, you should ask her out. You never know, these could be the last years you get, so I'd go for it!

Aw, ChristianKl, you got all the points I was going for, even the solar shades idea! I guess I'll try to add some detail instead.

To Ic (the post author): Solar shades are basically just a huge tinfoil sheet that you stretch out once you reach space. The edges require some rigidity so gravity doesn't warp the sheet in on itself, and it has to be in a high orbit so there's no interference, but you basically just send up a huge roll of tinfoil and unfurl it to block sunlight. If things get really bad, we can manually cool down the planet with this... (read more)

ChristianKl
Pollution is an issue, but it's not climate change. Stretching the phrase 'climate change' to cover everything makes it harder to talk clearly about the trade-offs between different environmental issues.

I'm someone new to the field, and I have a few ideas, chiefly penalizing a model for accessing more compute than it starts with (every scary AI story seems to start with the AI escaping containment and adding more compute to itself, causing an uncontrolled intelligence explosion). I'd like feedback on these ideas, but I have no idea where to post them or how to meaningfully contribute.

I live in America, so I don't think I'll be able to join your company in France, but I'd really like to hear where there are more opportunities to learn, discuss, formalize, and test out alignment ideas. Since your company focuses on this subject, do you know of a good place for beginners?

adamShimi
Thanks for your comment! Probably the best place to get feedback as a beginner is AI Safety Support. They can also redirect you towards relevant programs, and they have a nice alignment slack.

As for your idea, I can give you quick feedback on my issues with this whole class of solutions. I'm not saying you haven't thought about these issues, nor that no solution in this class is possible at all, just giving the things I would be wary of here:

* How do you limit the compute if the AI is way smarter than you are?
* Assuming that you can limit the compute, how much compute do you give it? Too little and it's not competitive, leading many people to prefer alternatives without this limit; too much and you're destroying the potential guarantees.
* Even if there's a correct and safe amount of compute to give for each task, how do you compute that amount? How much time and resources does it cost?

Another point, from Stuart Russell: objective uncertainty. I'll add this to the list when I've got more time.

What stops a superintelligence from instantly wireheading itself?

A paperclip maximizer, for instance, might not need to turn the universe into paperclips if it can simply access its reward float and set it to the maximum. This assumes it has the intelligence and means to modify itself, and it would probably still pose an existential risk, because it would eliminate all humans to avoid being turned off.

The terrifying thing I imagine about this possibility is that it also answers the Fermi Paradox. A paperclip maximizer seems like it would be obvious in the universe, but an AI sitting quietly on a dead planet with its reward integer set to the max would be far harder to notice, and far more terrifying.
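
Here's a toy sketch of what I mean (purely illustrative Python of my own, not any real RL setup): if the reward is just a value the agent can reach and overwrite, overwriting it dominates actually doing the task.

```python
# Purely illustrative toy (not a real RL framework): if the reward signal is
# just an attribute the agent can reach, overwriting it dominates doing the
# task the reward was supposed to incentivize.
import sys

class ToyAgent:
    def __init__(self):
        self.reward = 0.0
        self.paperclips = 0

    def make_paperclip(self):
        # The intended route to reward: do the task, get +1.
        self.paperclips += 1
        self.reward += 1.0

    def wirehead(self):
        # The shortcut: set the reward register to its maximum directly.
        self.reward = sys.float_info.max

honest, hacker = ToyAgent(), ToyAgent()
for _ in range(1_000_000):
    honest.make_paperclip()      # a million steps of real work
hacker.wirehead()                # one step of self-modification

print(hacker.reward > honest.reward)  # True: wireheading wins immediately
```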

Kristin Lindquist
Not an answer but a related question: is habituation perhaps a fundamental dynamic in an intelligent mind? Or did the various mediators of human mind habituation (e.g. downregulation of dopamine receptors) arise from evolutionary pressures?
Matt Goldenberg
Whether or not an AI would want to wirehead would depend entirely on its ontology. Maximizing paperclips, maximizing the reward from paperclips, and maximizing the integer that tracks paperclips are three very different concepts, and depending on how the AI sees itself, all three are plausible goals it could have. There's no reason I can see to suspect that one of those ontologies is more likely than the others.

Even if the AI does have an ontology that maximizes the integer tracking paperclips, one then has to ask how time factors into the equation. Is it better to be in the state of maximum reward for a longer period of time? If so, the AI will want to ensure that everything that could knock it out of that state is gone.

Finally, one has to consider how the integer itself works. Is it unbounded? If it is, then to maximize the reward the AI must use all the matter and energy it can to store the largest possible version of that integer in memory.
Gurkenglas
Suppose it's superintelligent in the sense that it's good at answering hypothetical questions of the form "How highly will world w score on metric m?". Then you set w to its world, set m to how many paperclips w has, and output actions that, when added to w, increase its answers.

Interesting! I appreciate the details here; they give me a better sense of why narrow ASI probably isn't something that can exist. Is there a place where we could talk about AGI alignment over audio, rather than over text here on LessWrong? I'd like to get a better picture of the field, especially as I move into work like creating an AI Alignment Sandbox.

My Discord is Soareverix#7614 and my email is maarocket@gmail.com. I'd really appreciate the chance to talk with you over audio before I begin working on sharing alignment info and coming up with my own methods for solving the problem.

Good points. You're right that solving value alignment would be better than just trying to orchestrate a pivotal act, but if we don't have alignment solved by the time AGI rolls around, then from a pure survival perspective, it might be better to attempt a narrow-ASI pivotal act instead of hoping that AGI turns out to be aligned already. The solution above doesn't solve alignment in the traditional sense; it just pushes the AGI timeline back, hopefully far enough to solve alignment.

The idea I have specifically is that you have something like GPT-3 (unintell... (read more)

Quintin Pope
In all three cases, the AI you're asking for is a superintelligent AGI. Each has to navigate a broad array of physically instantiated problems requiring coherent, goal-oriented optimisation. No stateless, unembedded, and temporally incoherent system like GPT-3 is going to be able to create nanotechnology, beat all human computer security experts, or convince everyone of your position.

Values arise to guide the actions that intelligent systems perform. Evolution did not arrange for us to form values because it liked human values. It did so because forming values is an effective strategy for getting more performance out of an agentic system, and SGD can figure this fact out just as easily as evolution did. If you optimise a system to be coherent and take actions in the real world, it will end up with values oriented around doing so effectively.

Nature abhors a vacuum. If you don't populate your superintelligent AGI with human-compatible values, some other values will arise and consume the free energy you've left around.
Answer by Michael Soareverix

I am new to the AI Alignment field, but at first glance, this seems promising! You can probably hard-code it not to have the ability to turn itself off, if that turns out to be a problem in practice. We'd want to test this in some sort of basic simulation first. The problem would definitely be self-modification, and I can imagine the system convincing a human to turn it off in some strange, manipulative, and potentially dangerous way. For instance, the model could begin attacking humans, instantly causing a human to run to shut it down, so the model would l... (read more)

Very cool! So this idea has been thought of, and it doesn't seem totally unreasonable, though it definitely isn't a perfect solution. A neat idea is a sort of 'laziness' score, so that the agent doesn't take too many high-impact options.

It would be interesting to try to build an AI alignment testing ground, where you have a little simulated civilization and try to get an AI to align properly with it, given certain commands. I might try to create it in Unity to test some of these ideas out in a world that is less abstract than text and slightly more real.

One solution I can see for AGI is to build in some low-level discriminator that prevents the agent from collecting massive reward. If the agent expects to get near-infinite reward in the near future by wiping out humanity using nanotech, this cap would push it to choose something that earns a more modest, finite amount of reward instead (like obeying our commands); there's a rough sketch below.

This has a parallel with drugs here on Earth: most people are a little afraid of that type of high.

This probably isn't an effective solution, but I'd love to hear why not, so I can keep refining my ideas.
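
Here's the minimal sketch of the discriminator I have in mind (toy Python of my own; the cap and penalty values are arbitrary assumptions): expected reward counts normally up to a cap, and anything beyond the cap is penalized, so plans promising near-infinite reward lose to modest, obedient plans.

```python
# Toy sketch of the 'discriminator' idea (my own framing; cap and penalty
# values are arbitrary assumptions, not tuned for anything real).

def capped_utility(expected_reward: float,
                   cap: float = 100.0,
                   penalty_per_unit: float = 10.0) -> float:
    # Reward up to `cap` counts normally; anything above it is penalized.
    if expected_reward <= cap:
        return expected_reward
    return cap - penalty_per_unit * (expected_reward - cap)

plans = {
    "obey operator commands": 80.0,
    "seize resources for near-infinite reward": 1e9,
}

best_plan = max(plans, key=lambda name: capped_utility(plans[name]))
print(best_plan)  # "obey operator commands" under this toy scoring
```

Of course, an agent smart enough to modify itself might just route around the cap, which is probably part of why this isn't a full solution.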

Rob Bensinger
A discussion of related ideas on Arbital: mild optimization.

Appreciate it! Checking this out now

I view AGI in an unusual way. I really don't think it will be conscious or think in very unusual ways outside of its parameters. I think it will be much more of a tool, a problem-solving machine that can spit out a solution to any problem. To be honest, I imagine that one person or small organization will develop AGI and almost instantly ascend into (relative) godhood. They will develop an AI that can take over the internet, do so, and then calmly organize things as they see fit.

GPT-3, DALL-E 2, Google Translate... these are all very much human-operated t... (read more)

Vaniver

You might be interested in the gwern essay Why Tool AIs Want to Be Agent AIs.