Super interesting!
There's a lot of information here that will be super helpful for me to delve into. I've been bookmarking your links.
I think optimizing for the empowerment of other agents is a better target than giving the AI all the agency and hoping that it creates agency for people as a side effect of maximizing something else. I'm glad to see there's lots of research happening on this, and I'll be checking out 'empowerment' as an agency term.
Agency doesn't equal 'goodness', but it seems like an easier target to hit. I'm trying to break down the alignment problem into slices to figure it out and agency seems like a key slice.
Great post. This kind of genuine, human-centered writing (rather than abstract logical argument) seems like the best way to communicate the threat to non-technical people. I've tried talking about the problem with friends in the social sciences and haven't found a good way to convey how seriously I take it, or the fact that there is currently no rigorous safeguard against it.
One thing I notice is that it doesn't link to a plan of action, or tell you how you should feel. It just describes the scenario. Perhaps that's what's needed: less of the complete argument, and more breaking it down into digestible morsels.
Hey Akash, I sent you a message about my summer career plans and how I can bring AI Alignment into them. I'm a senior in college with a few relevant skills, and I'd really like to connect with some professionals in the field. I'd love to connect with or learn from you!
Yeah, this makes sense. However, I can honestly see myself reverting my intelligence a bit at different junctures, the same way I like to replay video games at greater difficulty. The main reason I am scared of reverting my intelligence now is that I have no guarantee of security that something awful won't happen to me. With my current ability, I can be pretty confident that no one is going to really take advantage of me. If I were a child again, with no protection or less intelligence, I can easily imagine coming to harm because of my naivete.
I also think...
I've combined it with image generation to bring someone back from the dead, and it leaves me shaken by how realistic it is. I can be surprised. It genuinely feels like a version of them.
Thanks! I think I can address a few of your points with my thoughts.
(Also, I don't know how to format a quote so I'll just use quotation marks)
"It seems inefficient for this person to be disconnected from the rest of humanity and especially from "god". In fact, the AI seems like it's too small of an influence on the viewpoint character's life."
The character has chosen to partially disconnect themselves from the AI superintelligence because they want to have a sense of agency, which the AI respects. It's definitely inefficient, but that is kind of the point...
This post describes exactly how I started thinking about life a few years ago. Every goal can be broken into subgoals.
I actually made a very simple web app a few years ago to do this: https://dynamic-goal-tree-soareverix--soareverix.repl.co/
It's not super aesthetic, but it's built around the same concept of infinitely expanding goals.
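For anyone curious, the core of it is tiny. Here's a minimal sketch of the idea in Python (hypothetical names, not the actual code behind that app):

```python
# Minimal sketch of an infinitely expandable goal tree (illustrative only,
# not the actual code behind the web app linked above).

class Goal:
    def __init__(self, description):
        self.description = description
        self.subgoals = []          # each subgoal is itself a Goal

    def add_subgoal(self, description):
        sub = Goal(description)
        self.subgoals.append(sub)
        return sub                  # returned so it can be expanded further

    def print_tree(self, depth=0):
        print("  " * depth + "- " + self.description)
        for sub in self.subgoals:
            sub.print_tree(depth + 1)

# Usage: every goal can keep being broken into subgoals, indefinitely.
root = Goal("Live a good life")
career = root.add_subgoal("Build a meaningful career")
career.add_subgoal("Learn AI alignment basics")
root.print_tree()
```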
Amazing post, by the way. The end gave me chills and really puts it all into perspective.
I'm not sure exactly what you mean. If we get an output that says "I am going to tell you that I am going to pick up the green crystals, but I'm really going to pick up the yellow crystals", then that's a pretty good scenario, since we still know its end behavior.
I think what you mean is the scenario where the agent tells us the truth the entire time it is in simulation but then lies in the real world. That is definitely a bad scenario. And this model doesn't prevent that from happening.
There are ideas that do (deception takes additional compute vs h...
I added a comment to ChristianKI's excellent post elaborating on what he said. By the way, you should keep the post up! It's a useful resource for people interested in climate change.
Additionally, if you do believe that AI is an emergency, you should ask her out. You never know, these could be the last years you get, so I'd go for it!
Aw, ChristianKI, you got all the points I was going for, even the solar shades idea! I guess I'll try to add some detail instead.
To Ic (the post author): Solar shades are basically just a huge tinfoil sheet that you stretch out once you reach space. The edges require some stability so gravity doesn't warp the tinfoil in on itself, and it has to be in a high orbit so there's no interference, but you basically just send up a huge roll of tinfoil and extend it to manually block sunlight. If things get really bad, we can manually cool down the planet with this...
I'm someone new to the field, and I have a few ideas on it, namely penalizing a model for accessing more compute than it starts with (every scary AI story seems to start with the AI escaping containment and adding more compute to itself, causing an uncontrolled intelligence explosion). I'd like feedback on the ideas, but I have no idea where to post them or how to meaningfully contribute.
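To make that first idea concrete, here's a rough sketch of the kind of reward shaping I have in mind (the names, baseline, and penalty weight are all placeholders I made up for illustration):

```python
# Rough sketch: shape the reward so that using more compute than the agent
# started with is penalized. All names and numbers are illustrative.

BASELINE_COMPUTE = 1.0      # compute budget the agent starts with (arbitrary units)
PENALTY_WEIGHT = 10.0       # how harshly extra compute is punished

def shaped_reward(task_reward, compute_used):
    """Task reward minus a penalty for any compute beyond the baseline."""
    excess = max(0.0, compute_used - BASELINE_COMPUTE)
    return task_reward - PENALTY_WEIGHT * excess

# An agent that grabs extra compute gets less reward than one that doesn't,
# even if the extra compute lets it score slightly higher on the task.
print(shaped_reward(task_reward=5.0, compute_used=1.0))   # 5.0
print(shaped_reward(task_reward=6.0, compute_used=2.0))   # -4.0
```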
I live in America, so I don't think I'll be able to join the company you have in France, but I'd really like to hear where there are more opportunities to learn, discuss, formalize, and test out alignment ideas. Since you're a company focused on this subject, do you know of a good place for beginners to start?
Another point by Stuart Russell: objective uncertainty. I'll add this to the list when I've got more time.
What stops a superintelligence from instantly wireheading itself?
A paperclip maximizer, for instance, might not need to turn the universe into paperclips if it can simply access its reward float and set it to the maximum. This is assuming that it has the intelligence and means to modify itself, and it probably still poses an existential risk because it would eliminate all humans to avoid being turned off.
The terrifying thing I imagine about this possibility is that it would also answer the Fermi Paradox. A paperclip maximizer seems like it would be obvious in the universe, but an AI sitting quietly on a dead planet with its reward set to the maximum is far quieter and more terrifying.
Interesting! I appreciate the details here; they give me a better sense of why narrow ASI is probably not something that can exist. Is there somewhere we could talk about AGI alignment over audio, rather than by text here on LessWrong? I'd like to get a better picture of the field, especially as I move into work like creating an AI Alignment Sandbox.
My Discord is Soareverix#7614 and my email is maarocket@gmail.com. I'd really appreciate the chance to talk with you over audio before I begin working on sharing alignment info and coming up with my own methods for solving the problem.
Good points. Your point that solving value alignment is better than just trying to orchestrate a pivotal act is true, but if we don't have alignment solved by the time AGI rolls around, then from a pure survival perspective, it might be better to attempt a narrow-ASI pivotal act than to hope that AGI turns out to be aligned already. The solution above doesn't solve alignment in the traditional sense; it just pushes the AGI timeline back, hopefully long enough to solve alignment.
The idea I have specifically is that you have something like GPT-3 (unintell...
I am new to the AI Alignment field, but at first glance, this seems promising! You can probably hard-code it not to have the ability to turn itself off, if that turns out to be a problem in practice. We'd want to test this in some sort of basic simulation first. The real problem would be self-modification, and I can imagine the system convincing a human to turn it off in some strange, manipulative, and potentially dangerous way. For instance, the model could begin attacking humans, instantly causing a human to run to shut it down, so the model would l...
Very cool! So this idea has been thought of, and it doesn't seem totally unreasonable, though it definitely isn't a perfect solution. A neat addition would be a sort of 'laziness' score, so the agent doesn't take too many high-impact actions.
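Here's a hedged sketch of what I mean by a 'laziness' score, assuming we had some rough measure of an action's impact (the impact estimates and the weight below are placeholders; actually measuring impact is the hard, unsolved part):

```python
# Sketch of a 'laziness' penalty: subtract a cost proportional to how
# high-impact an action is, so the agent prefers low-impact ways of
# getting reward. All numbers are illustrative.

LAZINESS_WEIGHT = 2.0   # placeholder; how strongly impact is discouraged

def lazy_score(expected_reward, estimated_impact):
    """Expected reward minus a penalty that grows with the action's impact."""
    return expected_reward - LAZINESS_WEIGHT * estimated_impact

actions = {
    "small tidy-up": (1.0, 0.1),                      # (reward, impact)
    "rearrange the whole environment": (3.0, 2.0),
}
best = max(actions, key=lambda a: lazy_score(*actions[a]))
print(best)   # the low-impact option wins despite lower raw reward
```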
It would be interesting to try to build an AI alignment testing ground, where you have a little simulated civilization and try to use AI to align properly with it, given certain commands. I might try to create it in Unity to test some of these ideas out in the (less abstract than text and slightly more real) world.
One solution I can see for AGI is to build in some low-level discriminator that prevents the agent from collecting massive reward. If the agent is expecting to get near-infinite reward in the near future by wiping out humanity using nanotech, the discriminator would push it toward an option that earns a more finite amount of reward instead (like obeying our commands).
This has a parallel with drugs here on Earth. Most people are a little afraid of that type of high.
This probably isn't an effective solution, but I'd love to hear why so I can keep refining my ideas.
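To make it concrete, here's a toy sketch of the kind of cap I'm imagining (the tanh squashing and the cap value are arbitrary choices for illustration, not a safety guarantee):

```python
import math

# Toy illustration of the 'discriminator' idea: squash the reward the agent
# can expect, so near-infinite payoffs stop dominating its decisions.
# The cap value is arbitrary; this is a sketch, not a real safety mechanism.

REWARD_CAP = 100.0

def capped_reward(raw_reward):
    """Saturates smoothly at REWARD_CAP, so 'astronomical' rewards barely
    beat ordinary ones."""
    return REWARD_CAP * math.tanh(raw_reward / REWARD_CAP)

print(capped_reward(10.0))   # ~10.0  (ordinary rewards are mostly unchanged)
print(capped_reward(1e9))    # ~100.0 (the 'near-infinite' plan saturates at the cap)
```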
Appreciate it! Checking this out now
I view AGI in an unusual way. I really don't think it will be conscious or think in very unusual ways outside of its parameters. I think it will be much more of a tool, a problem-solving machine that can spit out a solution to any problem. To be honest, I imagine that one person or small organization will develop AGI and almost instantly ascend into (relative) godhood. They will develop an AI that can take over the internet, do so, and then calmly organize things as they see fit.
GPT-3, DALL-E 2, Google Translate... these are all very much human-operated t...
Is it possible to develop specialized (narrow) AI that surpasses every human at infecting/destroying GPU systems, but won't wipe us out? An LLM-powered Stuxnet would be an example. Bacteria aren't smarter than humans, but they're still very dangerous. It seems like a digital counterpart could knock out GPUs and so prevent AGI.
(Obviously, I'm not advocating for this in particular, since it would mean the end of the internet and I like the internet. It seems likely, however, that there are pivotal acts a narrow AI could carry out that would prevent AGI without itself being AGI.)