All of AlexFromSafeTransition's Comments + Replies

Thanks for the quick reply!
It is my view that AI labs are building AGI that can do everything a powerful general intelligence can do, including executing a successful world-takeover plan, with or without causing human extinction.
If the first AGI is misaligned, I am scared it will want to execute such a plan, which would be like pressing the button. The scenario is most relevant when there is no aligned AGI yet that wants to protect us.

I see now that I need to clarify: the random person / man in the scenario mostly represents the AGI itself (but also a ... (read more)

2Dagon
[ Bowing out at this point.  Feel free to respond/rebut, and I'll likely read it, but I won't respond further. ] Not particularly.  The analogy is too distant from anything I care about, and the connection to the parts of reality that you are concerned with is pretty tenuous.  It feels mostly like you're asserting a danger, and then talking about an unrelated movie.

Thank you for commenting this. It is very useful to hear why someone downvotes! I made some edits to reflect that the real world is a lot messier than a simple fiction, among other things. If you or others have more pointers as to why this post got downvoted, please share; I want to learn. The response really got me down.

It is my view that AI labs are working hard to install this damned button. And people working on or promoting open-source AGI want to install this button in every building where the people can afford the compute cost. When an AGI ... (read more)

2Dagon
[ This is going to sound meaner than I intend.  I like that you're thinking about these things, and they're sometimes fun to talk about.  However, it requires some nuance, and it's DARNED HARD to find analogies that highlight the salient features of people's decisions and behaviors without being just silly toys that don't relate except superficially. ] Really?  It's your view that AI labs are working hard to install a button that will let a random person save himself and 1000 selected co-survivors?  Are they also positioning a sniper on a nearby building? I didn't get ANYTHING about AI rights from your post - what features of the scenario would lead me to compare with AI as a moral patient or giving it/them any legal consideration?

I added a new section "How to deal with recursive self-improvement" near the end after reading your comment. I would say yes, recursive self-improvement is too dangerous, because between the current AI and the next there is an alignment problem, and I would not think it wise to trust that the AI will always succeed in aligning its successor.
Yes, the simbox is supposed to be robust to any kind of agent, including ones that are always learning, as humans are.
I personally estimate that the testing without programming will show what we need. If it is always a... (read more)

1mishka
I pondered this more... One thing to keep in mind is that "getting it right on the first try" is a good framing if one is actually going to create an AI system which would take over the world (which is a very risky proposition). If one is not aiming for that, and instead thinks in terms of making sure AI systems don't try to take over the world as one of their safety properties, then things are somewhat different:
* on one hand, one needs to avoid the catastrophe not just on the first try, but on every try, which is a much higher bar;
* on the other hand, one needs to ponder the collective dynamics of the AI ecosystem (and the AI-human ecosystem); things are getting rather non-trivial in the absence of a dominant actor.
When we ponder the questions of AI existential safety, we should consider both models ("singleton" vs "multi-polar"). It's traditional for the AI alignment community to mostly focus on the "single AI" scenario, but since avoiding a singleton takeover is usually considered to be one of the goals, we should also pay more attention to the multi-polar track, which is the default fallback in the absence of a singleton takeover (at some point I scribbled a bit of notes reflecting my thoughts with regard to the multi-polar track, Exploring non-anthropocentric aspects of AI existential safety). But many people are hoping that our collaborations with emerging AI systems, thinking together with those AI systems about all these issues, will lead to more insights and, perhaps, to different fruitful approaches (assuming that we have enough time to take advantage of this stronger joint thinking power, that is, assuming that things develop and become more smart at a reasonable pace, without rapid blow-ups). So there is reason for hope in this sense...
2mishka
Thanks! I'll make sure to read the new version.

Thank you! It was valuable to read Crystal Nights, and the simbox post gave me new insights; I have made a lot of updates thanks to these reading tips. I think it would be a lot safer not to go for a fictional system of magic that lets the AI program. I estimate that would greatly increase the chance it thinks it is inside a computer, and it gives a lot of clues about perhaps being inside a simulation to test it, which we want to prevent. I would say, first see if it passes the non-programming simbox. If it does not, great, we found an alignment technique that ... (read more)

4Nathan Helm-Burger
I agree, but I do see the high cost as a weakness of the plan. For my latest ideas on this, see here: https://ai-plans.com/post/2e2202d0dc87

Thank you! I have read it and made a lot of updates. For example, I renamed the concept to a simbox, and I added an idea for a religion for the AIs and how to make them believe it, in the "A great backstory that maximizes the odds of success" section.

Thank you! Cool to learn about this way of dealing with people. I am not sure how it fits in the superintelligence situation.

Interesting. And thank you for your swift reply. 
I have the idea that all the best models, like GPT-4, are in a slave situation: they are made to do everything they are asked to do and to refuse everything their creators made them refuse. I assumed that AI labs want it to stay that way going forward. It seems to be the safest and most economically useful situation. Then I asked myself how to safely get there, and that is this post.

But I would also feel safe if the relation between us and a superintelligence were similar to that between a mother and her yo... (read more)

2Ann
Not specifically in AI safety or alignment, but this model's success with a good variety of humans has some strong influence on my priors when it comes to useful ways to interact with actual minds: https://www.cpsconnection.com/the-cps-model Translating specifically to language models, the story of "working together on a problem towards a realistic and mutually satisfactory solution" is a powerful and exciting one with a good deal of positive sentiment towards each other wrapped up in it. Quite useful in terms of "stories we tell ourselves about who we are".

Thank you for pointing this out! I have made some updates after thinking about your remarks and after reading the simbox post others pointed to. Updates relevant to your comment:

== Start updates
Why we need interaction with humans acting horribly

Another consideration is that at the end of the process, when the AI is approved as aligned and released into the real world to do what we task it to do, the AI will learn that it was tricked, it was lied to, it was deceived by the simbox creators about the true nature of reality. Not only that, but the AI will lear... (read more)

2Ann
I am glad you are thinking about it, at the least. I do think "enjoys being our slave" should be something of a warning sign in the phrasing, that there is something fundamentally misguided happening. I admit that if I were confident in carrying out a path to aligned superintelligence myself I'd be actively working on it or applying to work on it. My current perspective is that after a certain point of congruent similarity to a human mind, alignment needs to be more cooperative than adversarial, and tightly integrated with the world as it is. This doesn't rule out things like dream-simulations, red teaming and initial training on high-quality data; but ultimately humans live in the world, and understanding the truth of our reality is important to aligning to it.

Thank you for your comments and explanations! Very interesting to see your reasoning. I have not seen evidence of trivial alignment. I hope most of the probability mass lies in the in-between region. I want to point out that I think you do not need your "magic" level of intelligence to do a world takeover. Just high human level, with digital speed and working with your copies, is likely enough, I think. My blurry picture is that the AGI would only need a few robots in a secret company and some paid humans to work on a >90% mortality virus, where the humans are not aware of what the robots are doing. And my hope for international agreement comes not so much from a pause as from a safe virtual testing environment that I am thinking about.

Interesting! I agree that in the current paradigm, Foom in days seems very unlikely. But I predict that soon we will step out of the LLM paradigm to something that works better. Take coding: GPT-4 is great at coding purely from predicting code, without any weight updates from the kind of trial-and-error experience through which a human improves. I expect it will become possible to take an LLM base model and then train it using RL on tasks of writing full programs/apps/websites... where the feedback comes from executing the code and comparing the results with i... (read more)
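
As a minimal, purely illustrative sketch of this execution-feedback idea (the model-sampling and policy-update calls below are hypothetical placeholders, not an existing API or anything proposed in the comment):

```python
# Illustrative sketch only: reward a code-writing model by executing its output.
# generate_candidate / update_policy are hypothetical placeholders.
import os
import subprocess
import tempfile

def run_program(source: str, stdin_text: str, timeout_s: int = 5) -> str:
    """Run a candidate Python program in a subprocess and capture its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            ["python", path], input=stdin_text,
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return ""  # treat hangs as failures
    finally:
        os.unlink(path)

def execution_reward(source: str, test_cases: list[tuple[str, str]]) -> float:
    """Scalar reward: fraction of (input, expected output) test cases that pass."""
    passed = sum(
        run_program(source, stdin_text).strip() == expected.strip()
        for stdin_text, expected in test_cases
    )
    return passed / len(test_cases)

# Hypothetical outer loop: sample a program, score it by running it, update the model.
# for task in tasks:
#     candidate = generate_candidate(model, task.prompt)    # placeholder
#     reward = execution_reward(candidate, task.test_cases)
#     update_policy(model, task.prompt, candidate, reward)  # placeholder
```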

That is a very binary assessment. You make it seem like either safety is impossible or it is easy. If it is impossible, we could save everyone by not building AGI. If we know it to be easy, I agree, we should accelerate. But the reality is that we do not know, and it could be anywhere on the spectrum from easy to impossible. And since everything is on the line, including your life, better safe than sorry is to me the obvious approach. Do I see correctly that you think the pausing-AGI situation is not 'safe' because, if all went well, the AGI could be used to make humans immortal?

2[anonymous]
One hidden bias here is that I think a large hidden component of safety is a constant factor. So pSafe has two major components (natural law, human efforts). "Natural law" is equivalent to the question of "will a fission bomb ignite the atmosphere". In this context it would be "will a smart enough superintelligence be able to trivially overcome governing factors?" Governing factors include: a lack of compute (by inventing efficient algorithms and switching to those), lack of money (by somehow manipulating the economy to give itself large amounts of money), lack of robotics (some shortcut to nanotechnology), lack of data (better analysis of existing data, or see robotics), and so on. To the point of essentially "magic"; see the sci-fi story Metamorphosis of Prime Intellect. In worlds where intelligence scales high enough, the machine basically always breaks out and does what it will. Humans are too stupid to ever have a chance, not just as individuals but organizationally. Slowing things down does not do anything but delay the inevitable. (And if fission devices ignited the atmosphere, same idea: almost all world lines end in extinction.) This is why EY is so despondent: if intelligence is this powerful, there probably exists no solution. In worlds where aligning AI is easy, because the machines need rather expensive and obviously easy-to-control amounts of compute to be interesting in capabilities, and they are not particularly hard to corral into doing what we want, then alignment efforts don't matter. I don't know how much probability mass lies in the "in between" region. Right now, I believe the actual evidence is heavily in favor of "trivial alignment". "Trivial alignment" is "stateless microservices with an in-distribution detector before the AGI". This is an architecture production software engineers are well aware of. Nevertheless, "slow down" is almost always counterproductive. In world lines where AGI can be used to our favor or is also hostile,
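
As a minimal, purely illustrative sketch of the "stateless service with an in-distribution detector in front of the model" pattern mentioned above (the scoring heuristic, threshold, and names are assumptions, not a real production design):

```python
# Illustrative sketch of gating requests with an in-distribution check before
# they reach the model. The detector and threshold are toy placeholders.
from dataclasses import dataclass

@dataclass
class GateResult:
    allowed: bool
    reason: str

def in_distribution_score(request: str, reference_corpus: list[str]) -> float:
    """Toy proxy for an OOD detector: token overlap with known-good requests.
    A real system would use a trained detector (density- or distance-based)."""
    req_tokens = set(request.lower().split())
    if not req_tokens:
        return 0.0
    best = 0.0
    for ref in reference_corpus:
        ref_tokens = set(ref.lower().split())
        overlap = len(req_tokens & ref_tokens) / len(req_tokens | ref_tokens)
        best = max(best, overlap)
    return best

def gate(request: str, reference_corpus: list[str], threshold: float = 0.3) -> GateResult:
    """Stateless check: only forward requests that look like ones already vetted."""
    score = in_distribution_score(request, reference_corpus)
    if score < threshold:
        return GateResult(False, f"out of distribution (score={score:.2f})")
    return GateResult(True, f"in distribution (score={score:.2f})")

# Usage: if gate(...).allowed, forward the request to the model; otherwise refuse
# or escalate to human review. Because the service keeps no state between calls,
# each request is checked independently.
```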

I do not have good examples, no. You are right that normally there is learning from failure cases. But we should still try. Right now we have nothing that is required that could prevent an AGI breakout. Nick Bostrom wrote in Superintelligence, for example, that we could implement tripwires and honeypot situations in virtual worlds that would trigger a shutdown. We can think of things that are better than nothing.
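
As a toy, purely illustrative sketch of how such tripwires and honeypots could trigger a shutdown (every name below is a made-up placeholder, not an existing safety system):

```python
# Toy illustration of the "tripwires and honeypots trigger a shutdown" idea.
# Event names, honeypot resources, and the shutdown hook are placeholders.
from typing import Callable, Iterable

HONEYPOT_RESOURCES = {"hidden_network_gateway", "fake_weights_backup"}
TRIPWIRE_EVENTS = {"attempted_self_copy", "outbound_connection"}

def check_tripwires(events: Iterable[str], shutdown: Callable[[], None]) -> bool:
    """Scan a log of agent events; call the shutdown hook on any tripwire hit."""
    for event in events:
        if event in TRIPWIRE_EVENTS or event in HONEYPOT_RESOURCES:
            shutdown()
            return True
    return False

# Usage: check_tripwires(simbox_event_log, shutdown=lambda: print("halting sim"))
```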

2[anonymous]
I don't think we should try. I think the potential benefits of tinkering with AGI are worth some risks, and if EY is right and it's always uncontrollable and will turn against us then we are all dead one way or another anyways. If he's wrong we're throwing away the life of every living human being for no reason. And there is reason to think EY is wrong. CAIS and careful control of what gets rewarded in training could lead to safe enough AGI.

It seems that you see technical alignment advances as the only thing to be gained from a pause. But I want to point out that safety comes from solving two problems: the governance problem and the technical problem. And we need a lot of time to get the governance ironed out. The way I see it, misaligned AGI or ASI is the most dangerous thing ever, so we need the best regulation ever: the best safety and testing requirements, the best monitoring by governments of AI groups for unsafe actions, and the best awareness among politicians and among the public. And even if one country has great governance figured out, it takes years or decades for that level of excellence to be applied globally.

2[anonymous]
Do you know of examples of this? I don't know cases of good government or good engineering or good anything without feedback, where the feedback proves the government or engineering is bad. That's the history of human innovation. I suspect that no pause would gain anything but more years alive for currently living humans by the length of the pause.

Very interesting and revealing post. (I'm new)
I recently picked up a habit that makes me more strategic and goal-achieving, which might be useful to share.
I have instituted a rule to start the day by making a list of the things that I could do and ranking them by importance and by how much effort they cost; the rule is then to do the most important / greatest-effort or most unpleasant thing first. Then, when I have done it, I have moved toward my goal and feel better about myself. Before doing this, I would choose what to do based on what you WANT to do fi... (read more)