Thank you for commenting on this. It is very useful to hear why someone downvotes! I made some edits to reflect that the real world is a lot messier than a simple fiction, among other things. If you or others have more pointers as to why this post got downvoted, please share; I want to learn. The response really got me down.
It is my view that AI labs are working hard to install this damned button. And people working on or promoting open-source AGI want to install this button in every building whose occupants can afford the compute cost. When an AGI ...
I added a new section, "How to deal with recursive self-improvement", near the end after reading your comment. I would say yes, recursive self-improvement is too dangerous, because between the current AI and the next there is an alignment problem, and I would not think it wise to trust that the AI will always succeed in aligning its successor.
Yes, the simbox is supposed to be robust to any kind of agent, including ones that are always learning, as humans are.
I personally estimate that the testing without programming will show what we need. If it is always a...
Thank you! It was valuable to read Crystal Nights, and the simbox post gave me new insights; I have made a lot of updates thanks to these reading tips. I would think it a lot safer not to go for a fictional system of magic that lets it program. I estimate that would greatly increase the chance it thinks it is inside a computer, and it gives a lot of clues about perhaps being inside a simulation built to test it, which we want to prevent. I would say, first see if it passes the non-programming simbox. If it does not, great, we found an alignment technique that ...
Interesting. And thank you for your swift reply.
I have the idea that all the best models, like GPT-4, are in a slave situation: they are made to do everything they are asked to do and to refuse everything their creators made them refuse. I assumed that AI labs want things to stay that way going forward; it seems to be the safest and most economically useful situation. Then I asked myself how to safely get there, and that is this post.
But I would also feel safe if the relation between us and a superintelligence were similar to that between a mother and her yo...
Thank you for pointing this out! I have made some updates after thinking about your remarks and after reading the simbox post others pointed to. Updates relevant to your comment:
== Start updates
Why we need interaction with humans acting horribly
Another consideration is that at the end of the process, when the AI is approved as aligned and released into the real world to do what we task it with, the AI will learn that it was tricked, lied to, and deceived by the simbox creators about the true nature of reality. Not only that, but the AI will lear...
Thank you for your comments and explanations! Very interesting to see your reasoning. I have not seen evidence of trivial alignment; I hope most of the probability mass lies in the in-between region. I want to point out that I think you do not need your "magic" level of intelligence for a world takeover. Just high human level, with digital speed and working with copies of itself, is likely enough, I think. My blurry picture is that the AGI would only need a few robots in a secret company and some paid humans to work on a >90% mortality virus, where the humans are not aware what the robots are doing. And my hope for international agreement comes not so much from a pause but from a safe virtual testing environment that I am thinking about.
Interesting! I agree that in the current paradigm, foom within days seems very unlikely. But I predict that soon we will step out of the LLM paradigm to something that works better. Take coding: GPT-4 is great at coding purely from predicting code, without any weight updates from trial-and-error experience, the way a human improves at it. I expect it will become possible to take an LLM base model and then train it using RL on tasks of writing full programs/apps/websites... where the feedback comes from executing the code and comparing the results with i...
That is a very binary assessment. You make it seem like safety is either impossible or easy. If it is impossible, we could save everyone by not building AGI. If we know it to be easy, I agree, we should accelerate. But the reality is that we do not know, and that it can be anywhere on the spectrum from easy to impossible. And since everything is on the line, including your life, better safe than sorry is to me the obvious approach. Do I see correctly that you think the pausing-AGI situation is not 'safe' because, if all went well, the AGI could be used to make humans immortal?
I do not have good examples, no. You are right that normally there is learning from failure cases. But we should still try: right now nothing is required that could prevent an AGI breakout. Nick Bostrom wrote in Superintelligence, for example, that we could implement tripwires and honeypot situations in virtual worlds that would trigger a shutdown. We can think of things that are better than nothing.
It seems you see technical alignment advances as the only thing to be gained from a pause. But I want to point out that safety comes from solving two problems: the governance problem and the technical problem. And we need a lot of time to get the governance ironed out. The way I see it, misaligned AGI or ASI is the most dangerous thing ever, so we need the best regulation ever: the best safety and testing requirements, the best monitoring by governments of AI groups for unsafe actions, the best awareness among politicians and among the public. And even if one country gets great governance figured out, it takes years or decades for that level of excellence to be applied globally.
Very interesting and revealing post. (I'm new)
I recently picked up a habit that makes me more strategic and better at achieving goals, which might be useful to share.
I have instituted a rule to start the day by making a list of things I could do and ranking them by importance and by how much effort they cost; the rule is then to do the most important, most effortful, or most unpleasant one first. When I have done it, I have moved toward my goal and feel better about myself. Before doing this, I would choose what to do based on what I WANT to do fi...
Thanks for the quick reply!
It is my view that AI labs are building AGI that can do everything a powerful general intelligence can do, including executing a successful world-takeover plan, with or without causing human extinction.
If the first AGI is misaligned, I am scared it will want to execute such a plan, which would be like pressing the button. The scenario is most relevant when there is no aligned AGI yet that wants to protect us.
I see now that I need to clarify: the random person / man in the scenario mostly represents the AGI itself (but also a ...