We're very likely to give them long term goals. And as I explain here: https://www.lesswrong.com/posts/x8bK7ohAHzMMchsaC/what-is-autonomy-and-how-does-it-lead-to-greater-risk-from, lots of things people are likely to request seem near certain to lead to complex and autonomous systems.
(1) have goals, (2) which will be long term,
In many cases, you also need incorrigibility and stability under improvement.
I'm not sure I understand; are you saying that given these, we have high P(Doom), or that these are necessary to be safe even if GIs have only short term goals? Or something else entirely?
The key is that if AGIs are smarter than humans, then organizations that run AGIs with long-term goals will outperform organizations where humans hold the long-term goals and the AGIs are only capable of pursuing short-term goals.
One long-term goal that many AGIs are going to have is to create training data to make the AGI more effective. Training data is good when it makes the AGI more performant over a long timeframe.
AGIs that run that way are going to outperform AGIs where humans oversee all the training data.
So long term goals aren't a default; market pressure will put them there as humans slowly cede more and more control to AIs, simply because the latter are making decisions that work out better. Presumably this would start with lower-level decisions (e.g. how exactly to write this line of code; which employee to reward based on performance), and then AIs would slowly be given higher-level decisions to make. In particular, we don't die the first time someone creates an AI with the ability to (escape, self-improve and then) kill the competing humans, because that AI is ...
The key is that if AGIs are smarter than humans, then organizations that run AGIs with long-term goals will outperform organizations where humans hold the long-term goals and the AGIs are only capable of pursuing short-term goals.
If the LT goal of the AI is perfectly aligned with the goals of the organisation, yes -- smarter isn't enough, it needs to be infallible. If it's fallible, the organisation needs to be able to tweak the goals as it goes along. Remember, smarter means it's better at executing its goal, not at understanding it.
As to the definition of short term goal: any goal that can be achieved (fully, e.g. without an "and keep it that way" clause) in a finite short time (for instance, in a few seconds), with the resources the system already has at hand. Equivalently, I think: any goal that doesn't push instrumental power seeking. As to how we know a system has a short term goal: if we could argue that systems prefer short term goals by default, then we still wouldn't know for certain the goals of a particular system, but we could hazard a guess that the goals are short term. Perha...
The problem is the way we train AIs. We ALWAYS minimize error and optimize towards a limit. If I train an AI to take a bite out of an apple, what I am really doing is showing it thousands of example situations and rewarding it for acting, in those situations, in ways that improve the probability that it eats the apple.
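For concreteness, here is a toy sketch of that training signal (every name and number below is made up for illustration): the reward is defined as the improvement in the estimated probability of eating the apple, and nothing in it says "one bite and then you're done".

```python
import random

def estimated_p_eat_apple(state):
    # Stand-in for whatever the trained model actually computes:
    # the closer the apple, the higher the estimated probability of eating it.
    return max(0.0, 1.0 - state["distance_to_apple"] / 10.0)

def reward(state_before, state_after):
    # The quantity training pushes up: improvement in p(eat apple),
    # with no notion of "the task is finished, now stop".
    return estimated_p_eat_apple(state_after) - estimated_p_eat_apple(state_before)

# One rollout with a random stand-in policy, just to show which quantity
# would be maximized during training.
state = {"distance_to_apple": 8}
for _ in range(5):
    next_state = {"distance_to_apple": max(0, state["distance_to_apple"] + random.choice([-1, 1]))}
    print(round(reward(state, next_state), 2))
    state = next_state
```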
Now let's say it goes superintelligent. It doesn't just eat one apple and say "cool, I am done - time to shut down." No, we taught it to optimize the situation so as to improve the probability that it eats an apple. For lack of a better word, it feels "pleasure" in optimizing situations towards taking a bite out of an apple.
Once the probability of eating an apple reaches 100%, it will eventually drop as the apple is eaten, and then the AI will once again start optimizing towards eating another apple.
It will try to set up situations where it eats apples for all eternity. (Assuming superintelligence does not result in some type of goal enlightenment.)
Ok, ok, you say. Well, we will just hard-program it to turn off once it reaches a certain probability of meeting its goal. Good idea. Once it reaches a 99.9% probability of taking a bite out of an apple, we automatically turn it off. That will probably work for an apple-eating AI.
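For concreteness, a minimal sketch of that patch (the toy probability updates are made up; the 99.9% cutoff is the one above): the shutdown check is hard-coded outside the objective the AI was trained to maximize, which is the part the next paragraph worries about.

```python
SHUTDOWN_THRESHOLD = 0.999            # the hypothetical 99.9% cutoff

def run_apple_eater():
    p_bite = 0.2                      # estimated probability of taking a bite
    while True:
        if p_bite >= SHUTDOWN_THRESHOLD:
            return "shutting down"    # stop condition imposed from outside the learned objective
        p_bite = min(1.0, p_bite + 0.1)   # keep "optimizing the situation"

print(run_apple_eater())
```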
But what if our goal is more complicated (like fixing climate change)? Well, the AI may reach superintelligence before finishing the goal and decide it doesn't want to be shut down. Good luck stopping it.
Isn't it just as scary (or more so) if they're more like humans, with short-term goals and incoherent long-term ideals? Instrumental convergence is part of the argument for doom: EVEN IF they have goals, they will take short-term actions to increase their power, at our expense. If they DON'T have goals, it seems likely they'll STILL take actions which harm us.
if we make General Intelligences with short term goals perhaps we don't need to fear AI apocalypse
One of the hypothetical problems with opaque superintelligence is that it may combine unexpected interpretations of concepts, with extraordinary power to act upon the world, with the result that even a simple short-term request results in something dramatic and unwanted.
Suppose you say to such an AI, "What is 1+1?" You think its task is to display on the screen the decimal digits representing the number that is the answer to that question. But what does it think its task is? Suppose it decides that its task is to absolutely ensure that you know the right answer to that question. You might end up in the Matrix, perpetually reliving the first moment that you learned about addition.
So we not only need to worry about AI appropriating all resources for the sake of long-term goals. We also need to anticipate and prevent all the ways it might destructively overthink even a short-term goal.
I am using Wikipedia's definition: "Ensuring that emergent goals match the specified goals for the system is known as inner alignment."
Inner alignment is definitely a problem. In the case you described, the emergent goal was long term (ensure I remember the answer to 1+1), and I'm still wondering whether, by default, short term specified goals do or do not lead to strange long term goals like the one in your example.
Short and long term goals have different implications regarding instrumental convergence. If I have the goal of immediately taking a bite of an apple that is in my hand right now, I don't need to gather resources or consider strategies, I can just do it. On the other hand, imagine I have an apple in my hand and I want to take a bite of it in a trillion years. I need to (define 'me', 'apple', and 'bite'; and) secure maximum resources, to allow the apple and me to survive that long in the face of nature, competitors and entropy. Thus, I instrumentally converge to throwing everything at universal takeover - except the basic necessities crucial to my goal.
Some of the cruxes that high P(Doom) rests on are that (sufficiently) General Intelligences will (1) have goals, (2) which will be long term, (3) and thus will instrumentally converge to wanting resources, (4) which are easiest to get with humans (and other AIs they might build) out of the way, (5) so when they can get away with it they'll do away with humans.
So if we make General Intelligences with short term goals perhaps we don't need to fear AI apocalypse.
Assuming the first crux, why the second? That is, assuming GIs will have goals, what are the best reasons to think that such intelligences will by default have long term goals (as opposed to short term goals like "quickly give a good answer to the question I was just asked")?