Simulation_Brain comments on Should I believe what the SIAI claims? - Less Wrong
I think there are very good questions in here. Let me try to simplify the logic:
First, the sociological logic: if this is so obviously serious, why is no one else proclaiming it? I think the simple answer is that a) most people haven't considered it deeply, and b) someone has to be first in making a fuss. Kurzweil, Stross, and Vinge (to name a few who have thought about it at least a little) seem to acknowledge a real possibility of AI disaster, though they don't make probability estimates.
Now to the logical argument itself:
a) We are probably at risk from the development of strong AI. b) The SIAI can probably do something about that.
The other points in the OP are not terribly relevant; Eliezer could be wrong about a great many things, but right about these.
This is not a castle in the sky.
Now to argue for each: There's no good reason to think AGI will NOT happen within the next century. Our brains produce AGI; why not artificial systems? Artificial systems didn't produce anything a century ago; even without a strong exponential, they're clearly getting somewhere.
There are lots of arguments for why AGI WILL happen soon; see Kurzweil among others. I personally give it 20-40 years, even allowing for our remarkable cognitive weaknesses.
Next, will it be dangerous? a) Something much smarter than us will do whatever it wants, and very thoroughly. (this doesn't require godlike AI, just smarter than us. Self-improving helps, too.) b) The vast majority of possible "wants" done thoroughly will destroy us. (Any goal taken to extremes will use all available matter in accomplishing it.) Therefore, it will be dangerous if not VERY carefully designed. Humans are notably greedy and bad planners individually, and often worse in groups.
Finally, it seems that SIAI might be able to do something about it. If not, they'll at least help raise awareness of the issue. And as someone pointed out, achieving FAI would have a nice side effect of preventing most other existential disasters.
While there is a chain of logic, each of the steps seems likely, so multiplying probabilities gives a significant estimate of disaster, justifying some resource expenditure to prevent it (especially if you want to be nice). (Although spending ALL your money or time on it probably isn't rational, since effort and money generally have sublinear payoffs toward happiness.)
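Just to make the "multiplying probabilities" step concrete - with numbers I'm making up purely for illustration, not anyone's actual estimates - even if each step is only fairly likely, the product stays far from negligible:

```python
# Toy arithmetic with made-up probabilities (not actual estimates):
# multiply the probability of each step in the chain of logic.
steps = {
    "AGI is developed within a century": 0.8,
    "it ends up much smarter than us": 0.8,
    "its goals are dangerous unless very carefully designed": 0.8,
    "work like SIAI's could actually help": 0.8,
}

p = 1.0
for description, probability in steps.items():
    p *= probability

print(f"joint probability: {p:.2f}")  # 0.8**4 ~= 0.41
```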
Hopefully this lays out the logic; now, which of the above do you NOT think is likely?
I've heard a lot of variations on this theme. They all seem to assume that the AI will be a maximizer rather than a satisficer. I agree the AI could be a maximizer, but don't see that it must be. How much does this risk go away if we give the AI small ambitions?
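To make the maximizer/satisficer distinction concrete, here is a toy sketch - the "score" and the resource numbers are invented, and nothing here resembles a real agent design. A maximizer spends every resource it can reach because more is always better by its lights; a satisficer stops once some threshold is met.

```python
# Toy sketch of maximizer vs. satisficer behaviour. Purely illustrative;
# the "score" and resource numbers are invented.

def maximize(score_per_unit: float, available_resources: int) -> float:
    """A maximizer uses every available resource, since more always scores higher."""
    return available_resources * score_per_unit

def satisfice(score_per_unit: float, available_resources: int, target: float):
    """A satisficer stops consuming resources once the target score is reached."""
    used, score = 0, 0.0
    while score < target and used < available_resources:
        used += 1
        score += score_per_unit
    return score, used

print(maximize(1.0, 10**6))               # 1000000.0 -- consumes everything in reach
print(satisfice(1.0, 10**6, target=100))  # (100.0, 100) -- stops early
```

The worry raised in the replies below is that even the satisficing version can burn arbitrary resources getting past obstacles on the way to its threshold.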
Even small ambitions are risky. If I ask a potential superintelligence to do something easy, but an obstacle gets in the way, it will most likely obliterate that obstacle and do the 'simple thing'. Unless you are very careful, that 'obstacle' could wind up being yourself or, if you are unlucky, your species. Maybe it just can't risk one of you pressing the off switch!
Good point. The resources expended towards a "small" goal aren't directly bounded by the size of the goal. As you said, an obstacle can make the resources used go arbitrarily high. An alternative constraint would be on what the AI is allowed to use up in achieving the goal - "No more than 10 kilograms of matter, nor more than 10 megajoules of energy, nor any human lives, nor anything with a market value of more than $1000". This will have problems of its own, when the AI thinks up something to use up that we never anticipated. (We have something of a similar problem with corporations - but at least they operate on human timescales.)
Part of the safety of existing optimizers is that they can only use resources or perform actions that we've explicitly let them try using. An electronic CAD program may tweak transistor widths, but it isn't going to get creative and start trying to satisfy its goals by hacking into the controls of the manufacturing line and changing their settings. An AI with the option to send arbitrary messages to arbitrary places is quite another animal...
The idea is to prevent a "runaway" disaster.
Relatively standard and conventional engineering safety methodologies would be used for other kinds of problems.
My observation is that small ambitions can become 'runaway disasters' unless a lot of the problems of FAI are solved.
That sounds as 'safe' as giving Harry Potter rules to follow.
I understand that this is an area in which we fundamentally disagree. I have previously disagreed about the wisdom of using human legal systems to control AI behaviour and I assume that our disagreement will be similar on this subject.
"Small ambitions" are a proposed solution. Get the machine to want something - and then stop when it's desires are satisfied - or at a specified date, whichever comes first.
The solution has some complications - but it does look as though it is a pretty obvious safety measure - one that suitably paranoid individuals are likely to have near the top of their lists.
It doesn't make a runaway disaster impossible. The agent could still set up minions, "forget" to switch them off - and then they run amok. The point is to make a runaway disaster much less likely. The safety level is pretty configurable - if the machine's desires are sufficiently constrained. I went into a lot of these issues on:
http://alife.co.uk/essays/stopping_superintelligence/
See also the previous discussion of the issue on this site.
Shane Legg has also gone into methods of restraining a machine "from within" - so to speak. Logically, you could limit space, time or material resources in this way - if you have control over an agent's utility function.
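For what it's worth, here is a crude toy of what a "restraint from within" might look like - the cap, budget and penalty numbers are invented for illustration, and obviously a real utility function for a machine intelligence would not be a few lines of Python:

```python
# Toy "restraint from within": utility stops rewarding progress past a cap and
# punishes resource use beyond a budget. All numbers invented for illustration.

def bounded_utility(goal_progress: float, resources_used: float,
                    goal_cap: float = 1_000.0,
                    resource_budget: float = 10.0,
                    overspend_penalty: float = 100.0) -> float:
    capped_progress = min(goal_progress, goal_cap)
    overspend = max(0.0, resources_used - resource_budget)
    return capped_progress - overspend_penalty * overspend

print(bounded_utility(500, 5))      # 500.0   -- normal operation
print(bounded_utility(10_000, 5))   # 1000.0  -- no reward for progress past the cap
print(bounded_utility(10_000, 50))  # -3000.0 -- overspending is punished
```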
This is very dangerous thinking. There are many potential holes not covered in your essay. The problem with all these holes is that even the smallest one can potentially lead to the end of the universe. As Eliezer often mentions: the AI has to be mathematically rigorously proven to be friendly; there can't be any room for guessing or hoping.
As an example, consider that to the AI, moving to a quiescent state will be akin to dying. (Consider somebody wanting to make you not want anything, or to force you to want something that you normally don't.) I hope you don't reply with a "but we can do X", because that would be another patch, and that's exactly what we want to avoid. There is no getting around creating a solid, proven mathematical definition of friendly.
The end of the universe - OMG!
It seems reasonable to expect that agents will welcome their end if their time has come.
The idea, as usual, is not to try and make the agent do something it doesn't want to - but rather to make it want to do it in the first place.
I expect off switches - and the like - will be among the safety techniques employed. Provable correctness might be among them as well - but judging by the history of such techniques it seems rather optimistic to expect very much from them.
I am fairly confident that we can tweak any correct program into a form which allows a mathematical proof that the program behavior meets some formal specification of "Friendly".
I am less confident that we will be able to convince ourselves that the formal specification of "Friendly" that we employ is really something that we want.
We can prove there are no bugs in the program, but we can't prove there are no bugs in the program specification. Because the "proof" of the specification requires that all of the stakeholders actually look at that specification of "Friendly", think about that specification, and then bet their lives on the assertion that this is indeed what they want.
What is a "stakeholder", you ask? Well, what I really mean is pitchfork-holder. Stakes are from a different movie.
I don't think there is much difference between the two. Either way you are modifying the agent's behavior. If it doesn't want it, it won't have it.
The problem with off switches is that 1) they might not be guaranteed to work (the AI changes its own code or prevents anyone from accessing/using the off switch), and 2) they might not be guaranteed to work the way you want them to. Unless you have formally proven that the AI and all the possible modifications it can make to itself are safe, you can't know for sure.
It is not a modification if you build it that way "in the first place", as specified - and "If it doesn't want it, it won't have it" seems contrary to the specified bit where you "make it want to do it in the first place".
The idea of off switches is not that they are guaranteed to work, but that they are a safety feature. If you can make a machine do anything you want at all, you can probably make it turn itself off. You can build it so the machine doesn't wish to stay turned on - but goes willingly into the night.
We will never "know for sure" that a machine intelligence is safe. This is the real world, not math land. We may be able to prove some things about it - such that its initial state is not vulnerable to input stream buffer-overflow attacks - but we won't be able to prove something like that the machine will only do what we want it to do, for some value of "we".
At the moment, the self-improving systems we see are complex man-machine symbioses - companies and governments. You can't prove math theorems about such entities - they are just too messy. Machine intelligence seems likely to be like that for quite a while - functionally embedded in a human matrix. The question of "what would the machine do if no one could interfere with its code" is one for relatively late on - machines will already be very smart by then - smarter than most human computer programmers, anyway.
Now this is an interesting thought. Even a satisficer with several goals but no upper bound on each will use all available matter on the mix of goals it's working towards. But a limited goal (make money for GiantCo, unless you reach one trillion, then stop) seems as though it would be less dangerous. I can't remember this coming up in Eliezer's CFAI document, but suspect it's in there with holes poked in its reliability.
I discuss "small" ambitions in:
http://alife.co.uk/essays/stopping_superintelligence/
They seem safer to me too. This is one of the things people can do if they are especially paranoid about leaving the machine turned on - for some reason or another.
An AI that was a satisficer wouldn't be "the" AI; it'd be the first of many.
Odd. I would have thought that the first satisfied superhuman AI would be the last AI.
I was probably wrong in assuming I understood the discussion, in that case.
Your mistake may be in assuming that I understand.
The only part of the chain of logic that I don't fully grok is the "FOOM" part. Specifically, the recursive self-improvement. My intuition tells me that an AGI trying to improve itself by rewriting its own code would encounter diminishing returns after a point - after all, there would seem to be a theoretical minimum number of instructions necessary to implement an ideal Bayesian reasoner. Once the AGI has optimized its code down to that point, what further improvements can it make (in software)? Come up with something better than Bayesianism?
Now in your summary here, you seem to downplay the recursive self-improvement part, implying that it would 'help,' but isn't strictly necessary. But my impression from reading Eliezer was that he considers it an integral part of the thesis - as it would seem to be to me as well. Because if the intelligence explosion isn't coming from software self-improvement, then where is it coming from? Moore's Law? That isn't fast enough for a "FOOM", even if intelligence scaled linearly with the hardware you threw at it, which my intuition tells me it probably wouldn't.
Now of course this is all just intuition - I haven't done the math, or even put a lot of thought into it. It's just something that doesn't seem obvious to me, and I've never heard a compelling explanation to convince me my intuition is wrong.
I don't think anyone argues that there's no limit to recursive self-improvement, just that the limit is very high. Personally I'm not sure if a really fast FOOM is possible, but I think it's likely enough to be worth worrying about (or at least letting the SIAI worry about it...).
I think the concern stands even without a FOOM; if AI gets a good bit smarter than us, however that happens (design plus learning, or self-improvement), it's going to do whatever it wants.
As for your "ideal Bayesian" intuition, I think the challenge is deciding WHAT to apply it to. The amount of computational power needed to apply it to every thing and every concept on earth is truly staggering. There is plenty of room for algorithmic improvement, and it doesn't need to get that good to outwit (and out-engineer) us.
I think the widespread opinion is that the human brain has relatively inefficient hardware -- I don't have a cite for this -- and, most likely, inefficient software as well (it doesn't seem like evolution is likely to have optimized general intelligence very well in the relatively short timeframe that we have had it at all, and we don't seem to be able to efficiently and consistently channel all of our intelligence into rational thought.)
That being the case, if we were going to write an AI that was capable of self-improvement on hardware that was roughly as powerful or more powerful than the human brain (which seems likely) it stands to reason that it could potentially be much faster and more effective than the human brain; and self-improvement should move it quickly in that direction.
I for one largely agree, but a few differences:
We've had a strong exponential since the beginning of computing. Thinking that humans create computers is something of a naive anthropocentric viewpoint: humans don't create computers and haven't for decades. Human+computer systems create computers, and the speed of progress is largely constrained by the computational aspects even today (computers increasingly do more of the work, and perhaps already do the majority). To understand this more, read this post from a former Intel engineer (and apparently AI lab manager). Enlightening inside knowledge, but for whatever reason he only got up to 7 karma and wandered away.
Also, if you plotted out the data points of brain complexity on earth over time, I'm near certain it also follows a strong exponential.
The differences between all these exponentials are 'just' constants.
I find this dubious, mainly because physics tells us that using all available matter is actually highly unlikely to ever be a very efficient strategy.
However, agreed about the potential danger of future hyper-intelligence.