wedrifid comments on What I would like the SIAI to publish - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (218)
Your intuitions are not serving you well here. It may help to note that you don't have to tell an AI to self-improve at all. With very few exceptions giving any task to an AI will result in it self improving. That is, for an AI self improvement is an instrumental goal for nearly all terminal goals. The motivation to self improve in order to better serve its overarching purpose is such that it will find any possible loophole you leave if you try to 'forbid' the AI from self improving by any mechanism that isn't fundamental to the AI and robust under change.
Whatever task you give an AI, you will have to provide explicit boundaries. For example, if you give an AI the task to produce paperclips most efficiently, then it shouldn't produce shoes. It will have to know very well what it is meant to do to be able to measure its efficiency against the realization of the given goal to be able to know what self-improvement means. If it doesn't know exactly what it should output it cannot judge its own capabilities and efficiency, it doesn't know what improvement implies.
How do you explain the discrepancy between implementing explicit design boundaries yet failing to implement scope boundaries?
By noting that there isn't one. I don't think you understood my comment.
I think you misunderstood what I meant by scope boundaries. Not scope boundaries of self-improvement but of space and resources. If you are already able to tell an AI what a paperclip is, why are you unable to tell it to produce 10 paperclips most effectively rather than infinitely many? I'm not trying to argue that there is no risk, but that certain catastrophic failure is not that likely. If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
Yet another example of divergent assumptions. XiXiDu is apparently imagining an AI that has been assigned some task to complete - perhaps under constraints. "Do this, then display a prompt when finished." His critics are imagining that the AI has been told "Your goal in life is to continually maximize the utility function U <complicated definition of U inserted here>" where the constraints, if any, are encoded in the utility function as a pseudo-cost.
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term with the long term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Or maybe this just seems like sanity to me because I have been practicing akrasia for too long.
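A toy sketch of the idea above, in Python. The functional forms (a logarithmic benefit and a quadratic cost) and every constant are my own illustrative assumptions, not anything specified in the comment, and it ignores the short-term versus long-term distinction. It only shows that a maximizer facing decreasing returns and increasing costs picks a finite scale, while a maximizer with linear utility always grabs whatever the resource bound allows. It does not address whether such an agent would still want to protect its future utility.

```python
import math

def net_utility(scale, benefit=10.0, cost=0.01):
    """Hypothetical utility: concave benefit (decreasing returns to scale)
    minus convex cost (increasing costs to scale)."""
    return benefit * math.log1p(scale) - cost * scale ** 2

def linear_utility(scale):
    """Hypothetical utility with constant returns and no cost: more is always better."""
    return float(scale)

def best_scale(utility, max_scale):
    """Brute-force maximizer over integer scales 0..max_scale."""
    return max(range(max_scale + 1), key=utility)

if __name__ == "__main__":
    for bound in (1_000, 100_000):
        print(bound, best_scale(net_utility, bound), best_scale(linear_utility, bound))
```

Under these assumed curves the preferred scale stays the same no matter how large the available resource bound is, whereas the linear agent's preferred scale is always the bound itself.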
Such an AI would still be motivated to FOOM to consolidate its future ability to achieve large utility against the threat of being deactivated before then.
It doesn't know about any threat. You implicitly assume that it has something equivalent to fear, that it perceives threats. You allow for the human ingenuity to implement this and yet you believe that they are unable to limit its scope. I just don't see that it would be easy to make an AI that would go FOOM because it doesn't care to go FOOM. If you tell it to optimize some process then you'll have to tell it what optimization means. If you can specify all that, how is it then still likely that it somehow comes up with its own idea that optimization might be to consume the universe if you told it to optimize its software running on a certain supercomputer? Why would it do that, where does the incentive come from? If I tell a human to optimize he might muse about turning the planets into computronium, but if I tell an AI to optimize it doesn't know what it means until I tell it what it means, and then it still won't care because it isn't equipped with all the evolutionary baggage that humans are equipped with.
It is a general intelligence that we are considering. It can deduce the threat better than we can.
Because it is a general intelligence. It is smart. It is not limited to getting its ideas from you, it can come up with its own. And if the AI has been given the task of optimising its software for performance on a certain computer then it will do whatever it can to do that. This means harnessing external resources to do research on computation theory.
No he doesn't. He assumes only that it is a general intelligence with an objective. Potentially negative consequences are just part of possible universes that it models like everything else.
I'm not sure what can be done to make this clear:
SELF IMPROVEMENT IS AN INSTRUMENTAL GOAL THAT IS USEFUL FOR ACHIEVING MOST TERMINAL VALUES.
You have this approximately backwards. A human knows that if you tell her to create 10 paperclips every day you don't mean take over the world so she can be sure that nobody will interfere with her steady production of paperclips in the future. The AI doesn't.
It has the ability to model and to investigate hypothetical possibilities that might negatively impact the utility function it is optimizing. If it doesn't, it is far below human intelligence and is non-threatening for the same reason a narrow AI is non-threatening (but it isn't very useful either).
The difficulty of detecting these threats is spread out around the range of difficulties the AI is capable of handling, so it can infer that there are probably more threats which it could only detect if it were smarter. Therefore, making itself smarter will enable it to detect more threats and thereby increase utility.
To be able to optimize it will have to know what it is supposed to optimize. You have to carefully specify what its output (utility function) is supposed to be or it won't be able to tell how good it is at optimizing. If you just tell it to produce paperclips, it won't be able to self-improve because it doesn't know what paperclips look like etc., therefore it cannot judge its own success, or that extreme heat would be a negative impact given paperclips made out of plastic. You further assume that it has a detailed incentive, that it is given a detailed pathway that tells it to look for threats and eliminate them.
If it doesn't, it is what most researchers are working on: an intelligence with the potential to learn and make use of what it learnt, with the potential to become intelligent (educated). I'm getting the impression that people here assume that researchers are not working on an AGI but on hardcoding a FOOM machine. If FOOM is simply part of your definition then there's no arguing against it going FOOM. But what researchers like Goertzel are working on are systems with the potential to reach human level intelligence. That does not mean that they will by definition jailbreak their nursery school. I never tried to argue against the possibility, only that there are many pathways where this won't happen, rather than the way it is portrayed by the SIAI, that any implementation of AGI will most likely consume humanity.
The sorts of intelligences you are talking about are narrow AIs, not general intelligences. If you told a general intelligence to produce paperclips but it didn't know what a paperclip was, then its first subgoal would be to find out. The sort of mind that would give up on a minor obstacle like that wouldn't foom, but it wouldn't be much of an AGI either.
And yes, most researchers today are working on narrow AIs, not on AGI. That means they're less likely to successfully make a general intelligence, but it has no bearing on the question of what will happen if they do make one.
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executor exactly right, including the ability to maintain that specification under self modification.
For example, the specification:
... will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
See other comments hereabouts for hints.
And I was arguing that any given AI won't be able to self-improve without an exact specification of its output against which it can judge its own efficiency. That's why I don't see how it would be likely to be able to implement such exact specifications but yet fail to limit its scope of space, time and resources. What makes it even more unlikely in my opinion is that an AI won't care to output anything as long as it isn't explicitly told to do so. Where would that incentive come from?
You assume that it knows that it is supposed to use all of science and the universe to self-improve when it would very likely just self-improve to the extent that it is told and not care to go any further. That is, for example, software optimization. I just don't see why you think that any artificial general intelligence would automatically assume that it would have to understand the whole universe to come up with the best possible way to produce 10 paperclips.
You don't need to tell it to self improve at all.
Per day. Risk mitigation. Security concerns. Possibility of interruption of resource supply due to finance, politics or the collapse of civilisation. Limited lifespan of the sun (primary energy source). Amount of iron in planet.
Given that particular specification, if the AI didn't take a level in badass it would appear to be malfunctioning.
I just saw this comment by Ben Goertzel regarding self-improvement. I'd love it if someone here explained why he, as an AGI researcher, gets this so wrong.
Political incentive determines the bottom line. Then the page is filled with rhetoric (and, from the looks of it, loaded language and status posturing.)
Seriously, Ben is trying to accuse people of abusing the self-modification term based on the (trivially true) observation that there is a blurry boundary between learning and self-modification?
It's a good thing Ben is mostly harmless. I particularly liked the part where I asked Eliezer:
... and actually got a candid reply.
It is interesting to note the effort Ben is going to here to disaffiliate himself from the SIAI and portray them as 'out group'. Wei was querying (see earlier link) the wisdom of having Ben as Director of Research just earlier this year.
An educated outsider will very likely side with the expert though. Just like with the hype around the LHC and its dangers, academics and educated people largely believed the physicists working on it and not the fringe group that claimed it will destroy the world. Although that might be vice versa with the general public. Of course you cannot draw any conclusions about who's right from this, but it should be investigated anyway because what all parties have in common is the need for support and money.
There are two different groups to be convinced here by each party. One group includes the educated people (academics) and mediocre rationalists and the other group is the general public.
When it comes to who's right, the people one should listen to are the educated experts who are listening to both parties, their position and arguments. Although their intelligence and status as rationalists will be disputed as each party will claim that they are not smart enough to see the truth if they disagree with them.
Goertzel is generalizing from the human example of intelligence, which is probably the most pernicious and widespread failure mode in thinking about AI.
Or he may be completely disconnected from anything even resembling the real world. I literally have trouble believing that a professional AI researcher could describe a primitive, dumber-than-human AGI as "toddler-level" in the same sentence he dismisses it as a self-modification threat.
Toddlers self-modify into people using brains made out of meat!
No they don't. Self-modification in the context of AGI doesn't mean learning or growing, it means understanding the most fundamental architecture of your own mind and purposefully improving it.
That said, I think your first sentence is probably right. It looks like Ben can't imagine a toddler-level AGI self-modifying because human toddlers can't (or human adults, for that matter). But of course AGIs will be very different from human minds. For one thing, their source code will be a lot easier to understand than ours. For another, their minds will probably be much better at redesigning and improving code than ours are. Look at the kind of stuff that computer programs can do with code: Some of them already exceed human capabilities in some ways.
"Toddler-level AGI" is actually a very misleading term. Even if an AGI is approximately equal to a human toddler by some metrics, it will certainly not be equal by many other metrics. What does "toddler-level" mean when the AGI is vastly superior to even adult human minds in some respects?
Well, bad analogy. They don't self-modify by understanding their source code and improving it. They gradually grow larger brains in a pre-set fashion while learning specific tasks. Humans have very little ability to self-modify.
(My shorter answer, by the way - I interpret all such behaviors through a Hansonian lens. This includes "near vs far", observations about the incentives of researchers, the general theme of "X is not about Y" and homo hypocritus. Rather cynical, some may suggest, but this kind of thinking gives very good explanations for "Why?"s that would otherwise be confusing.)
The basic idea is to make a machine that is satisfied relatively easily. So, for example, you tell it to build the ten paperclips with 10 kJ total - and tell it not to worry too much if it doesn't make them - it is not that important.
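A minimal sketch of that kind of easily satisfied machine, in Python. The function name, the energy cost per paperclip and the use of integer joules are hypothetical choices of mine for illustration. The point is only the control structure: the loop stops when the target is met or the budget is spent, and a shortfall is simply accepted rather than treated as a reason to acquire more resources.

```python
def run_satisficer(target=10, energy_budget_j=10_000, cost_per_clip_j=800):
    """Make paperclips until the target is met or the energy budget is spent.

    All numbers are illustrative assumptions; energy is tracked in whole
    joules (10 kJ = 10_000 J) to keep the arithmetic exact.
    Returns (paperclips_made, energy_used_j).
    """
    made, used = 0, 0
    while made < target and used + cost_per_clip_j <= energy_budget_j:
        used += cost_per_clip_j
        made += 1
    # No retry, no request for more energy: falling short is acceptable.
    return made, used

if __name__ == "__main__":
    print(run_satisficer())                      # cheap clips: target reached
    print(run_satisficer(cost_per_clip_j=1500))  # expensive clips: stops early
```

Whether a real general intelligence could be held to a loop like this is exactly what the rest of the thread disputes; the sketch only makes the proposal concrete.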
Sorry, I don't understand your comment at all. I'll be back tomorrow.
Yes, as I said, you seem to assume that it is very likely to succeed on all the hard problems but yet fail on the scope boundary. The scary idea states that it is likely that if we create self-improving AI it will consume humanity. I believe that is a rather unlikely outcome and haven't seen any good reason to believe something else yet.
No, it states that we run the risk of accidentally making something that will consume (or exterminate, subvert, betray, make miserable, or otherwise Do Bad Things to) humanity, that looks perfectly safe and correct, right up until it's too late to do anything about it... and that this is the default case: the case if we don't do something extraordinary to prevent it.
This doesn't require self-improvement, and it doesn't require wiping out humanity. It just requires normal, every-day human error.
Here is Ben's phrasing: