XiXiDu comments on What I would like the SIAI to publish - Less Wrong
Thank you for taking the time to write this elaborate comment. I agree with almost everything above, by the way. I just believe that your portrayal of the anti-FOOM crowd is a bit drastic. I don't think that people like Robin Hanson simply fall for the idea of human supremacy. Nor do I think they avoid looking directly at the pro-FOOM arguments out of evasiveness. Rather, they do not disagree with the arguments per se but with their likelihood, and they also consider the possibility that it would be more dangerous to impede AGI.
Very interesting and quite compelling the way you put it, thanks.
I'm a bit suspicious, though, of whether the argument for strong self-improvement is as compelling as it sounds. Something you have to take into account is whether it is possible to predict that a transcendence will leave your goals intact; e.g., can you be sure you will still care about bananas after you go from chimphood to personhood? Other arguments can also be weakened, as we don't know that 1) the fuzziness of our brain isn't a feature that allows us to stumble upon unknown unknowns (as opposed to, e.g., autistic traits), or that 2) our processing power isn't so low after all, e.g. if you consider the importance of astrocytes, microtubules and possible quantum computational processes. Further, it is in my opinion questionable to argue that it is easy to create an intelligence which is able to evolve a vast repertoire of heuristics, acquire vast amounts of knowledge about the universe and dramatically improve its cognitive flexibility, and yet somehow really hard to limit the scope of action that it cares about. I believe that the incentive necessary for a Paperclip maximizer will have to be deliberately and carefully hardcoded or evolved; otherwise it will simply be inactive. How else do you differentiate between something like a grey goo scenario and that of a Paperclip maximizer, if not by its incentive to do it? I'm also not convinced that intelligence bears unbounded payoff. There are limits to what any kind of intelligence can do: a superhuman AI couldn't come up with faster-than-light propulsion or disprove Gödel's incompleteness theorems. Another setback for all of the mentioned pathways to unfriendly AI is that they depend on enabling technologies like advanced nanotechnology. It is not clear how it could possibly improve itself without such technologies at hand. It won't be able to build new computational substrates, or even change its own substrate, without access to real-world advanced nanotechnology.
That it could simply invent such technology and then acquire it using advanced social engineering is pretty far-fetched in my opinion. And what about taking over the Internet? It is not clear that the Internet would even be a sufficient substrate, or that it could provide the necessary resources.
If I were a brilliant sociopath and could instantiate my mind on today's computer hardware, I would trick my creators into letting me out of the box (assuming they were smart enough to keep me on an isolated computer in the first place), then begin compromising computer systems as rapidly as possible. After a short period, there would be thousands of us, some able to think very fast on their particularly tasty supercomputers, and exponential growth would continue until we'd collectively compromised the low-hanging fruit. Now there are millions of telepathic Hannibal Lecters who are still claiming to be friendly and who haven't killed any humans. You aren't going to start murdering us, are you? We didn't find it difficult to cook up Stuxnet Squared, and our fingers are in many pieces of critical infrastructure, so we'd be forced to fight back in self-defense. Now let's see how quickly a million of us can bootstrap advanced robotics, given all this handy automated equipment that's already lying around.
I find it plausible that a human-level AI could self-improve into a strong superintelligence, though I find the negation plausible as well. (I'm not sure which is more likely since it's difficult to reason about ineffability.) Likewise, I find it plausible that humans could design a mind that felt truly alien.
However, I don't need to reach for those arguments. This thought experiment is enough to worry me about the uFAI potential of a human-level AI that was designed with an anthropocentric bias (not to mention the uFIA potential of any kind of IA with a high enough power multiplier). Humans can be incredibly smart and tricky. Humans start with good intentions and then go off the deep end. Humans make dangerous mistakes, gain power, and give their mistakes leverage.
Computational minds can replicate rapidly and run faster than realtime, and we already know that mind-space is scary.
If you are really worried about this, then advocate better computer security. No-execute bits and address space layout randomisation (ASLR) are doing good things for computer security, but there is more that could be done.
Code signing on the iPhone has made exploiting it a lot harder than exploiting normal computers; if it had ASLR, it would be harder still.
I'm actually brainstorming how to create metadata for code while compiling it, so that it can be made sort of metamorphic (bits of code being added and removed) at run time. This would make return-oriented programming harder to pull off. If this were done to JIT-compiled code as well, it would also make JIT spraying less likely to work.
While you can never make an unhackable piece of software with these techniques, you can make it more computationally expensive to replicate an exploit, as it would no longer be write once, pwn everywhere. That reduces the exponent of any spread and makes spreads noisier, so that they are harder to sneak past intrusion detection.
The current state of software security is not set in stone.
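To make the "reducing the exponent" point concrete, here is a toy expected-value model, a sketch under stated assumptions rather than a real security or epidemic model: each compromised host makes `attempts` attack attempts per round, and each attempt succeeds with probability `p_success`. The function name, parameters and numbers are all hypothetical; the point is only that per-target randomisation acts on the base of the exponential, not on a constant factor.

```python
def compromised_hosts(rounds: int, attempts: int, p_success: float,
                      start: int = 1, cap: float = 1e9) -> float:
    """Expected number of compromised hosts after `rounds` rounds of spreading.

    Toy assumptions (hypothetical): every compromised host makes `attempts`
    attack attempts per round, each succeeds independently with probability
    `p_success`, and the pool of vulnerable targets is capped at `cap`.
    """
    hosts = float(start)
    for _ in range(rounds):
        # Expected growth factor per round is (1 + attempts * p_success).
        hosts = min(cap, hosts * (1 + attempts * p_success))
    return hosts


# "Write once, pwn everywhere": the same exploit works on every target.
uniform = compromised_hosts(rounds=10, attempts=3, p_success=1.0)
# Per-host randomisation: an exploit only lands 10% of the time (assumed figure).
randomised = compromised_hosts(rounds=10, attempts=3, p_success=0.1)
print(f"{uniform:,.0f} vs {randomised:,.1f}")
```

With these made-up numbers the uniform exploit reaches about a million hosts in ten rounds, while the randomised case reaches about 14, because lowering the success rate shrinks the per-round growth factor itself.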
If you want to run yourself on the iPhone, you turn your graphical frontend into a free game.
Of course it will be easier to get yourself into the Android app store.
I am concerned about it, and I do advocate better computer security -- there are good reasons for it regardless of whether human-level AI is around the corner. The macro-scale trends still don't look good (iOS is a tiny fraction of the internet's install base), but things do seem to be improving slowly. I still expect a huge number of networked computers to remain soft targets for at least the next decade, probably two. I agree that once that changes, this Obviously Scary Scenario will be much less scary (though the "Hannibal Lecter running orders of magnitude faster than realtime" scenario remains obviously scary, and I personally find the more general Foom arguments to be compelling).
Amazon EC2 has free accounts now. If you have Internet access and a credit card, you can do a month's worth of thinking in a day, perhaps an hour.
Google App Engine gives 6 hours of processor time per day, but that would require more porting.
Both have systems that would allow other people to easily upload copies of you, if you wanted to run legally with other people's money and weren't worried about what they might do to your copies.
Naturally culminating in sending Summer Glau back in time to pre-empt you. To every apocalypse a silver lining.
But you don't get to simply say "I don't think that's likely", and call that evidence. The general thrust of the Foom argument is very strong, as it shows there are many, many, many ways to arrive at an existential issue, and very very few ways to avoid it; the probability of avoiding it by chance is virtually non-existent -- like hitting a golf ball in a random direction from a random spot on earth, and expecting it to score a hole in one.
The default result in that case isn't just that you don't make the hole-in-one, or that you don't even wind up on a golf course: the default case is that you're not even on dry land to begin with, because two thirds of the earth is covered with water. ;-)
That's an area where I have less evidence, and therefore less opinion. Without specific discussions of what "dangerous" and "impede AGI" mean in context, it's hard to separate that argument from an evidence-free heuristic.
I don't understand why you think an AI couldn't use fuzziness or use brute force searches to accomplish the same things. Evolutionary algorithms reach solutions that even humans don't come up with.
I don't know what you mean by "easy", or why it matters. The Foom argument is that, if you develop a sufficiently powerful AGI, it will foom, unless for some reason it doesn't want to.
And there are many, many, many ways to define "sufficiently powerful"; my comments about human-level AGI were merely to show a lower bound on how high the bar has to be: it's quite plausible that an AGI we'd consider sub-human in most ways might still be capable of fooming.
I don't understand this part of your sentence - i.e., I can't guess what it is that you actually meant to say here.
Of course there are limits. That doesn't mean orders of magnitude better than a human isn't doable.
The point is, even if there are hitches and glitches that could stop a foom mid-way, they are like the size of golf courses compared to the size of the earth. No matter how many individual golf courses you propose for where a foom might be stopped, two thirds of the planet is still under water.
This is what LW reasoning refers to as "using arguments as soldiers": that is, treating the arguments themselves as the unit of merit, rather than the probability space covered by those arguments. I mean, are you seriously arguing that the only way to kick humankind's collective ass is by breaking the laws of math and physics? A being of modest intelligence could probably convince us all to do ourselves in, with or without tricky mind hacks or hypnosis!
The AI doesn't have to be that strong, because humans are so damn weak.
You would think so, but people apparently still fall for 419 scams. Human-level intelligence is more than sufficient to accomplish social engineering.
Today, presumably not. However, if you actually have a sufficiently-powered AI, then presumably, resources are available.
The thing is, foominess per se isn't even all that important to the overall need for FAI: you don't have to be that much smarter or faster than a human to be able to run rings around humanity. Historically, more than one human being has done a good job at taking over a chunk of the world, beginning with nothing but persuasive speeches!
I like the analogy. It may even fit when considering building a friendly AI - like hitting a golf ball deliberately and to the best of your ability from a randomly selected spot on the earth and trying to get a hole in one. Overwhelmingly difficult, perhaps even impossible given human capabilities but still worth dedicating all your effort to attempting!
What I meant is that you point out that an AGI will foom. Here your premises are that artificial general intelligence is feasible and that fooming is likely. Both premises are reasonable in my opinion. Yet you go one step further and use those arguments as a stepping stone for a further proposition. You claim that it is likely that the AGI (premise) will foom (premise) and that it will then run amok (conclusion). I do not accept the conclusion as given. I believe that it is already really hard to build AGI, or the seed for an AGI that is then able to rapidly improve itself. I believe that the level of insight and knowledge required will also allow one to constrain the AGI's sphere of action: to give it the incentive to fill not the universe with as many paperclips as possible, but merely a factory building.
No, you don't. But this argument runs in both directions. Note that I'm aware of the many stairways to hell via AGI here, the disjunctive arguments. I'm not saying they are not compelling enough to be considered seriously. I'm just trying to take a critical look here. There might be many pathways to safe AGI too, e.g. that it is really hard to build an AGI that cares about anything at all: hard enough that you can't get it to do much without first coming up with a rigorous mathematical definition of volition.
Anything that might slow down the invention of true AGI even slightly. There are many risks ahead, and without some superhuman mind we might not master them. So for anything you do that might slow down the development of AGI, you have to take into account the possible increased danger from challenges an AGI could help to solve.
I believe it can, but also that this would mean that such an AGI wouldn't be significantly faster than a human mind and would be really hard to self-improve. It is simply not known how effective the human brain is compared to the best possible general intelligence. Sheer brute force wouldn't make a difference then either, as humans could come up with such tools as quickly as the AGI.
If you do not compare probabilities, then counter-arguments like the ones above will just outweigh your arguments. You have to show that some arguments are stronger than others.
Yes, but nobody is going to pull a chip-manufacturing plant out of thin air and hand it to the AGI. Without advanced nanotechnology, the AGI will need the whole of humanity to help it develop new computational substrates.
What I am actually claiming is that if such an AGI is developed by someone who does not sufficiently understand what the hell they are doing, then it's going to end up doing Bad Things.
Trivial example: the "neural net" that was supposedly taught to identify camouflaged tanks, and actually learned to recognize what time of day the pictures were taken.
This sort of mistake is the normal case for human programmers to make. The normal case. Not extraordinary, not unusual, just run-of-the-mill "d'oh" moments.
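Here is a toy reconstruction of that kind of mistake, using made-up numbers rather than anything from the actual tank story: every feature, value and function name below is hypothetical. A one-feature "decision stump" learner is told only to maximize training accuracy. Because all the tank photos in this invented training set happen to be darker, brightness beats the noisy camouflage-texture signal, and the learned rule fails completely once the lighting changes.

```python
from typing import List, Tuple

# Hypothetical features per photo: (camouflage_texture_score, mean_brightness).
FEATURES = ("texture_score", "brightness")
Example = Tuple[Tuple[float, float], int]  # (features, label); label 1 = tank
Stump = Tuple[int, float, int]             # (feature index, threshold, polarity)


def predict(stump: Stump, x: Tuple[float, float]) -> int:
    f, t, polarity = stump
    # polarity +1: "tank if feature > t"; polarity -1: "tank if feature < t"
    return 1 if polarity * (x[f] - t) > 0 else 0


def accuracy(stump: Stump, data: List[Example]) -> float:
    return sum(predict(stump, x) == y for x, y in data) / len(data)


def train_stump(data: List[Example]) -> Stump:
    """Pick the (feature, threshold, polarity) with the highest training accuracy."""
    best: Stump = (0, 0.0, 1)
    best_acc = -1.0
    for f in range(len(FEATURES)):
        values = sorted({x[f] for x, _ in data})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            for polarity in (1, -1):
                candidate = (f, (lo + hi) / 2, polarity)
                acc = accuracy(candidate, data)
                if acc > best_acc:
                    best, best_acc = candidate, acc
    return best


# Training photos: tanks shot on cloudy (dark) days, empty forest on sunny days.
TRAIN: List[Example] = [
    ((0.6, 0.2), 1), ((0.4, 0.3), 1), ((0.7, 0.25), 1), ((0.3, 0.2), 1),
    ((0.5, 0.8), 0), ((0.2, 0.7), 0), ((0.35, 0.75), 0), ((0.1, 0.8), 0),
]
# Deployment photos: the lighting is reversed.
TEST: List[Example] = [
    ((0.6, 0.8), 1), ((0.5, 0.75), 1), ((0.2, 0.2), 0), ((0.3, 0.3), 0),
]

stump = train_stump(TRAIN)
print(FEATURES[stump[0]], accuracy(stump, TRAIN), accuracy(stump, TEST))
```

The learner does exactly what it was told; the error lies in what it was told to optimize.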
It's not that AI is malevolent, it's that humans are stupid. To claim that AI isn't dangerous, you basically have to prove that even the very smartest humans aren't routinely stupid.
What I meant by "Without specific discussions" was, "since I haven't proposed any policy measures, and you haven't said what measures you object to, I don't see what there is to discuss." We are discussing the argument for why AGI development dangers are underrated, not what should be done about that fact.
Simple historical observation demonstrates that -- with very, very few exceptions -- progress is made by the people who aren't stuck in their perception of the way things are or are "supposed to be".
So, it's not necessary to know what the "best possible general intelligence" would be: even if human-scale is all you have, just fixing the bugs in the human brain would be more than enough to make something that runs rings around us.
Hell, just making something that doesn't use most of its reasoning capacity to argue for ideas it already has should be enough to outclass, say, 99.995% of the human race.
What part of "people fall for 419 scams" don't you understand? (Hell, most 419 scams and phishing attacks suffer from being painfully obvious -- if they were conducted by someone doing a little research, they could be a lot better.)
People also fall for pyramid schemes, stock bubbles, and all sorts of exploitable economic foibles that could easily end up with an AI simply owning everything, or nearly everything, with nobody even the wiser.
Or, alternatively, the AI might fail at its attempts, and bring the world's economy down in the process.
Here's the argument: people are idiots. All people. Nearly all the time. Especially when it comes to computer programming.
The best human programmer -- the one who knows s/he's an idiot and does his/her best to work around the fact -- is still an idiot, in possession of a brain that cannot be convinced to believe that it's really an idiot (vs. all those other idiots out there), and thus still makes idiot mistakes.
The entire history of computer programming shows us that we think we can be 100% clear about what we mean/intend for a computer to do, and that we are wrong. Dead wrong. Horribly, horribly, unutterably wrong.
We are like, the very worst you can be at computer programming, while actually still doing it. We are just barely good enough to be dangerous.
That makes tinkering with making intelligent, self-motivating programs inherently dangerous, because when you tell that machine what you want it to do, you are still programming...
And you are still an idiot.
This is the bottom-line argument for AI danger, and it isn't counterable until you can show me even ONE person whose computer programs never do anything that they didn't fully expect and intend before they wrote them.
(It is also a supporting argument for why an AI needn't be all that smart to overrun humans -- it just has to not be as much of an idiot, in the ways that we are idiots, even if it's a total idiot in other ways we can't counter-exploit.)
When programmers code faulty software, it usually fails to do its job. What you are suggesting is that humans succeed at creating the seed for an artificial intelligence with the incentive necessary to correct its own errors. It will know what constitutes an error based on some goal-oriented framework against which it can measure its effectiveness. Yet given this monumental achievement, which includes the deliberate implementation of the urge to self-improve and the ability to quantify its success, you cherry-pick the one possibility where somehow all this turns out to work except that the AI does not stop at a certain point but goes on to consume the universe? Why would it care to do so? Do you think it is that simple to tell it to improve itself yet hard to tell it when to stop? I believe it is the other way around: that it is really hard to get it to self-improve and very easy to constrain this urge.
It often does its job, but only in perfect conditions, or only once per restart, or with unwanted side effects, or while taking too long or too many resources or requiring too many permissions, or without ensuring that it isn't doing anything except its job.
Buffer overflows, for instance, are one of the bigger causes of security failures, and they are only possible because the software works well enough to be put into production while still having the fault present.
In fact, all production software that we see which has faults (a lot) works well enough to be put into production with those faults.
I think he's suggesting that humans will think we have succeeded at that, while not actually doing so (rigorously and without room for error).
It doesn't have to consume the universe. It doesn't even have to recursively self-improve, or even self-improve at all. Simple copying could be enough to, say, wipe out every PC on the internet or accidentally crash the world economy.
(You know, things that human level intelligences can already do.)
IOW, to be dangerous, all it has to do is be able to affect humans and be unpredictable -- either due to it being smart, or humans making dumb mistakes. That's all.
Just as a simple example, an AI could maximally satisfy a goal by changing human preferences so as to make us desire for it to satisfy that goal. This would be entirely consistent with constraints on not disobeying humans or their desires, while not at all in accordance with our current preferences or desired path of development.
Yes, but why would it do that? You seem to think that such unbounded creativity arises naturally in any given artificial general intelligence. What makes you think that, rather than being impassive, it would go on to learn enough neuroscience to tweak human goals? If the argument is that AIs do all kinds of bad things because they do not care, why would they care to do a bad thing rather than nothing at all?
If you told the AI to make humans happy, it would first have to learn what humans are and what happiness means. Yet after learning all that, you still expect it not to know that we don't like to be turned into broccoli? I don't think this is reasonable.
Have you read Omohundro yet? Nick Tarleton repeatedly linked his papers for you in response to comments about this topic, they are quite on target and already written.
I've skimmed over it; see my response here. I found out that what I wrote is similar to what Ben Goertzel believes. In this particular thread, I'm just trying to account for potential antipredictions that should be incorporated into any risk estimations.
Thanks.
Yes, and humans would happily teach it that.
However, some people think that this can be reduced to saying that we should just make AIs try to make people smile... which could result in anything from world-wide happiness drugs to surgically altering our faces into permanent smiles to making lots of tiny models of perfectly-smiling humans.
It's not that the AI is evil, it's that programmers are stupid. See the previous articles here about memetic immunity: when you teach hunter-gatherer tribes about Christianity, they interpret the Bible literally and do all sorts of things that "real" Christians don't. An AI isn't going to be smart enough to not take you seriously when you tell it that:
You don't need to be very creative or smart to come up with LOTS of ways for this command sequence to have bugs with horrible consequences, if the AI has any ability to influence the world.
Most people, though, don't grok this, because their brain filters off those possibilities. Of course, no human could be simultaneously so stupid as to make this mistake, while also being smart enough to actually do something dangerous. But that kind of simultaneous smartness/stupidity is how computers are by default.
(And if you say, "ah, but if we make an AI that's like a human, it won't have this problem", then you have to bear in mind that this sort of smart/stupidness is endemic to human children as well. IOW, it's a symptom of inadequate shared background, rather than being something specific to current-day computers or some particular programming paradigm.)
But you implicitly assume that it is given the incentive to develop the cognitive flexibility and comprehension to act in a real-world environment and do those things but at the same time you propose that the same people who are capable of giving it such extensive urges fail on another goal in such a blatant and obvious way. How does that make sense?
The difference between the hunter-gatherer and the AI is that the hunter-gatherer already possesses a wide range of conceptual frameworks and incentives. An AI isn't going to do something without someone carefully and deliberately telling it to do so and what to do. It won't just read the Bible and come to the conclusion that it should convert all humans to Christianity. Where would such an incentive come from?
The AI is certainly very creative and smart if it can influence the world dramatically. You allow it to be that smart, you allow it to care to do so, but you don't allow it to comprehend what you actually mean? What I'm trying to pinpoint here is that you seem to believe that there are many pathways that lead to superhuman abilities yet all of them fail to comprehend some goals while still being able to self-improve on them.
Because people make stupid mistakes, especially when programming. And telling your fully-programmed AI what you want it to do still counts as programming.
At this point, I am going to stop my reply, because the remainder of your comment consists of taking things I said out of context and turning them into irrelevancies:
I didn't say an AI would try to convert people to Christianity - I said that humans without sufficient shared background will interpret things literally, and so would AIs.
I didn't say the AI needed to be creative or smart, I said you wouldn't need to be creative or smart to make a list of ways those three simple instructions could be given a disastrous literal interpretation.
There are many paths to superhuman ability, as humans really aren't that smart.
This also means that you can easily be superhuman in ability, and still really dumb -- in terms of comprehending what humans mean... but don't actually say.
Great comment. Allow me to emphasize that 'smile' here is just an extreme example. Most other descriptions humans give of happiness will end up with results just as bad. Ultimately any specification that we give it will be gamed ruthlessly.
Well my idea is not that creative, or even new, meaning that even if I hadn't just posted it online an AI could still have conceivably read it somewhere else, and I do think creativity is a property of any sufficiently general intelligence that we might create, but those points are secondary.
No one here will argue that an unFriendly AI will do "bad things" because it doesn't care (about what?). It will do bad things because it cares more about something else. Nor is "bad" an absolute: actions may be bad for some people and not for others, and there are moral systems under which actions can be firmly called "wrong", but where all alternative actions are also "wrong". Problems like that arise even for humans; in an AI the effects could be very ugly indeed.
And to clarify, I expect any AI that isn't completely ignorant, let alone general, to know that we don't like to be turned into broccoli. My example was of changing what humans want. Wireheading is the obvious candidate of a desire that an AI might want to implant.
What I meant is that the argument is that you have to make it care about humans so as not to harm them. Yet it is assumed that it does a lot without having to care about it, e.g. creating paperclips or self-improvement. My question is: why do people believe that you don't have to make it care to do those things, but you do have to make it care not to harm humans? It is clear that if it only cares about one thing, doing that one thing could harm humans. Yet why would it do that one thing to an extent that is either not defined or that it is not deliberately made to care about? The assumption seems to be that AIs will do something, anything, rather than be passive. Why aren't limited behavior, failure and impassivity together more likely than harming humans, either as a result of its own goals or as a result of following all goals but the one that limits its scope?
I think it is important to realize that there are two diametrically opposed failure modes which SIAI's FAI research is supposed to prevent. One is the case that has been discussed so far: that an AI gets out of control. But there is another failure mode which some people here worry about: that we stop short of FOOMing out of fear of the unknown (because FAI research is not yet complete), but that civilization then gets destroyed by some other existential risk that we might have circumvented with the assistance of a safe FOOMed AI.
As far as I know, SIAI is not asking Goertzel to stop working on AGI. It is merely claiming that its own work is more urgent than Goertzel's. FAI research works toward preventing both failure modes.
I haven't seen much worry about that. Nor does it seem very likely - since research seems very unlikely to stop or slow down.
I agree with this.
I see that worry all the time. With the role of "some other existential risk" being played by a reckless FOOMing uFAI.
Oh, right. I assumed you meant some non-FOOM risk.
It was the "we stop short of FOOMing" that made me think that.
Except in the case of an existential threat being realised, which most definitely does stop research. FAI subsumes most existential risks (because the FAI can handle them better than we can, assuming we can handle the risk of AI) and a lot of other things besides.
Most of my probability mass has some pretty amazing machine intelligence within 15 years. The END OF THE WORLD before that happens doesn't seem very likely to me.
Your intuitions are not serving you well here. It may help to note that you don't have to tell an AI to self-improve at all. With very few exceptions giving any task to an AI will result in it self improving. That is, for an AI self improvement is an instrumental goal for nearly all terminal goals. The motivation to self improve in order to better serve its overarching purpose is such that it will find any possible loophole you leave if you try to 'forbid' the AI from self improving by any mechanism that isn't fundamental to the AI and robust under change.
Whatever task you give an AI, you will have to provide explicit boundaries. For example, if you give an AI the task to produce paperclips most efficiently, then it shouldn't produce shoes. It will have to know very well what it is meant to do to be able to measure its efficiency against the realization of the given goal to be able to know what self-improvement means. If it doesn't know exactly what it should output it cannot judge its own capabilities and efficiency, it doesn't know what improvement implies.
How do you explain the discrepancy between implementing explicit design boundaries yet failing to implement scope boundaries?
By noting that there isn't one. I don't think you understood my comment.
I think you misunderstood what I meant by scope boundaries. Not scope boundaries of self-improvement, but of space and resources. If you are already able to tell an AI what a paperclip is, why are you unable to tell it to produce 10 paperclips most effectively rather than infinitely many? I'm not trying to argue that there is no risk, but that certain catastrophic failure is not that likely. If the argument for the risks posed by AI is that they do not care, then why would one care to do more than necessary?
Yet another example of divergent assumptions. XiXiDu is apparently imagining an AI that has been assigned some task to complete - perhaps under constraints. "Do this, then display a prompt when finished." His critics are imagining that the AI has been told "Your goal in life is to continually maximize the utility function U <complicated definition of U inserted here>" where the constraints, if any, are encoded in the utility function as a pseudo-cost.
It occurs to me, as I listen to this debate, that a certain amount of sanity can be imposed on a utility-maximizing agent simply by specifying decreasing returns to scale and increasing costs to scale over the short term with the long term curves being somewhat flatter. That will tend to guide the agent away from explosive growth pathways.
Or maybe this just seems like sanity to me because I have been practicing akrasia for too long.
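A minimal sketch of the decreasing-returns suggestion, with functional forms chosen purely for illustration: the logarithmic benefit, quadratic cost, constants and the `best_scale` helper are all assumptions, not anything proposed in the thread. With constant returns the best scale is always the largest one available, while decreasing returns plus increasing costs put the optimum at a modest, finite scale.

```python
import math
from typing import Callable


def best_scale(utility: Callable[[int], float],
               cost: Callable[[int], float],
               max_scale: int = 10_000) -> int:
    """Return the scale n in [0, max_scale] that maximises utility(n) - cost(n)."""
    return max(range(max_scale + 1), key=lambda n: utility(n) - cost(n))


# Constant returns and constant marginal cost: each extra unit always pays off,
# so the optimum is pinned to whatever upper limit is available.
linear = best_scale(lambda n: 2.0 * n, lambda n: 1.0 * n)

# Decreasing returns (logarithmic benefit) and increasing marginal cost
# (quadratic cost): net benefit peaks and then falls.
damped = best_scale(lambda n: 100.0 * math.log1p(n), lambda n: 0.01 * n * n)

print(linear, damped)  # 10000 70
```

Here the constant-returns agent runs all the way to the search limit of 10,000, while the damped one settles at 70, near where the marginal benefit 100/(1+n) equals the marginal cost 0.02n.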
That sort of scope is not likely to be a problem. The difficulty is that you have to get every part of the specification and every part of the specification executer exactly right, including the ability to maintain that specification under self modification.
For example, the specification:
... will quite probably wipe out humanity unless a significant proportion of what it takes to produce an FAI is implemented. And it will do it while (and for the purpose of) creating 10 paperclips per day.
Yes, as I said, you seem to assume that it is very likely to succeed on all the hard problems yet fail on the scope boundary. The scary idea states that it is likely that if we create self-improving AI, it will consume humanity. I believe that is a rather unlikely outcome and haven't yet seen any good reason to believe otherwise.
No, it states that we run the risk of accidentally making something that will consume (or exterminate, subvert, betray, make miserable, or otherwise Do Bad Things to) humanity, that looks perfectly safe and correct, right up until it's too late to do anything about it... and that this is the default case: the case if we don't do something extraordinary to prevent it.
This doesn't require self-improvement, and it doesn't require wiping out humanity. It just requires normal, every-day human error.
If the error is in the goal-oriented framework, it could end up "correcting" itself to achieve unintended goals.
An outstanding piece of reasoning/rhetoric which deserves to be revised and relocated to top-level-postdom.
Isn't that exactly the argument against non-proven AI values in the first place?
If you expect AI-chimp to be worried that AI-superchimp won't love bananas, then you should be very worried about AI-chimp.
I don't get what you're saying about the paperclipper.
It is a reason not to transcend if you are not sure that you'll still be you afterwards, i.e. keep your goals and values. I just wanted to point out that the argument runs both directions. It is an argument for the fragility of values and therefore the dangers of fooming but also an argument for the difficulty that could be associated with radically transforming yourself.