Changed the title.
Thanks, this is an important result showing that the dominating property really isn't enough to pick out a prior for a good agent. I like your example as a to-the-point explanation of the issue.
I think the post title is somewhat misleading, though: it sounds as though differences in instantiations of AIXI don't really matter, and they can all be arbitrarily stupid. Any chance of changing that? Perhaps to something like "Versions of AIXI can be arbitrarily stupid"?
Fair question.
My point is that if improving techniques could take you from (arbitrarily chosen percentages here) a 50% chance that an unfriendly AI would cause an existential crisis, to 25% chance that it would - you really didn't gain all that much, and the wiser course of action is still not to make the AI.
The actual percentages are wildly debatable, of course, but I would say that if you think there is any chance - no matter how small - of triggering ye olde existential crisis, you don't do it - and I do not believe that technique alone could get us anywhere close to that.
The ideas you propose in OP seem wise, and good for society - and wholly ineffective in actually stopping us from creating an unfriendly AI. The reason is simply that the complexity defies analysis, at least by human beings. The fear is that the unfriendliness arises from unintended design consequences, from unanticipated system effects rather than bugs in code or faulty intent.
It's a consequence of entropy - there are simply far, far more ways for something to get screwed up than for it to be right. So unexpected effects arising from complexity are far, far more likely to cause issues than be beneficial unless you can somehow correct for them - planning ahead only will get you so far.
Your OP suggests that we might be more successful if we got more of it right "the first time". But things this complex are not created finished, de novo - they are an iterative, evolutionary task. The training could well be helpful, but I suspect not for the reasons you suggested. The real trick is to design things so that when they go wrong, the system still works correctly. You have to plan for and expect failure, or that inevitable failure is the end of the line.
I disagree that "you really didn't gain all that much" in your example. There are possible numbers such that it's better to avoid producing AI, but (a) that may not be a lever which is available to us, and (b) AI done right would probably represent an existential eucatastrophe, greatly improving our ability to avoid or deal with future threats.
The flaws leading to an unexpectedly unfriendly AI certainly might trace back to a flaw in the design - but I think it is overly optimistic to believe that the human mind (or a group of minds, or perhaps any mind) is capable of reliably creating specs sufficient to avoid this. We already spend tremendous time on this sort of thing, and bad things still happen. You hold the shuttle up as an example of reliability done right (which it is) - but it still blew up, because not all of shuttle design is software. In the same way, the issue could arise from some environmental factor that alters the AI in a way that makes it unpredictable - power fluctuations, a bit flip, who knows. The world is a horribly non-deterministic place, from a human POV.
By way of analogy - consider weather prediction. We have worked on it for all of history, we have satellites and supercomputers - and we are still only capable of accurate predictions for a few days or a week, getting less and less accurate as we go. This isn't a case of making a mistake - it is a case of a very complex end-state arising from simple beginnings, and lacking the ability to make perfectly accurate predictions about some things. To put it another way - it may simply be that the problem is not computable, now or with any foreseeable technology.
I'm not sure quite what point you're trying to make:
- If you're arguing that with the best attempt in the world it might be we still get it wrong, I agree.
- If you're arguing that greater diligence and better techniques won't increase our chances, I disagree.
- If you're arguing something else, I've missed the point.
I think there may be an unfounded assumption here - that an unfriendly AI would be the results of some sort of bug, or coding errors that could be identified ahead of time and fixed.
I rather suspect those sorts of errors would not result in "unfriendly"; they would result in a crashing, nonsensical, or non-functional AI.
Presumably part of the reason the whole friendly/non-friendly thing is an issue is that our models of cognition are crude, and a ton of complex high-order behavior is a result of emergent properties in a system, not of explicit coding. I would expect the sort of error that accidentally turns an AI into a killer robot to be subtle enough that it is only comprehensible in hindsight, if then. (Note this does not mean intentionally making a hostile AI is all that hard. I can create hostility, or practical outcomes identical to it, without AI at all, so it stands to reason that could carry over.)
I'm not suggesting that the problems would come from what we normally think of as software bugs (though see the suggestion in this comment). I'm suggesting that they would come from a failure to specify the right things in a complex scenario -- and that this problem bears enough similarities to software bugs that they could be a good test bed for working out how to approach such problems.
To clarify, I was not critiquing the idea that we need to get "superintelligence unleashed on the world" correct the first try - that of course I do agree with. I was critiquing the more specific idea that we need to get AGI morality/safety correct the first try.
One could compare this to ICBM missile defense systems. The US (and other nations) have developed that tech, and it's a case where you have to get the deployed product right the first try. You can't test it in the real world, but you absolutely can do iterative development in simulation, and that really is the only sensible way to develop such tech. Formal verification is about as useful for AGI safety as it is for testing ICBM defense - not much use at all.
I'm not sure how much we are disagreeing here. I'm not proposing anything like formal verification. I think development in simulation is likely to be an important tool in getting it right the first time you go "live", but I also think there may be other useful general techniques/tools, and that it could be worth investigating them well in advance of need.
For more than a decade I have been systematically identifying error-prone programming habits—by reviewing the literature, by analyzing other people’s mistakes, and by analyzing my own mistakes—and redesigning my programming environment to eliminate those habits. For example, “escape” mechanisms, such as backslashes in various network protocols and % in printf, are error-prone: it’s too easy to feed “normal” strings to those functions and forget about the escape mechanism.
I switched long ago to explicit tagging of “normal” strings; the resulting APIs are wordy but no longer error-prone. The combined result of many such changes has been a drastic reduction in my own error rate. Starting in 1997, I offered $500 to the first person to publish a verifiable security hole in the latest version of qmail, my Internet-mail transfer agent; see http://cr.yp.to/qmail/guarantee.html. There are now more than a million Internet SMTP servers running qmail. Nobody has attempted to claim the $500. Starting in 2000, I made a similar $500 offer for djbdns, my DNS software; see http://cr.yp.to/djbdns/guarantee.html. This program is now used to publish the IP addresses for two million .com domains: citysearch.com, for example, and lycos.com. Nobody has attempted to claim the $500.
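The escape-mechanism pitfall and the tagged-string fix are easy to reproduce in miniature. A hedged sketch (all names here are hypothetical toy code, not qmail's actual API), using Python's printf-style `%` operator:

```python
# Error-prone API: treats its first argument as a printf-style format string,
# so a "normal" string containing '%' is silently misinterpreted.
def log_unsafe(fmt, *args):
    return fmt % args

# An ordinary message with a stray '%' blows up (or, worse, quietly
# produces wrong output):
try:
    log_unsafe("CPU at 100% load")
except (TypeError, ValueError):
    pass  # the '%' was parsed as the start of a format specifier

# Explicit tagging of format strings: wordier, but the API can no longer
# mistake an untagged "normal" string for a format string.
class Fmt:
    def __init__(self, template):
        self.template = template

def log_safe(fmt, *args):
    if not isinstance(fmt, Fmt):
        raise TypeError("format strings must be explicitly tagged with Fmt")
    return fmt.template % args

print(log_safe(Fmt("CPU at %d%% load"), 100))  # -> CPU at 100% load
```

The wordiness is the point: the tag forces the programmer to state intent, so the whole class of "forgot about the escape mechanism" bugs disappears.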
There were several non-security bugs in qmail, and a few in djbdns. My error rate has continued to drop since then. I’m no longer surprised to whip up a several-thousand-line program and have it pass all its tests the first time.
Bug-elimination research, like other user-interface research, is highly nonmathematical.
The goal is to have users, in this case programmers, make as few mistakes as possible in achieving their desired effects. We don’t have any way to model this—to model human psychology—except by experiment. We can’t even recognize mistakes without a human’s help. (If you can write a program to recognize a class of mistakes, great—we’ll incorporate your program into the user interface, eliminating those mistakes—but we still won’t be able to recognize the remaining mistakes.) I’ve seen many mathematicians bothered by this lack of formalization; they ask nonsensical questions like “How can you prove that you don’t have any bugs?” So I sneak out of the department, take off my mathematician’s hat, and continue making progress towards the goal.
http://cr.yp.to/cv/activities-20050107.pdf (apparently this guy's accomplishments are legendary in crypto circles)
http://www.fastcompany.com/28121/they-write-right-stuff
Personal experience: I found that I was able to reduce my bug rate pretty dramatically through moderate effort (~6 months of paying attention to what I was doing and trying to improve my workflow, without doing anything advanced like screencasting myself or even taking dedicated self-improvement time), and I think it could probably be reduced even further by adding more layers of process.
In any case, I think it makes sense to favor the development of bug reduction techs like version control, testing systems, type systems, etc. as part of a broad program of differential technological development. (I wonder how far you could go by analyzing almost every AGI failure mode as a bug of some sort, in the "do what I mean, not what I say" sense. The key issues being that bugs don't always manifest instantly and sometimes change behavior subtly instead of immediately halting program execution. Maybe the "superintelligence would have tricky bugs" framing would be an easier sell for AI risks to computer scientists. The view would imply that we need to learn to write bug free code, including anticipating & preventing all AGI-specific bugs like wireheading, before building an AGI.)
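The "bugs don't always manifest instantly" point can be made concrete with a small example. As a minimal sketch (my own toy code, nothing AGI-specific): Python's shared mutable default argument is a classic bug that passes its first test and only changes behavior subtly on later calls:

```python
# Latent bug: the default list is created once and shared across calls,
# so state silently accumulates instead of each call getting a fresh log.
def record_event(event, log=[]):
    log.append(event)
    return log

first = record_event("boot")        # looks correct on the first call
second = record_event("shutdown")   # silently reuses the same list
assert first is second              # both names point at the shared list
assert first == ["boot", "shutdown"]

# The fix makes the "fresh log per call" intent explicit:
def record_event_fixed(event, log=None):
    if log is None:
        log = []
    log.append(event)
    return log

assert record_event_fixed("boot") == ["boot"]
assert record_event_fixed("shutdown") == ["shutdown"]
```

A program with this bug runs fine at first and degrades later, which is exactly the failure shape that makes "get it right before deployment" hard.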
See also: My proposal for how to structure FAI development.
Thanks, this is a great collection of relevant information.
I agree with your framing of this as differential tech development. Do you have any thoughts on the best routes to push on this?
I will want to think more about framing AGI failures as (subtle) bugs. My initial impression is positive, but I have some worry that it would introduce a new set of misconceptions.
Most challenges we can approach with trial-and-error, so many of our habits and social structures are set up to encourage this.
This hasn't always been the case. Throughout history, leaders have had to get it right the first time, especially in wartime. I'd bet that someone more versed in history than I could give lots of examples. Granted, in our current society, saturated with complexity that strains the limits of human comprehension, trial-and-error seems like a viable way. But hierarchical processes seem to work quite well for major human endeavors like the Manhattan Project.
Good point that this hasn't always been the case. However, we also know that people made a lot of mistakes in some of these cases. It would be great to work out how we can best approach such challenges in the future.
Giving an artificial intelligence good values may be a particularly important challenge, and one where we need to be correct first time.
This view is probably completely mistaken, for two separate reasons:
- We can test AI architectures at different levels of scaling. A human brain is just a scaled-up primate brain, which suggests that all the important features of how value acquisition works - empathy, altruism, value alignment, whatever - can be tested first in AGI that is near human level.
- We have already encountered numerous large-scale 'one-shot' engineering challenges, and there is already an extremely effective general solution: if you have a problem that you have to get right on the first try, you change it into an iterative problem by creating a simulation framework. Doing that for AGI may involve creating the Matrix, more or less, but that isn't necessarily any more complex than creating AGI in the first place.
To me these look like (pretty good) strategies for getting something right the first time, not in opposition to the idea that this would be needed.
They do suggest that an environment which is richer than just "submit perfect code without testing" might be a better training ground.
Do you think you'd use this out of interest Owen?
Maybe, if it had good enough UI and enough features?
I feel like it's quite a narrow target / high bar, competing with back-of-the-envelope/whiteboard at one end (for ease of use) and a software package that does Monte Carlo simulations properly at the other.