Eliezer_Yudkowsky comments on Tiling Agents for Self-Modifying AI (OPFAI #2) - Less Wrong
The LW post may address some of your concerns. The idea here is that we need a tiling decision criterion, and the paper isn't supposed to be an AI design, it's supposed to get us a little conceptually closer to a tiling decision criterion. If you don't understand why a tiling decision criterion is a good thing in a self-improving AI which is supposed to have a stable goal system, then I'm not quite sure what issue needs addressing.
Thanks for your courtesy, and again, sorry for not being more specific in my original comment.
Yes, I'm questioning why a self-improving AI which is intended to have a stable goal system needs a tiling decision criterion. In your publication, you wrote
I don't see why the model of the sequence of agents is a good operationalization. My intuition is that
To elaborate, and for concreteness, I'll comment on
I haven't read the technical portions of the paper, but my surface impression is that the operationalization in the paper is analogous to modifying your arms by successively shaving slivers of tissue off of them and grafting slivers of tissue onto them, with a view toward making them really long. Another way to go would be to grow the long arms in a lab, chop off your current arms, and then graft the newly created long arms onto yourself. In the context of self-modifying AIs, the latter possibility seems to me to be significantly more likely than the former.
Is my surface impression of the operationalization right? If so, what do you think about the points that I raise in the previous paragraphs?
Jonah, some self-modifications will potentially be large, but others might be smaller. More importantly we don't want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals. Most of the core idea in this paper is to prevent those kinds of drastic or deleterious changes from being forced by a self-modification.
But it's also possible that there'll be many gains from small self-modifications, and it would be nicer not to need a special case for those. For this it is good to have, in theoretical principle, a highly regular bit of cognition/verification that needs to be done for the change (e.g. for logical agents, the proof of a certain theorem), so that small local changes only call for small bits of the verification to be reconsidered.
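The idea that small local changes should only require re-checking small bits of the verification can be illustrated with a toy sketch. It assumes, hypothetically, that an agent's code splits into named components and that its safety argument splits into obligations, each depending on a known subset of those components. `IncrementalVerifier`, `check`, and the component names are illustrative inventions, not anything from the paper, and `check` is a stand-in for a real proof checker.

```python
import hashlib
from typing import Callable, Dict, List, Set, Tuple

Fingerprint = Tuple[Tuple[str, str], ...]


def digest(text: str) -> str:
    """Content hash of one component's source."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class IncrementalVerifier:
    """Hypothetical sketch: re-check only the obligations whose inputs changed."""

    def __init__(
        self,
        obligations: Dict[str, Set[str]],
        check: Callable[[str, Dict[str, str]], bool],
    ) -> None:
        self.obligations = obligations  # obligation name -> components it depends on
        self.check = check  # stand-in for a real verifier, e.g. a proof checker
        self.cache: Dict[str, Tuple[Fingerprint, bool]] = {}
        self.rechecked: List[str] = []  # obligations re-verified on the last call

    def verify(self, components: Dict[str, str]) -> bool:
        self.rechecked = []
        all_ok = True
        for name, deps in self.obligations.items():
            # Fingerprint only the components this obligation depends on.
            fingerprint = tuple(sorted((d, digest(components[d])) for d in deps))
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fingerprint:
                ok = cached[1]  # inputs unchanged: reuse the earlier result
            else:
                ok = self.check(name, {d: components[d] for d in deps})
                self.cache[name] = (fingerprint, ok)
                self.rechecked.append(name)
            all_ok = all_ok and ok
        return all_ok
```

After a first full pass, editing one component re-verifies only the obligations that depend on it, which is the kind of regularity the comment argues for. Real proof-based verification would need dependency tracking far finer than whole components.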
Another way of looking at it is that we're trying to have the AI be as free as possible to self-modify while still knowing that it's sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.
Thanks for engaging.
I'm very sympathetic to this in principle, but don't see why there would be danger of these things in practice.
Humans constantly perform small self-modifications, and this doesn't cause serious problems. People's goals do change, but not drastically, and people who are determined can generally keep their goals pretty close to their original goals. Why do you think that AI would be different?
To ensure that one gets a Friendly AI, it suffices to start with a good goal system and to ensure that the goal system remains pretty stable over time. It's not necessary that the AI be as free as possible.
You might argue that a limited AI wouldn't be able to realize as good a future as one without limitations.
But if this is the concern, why not work to build a limited AI that can itself solve the problem of keeping a goal system stable under small modifications? Or, if it's not possible to get a superhuman AI subject to such limitations, why not build a subhuman AI and then work in conjunction with it to build a Friendly AI that's as free as possible?
Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them, and we can make a start on exposing some of these gotchas by figuring out how to do things using unbounded computing power (albeit this is not a reliable way of exposing all gotchas, especially in the hands of somebody who prefers to hide difficulties, or even someone who makes a mistake about how a mathematical object behaves, but it sure beats leaving everything up to verbal arguments).
Human beings don't make billions of sequential self-modifications, so they're not existence proofs that human-quality reasoning is good enough for that.
I'm not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations. If this is a widespread reaction beyond yourself then it might not be too hard to get a quote from Peter Norvig or a similar mainstream authority that, "No, actually, you can't take that sort of thing for granted, and while what MIRI's doing is incredibly preliminary, just leaving this in a state of verbal argument is probably not a good idea."
Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that's more effort than you'd want to put into this exact point.
I don't disagree (though I think that I'm less confident on this point than you are).
Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?
I agree that it can't be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?
The paper is meant to be interpreted within an agenda of "Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty"; not as "We think this Godelian difficulty will block AI", nor "This formalism would be good for an actual AI", nor "A bounded probabilistic self-modifying agent would be like this, only scaled up and with some probabilistic and bounded parts tacked on". If that's not what you meant, please clarify.
Ok, that is what I meant, so your comment has helped me better understand your position.
Why do you think that
is cost-effective relative to other options on the table?
For "other options on the table," I have in mind things such as spreading rationality, building the human capital of people who care about global welfare, increasing the uptake of important information into the scientific community, and building transferable skills and connections for later use.
Personally, I feel like that kind of metawork is very important, but that somebody should also be doing something that isn't just metawork. If there's nobody making concrete progress on the actual problem that we're supposed to be solving, there's a major risk of the whole thing becoming a lost purpose, as well as of potentially-interested people wandering off to somewhere where they can actually do something that feels more real.
From inside MIRI, I've been able to feel this one viscerally as genius-level people come to me and say "Wow, this has really opened my eyes. Where do I get started?" and (until now) I've had to reply "Sorry, we haven't written down our technical research agenda anywhere" and so they go back to machine learning or finance or whatever because no, they aren't going to learn 10 different fields and become hyper-interdisciplinary philosophers working on important but slippery meta stuff like Bostrom and Shulman.
Yes, that's a large de-facto part of my reasoning.
I think that in addition to this being true, it is also how it looks from the outside -- at least, it's looked that way to me, and I imagine many others who have been concerned about SI focusing on rationality and fanfiction are coming from a similar perspective. It may be the case that without the object-level benefits, the boost to MIRI's credibility from being seen to work on the actual technical problem wouldn't justify the expense of doing so, but whether or not it would be enough to justify the investment by itself, I think it's a really significant consideration.
[ETA: Of course, in the counterfactual where working on the object problem actually isn't that important, you could try to explain this to people and maybe that would work. But since I think that it is actually important, I don't particularly expect that option to be available.]
I understand your position, but believe that your concerns are unwarranted, though I don't think that this is obvious.
BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI's 2013 strategy (which focuses heavily on FAI research). So it's not as though I think FAI research is obviously the superior path, and it's also not as though we haven't thought through all these different options, and gotten feedback from dozens of people about those options, and so on.
Also note that MIRI did, in fact, decide to focus on (1) spreading rationality, and (2) building a community of people who care about rationality, the far future, and x-risk, before turning its head to FAI research: see (in chronological order) the Singularity Summit, Less Wrong and CFAR.
But the question of which interventions are most cost-effective (given astronomical waste) is a huge and difficult topic, one that will require thousands of hours to examine properly. Building on Beckstead and Bostrom, I've tried to begin that examination here. Before jumping over to that topic, I wonder: do you now largely accept the case Eliezer made for this latest paper as an important first step on an important sub-problem of the Friendly AI problem? And if not, why not?
My comments were addressed at Eliezer's paper specifically, rather than MIRI's general strategy, or your own views.
Sure – what I'm thinking about is cost-effectiveness at the margin.
Based on Eliezer's recent comments, my impression is that Eliezer is not making such a case, and is rather making a case for the paper being of sociological/motivational value. Is your understanding different?
That sounds like a very long conversation if we're supposed to be giving quantitative estimates on everything. The qualitative version is just that this sort of thing can take a long time, may not parallelize easily, and can potentially be partially factored out to academia, and so it is wise to start work on it as soon as you've got enough revenue to support even a small team, so long as you can continue to scale your funding while that's happening.
This reply takes for granted that all astronomical benefits bottleneck through a self-improving AI at some point.
Thanks for clarifying your position.
My understanding based on what you say is that the research in your paper is intended to spearhead a field of research, rather than to create something that will be directly used for friendliness in the first AI. Is this right?
If so, our differences are about the sociology of the scientific, technological and political infrastructure rather than about object level considerations having to do with AI.
For readers who want to read more about this point, see FAI Research as Effective Altruism.
Other options on the table are not mutually exclusive. There is a lot of wealth and intellectual brain power in the world, and a lot of things to work on. We can't and shouldn't all work on one most important problem. We can't all work on the thousand most important problems. We can't even agree on what those problems are.
I suspect Eliezer has a comparative advantage in working on this type of AI research, and he's interested in it, so it makes sense for him to work on this. It especially makes sense to the extent that this is an area no one else is addressing. We're only talking about an expenditure of several careers and a few million dollars. Compared to the world economy, or even compared to the non-profit sector, this is a drop in the bucket.
Now if instead Eliezer was the 10,000th smart person working on string theory, or if there was an Apollo-style government-funded initiative to develop an FAI by 2019, then my estimate of the comparative advantage of MIRI would shift. But given the facts as they are, MIRI seems like a plausible use of the limited resources it consumes.
If Eliezer feels that this is his comparative advantage, then it's fine for him to work on this sort of research; I'm not advocating that such research be stopped. My own impression is that Eliezer has a comparative advantage in spreading rationality and that he could have a bigger impact by focusing on doing so.
I'm not arguing that such research shouldn't be funded. The human capital question is genuinely more dicey, insofar as I think that Eliezer has contributed substantial value through his work on spreading rationality, and my best guess is that the opportunity cost of not doing more is large.
"...need to make billions of sequential self-modifications when humans don't need to" to do what? Exist, maximize utility, complete an assignment, fulfill a desire...? Some of those might be better termed as "wants" than "needs" but that info is just as important in predicting behavior.
For starters, humans aren't able to make changes as easily as an AI can. We don't have direct access to source code that we can change effortlessly; any change we make costs time, money, or both.
That doesn't address the question. It says that an AI could more easily make self-modifications. It doesn't suggest that an AI needs to make such self-modifications. Human intelligence is an existence proof that human-level intelligence does not require "billions of sequential self-modifications". Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.
So I reiterate, "Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?"
Human intelligence required billions of sequential modifications (though not self-modifications). An AI in general would not need self-modifications, but for an AGI it seems that they would be necessary. I don’t doubt that a more formal argument for the latter statement has been written before by someone smarter than me, but a very informal argument would be something like this:
If an AGI doesn’t need to self-modify, then that AGI is already perfect (or close enough that it couldn’t possibly matter). Since practically no software humans have built has ever been perfect in all respects, that seems exceedingly unlikely. Therefore, the first AGI would (very likely) need to be modified. Of course, at the beginning it might be modified by humans (thus, not self-modified), but the point of building AGI is to make it smarter than us. Once it is smarter than us by a certain amount, it wouldn’t make sense for us (stupider intellects) to improve it (a smarter intellect). Thus, it would need to self-modify, and do so a lot, unless by some ridiculously fortuitous accident of math (a) human intelligence is very close to the ideal, or (b) human intelligence will build something very close to the ideal on the first try.
It would be nice if those modifications would be things that are good for us, even if we can’t understand them.
FWIW, Jonah has a PhD in math and has probably read Pearl or a similar graphical models book.
(Not directly relevant to the conversation, but just trying to lower your probability estimate that Jonah's objections are naive.)
I don't see it as a decisive point, just one of "many weak arguments," but I think the analogy with human self-modification is relevant. I would like to see more detailed discussion of the issue.
Aspects of this that seem relevant to me:
Self-modification is to be interpreted to include 'directly editing one's own low-level algorithms using high-level deliberative process' but not include 'changing one's diet to change one's thought processes'. If you are uncomfortable using the word 'self-modification' for this please substitute a new word 'fzoom' which means only that and consider everything I said about self-modification to be about fzoom.
Humans wouldn't look at their own source code and say, "Oh dear, a Lobian obstacle", on this I agree, but this is because humans would look at their own source code and say "What?". Humans have no idea under what exact circumstances they will believe something, which comes with its own set of problems. The Lobian obstacle shows up when you approach things from the end we can handle, namely weak but well-defined systems which can well-define what they will believe, whereas human mathematicians are stronger than ZF plus large cardinals but we don't know how they work or what might go wrong or what might change if we started editing neural circuit #12,730,889,136.
As Christiano's work shows, allowing for tiny finite variances of probability might well dissipate the Lobian obstacle, but that's the sort of thing you find out by knowing what a Lobian obstacle is.
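For readers unfamiliar with the term, the "Lobian obstacle" traces to Löb's theorem. A standard statement, for a recursively axiomatized theory T extending Peano Arithmetic with provability predicate \Box_T, is given below. Roughly, the consequence that matters here is that T cannot prove the soundness schema \Box_T \varphi \rightarrow \varphi for every \varphi unless T is inconsistent, so an agent reasoning in T cannot straightforwardly trust a successor that also reasons in T.

```latex
\text{If } T \vdash \Box_T \varphi \rightarrow \varphi, \text{ then } T \vdash \varphi.

\text{Hence, if } T \vdash \Box_T \varphi \rightarrow \varphi \text{ for every } \varphi,
\text{ then taking } \varphi = \bot \text{ gives } T \vdash \bot.
```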
Very helpful. This seems like something that could lead to a satisfying answer to my question. And don't worry, I won't engage in a terminological dispute about "self-modification."
Can you clarify a bit what you mean by "low-level algorithms"? I'll give you a couple of examples related to what I'm wondering about.
Suppose I am working with a computer to make predictions about the weather, and we consider the operations of the computer along with my brain as a single entity for the purposes of testing whether the Lobian obstacles you are thinking of arise in practice. Now suppose I make basic modifications to the computer, expecting that the joint operation of my brain with the computer will yield improved output. This will not cause me to trip over Lobian obstacles. Why does whatever concern you have about the Lob problem predict that it would not, but also predict that future AIs might stumble over the Lob problem?
Another example. Humans learn different mental habits without stumbling over Lobian obstacles, and they can convince themselves that adopting the new mental habits is an improvement. Some of these are more derivative ("Don't do X when I have emotion Y") and others are perhaps more basic ("Try to update through explicit reasoning via Bayes' Rule in circumstances C"). Why does whatever concern you have about the Lob problem predict that humans can make these modifications without stumbling, but also predict that future AIs might stumble over the Lob problem?
If the answer to both examples is "those are not cases of directly editing one's low-level algorithms using high-level deliberative processes," can you explain why your concern about Lobian issues only arises in that type of case? This is not me questioning your definition of "fzoom," it is my asking why Lobian issues only arise when you are worrying about fzoom.
The first example is related to what I had in mind when I talked about fundamental epistemic standards in a previous comment:
Well, part of this is because modern humans are monstrous in the eyes of many pre-modern humans. To them, the future has been lost because they weren't using a self-modification procedure that provably preserved their values.