Eliezer_Yudkowsky comments on Tiling Agents for Self-Modifying AI (OPFAI #2) - Less Wrong

55 Post author: Eliezer_Yudkowsky 06 June 2013 08:24PM


Comment author: Eliezer_Yudkowsky 06 June 2013 03:53:05AM 7 points [-]

Jonah, some self-modifications will potentially be large, but others might be smaller. More importantly we don't want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals. Most of the core idea in this paper is to prevent those kinds of drastic or deleterious changes from being forced by a self-modification.

But it's also possible that there'll be many gains from small self-modifications, and it would be nicer not to need a special case for those, and for this it is good to have (in theoretical principle) a highly regular bit of cognition/verification that needs to be done for the change (e.g. for logical agents the proof of a certain theorem) so that small local changes only call for small bits of the verification to be reconsidered.

Another way of looking at it is that we're trying to have the AI be as free as possible to self-modify while still knowing that it's sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.

Comment author: JonahSinick 06 June 2013 04:16:11AM 0 points [-]

Thanks for engaging.

More importantly we don't want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals.

I'm very sympathetic to this in principle, but don't see why there would be danger of these things in practice.

But it's also possible that there'll be many gains from small self-modifications,

Humans constantly perform small self-modifications, and this doesn't cause serious problems. People's goals do change, but not drastically, and people who are determined can generally keep their goals pretty close to their original goals. Why do you think that AI would be different?

Another way of looking at it is that we're trying to have the AI be as free as possible to self-modify while still knowing that it's sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.

To ensure that one gets a Friendly AI, it suffices to start with a good goal system, and to ensure that the goal system remains pretty stable over time. It's not necessary that the AI be as free as possible.

You might argue that a limited AI wouldn't be able to realize as good a future as one without limitations.

But if this is the concern, why not work to build a limited AI that can itself solve the problems about having a stable goal system under small modifications? Or, if it's not possible to get a superhuman AI subject to such limitations, why not build a subhuman AI and then work in conjunction with it to build Friendly AI that's as free as possible?

Comment author: Eliezer_Yudkowsky 06 June 2013 05:03:38AM 5 points [-]

Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them, and we can make a start on exposing some of these gotchas by figuring out how to do things using unbounded computing power (albeit this is not a reliable way of exposing all gotchas, especially in the hands of somebody who prefers to hide difficulties, or even someone who makes a mistake about how a mathematical object behaves, but it sure beats leaving everything up to verbal arguments).

Human beings don't make billions of sequential self-modifications, so they're not existence proofs that human-quality reasoning is good enough for that.

I'm not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations. If this is a widespread reaction beyond yourself then it might not be too hard to get a quote from Peter Norvig or a similar mainstream authority that, "No, actually, you can't take that sort of thing for granted, and while what MIRI's doing is incredibly preliminary, just leaving this in a state of verbal argument is probably not a good idea."

Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that's more effort than you'd want to put into this exact point.

Comment author: JonahSinick 06 June 2013 05:21:16AM *  1 point [-]

Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them

I don't disagree (though I think that I'm less confident on this point than you are).

Human beings don't make billions of sequential self-modifications, so they're not existence proofs that human-quality reasoning is good enough for that.

Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?

I'm not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations.

I agree that it can't be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?

Comment author: Eliezer_Yudkowsky 06 June 2013 06:02:27AM 7 points [-]

I agree that it can't be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?

The paper is meant to be interpreted within an agenda of "Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty"; not as "We think this Godelian difficulty will block AI", nor "This formalism would be good for an actual AI", nor "A bounded probabilistic self-modifying agent would be like this, only scaled up and with some probabilistic and bounded parts tacked on". If that's not what you meant, please clarify.

Comment author: JonahSinick 06 June 2013 06:13:46AM 2 points [-]

Ok, that is what I meant, so your comment has helped me better understand your position.

Why do you think that

Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty

is cost-effective relative to other options on the table?

For "other options on the table," I have in mind things such as spreading rationality, building the human capital of people who care about global welfare, increasing the uptake of important information into the scientific community, and building transferable skills and connections for later use.

Comment author: Kaj_Sotala 06 June 2013 07:32:35AM *  21 points [-]

Personally, I feel like that kind of metawork is very important, but that somebody should also be doing something that isn't just metawork. If there's nobody making concrete progress on the actual problem that we're supposed to be solving, there's a major risk of the whole thing becoming a lost purpose, as well as of potentially-interested people wandering off to somewhere where they can actually do something that feels more real.

Comment author: lukeprog 06 June 2013 01:04:19PM *  13 points [-]

as well as of potentially-interested people wandering off to somewhere where they can actually do something that feels more real

From inside MIRI, I've been able to feel this one viscerally as genius-level people come to me and say "Wow, this has really opened my eyes. Where do I get started?" and (until now) I've had to reply "Sorry, we haven't written down our technical research agenda anywhere" and so they go back to machine learning or finance or whatever because no, they aren't going to learn 10 different fields and become hyper-interdisciplinary philosophers working on important but slippery meta stuff like Bostrom and Shulman.

Comment author: Eliezer_Yudkowsky 06 June 2013 07:41:10AM 6 points [-]

Yes, that's a large de-facto part of my reasoning.

Comment author: Benja 06 June 2013 05:39:01PM *  5 points [-]

I think that in addition to this being true, it is also how it looks from the outside -- at least, it's looked that way to me, and I imagine many others who have been concerned about SI focusing on rationality and fanfiction are coming from a similar perspective. It may be the case that without the object-level benefits, the boost to MIRI's credibility from being seen to work on the actual technical problem wouldn't justify the expense of doing so, but whether or not it would be enough to justify the investment by itself, I think it's a really significant consideration.

[ETA: Of course, in the counterfactual where working on the object problem actually isn't that important, you could try to explain this to people and maybe that would work. But since I think that it is actually important, I don't particularly expect that option to be available.]

Comment author: Kaj_Sotala 06 June 2013 06:20:55PM *  1 point [-]

Yes. I've had plenty of conversations with people who were unimpressed with MIRI, in part because the organization looked like it was doing nothing but idle philosophy. (Of course, whether that was the true rejection of the skeptics in question is another matter.)

Comment author: JonahSinick 06 June 2013 04:08:40PM *  0 points [-]

I understand your position, but believe that your concerns are unwarranted, though I don't think that this is obvious.

Comment author: lukeprog 06 June 2013 05:22:57PM 7 points [-]

If I gave you a list of people who in fact expressed interest but then, when there were no technical problems for them to work on, "wandered off to somewhere where they can actually do something that feels more real," would you change your mind? (I may not be able to produce such a list, because I wasn't writing down people's names as they wandered away, but I might be able to reconstruct it.)

Comment author: [deleted] 06 June 2013 05:33:55PM *  0 points [-]

Sounds like me two years ago, before I committed to finishing my doctorate. Oops.

Comment author: JonahSinick 06 June 2013 05:33:21PM 0 points [-]

I don't doubt you: I have different reasons for believing Kaj's concerns to be unwarranted:

  1. It's not clear to me that offering people problems in mathematical logic is a good way to get people to work on Friendly AI problems. I think that the mathematical logic work is pretty far removed from the sort of work that will be needed for friendliness.

  2. I believe that people who are interested in AI safety will not forget about AI safety entirely, independently of whether they have good problems to work on now.

  3. I believe that people outside of MIRI will organically begin to work on AI safety without MIRI's advocacy when AI is temporally closer.

Comment author: lukeprog 06 June 2013 01:00:17PM *  6 points [-]

cost-effective relative to other options on the table

BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI's 2013 strategy (which focuses heavily on FAI research). So it's not as though I think FAI research is obviously the superior path, and it's also not as though we haven't thought through all these different options, and gotten feedback from dozens of people about those options, and so on.

Also note that MIRI did, in fact, decide to focus on (1) spreading rationality, and (2) building a community of people who care about rationality, the far future, and x-risk, before turning its head to FAI research: see (in chronological order) the Singularity Summit, Less Wrong and CFAR.

But the question of which interventions are most cost effective (given astronomical waste) is a huge and difficult topic, one that will require thousands of hours to examine properly. Building on Beckstead and Bostrom, I've tried to begin that examination here. Before jumping over to that topic, I wonder: do you now largely accept the case Eliezer made for this latest paper as an important first step on an important sub-problem of the Friendly AI problem? And if not, why not?

Comment author: JonahSinick 06 June 2013 04:06:52PM *  0 points [-]

BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI's 2013 strategy (which focuses heavily on FAI research). So it's not as though I think FAI research is obviously the superior path, and it's also not as though we haven't thought through all these different options, and gotten feedback from dozens of people about those options, and so on.

My comments were addressed at Eliezer's paper specifically, rather than MIRI's general strategy, or your own views.

Also note that MIRI did, in fact, decide to focus on (1) spreading rationality, and (2) building a community of people who care about rationality, the far future, and x-risk, before turning its head to FAI research: see (in chronological order) the Singularity Summit, Less Wrong and CFAR.

Sure – what I'm thinking about is cost-effectiveness at the margin.

Before jumping over to that topic, I wonder: do you now largely accept the case Eliezer made for this latest paper as an important first step on an important sub-problem of the Friendly AI problem? And if not, why not?

Based on Eliezer's recent comments, my impression is that Eliezer is not making such a case, and is rather making a case for the paper being of sociological/motivational value. Is your understanding different?

Comment author: Eliezer_Yudkowsky 06 June 2013 08:08:29PM 2 points [-]

Based on Eliezer's recent comments, my impression is that Eliezer is not making such a case, and is rather making a case for the paper being of sociological/motivational value.

No, that's not what I've been saying at all.

I'm sorry if this seems rude in some sense, but I need to inquire after your domain knowledge at this point. What is your level of mathematical literacy and do you have any previous acquaintance with AI problems? It may be that, if we're to proceed on this disagreement, MIRI should try to get an eminent authority in the field to briefly confirm basic, widespread, and correct ideas about the relevance of doing math to AI, rather than us trying to convince you of that via object-level arguments that might not be making any sense to you.

By 'the relevance of math to AI' I don't mean mathematical logic, I mean the relevance of trying to reduce an intuitive concept to a crisp form. In this case, like it says in the paper and like it says in the LW post, FOL is being used not because it's an appropriate representational fit to the environment... though as I write this, I realize that may sound like random jargon on your end... but because FOL has a lot of standard machinery for self-reflection of which we could then take advantage, like the notion of Godel numbering or ZF proving that every model entails every tautology... which probably doesn't mean anything to you either. But then I'm not sure how to proceed; if something can't be settled by object-level arguments then we probably have to find an authority trusted by you, who knows about the (straightforward, common) idea of 'crispness is relevant to AI' and can quickly skim the paper and confirm 'this work crispifies something about self-modification that wasn't as crisp before' and testify that to you. This sounds like a fair bit of work, but I expect we'll be trying to get some large names to skim the paper anyway, albeit possibly not the Early Draft for that.
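
The Lobian machinery referred to above can be stated compactly. This is the standard textbook formulation of Lob's theorem, not the paper's exact notation:

```latex
% Lob's theorem: for a recursively axiomatized theory T extending PA,
% with the standard provability predicate Prov_T and Godel quoting <.>:
%
%   If  T |- Prov_T(<P>) -> P,  then  T |- P.
%
\text{If } T \vdash \mathrm{Prov}_T(\ulcorner P \urcorner) \rightarrow P,
\text{ then } T \vdash P.
```

Informally: a sufficiently strong theory can never prove "if I prove P, then P" for any P it does not already prove outright, which is why naive self-trust schemas run into trouble.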

Comment author: [deleted] 06 June 2013 08:15:27PM 2 points [-]

I need to inquire after your domain knowledge at this point. What is your level of mathematical literacy and do you have any previous acquaintance with AI problems?

Quick Googling suggest someone named "Jonah Sinick" is a mathematician in number theory. It appears to be the same person.

Comment author: JonahSinick 06 June 2013 09:47:00PM 2 points [-]

No, that's not what I've been saying at all.

Ok, I look forward to better understanding :-)

What is your level of mathematical literacy and do you have any previous acquaintance with AI problems?

I have a PhD in pure math, I know the basic theory of computation and of computational complexity, but I don't have deep knowledge of these domains, and I have no acquaintance with AI problems.

It may be that, if we're to proceed on this disagreement, MIRI should try to get an eminent authority in the field to briefly confirm basic, widespread, and correct ideas about the relevance of doing math to AI, rather than us trying to convince you of that via object-level arguments that might not be making any sense to you.

Yes, this could be what's most efficient. But my sense is that our disagreement is at a non-technical level rather than at a technical level.

My interpretation of

The paper is meant to be interpreted within an agenda of "Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty"; not as "We think this Godelian difficulty will block AI", nor "This formalism would be good for an actual AI", nor "A bounded probabilistic self-modifying agent would be like this, only scaled up and with some probabilistic and bounded parts tacked on".

was that you were asserting only very weak confidence in the relevance of the paper to AI safety, and that you were saying "Our purpose in writing this was to do something that could conceivably have something to do with AI safety, so that people take notice and start doing more work on AI safety." Thinking it over, I realize that you might have meant "We believe that this paper is an important first step on a technical level." Can you clarify here?

If the latter interpretation is right, I'd recur to my question about why the operationalization is a good one, which I feel that you still haven't addressed, and which I see as crucial.

Comment author: ESRogs 06 June 2013 10:08:18PM *  1 point [-]

Why do you think that

Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty

is cost-effective relative to other options on the table?

...

BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI's 2013 strategy (which focuses heavily on FAI research). So it's not as though I think FAI research is obviously the superior path, and it's also not as though we haven't thought through all these different options, and gotten feedback from dozens of people about those options, and so on.

My comments were addressed at Eliezer's paper specifically, rather than MIRI's general strategy, or your own views.

Do you not see that what Luke wrote was a direct response to your question?

There are really two parts to the justification for working on this paper: 1) Direct FAI research is a good thing to do now. 2) This is a good problem to work on within FAI research. Luke's comment gives context explaining why MIRI is focusing on direct FAI research, in support of 1. And it's clear from what you list as other options that you weren't asking about 2.

It sounds like what you want is for this problem to be compared on its own to every other possible intervention. In theory that would be the rational thing to do to ensure you were always doing the most cost-effective work on the margin. But that only makes sense if it's computationally practical to do that evaluation at every step.

What MIRI has chosen to do instead is to invest some time up front coming up with a strategic plan, and then follow through on that. This seem entirely reasonable to me.

Comment author: JonahSinick 06 June 2013 10:12:10PM *  -2 points [-]

...

If the probability is too small, then it isn't worth it. The activities that I mention plausibly reduce astronomical waste to a nontrivial degree. Arguing that you can do better than them requires an argument that establishes the expected impact of MIRI Friendly AI research on AI safety above a nontrivial threshold.

Do you not see that what Luke wrote was a direct response to your question?

Which question?

Luke's comment gives context explaining why MIRI is focusing on direct FAI research, in support of 1.

Sure, I acknowledge this.

It sounds like what you want is for this problem to be compared on its own to every other possible intervention. In theory that would be the rational thing to do to ensure you were always doing the most cost-effective work on the margin. But that only makes sense if it's computationally practical to do that evaluation at every step.

I don't think that it's computationally intractable to come up with better alternatives. Indeed, I think that there are a number of concrete alternatives that are better.

What MIRI has chosen to do instead is to invest some time up front coming up with a strategic plan, and then follow through on that. This seem entirely reasonable to me.

I wasn't disputing this. I was questioning the relevance of MIRI's current research to AI safety, not saying that MIRI's decision process is unreasonable.

Comment author: lukeprog 06 June 2013 05:17:04PM 0 points [-]

The way I'm using these words, my "this latest paper as an important first step on an important sub-problem of the Friendly AI problem" is equivalent to Eliezer's "begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty."

Comment author: JonahSinick 06 June 2013 05:26:08PM *  0 points [-]

Ok. I disagree that the paper is an important first step.

Because Eliezer is making an appeal based on psychological and sociological considerations, spelling out my reasoning requires discussing what sorts of efforts are likely to impact the scientific community, and whether one can expect such research to occur by default. That in turn requires engaging with psychology, sociology and economics, partly as related to whether the world's elites will navigate the creation of AI just fine.

I've described a little bit of my reasoning, and will be elaborating on it in detail in future posts.

Comment author: Eliezer_Yudkowsky 06 June 2013 06:40:23AM 7 points [-]

That sounds like a very long conversation if we're supposed to be giving quantitative estimates on everything. The qualitative version is just that this sort of thing can take a long time, may not parallelize easily, and can potentially be partially factored out to academia, and so it is wise to start work on it as soon as you've got enough revenue to support even a small team, so long as you can continue to scale your funding while that's happening.

This reply takes for granted that all astronomical benefits bottleneck through a self-improving AI at some point.

Comment author: JonahSinick 06 June 2013 03:53:11PM *  3 points [-]

Thanks for clarifying your position.

My understanding based on what you say is that the research in your paper is intended to spearhead a field of research, rather than to create something that will be directly used for friendliness in the first AI. Is this right?

If so, our differences are about the sociology of the scientific, technological and political infrastructure rather than about object level considerations having to do with AI.

Comment author: Eliezer_Yudkowsky 06 June 2013 08:01:53PM 4 points [-]

Sounds about right. You might mean a different thing from "spearhead a field of research" than I do, my phrasing would've been "Start working on the goddamned problem."

From your other comments I suspect that you have a rather different visualization of object-level considerations to do with AI and this is relevant to your disagreement.

Comment author: JonahSinick 06 June 2013 08:04:06PM 1 point [-]

Ok. I think that MIRI could communicate more clearly by highlighting this. My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Lob problem in order to build a Friendly AI. Is there anything in the public domain that would have suggested otherwise to me? If not, I'd suggest writing this up and highlighting it.

Comment author: lukeprog 06 June 2013 01:06:43PM 1 point [-]

takes for granted that all astronomical benefits bottleneck through a self-improving AI at some point

For readers who want to read more about this point, see FAI Research as Effective Altruism.

Comment author: elharo 07 June 2013 11:55:25AM *  0 points [-]

Other options on the table are not mutually exclusive. There is a lot of wealth and intellectual brain power in the world, and a lot of things to work on. We can't and shouldn't all work on one most important problem. We can't all work on the thousand most important problems. We can't even agree on what those problems are.

I suspect Eliezer has a comparative advantage in working on this type of AI research, and he's interested in it, so it makes sense for him to work on this. It especially makes sense to the extent that this is an area no one else is addressing. We're only talking about an expenditure of several careers and a few million dollars. Compared to the world economy, or even compared to the non-profit sector, this is a drop in the bucket.

Now if instead Eliezer was the 10,000th smart person working on string theory, or if there was an Apollo-style government-funded initiative to develop an FAI by 2019, then my estimate of the comparative advantage of MIRI would shift. But given the facts as they are, MIRI seems like a plausible use of the limited resources it consumes.

Comment author: JonahSinick 07 June 2013 03:42:32PM 1 point [-]

I suspect Eliezer has a comparative advantage in working on this type of AI research, and he's interested in it, so it makes sense for him to work on this.

If Eliezer feels that this is his comparative advantage then it's fine for him to work on this sort of research — I'm not advocating that such research be stopped. My own impression is that Eliezer has comparative advantage in spreading rationality and that he could have a bigger impact by focusing on doing so.

It especially makes sense to the extent that this is an area no one else is addressing. We're only talking about an expenditure of several careers and a few million dollars. Compared to the world economy, or even compared to the non-profit sector, this is a drop in the bucket.

I'm not arguing that such research shouldn't be funded. The human capital question is genuinely more dicey, insofar as I think that Eliezer has contributed substantial value through his work on spreading rationality, and my best guess is that the opportunity cost of not doing more is large.

Comment author: Martin-2 06 June 2013 03:32:57PM 0 points [-]

"...need to make billions of sequential self-modifications when humans don't need to" to do what? Exist, maximize utility, complete an assignment, fulfill a desire...? Some of those might be better termed as "wants" than "needs" but that info is just as important in predicting behavior.

Comment author: falenas108 06 June 2013 02:20:24PM 0 points [-]

Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?

For starters, humans aren't able to make changes as easily as an AI can. We don't have direct access to our source code that we can change effortlessly; any change we make costs either time, money, or both.

Comment author: elharo 07 June 2013 12:00:33PM *  0 points [-]

That doesn't address the question. It says that an AI could more easily make self-modifications. It doesn't suggest that an AI needs to make such self-modifications. Human intelligence is an existence proof that human-level intelligence does not require "billions of sequential self-modifications". Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.

So I reiterate, "Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?"

Comment author: bogdanb 07 June 2013 08:04:44PM *  1 point [-]

Human intelligence is an existence proof that human-level intelligence does not require "billions of sequential self-modifications". Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.

Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?

Human intelligence required billions of sequential modifications (though not self-modifications). An AI in general would not need self-modifications, but for an AGI it seems that they would be necessary. I don’t doubt a formal argument for the latter statement has been written by someone smarter than me before, but a very informal argument would be something like this:

If an AGI doesn’t need to self-modify, then that AGI is already perfect (or close enough that it couldn’t possibly matter). Since practically no software humans ever built was ever perfect in all respects, that seems exceedingly unlikely. Therefore, the first AGI would (very likely) need to be modified. Of course, at the beginning it might be modified by humans (thus, not self-modified), but the point of building AGI is to make it smarter than us. Thus, once it is smarter than us by a certain amount, it wouldn’t make sense for us (stupider intellects) to improve it (smarter intellect). Thus, it would need to self-modify, and do it a lot, unless by some ridiculously fortuitous accident of math (a) human intelligence is very close to the ideal, or (b) human intelligence will build something very close to the ideal on the first try.

It would be nice if those modifications would be things that are good for us, even if we can’t understand them.

Comment author: jsteinhardt 07 June 2013 02:28:27PM 0 points [-]

Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that's more effort than you'd want to put into this exact point.

FWIW, Jonah has a PhD in math and has probably read Pearl or a similar graphical models book.

(Not directly relevant to the conversation, but just trying to lower your probability estimate that Jonah's objections are naive.)

Comment author: Nick_Beckstead 06 June 2013 05:10:52PM *  2 points [-]

I don't see it as a decisive point, just one of "many weak arguments," but I think the analogy with human self-modification is relevant. I would like to see more detailed discussion of the issue.

Aspects of this that seem relevant to me:

  • Genetic and cultural modifications to human thinking patterns have been extremely numerous. If you take humanity as a whole as an entity doing self-modification on itself, there have been an extremely large number of successful self-modifications.
  • Genetic and cultural evolution have built humans individually capable of self-modification without stumbling over Lobian obstacles. Evolution and culture likely used relatively simple and easy search processes to do this, rather than ones that rely on very sophisticated mathematical insights. Analogously, one might expect that people will develop AGI in a way that overcomes these problems as well.
Comment author: Eliezer_Yudkowsky 06 June 2013 08:39:18PM 9 points [-]

Self-modification is to be interpreted to include 'directly editing one's own low-level algorithms using high-level deliberative process' but not include 'changing one's diet to change one's thought processes'. If you are uncomfortable using the word 'self-modification' for this please substitute a new word 'fzoom' which means only that and consider everything I said about self-modification to be about fzoom.

Humans wouldn't look at their own source code and say, "Oh dear, a Lobian obstacle", on this I agree, but this is because humans would look at their own source code and say "What?". Humans have no idea under what exact circumstances they will believe something, which comes with its own set of problems. The Lobian obstacle shows up when you approach things from the end we can handle, namely weak but well-defined systems which can well-define what they will believe, whereas human mathematicians are stronger than ZF plus large cardinals but we don't know how they work or what might go wrong or what might change if we started editing neural circuit #12,730,889,136.
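For readers unfamiliar with the term, the obstacle being referenced can be stated compactly in standard provability-logic notation (this is the textbook form of Löb's theorem, not a formula from the tiling paper itself):

```latex
% Löb's theorem: for a sufficiently strong theory T with provability
% predicate Prov_T, and any sentence \varphi,
\text{if } T \vdash \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi,
\text{ then } T \vdash \varphi.
% Tiling consequence: an agent reasoning in T cannot accept the schema
% "whatever a successor (also reasoning in T) proves safe, is safe" for
% all \varphi, since instantiating the schema with an arbitrary sentence
% would let T prove that sentence outright.
```

This is why "weak but well-defined systems" run into the obstacle: only for such systems can the schema above even be written down precisely.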

As Christiano's work shows, allowing for tiny finite variances of probability might well dissipate the Lobian obstacle, but that's the sort of thing you find out by knowing what a Lobian obstacle is.
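The "tiny finite variances" idea can be sketched as a reflection principle. The statement below is a from-memory rendering of the Christiano et al. result (check the "Definability of Truth in Probabilistic Logic" draft for the exact form):

```latex
% A coherent probability assignment \mathbb{P} can satisfy, for every
% sentence \varphi and all rationals a < b,
a < \mathbb{P}(\varphi) < b
\;\Longrightarrow\;
\mathbb{P}\bigl(a < \mathbb{P}(\ulcorner \varphi \urcorner) < b\bigr) = 1.
% The strict inequalities (an open interval, hence a tolerance for
% infinitesimal error) are what block the Liar-style diagonalization
% that exact probabilistic self-reference would produce.
```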

Comment author: Nick_Beckstead 07 June 2013 12:49:58PM 2 points [-]

Self-modification is to be interpreted to include 'directly editing one's own low-level algorithms using high-level deliberative process' but not include 'changing one's diet to change one's thought processes'. If you are uncomfortable using the word 'self-modification' for this please substitute a new word 'fzoom' which means only that and consider everything I said about self-modification to be about fzoom.

Very helpful. This seems like something that could lead to a satisfying answer to my question. And don't worry, I won't engage in a terminological dispute about "self-modification."

Can you clarify a bit what you mean by "low-level algorithms"? I'll give you a couple of examples related to what I'm wondering about.

Suppose I am working with a computer to make predictions about the weather, and we consider the operations of the computer along with my brain as a single entity for the purposes of testing whether the Lobian obstacles you are thinking of arise in practice. Now suppose I make basic modifications to the computer, expecting that the joint operation of my brain with the computer will yield improved output. This will not cause me to trip over Lobian obstacles. Why does whatever concern you have about the Lob problem predict that it would not, but also predict that future AIs might stumble over the Lob problem?

Another example. Humans learn different mental habits without stumbling over Lobian obstacles, and they can convince themselves that adopting the new mental habits is an improvement. Some of these are more derivative ("Don't do X when I have emotion Y") and others are perhaps more basic ("Try to update through explicit reasoning via Bayes' Rule in circumstances C"). Why does whatever concern you have about the Lob problem predict that humans can make these modifications without stumbling, but also predict that future AIs might stumble over the Lob problem?
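For concreteness, the second habit mentioned, explicit updating via Bayes' Rule, is the kind of well-understood local change that needs no global verification of one's whole reasoning system. A toy sketch (illustrative only, not anything from the thread):

```python
# Toy illustration: adopting "update explicitly via Bayes' Rule" as a
# habit for a binary hypothesis H given evidence E.

def bayes_update(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H)."""
    joint_h = prior_h * likelihood_e_given_h
    joint_not_h = (1.0 - prior_h) * likelihood_e_given_not_h
    return joint_h / (joint_h + joint_not_h)

# Example: prior 0.5, evidence 4x as likely under H as under not-H.
posterior = bayes_update(0.5, 0.8, 0.2)
print(posterior)  # 0.8
```

Adopting this rule changes how one reasons in circumstances C without requiring a proof about every downstream consequence, which is the contrast Nick is drawing with the Lobian case.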

If the answer to both examples is "those are not cases of directly editing one's low-level algorithms using high-level deliberative processes," can you explain why your concern about Lobian issues only arises in that type of case? This is not me questioning your definition of "fzoom," it is my asking why Lobian issues only arise when you are worrying about fzoom.

The first example is related to what I had in mind when I talked about fundamental epistemic standards in a previous comment:

Part of where I'm coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another, not for proving that other types of software and hardware alterations (such as building better arms, building faster computers, finding more efficient ways to compress your data, finding more efficient search algorithms, or even finding better mid-level statistical techniques) would result in more expected utility. But I would guess that once you have an agent operating with minimally decent fundamental epistemic standards, you just can't prove that altering the agent's fundamental epistemic standards would result in an improvement. My intuition is that you can only do that when you have an inconsistent agent, and in that situation it's unclear to me how Lobian issues apply.

Comment author: Vaniver 06 June 2013 08:19:57PM 0 points [-]

Genetic and cultural evolution have built humans individually capable of self-modification without stumbling over Lobian obstacles.

Well, part of this is because modern humans are monstrous in the eyes of many pre-modern humans. To them, the future has been lost because they weren't using a self-modification procedure that provably preserved their values.