Less Wrong is a community blog devoted to refining the art of human rationality.

Tiling Agents for Self-Modifying AI (OPFAI #2)

Post author: Eliezer_Yudkowsky 06 June 2013 08:24PM (53 points)

An early draft of publication #2 in the Open Problems in Friendly AI series is now available:  Tiling Agents for Self-Modifying AI, and the Lobian Obstacle.  ~20,000 words, aimed at mathematicians or the highly mathematically literate.  The research reported on was conducted by Yudkowsky and Herreshoff, substantially refined at the November 2012 MIRI Workshop with Mihaly Barasz and Paul Christiano, and refined further at the April 2013 MIRI Workshop.

Abstract:

We model self-modification in AI by introducing 'tiling' agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring's goals).  Constructing a formalism in the most straightforward way produces a Godelian difficulty, the Lobian obstacle.  By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed.  We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.
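For readers meeting the obstacle for the first time, here is the standard textbook statement of Löb's theorem. T is a theory extending Peano Arithmetic and \Box_T is its provability predicate; this notation is ours and not necessarily the paper's. The second line shows informally why the theorem bites. A parent agent reasoning in T would naively like T to prove \Box_T\varphi \rightarrow \varphi, so that it can trust whatever its offspring proves in T. Löb's theorem says T proves such an instance only for \varphi it already proves outright, and T cannot prove the whole schema unless it is inconsistent.

```latex
% Löb's theorem, for T \supseteq \mathrm{PA} with provability predicate \Box_T:
\text{If } T \vdash \Box_T\varphi \rightarrow \varphi, \text{ then } T \vdash \varphi.

% Instance \varphi = \bot: a T proving its own soundness schema for every \varphi is inconsistent.
T \vdash \Box_T\bot \rightarrow \bot \;\Longrightarrow\; T \vdash \bot.
```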

Commenting here is the preferred venue for discussion of the paper.  This is an early draft and has not been reviewed, so it may contain mathematical errors, and reporting of these will be much appreciated.

The overall agenda of the paper is to introduce the conceptual notion of a self-reproducing decision pattern, one which includes reproduction of the goal or utility function, by exposing a particular possible problem with a tiling logical decision pattern and coming up with some partial technical solutions.  This then makes it conceptually much clearer to point out the even deeper problems with "We can't yet describe a probabilistic way to do this because of non-monotonicity" and "We don't have a good bounded way to do this because maximization is impossible, satisficing is too weak and Schmidhuber's swapping criterion is underspecified."  The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip.  This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).  Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.
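Here is a concrete version of the "conditionally independent per-step failure" point. Suppose each self-modification independently carried a small probability p of catastrophic failure. The chance of surviving n modifications is then (1 - p)^n, which decays geometrically. The sketch below assumes that independence model purely for illustration. With p = 0.001, survival after 1,000 modifications is already only about 37%.

```python
def survival_probability(p_fail: float, steps: int) -> float:
    """Chance that none of `steps` self-modifications fails catastrophically,
    assuming (for illustration only) that each fails independently with
    probability `p_fail`."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - p_fail) ** steps


if __name__ == "__main__":
    # A one-in-a-thousand per-step risk compounds quickly.
    for n in (1, 100, 1_000, 10_000):
        print(f"{n:>6} steps: {survival_probability(0.001, n):.4f}")
```

A check whose soundness depends only on its axioms being true does not contribute an independent factor of this kind on each step, which is the contrast the paragraph above draws.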

Thus the content of the paper is very far off from how a realistic AI would work, but conversely, if you can't even answer the kinds of simple problems posed within the paper (both those we partially solve and those we only pose) then you must be very far off from being able to build a stable self-modifying AI.  Being able to say how to build a theoretical device that would play perfect chess given infinite computing power is very far off from the ability to build Deep Blue.  However, if you can't even say how to play perfect chess given infinite computing power, you are confused about the rules of chess or the structure of chess-playing computation in a way that would make it entirely hopeless for you to figure out how to build a bounded chess-player.  Thus "In real life we're always bounded" is no excuse for not being able to solve the much simpler unbounded form of the problem, and being able to describe the infinite chess-player would be substantial and useful conceptual progress compared to not being able to do that.  We can't be absolutely certain that an analogous situation holds between solving the challenges posed in the paper, and realistic self-modifying AIs with stable goal systems, but every line of investigation has to start somewhere.
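The "perfect player given unlimited computing power" has a short, exact description: search the entire game tree and pick a move with the best minimax value. Chess is far too large to run this on. The sketch below therefore applies the same unbounded algorithm to a hypothetical toy game: players alternately take 1 to 3 stones, and whoever takes the last stone wins. It shows the kind of specification being contrasted with a bounded player like Deep Blue, which must cut the search off and rely on heuristics.

```python
from functools import lru_cache

# Hypothetical toy game standing in for chess: players alternately remove
# 1, 2, or 3 stones, and whoever takes the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def value(stones: int) -> int:
    """+1 if the player to move wins under perfect play, -1 if they lose.

    Exhaustive negamax over the whole game tree: no depth limit and no
    evaluation heuristic. The cache only avoids re-solving positions.
    """
    if stones == 0:
        return -1  # the opponent just took the last stone
    return max(-value(stones - m) for m in MOVES if m <= stones)


def best_move(stones: int) -> int:
    """Pick a legal move that achieves the game-theoretic value."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -value(stones - m))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, value(n), best_move(n))
```

For this game the search rediscovers the known rule that a multiple of 4 stones is a lost position for the player to move. For chess the same procedure is perfectly well-defined but computationally out of reach.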

Parts of the paper will be easier to understand if you've read Highly Advanced Epistemology 101 For Beginners including the parts on correspondence theories of truth (relevant to section 6) and model-theoretic semantics of logic (relevant to sections 3, 4, and 6), and there are footnotes intended to make the paper somewhat more accessible than usual, but the paper is still essentially aimed at mathematically sophisticated readers.

Comments (257)

Comment author: Eliezer_Yudkowsky 06 June 2013 08:30:27PM 21 points

(Reply to.)

My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Lob problem in order to build a Friendly AI.

By default, if you can build a Friendly AI you were not troubled by the Lob problem. That working on the Lob Problem gets you closer to being able to build FAI is neither obvious nor certain (perhaps it is shallow to work on directly, and those who can build AI resolve it as a side effect of doing something else) but everything has to start somewhere. Being able to state crisp difficulties to work on is itself rare and valuable, and the more you engage with a problem like stable self-modification, the more you end up knowing about it. Engagement in a form where you can figure out whether or not your proof goes through is more valuable than engagement in the form of pure verbal arguments and intuition, although the latter is significantly more valuable than not thinking about something at all.

Reading through the whole Tiling paper might make this clearer; it spends the first 4 chapters on the Lob problem, then starts introducing further concepts once the notion of 'tiling' has been made sufficiently crisp, like the Vingean principle or the naturalistic principle, and then an even more important problem with tiling probabilistic agents (Ch. 7) and another problem with tiling bounded agents (Ch. 8), neither of which is even partially solved in the paper, but which would've made a lot less sense - would not have been reified objects in the reader's mind - if the paper hadn't spent all that time on the mathematical machinery needed to partially solve the Lob problem in logical tiling, which crispifies the notion of a 'problem with tiling'.

Comment author: elharo 07 June 2013 12:13:16PM 11 points

I feel like in this comment you're putting your finger on a general principle of instrumental rationality that goes beyond the specific issue at hand, and indeed beyond the realm of mathematical proof. It might be worth a post on "engagement" at some point.

Specifically, I note similar phenomena in software development where sometimes what I start working on ends up being not at all related to the final product, but nonetheless sets me off on a chain of consequences that lead me to the final, useful product. And I too experience the annoyance of managers insisting that I lay out a clear path from beginning to end, when I don't yet know what the territory looks like or sometimes even what the destination is.

As Eisenhower said, "Plans are worthless, but planning is everything."

Comment author: Kawoomba 06 June 2013 08:47:34PM 1 point

By default, if you can build a Friendly AI you were not troubled by the Lob problem.

If you can build a Friendly AI which can self-modify. FOOM-able algorithms are an important but not the only avenue to AGI, friendly or otherwise. Also, the "AGI"-class doesn't necessarily imply superhuman cognition. Humans are intelligent agents for which the Löb problem has little bearing since we can't (or don't) self-modify to such a large degree quite yet.

Comment author: ESRogs 06 June 2013 09:35:14PM 2 points

Also, the "AGI"-class doesn't necessarily imply superhuman cognition.

Yes, but Friendly AI does. Nobody said you needed to solve the Lob problem to build an AGI. What we're talking about here is something more specific than that.

Comment author: jsteinhardt 07 June 2013 01:52:03PM 0 points

Any agent that takes in information about the world is implicitly self-modifying all the time.

Comment author: [deleted] 24 June 2013 09:52:30PM 1 point

Here's a distinction you could make: an AI is self-modifying if it is effectively capable of making any change to its source code at any time, and non-self-modifying if it is not. (The phrase "capable of" is vague, of course.)

I can imagine non-self-modifying AI having an advantage over self-modifying AI, because it might be possible for an NSM AI to be protected from its own stupidity, so to speak. If the AI were to believe that overwriting all of its beliefs with the digits of pi is a good idea, nothing bad would happen, because it would be unable to do that. Of course, these same restrictions that make the AI incapable of breaking itself might also make it incapable of being really smart.

I believe I've heard someone say that any AI capable of being really smart must be effectively self-modifying, because being really smart involves the ability to make arbitrary calculations, and if you can make arbitrary calculations, then you're not restricted. My objection is that there's a big difference between making arbitrary calculations and running arbitrary code; namely, the ability to run arbitrary code allows you to alter other calculations running on the same machine.

Comment author: [deleted] 25 June 2013 01:52:04AM 2 points

Lemme expand on my thoughts a little bit. I imagine a non-self-modifying AI to be made of three parts: a thinking algorithm, a decision algorithm, and a belief database. The thinking and decision algorithms are immutable, and the belief database is (obviously) mutable. The supergoal is coded into the decision algorithm, so it can't be changed. (Problem: the supergoal only makes sense in the context of certain beliefs, and beliefs are mutable.) The contents of the belief database influence the thinking algorithm's behavior, but they don't determine its behavior.

The ideal possibility is that we can make the following happen:

  • The belief database is flexible enough that it can accommodate all types of beliefs from the very beginning. (If the thinking algorithm is immutable, it can't be updated to handle new types of beliefs.)
  • The thinking algorithm is sufficiently flexible that the beliefs in the belief database can lead the algorithm in the right directions, producing super-duper intelligence.
  • The thinking algorithm is sufficiently inflexible that the beliefs in the belief database cannot cause the algorithm to do something really bad, producing insanity.
  • The supergoal remains meaningful in the context of the belief database regardless of how the thinking algorithm ends up behaving.
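The three-part design described above can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the class names, the rain/umbrella beliefs, and the "keep the user dry" supergoal. Note also that Python cannot truly enforce immutability. A frozen dataclass only blocks reassigning fields on that one object, so this shows the intended separation, not a guarantee.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Beliefs = Dict[str, float]


@dataclass(frozen=True)
class Core:
    """The 'immutable' thinking and decision algorithms.

    frozen=True only prevents reassigning these two fields on a Core object;
    Python itself cannot guarantee that nothing else touches them.
    """
    think: Callable[[Beliefs], Beliefs]  # derives new beliefs from current ones
    decide: Callable[[Beliefs], str]     # chooses an action; encodes the supergoal


class Agent:
    def __init__(self, core: Core) -> None:
        self._core = core
        self.beliefs: Beliefs = {}  # the only component meant to change

    def step(self, observation: Beliefs) -> str:
        self.beliefs.update(observation)
        # Beliefs steer what `think` outputs but cannot replace `think` itself.
        self.beliefs.update(self._core.think(dict(self.beliefs)))
        return self._core.decide(dict(self.beliefs))


# Hypothetical example algorithms, for illustration only.
def think(b: Beliefs) -> Beliefs:
    return {"ground_wet": b.get("raining", 0.0)}


def decide(b: Beliefs) -> str:
    # Supergoal "keep the user dry", written in terms of the belief key "raining".
    return "bring_umbrella" if b.get("raining", 0.0) > 0.5 else "go_out"


if __name__ == "__main__":
    agent = Agent(Core(think, decide))
    print(agent.step({"raining": 0.9}))
```

`decide` only means anything while the belief database keeps representing rain under the key "raining". That is a small-scale version of the problem flagged in parentheses above.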

(My ideas haven't been taken seriously in the past, and I have no special knowledge in this area, so it's likely that my ideas are worthless. They feel valuable to me, however.)

Comment author: Kawoomba 07 June 2013 01:59:40PM 0 points

That's a somewhat contrived and practically less relevant notion of self-modification. By that standard you could regard a calculator as self-modifying, but not in any relevant sense.

Comment author: jsteinhardt 07 June 2013 02:30:32PM 2 points

It would be useful to understand why we think a calculator doesn't "count" as self-modification. In particular, we don't think calculators run into the Lob obstacle, so what is the difference between calculators and AIs?

Comment author: Kawoomba 07 June 2013 03:41:37PM 0 points

As always in such matters, think of Turing Machines. Even if the transition function isn't modified, the state of the Turing Machine may change. However, it will always be in an internal state prespecified in its transition function; it won't get unknown or unknowable new entries in its action table.

Universal Turing Machines are designed to change, to take their transition function from the input tape as input, a prime example of self-modification. But they as well -- having read their new transition function from their input tape -- will go about their business as usual without further changes to their transition function. (You can of course program them to later continue changing their action table, but the point is that such changes to their own action table -- to their own behavior -- are clearly delineated from mere contents of their memory / work tape.)

A calculator or a non-self-modifying AI will undergo changes in its memory, but it'll never endeavor to define new internal states, with new rules, on its own. It'll memorize whether you've entered "0.7734" in its display, but it'll only perform its usual actions on that number. A game of Tetris will change what blocks it displays on your screen, but that won't modify its rules.
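The distinction in this comment can be made concrete with a minimal Turing machine simulator. This is a sketch; the unary-increment machine is a hypothetical example, not taken from the thread. Everything that changes during a run (the tape, the head position, and the current state) lives outside `table`, which the loop only ever reads.

```python
from typing import Dict, Tuple

# (state, symbol read) -> (symbol to write, head move, next state)
Table = Dict[Tuple[str, str], Tuple[str, int, str]]

BLANK = "_"


def run_tm(table: Table, tape: str, state: str = "start",
           halt: str = "halt", max_steps: int = 10_000) -> str:
    """Run a Turing machine whose transition table never changes.

    The tape, head position and current state all change; `table` is only read.
    """
    cells = dict(enumerate(tape))
    head, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        write, move, state = table[(state, cells.get(head, BLANK))]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    return "".join(cells.get(i, BLANK)
                   for i in range(min(cells), max(cells) + 1)).strip(BLANK)


# Hypothetical example machine: unary increment (append one "1").
INCREMENT: Table = {
    ("start", "1"): ("1", +1, "start"),
    ("start", BLANK): ("1", 0, "halt"),
}

if __name__ == "__main__":
    print(run_tm(INCREMENT, "111"))
```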

There may be accidental modifications (bugs etc.) leading to unknown states and behavior, but I wouldn't usefully call that an active act of self-modification. (It's not a special case to guard against, other than by the usual redundancy / checksums. Those are the same constraints you face when working on, e.g., real-time or mission-critical applications, not FAI research.)

Comment author: philh 07 June 2013 11:24:06PM 2 points

I don't think this is quite there. A UTM is itself a TM, and its transition function is fixed. But it emulates a TM, and it could instead emulate a TM-with-variable-transition-function, and that thing would be self-modifying in a deeper sense than an emulation of a standard TM.

But it's still not obvious to me how to formalize this, because (among other problems) you can replace an emulated TMWVTF with an emulated UTM which in turn emulates a TMWVTF...
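Here is one level of the construction philh describes, as a hypothetical sketch. The rule format with an optional "patch" is invented for illustration. The interpreter function below is ordinary fixed code, like the UTM's own transition function. The machine it emulates, however, can rewrite entries of its own transition table while running. The sketch does not address the nesting problem raised above; it only shows where the fixed/variable boundary sits for a single level.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]                 # (state, symbol read)
Action = Tuple[str, int, str]         # (symbol to write, head move, next state)
Patch = Optional[Tuple[Key, Action]]  # rewrite one table entry, or do nothing
Rule = Tuple[Action, Patch]

BLANK = "_"


def run_self_modifying(table: Dict[Key, Rule], tape: str,
                       state: str = "start", halt: str = "halt",
                       max_steps: int = 10_000) -> Tuple[str, Dict[Key, Rule]]:
    """Emulate a machine whose rules may rewrite its own transition table.

    This function is ordinary fixed code; only the emulated table changes.
    """
    table = dict(table)  # the emulated machine's own, mutable rule table
    cells = dict(enumerate(tape))
    head, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        (write, move, next_state), patch = table[(state, cells.get(head, BLANK))]
        if patch is not None:
            key, new_action = patch
            table[key] = (new_action, None)  # the machine rewrites its own rule
        cells[head] = write
        head += move
        state = next_state
        steps += 1
    if not cells:
        return "", table
    out = "".join(cells.get(i, BLANK) for i in range(min(cells), max(cells) + 1))
    return out.strip(BLANK), table


# Hypothetical machine: on its first "1" it writes "X" and rewrites that very
# rule so that every later "1" becomes "Y".
WRITE_X_THEN_Y: Dict[Key, Rule] = {
    ("start", "1"): (("X", +1, "start"), (("start", "1"), ("Y", +1, "start"))),
    ("start", BLANK): ((BLANK, 0, "halt"), None),
}

if __name__ == "__main__":
    result, final_table = run_self_modifying(WRITE_X_THEN_Y, "111")
    print(result)
```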

Comment author: JonahSinick 06 June 2013 09:48:47PM 0 points

See the last paragraph of this comment highlighting my question about the relevance of the operationalization.

Comment author: Eliezer_Yudkowsky 06 June 2013 10:09:32PM 3 points

I feel like I'm not clear on what question you're asking. Can you give an example of what a good answer would look like, maybe using Xs and Ys since I can hardly ask you to come up with an actual good argument?

Comment author: JonahSinick 06 June 2013 10:26:47PM 5 points

There are many possible operationalizations of a self-modifying AI. For example,

  • One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process).

  • One could model a self-improving AI as a chess playing computer program which uses a positional weighting system to choose which moves to make, and which analyzes which weighting heuristics statistically lead to more winning games, in order to improve its positional weighting system.

My reaction to your paper is similar to what my reaction would be to a paper that studies ways to make sure that the Chinese economy doesn't change in such a way that GDP starts dropping, or ways to make sure that the chess program doesn't self-modify to get worse and worse at winning chess games rather than better and better.

It's conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as

  • Working to increase rationality
  • Spreading concern for global welfare
  • Building human capital of people who are concerned about global welfare

are more cost-effective ways of reducing AI risk than doing such research.

I'm looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.

Comment author: Eliezer_Yudkowsky 27 June 2013 05:09:35AM 18 points

One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process)...

I'm looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.

If somebody wrote a paper showing how an economy could naturally build another economy while being guaranteed to have all prices derived from a constant set of prices on intrinsic goods, even as all prices were set by market mechanisms as the next economy was being built, I'd think, "Hm. Interesting. A completely different angle on self-modification with natural goal preservation."

I'm surprised at the size of the apparent communications gap around the notion of "How to get started for the first time on a difficult basic question" - surely you can think of mathematical analogies to research areas where it would be significant progress just to throw out an attempted formalization as a base point?

There are all sorts of disclaimers plastered onto the paper about how this only works because logic is monotonic, probabilistic reasoning is not monotonic etcetera. The point is to have a way, any way of just getting started on stable self-modification even though we know the particular exact formalism doesn't directly work for probabilistic agents. Once you do that you can at least state what it is you can't do. A paper on a self-replicating economy with a stable set of prices on intrinsic goods would likewise be something you could look at and say, "But this formally can't do X, because Y" and then you would know more about X and Y than you did previously. Being able to say, "But the verifier-suggester separation won't work for expected utility agents because probabilistic reasoning is not monotonic" means you've gotten substantially further into FAI work than when you're staring dumbly at the problem.
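The monotonic/non-monotonic contrast mentioned here can be shown in a few lines. This sketch uses a hypothetical bird/penguin toy example, and Horn-clause forward chaining stands in for first-order provability, which shares the monotonicity property. Adding the fact "penguin" can never retract the derived conclusion "flies". By contrast, conditioning on "penguin" drops the probability of flying from 9/10 to 0.

```python
from fractions import Fraction
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

# Logic: forward chaining over Horn rules is monotonic.
HornRule = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def closure(facts: Iterable[str], rules: List[HornRule]) -> Set[str]:
    """All atoms derivable from `facts`; adding facts can only add atoms."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


RULES: List[HornRule] = [(frozenset({"bird"}), "flies")]

# Probability: conditioning on more evidence can lower a probability.
Animal = Tuple[bool, bool, bool]  # (is_bird, is_penguin, flies)
POPULATION: Dict[Animal, int] = {  # hypothetical counts
    (True, False, True): 90,
    (True, False, False): 5,
    (True, True, False): 5,
    (False, False, False): 100,
}


def prob(event: Callable[[Animal], bool],
         given: Callable[[Animal], bool]) -> Fraction:
    """P(event | given), computed exactly from the population counts."""
    total = sum(n for a, n in POPULATION.items() if given(a))
    hits = sum(n for a, n in POPULATION.items() if given(a) and event(a))
    return Fraction(hits, total)


if __name__ == "__main__":
    # The logical conclusion survives the extra premise.
    print("flies" in closure({"bird", "penguin"}, RULES))
    # P(flies | bird), then P(flies | bird and penguin).
    print(prob(lambda a: a[2], lambda a: a[0]))
    print(prob(lambda a: a[2], lambda a: a[0] and a[1]))
```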

AIXI was conceptual progress on AGI, and especially public discussion of AGI, because it helped people like me say much more formally all the things that we didn't like about AIXI, like the anvil problem or AIXI seizing control of its reward channel or AIXI only being able to represent utility functions of sensory data rather than environmental ontologies. Someone coming up with a list of 5 key properties the tiling architecture does not have would be significant progress, and I would like to specifically claim that as an intended, worthwhile, fully-pays-back-the-effort positive consequence if it happens - and this is not me covering all the bases in case of disappointment, the paper was presented in a way consonant with that goal and not in a way consonant with claiming one-trueness.

I don't understand the model you have of FAI research where this is not the sort of thing that you do at the beginning.

Comment author: JonahSinick 27 June 2013 06:49:24PM 2 points

Thanks for continuing to engage.

I described my position in another comment. To reiterate and elaborate:

  1. My current best guess is that there are so many unrelated potential models for AI (relative to the information that we currently have) that the probability of FAI work on a single one of them ending up being relevant is tiny. In order to make a compelling argument for the relevance of MIRI's work on the Lob problem, you have to argue that the model used isn't only one of, e.g. 10^10 distinct models of AI with similar probability of being realized in practice.

  2. One could argue that the problem is sufficiently important so that one should work on it even if the probability of the work being relevant is tiny. But there are other interventions on the table. You've made major contributions by spreading rationality and by creating a community for people who are interested in global welfare to network and collaborate with one another. These things probably substantially reduce astronomical waste (in expectation). In order to argue in favor of MIRI's FAI research being optimal philanthropy, you have to argue that the probability of the research being relevant is sufficiently great so that its expected value outweighs the expected value of these other activities.

  3. One could argue that if there are in fact so many models for AI then we're doomed anyway, so we should assume that there aren't so many models. But rather than trying to work on the models that we think most relevant now, we can wait until it becomes more clear what AGI will look like in practice, and then develop FAI for that type of AI. Whether or not this is feasible is of course related to the question of whether the world's elites will navigate the creation of AI just fine. I think that there are good reasons to think that the probability of this is pretty high, and that the most leveraged efforts are getting good people in positions of future influence rather than doing FAI research now. Your work on rationality training and community building can help, and already has helped a lot with this.

Comment author: Eliezer_Yudkowsky 27 June 2013 07:21:13PM 8 points

Neither 2 nor 3 is the sort of argument I would ever make (there's such a thing as an attempted steelman which by virtue of its obvious weakness doesn't really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a foregone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.

I still don't understand what you could be thinking here, and feel like there's some sort of basic failure to communicate going on. I could guess something along the lines of "Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function..." (but really, is something like that one of just 10^10 equivalent candidates?) "...and more dissimilar to that than logical AI is from decision theory" (that's a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that's the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, "Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we're going to build a Google Maps AGI", where in point of fact it would be completely relevant, not the tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can't think of any acceptable steel version of what you mean, and I say again that it seems to me that you're saying something that a good mainstream AI person would also be staring quizzically at.

What would be one of the other points in the 10^10-sized space? If it's something along the lines of "an economic model" then I just explained why if you did something analogous with an economic model it could also be interesting progress, just as AIXI was conceptually important to the history of ideas in the field. I could explain your position by supposing that you think that mathematical ideas never generalize across architectures and so only analyzing the exact correct architecture of a real FAI could be helpful even at the very beginning of work, but this sounds like a visibly stupid position so the model of Anna in my head is warning me not to attribute it to you. On the other hand, some version of, "It is easier to make progress than Jonah thinks because the useful generalization of mathematical ideas does not require you to select correct point X out of 10^10 candidates" seems like it would almost have to be at work here somewhere.

I seriously don't understand what's going on in your head here. It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI. Each newly written computer program is unique but the ideas behind them generalize, the resulting conceptual space can be usefully explored, that's why we don't start over with every new computer program. You can do useful things once you've collected enough treasure nuggets and your level of ability builds up, it's not a question of guessing the one true password out of 10^10 tries with nothing being progress until then. This is true on a level of generality which applies across computer science and also to AI and also to FAI and also to decision theory and also to math. Everyone takes this for granted as an obvious background fact of doing research which is why I would expect a good mainstream AI person to also be staring quizzically at your statements here. I do not feel like the defense I'm giving here is in any way different from the defense I'd give of a randomly selected interesting AI paper if you said the same thing about it. "That's just how research works," I'd say.

Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.

Comment author: JonahSinick 27 June 2013 09:38:04PM 3 points

I continue to appreciate your cordiality.

A number of people have recently told me that they have trouble understanding me unless I elaborate further, because I don't spell out my reasoning in sufficient detail. I think that this is more a matter of the ideas involved being complicated, and there being a lot of inferential distance, than it is lack of effort on my part, but I can see how it would be frustrating to my interlocutors. It seems that I'm subject to the illusion of transparency. I appreciate your patience.

Neither 2 nor 3 is the sort of argument I would ever make (there's such a thing as an attempted steelman which by virtue of its obvious weakness doesn't really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a foregone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.

I know that you've explicitly disavowed arguments of the type in my points 2 and 3. My reason for bringing them up is to highlight the importance of addressing point 1: to emphasize that it doesn't suffice to say "the problem is important and we have to get started on it somehow." I recognize that we have very different implicit assumptions on point 1, and that that's where the core of the disagreement lies.

I still don't understand what you could be thinking here, and feel like there's some sort of basic failure to communicate going on. I could guess something along the lines of "Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function..." (but really, is something like that one of just 10^10 equivalent candidates?) "...and more dissimilar to that than logical AI is from decision theory" (that's a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that's the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, "Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we're going to build a Google Maps AGI", where in point of fact it would be completely relevant, not the tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can't think of any acceptable steel version of what you mean, and I say again that it seems to me that you're saying something that a good mainstream AI person would also be staring quizzically at.

There's essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I'm not suggesting that an AGI will have human values by default: I'm totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules, rather than a mathematically defined utility function.

There are serious dangers of such an entity having values that are orthogonal to humans', and serious dangers of value drift. (Your elegant article "Why does power corrupt?" has some relevance to the latter point.) But it seems to me that the measures one would want to take to prevent humans' goals from changing are completely different from the sorts of measures that might emerge from MIRI's FAI research.

I'll also highlight a comment of Nick Beckstead, which you've already seen and responded to. I didn't understand your response.

I should clarify that I don't have high confidence that the first AGI will develop along these lines. But it's my best guess, and it seems much more plausible to me than models of the type in your paper.

It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI.

The difference that I perceive between the two scenarios is the nature of the feedback loops in each case.

When one is chipping away at a problem incrementally, one has the capacity to experiment and use the feedback generated from experimentation to help one limit the search space. Based on what I know about the history of science, general relativity is one of the only successful theories that was created without lots of empirical investigation.

The engineers who designed the first bridges had trillions of combinations of design features and materials to consider a priori, the vast majority of which wouldn't work. But an empirical discovery like "material X is too weak to work within any design" greatly limits the search space, because you don't have to think further about any of the combinations involving material X. The same holds for a discovery of the type "material Y is so strong that it'll work with any design." By making a series of such discoveries, one can home in on a few promising candidates.
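The bridge example turns on how a single empirical finding prunes a combinatorial design space. Here is a hypothetical toy count; the materials, features, and sizes are invented for illustration. With m candidate materials and k independent on/off design features there are m * 2^k designs. Learning that one material fails in every design therefore removes 2^k candidates at once, without testing any of them individually.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical design space: 4 candidate materials x 5 on/off design features.
MATERIALS = ["wood", "stone", "iron", "X"]
FEATURES = ["arch", "truss", "cable_stays", "deck_girder", "cantilever"]

Design = Tuple[str, Tuple[bool, ...]]


def all_designs() -> List[Design]:
    """Every material paired with every combination of feature toggles."""
    return [(m, toggles)
            for m in MATERIALS
            for toggles in product((False, True), repeat=len(FEATURES))]


if __name__ == "__main__":
    designs = all_designs()
    # One empirical finding: "material X is too weak to work within any design".
    remaining = [d for d in designs if d[0] != "X"]
    print(len(designs), len(remaining))
```

A finding of the "material Y works with any design" type settles all 2^k Y-designs in the same single step.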

This is how I predict that the development of AGI will go. I think that the search space is orders of magnitude too large to think about in a useful way without a lot of experimentation, and that a priori we can't know what the first AGI will look like. I think that once it becomes more clear what the first AGI will look like, it will become much more feasible to make progress on AI safety.

Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.

It'll take me a while to come up with a lot of concrete hypotheticals, but I'll get back to you on this.

Comment author: Eliezer_Yudkowsky 27 June 2013 10:06:20PM 6 points

There's essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I'm not suggesting that an AGI will have human values by default: I'm totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules

Okay. This sounds like you're trying to make up your own FAI theory in much the same fashion as Holden (and it's different from Holden's, of course). Um, what I'd like to do at this point is take out a big Hammer of Authority and tell you to read "Artificial Intelligence: A Modern Approach" so your mind would have some better grist to feed on as to where AI is and what it's all about. If I can't do that... I'm not really sure where I could take this conversation. I don't have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there's somebody else you'd trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don't know where to take it from here.

On the object level I will quickly remark that some of the first attempts at heavier-than-air flying-machines had feathers and beaks and they did not work very well, that 'interacting specialized modules' is Selling Nonapples, that there is an old discussion in cognitive science about the degree of domain specificity in human intelligence, and that the idea that 'humans are the only example we have' is generally sterile, for reasons I've already written about but I can't remember the links offhand, hopefully someone else does. It might be in Levels of Organization in General Intelligence; I generally consider that pretty obsolete, but it might be targeted to your current level.

Comment author: JonahSinick 27 June 2013 10:47:14PM * 2 points

Okay. This sounds like you're trying to make up your own FAI theory in much the same fashion as Holden (and it's different from Holden's, of course).

Either my best guess or Holden's could be right, and so could lots of other ideas that we haven't thought of. My proposed conceptual framework should be viewed as one of many weak arguments.

The higher-level point that I was trying to make is that [the conceptual framework implicit in the view that MIRI's current FAI research has a non-negligible chance of being relevant to AI safety] seems highly conjunctive. I don't mean this rhetorically at all – I genuinely don't understand why you think that we can make progress given how great the unknown unknowns are. You may be right, but justification of your view requires further argumentation.

Um, what I'd like to do at this point is take out a big Hammer of Authority and tell you to read "Artificial Intelligence: A Modern Approach" so your mind would have some better grist to feed on as to where AI is and what it's all about. If I can't do that... I'm not really sure where I could take this conversation. I don't have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there's somebody else you'd trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don't know where to take it from here.

A more diplomatic way of framing this would be something like:

"The book Artificial Intelligence: A Modern Approach has a discussion of current approaches to artificial intelligence. Are you familiar with the ideas therein? If not, I'd suggest that you take a look."

Putting that aside, based on conversations with a number of impressive people in machine learning, etc. who I know, my impression is that at the moment, there aren't strong contenders for research programs that could plausibly lead to AGI. I largely accept Luke's argument in his blog post on AI timelines, but this is based on the view that the speed of research is going to increase a lot over the coming years, rather than on the belief that any existing research programs have a reasonable chance of succeeding.

I'd be very interested in hearing about existing research programs that have a reasonable chance of succeeding.

Comment author: shminux 27 June 2013 10:36:17PM 0 points

Just wondering why you see Jonah Sinick as being of high enough status that it's worth explaining to him what's been discussed on LW repeatedly. Or maybe I'm totally misreading this exchange.

Comment author: Vaniver 27 June 2013 10:14:50PM * 1 point

I'll also highlight a comment of Nick Beckstead, which you've already seen and responded to. I didn't understand your response.

Let me try from a different angle.

With humans, we see three broad clusters of modification: reproduction, education, and chemistry. Different people are physically constructed in different ways, and so human civilization can evolve through the biological evolution of the humans inside it. The environments that people find themselves in or choose leave imprints on those people. Chemicals people ingest can change those people, such as with caffeine, alcohol, morphine, or heroin. (I would include 'changing your diet to change your thought processes' under chemical changes, but the chemical changes from becoming addicted to heroin and from not being creatine deficient look very different.)

For AIs, most of the modification that's interesting and new will look like the "chemistry" cluster. An AI modifying its source code will look a lot like a human injecting itself with a new drug that it just invented. (Nick_Beckstead's example of modifying the code of the weather computer is more like education than it is like chemistry.)

This is great because some drugs dramatically improve performance, and so a person on caffeine could invent a super nootropic, and then on the super nootropic invent a cure for cancer and an even better nootropic, and so on. This is terrifying because any drug that adjusts your beliefs or your decision-making algorithm (think of 'personality' as a subset of this) dramatically changes how you behave, and might do so for the worse. This is doubly terrifying because these changes might be irreversible: you might take a drug that gets rid of your depression by making you incapable of feeling desire, and then not have any desire to restore yourself! This is triply terrifying because the effects of the drug might be unknown: you might not be able to determine what a drug will do to you until after you take it, and by then it might be too late.

For humans this problem is mostly solved by trial and error followed by pattern-matching ("coffee is okay, crack is not, because Colin is rich and productive and Craig is neither"), which is not useful for new drugs, not useful for misclassified old drugs, and not very safe for very powerful systems. The third problem, that the effects might be unknown, is the sort of thing that proofs might help with, except there are some technical obstacles to doing that. The Lobstacle is a prominent theoretical one, and while it looks like there are lots of practical obstacles as well, surmounting the theoretical obstacles should help with surmounting the practical ones.

Any sort of AGI that's able to alter its own decision-making process will have the ability to 'do chemistry on itself,' and one with stable values will need to have solved the problem of how to do that while preserving its values. (I don't think that humans have 'stable' values; I'd call them something more like 'semi-stable.' Whether this is a bug or a feature is unclear to me.)

Comment author: JonahSinick 27 June 2013 10:59:49PM 1 point

I understand where you're coming from, and I think that you correctly highlight a potential source of concern, and one which my comment didn't adequately account for. However:

  1. I'm skeptical that it's possible to create an AI based on mathematical logic at all. Even if an AI with many interacting submodules is dangerous, it doesn't follow that working on AI safety for an AI based on mathematical logic is promising.

  2. Humans can impose selective pressures on emergent AIs so as to mimic the process of natural selection that humans experienced.

Comment author: Kawoomba 06 June 2013 11:04:22PM * 7 points

There are many possible operationalizations of a self-modifying AI

No doubt. And as of now, for none of them are we able to tell whether they are safe or not. There's insufficient rigor in the language; the formalizations aren't standardized or pinned down (in this subject matter). MIRI's work is creating and pinning down the milestones for how we'd even go about assessing self-modifying friendly AI in terms of goal stability, in mathematical language.

To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.

It's conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as (working to increase rationality, spreading concern for global welfare, building human capital of people who are concerned about global welfare) are more cost-effective ways of reducing AI risk than doing such research.

Even if that were so, that's not MIRI's (or EY's) most salient comparative advantage (also: CFAR).

Comment author: JonahSinick 07 June 2013 12:26:56AM * 3 points

To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.

My claim is that there are sufficiently many possible models for AI that given what we (the Less Wrong community, not necessarily AI researchers) know now, the probability of a given model being developed is tiny.

The actionable safety issues that would come up if the AI is like the Chinese economy would be very different from the actionable safety issues that would come up if the AI is like a self-improving chess playing program, which would be very different from the actionable safety issues that would come up if the AI is of the type that Eliezer's publication describes.

Given the paucity of information available about the design of the first AI, I don't think that the probability of doing safety research on a particular model being actionable is sufficiently high for such research to be warranted (relative to other available activities).

Even if that were so, that's not MIRI's (or EY's) most salient comparative advantage (also: CFAR).

  1. Eliezer made a major contribution to increasing rationality with his How To Actually Change Your Mind sequence, which improved the rationality of many people who I know, including myself.

  2. MIRI could engage in other AI safety activities, such as improving future forecasting.

  3. If an organization doesn't have a cost-effective activity to engage in, and the employees recognize this, then they can leave and do something else. Here I'm not claiming that this is in fact the case for MIRI; rather, I'm just responding to your argument.

  4. MIRI's staff could migrate to CFAR.

  5. Out of all of the high impact activities that MIRI staff could do, it's not clear to me that Friendly AI research is their comparative advantage.

Comment author: homunq 23 June 2013 02:35:56AM * 1 point

Also, even if we accept that MIRI's comparative advantage has to do with having a clearer view of the Friendliness vs. UnFriendliness distinction, why wouldn't it be more effective for them to try to insure against an UnFriendly outcome by addressing the UnFriendliness already in the world today? For instance, corporate governance. Corporations' optimization powers are a tremendous source of human happiness, but their UnFriendly tendencies are clear. For now, corporations have only parasitic intelligence, and don't look particularly foomy, but if I had to bet on whether MIRI or Google/TenCent/Palantir/whatever was more likely to foom, there would be no contest.

[There are a bunch of assumptions embedded there. The principal ones are:

  1. If a corporation, as currently constituted, somehow went foom it would be likely to be UnFriendly
  2. If we were able to make it so corporations appeared more Friendly in their day-to-day actions, they would also become less likely to rush headlong into an UnFriendly foom.

I think 1 is pretty undeniable, but I could understand it if someone disagreed with 2.]

Comment author: Nick_Beckstead 06 June 2013 05:40:49PM * 9 points

I am very glad to see MIRI taking steps to list open problems and explain why those problems are important for making machine intelligence benefit humanity.

I'm also struggling to see why this Lob problem is a reasonable problem to worry about right now (even within the space of possible AI problems). Basically, I'm skeptical that this difficulty or something similar to it will arise in practice. I'm not sure if you disagree, since you are saying you don't think this difficulty will "block AI." And if it isn’t going to arise in practice (or something similar to it), I’m not sure why this should be high on the priority list of general AI issues to think about (edited to add: or why working on this problem now should be expected to help machine intelligence develop in a way that benefits humanity).

Some major questions I have are:

  • What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble? I promise not to interpret your answer as "Eliezer says this is probably going to happen."
  • Do you think that people building AGI in the future will stumble over Lob issues if MIRI doesn't work on those issues? If so, why?

Part of where I'm coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another, not for proving that other types of software and hardware alterations (such as building better arms, building faster computers, finding more efficient ways to compress your data, finding more efficient search algorithms, or even finding better mid-level statistical techniques) would result in more expected utility. But I would guess that once you have an agent operating with minimally decent fundamental epistemic standards, you just can't prove that altering the agent's fundamental epistemic standards would result in an improvement. My intuition is that you can only do that when you have an inconsistent agent, and in that situation it's unclear to me how Lobian issues apply.

Part of where I'm coming from on the second question is that evolutionary processes made humans who seem capable of overcoming putative Lobian obstacles to self-modification. See my other comment for more detail. The other part has to do with basic questions about whether people will adequately prepare for AI by default.

Comment author: paulfchristiano 06 June 2013 09:14:33PM 10 points

I think you are right that strategy work may be higher value. But I think you underestimate the extent to which (1) such goods are complements [granting for the moment the hypothesis that this kind of AI work is in fact useful], and (2) there is a realistic prospect of engaging in many such projects in parallel, and that getting each started is a bottleneck.

Part of where I'm coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another

As Wei Dai observed, you seem to be significantly understating the severity of the problem. We are investigating conditions under which an agent can believe that its own operation will lawfully lead to good outcomes, which is more or less necessary for reasonably intelligent, reasonably sane behavior given our current understanding.

Part of where I'm coming from on the second question is that evolutionary processes made humans who seem capable of overcoming putative Lobian obstacles to self-modification.

Compare to: "I'm not sure how relevant formalizing mathematical reasoning is, because evolution made humans who are pretty good at reasoning without any underlying rigidly defined formal systems."

Is there an essential difference between these cases? Your objection is very common, but it looks to me like it is on the wrong end of a very strong empirical regularity, i.e. it seems like you would argue against some of the most compelling historical developments in mathematics on similar grounds, while basically never ending up on the right side of an argument.

Similarly, you would discourage the person who advocates studying mathematical logic with the goal of building a thinking machine [which as far as I can tell was one of the original objects, before the program of formalization took off]. I do think we can predictably say that such research is worthwhile.

This is without even getting into MIRI's raison d'etre, namely that it may be possible for societies to produce AI given widely varying levels of understanding of the underlying formal frameworks, and that all things equal we expect a deeper understanding of the underlying theory to result in better outcomes (according to the values of AI designers).

Comment author: Nick_Beckstead 07 June 2013 12:22:35PM * 4 points

I think you underestimate the extent to which (1) such goods are complements [granting for the moment the hypothesis that this kind of AI work is in fact useful], and (2) there is a realistic prospect of engaging in many such projects in parallel, and that getting each started is a bottleneck.

This is an interesting point I wasn't fully taking into consideration. As I said in another comment, where MIRI has the right kind of technical AI questions, it makes sense to write them up.

As Wei Dai observed, you seem to be significantly understating the severity of the problem. We are investigating conditions under which an agent can believe that its own operation will lawfully lead to good outcomes, which is more or less necessary for reasonably intelligent, reasonably sane behavior given our current understanding.

I think it would greatly help me understand the expected practical implications of this research if you could address the question I asked in the original comment: "What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?" I think I get why it causes problems if, as Wei Dai said, the AI makes decisions purely based on proofs. I don't see how the problem would be expected to arise in scenarios that seem more plausible. I think that the MIRI people working in this problem know a lot more about this than me, which is why I am asking for examples; I expect you have something to tell me that will make this make more sense.

Part of where I'm coming from on the second question is that evolutionary processes made humans who seem capable of overcoming putative Lobian obstacles to self-modification.

Compare to: "I'm not sure how relevant formalizing mathematical reasoning is, because evolution made humans who are pretty good at reasoning without any underlying rigidly defined formal systems."

Is there an essential difference between these cases? Your objection is very common, but it looks to me like it is on the wrong end of a very strong empirical regularity, i.e. it seems like you would argue against some of the most compelling historical developments in mathematics on similar grounds, while basically never ending up on the right side of an argument.

Similarly, you would discourage the person who advocates studying mathematical logic with the goal of building a thinking machine [which as far as I can tell was one of the original objects, before the program of formalization took off]. I do think we can predictably say that such research is worthwhile.

The argument I was trying to make was of the form:

  • Creation process A [natural and cultural evolution] led to agents who don't stumble over problem B [Lobian issues].
  • By analogy, creation process C [people making AGI] will lead to agents who don't stumble over problem B [Lobian issues], even if MIRI does not take special precautions to prevent this from happening.
  • Therefore, it is not necessary to take special precautions to make sure creation process C doesn't stumble over problem B.

I don't think this type of reasoning will lead to the conclusion that formalizing mathematics and doing mathematical logic are not worthwhile. Perhaps you interpreted my argument another way.

Comment author: paulfchristiano 07 June 2013 06:22:29PM 7 points

"What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?"

The opportunity to kill yourself in exchange for $10 is a prototypical case. It's well and good to say "this is only a problem for an agent who uses proofs," but that's not a scenario, that's a type of agent. Yes, real agents will probably not use mathematical logic in the naive way. But as I pointed out in response to Wei Dai, probabilism doesn't make the issues go away. It softens the impossibility proofs, but we still lack possibility proofs (which is what MIRI is working on). So this seems like a weak objection. If you want to say "future agents will have different reasoning frameworks than the ones we currently understand," that's well and good, but see below. (That seems a lot like discouraging someone from trying to develop logic because their proto-logic doesn't resemble the way that humans actually reason.)

The argument I was trying to make was of the form:...

This is what I thought you meant; it seems analogous to:

  • Creation process A [natural and cultural evolution] led to agents who don't require a formalized deductive system
  • By analogy, creation process C [people making AGI] will lead to agents who don't require a formalized deductive system
  • Therefore, it is not necessary to take special precautions to ensure that deductive systems are formalized.

Do you object to the analogy?

No one thinks that the world will be destroyed because people built AIs that couldn't handle the Lobian obstruction. That doesn't seem like a sensible position, and I think Eliezer explicitly disavows it in the writeup. The point is that we have some frameworks for reasoning about reasoning. Those formalisms don't capture reflective reasoning, i.e. they don't provide a formal account of how reflective reasoning could work in principle. The problem Eliezer points to is an obvious problem that any consistent framework for reflective reasoning must resolve.
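For readers who want the formal core of the "obvious problem" mentioned here, below is a standard textbook statement of Löb's theorem and its best-known consequence. This is a sketch, not the paper's own formalization. The notation is an assumption: $T$ is any consistent, recursively axiomatized theory extending Peano Arithmetic, and $\square_T \varphi$ abbreviates "$\varphi$ is provable in $T$". Roughly, an agent reasoning in $T$ that wants to trust a successor that also reasons in $T$ would need $T$ to prove the reflection schema $\square_T \varphi \rightarrow \varphi$, and the derivation shows why it cannot do so for every $\varphi$.

```latex
% Löb's theorem (T consistent, recursively axiomatized, extending PA):
\[ T \vdash \square_T \varphi \rightarrow \varphi \quad\Longrightarrow\quad T \vdash \varphi \]

% Apply it to \varphi = \bot, which is one instance of the reflection schema:
\[ T \vdash \square_T \bot \rightarrow \bot
   \iff T \vdash \neg\square_T \bot \;(\text{i.e. } T \vdash \mathrm{Con}(T))
   \quad\Longrightarrow\quad T \vdash \bot \]
```

So a consistent $T$ cannot prove $\square_T \varphi \rightarrow \varphi$ for all $\varphi$ (this recovers Gödel's second incompleteness theorem from Löb's), which is, roughly, why an agent cannot naively license a successor that uses the same proof system.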

Working on this problem directly may be less productive than just trying to understand how reflective reasoning works in general---indeed, folks around here definitely try to understand how reflective reasoning works much more broadly, rather than focusing on this problem. The point of this post is to state a precise problem which existing techniques cannot resolve, because that is a common technique for making progress.

Comment author: Nick_Beckstead 07 June 2013 08:03:13PM * 3 points

The opportunity to kill yourself in exchange for $10 is a prototypical case. It's well and good to say "this is only a problem for an agent who uses proofs," but that's not a scenario, that's a type of agent. Yes, real agents will probably not use mathematical logic in the naive way. But as I pointed out in response to Wei Dai, probabilism doesn't make the issues go away. It softens the impossibility proofs, but we still lack possibility proofs (which is what MIRI is working on). So this seems like a weak objection.

Thank you for the example. I do want to say "this is only a problem for an agent who uses proofs" if that's indeed true. It sounds like you agree, but are saying that some analogous but more complicated problem might arise for probabilistic agents, and that it might not be resolved by whoever else is making AI unless this research is done by MIRI. If you have an example of a complication that you think would plausibly arise in practice and have further thoughts on why we shouldn't expect this complication to be avoided by default in the course of the ordinary development of AI, I would be interested in hearing more. These do seem like crucial questions to me if we want to argue that this is an important line of research for the future of AI. Do you agree that these questions are crucial?

  • Creation process A [natural and cultural evolution] led to agents who don't require a formalized deductive system
  • By analogy, creation process C [people making AGI] will lead to agents who don't require a formalized deductive system
  • Therefore, it is not necessary to take special precautions to ensure that deductive systems are formalized.

Do you object to the analogy?

I do object to this analogy, though I now have a better idea of where you are coming from. Here's a stab at how the arguments are different (first thing that came to mind):

  • My argument says that if creation process A led to agents who overcome obstacle X to doing Z, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z.
  • Your argument says that if creation process A led to agents who overcome obstacle X to doing Z in way Y, then the ordinary development of AGI will lead to agents who overcome obstacle X to doing Z in way Y.

We might want to insert some qualifiers like "obstacle X needs to be essential to the proper functioning of the agent" or something along those lines, and other conditions I haven't thought of may be relevant as well (often the case with analogies). But, basically, though I think the analogy suggests that the ordinary development of AI will overcome Lobian obstacles, I think it is much less supported that AGIs will overcome these obstacles in the same way as humans overcome them. Likewise, humans overcome obstacles to reasoning effectively in certain ways, and I don't think there is much reason to suspect that AGIs will overcome these obstacles in the same ways. Therefore, I don't think that the line of argument I was advancing supports the view that formalizing math and doing mathematical logic will be unhelpful in developing AI.

No one thinks that the world will be destroyed because people built AIs that couldn't handle the Lobian obstruction. That doesn't seem like a sensible position, and I think Eliezer explicitly disavows it in the writeup. The point is that we have some frameworks for reasoning about reasoning. Those formalisms don't capture reflective reasoning, i.e. they don't provide a formal account of how reflective reasoning could work in principle. The problem Eliezer points to is an obvious problem that any consistent framework for reflective reasoning must resolve.

I think what you're saying is that getting a good framework for reasoning about reasoning could be important for making AGI go well. This is plausible to me. And then you're also saying that working on this Lobian stuff is a reasonable place to start. This is not obvious to me, but this seems like something that could be subtle, and I understand the position better now. I also don't think that however you're doing it should necessarily seem reasonable to me right now, even if it is.

Big picture: the big questions I had about this were:

  • What are some plausible concrete examples of self-modifications where Lob issues might cause you to stumble?
  • Do you think that people building AGI in the future will stumble over Lob issues if MIRI doesn't work on those issues? If so, why?

I would now ask those questions differently:

  • What are some plausible concrete examples of cases where machines might fail to reason about self-modification properly if this research isn't done? Why do you think it might happen in these cases?
  • Do you think that people building AGI in the future will fail to do this research, if it is in fact necessary for building well-functioning AIs? If so, why?

I now have a better understanding of what your answer to the first question might look like, though I'm still struggling to imagine what could plausibly go wrong in practice if the research isn't done. As far as I can tell, there hasn't been any effort directed at addressing the second question in this specific context so far. Maybe that's because it's thought to be just part of the general question of whether future elites will handle AI development just fine. I'm not sure it is, because it sounds like this may be part of making an AGI work at all, and the arguments I've heard for future elites not navigating it properly seem to turn on safety issues rather than basic functionality issues.

Comment author: Eliezer_Yudkowsky 07 June 2013 09:52:45PM 9 points

It sounds like you agree, but are saying that some analogous but more complicated problem might arise for probabilistic agents, and that it might not be resolved by whoever else is making AI unless this research is done by MIRI.

That's not it, rather:

I think what you're saying is that getting a good framework for reasoning about reasoning could be important for making AGI go well. This is plausible to me. And then you're also saying that working on this Lobian stuff is a reasonable place to start. This is not obvious to me, but this seems like something that could be subtle, and I understand the position better now.

Yep. We have reasoning frameworks like the currently dominant forms of decision theory, but they don't handle reflectivity well.

The Lob Problem isn't a top-priority scary thing that is carved upon the tombstones of worlds, it's more like, "Look! We managed to crisply exhibit something very precise that would go wrong with standard methods and get started on analyzing and fixing it! Before we just saw in a more intuitive sense that something would go wrong when we applied standard theories to reflective problems but now we can state three problems very precisely!" (Lob and coherent quantified belief sec. 3, nonmonotonicity of probabilistic reasoning sec. 5.2 & 7, maximizing / satisficing not being good-enough idioms for bounded agents sec. 8.) Problems with reflectivity in general are expectedly carved upon the tombstones of worlds because they expectedly cause problems with goal stability during self-modification. But to make progress on that you need crisp problems to provide fodder for getting started on finding a good shape for a reflective decision theory / tiling self-improving agent.

Comment author: paulfchristiano 10 June 2013 01:04:59PM 4 points

(As usual, I have somewhat less extreme views here than Eliezer.)

saying that some analogous but more complicated problem might arise for probabilistic agents

There is a problem here: we have an impossibility proof for a broad class of agents, and we know of no agents that don't have the problem. Indeed, this limits the relevance of the impossibility proof, but it doesn't limit the realness of the problem.

If you have an example of a complication that you think would plausibly arise in practice and have further thoughts on why we shouldn't expect this complication to be avoided by default in the course of the ordinary development of AI, I would be interested in hearing more.

I don't quite see where you are coming from here. It seems like the situation is:

  1. There are problems that reflective reasoners would be expected to solve, which we don't understand how to resolve in current frameworks for general reasoning (of which mathematical logic is the strongest).
  2. If you think that reflective reasoning may be an important part of AGI, then having formal frameworks for reflective reasoning is an important part of having formal frameworks for AGI.
  3. If you think that having formal frameworks is likely to improve our understanding of AGI, then having formal frameworks that support reflective reasoning is a useful step towards improving our understanding of AGI.

The sort of complication I imagine is: it is possible to build powerful AGI without having good frameworks for understanding its behavior, and then people do that. It seems like all things equal understanding a system is useful, not only for building it but also for having reasonable expectations about its behavior (which is in turn useful for making further preparations, solving safety problems, etc.). To the extent that understanding things at a deep level ends up being necessary to building them at all, then what we're doing won't matter (except insofar as people who care about safety making modest technical contributions is indirectly useful).

Do you think that people building AGI in the future will fail to do this research, if it is in fact necessary for building well-functioning AIs? If so, why?

Same answer. It may be that understanding reasoning well is necessary to building powerful agents (indeed, that would be my mode guess). But it may be that you can influence the relative development of understanding vs. building, in which case pushing on understanding has a predictable effect. For example, if people didn't know what proofs or probabilities were, it isn't out of the question that they could build deep belief nets by empirical experimentation. But I feel safe saying that understanding proof and probability helps you better reason about the behavior of extremely powerful deep belief nets.

Here's a stab at how the arguments are different (first thing that came to mind):

I agree that the cases differ in many ways. But this distinction doesn't seem to get at the important thing. To someone working on logic you would say "I don't know whether deduction systems will be formalized in the future, but I know that agents will be able to reason. So this suggests to me that your particular approach for defining reasoning, via formalization, is unnecessary." In some sense this is true---if I'm an early mathematician and I don't do logic, someone else will---but it has relatively little bearing on whether logic is likely to be mathematically productive to work on. If the question is about impact rather than productivity as a research program, then see the discussion above.

Comment author: Nick_Beckstead 10 June 2013 01:37:08PM 0 points

The sort of complication I imagine is: it is possible to build powerful AGI without having good frameworks for understanding its behavior, and then people do that. It seems like all things equal understanding a system is useful, not only for building it but also for having reasonable expectations about its behavior (which is in turn useful for making further preparations, solving safety problems, etc.). To the extent that understanding things at a deep level ends up being necessary to building them at all, then what we're doing won't matter (except insofar as people who care about safety making modest technical contributions is indirectly useful).

OK, helpful. This makes more sense to me.

But this distinction doesn't seem to get at the important thing. To someone working on logic you would say "I don't know whether deduction systems will be formalized in the future, but I know that agents will be able to reason. So this suggests to me that your particular approach for defining reasoning, via formalization, is unnecessary." In some sense this is true---if I'm an early mathematician and I don't do logic, someone else will---but it has relatively little bearing on whether logic is likely to be mathematically productive to work on. If the question is about impact rather than productivity as a research program, then see the discussion above.

This reply would make more sense if I was saying that knowing how to overcome Lobian obstacles would never be necessary for building well-functioning AI. But I was making the weaker claim that either it would never be necessary OR it would be solved in the ordinary development of AI. So if someone is formalizing logic a long time ago with the aim of building thinking machines AND they thought that when thinking machines were built logic wouldn't be formalized properly and the machines wouldn't work, then I might have complained. But if they had said, "I'd like to build a thinking machine and I think that formalizing logic will help get us there, whether it is done by others or me. And maybe it will go a bit better or come a bit sooner if I get involved. So I'm working on it." then I wouldn't have had anything to say.

Anyway, I think we roughly understand each other on this thread of the conversation, so maybe there is no need to continue.

Comment author: Qiaochu_Yuan 06 June 2013 06:03:07PM 6 points

I’m not sure why this should be high on the priority list of general AI issues to think about.

I don't think it's the highest-priority issue to think about, but my impression is that among the issues that Eliezer has identified as worth thinking about, it could be the one closest to being completely mathematically formalized, so it's a good one to focus on for the purpose of getting mathematicians interested in MIRI.

Comment author: Nick_Beckstead 06 June 2013 06:16:23PM 1 point

I do appreciate arguments in favor of focusing your effort on tractable problems, even if they are not the most important problems to solve.

It's certainly hard to answer the question, "Why is this the best project to work on within AI?" since it implicitly requires comparisons with all types of stuff. It's probably unreasonable to ask Eliezer to answer this question in a comment. However, it is reasonable to ask, "Why will this research help make machine intelligence develop in a way that will benefit humanity?" Most of the other questions in my comment are also relevant to that question.

Comment author: Wei_Dai 06 June 2013 08:37:39PM 4 points

I also question the importance of working on this problem now, but for a somewhat different reason.

Part of where I'm coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another

My understanding is that Lobian issues make it impossible for a proof-based AI to decide to not immediately commit suicide, because it can't prove that it won't do something worse than nothing in the future. (Let's say it will have the option to blow up Earth in the future. Since it can't prove that its own proof system is consistent, it can't prove that it won't prove that blowing up Earth maximizes utility at that future time.) To me this problem looks more like a problem with making decisions based purely on proofs, and not much related to self-modification.
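For reference, the standard facts behind this worry can be written out as below, for a consistent, recursively axiomatized theory T extending PA with provability predicate \Box_T (standard notation, not taken from the paper). The last line is the general form of the "blow up Earth" example: for any sentence \psi, proving "I will never prove \psi" would amount to proving Con(T).

```latex
% T: consistent, recursively axiomatized, extends PA; \Box_T: its provability predicate.
\begin{aligned}
&\text{L\"ob:} && T \vdash \Box_T\ulcorner\varphi\urcorner \rightarrow \varphi
  \;\Longrightarrow\; T \vdash \varphi \\
&\text{Case } \varphi := \bot\text{:} && T \vdash \mathrm{Con}(T)
  \;\Longrightarrow\; T \vdash \bot
  \qquad (\mathrm{Con}(T) :\equiv \neg\Box_T\ulcorner\bot\urcorner,
  \text{ equivalent to } \Box_T\ulcorner\bot\urcorner \rightarrow \bot) \\
&\text{Any } \psi\text{:} && T \vdash \neg\Box_T\ulcorner\psi\urcorner
  \;\Longrightarrow\; T \vdash \mathrm{Con}(T)
  \qquad (\text{since } T \vdash \Box_T\ulcorner\bot\urcorner \rightarrow \Box_T\ulcorner\psi\urcorner)
\end{aligned}
```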

Comment author: paulfchristiano 06 June 2013 09:23:21PM 8 points

Using probabilities instead of proofs seems to eliminate the old obstructions, but it does leave a sequence of challenging problems (hence the work on probabilistic reflection). E.g., we've proved that there is an algorithm P using a halting oracle such that:

(Property R): Intuitively, we "almost" have a < P(X | a < P(X) < b) < b. Formally:

  • For each sentence X, each a, and each b, P(X AND a<P(X)<b) < b * P(a <= P(X) <= b).
  • For each sentence X, each a, and each b, P(X AND a<=P(X)<=b) > a * P(a < P(X) < b)

But this took a great deal of work, and we can't exhibit any algorithm that simultaneously satisfies Property R and has P(Property R) = 1. Do you think this is not an important question? It seems to me that we don't yet know how many of the Godelian obstructions carry over to the probabilistic setting, and there are still real problems that will take ingenuity to resolve.
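The intuitive form of Property R, a < P(X | a < P(X) < b) < b, can be made concrete with a toy finite model. The sketch below is hypothetical throughout (the names `conditional_reflection_ok`, `calibrated`, `miscalibrated` and all numbers are illustrative): each "world" records a weight, whether X holds, and the value the inner symbol P assigns to X there. It ignores self-reference entirely, which is the hard part the halting-oracle construction handles, and it only checks the conditional form on a finite grid of rationals, skipping intervals whose conditioning event has probability zero.

```python
from fractions import Fraction as F
from itertools import combinations

# Toy model (hypothetical): worlds are triples (weight, x_holds, inner_p),
# where inner_p is the value the inner symbol P assigns to X in that world.
# Weights should sum to 1. Sentences that mention P are not modeled.

def conditional_reflection_ok(worlds, grid):
    """Check a < P(X | a < inner_p < b) < b for every pair a < b from `grid`
    whose conditioning event has positive probability."""
    for a, b in combinations(sorted(grid), 2):
        cond = sum(w for w, x, p in worlds if a < p < b)
        if cond == 0:
            continue  # nothing to condition on for this interval
        joint = sum(w for w, x, p in worlds if x and a < p < b)
        if not a < joint / cond < b:
            return False
    return True

grid = [F(k, 10) for k in range(11)]  # 0, 1/10, ..., 1

# Calibrated: where the inner P says 0.8, X holds 80% of the time (same for 0.2).
calibrated = [
    (F(4, 10), True, F(8, 10)), (F(1, 10), False, F(8, 10)),
    (F(1, 10), True, F(2, 10)), (F(4, 10), False, F(2, 10)),
]
# Miscalibrated: the inner P says 0.8, but X holds only 20% of the time there.
miscalibrated = [
    (F(1, 10), True, F(8, 10)), (F(4, 10), False, F(8, 10)),
    (F(1, 10), True, F(2, 10)), (F(4, 10), False, F(2, 10)),
]

print(conditional_reflection_ok(calibrated, grid))     # True
print(conditional_reflection_ok(miscalibrated, grid))  # False
```

Exact fractions are used so that the strict inequalities are not muddied by floating-point rounding.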

Comment author: Wei_Dai 06 June 2013 10:00:45PM 6 points

Putting the dangers of AI progress aside, we probably ought to first work on understanding logical uncertainty in general, and start with simpler problems. I find it unlikely that we can solve "probabilistic reflection" (or even correctly specify what the problem is) when we don't yet know what principles allow us to say that P!=NP is more likely to be true than false. Do we even know that using probabilities is the right way to handle logical uncertainty? (People assumed that using probabilities is the right way to handle indexical uncertainty and that turned out to be wrong.)

Comment author: paulfchristiano 07 June 2013 06:36:50PM 4 points

we don't yet know what principles allow us to say that P!=NP is more likely to be true than false

We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn't get into priors). MIRI is working much more directly on this problem as well. Can you think of concrete open questions in that space? Basically we are just trying to develop the theory, but having simple concrete problems would surely be good. (I have a bucket of standard toy problems to resolve, and don't have a good approach that handles all of them, but it's pretty easy to hack together a solution to them so they don't really count as open problems.)

I agree that AI progress is probably socially costly (highly positive for currently living folks, modestly negative for the average far future person). I think work with a theoretical bias is more likely to be helpful, and I don't think it is very bad on net. Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.

We don't know that probabilities are the right way to handle logical uncertainty, nor that our problem statements are correct. I think that the kind of probabilistic reflection we are working on is fairly natural though.

I agree with both you and Nick that the strategic questions are very important, probably more important than the math. I don't think that is inconsistent with getting the mathematical research program up and going. I would guess that all told the math will help on the strategy front via building the general credibility of AI safety concern (by 1. making it clear that there are concrete policy-relevant questions here, and 2. building status and credibility for safety-concerned communities and individuals), but even neglecting that I think it would still be worth it.

Comment author: Wei_Dai 09 June 2013 01:04:40AM 4 points

We have coherent answers at least. See e.g. here for a formalism (and similarly the much older stuff by Gaifman, which didn't get into priors).

I read that paper before, but it doesn't say why its proposed way of handling logical uncertainty is the correct one, except that it "seem to have some good properties". It seems like we're still at a stage where we don't understand logical uncertainty at a deep level and can't offer solutions based on fundamental principles, but are just trying out various ideas to see what sticks.

I agree that AI progress is probably socially costly [...] Moreover, as long as safety-concerned folks are responsible for a very small share of all of the good AI work, the reputation impacts of doing good work seem very large compared to the social benefits or costs.

I'm not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?

I think that the kind of probabilistic reflection we are working on is fairly natural though.

Maybe I'm just being dense but I'm still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn't seem to make a strong attempt to explain it either.

Comment author: paulfchristiano 10 June 2013 02:05:41PM 2 points

I read that paper before but it doesn't say why its proposed way of handling logical uncertainty is the correct one, except that it "seem to have some good properties".

This is basically the same as the situation with respect to indexical probabilities. There are dominance arguments for betting odds etc. that don't quite go through, but it seems like probabilities are still distinguished as a good best guess, and worth fleshing out. And if you accept probabilities, prior specification is the clear next question.

I'm not entirely clear on your position. Are you saying that theoretical AI work by safety-concerned folks has a net social cost, accounting for reputation impacts, or excluding reputation impacts?

I think it's plausible there are net social costs, excluding reputational impacts, and would certainly prefer to think more about it first. But with reputational impacts it seems like the case is relatively clear (of course this is potentially self-serving reasoning), and there are similar gains in terms of making things seem more concrete etc.

Maybe I'm just being dense but I'm still not really getting why you think that (despite your past attempts to explain it to me in conversation). The current paper doesn't seem to make a strong attempt to explain it either.

Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality), and the second claim was that the epsilons aren't so bad (e.g. because exact comparisons between floats are kind of silly anyway). Probably those could be more explicit in the writeup, but it would be helpful to know which steps seem shakiest.

Comment author: Wei_Dai 10 June 2013 04:52:00PM 3 points

Well, the first claim was that without the epsilons (i.e. with closed instead of open intervals) it would be exactly what you wanted (you would have an inner symbol that exactly corresponded to reality)

Why do you say "exactly corresponded to reality"? You'd have an inner symbol which corresponded to the outer P, but P must be more like subjective credence than external reality, since in reality each logical statement is presumably either true or false, not a probabilistic mixture of both?

Intuitively, what I'd want is a "math intuition module" which, if it was looking at a mathematical expression denoting the beliefs that a copy of itself would have after running for a longer period of time or having more memory, would assign high probability that those beliefs would better correspond to reality than its own current beliefs. This would in turn license the AI using this MIM to build a more powerful version of itself, or just to believe that "think more" is generally a good idea aside from opportunity costs. I understand that you are not trying to directly build such an MIM, just to do a possibility proof. But your formalism looks very different from my intuitive requirement, and I don't understand what your intuitive requirement might be.

Comment author: paulfchristiano 02 July 2013 12:43:37PM 1 point

P is intended to be like objective reality, exactly analogously with the predicate "True." So we can adjoin P as a symbol and the reflection principle as an axiom schema, and thereby obtain a more expressive language. Depending on architecture, this also may increase the agent's ability to formulate or reason about hypotheses.
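The reflection schema being adjoined can be sketched roughly as follows, with \ulcorner\cdot\urcorner for Gödel quotation and the same symbol P used inside and outside the quotes; the exact formulation in the draft may differ. The open interval is the "epsilon" discussed above: with a closed interval [a, b] the schema is inconsistent, as a liar-style sentence G \leftrightarrow \mathbb{P}(\ulcorner G \urcorner) < 1 shows.

```latex
% For every sentence \varphi of the extended language (which may mention \mathbb{P})
% and all rationals a < b:
\bigl(a < \mathbb{P}(\ulcorner\varphi\urcorner) < b\bigr)
  \;\Longrightarrow\;
  \mathbb{P}\bigl(\ulcorner a < \mathbb{P}(\ulcorner\varphi\urcorner) < b \urcorner\bigr) = 1
```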

Statements without P's in them are indeed either true or false with probability 1. I agree it is a bit odd for statements with P in them to have probabilities, but I don't see a strong argument it shouldn't happen. In particular, it seems irrelevant to anything meaningful we would like to do with a truth predicate. In subsequent versions of this result, the probabilities have been removed and the core topological considerations exposed directly.

The relationship between a truth predicate and the kind of reasoning you discuss (a MIM that believes its own computations are trustworthy) is that truth is useful or perhaps necessary for defining the kind of correspondence that you want the MIM to accept, about a general relationship between the algorithm it is running and what is "true". So having a notion of "truth" seems like the first step.

Comment author: lukeprog 08 June 2013 06:43:22PM 2 points

I would guess that all told the math will help on the strategy front via building the general credibility of AI safety concern

Also, by attracting thinkers who can initially be drawn in only by crisp technical problems but who, as they get involved, will turn their substantial brainpower toward the strategic questions as well.

For three additional reasons for MIRI to focus on math for now, see the bullet points under "strategic research will consume a minority of our research budget in 2013" in MIRI's Strategy for 2013.

Comment author: cousin_it 08 June 2013 01:14:46PM 0 points

what principles allow us to say that P!=NP is more likely to be true than false

Maybe we use the same principle that allows me to say "I guess I left my wallet at home" after I fail to find the wallet in the most likely places it could be, like my pockets. In other words, maybe we do Bayesian updating about the location of the "true" proof or disproof, as we check some a priori likely locations (attempted proofs and disproofs) and fail to find it there. This idea is still very vague, but looks promising to me because it doesn't assume logical omniscience, unlike Abram's and Benja's ideas...
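A minimal sketch of the "wallet" update, under loudly stated assumptions: hypotheses are hypothetical (answer, location) pairs, "easy" stands for the a priori likely places already searched, "hard" for everything else, and a search never misses a proof that is actually there. The function names and all numbers are illustrative only, not anyone's proposed formalism.

```python
from fractions import Fraction as F

# Hypothetical prior over (answer, where_the_proof_is). "X" is a placeholder
# conjecture; the weights are made up for illustration and sum to 1.
prior = {
    ("X", "easy"): F(3, 10), ("not X", "easy"): F(1, 10),
    ("X", "hard"): F(2, 10), ("not X", "hard"): F(4, 10),
}

def update_on_failed_search(dist, searched):
    """Condition on 'no proof or disproof found at the searched locations',
    assuming a search never misses a proof that is actually there."""
    remaining = {h: p for h, p in dist.items() if h not in searched}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("the failed search ruled out every hypothesis")
    return {h: p / total for h, p in remaining.items()}

def prob_of(dist, answer):
    """Marginal probability that `answer` ("X" or "not X") is the truth."""
    return sum(p for (ans, _where), p in dist.items() if ans == answer)

posterior = update_on_failed_search(prior, {("X", "easy"), ("not X", "easy")})
print(prob_of(prior, "X"))      # 1/2
print(prob_of(posterior, "X"))  # 1/3
```

In this toy, failing to find an easy proof of X lowers P(X) because the prior put more of X's mass in the easy locations; the agent only needs to know where it has looked, not every logical consequence of its axioms.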

Comment author: Nick_Beckstead 07 June 2013 12:02:15PM 0 points

To me this problem looks more like a problem with making decisions based purely on proofs, and not much related to self-modification.

I think I was implicitly assuming that you wouldn't have an agent making decisions based purely on proofs.

Comment author: hairyfigment 06 June 2013 07:14:48PM 1 point

Layman's answer: we want to predict what some self-modifying AI will do, so we want a decision theory that can ask about the effect of adopting a new decision theory or related processes. (The paper's issues could easily come up.) The one alternative I can see involves knowing in advance, as humans, how any modification that a super-intelligence could imagine will affect its goals. This seems like exactly what humans are bad at.

Speaking of, you say we "seem capable of overcoming putative Lobian obstacles to self-modification." But when I think about CEV, this appears dubious. We can't express exactly what 'extrapolation' means, save by imagining a utility function that may not exist. And without a better language for talking about goal stability, how would we even formalize that question? How could we formally ask if CEV is workable?

Comment author: Nick_Beckstead 06 June 2013 05:45:35PM 1 point

Another aspect of where I'm coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development because it seems so hard to predict what will and won't be relevant for distant future technologies. My feeling is that paragraphs like this one, while relevant, don't provide strong enough arguments to overcome the prior:

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals). Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.
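The arithmetic behind "should not have a significant conditionally independent component on each self-modification" is that independent per-step risks compound geometrically. A minimal sketch, assuming a constant, independent per-step failure probability; the function names and the example numbers are hypothetical.

```python
import math

def survival_probability(per_step_failure, steps):
    """P(no catastrophic failure over `steps` self-modifications), assuming
    each step fails independently with the same probability."""
    return (1 - per_step_failure) ** steps

def max_per_step_failure(target_survival, steps):
    """Largest independent per-step failure probability that keeps overall
    survival at or above `target_survival` (expm1 avoids cancellation)."""
    return -math.expm1(math.log(target_survival) / steps)

print(round(survival_probability(0.01, 100), 4))  # 0.366
print(f"{max_per_step_failure(0.99, 1000):.3e}")  # 1.005e-05
```

Even a 1% independent risk per step leaves roughly a one-in-three chance of surviving 100 steps, which is why a per-step test with a fixed residual error rate scales badly, whereas a proof from semantically true axioms contributes no such independent per-step term.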

I would greatly appreciate further elaboration on why this is the right problem to be working on right now.

Comment author: AlexMennen 06 June 2013 08:41:09PM 3 points

Another aspect of where I'm coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development because it seems so hard to predict what will and won't be relevant for distant future technologies.

On the other hand, trying to solve many things that have a significant probability of being important so that you're likely to eventually solve something that actually is important as a result, seems like a better idea than not doing anything because you can't prove that any particular sub-problem is important.

Comment author: Nick_Beckstead 07 June 2013 12:53:22PM 3 points

I agree with this principle but think my claims are consistent with it. Doing stuff other than "technical problems in the future of AI" is an alternative worth considering.

Comment author: jsteinhardt 07 June 2013 02:08:25PM 1 point

Another aspect of where I'm coming from is that there should be a high standard of proof for claiming that something is an important technical problem in future AI development because it seems so hard to predict what will and won't be relevant for distant future technologies.

I disagree. I would go in the other direction: if it seems relatively plausible that something is relevant, then I'm happy to have someone working on it. This is why I am happy to have MIRI work on this and related problems (to the extent that I even attended one of the Lob workshops), even if I do not personally think they are likely to be relevant.

ETA: My main reason for believing this is that, among scientific fields, willingness to entertain such research programs seems to correlate with the overall health of the field.

Comment author: ESRogs 06 June 2013 07:10:06PM 1 point

Are there other problems that you think it would be better to be working on now?

Comment author: Nick_Beckstead 06 June 2013 08:14:32PM 2 points

My question was more of a request for information than a challenge; Eliezer could say some things that would make doing mathematics on the Lob problem look more promising to me. It seems likely to me that I am missing some important aspect of the situation.

If I'm not missing anything major, then I think that, within the realm of AI risk, general strategic work addressing questions like, "Will the world's elites navigate the creation of AI just fine?" would be preferable. That's just one example; I do not mean to be claiming that this is the best thing to do. As I said in another comment, it's very hard to argue that one course is optimal.

Comment author: ESRogs 06 June 2013 09:23:21PM 3 points

Thanks, that, at least for me, provides more context for your questions.

Comment author: Nick_Beckstead 06 June 2013 10:22:53PM 3 points

All that said, I do think that where MIRI has technical FAI questions to work on now, it is very reasonable to write up:

  • what the question is
  • why answering the question is important for making machine intelligence benefit humanity
  • why we shouldn't expect the question to be answered by default by whoever makes AGI

In this particular case, I am asking for more info about the second two questions.

Comment author: lukeprog 06 June 2013 11:01:08PM 3 points

For convenience: "SuperBenefit" = increasing the probability that advanced machine intelligence has a positive impact.

I agree that MIRI has a lot left to explain with respect to questions #2 and #3, but it's easier to explain those issues when we've explained #1 already, and we've only just begun to do that with AI forecasting, IEM, and Tiling Agents.

Presumably the relevance of AI forecasting and IEM to SuperBenefit is clear already?

In contrast, it does seem like the relevance of the Tiling Agents work to SuperBenefit is unclear to many people, and that more explanation is needed there. Now that Tiling Agents has been published, Eliezer has begun to explain its relevance to SuperBenefit in various places on this page, but it will take a lot of trial and error for us to discover what is and isn't clear to people.

As for question #3, we've also only just begun to address that issue in detail.

So, MIRI still has a lot of explaining to do, and we're working on it. But allow me a brief reminder that this gap isn't unique to MIRI at all. Arguing for the cost effectiveness of any particular intervention given the overwhelming importance of the far future is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else.

E.g. if somebody accepts the overwhelming importance of the far future and is donating to AMF, they have roughly as much explaining to do as MIRI does, if not more.

Comment author: Nick_Beckstead 07 June 2013 01:42:48AM 1 point

Presumably the relevance of AI forecasting and IEM to SuperBenefit is clear already?

Yes.

So, MIRI still has a lot of explaining to do, and we're working on it. But allow me a brief reminder that this gap isn't unique to MIRI at all. Arguing for the cost effectiveness of any particular intervention given the overwhelming importance of the far future is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else. E.g. if somebody accepts the overwhelming importance of the far future and is donating to AMF, they have roughly as much explaining to do as MIRI does, if not more.

I basically agree with these comments, with a couple of qualifications.

I think it's unique to MIRI in the sense that it makes sense for MIRI to be expected to explain how its research is going to accomplish its mission of making machine intelligence benefit humanity, whereas it doesn't make sense for global health charities to be expected to explain why improving global health makes the far future go better. This means MIRI has an asymmetrically hard job, but I do think it's a reasonable division of labor.

I think it makes sense for other people who care about the far future to evaluate how the other strategies you mentioned are expected to affect the far future, and try to find the best ones. There is an overwhelming amount of work to do.

Comment author: lukeprog 07 June 2013 03:47:39AM 5 points

I think it's unique to MIRI in the sense that it makes sense for MIRI to be expected to explain how its research is going to accomplish its mission of making machine intelligence benefit humanity, whereas it doesn't make sense for global health charities to be expected to explain why improving global health makes the far future go better.

Right. Very few charities are even claiming to be good for the far future. So there's an asymmetry between MIRI and other charities w.r.t. responsibility to explain plausible effects on the far future. But among parties (including MIRI) who care principally about the far future and are trying to do something about it, there seems to be no such asymmetry — except for other reasons, e.g. asymmetry in resource use.

Comment author: Nick_Beckstead 07 June 2013 11:54:14AM 1 point

Yes.

Comment author: jsteinhardt 07 June 2013 02:16:47PM 0 points

So, MIRI still has a lot of explaining to do, and we're working on it. But allow me a brief reminder that this gap isn't unique to MIRI at all. Arguing for the cost effectiveness of any particular intervention given the overwhelming importance of the far future is extremely complicated, whether it be donating to AMF, doing AI risk strategy, spreading rationality, or something else.

I agree with this. Typically people justify their research on other grounds than this, for instance by identifying an obstacle to progress and showing how their approach might overcome it in a way that previously tried approaches were not able to. My impression is that one reason for doing this is that it is typically much easier to communicate along these lines, because it brings the discourse towards much more familiar technical questions while still correlating well with progress more generally.

Note that under this paradigm, the main thing MIRI needs to do to justify their work is to explain why the Lob obstacle is insufficiently addressed by other approaches (for instance, statistical learning theory). I would actually be very interested in understanding the relationship of statistics to the Lob obstacle, so look forward to any writeup that might exist in the future.

Comment author: Halfwit 06 June 2013 05:00:41AM 23 points

The fact that MIRI is finally publishing technical research has impressed me. A year ago it seemed, to put it bluntly, that your organization was stalling, spending its funds on the full-time development of Harry Potter fanfiction and popular science books. Perhaps my intuition there was uncharitable, perhaps not. I don't know how much of your lead researcher's time was spent on said publications, but it certainly seemed, from the outside, that it was the majority. Regardless, I'm very glad MIRI is focusing on technical research. I don't know how much farther you have to walk, but it's clear you're headed in the right direction.

Comment author: Qiaochu_Yuan 06 June 2013 01:44:23AM 8 points

(These were some comments I had on a slightly earlier draft than this, so the page numbers and such might be slightly off.)

Page 4, footnote 8: I don't think it's true that only stronger systems can prove weaker systems consistent. It can happen that system A can prove system B consistent and A and B are incomparable, with neither stronger than the other. For example, Gentzen's proof of the consistency of PA uses a system which is neither stronger nor weaker than PA.

Page 6: the hypotheses of the second incompleteness theorem are a little more restrictive than this (though not much, I think).

Page 11, problem c: I don't understand the sentence containing "highly regular and compact formula." Looks like there's a typo somewhere.

Comment author: JoshuaZ 06 June 2013 03:23:10PM 2 points

A can prove system B consistent and A and B are incomparable, with neither stronger than the other. For example, Gentzen's proof of the consistency of PA uses a system which is neither stronger nor weaker than PA.

I think there are more trivial counterexamples to the statement also. Take Robinson arithmetic and throw in an axiom asserting the consistency of PA. This system can trivially prove that PA is consistent, and is much weaker than PA.

Comment author: Quinn 07 June 2013 03:32:56AM 1 point

Your post confused me for a moment, because Robinson + Con(PA) is of course not weaker than PA. It proves Con(PA), and PA doesn't.

I see now that your point is that Robinson arithmetic is sufficiently weak compared to PA that PA should not be weaker than Robinson + Con(PA). Is there an obvious proof of this?

(For example, if Robinson + Con(PA) proved all theorems of PA, would this contradict the fact that PA is not finitely axiomatizable?)

Comment author: JoshuaZ 07 June 2013 06:40:12AM 1 point

Yes, finite axiomatizability is the obvious way of seeing this. You are correct that strictly speaking Robinson + Con(PA) is not weaker than PA, but rather is another incomparable example (which was the intended point). Note that there are other ways of seeing that Robinson + Con(PA) is not stronger than PA, without using the fact that PA is not finitely axiomatizable, if one is willing to be slightly non-rigorous. For example, one can note that Robinson arithmetic has Z[x]+ as a model, so if Z[x]+ also satisfies Con(PA) (this step requires some details), every theorem of Robinson + Con(PA) holds in Z[x]+, while PA proves statements that fail there, such as "every number is even or odd".
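To make the Z[x]+ remark concrete, here is a small Python sketch. It assumes the standard construction, which the comment does not spell out: elements are 0 together with the integer polynomials whose leading coefficient is positive, and successor is p ↦ p + 1. PA proves that every number is even or odd, i.e. ∀x ∃y (x = y + y ∨ x = y + y + 1). The sketch checks that the element x of Z[x]+ is neither, because halving it would require the non-integer coefficient 1/2. All helper names are hypothetical.

```python
from typing import Tuple

# A polynomial is a tuple of integer coefficients, constant term first,
# with no trailing zeros; () is the zero polynomial.
Poly = Tuple[int, ...]

def normalize(coeffs) -> Poly:
    """Drop trailing zero coefficients."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return tuple(coeffs)

def in_model(p: Poly) -> bool:
    """Elements of Z[x]+: zero, or a polynomial with positive leading coefficient."""
    return p == () or p[-1] > 0

def minus_one(p: Poly) -> Poly:
    """Return p - 1 (p is assumed to be nonzero)."""
    head = (p[0] if p else 0) - 1
    return normalize((head,) + p[1:])

def is_even(p: Poly) -> bool:
    """Is p = y + y for some y in Z[x]+? Since y + y = 2y, every coefficient must be even."""
    return all(c % 2 == 0 for c in p)

def is_odd(p: Poly) -> bool:
    """Is p = (y + y) + 1 for some y in Z[x]+? Equivalently, p - 1 is in the model and even."""
    if p == ():
        return False
    q = minus_one(p)
    return in_model(q) and is_even(q)

X: Poly = (0, 1)   # the non-standard element x
FIVE: Poly = (5,)  # a standard number

print(is_even(X) or is_odd(X))        # False: x is neither even nor odd
print(is_even(FIVE) or is_odd(FIVE))  # True: 5 = 2 + 2 + 1
```

Any theory whose models include Z[x]+ therefore cannot prove the parity statement, which is one way of seeing that such a theory does not contain all of PA.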

Comment author: Quinn 08 June 2013 01:55:36AM 1 point

Ah, so my question was more along the line: does finite axiomatizability of a stronger (consistent) theory imply finite axiomatizability of the weaker theory? (This would of course imply Q + Con(PA) is not stronger than PA, Q being the usual symbol for Robinson arithmetic.)

On the model theoretic side, I think I can make something work, but it depends on distorting the specific definition of Con(PA) in a way that I'm not really happy about. In any case, I agree that your example is trivial to state and trivial to believe correct, but maybe it's less trivial to prove correct.

Here's what I was thinking:

Consider the predicate P(x) which says "if x != Sx, then x does not encode a PA-proof of 0=1", and let ConMinus(PA) say for all x, P(x). Now, I think one could argue that ConMinus is a fair definition of (or substitute for?) Con, in that qualifying a formula with "if x != Sx" does not change its meaning in the standard model. Alternately, you could push this "if x != Sx" clause deeper, into basically every formula you would use to define the primitive recursive functions needed to talk about consistency in the first place, and you would not change the meanings of these formulas in the standard model. (I guess what I'm saying is that "the" formula asserting the consistency of PA is poorly specified.)

Also, PA is smart enough to prove that numbers are not their own successors, so PA believes in the equivalence of Con and ConMinus. In particular, PA does not prove ConMinus(PA), so PA is not stronger than Q + ConMinus(PA).
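Quinn's definitions can be written out in symbols. Here $\mathrm{Prf}_{PA}(x, \ulcorner 0=1 \urcorner)$ stands for a formalized "x encodes a PA-proof of 0=1". This notation is assumed for illustration and does not appear in the comment.

```latex
\begin{aligned}
\mathrm{Con}(PA) &:\equiv \forall x\, \neg \mathrm{Prf}_{PA}(x, \ulcorner 0=1 \urcorner) \\
\mathrm{ConMinus}(PA) &:\equiv \forall x\, \bigl(x \neq Sx \rightarrow \neg \mathrm{Prf}_{PA}(x, \ulcorner 0=1 \urcorner)\bigr) \\
PA &\vdash \forall x\, (x \neq Sx), \quad\text{hence}\quad PA \vdash \mathrm{Con}(PA) \leftrightarrow \mathrm{ConMinus}(PA)
\end{aligned}
```

Assuming PA is consistent, the second incompleteness theorem says PA does not prove Con(PA), so by the equivalence it does not prove ConMinus(PA) either.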

On the other hand, let M be the non-negative integers, together with one additional point omega. Put S(omega) = omega, put omega + anything = omega = anything + omega, and similarly for multiplication (except 0 * omega = omega * 0 = 0). I am pretty sure this is a model of Q.

Q is smart enough about its standard integers that it knows none of them encode PA-proofs of 0=1 (the "proves" predicate being Delta_0). Thus the model M satisfies Q + ConMinus(PA). But now we can see that Q + ConMinus(PA) is not stronger than PA, because PA proves "for all x, x is not equal to Sx", yet this statement fails in a model of Q + ConMinus(PA).
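The claim that M is a model of Q can be sanity-checked mechanically. The Python sketch below tests one standard formulation of Q's seven axioms (this particular axiom list is an assumption). It uses Quinn's definitions of successor, addition and multiplication, and lets the quantifiers range over the finite fragment {0, ..., n} ∪ {ω}. A finite check is evidence only, not a proof, because the standard part of M is infinite. The sketch also confirms that ∀x (x ≠ Sx), a theorem of PA, fails at ω. The names `OMEGA`, `succ`, `add`, `mul` and `check_q_axioms` are hypothetical.

```python
from itertools import product

OMEGA = "omega"  # the extra non-standard point of M

def succ(x):
    # S(omega) = omega; otherwise the usual successor
    return OMEGA if x == OMEGA else x + 1

def add(x, y):
    # omega + anything = omega = anything + omega
    if x == OMEGA or y == OMEGA:
        return OMEGA
    return x + y

def mul(x, y):
    # 0 * omega = omega * 0 = 0; any other product involving omega is omega
    if x == 0 or y == 0:
        return 0
    if x == OMEGA or y == OMEGA:
        return OMEGA
    return x * y

def check_q_axioms(n):
    """Check Q's axioms with quantifiers ranging over {0, ..., n} plus omega."""
    sample = list(range(n + 1)) + [OMEGA]
    for x in sample:
        assert succ(x) != 0                                 # Sx != 0
        assert x == 0 or any(x == succ(y) for y in sample)  # x = 0 or x = Sy for some y
        assert add(x, 0) == x                               # x + 0 = x
        assert mul(x, 0) == 0                               # x * 0 = 0
    for x, y in product(sample, repeat=2):
        assert succ(x) != succ(y) or x == y                 # Sx = Sy -> x = y
        assert add(x, succ(y)) == succ(add(x, y))           # x + Sy = S(x + y)
        assert mul(x, succ(y)) == add(mul(x, y), x)         # x * Sy = x * y + x
    return True

print(check_q_axioms(50))                        # True: no axiom fails on the sample
print(all(x != succ(x) for x in [0, 1, OMEGA]))  # False: omega = S(omega)
```

The multiplication special case matters. Without 0 * omega = 0, the axiom x * 0 = 0 would fail at omega.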

EDIT: escape characters for *.

Comment author: [deleted] 25 June 2013 12:24:43AM 1 point

Ah, so my question was more along the line: does finite axiomatizability of a stronger (consistent) theory imply finite axiomatizability of the weaker theory?

If I'm not mistaken, NBG and ZFC are a counterexample to this: NBG is a conservative extension of ZFC (and therefore stronger than ZFC), but NBG is finitely axiomatizable while ZFC is not.

Comment author: JoshuaZ 08 June 2013 02:15:30AM 0 points

Yeah, the details of actually proving this are looking like they contain more subtleties than I expected, but I tentatively agree with your analysis. Here's what may be another proof. Not only is PA not finitely axiomatizable, but no consistent extension of PA is (I think this is true; the same proof that works for PA should go through here, but I haven't checked the details). So PA + ConMinus(PA) still isn't finitely axiomatizable. Now consider the instances of the induction schema needed to axiomatize PA + ConMinus(PA). Q + ConMinus(PA) can't prove all of those (since it is strictly weaker than PA + ConMinus(PA)), but all of those statements are theorems of PA (since they are in fact axioms). Does this work?

Overall, this is requiring a lot more subtlety than I initially thought was involved which may make Qiaochu Yuan's example a better one.

Comment author: Quinn 08 June 2013 04:06:32AM 1 point

Not only is PA not finitely axiomatizable, but any consistent extension of PA isn't (I think this is true, the same proof that works for PA should go through here, but I haven't checked the details)

Well if we had this, we would know immediately that Q + Con(PA) is not an extension of PA (which is what we originally wanted), because it certainly is finitely axiomatizable. I know there are several proofs that PA is not finitely axiomatizable, but I have not read any of them, so can't comment on the strengthened statement, though it sounds true.

Comment author: warbo 06 June 2013 03:06:25PM 1 point

Page 4 footnote 8 in the version you saw looks like footnote 9 in mine.

I don't see how 'proof-of-bottom -> bottom' makes a system inconsistent. This kind of formula appears all the time in Type Theory, and is interpreted as "not(proof-of-bottom)".

The 'principle of explosion' says 'forall A, bottom -> A'. We can instantiate A to get 'bottom -> not(proof-of-bottom)', then compose this with "proof-of-bottom -> bottom" to get "proof-of-bottom -> not(proof-of-bottom)". This is an inconsistency iff we can show proof-of-bottom. If our system is consistent, we can't construct a proof of bottom so it remains consistent. If our system is inconsistent then we can construct a proof of bottom and derive bottom, so our system remains inconsistent.

Have I misunderstood this footnote?

[EDIT: Ignore me for now; this is of course Lob's theorem for bottom. I haven't convinced myself of the existence of modal fixed points yet though]
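warbo's edit identifies the footnote's claim as Löb's theorem for ⊥. A short derivation may help readers who stopped at the type-theoretic reading. It assumes T is a recursively axiomatized theory extending PA, and writes $\Box\varphi$ for T's formalized provability predicate applied to the code of φ (notation assumed here). The difference from the type-theoretic reading is that "proof-of-bottom" is not an arbitrary proposition. It is the specific arithmetized sentence $\Box\bot$, and Löb's theorem applies to it.

```latex
\begin{aligned}
&\text{L\"ob's theorem: if } T \vdash \Box\varphi \rightarrow \varphi \text{, then } T \vdash \varphi. \\
&\text{Take } \varphi = \bot \text{; then } \Box\bot \rightarrow \bot \text{ is equivalent to } \neg\Box\bot \text{, i.e. to } \mathrm{Con}(T). \\
&\text{Hence } T \vdash \mathrm{Con}(T) \text{ implies } T \vdash \bot \text{ (G\"odel's second incompleteness theorem).}
\end{aligned}
```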

Comment author: MrMind 07 June 2013 07:07:53AM 0 points

Page 4, footnote 8: I don't think it's true that only stronger systems can prove weaker systems consistent. It can happen that system A can prove system B consistent and A and B are incomparable, with neither stronger than the other.

That is strictly correct, but not relevant for self-improving AI. You don't want a father AI that cannot prove everything that the child AI can prove. Maybe the footnote should be edited accordingly.

Comment author: jsteinhardt 07 June 2013 01:58:54PM 0 points

Well, if A can prove everything B can, except for con(A), and B can prove everything A can, except for con(B), then you're relatively happy.

ETA: retracted (thanks to Joshua Z for pointing out the error).

Comment author: JoshuaZ 08 June 2013 02:17:29AM 1 point

I don't think this can happen: once A has proven Con(B), it can reason using system B for consistency purposes, and go from the fact that B proves Con(A) to A proving Con(A), which is bad.
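JoshuaZ's argument can be written out step by step. This sketch assumes A and B are recursively axiomatized theories containing enough arithmetic (say, extensions of PA) for the standard derivability conditions and formalized Σ1-completeness to hold. $\mathrm{Prov}_B$ is B's provability predicate.

```latex
\begin{aligned}
&(1)\; B \vdash \mathrm{Con}(A) && \text{hypothesis} \\
&(2)\; A \vdash \mathrm{Prov}_B(\ulcorner \mathrm{Con}(A) \urcorner) && \text{(1) is witnessed by a concrete proof, which } A \text{ can check} \\
&(3)\; A \vdash \mathrm{Con}(B) && \text{hypothesis} \\
&(4)\; A \vdash \mathrm{Con}(B) \wedge \mathrm{Prov}_B(\ulcorner \mathrm{Con}(A) \urcorner) \rightarrow \mathrm{Con}(A) && \neg\mathrm{Con}(A) \text{ is } \Sigma_1 \text{, so if true } B \text{ would prove it, contradicting } \mathrm{Con}(B) \\
&(5)\; A \vdash \mathrm{Con}(A) && \text{from (2), (3), (4)}
\end{aligned}
```

By the second incompleteness theorem, (5) can only hold if A is inconsistent.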

Comment author: jsteinhardt 08 June 2013 03:34:03AM 1 point

Thanks for pointing this out. My mathematical logic is rusty.

Comment author: JoshuaFox 09 June 2013 07:47:18PM 7 points

Please augment footnote 4 or otherwise provide a more complete summary of symbols.

I appreciate that some of these symbols have standard meanings in the relevant fields. But different subfields use these with subtle differences.

Note, for example, this list of logic symbols in which single-arrow -> and double-arrow => are said to carry the same meaning, but you use them distinctly. (You also use horizontal line (like division) to indicate implication on page 7. I think you mean the same as single-arrow. Again, that's standard notation, but it should be described along with the others.)

On page 3 you specify a special meaning for double turnstile ||- . I'd like to see an explanation of your meaning for the symbol, "an agent has cognitively concluded a belief," instead of the entailment usually meant by the double turnstile, but in any case, please add it to the summary.

It's good that you keep the paper technical, while using the footnotes to enlighten the rest of us. A more complete list of symbols would be helpful for that.

Comment author: Wei_Dai 07 June 2013 10:24:30AM 7 points

Why frame this problem as about tiling/self-modification instead of planning/self-prediction? If you do the latter though, the problem looks more like an AGI (or AI capability) problem than an FAI (or AI safety) problem, which makes me wonder if it's really a good idea to publicize the problem and invite more people to work on it publicly.

Regarding section 4.3 on probabilistic reflection, I didn't get a good sense from the paper of how Christiano et al.'s formalism relates to the concrete problem of AI self-modification or self-prediction. For example, what are the functions P and p supposed to translate to in terms of an AI and its descendant or future self?

Comment author: Benja 07 June 2013 11:00:04AM 2 points

One argument in favor of this being relevant specifically to FAI is that evolution kludged up us, so there is no strong reason to think that AGI projects with an incomplete understanding of the problem space will eventually kludge up an AGI that is able to solve these problems itself and successfully navigate an intelligence explosion -- and then paperclip the universe, since the incomplete understanding of the human researchers creating the seed AI wouldn't suffice for giving the seed AI stable goals. I.e., solving this in some way looks probably necessary for reaching AI safety at all, but only possibly helpful for AI capability.

I'm not entirely unworried about that concern, but I'm less worried about it than about making AGI more interesting by doing interesting in-principle work on it, and I currently feel that even the latter danger is outweighed by the danger of not tackling the object-level problems early enough to actually make progress before it's too late.

Comment author: Wei_Dai 07 June 2013 08:48:50PM 4 points

One argument in favor of this being relevant specifically to FAI is that evolution kludged up us, so there is no strong reason to think that AGI projects with an incomplete understanding of the problem space will eventually kludge up an AGI that is able to solve these problems itself and successfully navigate an intelligence explosion -- and then paperclip the universe, since the incomplete understanding of the human researchers creating the seed AI wouldn't suffice for giving the seed AI stable goals.

This sentence has kind of a confusing structure and I'm having trouble understanding the logic of your argument. Could you rewrite it? Also, part of my thinking, which I'm not sure if you've addressed, is that an AGI that fails the Lobian obstacle isn't just unable to stably self-modify, it's unable to do even the simplest kind of planning because it can't predict that its future selves won't do something crazy. A "successful" (ETA: meaning one that FOOMs) AGI project has to solve this planning/self-prediction problem somehow. Why wouldn't that solution also apply to the self-modification problem?

Comment author: Benja 09 June 2013 10:16:20PM 12 points

Sorry for being confusing, and thanks for giving me a chance to try again! (I did write that comment too quickly due to lack of time.)

So, my point is, I think that there is very little reason to think that evolution somehow had to solve the Löbstacle in order to produce humans. We run into the Löbstacle when we try to use the standard foundations of mathematics (first-order logic + PA or ZFC) in the obvious way to make a self-modifying agent that will continue to follow a given goal after having gone through a very large number of self-modifications. We don't currently have any framework not subject to this problem, and we need one if we want to build a Friendly seed AI. Evolution didn't have to solve this problem. It's true that evolution did have to solve the planning/self-prediction problem, but it didn't have to solve it with extremely high reliability. I see very little reason to think that if we understood how evolution solved the problem it solved, we would then be really close to having a satisfactory Löbstacle-free decision theory to use in a Friendly seed AI -- and thus, conversely, I see little reason to think that an AGI project must solve the Löbstacle in order to solve the planning/prediction problem as well as evolution did.

I can more easily conceive of the possibility (but I think it rather unlikely, too) that solving the Löbstacle is fundamentally necessary to build an agent that can go through millions of rewrites without running out of steam: perhaps without solving the Löbstacle, each rewrite step will have an independent probability of making the machine wirehead (for example), so an AGI doing no better than evolution will almost certainly wirehead during an intelligence explosion. But in this scenario, since evolution built us, an AGI project might build an AI that solves the planning/self-prediction problem as well as we do, and that AI might then go and solve the Löbstacle and go through a billion self-modifications and take over the world. (The human operators might intervene and un-wirehead it every 50,000 rewrites or so until it has figured out a solution to the Löbstacle, for example.) So even in this scenario, the Löbstacle doesn't seem to me like a barrier to AI capability; but it is a barrier to FAI, because if it's the AI that eventually solves the Löbstacle, the superintelligence down the line will have the values the AI had at the time it solved the problem. This was what I intended to say by saying that the AGI would "successfully navigate an intelligence explosion -- and then paperclip the universe".
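The arithmetic behind "almost certainly wirehead" may be worth making explicit. The sketch below assumes, as the comment does, an independent per-rewrite failure probability p. The specific values of p and n are illustrative and do not come from the thread. The chance of getting through n rewrites without a failure is then:

```latex
\Pr[\text{no failure in } n \text{ rewrites}] = (1-p)^n \approx e^{-pn},
\qquad\text{e.g. } p = 10^{-5},\; n = 10^{6} \;\Rightarrow\; e^{-10} \approx 4.5 \times 10^{-5}.
```

Surviving millions of unaided rewrites therefore needs p to be well below 1/n. Even a small per-step failure rate compounds to near-certain failure.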

(On the other hand, while I only think of the above as an outside possibility, I think there's more than an outside possibility that a clean reflective decision theory could be helpful for an AGI project, even if I don't think it's a necessary prerequisite. So I'm not entirely unsympathetic to your concerns.)

Does the above help to clarify the argument I had in mind?

Comment author: Wei_Dai 01 September 2013 06:55:02AM 3 points

So, my point is, I think that there is very little reason to think that evolution somehow had to solve the Löbstacle in order to produce humans.

So you think that humans do not have a built-in solution to the Löbstacle, and you must also think we are capable of building an FAI that does have a built-in solution to the Löbstacle. That means an intelligence without a solution to the Löbstacle can produce another intelligence that shares its values and does have a solution to the Löbstacle. But then why is it necessary for us to solve this problem? (You said earlier "solving this in some way looks probably necessary for reaching AI safety at all, but only possibly helpful for AI capability.") Why can't we instead build an FAI without solving this problem, and depend on the FAI to solve the problem while it's designing the next generation FAI?

Also earlier you said

I currently feel that even the latter danger is outweighed by the danger of not tackling the object-level problems early enough to actually make progress before it's too late.

I've been arguing with Eliezer and Paul about this recently, and thought that I should get the details of your views too. Have you been following the discussions under my most recent post?

Comment author: Benja 08 October 2013 05:39:31PM 0 points

Sorry for the long-delayed reply, Wei!

So you think that humans do not have a built-in solution to the Löbstacle, and you must also think we are capable of building an FAI that does have a built-in solution to the Löbstacle. That means an intelligence without a solution to the Löbstacle can produce another intelligence that shares its values and does have a solution to the Löbstacle.

Yup.

But then why is it necessary for us to solve this problem? [...] Why can't we instead build an FAI without solving this problem, and depend on the FAI to solve the problem while it's designing the next generation FAI?

I have two main reasons in mind. First, if you are willing to grant that (a) this is a problem that would require humans years of serial research to solve and (b) that it looks much easier to build this into an AI designed from scratch rather than bolting it on to an existing AI design that was created without taking these considerations into account, but you still think that (c) it would be a good plan to have the first-generation FAI solve this problem when building the next-generation FAI, then it seems that you need to assume that the FAI will be much better at AGI design than its human designers before it executes its first self-rewrite, since the human team would by assumption still need years to solve the problem at that point and the plan wouldn't be particularly helpful if the first-generation FAI would need a similar amount of time or longer. But it seems unlikely to me that we first need to build ultraintelligent machines a la I.J. Good, far surpassing humans, before we can get an intelligence explosion: it seems to me that most of the probability mass should be in the required level of AGI research ability being <= the level of the human research team working on the AGI. I admit that one possible strategy could be to continue having humans improve the initial FAI until it is superintelligent and then ask it to write a successor from scratch, solving the Löbstacle in the process, but it doesn't seem particularly likely that this is cheaper than solving the problem beforehand.

Second, if we followed this plan, when building the initial FAI we would be unable to use mathematical logic (or other tools sufficiently similar to be subject to the same issues) in a straightforward way when having it reason about its potential successor. This cuts off a large part of design space that I'd naturally be looking to. Yes, if we can do it then it's possible in principle to get an FAI to do it, but mimicking human reasoning doesn't seem likely to me to be the easiest way to build a safe AGI.

Have you been following the discussions under my most recent post?

I agree with you that relying on an FAI team to solve a large number of philosophical problems correctly seems dangerous, although I'm sympathetic to Eliezer's criticism of your outside-view arguments -- I essentially agree with your conclusions, but I think I use more inside-view reasoning to arrive at them (would need to think longer to tease this apart). I agree with Paul that something like CEV for philosophy in addition to values should probably be part of an FAI design. I agree with you that progress in metaphilosophy would be very valuable, but I do not have any concrete leads to follow. But I think that having good solutions to some of these problems is not unlikely to be helpful for FAI design (and more helpful to FAI than uFAI), so I still think that some amount of work allocated to these philosophical problems looks like a good thing; and I also think that working on these problems does on average reduce the probability of making a bad mistake even if we manage to have the FAI do philosophy itself and have it checked by "coherent extrapolated philosophy".

You quoted my earlier comment that I think that making object-level progress is important enough that it seems a net positive despite making AGI research more interesting, but I don't really feel that your post or the discussion below it contains much in the way of arguments about that -- could you elaborate on the connection?

Comment author: Eliezer_Yudkowsky 09 June 2013 10:48:39PM 3 points

(I endorse essentially all of Benja's reply above.)

Comment author: Wei_Dai 13 June 2013 08:55:19PM 1 point

Thanks, that's very helpful. (I meant to write a longer reply but haven't gotten around to it yet. Didn't want you to feel ignored in the mean time.)

Comment author: cousin_it 07 June 2013 10:01:58AM 6 points

Congrats to Eliezer and Marcello on the writeup! It has helped me understand Benja's "parametric polymorphism" idea better.

There's a slightly different angle that worries me. What happens if you ask an AI to solve the AI reflection problem?

1) If an agent A_1 generates another agent A_0 by consequentialist reasoning, possibly using proofs in PA, then future descendants of A_0 also count as consequences. So at least A_0 should not face the problem of "telomere shortening", because PA can see the possible consequences of "telomere shortening" already. But what will A_0 look like? That's a mystery.

2) To figure out the answer to (1), it's natural to try devising a toy problem where we could test different implementations of A_1. Benja made a good attempt, then Wei came up with an interesting quining solution to that. Eliezer has now formalized his objection to quining solutions as the "Vingean principle" (no, actually "naturalistic principle", thx Eliezer), which is a really nice step. Now I just want a toy problem where we're forced to apply that principle :-) Why such problems are hard to devise is another mystery.

Comment author: Eliezer_Yudkowsky 08 June 2013 06:43:25AM 3 points

(Quick note: Wei's quining violates the naturalistic principle, not the Vingean principle. Wei's actions were still inside quantifiers but had separate forms for self-modification and action. So did Benja's original proposal in the Quirrell game, which Wei modified - I was surprised and impressed when Benja's polymorphism approach carried over to a naturalistic system.)

Comment author: Wei_Dai 09 June 2013 12:49:32AM 2 points

Was it UDT1.1 (as a solution to this problem) that violates the Vingean principle?

Also, I'm wondering if Benja's polymorphism approach solves the "can't decide whether or not to commit suicide" problem that I described here. Your paper doesn't seem to address this problem since the criteria of action you use all talk about "NULL or GOAL" and since suicide leads to NULL, an AI using your criterion of action has trouble deciding whether or not to commit suicide for an even more immediate reason. Do you have any ideas how your framework might be changed to allow this problem to be addressed?

Comment author: Eliezer_Yudkowsky 09 June 2013 01:17:08AM 2 points

Was it UDT1.1 (as a solution to this problem) that violates the Vingean principle?

As I remarked in that thread, there are many possible designs that violate the Vingean principle; AFAICT UDT 1.1 is one of them.

Also, I'm wondering if Benja's polymorphism approach solves the "can't decide whether or not to commit suicide" problem that I described here. Your paper doesn't seem to address this problem since the criteria of action you use all talk about "NULL or GOAL" and since suicide leads to NULL, an AI using your criterion of action has trouble deciding whether or not to commit suicide for an even more immediate reason.

Suicide being permitted by the NULL option is a different issue from suicide being mandated by self-distrust. Benja's TK gets rid of distrust of offspring. Work on reflective/naturalistic trust is ongoing.

Comment author: cousin_it 08 June 2013 07:38:34AM 0 points

Thanks! Corrected.

Comment author: shminux 06 June 2013 08:33:22PM 5 points

As a non-expert fan of AI research, I simply wanted to mention that this and other recent papers seem to go a fair way toward addressing one of Karnofsky's criticisms of the SI that I remember agreeing with:

Overall disconnect between SI's goals and its activities. SI seeks to build FAI and/or to develop and promote "Friendliness theory" that can be useful to others in building FAI. Yet it seems that most of its time goes to activities other than developing AI or theory. Its per-person output in terms of publications seems low. Its core staff seem more focused on Less Wrong posts, "rationality training" and other activities that don't seem connected to the core goals; Eliezer Yudkowsky, in particular, appears (from the strategic plan) to be focused on writing books for popular consumption. These activities seem neither to be advancing the state of FAI-related theory nor to be engaging the sort of people most likely to be crucial for building AGI.

Hopefully more interesting stuff will follow, even if I am not in a position to evaluate its validity.

Comment author: So8res 06 June 2013 03:47:13PM 5 points

Page 14, Remarks. Typo:

Remarks. Although T_0 is slightly more powerful than T_1 in the sense that T_0 can prove certain exact theorems which T_0 cannot...

This should be "T_0 can prove certain exact theorems which T_1 cannot".

Comment author: [deleted] 06 June 2013 02:00:37AM 5 points

Cool.

I'll get my math team working on this right away, and we eagerly await more of these. ;)

EDIT: slight sarcasm on the "my math team".

Comment author: Eliezer_Yudkowsky 06 June 2013 02:23:31AM 5 points

(Your math team?)

Comment author: [deleted] 06 June 2013 02:22:14PM 4 points

(Friends that are studying math for the purpose of solving FAI problems and parts of myself that can be similarly described.)

Comment author: SaidAchmiz 06 June 2013 12:56:35PM 1 point

I am guessing nyan_sandwich is referring to a team that participates in the various nation-/world-wide mathematics competitions that exist. It's an interesting way for math-talented college students and the like to exercise their mathematical abilities.

Comment author: [deleted] 06 June 2013 02:21:16PM 2 points

nope. Unfortunately nothing so formal.

I use "my math team" the same way people refer to lawyer-friends as "my legal team"; slightly sarcastically.

Comment author: thomblake 06 June 2013 01:11:58PM 3 points

This post does answer some questions I had regarding the relevance of mathematical proof to AI safety, and the motivations behind using mathematical proof in the first place. I don't believe I've seen this bit before:

the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling

Comment author: Eliezer_Yudkowsky 06 June 2013 08:36:28PM 2 points

...I've actually said it many, many times before but there's a lot of people out there depicting that particular straw idea (e.g. Mark Waser).

Comment author: thomblake 06 June 2013 08:48:07PM 2 points

I don't read a lot of other people's stuff about your ideas (e.g. Mark Waser) but I have read most of the things you've published. I'm surprised to hear you've said it many times before.

Comment author: lukeprog 10 January 2014 04:39:28PM 2 points

if you can't even answer the kinds of simple problems posed within the paper... then you must be very far off from being able to build a stable self-modifying AI.

For the record, I personally think this statement is too strong. Instead, I would say something more like what Paul said:

Working on [the Lobian obstacle] directly may be less productive than just trying to understand how reflective reasoning works in general---indeed, folks around here definitely try to understand how reflective reasoning works much more broadly, rather than focusing on this problem. The point of this post is to state a precise problem which existing techniques cannot resolve, because that is a common technique for making progress.

Comment author: lukeprog 06 June 2013 01:20:16AM 5 points

If ever I wanted to upvote something twice, it's this.

Comment author: JonahSinick 06 June 2013 02:25:07AM 2 points

I read the first two pages of the publication, and wasn't convinced that the problem that the paper attempts to address has non-negligible relevance to AI safety. I would be interested in seeing you spell out your thoughts on this point in detail.

[Edit: Thinking this over a little bit, I realize that maybe my comment isn't so helpful in isolation. I corresponded with Luke about this, and would be happy to flesh out my thinking either publicly or in private correspondence.]

Comment author: Eliezer_Yudkowsky 06 June 2013 03:16:06AM 7 points

The LW post may address some of your concerns. The idea here is that we need a tiling decision criterion, and the paper isn't supposed to be an AI design, it's supposed to get us a little conceptually closer to a tiling decision criterion. If you don't understand why a tiling decision criterion is a good thing in a self-improving AI which is supposed to have a stable goal system, then I'm not quite sure what issue needs addressing.

Comment author: JonahSinick 06 June 2013 03:38:55AM 2 points

Thanks for your courtesy, and again, sorry for not being more specific in my original comment.

Yes, I'm questioning why a self-improving AI which is intended to have a stable goal system needs a tiling decision criterion. In your publication, you wrote

In a self-modifying AI, most self-modifications should not change most aspects of the AI; it would be odd to consider agents that could only make large, drastic self-modifications. To reflect this desideratum within the viewpoint from agents constructing other agents, we will examine agents which construct successor agents of highly similar design...

I don't see why the model of the sequence of agents is a good operationalization. My intuition is that

  • A self-modifying AI would modify itself by modifying its modules one by one.
  • It would reconstruct a given module whole-cloth, rather than doing so by incrementally changing the module in small steps.

To elaborate, and for concreteness, I'll comment on

If you wanted a road to a certain city to exist, you might try attaching more powerful arms to yourself so that you could lift paving stones into place. This can be viewed as a special case of constructing a new creature with similar goals and more powerful arms, and then replacing yourself with that creature.

I haven't read the technical portions of the paper, but my surface impression is that the operationalization in the paper is analogous to modifying your arms by successively shaving slivers of tissue off of them, and grafting slivers of tissue onto them, with a view toward making them really long. Another way to go would be to grow the long arms in a lab, chop off your current arms, and then graft the newly created long arms onto yourself. In the context of self-modifying AIs, the latter possibility seems to me to be significantly more likely than the former possibility.

Is my surface impression of the operationalization right? If so, what do you think about the points that I raise in the previous paragraphs?

Comment author: Eliezer_Yudkowsky 06 June 2013 03:53:05AM 7 points

Jonah, some self-modifications will potentially be large, but others might be smaller. More importantly we don't want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals. Most of the core idea in this paper is to prevent those kinds of drastic or deleterious changes from being forced by a self-modification.

But it's also possible that there'll be many gains from small self-modifications, and it would be nicer not to need a special case for those, and for this it is good to have (in theoretical principle) a highly regular bit of cognition/verification that needs to be done for the change (e.g. for logical agents the proof of a certain theorem) so that small local changes only call for small bits of the verification to be reconsidered.

Another way of looking at it is that we're trying to have the AI be as free as possible to self-modify while still knowing that it's sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.

Comment author: JonahSinick 06 June 2013 04:16:11AM 0 points

Thanks for engaging.

More importantly we don't want each self-modification to involve wrenching changes like altering the mathematics you believe in, or even worse, your goals.

I'm very sympathetic to this in principle, but don't see why there would be danger of these things in practice.

But it's also possible that there'll be many gains from small self-modifications,

Humans constantly perform small self-modifications, and this doesn't cause serious problems. People's goals do change, but not drastically, and people who are determined can generally keep their goals pretty close to their original goals. Why do you think that AI would be different?

Another way of looking at it is that we're trying to have the AI be as free as possible to self-modify while still knowing that it's sane and stable, and the more overhead is forced or the more small changes are ruled out, the less free it is.

To ensure that one gets a Friendly AI, it suffices to start with a good goal system, and to ensure that the goal system remains pretty stable over time. It's not necessary that the AI be as free as possible.

You might argue that a limited AI wouldn't be able to realize as good a future as one without limitations.

But if this is the concern, why not work to build a limited AI that can itself solve the problems about having a stable goal system under small modifications? Or, if it's not possible to get a superhuman AI subject to such limitations, why not build a subhuman AI and then work in conjunction with it to build Friendly AI that's as free as possible?

Comment author: Eliezer_Yudkowsky 06 June 2013 05:03:38AM 5 points [-]

Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them, and we can make a start on exposing some of these gotchas by figuring out how to do things using unbounded computing power (albeit this is not a reliable way of exposing all gotchas, especially in the hands of somebody who prefers to hide difficulties, or even someone who makes a mistake about how a mathematical object behaves, but it sure beats leaving everything up to verbal arguments).

Human beings don't make billions of sequential self-modifications, so they're not existence proofs that human-quality reasoning is good enough for that.
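To make the "billions of sequential self-modifications" point concrete: if each modification independently carries a small chance of a catastrophic error, the chance that the whole chain stays sound shrinks geometrically. The sketch below assumes independent failures with a constant per-step probability, which is a simplification; the per-step rates are hypothetical. With a one-in-a-million failure rate, surviving 10^9 steps comes out around 10^-434.

```python
import math

def log10_survival(p_fail: float, n_steps: int) -> float:
    """Base-10 log of P(no failure in n_steps), assuming every step fails
    independently with the same probability p_fail."""
    # Work in logs: the probability itself underflows to 0.0 for large n_steps.
    # log1p keeps precision when p_fail is tiny.
    return n_steps * math.log1p(-p_fail) / math.log(10)

for p in (1e-6, 1e-12, 1e-20):
    print(f"p = {p:g}: log10 P(10^9 steps all succeed) = {log10_survival(p, 10**9):.3g}")
```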

I'm not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations. If this is a widespread reaction beyond yourself then it might not be too hard to get a quote from Peter Norvig or a similar mainstream authority that, "No, actually, you can't take that sort of thing for granted, and while what MIRI's doing is incredibly preliminary, just leaving this in a state of verbal argument is probably not a good idea."

Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that's more effort than you'd want to put into this exact point.

Comment author: JonahSinick 06 June 2013 05:21:16AM *  1 point [-]

Many things in AI that look like they ought to be easy have hidden gotchas which only turn up once you start trying to code them

I don't disagree (though I think that I'm less confident on this point than you are).

Human beings don't make billions of sequential self-modifications, so they're not existence proofs that human-quality reasoning is good enough for that.

Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?

I'm not sure how to go about convincing you that stable-goals self-modification is not something which can be taken for granted to the point that there is no need to try to make the concepts crisp and lay down mathematical foundations.

I agree that it can't be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?

Comment author: Eliezer_Yudkowsky 06 June 2013 06:02:27AM 7 points [-]

I agree that it can't be taken for granted. My questions are about the particular operationalization of a self-modifying AI that you use in your publication. Why do you think that the particular operationalization is going to be related to the sorts of AIs that people might build in practice?

The paper is meant to be interpreted within an agenda of "Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty"; not as "We think this Godelian difficulty will block AI", nor "This formalism would be good for an actual AI", nor "A bounded probabilistic self-modifying agent would be like this, only scaled up and with some probabilistic and bounded parts tacked on". If that's not what you meant, please clarify.

Comment author: JonahSinick 06 June 2013 06:13:46AM 2 points [-]

Ok, that is what I meant, so your comment has helped me better understand your position.

Why do you think that

Begin tackling the conceptual challenge of describing a stably self-reproducing decision criterion by inventing a simple formalism and confronting a crisp difficulty

is cost-effective relative to other options on the table?

For "other options on the table," I have in mind things such as spreading rationality, building the human capital of people who care about global welfare, increasing the uptake of important information into the scientific community, and building transferable skills and connections for later use.

Comment author: Kaj_Sotala 06 June 2013 07:32:35AM *  21 points [-]

Personally, I feel like that kind of metawork is very important, but that somebody should also be doing something that isn't just metawork. If there's nobody making concrete progress on the actual problem that we're supposed to be solving, there's a major risk of the whole thing becoming a lost purpose, as well as of potentially-interested people wandering off to somewhere where they can actually do something that feels more real.

Comment author: lukeprog 06 June 2013 01:00:17PM *  6 points [-]

cost-effective relative to other options on the table

BTW, I spent a large fraction of the first few months of 2013 weighing FAI research vs. other options before arriving at MIRI's 2013 strategy (which focuses heavily on FAI research). So it's not as though I think FAI research is obviously the superior path, and it's also not as though we haven't thought through all these different options, and gotten feedback from dozens of people about those options, and so on.

Also note that MIRI did, in fact, decide to focus on (1) spreading rationality, and (2) building a community of people who care about rationality, the far future, and x-risk, before turning its head to FAI research: see (in chronological order) the Singularity Summit, Less Wrong and CFAR.

But the question of which interventions are most cost effective (given astronomical waste) is a huge and difficult topic, one that will require thousands of hours to examine properly. Building on Beckstead and Bostrom, I've tried to begin that examination here. Before jumping over to that topic, I wonder: do you now largely accept the case Eliezer made for this latest paper as an important first step on an important sub-problem of the Friendly AI problem? And if not, why not?

Comment author: Eliezer_Yudkowsky 06 June 2013 06:40:23AM 7 points [-]

That sounds like a very long conversation if we're supposed to be giving quantitative estimates on everything. The qualitative version is just that this sort of thing can take a long time, may not parallelize easily, and can potentially be partially factored out to academia, and so it is wise to start work on it as soon as you've got enough revenue to support even a small team, so long as you can continue to scale your funding while that's happening.

This reply takes for granted that all astronomical benefits bottleneck through a self-improving AI at some point.

Comment author: elharo 07 June 2013 11:55:25AM *  0 points [-]

Other options on the table are not mutually exclusive. There is a lot of wealth and intellectual brain power in the world, and a lot of things to work on. We can't and shouldn't all work on one most important problem. We can't all work on the thousand most important problems. We can't even agree on what those problems are.

I suspect Eliezer has a comparative advantage in working on this type of AI research, and he's interested in it, so it makes sense for him to work on this. It especially makes sense to the extent that this is an area no one else is addressing. We're only talking about an expenditure of several careers and a few million dollars. Compared to the world economy, or even compared to the non-profit sector, this is a drop in the bucket.

Now if instead Eliezer was the 10,000th smart person working on string theory, or if there was an Apollo-style government-funded initiative to develop an FAI by 2019, then my estimate of the comparative advantage of MIRI would shift. But given the facts as they are, MIRI seems like a plausible use of the limited resources it consumes.

Comment author: Martin-2 06 June 2013 03:32:57PM 0 points [-]

"...need to make billions of sequential self-modifications when humans don't need to" to do what? Exist, maximize utility, complete an assignment, fulfill a desire...? Some of those might be better termed as "wants" than "needs" but that info is just as important in predicting behavior.

Comment author: falenas108 06 June 2013 02:20:24PM 0 points [-]

Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?

For starters, humans aren't able to make changes as easily as an AI can. We don't have direct access to our own source code that we could change effortlessly; any change we make costs time, money, or both.

Comment author: elharo 07 June 2013 12:00:33PM *  0 points [-]

That doesn't address the question. It says that an AI could more easily make self-modifications. It doesn't suggest that an AI needs to make such self-modifications. Human intelligence is an existence proof that human-level intelligence does not require "billions of sequential self-modifications". Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.

So I reiterate, "Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?"

Comment author: bogdanb 07 June 2013 08:04:44PM *  1 point [-]

Human intelligence is an existence proof that human-level intelligence does not require "billions of sequential self-modifications". Whether greater than human intelligence requires it, in fact whether greater than human intelligence is even possible, is still an open question.

Why do you think that an AI would need to make billions of sequential self-modifications when humans don't need to?

Human intelligence required billions of sequential modifications (though not self-modifications). An AI in general would not need self-modifications, but for an AGI it seems that they would be necessary. I don’t doubt that someone smarter than me has already written a more formal argument for the latter statement, but a very informal argument would be something like this:

If an AGI doesn’t need to self-modify, then that AGI is already perfect (or close enough that it couldn’t possibly matter). Since practically no software humans have built has ever been perfect in all respects, that seems exceedingly unlikely. Therefore, the first AGI would (very likely) need to be modified. Of course, at the beginning it might be modified by humans (thus, not self-modified), but the point of building AGI is to make it smarter than us. Thus, once it is smarter than us by a certain amount, it wouldn’t make sense for us (stupider intellects) to improve it (smarter intellect). Thus, it would need to self-modify, and do it a lot, unless by some ridiculously fortuitous accident of math (a) human intelligence is very close to the ideal, or (b) human intelligence will build something very close to the ideal on the first try.

It would be nice if those modifications would be things that are good for us, even if we can’t understand them.

Comment author: jsteinhardt 07 June 2013 02:28:27PM 0 points [-]

Depending on your math level, reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Judea Pearl might present you with a crisper idea of why it can be a good idea to formalize certain types of AI problems in general, and it would be a life-enriching experience, but I anticipate that's more effort than you'd want to put into this exact point.

FWIW, Jonah has a PhD in math and has probably read Pearl or a similar graphical models book.

(Not directly relevant to the conversation, but just trying to lower your probability estimate that Jonah's objections are naive.)

Comment author: Nick_Beckstead 06 June 2013 05:10:52PM *  1 point [-]

I don't see it as a decisive point, but as one of "many weak arguments." Still, I think the analogy with human self-modification is relevant, and I would like to see more detailed discussion of the issue.

Aspects of this that seem relevant to me:

  • Genetic and cultural modifications to human thinking patterns have been extremely numerous. If you take humanity as a whole as an entity doing self-modification on itself, there have been an extremely large number of successful self-modifications.
  • Genetic and cultural evolution have built humans individually capable of self-modification without stumbling over Lobian obstacles. Evolution and culture likely used relatively simple and easy search processes to do this, rather than ones that rely on very sophisticated mathematical insights. Analogously, one might expect that people will develop AGI in a way that overcomes these problems as well.

Comment author: Eliezer_Yudkowsky 06 June 2013 08:39:18PM 10 points [-]

Self-modification is to be interpreted to include 'directly editing one's own low-level algorithms using high-level deliberative process' but not include 'changing one's diet to change one's thought processes'. If you are uncomfortable using the word 'self-modification' for this please substitute a new word 'fzoom' which means only that and consider everything I said about self-modification to be about fzoom.

Humans wouldn't look at their own source code and say, "Oh dear, a Lobian obstacle", on this I agree, but this is because humans would look at their own source code and say "What?". Humans have no idea under what exact circumstances they will believe something, which comes with its own set of problems. The Lobian obstacle shows up when you approach things from the end we can handle, namely weak but well-defined systems which can well-define what they will believe, whereas human mathematicians are stronger than ZF plus large cardinals but we don't know how they work or what might go wrong or what might change if we started editing neural circuit #12,730,889,136.

As Christiano's work shows, allowing for tiny finite variances of probability might well dissipate the Lobian obstacle, but that's the sort of thing you find out by knowing what a Lobian obstacle is.
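For readers who haven't met it, the obstacle is named after Löb's theorem. Below is the standard statement for a recursively axiomatized theory T extending Peano Arithmetic, with □_T φ read as "φ is provable in T", followed by the standard corollary that makes blanket self-trust impossible. This is textbook logic rather than anything specific to the paper; roughly, it is why an agent reasoning in T cannot simply adopt "whatever a successor using T proves is true" as an axiom schema.

```latex
\textbf{L\"ob's theorem.}\quad \text{If } T \vdash \Box_T \varphi \rightarrow \varphi, \text{ then } T \vdash \varphi.

\textbf{Corollary.}\quad \text{If } T \vdash \Box_T \varphi \rightarrow \varphi \text{ for every sentence } \varphi,
\text{ then taking } \varphi = \bot \text{ gives } T \vdash \bot, \text{ so } T \text{ is inconsistent.}
```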

Comment author: Nick_Beckstead 07 June 2013 12:49:58PM 1 point [-]

Self-modification is to be interpreted to include 'directly editing one's own low-level algorithms using high-level deliberative process' but not include 'changing one's diet to change one's thought processes'. If you are uncomfortable using the word 'self-modification' for this please substitute a new word 'fzoom' which means only that and consider everything I said about self-modification to be about fzoom.

Very helpful. This seems like something that could lead to a satisfying answer to my question. And don't worry, I won't engage in a terminological dispute about "self-modification."

Can you clarify a bit what you mean by "low-level algorithms"? I'll give you a couple of examples related to what I'm wondering about.

Suppose I am working with a computer to make predictions about the weather, and we consider the operations of the computer along with my brain as a single entity for the purpose of testing whether the Lobian obstacles you are thinking of arise in practice. Now suppose I make basic modifications to the computer, expecting that the joint operation of my brain with the computer will yield improved output. This will not cause me to trip over Lobian obstacles. Why does whatever concern you have about the Lob problem predict that it would not, but also predict that future AIs might stumble over the Lob problem?

Another example. Humans learn different mental habits without stumbling over Lobian obstacles, and they can convince themselves that adopting the new mental habits is an improvement. Some of these are more derivative ("Don't do X when I have emotion Y") and others are perhaps more basic ("Try to update through explicit reasoning via Bayes' Rule in circumstances C"). Why does whatever concern you have about the Lob problem predict that humans can make these modifications without stumbling, but also predict that future AIs might stumble over the Lob problem?

If the answer to both examples is "those are not cases of directly editing one's low-level algorithms using high-level deliberative processes," can you explain why your concern about Lobian issues only arises in that type of case? This is not me questioning your definition of "fzoom," it is my asking why Lobian issues only arise when you are worrying about fzoom.

The first example is related to what I had in mind when I talked about fundamental epistemic standards in a previous comment:

Part of where I'm coming from on the first question is that Lobian issues only seem relevant to me if you want to argue that one set of fundamental epistemic standards is better than another, not for proving that other types of software and hardware alterations (such as building better arms, building faster computers, finding more efficient ways to compress your data, finding more efficient search algorithms, or even finding better mid-level statistical techniques) would result in more expected utility. But I would guess that once you have an agent operating with minimally decent fundamental epistemic standards, you just can't prove that altering the agent's fundamental epistemic standards would result in an improvement. My intuition is that you can only do that when you have an inconsistent agent, and in that situation it's unclear to me how Lobian issues apply.

Comment author: Vaniver 06 June 2013 08:19:57PM 1 point [-]

Genetic and cultural evolution have built humans individually capable of self-modification without stumbling over Lobian obstacles.

Well, part of this is because modern humans are monstrous in the eyes of many pre-modern humans. To them, the future has been lost because they weren't using a self-modification procedure that provably preserved their values.

Comment author: lukeprog 06 June 2013 11:07:19PM 0 points [-]

Here is my general response to your concern.

Comment author: lukeprog 02 July 2013 01:52:10AM *  1 point [-]

Probably a stupid question but... on the issue of goal stability more generally, might the Lyapunov stability theorems (see sec. 3) be of use? For more mathematical detail, see here.
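For reference, the discrete-time form of Lyapunov's direct method, which is the natural version for a sequence of self-modifications, is stated below. Reading x_k as the deviation of the goal system from its original after the k-th modification is only a hypothetical mapping; it presupposes that goals live in a space where continuity and "nearness to 0" make sense, which is a large assumption.

```latex
x_{k+1} = f(x_k),\qquad f(0) = 0,\qquad f \text{ continuous}.

\text{If } V \text{ is continuous on a neighborhood } D \ni 0,\quad V(0) = 0,\quad V(x) > 0 \ (x \in D\setminus\{0\}),
\text{ and } \Delta V(x) := V(f(x)) - V(x) \le 0 \text{ on } D,
\text{ then } x = 0 \text{ is Lyapunov stable; if } \Delta V(x) < 0 \text{ on } D\setminus\{0\}, \text{ it is locally asymptotically stable.}
```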

Comment author: lukeprog 10 August 2013 10:35:55PM 1 point [-]

For another take on why some people expect this kind of work on the Lobian obstacle to be useful for FAI, see this interview with Benja Fallenstein.

Comment author: lukeprog 30 June 2013 01:47:11AM 1 point [-]

I like to think of this paper as being at the philosophical edge of AI research, which doesn't yet have its own subfield, à la a quote from Boden in Bobrow & Hayes (1985):

AI suffers from some of the same problems as philosophy: it tackles the unanswered, or even the unanswerable, questions. As soon as it manages to find a fruitful way to answer one of them, the question gets hived off as a specialist sub-field. The special sciences started emerging from philosophy in the Renaissance. Specialist areas of study have been emerging from AI for only thirty years. But already we have distinctive sub-fields — such as pattern recognition, image-processing, and rule-based systems. Their executors often speak of AI (if they speak of it at all) as something not quite respectable, something nasty in the woodshed that was once glimpsed but is better forgotten. What should not be forgotten is that the respectable topics were excluded from what people are prepared to call 'AI' as soon as they became 'respectable'.

Comment author: Benja 30 June 2013 12:28:55PM 0 points [-]

I like the idea of a "philosophical edge", but what it brings to mind is more the Dennett quote (don't remember whether the idea originates with him, would expect that it doesn't but don't know) to the effect that philosophy (as opposed to science) is what you do when you haven't yet figured out what the right questions to ask are. (Not 100% right match for the tiling paper, but going in the right direction.)

On the other hand, I never liked the famous "it stops being called AI as soon as people start using it" meme you're quoting, because that always struck me as a completely reasonable position to take. Surely pattern recognition, image-processing and rule-based systems aren't obviously huge steps towards passing the Turing test, and although I'm willing to call narrow AI "narrow artificial intelligence" because I see no reason to embark on the fool's errand of trying to change that terminology, I can't really blame people for measuring "AI" research against the standard of general intelligence. And yes, it's quite possible that pattern recognition, image processing and rule-based systems are necessary baby steps on the road to AGI, but if someone in their best judgment thinks that they're probably not, I don't see why they're obviously wrong. And just because your research into alchemy led to important insights into chemistry, you don't get to call all chemistry research "alchemy" (with the obvious caveat to the analogy that the metal-to-gold-by-magic-symbols goal of alchemy is bunk and we have an existence proof of AGI).

Comment author: jsteinhardt 07 June 2013 02:38:33PM 0 points [-]

This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).

I'm confused by this sentence. There are many statistical testing methods that output what are essentially proofs; e.g. statements of the form "probability of a failure existing is at most 10^(-100)". Why would this not be sufficient?

More generally, as I've said in another comment, I would really like to understand how the Lob obstacle relates to statistical learning methods, especially since those seem like our best guess as to what an AI paradigm would look like.

Comment author: Eliezer_Yudkowsky 07 June 2013 08:30:56PM 2 points [-]

What sort of statistical testing method would output a failure probability of at most 10^(-100) for generic optimization problems without trying 10^100 examples? You can get this in some mathematical situations, but only because if X doesn't have property Y then it has an independent 50% chance of showing property Z on many different trials of Z. For more generic optimization problems, if you haven't tested fitness on 10^100 occasions you can't rule out a >10^(-100) probability of any sort of possible blowup. And even if you test 10^100 samples the guarantee is only as strong as your belief that the samples were taken from a probability distribution exactly the same as real-world contexts likely to be encountered, down to the 100th decimal place.
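The "10^100 examples" figure follows from the usual zero-failure bound (the reasoning behind the "rule of three"): if the true failure rate were p, the chance of seeing no failures in n independent trials is (1 - p)^n, so rejecting "failure rate ≥ p" at significance δ needs (1 - p)^n ≤ δ, i.e. n ≥ ln(δ) / ln(1 - p) ≈ ln(1/δ)/p. A minimal sketch, assuming independent trials drawn from the deployment distribution:

```python
import math

def trials_needed(p_max: float, delta: float) -> float:
    """Independent failure-free trials needed to reject
    'failure probability >= p_max' at significance level delta."""
    # Solve (1 - p_max)**n <= delta for n; log1p keeps precision for tiny p_max.
    return math.log(delta) / math.log1p(-p_max)

print(f"{trials_needed(1e-100, 0.05):.3g}")  # about 3e+100 trials
print(f"{trials_needed(1e-6, 0.05):.3g}")    # about 3e+06 trials
```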

Comment author: jsteinhardt 07 June 2013 09:45:43PM 2 points [-]

It depends on the sort of guarantee you want. Certainly I can say things of the form "X and Y differ from each other in mean by at most 0.01" with a confidence that high, without 10^100 samples (as long as the samples are independent or at least not too dependent).
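The kind of guarantee described here is what concentration inequalities give. Hoeffding's inequality is one standard choice (assumed here; the comment doesn't name a method): for i.i.d. observations bounded in [0, 1], P(|sample mean - true mean| ≥ t) ≤ 2·exp(-2nt²). Because δ enters only through log(2/δ), a confidence level of 1 - 10^(-100) costs about a million samples at t = 0.01, not 10^100. A minimal sketch:

```python
import math

def hoeffding_samples(t: float, delta: float) -> int:
    """Samples n such that P(|sample mean - true mean| >= t) <= delta,
    for i.i.d. observations bounded in [0, 1] (Hoeffding's inequality)."""
    # Solve 2 * exp(-2 * n * t**2) <= delta for n.
    return math.ceil(math.log(2 / delta) / (2 * t * t))

n = hoeffding_samples(t=0.01, delta=1e-100)
print(n)  # roughly 1.15 million
```

For a difference between two means, one can apply the same bound to each mean with t/2 and δ/2 and combine them with a union bound. Note the guarantee is about an average; it says nothing about rare catastrophic inputs unless they move the mean.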

If your optimization problem is completely unstructured then you probably can't do better than the number of samples you have, but if it is completely unstructured then you also can't prove anything about it, so I'm not sure what point you're trying to make. It seems a bit unimaginative to think that you can't come up with any statistical structure to exploit, especially if you think there is enough mathematical structure to prove strong statements about self-modification.

Comment author: Eliezer_Yudkowsky 07 June 2013 09:50:57PM 3 points [-]

If you can get me a conditionally independent failure probability of 10^-100 per self-modification by statistical techniques whose assumptions are true, I'll take it and not be picky about the source. It's the 'true assumptions' part that seems liable to be a sticking point. I understand how to get probabilities like this by doing logical-style reasoning on transistors with low individual failure probabilities and proving a one-wrong-number assumption over the total code (i.e., total code functions if any one instruction goes awry) but how else would you do that?
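The transistor argument can be illustrated with a union bound. Without fault tolerance, any single fault among n components may break the system, so P(failure) ≤ n·p. If you can prove the code still functions when any one instruction goes awry, failure needs at least two faults, and with independent faults P(failure) ≤ C(n, 2)·p², so the bound roughly squares. The instruction count and per-instruction fault rate below are hypothetical.

```python
import math

def p_unprotected(n: int, p: float) -> float:
    """Union bound on P(at least one fault) among n components."""
    return min(1.0, n * p)

def p_single_fault_tolerant(n: int, p: float) -> float:
    """Union bound on P(at least two faults), assuming independent faults:
    each of the C(n, 2) pairs fails together with probability p**2."""
    return min(1.0, math.comb(n, 2) * p * p)

n, p = 10**9, 1e-20  # hypothetical instruction count and per-instruction fault rate
print(f"no tolerance:    {p_unprotected(n, p):.1e}")
print(f"one-fault tol.:  {p_single_fault_tolerant(n, p):.1e}")
```

Everything rests on the independence assumption; a common-mode fault (one cause flipping two bits) breaks the p² term.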

Comment author: timtyler 10 June 2013 10:04:35AM 0 points [-]

This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).

I'm confused by this sentence. There are many statistical testing methods that output what are essentially proofs; e.g. statements of the form "probability of a failure existing is at most 10^(-100)". [...]

It seems as though they would involve a huge number of trials.

"Evolutionary" algorithms aren't typically used to change fitness functions anyway. They are more usually associated with building representations of the world to make predictions with. This complaint would seem to only apply to a few "artificial life" models - in which all parts of the system are up for grabs.

Comment author: bogdanb 07 June 2013 08:33:26PM -1 points [-]

There are many statistical testing methods that output what are essentially proofs; e.g. statements of the form "probability of a failure existing is at most 10^(-100)". Why would this not be sufficient?

(Approximate orders of magnitude:)

Number of atoms in universe : 10^80

Number of atoms in a human being: 10^28

Number of humans that have existed: 10^10

Number of AGI-creating-level inventions expected to be made by humans: 10^0–10^1

Number of AGI-creating-level inventions expected to be made by 1% (10^-2) of the universe turned into computronium, with no more than human-level thought-to-matter efficiency, extrapolating linearly: 10^(80 - 2 - 10 - 28) = 10^40.

Hmm, that doesn’t sound that bad, but we got from 10^(-100) to 10^(-60) really fast. Also, I don’t think Eliezer was talking about that kind of statistical method.
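The estimate above can be checked by working with base-10 exponents. The last step treats the aggregate failure probability as roughly (number of inventions) × (per-invention failure probability), a union-bound-style approximation; the choice of 10^0 for the human baseline (the low end of the stated range) is an assumption.

```python
# Base-10 exponents of the rough estimates above.
atoms_universe = 80
fraction_computronium = -2   # 1% of the universe
humans_existed = 10
atoms_per_human = 28
inventions_by_humans = 0     # low end of the 10^0 to 10^1 range

inventions = (atoms_universe + fraction_computronium - atoms_per_human
              - humans_existed + inventions_by_humans)
print(inventions)          # 40
print(-100 + inventions)   # -60: aggregate failure exponent if each invention risks 10^-100
```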

Comment author: jsteinhardt 07 June 2013 09:46:34PM 1 point [-]

I mean, I could easily make the 100 into a 400, so I don't think this is that relevant.

Comment author: bogdanb 07 June 2013 10:57:44PM *  0 points [-]

Yes, the last sentence is probably my real “objection”. (Well, I don’t object to your statements, I just don’t think that’s what Eliezer meant. Even if you run a non-statistical, deterministic theorem prover, using current hardware the probability of failure is much above 10^-100.)

The silly part of the comment was just a reminder (partly to myself) that AGI problems can span orders of magnitude so ridiculously outside the usual human scale that one can’t quite approximate (the number of atoms in the universe)^-1 as zero without thinking carefully about it.

Comment author: JoshuaFox 07 June 2013 08:15:21AM 0 points [-]

Possible typo: Equation 4.2 subscript: T-n+1

Should this be T-(n+1)?

Comment author: Manfred 07 June 2013 05:58:58AM *  0 points [-]

On page 12, when you talk about the different kinds of trust, it seems like tiling trust is just a subtype of naturalistic trust. If something running on T can trust some arbitrary physical system if that arbitrary physical system implements T, then it should be able to trust its successor if that successor is a physical system that implements T. Not sure if that means anything.

Comment author: Eliezer_Yudkowsky 07 June 2013 08:08:07AM 1 point [-]

This is correct; naturalistic trust subsumes indefinitely tiling trust.

Comment author: Vaniver 06 June 2013 10:12:37PM *  0 points [-]

Under the strict criterion of meliorizing as written, it would make sense to swap to a program that promised to save 1,000 people, let all the others die, and make no further improvements, since this would still be better than not swapping. According to the line of argument in section 4 of Schmidhuber (2007), the agent ought to consider that it would be better to keep the previous program and wait for it to generate a better alternative.

Can't you require that the agents you swap to spend at least some fraction of their effort on meliorizing? Each swap could lower that fraction, based on how much the expected value had increased (the closer we are to the goal, the less we need to search more) and how much effort had already been expended (if we've searched enough, we can be pretty sure that there's not a better solution). More formally, you would want to spend meliorizing effort relative to the optimality gap you're facing (or whatever crude approximation to it you have), and the cost of spending more effort relative to your current best plan (you might have another day you can spend looking, or it might be that if you don't stop planning and start doing now, you lose everyone).
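One way to make this proposal concrete is a hypothetical acceptance rule: only swap to a successor that is better *and* commits at least some required fraction of its effort to further meliorizing. That fraction shrinks with the estimated optimality gap and with search effort already spent. The half-life decay, the upper bound of 10,000, and all numbers are illustrative assumptions, not Schmidhuber's criterion or the paper's; the cost of delay mentioned above is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Program:
    expected_value: float   # e.g. expected number of people saved
    search_fraction: float  # share of future effort committed to further meliorizing

def required_search_fraction(value: float, upper_bound: float,
                             effort_spent: float, half_life: float) -> float:
    """Hypothetical schedule: require more continued search when the estimated
    optimality gap is large, and less as past search effort accumulates."""
    gap = max(upper_bound - value, 0.0) / upper_bound  # relative gap in [0, 1]
    return gap * 0.5 ** (effort_spent / half_life)

def accept_swap(current: Program, candidate: Program, upper_bound: float,
                effort_spent: float, half_life: float) -> bool:
    """Swap only if the candidate is better AND commits to enough further search."""
    needed = required_search_fraction(candidate.expected_value, upper_bound,
                                      effort_spent, half_life)
    return (candidate.expected_value > current.expected_value
            and candidate.search_fraction >= needed)

current = Program(expected_value=0.0, search_fraction=1.0)
lazy = Program(expected_value=1_000.0, search_fraction=0.0)      # "save 1,000, then stop"
diligent = Program(expected_value=1_000.0, search_fraction=0.5)  # same value, keeps looking

print(accept_swap(current, lazy, upper_bound=10_000, effort_spent=10, half_life=10))      # False
print(accept_swap(current, diligent, upper_bound=10_000, effort_spent=10, half_life=10))  # True
```

Under this rule the "save 1,000 people and make no further improvements" program from the quoted passage is rejected, while an equally good program that keeps searching is accepted.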

Comment author: timtyler 10 June 2013 12:51:14AM *  -1 points [-]

This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals). Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.

All optimization involves a generate-and-test procedure. Insisting on proofs is a lot like insisting on testing every variant in a given set. It's a constraint on the optimization processes used - and such constraints seem at least as likely to lead to worse results as they do to better ones.

By analogy, being able to prove there's no mate in three doesn't rule out a mate in four - that a more sensible and less constrained algorithm might easily have found.

Basically, optimizing using proofs (in this way) is like trying to fight with both of your hands tied. Yes, that stops you from hitting yourself in the face - but that isn't the biggest problem in the first place.
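The mate-in-three analogy is about a search that is exhaustive only up to a fixed bound. The toy puzzle below is hypothetical and not chess: reach 13 from 1 using the moves "+1" and "×3". A breadth-first search capped at depth 3 correctly certifies that no solution exists within three moves, yet a four-move solution exists.

```python
from typing import Callable, Iterable, Optional

def solution_depth(start: int, target: int,
                   moves: Iterable[Callable[[int], int]],
                   max_depth: int) -> Optional[int]:
    """Breadth-first search to a fixed depth; returns the depth at which
    `target` is first reached, or None if it is not reached within max_depth."""
    moves = list(moves)
    frontier = {start}
    for depth in range(max_depth + 1):
        if target in frontier:
            return depth
        frontier = {move(state) for state in frontier for move in moves}
    return None

puzzle_moves = [lambda x: x + 1, lambda x: x * 3]
print(solution_depth(1, 13, puzzle_moves, max_depth=3))  # None: nothing within 3 moves
print(solution_depth(1, 13, puzzle_moves, max_depth=4))  # 4: 1 -> 3 -> 4 -> 12 -> 13
```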