Eliezer_Yudkowsky comments on Tiling Agents for Self-Modifying AI (OPFAI #2) - Less Wrong

Post author: Eliezer_Yudkowsky 06 June 2013 08:24PM

Comment author: Eliezer_Yudkowsky 06 June 2013 08:30:27PM 21 points [-]

My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Löb problem in order to build a Friendly AI.

By default, if you can build a Friendly AI, you were not troubled by the Löb problem. That working on the Löb Problem gets you closer to being able to build FAI is neither obvious nor certain (perhaps it is shallow to work on directly, and those who can build AI resolve it as a side effect of doing something else), but everything has to start somewhere. Being able to state crisp difficulties to work on is itself rare and valuable, and the more you engage with a problem like stable self-modification, the more you end up knowing about it. Engagement in a form where you can figure out whether or not your proof goes through is more valuable than engagement in the form of pure verbal arguments and intuition, although the latter is significantly more valuable than not thinking about something at all.

Reading through the whole Tiling paper might make this clearer; it spends the first 4 chapters on the Löb problem, then starts introducing further concepts once the notion of 'tiling' has been made sufficiently crisp, like the Vingean principle or the naturalistic principle, and then an even more important problem with tiling probabilistic agents (Ch. 7) and another problem with tiling bounded agents (Ch. 8), neither of which are even partially solved in the paper, but which would've made a lot less sense - would not have been reified objects in the reader's mind - if the paper hadn't spent all that time on the mathematical machinery needed to partially solve the Löb problem in logical tiling, which crispifies the notion of a 'problem with tiling'.
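(For readers meeting it for the first time: the theorem behind the obstacle can be stated in two lines. This is just the standard statement of Löb's theorem, added here for reference; □P abbreviates "P is provable in the theory T".)

```latex
% Löb's theorem, for any theory T extending Peano Arithmetic:
\text{If } T \vdash \Box P \rightarrow P, \ \text{then } T \vdash P.
% Internalized form (a schema that T itself proves):
T \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P
```

Roughly, the tiling-relevant consequence is that an agent reasoning in T cannot adopt the blanket rule "if my successor proves P, then P" for every P: asserting □P → P across the board would, by the theorem, let it prove every P outright. The machinery in the paper's first four chapters is about working around exactly this.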

Comment author: elharo 07 June 2013 12:13:16PM 12 points [-]

I feel like in this comment you're putting your finger on a general principle of instrumental rationality that goes beyond the specific issue at hand, and indeed beyond the realm of mathematical proof. It might be worth a post on "engagement" at some point.

Specifically, I note similar phenomena in software development where sometimes what I start working on ends up being not at all related to the final product, but nonetheless sets me off on a chain of consequences that lead me to the final, useful product. And I too experience the annoyance of managers insisting that I lay out a clear path from beginning to end, when I don't yet know what the territory looks like or sometimes even what the destination is.

As Eisenhower said, "Plans are worthless, but planning is everything."

Comment author: Kawoomba 06 June 2013 08:47:34PM 1 point [-]

By default, if you can build a Friendly AI, you were not troubled by the Löb problem.

If you can build a Friendly AI which can self-modify. FOOM-able algorithms are an important avenue to AGI, friendly or otherwise, but not the only one. Also, the "AGI"-class doesn't necessarily imply superhuman cognition. Humans are intelligent agents for whom the Löb problem has little bearing, since we can't (or don't) self-modify to such a large degree quite yet.

Comment author: ESRogs 06 June 2013 09:35:14PM 2 points [-]

Also, the "AGI"-class doesn't necessarily imply superhuman cognition.

Yes, but Friendly AI does. Nobody said you needed to solve the Löb problem to build an AGI. What we're talking about here is something more specific than that.

Comment author: jsteinhardt 07 June 2013 01:52:03PM 0 points [-]

Any agent that takes in information about the world is implicitly self-modifying all the time.

Comment author: [deleted] 24 June 2013 09:52:30PM *  1 point [-]

Here's a distinction you could make: an AI is self-modifying if it is effectively capable of making any change to its source code at any time, and non-self-modifying if it is not. (The phrase "capable of" is vague, of course.)

I can imagine non-self-modifying AI having an advantage over self-modifying AI, because it might be possible for an NSM AI to be protected from its own stupidity, so to speak. If the AI were to believe that overwriting all of its beliefs with the digits of pi is a good idea, nothing bad would happen, because it would be unable to do that. Of course, these same restrictions that make the AI incapable of breaking itself might also make it incapable of being really smart.

I believe I've heard someone say that any AI capable of being really smart must be effectively self-modifying, because being really smart involves the ability to make arbitrary calculations, and if you can make arbitrary calculations, then you're not restricted. My objection is that there's a big difference between making arbitrary calculations and running arbitrary code; namely, the ability to run arbitrary code allows you to alter other calculations running on the same machine.
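(To make the calculation-versus-arbitrary-code distinction concrete, here is a minimal Python sketch. It is purely illustrative and all names are invented: a "calculator" that evaluates arbitrary arithmetic but cannot reach the program it lives in, versus an agent whose executed code can rewrite the agent's own decision rule.)

```python
import ast
import operator

# A "calculator": it evaluates arbitrary arithmetic, but no computation it
# performs can reach the rest of the program, so it cannot rewrite its rules.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculate(expr: str) -> float:
    """Evaluate +, -, *, / expressions only; anything else is rejected."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("not a pure arithmetic expression")
    return ev(ast.parse(expr, mode="eval"))

# An agent that runs arbitrary code: the code it executes can reach back and
# rebind the very method that will dispatch its future decisions.
class SelfModifiable:
    def act(self, observation):
        return len(observation)            # current decision rule

    def run(self, code: str):
        exec(code, {"agent": self})        # nothing stops the code from editing the agent

print(calculate("2 * (3 + 4)"))            # 14: a powerful but contained calculation
agent = SelfModifiable()
agent.run("agent.act = lambda obs: 0")     # the 'arbitrary code' rewrote the decision rule
print(agent.act("hello"))                  # 0
```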

Comment author: [deleted] 25 June 2013 01:52:04AM 2 points [-]

Lemme expand on my thoughts a little bit. I imagine a non-self-modifying AI to be made of three parts: a thinking algorithm, a decision algorithm, and a belief database. The thinking and decision algorithms are immutable, and the belief database is (obviously) mutable. The supergoal is coded into the decision algorithm, so it can't be changed. (Problem: the supergoal only makes sense in the context of certain beliefs, and beliefs are mutable.) The contents of the belief database influence the thinking algorithm's behavior, but they don't determine its behavior.

The ideal possibility is that we can make the following happen:

  • The belief database is flexible enough that it can accommodate all types of beliefs from the very beginning. (If the thinking algorithm is immutable, it can't be updated to handle new types of beliefs.)
  • The thinking algorithm is sufficiently flexible that the beliefs in the belief database can lead the algorithm in the right directions, producing super-duper intelligence.
  • The thinking algorithm is sufficiently inflexible that the beliefs in the belief database cannot cause the algorithm to do something really bad, producing insanity.
  • The supergoal remains meaningful in the context of the belief database regardless of how the thinking algorithm ends up behaving.

(My ideas haven't been taken seriously in the past, and I have no special knowledge in this area, so it's likely that my ideas are worthless. They feel valuable to me, however.)
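(A minimal Python sketch of the three-part design described above, assuming nothing beyond the comment itself; the class and function names are invented for illustration. The thinking and decision code are fixed at construction, and only the belief database is writable - which also exhibits the noted problem that the hard-coded supergoal is only meaningful relative to mutable beliefs.)

```python
from types import MappingProxyType

class NonSelfModifyingAgent:
    def __init__(self, supergoal):
        # Immutable parts: the decision rule and the supergoal baked into it.
        self._supergoal = supergoal          # e.g. a scoring function over options
        # Mutable part: the belief database. Beliefs influence behavior,
        # but they cannot replace the thinking/decision algorithms themselves.
        self._beliefs = {}

    # --- thinking algorithm (fixed code) ---
    def update_beliefs(self, observation):
        key, value = observation
        self._beliefs[key] = value           # only the database ever changes

    # --- decision algorithm (fixed code, supergoal hard-coded at construction) ---
    def decide(self, options):
        def expected_score(option):
            # Beliefs feed into the evaluation (read-only view), but the
            # evaluation rule itself is fixed.
            return self._supergoal(option, MappingProxyType(self._beliefs))
        return max(options, key=expected_score)

# Hypothetical supergoal: prefer options the agent believes score highly.
agent = NonSelfModifyingAgent(lambda opt, beliefs: beliefs.get(opt, 0))
agent.update_beliefs(("build_bridge", 3))
agent.update_beliefs(("dig_tunnel", 5))
print(agent.decide(["build_bridge", "dig_tunnel"]))   # dig_tunnel
```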

Comment author: paulfchristiano 30 August 2014 04:06:46AM 0 points [-]

This point seems like an argument in favor of the relevance of the problem laid out in this post. I have other complaints with this framing of the problem, which I expect you would share.

The key distinction between this and contemporary AI is not self-modification, but wanting to have the kind of agent which can look at itself and say, "I know that as new evidence comes in I will change my beliefs. Fortunately, it looks like I'm going to make better decisions as a result" or perhaps even more optimistically "But it looks like I'm not changing them in quite the right way, and I should make this slight change."

The usual route is to build agents which don't reason about their own evolution over time. But for sufficiently sophisticated agents, I would expect them to have some understanding of how they will behave in the future, and to e.g. pursue more information based on the explicit belief that by acquiring that information they will enable themselves to make better decisions. This seems like it is a more robust approach to getting the "right" behavior than having an agent which e.g. takes "Information is good" as a brute fact or has a rule for action that bakes in an ad hoc approach to estimating VOI. I think we can all agree that it would not be good to build an AI which calculated the right thing to do, and then did that with probability 99% and took a random action with probability 1%.

That said, even if you are a very sophisticated reasoner, having in hand some heuristics about VOI is likely to be helpful, and if you think that those heuristics are effective you may continue to use them. I just hope that you are using them because you believe they work (e.g. because of empirical observations of them working, the belief that you were intelligently designed to make good decisions, or whatever), not because they are built into your nature.

Comment author: Kawoomba 07 June 2013 01:59:40PM 0 points [-]

For a somewhat contrived and practically less relevant notion of self-modifying. You could regard a calculator as self-modifying in that sense, but not very relevantly.

Comment author: jsteinhardt 07 June 2013 02:30:32PM 2 points [-]

It would be useful to understand why we think a calculator doesn't "count" as self-modification. In particular, we don't think calculators run into the Löb obstacle, so what is the difference between calculators and AIs?

Comment author: Kawoomba 07 June 2013 03:41:37PM *  0 points [-]

As always in such matters, think of Turing Machines. If the transition function isn't modified, the state of the Turing Machine may change. However, it'll always be in an internal state prespecified in its transition function; it won't get unknown or unknowable new entries in its action table.

Universal Turing Machines are designed to change, to take their transition function from the input tape as input, a prime example of self-modification. But they as well -- having read their new transition function from their input tape -- will go about their business as usual without further changes to their transition function. (You can of course program them to later continue changing their action table, but the point is that such changes to their own action table -- to their own behavior -- are clearly delineated from mere contents in their memory / work tape.)

A calculator or a non-self-modifying AI will undergo changes in its memory, but it'll never endeavor to define new internal states, with new rules, on its own. It'll remember that you've entered "0.7734" in its display, but it'll only perform its usual actions on that number. A game of Tetris will change what blocks it displays on your screen, but that won't modify its rules.

There may be accidental modifications (bugs etc.) leading to unknown states and behavior, but I wouldn't usefully call that an active act of self-modification. (It's not a special case to guard against, other than by the usual redundancy / checksums. But that's not so much FAI research as the same constraints you face when working with e.g. real-time or mission-critical applications.)
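(A toy Python rendering of the distinction drawn above, using an invented example machine: the tape and head state change on every step, while the transition table that defines the machine's behavior never does. The comment at the end marks the one capability that would make it self-modifying in the relevant sense.)

```python
# A one-tape Turing machine that appends a 1 to a unary counter.
# Its memory (tape, state, position) changes on every step; its rules never do.
TRANSITIONS = {
    # (state, symbol) -> (new_state, write_symbol, move)
    ("scan", "1"): ("scan", "1", +1),   # walk right over the 1s
    ("scan", "_"): ("halt", "1", 0),    # write one more 1, then halt
}

def run(tape, state="scan", pos=0):
    tape = dict(enumerate(tape))                           # mutable memory
    while state != "halt":
        symbol = tape.get(pos, "_")
        state, write, move = TRANSITIONS[(state, symbol)]  # fixed rules
        tape[pos] = write
        pos += move
    return "".join(tape[i] for i in sorted(tape))

print(run("111"))   # '1111' -- the tape changed, TRANSITIONS did not

# A "self-modifying" machine would differ in exactly one respect: somewhere
# inside the loop it would be permitted to do something like
#     TRANSITIONS[(some_state, some_symbol)] = (new_state, new_symbol, move)
# i.e. to define new rules governing its own future behavior.
```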

Comment author: philh 07 June 2013 11:24:06PM 2 points [-]

I don't think this is quite there. A UTM is itself a TM, and its transition function is fixed. But it emulates a TM, and it could instead emulate a TM-with-variable-transition-function, and that thing would be self-modifying in a deeper sense than an emulation of a standard TM.

But it's still not obvious to me how to formalize this, because (among other problems) you can replace an emulated TMWVTF with an emulated UTM which in turn emulates a TMWVTF...

Comment author: JonahSinick 06 June 2013 09:48:47PM 0 points [-]

See the last paragraph of this comment highlighting my question about the relevance of the operationalization.

Comment author: Eliezer_Yudkowsky 06 June 2013 10:09:32PM 3 points [-]

I feel like I'm not clear on what question you're asking. Can you give an example of what a good answer would look like, maybe using Xs and Ys since I can hardly ask you to come up with an actual good argument?

Comment author: JonahSinick 06 June 2013 10:26:47PM *  6 points [-]

There are many possible operationalizations of a self-modifying AI. For example,

  • One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process).

  • One could model a self-improving AI as a chess playing computer program which uses a positional weighting system to choose which moves to make, and which analyzes which weighting heuristics statistically lead to more winning games, in order to improve its positional weighting system.
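(To make the second operationalization above concrete: a toy Python rendering, entirely my own invention and not anything from the paper, in which the only thing the program ever modifies about itself is a vector of positional-feature weights, kept or discarded according to which weighting statistically wins more simulated games.)

```python
import random

FEATURES = ["material", "mobility", "king_safety"]
TRUE_WEIGHTS = {"material": 1.0, "mobility": 0.5, "king_safety": 0.8}

def play_game(weights):
    """Hypothetical stand-in for a full chess game: win probability falls off
    as the weights move away from some unknown ideal weighting."""
    error = sum((weights[f] - TRUE_WEIGHTS[f]) ** 2 for f in FEATURES)
    return random.random() > error          # True means the program won

def self_improve(weights, generations=200, games_per_match=30):
    """The program's only form of 'self-modification': perturb its own
    positional weights and keep whichever weighting wins more games."""
    for _ in range(generations):
        candidate = {f: w + random.gauss(0, 0.1) for f, w in weights.items()}
        wins_old = sum(play_game(weights) for _ in range(games_per_match))
        wins_new = sum(play_game(candidate) for _ in range(games_per_match))
        if wins_new > wins_old:
            weights = candidate
    return weights

print(self_improve({f: 0.5 for f in FEATURES}))
```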

My reaction to your paper is similar to what my reaction would be to a paper that studies ways to make sure that the Chinese economy doesn't change in such a way that GDP starts dropping, or ways to make sure that the chess program doesn't self-modify to get worse and worse at winning chess games rather than better and better.

It's conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as

  • Working to increase rationality
  • Spreading concern for global welfare
  • Building human capital of people who are concerned about global welfare

are more cost-effective ways of reducing AI risk than doing such research.

I'm looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.

Comment author: Eliezer_Yudkowsky 27 June 2013 05:09:35AM 19 points [-]

One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process)...

I'm looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.

If somebody wrote a paper showing how an economy could naturally build another economy while being guaranteed to have all prices derived from a constant set of prices on intrinsic goods, even as all prices were set by market mechanisms as the next economy was being built, I'd think, "Hm. Interesting. A completely different angle on self-modification with natural goal preservation."

I'm surprised at the size of the apparent communications gap around the notion of "How to get started for the first time on a difficult basic question" - surely you can think of mathematical analogies to research areas where it would be significant progress just to throw out an attempted formalization as a base point?

There are all sorts of disclaimers plastered onto the paper about how this only works because logic is monotonic, probabilistic reasoning is not monotonic, etcetera. The point is to have a way, any way, of just getting started on stable self-modification even though we know the particular exact formalism doesn't directly work for probabilistic agents. Once you do that you can at least state what it is you can't do. A paper on a self-replicating economy with a stable set of prices on intrinsic goods would likewise be something you could look at and say, "But this formally can't do X, because Y" and then you would know more about X and Y than you did previously. Being able to say, "But the verifier-suggester separation won't work for expected utility agents because probabilistic reasoning is not monotonic" means you've gotten substantially further into FAI work than when you're staring dumbly at the problem.

AIXI was conceptual progress on AGI, and especially public discussion of AGI, because it helped people like me say much more formally all the things that we didn't like about AIXI, like the anvil problem or AIXI seizing control of its reward channel or AIXI only being able to represent utility functions of sensory data rather than environmental ontologies. Someone coming up with a list of 5 key properties the tiling architecture does not have would be significant progress, and I would like to specifically claim that as an intended, worthwhile, fully-pays-back-the-effort positive consequence if it happens - and this is not me covering all the bases in case of disappointment, the paper was presented in a way consonant with that goal and not in a way consonant with claiming one-trueness.

I don't understand the model you have of FAI research where this is not the sort of thing that you do at the beginning.

Comment author: JonahSinick 27 June 2013 06:49:24PM *  2 points [-]

Thanks for continuing to engage.

I described my position in another comment. To reiterate and elaborate:

  1. My current best guess is that there are so many unrelated potential models for AI (relative to the information that we currently have) that the probability of FAI work on a single one of them ending up being relevant is tiny. In order to make a compelling argument for the relevance of MIRI's work on the Löb problem, you have to argue that the model used isn't just one of, e.g., 10^10 distinct models of AI with similar probability of being realized in practice.

  2. One could argue that the problem is sufficiently important so that one should work on it even if the probability of the work being relevant is tiny. But there are other interventions on the table. You've made major contributions by spreading rationality and by creating a community for people who are interested in global welfare to network and collaborate with one another. These things probably substantially reduce astronomical waste (in expectation). In order to argue in favor of MIRI's FAI research being optimal philanthropy, you have to argue that the probability of the research being relevant is sufficiently great so that its expected value outweighs the expected value of these other activities.

  3. One could argue that if there are in fact so many models for AI then we're doomed anyway, so we should assume that there aren't so many models. But rather than trying to work on the models that we think most relevant now, we can wait until it becomes more clear what AGI will look like in practice, and then develop FAI for that type of AI. Whether or not this is feasible is of course related to the question of whether the world's elites will navigate the creation of AI just fine. I think that there are good reasons to think that the probability of this is pretty high, and that the most leveraged efforts are getting good people in positions of future influence rather than doing FAI research now. Your work on rationality training and community building can help, and already has helped a lot with this.

Comment author: Eliezer_Yudkowsky 27 June 2013 07:21:13PM 8 points [-]

Neither 2 nor 3 is the sort of argument I would ever make (there's such a thing as an attempted steelman which by virtue of its obvious weakness doesn't really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a foregone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.

I still don't understand what you could be thinking here, and feel like there's some sort of basic failure to communicate going on. I could guess something along the lines of "Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function..." (but really, is something like that one of just 10^10 equivalent candidates?) "...and more dissimilar to that than logical AI is from decision theory" (that's a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that's the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, "Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we're going to build a Google Maps AGI", where in point of fact it would be completely relevant, not a tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can't think of any acceptable steel version of what you mean, and I say again that it seems to me that you're saying something that a good mainstream AI person would also be staring quizzically at.

What would be one of the other points in the 10^10-sized space? If it's something along the lines of "an economic model" then I just explained why if you did something analogous with an economic model it could also be interesting progress, just as AIXI was conceptually important to the history of ideas in the field. I could explain your position by supposing that you think that mathematical ideas never generalize across architectures and so only analyzing the exact correct architecture of a real FAI could be helpful even at the very beginning of work, but this sounds like a visibly stupid position so the model of Anna in my head is warning me not to attribute it to you. On the other hand, some version of, "It is easier to make progress than Jonah thinks because the useful generalization of mathematical ideas does not require you to select correct point X out of 10^10 candidates" seems like it would almost have to be at work here somewhere.

I seriously don't understand what's going on in your head here. It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI. Each newly written computer program is unique but the ideas behind them generalize, the resulting conceptual space can be usefully explored, that's why we don't start over with every new computer program. You can do useful things once you've collected enough treasure nuggets and your level of ability builds up, it's not a question of guessing the one true password out of 10^10 tries with nothing being progress until then. This is true on a level of generality which applies across computer science and also to AI and also to FAI and also to decision theory and also to math. Everyone takes this for granted as an obvious background fact of doing research which is why I would expect a good mainstream AI person to also be staring quizzically at your statements here. I do not feel like the defense I'm giving here is in any way different from the defense I'd give of a randomly selected interesting AI paper if you said the same thing about it. "That's just how research works," I'd say.

Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.

Comment author: JonahSinick 27 June 2013 09:38:04PM 4 points [-]

I continue to appreciate your cordiality.

A number of people have recently told me that they have trouble understanding me unless I elaborate further, because I don't spell out my reasoning in sufficient detail. I think that this is more a matter of the ideas involved being complicated, and there being a lot of inferential distance, than it is lack of effort on my part, but I can see how it would be frustrating to my interlocutors. It seems that I'm subject to the illusion of transparency. I appreciate your patience.

Neither 2 nor 3 is the sort of argument I would ever make (there's such a thing as an attempted steelman which by virtue of its obvious weakness doesn't really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a foregone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.

I know that you've explicitly disavowed arguments of the type in my points 2 and 3. My reason for bringing them up is to highlight the importance of addressing point 1: to emphasize that it doesn't suffice to say "the problem is important and we have to get started on it somehow." I recognize that we have very different implicit assumptions on point 1, and that that's where the core of the disagreement lies.

I still don't understand what you could be thinking here, and feel like there's some sort of basic failure to communicate going on. I could guess something along the lines of "Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function..." (but really, is something like that one of just 10^10 equivalent candidates?) "...and more dissimilar to that than logical AI is from decision theory" (that's a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that's the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, "Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we're going to build a Google Maps AGI", where in point of fact it would be completely relevant, not a tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can't think of any acceptable steel version of what you mean, and I say again that it seems to me that you're saying something that a good mainstream AI person would also be staring quizzically at.

There's essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I'm not suggesting that an AGI will have human values by default: I'm totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules, rather than a mathematically defined utility function.

There are serious dangers of such an entity having values that are orthogonal to humans, and serious dangers of value drift. (Your elegant article Why does power corrupt? has some relevance to the latter point.) But it seems to me that the measures that one would want to take to prevent humans' goals from changing are completely different from the sorts of measures that might emerge from MIRI's FAI research.

I'll also highlight a comment of Nick Beckstead, which you've already seen and responded to. I didn't understand your response.

I should clarify that I don't have high confidence that the first AGI will develop along these lines. But it's my best guess, and it seems much more plausible to me than models of the type in your paper.

It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI.

The difference that I perceive between the two scenarios is the nature of the feedback loops in each case.

When one is chipping away at a problem incrementally, one has the capacity to experiment and use the feedback generated from experimentation to help one limit the search space. Based on what I know about the history of science, general relativity is one of the only successful theories that was created without lots of empirical investigation.

The engineers who designed the first bridges had trillions of combinations of design features and materials to consider a priori, the vast majority of which wouldn't work. But an empirical discovery like "material X is too weak to work within any design" greatly limits the search space, because you don't have to think further about any of the combinations involving material X. Similarly if one makes a discovery of the type "material Y is so strong that it'll work with any design." By making a series of such discoveries, one can hone in on a few promising candidates.

This is how I predict that the development of AGI will go. I think that the search space is orders of magnitude too large to think about in a useful way without a lot of experimentation, and that a priori we can't know what the first AGI will look like. I think that once it becomes more clear what the first AGI will look like, it will become much more feasible to make progress on AI safety.

Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.

It'll take me a while to come up with a lot of concrete hypotheticals, but I'll get back to you on this.

Comment author: Eliezer_Yudkowsky 27 June 2013 10:06:20PM 5 points [-]

There's essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I'm not suggesting that an AGI will have human values by default: I'm totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules

Okay. This sounds like you're trying to make up your own FAI theory in much the same fashion as Holden (and it's different from Holden's, of course). Um, what I'd like to do at this point is take out a big Hammer of Authority and tell you to read "Artificial Intelligence: A Modern Approach" so your mind would have some better grist to feed on as to where AI is and what it's all about. If I can't do that... I'm not really sure where I could take this conversation. I don't have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there's somebody else you'd trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don't know where to take it from here.

On the object level I will quickly remark that some of the first attempts at heavier-than-air flying-machines had feathers and beaks and they did not work very well, that 'interacting specialized modules' is Selling Nonapples, that there is an old discussion in cognitive science about the degree of domain specificity in human intelligence, and that the idea that 'humans are the only example we have' is generally sterile, for reasons I've already written about but I can't remember the links offhand, hopefully someone else does. It might be in Levels of Organization in General Intelligence, I generally consider that pretty obsolete but it might be targeted to your current level.

Comment author: JonahSinick 27 June 2013 10:47:14PM *  3 points [-]

Okay. This sounds like you're trying to make up your own FAI theory in much the same fashion as Holden (and it's different from Holden's, of course).

Either of my best guess or Holden's best guess could be right, and so could lots of other ideas that we haven't thought of. My proposed conceptual framework should be viewed as one of many weak arguments.

The higher level point that I was trying to make is that [the conceptual framework implicit in the view that MIRI's current FAI research has a non-negligible chance of being relevant to AI safety] seems highly conjunctive. I don't mean this rhetorically at all – I genuinely don't understand why you think that we can make progress given how great the unknown unknowns are. You may be right, but justification of your view requires further argumentation.

Um, what I'd like to do at this point is take out a big Hammer of Authority and tell you to read "Artificial Intelligence: A Modern Approach" so your mind would have some better grist to feed on as to where AI is and what it's all about. If I can't do that... I'm not really sure where I could take this conversation. I don't have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there's somebody else you'd trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don't know where to take it from here.

A more diplomatic way of framing this would be something like:

"The book Artificial Intelligence: A Modern Approach has a discussion of current approaches to artificial intelligence. Are you familiar with the ideas therein? If not, I'd suggest that you take a look"

Putting that aside, based on conversations with a number of impressive people in machine learning, etc. who I know, my impression is that at the moment, there aren't strong contenders for research programs that could plausibly lead to AGI. I largely accept Luke's argument in his blog post on AI timelines, but this is based on the view that the speed of research is going to increase a lot over the coming years, rather than on the belief that any existing research programs have a reasonable chance of succeeding.

I'd be very interested in hearing about existing research programs that have a reasonable chance of succeeding.

Comment author: shminux 27 June 2013 10:36:17PM -1 points [-]

Just wondering why you see Jonah Sinick as being of high enough status to be worth explaining to him what's been discussed on LW repeatedly. Or maybe I'm totally misreading this exchange.

Comment author: Vaniver 27 June 2013 10:14:50PM *  0 points [-]

I'll also highlight a comment of Nick Beckstead, which you've already seen and responded to. I didn't understand your response.

Let me try from a different angle.

With humans, we see three broad clusters of modification: reproduction, education, and chemistry. Different people are physically constructed in different ways, and so we can see evolution of human civilization by biological evolution of the humans inside it. The environments that people find themselves in or choose leave imprints on those people. Chemicals people ingest can change those people, such as with caffeine, alcohol, morphine, or heroin. (I would include 'changing your diet to change your thought processes' under chemical changes, but the chemical changes from becoming addicted to heroin and from not being creatine deficient look very different.)

For AIs, most of the modification that's interesting and new will look like the "chemistry" cluster. An AI modifying its source code will look a lot like a human injecting itself with a new drug that it just invented. (Nick_Beckstead's example of modifying the code of the weather computer is more like education than it is like chemistry.)

This is great because some drugs dramatically improve performance, and so a person on caffeine could invent a super nootropic, and then on the super nootropic invent a cure for cancer and an even better nootropic, and so on. This is terrifying because any drug that adjusts your beliefs or your decision-making algorithm (think of 'personality' as a subset of this) dramatically changes how you behave, and might do so for the worse. This is doubly terrifying because these changes might be irreversible - you might take a drug that gets rid of your depression by making you incapable of feeling desire, and then not have any desire to restore yourself! This is triply terrifying because the effects of the drug might be unknown - you might not be able to determine what a drug will do to you until after you take it, and by then it might be too late.

For humans this problem is mostly solved by trial and error followed by pattern-matching - "coffee is okay, crack is not, because Colin is rich and productive and Craig is neither" - which is not useful for new drugs, and not useful for misclassified old drugs, and not very safe for very powerful systems. The third problem - that the effects might be unknown - is the sort of thing that proofs might help with, except there are some technical obstacles to doing that. The Löbstacle is a prominent theoretical one, and while it looks like there are lots of practical obstacles as well, surmounting the theoretical obstacles should help with surmounting the practical obstacles.

Any sort of AGI that's able to alter its own decision-making process will have the ability to 'do chemistry on itself,' and one with stable values will need to have solved the problem of how to do that while preserving its values. (I don't think that humans have 'stable' values; I'd call them something more like 'semi-stable.' Whether or not this is a bug or feature is unclear to me.)

Comment author: JonahSinick 27 June 2013 10:59:49PM 2 points [-]

I understand where you're coming from, and I think that you correctly highlight a potential source of concern, and one which my comment didn't adequately account for. However:

  1. I'm skeptical that it's possible to create an AI based on mathematical logic at all. Even if an AI with many interacting submodules is dangerous, it doesn't follow that working on AI safety for an AI based on mathematical logic is promising.

  2. Humans can impose selective pressures on emergent AIs so as to mimic the process of natural selection that humans experienced.

Comment author: Kawoomba 06 June 2013 11:04:22PM *  7 points [-]

There are many possible operationalizations of a self-modifying AI

No doubt. And as of now, for none of them are we able to tell whether they are safe or not. There's insufficient rigor in the language; the formalizations aren't standardized or pinned down (in this subject matter). MIRI's work is creating and pinning down the milestones for how we'd even go about assessing self-modifying friendly AI in terms of goal stability, in mathematical language.

To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.

It's conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as (Working to increase rationality, Spreading concern for global welfare, Building human capital of people who are concerned about global welfare) are more cost-effective ways of reducing AI risk than doing such research.

Even if that were so, that's not MIRI's (or EY's) most salient comparative advantage (also: CFAR).

Comment author: JonahSinick 07 June 2013 12:26:56AM *  3 points [-]

To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.

My claim is that there are sufficiently many possible models for AI that given what we (the Less Wrong community, not necessarily AI researchers) know now, the probability of a given model being developed is tiny.

The actionable safety issues that would come up if the AI is like the Chinese economy would be very different from the actionable safety issues that would come up if the AI is like a self-improving chess playing program, which would be very different from the actionable safety issues that would come up if the AI is of the type that Eliezer's publication describes.

Given the paucity of information available about the design of the first AI, I don't think that the probability of doing safety research on a particular model being actionable is sufficiently high for such research to be warranted (relative to other available activities).

Even if that were so, that's not MIRI's (or EY's) most salient comparative advantage (also: CFAR).

  1. Eliezer made a major contribution to increasing rationality with his How To Actually Change Your Mind sequence, which improved the rationality of many people who I know, including myself.

  2. MIRI could engage in other AI safety activities, such as improving future forecasting.

  3. If an organization doesn't have a cost-effective activity to engage in, and the employees recognize this, then they can leave and do something else. Here I'm not claiming that this is in fact the case of MIRI, rather, I'm just responding to your argument.

  4. MIRI's staff could migrate to CFAR.

  5. Out of all of the high impact activities that MIRI staff could do, it's not clear to me that Friendly AI research is their comparative advantage.

Comment author: homunq 23 June 2013 02:35:56AM *  1 point [-]

Also, even if we accept that MIRI's comparative advantage has to do with having a clearer view of the Friendliness vs. UnFriendliness distinction, why wouldn't it be more effective for them to try to insure against an UnFriendly outcome by addressing the UnFriendliness already in the world today? For instance, corporate governance. Corporations' optimization powers are a tremendous source of human happiness, but their UnFriendly tendencies are clear. For now, corporations have only parasitic intelligence, and don't look particularly foomy, but if I had to bet on whether MIRI or Google/TenCent/Palantir/whatever was more likely to foom, there would be no contest.

[There are a bunch of assumptions embedded there. The principal ones are:

  1. If a corporation, as currently constituted, somehow went foom it would be likely to be UnFriendly
  2. If we were able to make it so corporations appeared more Friendly in their day-to-day actions, they would also become less likely to rush headlong into an UnFriendly foom.

I think 1 is pretty undeniable, but I could understand it if someone disagreed with 2.]