Eliezer_Yudkowsky comments on Tiling Agents for Self-Modifying AI (OPFAI #2) - Less Wrong

55 Post author: Eliezer_Yudkowsky 06 June 2013 08:24PM

Comments (260)

Comment author: JonahSinick 06 June 2013 10:26:47PM * 6 points

There are many possible operationalizations of a self-modifying AI. For example,

  • One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process).

  • One could model a self-improving AI as a chess playing computer program which uses a positional weighting system to choose which moves to make, and which analyzes which weighting heuristics statistically lead to more winning games, in order to improve its positional weighting system.

My reaction to your paper is similar to what my reaction would be to a paper that studies ways to make sure that the Chinese economy doesn't change in such a way that GDP starts dropping, or ways to make sure that the chess program doesn't self-modify to get worse and worse at winning chess games rather than better and better.

It's conceivable that such a paper would be useful for building a self-improving AI, but a priori I would bet very heavily that activities such as

  • Working to increase rationality
  • Spreading concern for global welfare
  • Building human capital of people who are concerned about global welfare

are more cost-effective ways of reducing AI risk than doing such research.

I'm looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.

Comment author: Eliezer_Yudkowsky 27 June 2013 05:09:35AM 19 points

One could model a self-improving AI as the Chinese economy (which is in some sense a self-improving powerful optimization process)...

I'm looking for an argument for why the operationalization in the paper is more likely to be relevant to creating safe AI than modeling a self-improving AI as the Chinese economy, or as the aforementioned chess program, or than a dozen other analogous operationalizations that I could make up.

If somebody wrote a paper showing how an economy could naturally build another economy while being guaranteed to have all prices derived from a constant set of prices on intrinsic goods, even as all prices were set by market mechanisms as the next economy was being built, I'd think, "Hm. Interesting. A completely different angle on self-modification with natural goal preservation."

I'm surprised at the size of the apparent communications gap around the notion of "How to get started for the first time on a difficult basic question" - surely you can think of mathematical analogies to research areas where it would be significant progress just to throw out an attempted formalization as a base point?

There are all sorts of disclaimers plastered onto the paper about how this only works because logic is monotonic, probabilistic reasoning is not monotonic etcetera. The point is to have a way, any way of just getting started on stable self-modification even though we know the particular exact formalism doesn't directly work for probabilistic agents. Once you do that you can at least state what it is you can't do. A paper on a self-replicating economy with a stable set of prices on intrinsic goods would likewise be something you could look at and say, "But this formally can't do X, because Y" and then you would know more about X and Y than you did previously. Being able to say, "But the verifier-suggester separation won't work for expected utility agents because probabilistic reasoning is not monotonic" means you've gotten substantially further into FAI work than when you're staring dumbly at the problem.

AIXI was conceptual progress on AGI, and especially public discussion of AGI, because it helped people like me say much more formally all the things that we didn't like about AIXI, like the anvil problem or AIXI seizing control of its reward channel or AIXI only being able to represent utility functions of sensory data rather than environmental ontologies. Someone coming up with a list of 5 key properties the tiling architecture does not have would be significant progress, and I would like to specifically claim that as an intended, worthwhile, fully-pays-back-the-effort positive consequence if it happens - and this is not me covering all the bases in case of disappointment, the paper was presented in a way consonant with that goal and not in a way consonant with claiming one-trueness.

I don't understand the model you have of FAI research where this is not the sort of thing that you do at the beginning.

Comment author: JonahSinick 27 June 2013 06:49:24PM * 2 points

Thanks for continuing to engage.

I described my position in another comment. To reiterate and elaborate:

  1. My current best guess is that there are so many unrelated potential models for AI (relative to the information that we currently have) that the probability of FAI work on a single one of them ending up being relevant is tiny. In order to make a compelling argument for the relevance of MIRI's work on the Löb problem, you have to argue that the model used isn't only one of, e.g. 10^10 distinct models of AI with similar probability of being realized in practice.

  2. One could argue that the problem is sufficiently important so that one should work on it even if the probability of the work being relevant is tiny. But there are other interventions on the table. You've made major contributions by spreading rationality and by creating a community for people who are interested in global welfare to network and collaborate with one another. These things probably substantially reduce astronomical waste (in expectation). In order to argue in favor of MIRI's FAI research being optimal philanthropy, you have to argue that the probability of the research being relevant is sufficiently great so that its expected value outweighs the expected value of these other activities.

  3. One could argue that if there are in fact so many models for AI then we're doomed anyway, so we should assume that there aren't so many models. But rather than trying to work on the models that we think most relevant now, we can wait until it becomes more clear what AGI will look like in practice, and then develop FAI for that type of AI. Whether or not this is feasible is of course related to the question of whether the world's elites will navigate the creation of AI just fine. I think that there are good reasons to think that the probability of this is pretty high, and that the most leveraged efforts are getting good people in positions of future influence rather than doing FAI research now. Your work on rationality training and community building can help, and already has helped a lot with this.

Comment author: Eliezer_Yudkowsky 27 June 2013 07:21:13PM 8 points

Neither 2 nor 3 is the sort of argument I would ever make (there's such a thing as an attempted steelman which by virtue of its obvious weakness doesn't really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a foregone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.

I still don't understand what you could be thinking here, and feel like there's some sort of basic failure to communicate going on. I could guess something along the lines of "Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function..." (but really, is something like that one of just 10^10 equivalent candidates?) "...and more dissimilar to that than logical AI is from decision theory" (that's a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that's the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, "Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we're going to build a Google Maps AGI", where in point of fact it would be completely relevant, not a tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can't think of any acceptable steel version of what you mean, and I say again that it seems to me that you're saying something that a good mainstream AI person would also be staring quizzically at.

What would be one of the other points in the 10^10-sized space? If it's something along the lines of "an economic model" then I just explained why if you did something analogous with an economic model it could also be interesting progress, just as AIXI was conceptually important to the history of ideas in the field. I could explain your position by supposing that you think that mathematical ideas never generalize across architectures and so only analyzing the exact correct architecture of a real FAI could be helpful even at the very beginning of work, but this sounds like a visibly stupid position so the model of Anna in my head is warning me not to attribute it to you. On the other hand, some version of, "It is easier to make progress than Jonah thinks because the useful generalization of mathematical ideas does not require you to select correct point X out of 10^10 candidates" seems like it would almost have to be at work here somewhere.

I seriously don't understand what's going on in your head here. It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI. Each newly written computer program is unique but the ideas behind them generalize, the resulting conceptual space can be usefully explored, that's why we don't start over with every new computer program. You can do useful things once you've collected enough treasure nuggets and your level of ability builds up, it's not a question of guessing the one true password out of 10^10 tries with nothing being progress until then. This is true on a level of generality which applies across computer science and also to AI and also to FAI and also to decision theory and also to math. Everyone takes this for granted as an obvious background fact of doing research which is why I would expect a good mainstream AI person to also be staring quizzically at your statements here. I do not feel like the defense I'm giving here is in any way different from the defense I'd give of a randomly selected interesting AI paper if you said the same thing about it. "That's just how research works," I'd say.

Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.

Comment author: JonahSinick 27 June 2013 09:38:04PM 4 points

I continue to appreciate your cordiality.

A number of people have recently told me that they have trouble understanding me unless I elaborate further, because I don't spell out my reasoning in sufficient detail. I think that this is more a matter of the ideas involved being complicated, and there being a lot of inferential distance, than it is lack of effort on my part, but I can see how it would be frustrating to my interlocutors. It seems that I'm subject to the illusion of transparency. I appreciate your patience.

Neither 2 nor 3 is the sort of argument I would ever make (there's such a thing as an attempted steelman which by virtue of its obvious weakness doesn't really help). You already know(?) that I vehemently reject all attempts to multiply tiny probabilities by large utilities in real-life practice, or to claim that the most probable assumption about a background epistemic question leads to a foregone doom and use this to justify its improbable negation. The part at which you lost me is of course part 1.

I know that you've explicitly disavowed arguments of the type in my points 2 and 3. My reason for bringing them up is to highlight the importance of addressing point 1: to emphasize that it doesn't suffice to say "the problem is important and we have to get started on it somehow." I recognize that we have very different implicit assumptions on point 1, and that that's where the core of the disagreement lies.

I still don't understand what you could be thinking here, and feel like there's some sort of basic failure to communicate going on. I could guess something along the lines of "Maybe Jonah is imagining that Friendly AI will be built around principles completely different from modern decision theory and any notion of a utility function..." (but really, is something like that one of just 10^10 equivalent candidates?) "...and more dissimilar to that than logical AI is from decision theory" (that's a lot of dissimilarity but we already demonstrated conceptual usefulness over a gap that size). Still, that's the sort of belief someone might end up with if their knowledge of AI was limited to popular books extolling the wonderful chaos of neural networks, but that idea is visibly stupid so my mental model of Anna warns me not to attribute it to you. Or I could guess, "Maybe Jonah is Holden-influenced and thinks that all of this discussion is irrelevant because we're going to build a Google Maps AGI", where in point of fact it would be completely relevant, not a tiniest bit less relevant, if we were going to build a planning Oracle. (The experience with Holden does give me pause and make me worry that EA people may think they already know how to build FAI using their personal wonderful idea, just like vast numbers of others think they already know how to build FAI.) But I still can't think of any acceptable steel version of what you mean, and I say again that it seems to me that you're saying something that a good mainstream AI person would also be staring quizzically at.

There's essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I'm not suggesting that an AGI will have human values by default: I'm totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules, rather than a mathematically defined utility function.

There are serious dangers of such an entity having values that are orthogonal to those of humans, and serious dangers of value drift. (Your elegant article Why does power corrupt? has some relevance to the latter point.) But it seems to me that the measures that one would want to take to prevent humans' goals from changing are completely different from the sorts of measures that might emerge from MIRI's FAI research.

I'll also highlight a comment of Nick Beckstead, which you've already seen and responded to. I didn't understand your response.

I should clarify that I don't have high confidence that the first AGI will develop along these lines. But it's my best guess, and it seems much more plausible to me than models of the type in your paper.

It sounds like any similar argument should Prove Too Much by showing that no useful work or conceptual progress could have occurred due to AI work in the last 60 years because there would be 10^10 other models for AI.

The difference that I perceive between the two scenarios is the nature of the feedback loops in each case.

When one is chipping away at a problem incrementally, one has the capacity to experiment and use the feedback generated from experimentation to help one limit the search space. Based on what I know about the history of science, general relativity is one of the only successful theories that was created without lots of empirical investigation.

The engineers who designed the first bridges had trillions of combinations of design features and materials to consider a priori, the vast majority of which wouldn't work. But an empirical discovery like "material X is too weak to work within any design" greatly limits the search space, because you don't have to think further about any of the combinations involving material X. Similarly if one makes a discovery of the type "material Y is so strong that it'll work with any design." By making a series of such discoveries, one can home in on a few promising candidates.
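
As a rough illustration of the pruning argument above (this sketch is not from the original comment, and the materials, span types, and foundation types are hypothetical placeholders), a single finding of the form "material X never works" removes every combination that contains X:

```python
from itertools import product

# Hypothetical design space; all names are illustrative placeholders.
materials = ["X", "Y", "Z", "W"]
spans = ["beam", "arch", "suspension"]
foundations = ["piles", "bedrock", "caisson"]

# Every (material, span, foundation) combination considered a priori.
all_designs = list(product(materials, spans, foundations))

# One empirical discovery: material X is too weak to work within any design.
too_weak = {"X"}
remaining = [d for d in all_designs if d[0] not in too_weak]

print(len(all_designs), len(remaining))  # prints "36 27"
```

Here one experiment removes a quarter of the candidates without examining them individually. With more design dimensions the absolute number of combinations ruled out by each such finding grows accordingly.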

This is how I predict that the development of AGI will go. I think that the search space is orders of magnitude too large to think about in a useful way without a lot of experimentation, and that a priori we can't know what the first AGI will look like. I think that once it becomes more clear what the first AGI will look like, it will become much more feasible to make progress on AI safety.

Please amplify point 1 in much greater detail using concrete examples and as little abstraction as possible.

It'll take me a while to come up with a lot of concrete hypotheticals, but I'll get back to you on this.

Comment author: Eliezer_Yudkowsky 27 June 2013 10:06:20PM 5 points

There's essentially only one existing example of an entity with general intelligence: a human. I think that our prior should be that the first AGI will have internal structure analogous to that of a human. Here I'm not suggesting that an AGI will have human values by default: I'm totally on board with your points about the dangers of anthropomorphization in that context. Rather, what I mean is that I envisage the first AGI as having many interacting specialized modules

Okay. This sounds like you're trying to make up your own FAI theory in much the same fashion as Holden (and it's different from Holden's, of course). Um, what I'd like to do at this point is take out a big Hammer of Authority and tell you to read "Artificial Intelligence: A Modern Approach" so your mind would have some better grist to feed on as to where AI is and what it's all about. If I can't do that... I'm not really sure where I could take this conversation. I don't have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there's somebody else you'd trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don't know where to take it from here.

On the object level I will quickly remark that some of the first attempts at heavier-than-air flying-machines had feathers and beaks and they did not work very well, that 'interacting specialized modules' is Selling Nonapples, that there is an old discussion in cognitive science about the degree of domain specificity in human intelligence, and that the idea that 'humans are the only example we have' is generally sterile, for reasons I've already written about but I can't remember the links offhand, hopefully someone else does. It might be in Levels of Organization in General Intelligence, I generally consider that pretty obsolete but it might be targeted to your current level.

Comment author: JonahSinick 27 June 2013 10:47:14PM * 3 points

Okay. This sounds like you're trying to make up your own FAI theory in much the same fashion as Holden (and it's different from Holden's, of course).

Either of my best guess or Holden's best guess could be right, and so could lots of other ideas that we haven't thought of. My proposed conceptual framework should be viewed as one of many weak arguments.

The higher level point that I was trying to make is that [the conceptual framework implicit in the view that MIRI's current FAI research has a non-negligible chance of being relevant to AI safety] seems highly conjunctive. I don't mean this rhetorically at all – I genuinely don't understand why you think that we can make progress given how great the unknown unknowns are. You may be right, but justification of your view requires further argumentation.

Um, what I'd like to do at this point is take out a big Hammer of Authority and tell you to read "Artificial Intelligence: A Modern Approach" so your mind would have some better grist to feed on as to where AI is and what it's all about. If I can't do that... I'm not really sure where I could take this conversation. I don't have the time to personally guide you to understanding of modern AI starting from that kind of starting point. If there's somebody else you'd trust to tell you about AI, with more domain expertise, I could chat with them and then they could verify things to you. I just don't know where to take it from here.

A more diplomatic way of framing this would be something like:

"The book Artificial Intelligence: A Modern Approach has a discussion of current approaches to artificial intelligence. Are you familiar with the ideas therein? If not, I'd suggest that you take a look"

Putting that aside, based on conversations with a number of impressive people in machine learning, etc. who I know, my impression is that at the moment, there aren't strong contenders for research programs that could plausibly lead to AGI. I largely accept Luke's argument in his blog post on AI timelines, but this is based on the view that the speed of research is going to increase a lot over the coming years, rather than on the belief that any existing research programs have a reasonable chance of succeeding.

I'd be very interested in hearing about existing research programs that have a reasonable chance of succeeding.

Comment author: Eliezer_Yudkowsky 28 June 2013 12:43:02AM 4 points

I genuinely don't understand why you think that we can make progress given how great the unknown unknowns are.

Is it your view that no progress has occurred in AI generally for the last sixty years?

I'd be very interested in hearing about existing research programs that have a reasonable chance of succeeding.

The field as a whole has been making perfectly good progress AFAICT. We know a bleepton more about cognition than we did in 1955 and are much less confused by many things. Has someone been giving you an impression otherwise and if so, what field were they in?

Comment author: JonahSinick 28 June 2013 01:09:51AM 4 points

Is it your view that no progress has occurred in AI generally for the last sixty years?

No, it's clear that there have been many advances, for example in chess playing programs, auto-complete search technology, automated translation, driverless cars, and speech recognition.

But my impression is that this work has only made a small dent in the problem of general artificial intelligence.

The field as a whole has been making perfectly good progress AFAICT. We know a bleepton more about cognition than we did in 1955 and are much less confused by many things. Has someone been giving you an impression otherwise and if so, what field were they in?

  1. Three graduate students in machine learning at distinct elite universities.
  2. Scott Aaronson. Even though he works in theoretical computer science rather than AI, he's in close proximity with many colleagues who work on artificial intelligence at MIT, and so I give a fair amount of weight to his opinion.

Also, the fraction of scientists who I know who believe that there's a promising AGI research agenda on the table is very small, mostly consisting of people around MIRI. Few of the scientists who I know have subject matter expertise, but if there was a promising AGI research agenda on the table, I would expect news of it to have percolated to at least some of the people in question.

Comment author: jsteinhardt 01 July 2013 06:15:18PM 8 points

I think I may have been one of those three graduate students, so just to clarify, my view is:

  1. Zero progress being made seems too strong a claim, but I would say that most machine learning research is neither relevant to, nor trying to be relevant to, AGI. I think that there is no real disagreement on this empirical point (at least, from talking to both Jonah and Eliezer in person, I don't get the impression that I disagree with either of you on this particular point).

  2. The model for AGI that MIRI uses seems mostly reasonable, except for the "self-modification" part, which seems to be a bit too much separated out from everything else (since pretty much any form of learning is a type of self-modification --- current AI algorithms are self-modifying all the time!).

  3. In this vein, I'm skeptical of both the need for and the feasibility of an AI providing an actual proof of safety of self-modification. I also think that using mathematical logic somewhat clouds the issues here, and that most of the issues that MIRI is currently working on are prerequisites for any sort of AI, not just friendly AI. I expect them to be solved as a side-effect of what I see as more fundamental outstanding problems.

  4. However, I don't have reasons to be highly confident in these intuitions, and as a general rule of thumb, having different researchers with different intuitions pursue their respective programs is a good way to make progress, so I think it's reasonable for MIRI to do what it's doing (note that this is different from the claim that MIRI's research is the most important thing and is crucial to the survival of humanity, which I don't think anyone at MIRI believes, but I'm clarifying for the benefit of onlookers).

Comment author: Eliezer_Yudkowsky 28 June 2013 01:53:03AM 2 points

Hm. I'm not sure if Scott Aaronson has any weird views on AI in particular, but if he's basically mainstream-oriented we could potentially ask him to briefly skim the Tiling Agents paper and say if it's roughly the sort of paper that it's reasonable for an organization like MIRI to be working on if they want to get some work started on FAI. At the very least if he disagreed I'd expect he'd do so in a way I'd have better luck engaging conversationally, or if not then I'd have two votes for 'please explore this issue' rather than one.

I feel again like you're trying to interpret the paper according to a different purpose from what it has. Like, I suspect that if you described what you thought a promising AGI research agenda was supposed to deliver on what sort of timescale, I'd say, "This paper isn't supposed to do that."

No, it's clear that there have been many advances, for example in chess playing programs, auto-complete search technology, automated translation, driverless cars, and speech recognition.

But my impression is that this work has only made a small dent in the problem of general artificial intelligence.

This part is clearer and I think I may have a better idea of where you're coming from, i.e., you really do think the entire field of AI hasn't come any closer to AGI, in which case it's much less surprising that you don't think the Tiling Agents paper is the very first paper ever to come closer to AGI. But this sounds like a conversation that someone else could have with you, because it's not MIRI-specific or FAI-specific. I also feel somewhat at a loss for where to proceed if I can't say "But just look at the ideas behind Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, that's obviously important conceptual progress because..." In other words, you see AI doing a bunch of things, we already mostly agree on what these sorts of surface real-world capabilities are, but after checking with some friends you've concluded that this doesn't mean we're less confused about AGI than we were in 1955. I don't see how I can realistically address that except by persuading your authorities; I don't see what kind of conversation we could have about that directly without being able to talk about specific AI things.

Meanwhile, if you specify "I'm not convinced that MIRI's paper has a good chance of being relevant to FAI, but only for the same reasons I'm not convinced any other AI work done in the last 60 years is relevant to FAI" then this will make it clear to everyone where you're coming from on this issue.

Comment author: shminux 27 June 2013 10:36:17PM -1 points

Just wondering why you see Jonah Sinick as high enough status that it's worth explaining to him what's been discussed on LW repeatedly. Or maybe I'm totally misreading this exchange.

Comment author: JonahSinick 27 June 2013 11:00:39PM * 0 points

I'm puzzled as to what you think I'm missing: can you say more?

Comment author: Kawoomba 27 June 2013 11:09:46PM * 2 points

Matching "first AGI will [probably] have internal structure analogous to that of a human" and "first AGI [will probably have] many interacting specialized modules" in a literal (cough uncharitable cough) manner, as evidenced by "heavier-than-air flying-machines had feathers and beaks". Your phrasing hints at an anthropocentric architectural bias, analogous to the one you specifically distance yourself from regarding values.

Maybe you should clarify that part, it's crucial to the current misunderstanding, and it's not clear whether by "interacting specialized modules" you'd also refer to "Java classes not corresponding to anything 'human' in particular", or whether you'd expect a "thalamus-module".

Comment author: JonahSinick 27 June 2013 11:33:58PM 2 points

Matching "first AGI will [probably] have internal structure analogous to that of a human" and "first AGI [will probably have] many interacting specialized modules" in a literal (cough uncharitable cough) manner, as evidenced by "heavier-than-air flying-machines had feathers and beaks". Your phrasing hints at an anthropocentric architectural bias, analogous to the one you specifically distance yourself from regarding values.

I think that people should make more of an effort to pay attention to the nuances of people's statements rather than using simple pattern matching.

Maybe you should clarify that part, it's crucial to the current misunderstanding, and it's not clear whether by "interacting specialized modules" you'd also refer to "Java classes not corresponding to anything 'human' in particular", or whether you'd expect a "thalamus-module".

There's a great deal to write about this, and I'll do so at a later date.

To give you a small taste of what I have in mind: suppose you ask "How likely is it that the final digit of the Dow Jones will be 2 in two weeks?" I've never thought about this question. A priori, I have no Bayesian prior. What my brain does is amalgamate

  1. The Dow Jones index varies in a somewhat unpredictable way.
  2. The last digit is especially unpredictable.
  3. Two weeks is a really long time for unpredictable things to happen in this context.
  4. The last digit could be one of 10 values between 0 and 9.
  5. The probability of a randomly selected digit between 0 and 9 being 2 is equal to 10%.

Different parts of my brain generate the different pieces, and another part of my brain combines them. I'm not using a single well-defined Bayesian prior, nor am I satisfying a well defined utility function.
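
A minimal sketch of the arithmetic in items 4 and 5, assuming (as the comment does) that the last digit is effectively uniform over 0 to 9; this is an added illustration and the variable names are hypothetical:

```python
from fractions import Fraction

# Assumption: the final digit is equally likely to be any of 0..9.
digits = list(range(10))
target = 2

# Count the outcomes that match the target digit.
favourable = [d for d in digits if d == target]
p_target = Fraction(len(favourable), len(digits))

print(p_target, float(p_target))  # prints "1/10 0.1"
```

The uniformity assumption carries all the weight here: items 1 to 3 are what justify treating the digit as random, and the code only performs the final counting step.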

Comment author: shminux 27 June 2013 11:57:08PM 0 points

I don't want to comment on the details, as this is way outside my area of expertise, but I do want to point out that you appear to be a victim of the bright dilettante fallacy. You appear to think that your significant mathematical background makes you an expert in an unrelated field without having to invest the time and effort required to get up to speed in it.

Comment author: JonahSinick 28 June 2013 12:04:19AM *  0 points [-]

I don't claim to have any object level knowledge of AI.

My views on this point are largely based on what I've heard from people who work on AI, together with introspection as to how I and other humans reason, and the role of heuristics in reasoning.

Comment author: pop 15 July 2013 06:53:20AM 0 points [-]

Maybe something to do with Jonah being previously affiliated with GiveWell?

Comment author: Vaniver 27 June 2013 10:14:50PM *  0 points [-]

I'll also highlight a comment of Nick Beckstead, which you've already seen and responded to. I didn't understand your response.

Let me try from a different angle.

With humans, we see three broad clusters of modification: reproduction, education, and chemistry. Different people are physically constructed in different ways, and so we can see evolution of human civilization by biological evolution of the humans inside it. The environments that people find themselves in or choose leave imprints on those people. Chemicals people ingest can change those people, such as with caffeine, alcohol, morphine, or heroin. (I would include 'changing your diet to change your thought processes' under chemical changes, but the chemical changes from becoming addicted to heroin and from not being creatine deficient look very different.)

For AIs, most of the modification that's interesting and new will look like the "chemistry" cluster. An AI modifying its source code will look a lot like a human injecting itself with a new drug that it just invented. (Nick_Beckstead's example of modifying the code of the weather computer is more like education than it is like chemistry.)

This is great because some drugs dramatically improve performance, and so a person on caffeine could invent a super nootropic, and then on the super nootropic invent a cure for cancer and an even better nootropic, and so on. This is terrifying because any drug that adjusts your beliefs or your decision-making algorithm (think of 'personality' as a subset of this) dramatically changes how you behave, and might do so for the worse. This is doubly terrifying because these changes might be irreversible- you might take a drug that gets rid of your depression by making you incapable of feeling desire, and then not have any desire to restore yourself! This is triply terrifying because the effects of the drug might be unknown- you might not be able to determine what a drug will do to you until after you take it, and by then it might be too late.

For humans this problem is mostly solved by trial and error followed by pattern matching- "coffee is okay, crack is not, because Colin is rich and productive and Craig is neither"- which is not useful for new drugs, not useful for misclassified old drugs, and not very safe for very powerful systems. The third problem- that the effects might be unknown- is the sort of thing that proofs might help with, except there are some technical obstacles to doing that. The Lobstacle is a prominent theoretical one, and while it looks like there are lots of practical obstacles as well, surmounting the theoretical obstacles should help with surmounting the practical obstacles.

Any sort of AGI that's able to alter its own decision-making process will have the ability to 'do chemistry on itself,' and one with stable values will need to have solved the problem of how to do that while preserving its values. (I don't think that humans have 'stable' values; I'd call them something more like 'semi-stable.' Whether this is a bug or a feature is unclear to me.)

Comment author: JonahSinick 27 June 2013 10:59:49PM 2 points [-]

I understand where you're coming from, and I think that you correctly highlight a potential source of concern, and one which my comment didn't adequately account for. However:

  1. I'm skeptical that it's possible to create an AI based on mathematical logic at all. Even if an AI with many interacting submodules is dangerous, it doesn't follow that working on AI safety for an AI based on mathematical logic is promising.

  2. Humans can impose selective pressures on emergent AIs so as to mimic the process of natural selection that humans experienced.

Comment author: Randaly 27 June 2013 11:24:24PM *  0 points [-]

I'm skeptical that it's possible to create an AI based on mathematical logic at all. Even if an AI with many interacting submodules is dangerous, it doesn't follow that working on AI safety for an AI based on mathematical logic is promising.

Eliezer's position is that the default mode for an AGI is failure; i.e. if an AGI is not provably safe, it will almost certainly go badly wrong. In that context, if you accept that "an AI with many interacting submodules is dangerous," then that's more or less equivalent to believing that one of the horribly wrong outcomes will almost certainly be achieved if an AGI with many submodules is created.

Humans can impose selective pressures on emergent AIs so as to mimic the process of natural selection that humans experienced.

Humans are not Friendly. They don't even have the capability under discussion here, to preserve their values under self-modification; a human-esque singleton would likely be a horrible, horrible disaster.