All of Broolucks's Comments + Replies

I apologize for the late response, but here goes :)

I think you missed the point I was trying to make.

You and others seem to say that we often poorly evaluate the consequences of the utility functions that we implement. For instance, even though we have in mind utility X, the maximization of which would satisfy us, we may implement utility Y, with completely different, perhaps catastrophic implications. For instance:

X = Do what humans want
Y = Seize control of the reward button

What I was pointing out in my post is that this is only valid of perfect maximi... (read more)

I have done AI. I know it is difficult. However, few existing algorithms, if any, have the failure modes you describe. They fail early, and they fail hard. As far as neural nets go, they fall into a local minimum early on and never get out, often digging their own graves. Perhaps different algorithms would have the shortcomings you point out. But a lot of the algorithms that currently exist work the way I describe.
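To make the "fall into a local minimum early and never get out" behaviour concrete, here is a minimal sketch (purely illustrative, not any system discussed in this thread): plain gradient descent on a non-convex one-dimensional loss, which settles into whichever basin it starts near and stays there.

```python
# Minimal sketch (illustrative only): plain gradient descent on a
# non-convex 1-D loss. Depending on where it starts, it settles into
# a nearby minimum early and never escapes it.

def loss(w):
    # Quartic with two basins: a shallow one on the left (around w ~ -1.3)
    # and a deeper one on the right (around w ~ 1.6).
    return (w**2 - 1) * (w - 2) * (w + 1.5) / 10.0

def grad(w, eps=1e-5):
    # Numerical gradient, good enough for the illustration.
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

def descend(w, lr=0.01, steps=5000):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

print(descend(-1.0))  # ends near the shallow left basin and stays there
print(descend(1.0))   # ends near the deeper right basin
```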

And obviously, if an AI was indeed stuck in a local minimum obvious to you of its own utility gradient, this condition would not last past

... (read more)
8wedrifid
Yes, most algorithms fail early and fail hard. Most of my AI algorithms failed early with a SegFault, for instance. New, very similar algorithms were then designed with progressively more advanced bugs. But these are a separate consideration. What we are interested in here is the question "Given that an AI algorithm capable of recursive self-improvement is successfully created by humans, how likely is it that they execute this kind of failure mode?" The "fail early, fail hard" cases are screened off. We're looking at the small set that is either damn close to a desired AI or actually a desired AI and distinguishing between them. Looking at the context to work out what the 'failure mode' being discussed is, it seems to be the issue where an AI is programmed to optimise based on a feedback mechanism controlled by humans. When the AI in question is superintelligent, most failure modes tend to be variants of "conquer the future light cone, kill everything that is a threat and supply perfect feedback to self". When translating this to the nearest analogous failure mode in some narrow AI algorithm of the kind we can design now, it seems like this refers to the failure mode whereby the AI optimises exactly what it is asked to optimise, but in a way that is a lost purpose. This is certainly what I had to keep in mind in my own research. A popular example that springs to mind is the results of an AI algorithm designed by a military research agency. From memory, their task was to take a simplified simulation of naval warfare, with specifications for how much each aspect of ships, boats and weaponry cost, and a budget. They were to use this to design the optimal fleet given their resources, and the task was undertaken by military officers and a group which used an AI algorithm of some sort. The result was that the AI won easily but did so in a way that led the overseers to dismiss them as a failure because they optimised the problem specification as given, not the one 'common se
Broolucks-10

It is something specific about that specific AI.

If an AI wishes to take over its reward button and just press it over and over again, it doesn't really have any "rivals", nor does it need to control any resources other than the button and scraps of itself. The original scenario was that the AI would wipe us out. It would have no reason to do so if we were not a threat. And if we were a threat, first, there's no reason it would stop doing what we want once it seizes the button. Once it has the button, it has everything it wants -- why stir the po... (read more)

-3TheOtherDave
Fair point.

Then when it is more powerful it can directly prevent humans from typing this.

That depends on whether it gets stuck in a local minimum or not. The reason why a lot of humans reject dopamine drips is that they don't conceptualize their "reward button" properly. That misconception perpetuates itself: it penalizes the very idea of conceptualizing it differently. Granted, AIXI would not fall into local minima, but most realistic training methods would.

At first, the AI would converge towards: "my reward button corresponds to (is) doing what humans wan... (read more)

0[anonymous]
This is a Value Learner, not a Reinforcement Learner like the standard AIXI. They're two different agent models, and yes, Value Learners have been considered as tools for obtaining an eventual Seed AI. I personally (ie: massive grains of salt should be taken by you) find it relatively plausible that we could use a Value Learner as a Tool AGI to help us build a Friendly Seed AI that could then be "unleashed" (ie: actually unboxed and allowed into the physical universe).
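For readers who haven't seen the distinction spelled out, here is a toy contrast between the two agent models (all names and numbers are made up for illustration; this is not AIXI or any published design): a reinforcement learner picks whatever action it predicts will maximize its observed reward channel, while a value learner keeps a probability distribution over candidate utility functions, updates that distribution from human feedback, and maximizes expected utility under it.

```python
# Toy contrast between the two agent models discussed above.
# The candidate utilities and the feedback rule are invented for
# illustration only; this is not AIXI or any concrete proposal.

ACTIONS = ["help_humans", "press_reward_button"]

# --- Reinforcement learner: maximizes its predicted reward signal. ---
def rl_choose(predicted_reward):
    # predicted_reward: dict action -> expected value of the reward channel
    return max(ACTIONS, key=lambda a: predicted_reward[a])

# --- Value learner: uncertain about which utility function it should
#     be maximizing, and updates that uncertainty from evidence. ---
CANDIDATE_UTILITIES = {
    "humans_want_help":   {"help_humans": 1.0, "press_reward_button": 0.0},
    "reward_is_terminal": {"help_humans": 0.2, "press_reward_button": 1.0},
}

def vl_update(prior, observed_feedback):
    # Crude Bayesian-ish update: candidate utilities that match the
    # observed human approval pattern gain probability mass.
    posterior = {}
    for name, util in CANDIDATE_UTILITIES.items():
        fit = sum(1.0 - abs(util[a] - observed_feedback[a]) for a in ACTIONS)
        posterior[name] = prior[name] * fit
    total = sum(posterior.values())
    return {k: v / total for k, v in posterior.items()}

def vl_choose(posterior):
    def expected_utility(a):
        return sum(p * CANDIDATE_UTILITIES[name][a] for name, p in posterior.items())
    return max(ACTIONS, key=expected_utility)

prior = {"humans_want_help": 0.5, "reward_is_terminal": 0.5}
feedback = {"help_humans": 1.0, "press_reward_button": 0.0}  # what humans actually approve of
posterior = vl_update(prior, feedback)

print(rl_choose({"help_humans": 0.8, "press_reward_button": 1.0}))  # -> press_reward_button
print(vl_choose(posterior))                                          # -> help_humans
```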
5private_messaging
Neural networks may be a good example - the built-in reward and punishment systems condition the brain to have complex goals that have nothing to do with maximization of dopamine. The brain, acting under those goals, finds ways to preserve those goals from further modification by the reward and punishment system. I.e. you aren't too thrilled to be conditioned out of your current values.
0Eliezer Yudkowsky
I suggest some actual experience trying to program AI algorithms in order to realize the hows and whys of "getting an algorithm which forms the inductive category I want out of the examples I'm giving is hard". What you've written strikes me as a sheer fantasy of convenience. Nor does it follow automatically from intelligence for all the reasons RobbBB has already been giving. And obviously, if an AI was indeed stuck in a local minimum obvious to you of its own utility gradient, this condition would not last past it becoming smarter than you.
4TheOtherDave
Is that just a special case of a general principle that an agent will be more successful by leaving the environment it knows about to inferior rivals and travelling to an unknown new environment with a subset of the resources it currently controls, than by remaining in that environment and dominating its inferior rivals? Or is there something specific about AIs that makes that true, where it isn't necessarily true of (for example) humans? (If so, what?) I hope it's the latter, because the general principle seems implausible to me.

Why does the hard takeoff point have to be after the point at which an AI is as good as a typical human at understanding semantic subtlety? In order to do a hard takeoff, the AI needs to be good at a very different class of tasks than those required for understanding humans that well.

Semantic extraction -- not hard takeoff -- is the task that we want the AI to be able to do. An AI which is good at, say, rewriting its own code, is not the kind of thing we would be interested in at that point, and it seems like it would be inherently more difficult than i... (read more)

Ok, so let's say the AI can parse natural language, and we tell it, "Make humans happy." What happens? Well, it parses the instruction and decides to implement a Dopamine Drip setup.

That's not very realistic. If you trained an AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to. If the AI interpreted something in a way that was technically correct, but not what you wanted, you would not reward it, you would punish it, and you would be doing that from the very beginning, well before the ... (read more)

Realistically, an AI would be constantly drilled to ask for clarification when a statement is vague. Again, before the AI is asked to make us happy, it will likely be asked other things, like building houses. If you ask it: "build me a house", it's going to draw a plan and show it to you before it actually starts building, even if you didn't ask for one. It's not in the business of surprises: never, in its whole training history, from baby to superintelligence, would it have been rewarded for causing "surprises" -- even the instruction

... (read more)
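A toy sketch of the training pressure being described (the reward numbers are hypothetical, nothing more): if literal-but-unintended interpretations are punished and asking for clarification on vague instructions is mildly rewarded, then a policy fit to that signal prefers to clarify unless it is very confident it has understood.

```python
# Toy sketch of the training signal described above (hypothetical,
# purely illustrative): matching the trainer's intent is rewarded,
# "technically correct" surprises are punished, and asking for
# clarification on vague instructions carries a small positive reward.

def reward(instruction_is_vague, action, matches_intent):
    if action == "ask_for_clarification":
        return 0.5 if instruction_is_vague else -0.1  # small cost for needless questions
    if action == "act_on_interpretation":
        return 1.0 if matches_intent else -2.0        # surprises are punished hard
    raise ValueError(action)

def trained_policy(instruction_is_vague, p_match):
    # p_match: how often, in training, a literal reading of this kind of
    # instruction has matched the trainer's intent. Pick the action with
    # the higher expected reward under that estimate.
    def expected(action):
        return (p_match * reward(instruction_is_vague, action, True)
                + (1 - p_match) * reward(instruction_is_vague, action, False))
    return max(["act_on_interpretation", "ask_for_clarification"], key=expected)

print(trained_policy(instruction_is_vague=True,  p_match=0.6))   # -> ask_for_clarification
print(trained_policy(instruction_is_vague=True,  p_match=0.95))  # -> act_on_interpretation
print(trained_policy(instruction_is_vague=False, p_match=0.99))  # -> act_on_interpretation
```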
-5Peterdjones
4DSimon
1. Why does the hard takeoff point have to be after the point at which an AI is as good as a typical human at understanding semantic subtlety? In order to do a hard takeoff, the AI needs to be good at a very different class of tasks than those required for understanding humans that well.
2. So let's suppose that the AI is as good as a human at understanding the implications of natural-language requests. Would you trust a human not to screw up a goal like "make humans happy" if they were given effective omnipotence? The human would probably do about as well as people in the past have at imagining utopias: really badly.

What counts as 'resources'? Do we think that 'hardware' and 'software' are natural kinds, such that the AI will always understand what we mean by the two? What if software innovations on their own suffice to threaten the world, without hardware takeover?

What is "taking over the world", if not taking control of resources (hardware)? Where is the motivation in doing it? Also consider, as others pointed out, that an AI which "misunderstands" your original instructions will demonstrate this earlier than later. For instance, if you create... (read more)

programmers build a seed AI (a not-yet-superintelligent AGI that will recursively self-modify to become superintelligent after many stages) that includes, among other things, a large block of code I'll call X.

The programmers think of this block of code as an algorithm that will make the seed AI and its descendents maximize human pleasure.

The problem, I reckon, is that X will never be anything like this.

It will likely be something much more mundane, i.e. modelling the world properly and predicting outcomes given various counterfactuals. You might be worr... (read more)

3Rob Bensinger
What counts as 'resources'? Do we think that 'hardware' and 'software' are natural kinds, such that the AI will always understand what we mean by the two? What if software innovations on their own suffice to threaten the world, without hardware takeover? Hm? That seems to only penalize it for self-deception, not for deceiving others. You're talking about an Oracle AI. This is one useful avenue to explore, but it's almost certainly not as easy as you suggest: "'Tool AI' may sound simple in English, a short sentence in the language of empathically-modeled agents — it's just 'a thingy that shows you plans instead of a thingy that goes and does things.' If you want to know whether this hypothetical entity does X, you just check whether the outcome of X sounds like 'showing someone a plan' or 'going and doing things', and you've got your answer. It starts sounding much scarier once you try to say something more formal and internally-causal like 'Model the user and the universe, predict the degree of correspondence between the user's model and the universe, and select from among possible explanation-actions on this basis.' [...] "If we take the concept of the Google Maps AGI at face value, then it actually has four key magical components. (In this case, 'magical' isn't to be taken as prejudicial, it's a term of art that means we haven't said how the component works yet.) There's a magical comprehension of the user's utility function, a magical world-model that GMAGI uses to comprehend the consequences of actions, a magical planning element that selects a non-optimal path using some method other than exploring all possible actions, and a magical explain-to-the-user function. "report($leading_action) isn't exactly a trivial step either. Deep Blue tells you to move your pawn or you'll lose the game. You ask 'Why?' and the answer is a gigantic search tree of billions of possible move-sequences, leafing at positions which are heuristically rated using a static-position ev

We were talking about extracting knowledge about a particular human from that human's text stream, though. It is already assumed that the AI knows about human psychology. I mean, assuming the AI can understand a natural language such as English, it obviously already has access to a large corpus of written works, so I'm not sure why it would bother foraging in source code, of all things. Besides, it is likely that a seed AI would be grown organically using processes inspired by evolution or neural networks. If that is so, it wouldn't even contain any human-written code at all.

0Rob Bensinger
Ah. I was assuming that the AI didn't know English, or anything about human psychology. My expectation is that individual variation contributes virtually nothing to the best techniques a superintelligence would use to persuade a random (trained, competent) human to release it, regardless of whether it had an easy way to learn about the individual variation.

I'm unsure of how much an AI could gather from a single human's text input. I know that I at least miss a lot of information that goes past me that I could in theory pick up.

At most, the number of bits contained in the text input, which is really not much, minus the number of bits non-AGI algorithms could identify and destroy (like speech patterns). The AI would also have to identify and throw out any fake information inserted into the stream (without knowing whether the majority of the information is real or fake). The exploitable information is going ... (read more)

0Rob Bensinger
Do keep in mind that, no matter how well-boxed the AI is from the Internet and from sense-data about our world, as a self-modifying AGI it still has access to its own source code, which is descended from a human artifact (the seed AI). The AGI can learn a great deal about human psychology by observing how we code, and a project as large and multi-staged as an AGI is likely to be will contain a whole lot of bits to work with. (Certainly more than is strictly necessary.)

A creature that loves solitude might not necessarily be bad to create. But it would still be good to give it capacity for sympathy for pragmatic reasons, to ensure that if it ever did meet another creature it would want to treat it kindly and avoid harming it.

Fair enough, though at the level of omnipotence we're supposing, there would be no chance meetups. You might as well just isolate the creature and be done with it.

A creature with no concept of boredom would, to paraphrase Eliezer, "play the same screen of the same level of the same f

... (read more)

That's true, but if it's "progress" then it must be progress towards something. Will we eventually arrive at our destination, decide society is pretty much perfect, and then stop? Is progress somehow asymptotic so we'll keep progressing and never quite reach our destination?

It's quite hard to tell. "Progress" is always relative to the environment you grew up in and on which your ideas and aspirations are based. At the scale of a human life, our trajectory looks a lot like a straight line, but for all we know, it could be circular. At... (read more)

0Ghatanathoah
A creature that loves solitude might not necessarily be bad to create. But it would still be good to give it capacity for sympathy for pragmatic reasons, to ensure that if it ever did meet another creature it would want to treat it kindly and avoid harming it. It's not about having a specialized interest and exploring it. A creature with no concept of boredom would, to paraphrase Eliezer, "play the same screen of the same level of the same fun videogame over and over again." They wouldn't be like an autistic savant who knows one subject inside and out. They'd be little better than a wirehead. Someone with narrow interests still explores every single aspect of that interest in great detail. A creature with no boredom would find one tiny aspect of that interest and do it forever. Yes, I concede that if there is a sufficient quantity of creatures with humane values, it might be good to create other types of creatures for variety's sake. However, such creatures could be potentially dangerous, so we'd have to be very careful.

Without any other information, it is reasonable to place the average at whatever time it takes us (probably a bit over a century), but I wouldn't put a lot of confidence in that figure, having been obtained from a single data point. Radio visibility could conceivably range from a mere decade (consider that computers could have been developed before radio -- had Babbage been more successful -- and expedited technological advances) to perhaps millennia (consider dim-witted beings that live for centuries and do everything we do ten times slower).

Several differ... (read more)

I consider it almost certain that if we were to create a utilitarian AI it would kill the entire human race and replace it with creatures whose preferences are easier to satisfy. And by "easier to satisfy" I mean "simpler and less ambitious," not that the creatures are more mentally and physically capable of satisfying humane desires.

It would not necessarily kill off humanity to replace it by something else, though. Looking at the world right now, many countries run smoothly, and others horribly, even though they are all inhabited an... (read more)

2Ghatanathoah
That would be bad, but it would still be way better than replacing us with paperclippers or orgasmium. That's true, but if it's "progress" then it must be progress towards something. Will we eventually arrive at our destination, decide society is pretty much perfect, and then stop? Is progress somehow asymptotic so we'll keep progressing and never quite reach our destination? The thing is, it seems to me that what we've been progressing towards is greater expression of our human natures. Greater ability to do what the most positive parts of our natures think we should. So I'm fine with future creatures that have something like human nature deciding some new society I'm kind of uncomfortable with is the best way to express their natures. What I'm not fine with is throwing human nature out and starting from scratch with something new, which is what I think a utilitarian AI would do. I didn't literally mean humans, I meant "Creatures with the sorts of goals, values, and personalities that humans have." For instance, if given a choice between creating an AI with human-like values, and creating a human sociopath, I would pick the AI. And it wouldn't just be because there was a chance the sociopath would harm others. I simply consider the values of the AI more worthy of creation than the sociopath's. I don't necessarily disagree. If having a large population of creatures with humane values and high welfare was assured then it might be better to have a variety of creatures. But I still think maybe there should be some limits on the sort of creatures we should create, i.e. lawful creativity. Eliezer has suggested that consciousness, sympathy, and boredom are the essential characteristics any intelligent creature should have. I'd love for there to be a wide variety of creatures, but maybe it would be best if they all had those characteristics.

You would only create these viruses if the total utility of the viruses you can create with the resources at your disposal exceeds the utility of the humans you could make with these same resources. For instance, if you give a utility of 1 to a steel paperclip weighing 1 gram, then assuming a simple additive model (which I wouldn't, but that's beside the point), making one metric ton of paperclips has a utility of 1,000,000. If you give a utility of 1,000,000,000 to a steel sculpture weighing a ton, it follows that you will never make any paperclips unless you have less than a ton of iron. You will always make the sculpture, because it gives 1,000 times the utility for the exact same resources.
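Spelled out, the arithmetic of that toy additive model (using the numbers from the comment) looks like this:

```python
# The toy additive-utility arithmetic from the comment, spelled out.
paperclip_utility_per_gram = 1        # 1 util per 1 g steel paperclip
sculpture_utility = 1_000_000_000     # 1e9 utils for a 1-tonne steel sculpture

grams_per_tonne = 1_000_000
paperclips_utility_per_tonne = paperclip_utility_per_gram * grams_per_tonne  # 1,000,000

# Utility obtained from the same tonne of steel under each use:
print(sculpture_utility / paperclips_utility_per_tonne)  # 1000.0 -> the sculpture wins 1,000:1
```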

1Shmi
True, if you start with resource constraints, you can rig the utility scaling to overweight more intelligent life. However, if you don't cheat and assign the weights before considering constraints, there is a large chance that the balance will tip the other way. Or if there is no obvious competition for resources. If you value creating at least mildly happy life, you ought to consider working on, say, silicon-based life, which does not compete with carbon-based life. Or maybe on using all this stored carbon in the ocean to create more plankton. In other words, it is easy to find a case where preassigned utilities lead to a runaway simple-life-creation imperative.

On the other hand, based on our own experience, broadcasting radio signals is a waste of energy and bandwidth, so it is likely an intelligent society would quickly move to low-power, focused transmissions (e.g. cellular networks or WiFi). Thus the radio "signature" they broadcast to the universe would peak for a few centuries at most before dying down as they figure out how to shut down the "leaks". That would explain why we observe nothing, if intelligent societies do exist in the vicinity. Of course, these societies might also evolve ... (read more)

1DaFranker
Now that is a good argument that doesn't miss the point. My priors would say it's not even "a few centuries" - I'd expect less than one earth-century on average, with most of the variance due to the particular economic variations and social phenomena derived from the details of the species.
0Thomas
Lower life forms (as well as lower non-life forms) are always interesting as a source of free enthalpy, and from many other aspects. You, as an advanced civilization, have no luxury of ignoring them. You have to engage; the "Prime Directive" is bullshit. And you don't need to wait for a radio signal. You go there (everywhere) on your own initiative; you don't wait to be invited.

Ah, sorry, I might not have been clear. I was referring to what may be physically feasible, e.g. a 3D circuit in a box with inputs coming in from the top plane and outputs coming out of the bottom plane. If you have one output that depends on all N inputs and pack everything as tightly as possible, the signal would still take Ω(sqrt(N)) time to arrive. Of all the physically doable models of computation, I think that's likely as good as it gets.
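The intuition behind the bound: if the N inputs sit on a 2D face at some fixed maximum density, that face has side length proportional to sqrt(N), so some input is a distance proportional to sqrt(N) away from the output, and at any finite signal speed that costs Ω(sqrt(N)) time. A small numeric illustration (the density and signal speed below are arbitrary placeholders):

```python
import math

# Numeric illustration of the Omega(sqrt(N)) latency argument.
# Assumed, made-up constants: inputs packed on a square face at a fixed
# density, signals travelling at some finite speed below light speed.
INPUT_DENSITY = 1e12      # inputs per square metre (placeholder)
SIGNAL_SPEED = 2e8        # metres per second (placeholder)

def min_latency_seconds(n_inputs):
    side = math.sqrt(n_inputs / INPUT_DENSITY)   # side length of the input face
    worst_case_distance = side * math.sqrt(2)    # one corner of the face to the opposite corner
    return worst_case_distance / SIGNAL_SPEED

for n in (1e6, 1e12, 1e18):
    # Latency grows as sqrt(N): each 1e6-fold increase in N multiplies it by 1e3.
    print(f"N = {n:.0e}:  latency >= {min_latency_seconds(n):.3e} s")
```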

0endoself
Oh I see, we want physically possible computers. In that case, I can get it down to log(n) with general relativity, assuming I'm allowed to set up wormholes. (This whole thing is a bit badly defined since it's not clear what you're allowed to prepare in advance. Any necessary setup would presumably take Ω(n) time anyways.)

If the AI is a learning system such as a neural network, and I believe that's quite likely to be the case, there is no source/object dichotomy at all and the code may very well be unreadable outside of simple local update procedures that are completely out of the AI's control. In other words, it might be physically impossible for both the AI and ourselves to access the AI's object code -- it would be locked in a hardware box with no physical wires to probe its contents, basically.

I mean, think of a physical hardware circuit implementing a kind of neuron ne... (read more)

And technically you can lower that to sqrt(M) if you organize the inputs and outputs on a surface.

0endoself
When we talk about the complexity of an algorithm, we have to decide what resources we are going to measure. Time used by a multi-tape Turing machine is the most common measurement, since it's easy to define and generally matches up with physical time. If you change the model of computation, you can lower (or raise) this to pretty much anything by constructing your clock the right way.

There are a lot of "ifs", though.

  • If that AI runs on expensive or specialized hardware, it can't necessarily expand much. For instance, if it runs on hardware worth millions of dollars, it can't exactly copy itself just anywhere yet. Assuming that the first AI of that level will be cutting edge research and won't be cheap, that gives a certain time window to study it safely.

  • The AI may be dangerous if it appeared now, but if it appears in, say, fifty years, then it will have to deal with the state of the art fifty years from now. Expanding with

... (read more)
1loup-vaillant
I agree with your first point, though it gets worse for us as hardware gets cheaper and cheaper. I like your second point even more: it's actionable. We could work on the security of personal computers. That last one is incorrect, however. The AI only has to access its object code in order to copy itself. That's something even current computer viruses can do. And we're back to boxing it.

A huge amount of progress has been made in compilers, in terms of designing languages that implement powerful features in reasonable amounts of computing time; just try taking any modern Python or Ruby or C++ program and porting it to Altair BASIC

The "powerful features" of Python and Ruby are only barely catching up to Lisp, and as far as I know Lisp is still faster than both of them.

7Luke_A_Somers
Except then you have to program in Lisp.

No problem is perfectly parallelizable in a physical sense. If you build a circuit to solve a problem, and the circuit is one light year across in size, you're probably not going to solve it in under a year -- technically, any decision problem implemented by a circuit takes at least Ω(n) time, because that's how the length of the wires scales.

Now, there are a few ways you might want to parallelize intelligence. The first way is by throwing many independent intelligent entities at the problem, but that requires a lot of redundancy, so the returns on that will no... (read more)

0Alex_Altair
That is a pretty cool idea.

Because that's how it works! The system "is" PA, so it will trust (weaker) systems that it (PA) can verify, but it will not trust itself (PA).

That doesn't seem consistent to me. If you do not trust yourself fully, then you should not fully trust anything you demonstrate, and even if you do, there is still no incentive to switch. Suppose that the AI can demonstrate the consistency of system S from PA, and wants to demonstrate proposition A. If the AI trusts S as demonstrated by PA, then it should also trust A as demonstrated by PA, so there is no r... (read more)
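For background, the formal obstacle being gestured at in this exchange is Löb's theorem; a compact statement in standard provability-logic notation, included here only as a reminder:

```latex
% Löb's theorem: if PA proves "if PA proves A, then A", then PA proves A.
\[
  \mathrm{PA} \vdash \big(\Box_{\mathrm{PA}} A \rightarrow A\big)
  \quad\Longrightarrow\quad
  \mathrm{PA} \vdash A
\]
% Consequence: PA cannot prove the soundness schema  \Box_{PA} A \to A
% for every sentence A (it would then prove every A, including falsehoods).
% That is the sense in which a PA-based agent can trust proofs in systems
% it can verify to be weaker, but cannot grant the same blanket trust to
% PA-proofs themselves.
```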

1paulfchristiano
You are smuggling in some assumptions from your experience with human cognition. If I believe X, but don't believe that the process producing my beliefs is sane, then I will act on X (I believe it, and we haven't yet talked about any bridge between what I believe, and what I believe about my beliefs), but I still won't trust myself in general.

If the system did not trust PA, why would it trust a system because PA verifies it? More to the point, why would it trust a self-verifying system, given that past a certain strength, only inconsistent systems are self-verifying?

If the system held some probability that PA was inconsistent, it could evaluate it on the grounds of usefulness, perhaps contrasting it with other systems. It could also try to construct contradictions, increasing its confidence in PA for as long as it doesn't find any. That's what we do, and frankly, I don't see any other way to do it.

3abramdemski
Because that's how it works! The system "is" PA, so it will trust (weaker) systems that it (PA) can verify, but it will not trust itself (PA). It would only trust them if it could verify them. True. Not necessarily; this depends on how the system works. In my probabilistic prior, this would work to some degree, but because there exists a nonstandard model in which PA is inconsistent (there are infinite proofs ending in contradictions), there will be a fixed probability of inconsistency which cannot be ruled out by any amount of testing.

Why would successors use a different system, though? Verifying proofs in formal systems is easy; it's coming up with the proofs that's difficult -- an AI would refine its heuristics in order to figure out proofs more efficiently, but it would not necessarily want to change the system it is checking them against.

4abramdemski
It would want to change the system it checked proofs against if it did not trust that system. A naively built system which checked its proofs with PA but did not trust PA probabilistically (i.e., held some probability that PA is false: not difficult to construct) would very possibly prefer to reduce its set of axioms to something which PA verifies (something true with 100% probability in its understanding). But (at least if the reduced system is still Robinson arithmetic or stronger) the cycle would continue, since the new system is also not self-verifying. This cycle could (likely) be predicted by the system, however, so it is not obvious what the system would actually choose to do.