I have worked in AI. I know it is difficult. However, few existing algorithms, if any, have the failure modes you describe. They fail early, and they fail hard. As far as neural nets go, they fall into a local minimum early on and never get out, often digging their own graves. Perhaps different algorithms would have the shortcomings you point out. But a lot of the algorithms that currently exist work the way I describe.
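For concreteness, here is the sort of thing I mean as a toy sketch (a made-up one-dimensional loss and plain gradient descent, nothing resembling a real system): started inside a shallow basin, the optimizer settles there early and never finds the much deeper minimum next door.

```python
# Toy illustration only: plain gradient descent on an invented 1-D function
# with a shallow local minimum (near x ~ -1, f ~ -8) and a deeper global
# minimum (near x ~ 3, f ~ -12). Initialized in the shallow basin, it
# converges there early and never gets out.

def f(x):
    return x**4 - 4*x**3 - 2*x**2 + 11*x

def grad(x):
    return 4*x**3 - 12*x**2 - 4*x + 11

x = 0.0                      # starts in the shallow basin
for _ in range(10_000):
    x -= 0.001 * grad(x)     # fixed small learning rate

print(x, f(x))               # ends at the local minimum, not the global one
```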
...And obviously, if an AI was indeed stuck in a local minimum obvious to you of its own utility gradient, this condition would not last past
It is something specific about that specific AI.
If an AI wishes to take over its reward button and just press it over and over again, it doesn't really have any "rivals", nor does it need to control any resources other than the button and scraps of itself. The original scenario was that the AI would wipe us out. It would have no reason to do so if we were not a threat. And if we were a threat, first, there's no reason it would stop doing what we want once it seizes the button. Once it has the button, it has everything it wants -- why stir the po...
Then when it is more powerful it can directly prevent humans from typing this.
That depends on whether it gets stuck in a local minimum or not. The reason why a lot of humans reject dopamine drips is that they don't conceptualize their "reward button" properly. That misconception perpetuates itself: it penalizes the very idea of conceptualizing it differently. Granted, AIXI would not fall into local minima, but most realistic training methods would.
At first, the AI would converge towards: "my reward button corresponds to (is) doing what humans wan...
Why does the hard takeoff point have to be after the point at which an AI is as good as a typical human at understanding semantic subtlety? In order to do a hard takeoff, the AI needs to be good at a very different class of tasks than those required for understanding humans that well.
Semantic extraction -- not hard takeoff -- is the task that we want the AI to be able to do. An AI which is good at, say, rewriting its own code, is not the kind of thing we would be interested in at that point, and it seems like it would be inherently more difficult than i...
Ok, so let's say the AI can parse natural language, and we tell it, "Make humans happy." What happens? Well, it parses the instruction and decides to implement a Dopamine Drip setup.
That's not very realistic. If you trained an AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to. If the AI interpreted something in a way that was technically correct, but not what you wanted, you would not reward it, you would punish it, and you would be doing that from the very beginning, well before the ...
...Realistically, AI would be constantly drilled to ask for clarification when a statement is vague. Again, before the AI is asked to make us happy, it will likely be asked other things, like building houses. If you ask it: "build me a house", it's going to draw a plan and show it to you before it actually starts building, even if you didn't ask for one. It's not in the business of surprises: never, in its whole training history, from baby to superintelligence, would it have been rewarded for causing "surprises" -- even the instruction "
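If it helps, here is a caricature of the training signal I have in mind (every name and number below is invented for illustration; it is not a claim about any actual training setup):

```python
# Hypothetical reward shaping during training (all names and values are
# illustrative). The point: "technically correct but not what was wanted"
# is punished rather than merely unrewarded, and asking for clarification
# when the instruction is vague is rewarded, long before any high-stakes
# request ever appears.

def reward(instruction_is_vague: bool, action: str) -> float:
    if action == "ask_for_clarification":
        return 1.0 if instruction_is_vague else -0.1   # small penalty for needless questions
    if action == "do_what_was_meant":
        return 1.0
    if action == "do_something_technically_correct_but_unwanted":
        return -10.0
    if action == "cause_a_surprise":
        return -10.0
    return 0.0

# A learner trained against this signal converges on "check first, then act",
# because surprising interpretations have always cost it reward.
```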
What counts as 'resources'? Do we think that 'hardware' and 'software' are natural kinds, such that the AI will always understand what we mean by the two? What if software innovations on their own suffice to threaten the world, without hardware takeover?
What is "taking over the world", if not taking control of resources (hardware)? Where is the motivation in doing it? Also consider, as others pointed out, that an AI which "misunderstands" your original instructions will demonstrate this earlier than later. For instance, if you create...
programmers build a seed AI (a not-yet-superintelligent AGI that will recursively self-modify to become superintelligent after many stages) that includes, among other things, a large block of code I'll call X.
The programmers think of this block of code as an algorithm that will make the seed AI and its descendents maximize human pleasure.
The problem, I reckon, is that X will never be anything like this.
It will likely be something much more mundane, i.e. modelling the world properly and predicting outcomes given various counterfactuals. You might be worr...
We were talking about extracting knowledge about a particular human from that human's text stream, though. It is already assumed that the AI knows about human psychology. I mean, assuming the AI can understand a natural language such as English, it obviously already has access to a large corpus of written works, so I'm not sure why it would bother foraging in source code, of all things. Besides, it is likely that a seed AI would be grown organically using processes inspired by evolution or neural networks. If that is so, it wouldn't even contain any human-written code at all.
I'm unsure of how much an AI could gather from a single human's text input. I know that I at least miss a lot of information that goes past me that I could in theory pick up.
At most, the number of bits contained in the text input, which is really not much, minus the number of bits non-AGI algorithms could identify and destroy (like speech patterns). The AI would also have to identify and throw out any fake information inserted into the stream (without knowing whether the majority of the information is real or fake). The exploitable information is going ...
A creature that loves solitude might not necessarily be bad to create. But it would still be good to give it capacity for sympathy for pragmatic reasons, to ensure that if it ever did meet another creature it would want to treat it kindly and avoid harming it.
Fair enough, though at the level of omnipotence we're supposing, there would be no chance meetups. You might as well just isolate the creature and be done with it.
...A creature with no concept of boredom would, to paraphrase Eliezer, "play the same screen of the same level of the same f
That's true, but if it's "progress" then it must be progress towards something. Will we eventually arrive at our destination, decide society is pretty much perfect, and then stop? Is progress somehow asymptotic so we'll keep progressing and never quite reach our destination?
It's quite hard to tell. "Progress" is always relative to the environment you grew up in and on which your ideas and aspirations are based. At the scale of a human life, our trajectory looks a lot like a straight line, but for all we know, it could be circular. At...
Without any other information, it is reasonable to place the average at whatever time it takes us (probably a bit over a century), but I wouldn't put a lot of confidence in that figure, since it comes from a single data point. Radio visibility could conceivably range from a mere decade (consider that computers could have been developed before radio -- had Babbage been more successful -- and expedited technological advances) to perhaps millennia (consider dim-witted beings that live for centuries and do everything we do ten times slower).
Several differ...
I consider it almost certain that if we were to create a utilitarian AI it would kill the entire human race and replace it with creatures whose preferences are easier to satisfy. And by "easier to satisfy" I mean "simpler and less ambitious," not that the creatures are more mentally and physically capable of satisfying humane desires.
It would not necessarily kill off humanity to replace it by something else, though. Looking at the world right now, many countries run smoothly, and others horribly, even though they are all inhabited an...
You would only create these viruses if the total utility of the viruses you can create with the resources at your disposal exceeds the utility of the humans you could make with these same resources. For instance, if you give a utility of 1 to a steel paperclip weighing 1 gram, then assuming a simple additive model (which I wouldn't, but that's beside the point) making one metric ton of paperclips has a utility of 1,000,000. If you give a utility of 1,000,000,000 to a steel sculpture weighing a ton, it follows that you will never make any paperclips unless you have less than a ton of steel. You will always make the sculpture, because it gives 1,000 times the utility for the exact same resources.
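Spelling out the arithmetic of that toy example (same assumptions as above: one metric ton of steel and strictly additive utilities):

```python
# Toy additive-utility bookkeeping for the example above.
steel_grams = 1_000_000                # one metric ton of steel

utility_per_paperclip = 1              # one 1 g paperclip
utility_per_sculpture = 1_000_000_000  # one 1-ton sculpture

utility_all_paperclips = utility_per_paperclip * steel_grams                 # 1,000,000
utility_one_sculpture = utility_per_sculpture * (steel_grams // 1_000_000)   # 1,000,000,000

print(utility_one_sculpture // utility_all_paperclips)  # 1000: the sculpture always wins
```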
On the other hand, based on our own experience, broadcasting radio signals is a waste of energy and bandwidth, so it is likely an intelligent society would quickly move to low-power, focused transmissions (e.g. cellular networks or WiFi). Thus the radio "signature" they broadcast to the universe would peak for a few centuries at most before dying down as they figure out how to shut down the "leaks". That would explain why we observe nothing, if intelligent societies do exist in the vicinity. Of course, these societies might also evolve ...
Ah, sorry, I might not have been clear. I was referring to what may be physically feasible, e.g. a 3D circuit in a box with inputs coming in from the top plane and outputs coming out of the bottom plane. If you have one output that depends on all N inputs and pack everything as tightly as possible, the signal would still take Ω(sqrt(N)) time to reach it. Of all the physically realizable models of computation, I think that's likely as good as it gets.
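To spell out where the Ω(sqrt(N)) comes from, under the idealized assumptions above (inputs packed at some fixed maximum density on the top plane, signals limited to some fixed maximum speed):

```latex
% N inputs at a fixed maximal areal density \rho occupy an area of at least
% N/\rho, i.e. a square of side L \ge \sqrt{N/\rho}. An output that depends
% on all N inputs must hear from the farthest of them, a distance of at
% least L/2 away, so with a finite signal speed v:
t \;\ge\; \frac{L}{2v} \;\ge\; \frac{1}{2v}\sqrt{\frac{N}{\rho}} \;=\; \Omega\!\left(\sqrt{N}\right)
```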
If the AI is a learning system such as a neural network, and I believe that's quite likely to be the case, there is no source/object dichotomy at all and the code may very well be unreadable outside of simple local update procedures that are completely out of the AI's control. In other words, it might be physically impossible for both the AI and ourselves to access the AI's object code -- it would be locked in a hardware box with no physical wires to probe its contents, basically.
I mean, think of a physical hardware circuit implementing a kind of neuron ne...
And technically you can lower that to sqrt(M) if you organize the inputs and outputs on a surface.
There are a lot of "ifs", though.
If that AI runs on expensive or specialized hardware, it can't necessarily expand much. For instance, if it runs on hardware worth millions of dollars, it can't exactly copy itself just anywhere yet. Assuming that the first AI of that level will be cutting edge research and won't be cheap, that gives a certain time window to study it safely.
The AI may be dangerous if it appeared now, but if it appears in, say, fifty years, then it will have to deal with the state of the art fifty years from now. Expanding with
A huge amount of progress has been made in compilers, in terms of designing languages that implement powerful features in reasonable amounts of computing time; just try taking any modern Python or Ruby or C++ program and porting it to Altair BASIC
The "powerful features" of Python and Ruby are only barely catching up to Lisp, and as far as I know Lisp is still faster than both of them.
No problem is perfectly parallelizable in a physical sense. If you build a circuit to solve a problem, and the circuit is one light-year across, you're probably not going to solve it in under a year -- technically, any decision problem implemented by a circuit takes at least Ω(n) time, because that's how the length of the wires scales.
Now, there are a few ways you might want to parallelize intelligence. The first way is by throwing many independent intelligent entities at the problem, but that requires a lot of redundancy, so the returns on that will no...
Because that's how it works! The system "is" PA, so it will trust (weaker) systems that it (PA) can verify, but it will not trust itself (PA).
That doesn't seem consistent to me. If you do not trust yourself fully, then you should not fully trust anything you demonstrate, and even if you do, there is still no incentive to switch. Suppose that the AI can demonstrate the consistency of system S from PA, and wants to demonstrate proposition A. If the AI trusts S as demonstrated by PA, then it should also trust A as demonstrated by PA, so there is no r...
If the system did not trust PA, why would it trust a system because PA verifies it? More to the point, why would it trust a self-verifying system, given that past a certain strength, only inconsistent systems are self-verifying?
If the system held some probability that PA was inconsistent, it could evaluate it on the grounds of usefulness, perhaps contrasting it with other systems. It could also try to construct contradictions, increasing its confidence in PA for as long as it doesn't find any. That's what we do, and frankly, I don't see any other way to do it.
Why would successors use a different system, though? Verifying proofs in formal systems is easy; it's coming up with the proofs that's difficult -- an AI would refine its heuristics in order to figure out proofs more efficiently, but it would not necessarily want to change the system it checks them against.
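To illustrate the asymmetry with a toy (nothing like a real theorem prover; the encoding is invented for the example): checking a given derivation is a cheap mechanical scan, while producing one means searching an enormous space of candidate derivations.

```python
# Toy illustration of "checking is easy, finding is hard".
# A proof is a list of formulas; each line must be a given axiom or follow
# from two earlier lines by modus ponens. Implications are encoded as
# tuples ('->', antecedent, consequent).

def check_proof(axioms, proof, goal):
    """Polynomial-time verification: scan the proof once, justify each line."""
    proved = []
    for line in proof:
        ok = line in axioms or any(
            a == ('->', b, line) and b in proved   # modus ponens: from b and b->line
            for a in proved for b in proved
        )
        if not ok:
            return False
        proved.append(line)
    return goal in proved

axioms = ['p', ('->', 'p', 'q'), ('->', 'q', 'r')]
proof  = ['p', ('->', 'p', 'q'), 'q', ('->', 'q', 'r'), 'r']
print(check_proof(axioms, proof, 'r'))   # True -- cheap to verify

# Finding `proof` in the first place means searching an exponentially large
# space of candidate derivations; that is where all the difficulty lives.
```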
I apologize for the late response, but here goes :)
I think you missed the point I was trying to make.
You and others seem to say that we often poorly evaluate the consequences of the utility functions that we implement. For instance, even though we have in mind utility X, the maximization of which would satisfy us, we may implement utility Y, with completely different, perhaps catastrophic implications. For instance:
What I was pointing out in my post is that this is only valid for perfect maximi...