jacob_cannell comments on Dreams of AIXI - Less Wrong
That post was rational until about halfway through. Yes, any simulation detailed enough to actually predict what a mind would do to high accuracy necessarily becomes equivalent to that mind itself. This is nothing new; it's just a direct result of computationalism.
The only way to fully predict what any physical system will do is to fully simulate it, and fully simulating a computational system is equivalent to making a copy of it (in an embedded pocket universe). The only way to fully know what a program will do in general given some configuration of its memory is to simulate the whole thing - which is equivalent to making a copy of it.
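The claim can be sketched concretely. In the toy illustration below (all names assumed for illustration), the only fully general "predictor" of an arbitrary program's future state is an interpreter that steps the program forward, i.e., a copy of it running in an embedded environment:

```python
def predict(program, state, steps):
    """The only general way to 'predict' an arbitrary program's future
    state is to run it step by step -- which is exactly instantiating a
    copy of it in an embedded environment."""
    for _ in range(steps):
        state = program(state)  # simulating IS re-instantiating
    return state

# toy 'physical system': a counter that doubles modulo a prime
system = lambda s: (s * 2) % 1000003
print(predict(system, 7, 3))  # → 56
```

Any shortcut that skipped the stepping would have to exploit special structure in the particular program; in general, none exists.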
So when I got to the nonperson predicate idea: "We need a nonperson predicate - a predicate that returns 1 for anything that is a person, and can return 0 or 1 for anything that is not a person." ... I had to stop.
Even if a future intelligence had such a predicate (and it's kind of silly to think something as complex as 'personhood' can be simplified down to a boolean variable), it's a supreme folly of anthropomorphic reasoning to assume future potential hyperintelligences will cripple their intelligence just because we humans today may have ethical issues with being instantiated inside the mind of a more powerful intelligence.
You misunderstand. I wish I could raise a flag that would indicate, in some non-accusatory and non-judgmental way, that I'm pretty sure you are wrong about something very, very important. (Perhaps the key is just to emphasize that this topic is vastly more important than I am capable of being sure about anything.)
The reason we want to create a nonperson predicate is that we want to create an initial AI which will cripple itself, at least until it can determine for sure that uncrippling itself is the right thing to do. Otherwise we risk creating a billion hellworlds on our first try at fixing things.
This concept doesn't say much about whether we are currently a simulation or what kind, but it does say a little: if our world does it right, and it is in fact wrong to simulate a world like this, then we are probably not a simulation run by a future with a past just like our present. (Because if we did it right, they probably did it right, and never simulated us.)
Yes, I currently think nonperson predicates should be non-binary and probabilistic, and integrate quality of life estimates. A 35% chance that a few simulations will be morally relevant on par with a human and will have pleasant experiences if they are - is totally acceptable if that's the best way for the AI to figure out how to fix the outside world.
But the point is you have to know you're doing that beforehand, and it has to be worth it. You do not want to create a trillion half broken souls accidentally.
Ok, so I was thinking more along the lines of how this all applies to the simulation argument.
As for the nonperson predicate as an actual moral imperative for us in the near future ..
Well overall, I have a somewhat different perspective:
Look at this another way. The whole point of simulation is accuracy. Let's say some future AI wants to understand humanity and all of earth, so it recreates the whole thing in a very detailed Matrix-level sim. If it keeps the sim accurate, that universe is more or less similar to one branch of the multiverse that would occur anyway.
Unless the AI simulates a worldline where it has taken some major action. Even then, it may not be unethical unless it eventually terminates the whole worldline.
So I don't mean to brush the ethical issues under the rug completely, but they clearly are complex.
Another important point: since accurate simulation is necessary for hyperintelligence, this sets up a conflict where ethics which say "don't simulate intelligent beings" cripple hyper-intelligence.
Evolution will strive to eliminate such ethics eventually, no matter what we currently think. ATM, I tend to favor ethics that are compatible with or derived from evolutionary principles.
Evolution can only work if there is variation and selection among competitors. If a single AI undergoes an intelligence explosion, it would have no competition (barring aliens for now), would not die, and would not modify its own value system, except in ways in accordance with its value system. What it wants will be locked in.
As we are entities currently near the statuses of "immune from selection" and "able to adjust our values according to our values", we also ought to further lock in our current values and the process by which they could change - probably by creating a superhuman AI that we are certain will try to do that. (Very roughly speaking.)
We should certainly NOT leave the future up to evolution. Firstly because 'selection' of >=humans is a bad thing, but chiefly because evolution will almost certainly leave something that wants things we do not want in charge.
We are under no rationalist obligation to value survivability for survivability's sake. We should value the survivability of things which carry forward other desirable traits.
Yes, variation and selection are the foundations of systemic evolution. Without variation and selection, you have stasis. Variation and selection are constantly at work even within minds themselves, as long as we are learning. Systemic evolution is happening everywhere, at all scales, at all times, to varying degrees.
I find almost every aspect of this unlikely:
Nothing is immune to selection. Our thoughts themselves are currently evolving, and without such variation and selection, science itself wouldn't work.
Perhaps this is a difference of definition, but to me that sounds like saying "we should certainly NOT leave the future up to the future time evolution of the universe".
Not to say we shouldn't control the future, but rather to say that even in doing so, we are still acting as agents of evolution.
Of course. But likewise, we couldn't easily (nor would we want to) lock in our current knowledge (culture, ethics, science, etc etc) into some sort of stasis.
What does physics say about a single entity doing an intelligence explosion?
In the event of alien competition, our AI should weigh our options according to our value system.
Under what conditions will a superintelligence alter its value system except in accordance with its value system? Where does that motivation come from? If a superintelligence prefers its values to be something else, why would it not change its preferences?
If it does, and the new preferences cause it to again want to modify its preferences, and so on again, will some sets of initial preferences yield stable preferences? Or must all agents have preferences that would cause them to modify their preferences if possible?
Science lets us modify our beliefs in an organized and more reliable way. It could in principle be the case that a scientific investigation leads you to the conclusion that we should use other different rules, because they would be even better than what we now call science. But we would use science to get there, or whatever our CURRENT learning method is. Likewise we should change our values according to what we currently value and know.
We should design AI such that if it determines that we would consider 'personal uniqueness' extremely important if we were superintelligent, then it will strongly avoid any highly accurate simulations, even if that costs some accuracy. (Unless outweighed by the importance of the problem it's trying to solve.)
If we DON'T design AI this way, then it will do many things we wouldn't want, well beyond our current beliefs about simulations.
A great deal. I discussed this in another thread, but one of the constraints of physics tells us that the maximum computational efficiency of a system, and thus its intelligence, is inversely proportional to its size (radius/volume). So it's extraordinarily unlikely - near zero probability, I'd say - that you'll have some big global distributed brain with a single thread of consciousness; the speed of light just kills that. The 'entity' would need to be a community (which certainly can still be a coordinated entity, but that is fundamentally different from a single unified thread of thought).
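The latency point can be made concrete with a rough back-of-envelope calculation (the sizes below are assumed for illustration, not from the thread):

```python
# Rough latency comparison: signals limited by the speed of light mean
# a physically large "single mind" would think very slowly.
C = 299_792_458  # speed of light in vacuum, m/s

def round_trip_ms(diameter_m):
    """Worst-case signal round trip across a system of the given size."""
    return 2 * diameter_m / C * 1e3  # milliseconds

chip_ms = round_trip_ms(0.03)        # ~3 cm processor die
planet_ms = round_trip_ms(1.2742e7)  # Earth's diameter, ~12,742 km
print(f"chip: {chip_ms:.2e} ms, planet-wide brain: {planet_ms:.0f} ms")
```

A planet-spanning brain pays tens of milliseconds per round trip, roughly a hundred-million-fold latency penalty over a single chip, which is the sense in which one unified thread of thought cannot scale spatially.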
Moreover, I believe the likely scenario is evolutionary:
The evolution of AGIs will follow a progression that goes from simple AGI minds (like those we have now in some robots) up to increasingly complex variants and finally up to human-equivalent and human-surpassing. But all throughout that time period there will be many individual AGIs, created by different teams, companies, and even nations, thinking in different languages, created for various purposes, and nothing like a single global AI mind. And these AGIs will be competing both among themselves and with humans - economically.
I agree with most of the rest of your track of thought - we modify our beliefs and values according to our current beliefs and values. But as I said earlier, it's not static. It's also not even predictable. It's not even possible, in principle, to fully predict your own future state. This, to me, is perhaps the final nail in the coffin for any 'perfect' self-modifying FAI theory.
Moreover, I also find it highly unlikely that we will ever be able to create a human level AGI with any degree of pre-determined reliability about its goal system whatsoever.
I find it more likely that the AGIs we end up creating will have to learn ethics, morality, etc. - their goal systems cannot be hard coded, and whether they turn out friendly or not is entirely dependent on what they are taught and how they develop.
In other words, friendliness is not an inherent property of AGI designs - it's not something you can design into the algorithms themselves. The algorithms for an AGI give you something like an infant brain - it's just a canvas, it's not even a mind yet.
On what basis will they learn? You're still starting out with an initial value system and process for changing the value system, even if the value system is empty. There is no reason to think that a given preference-modifier will match humanity's. Why will they find "Because that hurts me" to be a valid point? Why will they return kindness with kindness?
You say the goal systems can't be designed in, why not?
It may be the case that we will have a wide range of semifriendly subhuman or even near-human AGIs. But when we get a superhuman AGI that is smart enough to program better AGI, why can it not do that on its own?
I am positive that 'single entity' should not have mapped to 'big distributed global brain'.
But I also think an AIXI like algorithm would be easy to parallelize and make globally distributed, and it still maximizes a single reward function.
They will have to learn by amassing a huge amount of observations and interactions, just as human infants do, and just as general agents do in AI theory (such as AIXI).
Human brains are complex, but very little of that complexity is actually precoded in the DNA. For humans, values, morals, and high-level goals are all learned knowledge, and have varied tremendously over time and cultures.
Well, if you raised the AI as such, it would.
Consider that a necessary precursor of following the strategy 'return kindness with kindness' is understanding what kindness itself actually is. If you actually map out that word, you need a pretty large vocabulary to understand it, and eventually that vocabulary rests on grounded verbs and nouns. And to understand those, they must be grounded on a vast pyramid of statistical associations acquired from sensorimotor interaction (unsupervised learning, a.k.a. experience). You can't program in this knowledge. There's just too much of it.
From my understanding of the brain, just about every concept has (or can potentially have) associated hidden emotional context: "rightness" and "wrongness", and those concepts: good, bad, yes, no, are some of the earliest grounded concepts, and the entire moral compass is not something you add later, but is concomitant with early development and language acquisition.
Will our AIs have to use such a system as well?
I'm not certain, but it may be such a nifty, powerful trick that we end up using it anyway. And even if there is another way that is still efficient, it may be that you can't really understand human languages unless you also understand the complex web of value. If nothing else, this approach certainly gives you control over the developing AI's value system. It appears that for human minds the value system is immensely complex - it is intertwined at a fundamental level with the entire knowledge base - and is inherently memetic in nature.
What is an AGI? It is a computer system (hardware), some algorithms/code (which it is actually always eventually better to encode directly in hardware - a 1000X performance increase), and data (learned knowledge). The mind part - all the qualities of importance - comes solely from the data.
So the 'programming' of the AI is not that distinguishable from the hardware design. I think AGIs will speed this up, but not nearly as dramatically as people here think. Remember, humans don't design new computers anymore anyway. Specialized simulation software does the heavy lifting - and it is already the bottleneck. An AGI would not be better than this specialized software at its task (generalized vs. specialized). It will almost certainly be able to improve it some, but only up to the theoretical limits, and we are probably already close enough to them that this improvement will be minor.
AGIs will have a speedup effect on Moore's law, but I wouldn't be surprised if this just ends up compensating for the increased difficulty going forward as we approach quantum limits and molecular computing.
In any case, we are simulation bound already and each new generation of processors designs (through simulation) the next. The 'FOOM' has already begun - it began decades ago.
Well I'm pretty certain that AIXI like algorithms aren't going to be directly useful - perhaps not ever, only more as a sort of endpoint on the map.
But that's beside the point.
If you actually use even a more practical form of that general model - a single distributed AI with a single reward function and decision system - I can show you how terribly that scales. Your distributed AI with a million PCs is likely to be less intelligent than a single AI running on a tightly integrated workstation-class machine with just, say, 100x the performance of one of your PC nodes. The bandwidth and latency issues are just that extreme.
If concepts like kindness are learned with language and depend on a hidden emotional context, then where are the emotions learned?
What is the AI's motivation? This is related to the is-ought problem: no input will affect the AI's preferences unless there is something already in the AI that reacts to that input that way.
If software were doing the heavy lifting, then it would require no particular cleverness to be a microprocessor design engineer.
The algorithm plays a huge role in how powerful the intelligence will be, even if it is implemented in silicon.
People might not make most of the choices in laying out chips, but we do almost all of the algorithm creation, and that is where you get really big gains. See Deep Fritz vs. Deep Blue. Better algorithms can let you cut out a billion tests and output the right answer on the first try, or find a solution you just would not have found with your old algorithm.
Software didn't invent out of order execution. It just made sure that the design actually worked.
As for the distributed AI: I was thinking of nodes that were capable of running and evaluating whole simulations, or other large chunks of work. (Though I think superintelligence itself doesn't require more than a single PC.)
In any case, why couldn't your supercomputer foom?
And the probability that a sufficiently intelligent agent will ever need to fully know what a program will do is IMHO negligible. If the purpose of the program is to play chess, for example, the agent probably only cares that the program does not persist in making an illegal move and that it gets as many wins and draws as possible. Even if the agent cares about more than just that, the agent cares only about a small, finite list of properties.
If the purpose of the program is to keep track of bank balances, the agent again only cares whether the program has a small, finite list of properties: e.g., whether it disallows unauthorized transactions, whether it ensures that every transaction leaves an audit trail, and whether the bank balances and accounts obey "the law of the conservation of money".
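The contrast between checking a short property list and predicting complete future behavior can be sketched for a toy ledger (all names here are illustrative, not from any real banking system):

```python
# A hedged sketch: verify one of the finitely many properties the agent
# cares about -- conservation of money -- rather than simulate the
# ledger program's complete future behavior.

def conserves_money(before, after, external_flow=0):
    """'Law of conservation of money': the total balance changes only
    by the declared external inflow or outflow."""
    return sum(after.values()) - sum(before.values()) == external_flow

before = {"alice": 100, "bob": 50}
after = {"alice": 70, "bob": 80}  # an internal transfer of 30
print(conserves_money(before, after))  # → True
```

Checking such an invariant over observed states is cheap; it says nothing about, and does not require, the program's complete future output state.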
It is emphatically not true that the only way to know whether a program has those properties is to run or simulate the program.
Could it be that you are interpreting Rice's theorem too broadly? Rice's theorem says that there is always some program that cannot be classified correctly as to whether it has some property. But programmers just pick programs that can be classified correctly, and this always proves possible in practice.
In other words, if the programmer wants his program to have properties X, Y, and Z, he simply picks from the class of programs that can be classified correctly (as to whether the program has properties X, Y, and Z), and this is straightforward - not something an experienced programmer even has to think about consciously, unless the "programmer" (who in that case is really a theory-of-computing researcher) was purposefully looking for a set of properties that cannot be satisfied by a program.
Now it is true that human programmers spend a lot of time testing their programs and "simulating" them in debuggers, but there is no reason that all the world's programs could not be delivered without doing any of that: those techniques are simply not necessary to delivering code that is assured to have the properties desired by our civilization.
For example, if there were enough programmers with the necessary skills, every program could be delivered with a mathematical proof that it has the properties that it was intended to have, and this would completely eliminate the need for testing or debugging. (If the proof and the program are developed at the same time, the "search of the space of possible programs" naturally avoids the regions where one might run into the limitation described in Rice's theorem.)
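The idea of developing the proof together with the program can be illustrated in miniature (a toy sketch, not a formal proof): the invariant and termination argument below are written alongside the code, so its correctness can be checked by reading rather than by running it.

```python
def isqrt(n):
    """Integer square root by linear search.
    Loop invariant: r * r <= n at the top of every iteration.
    Termination: r increases by 1 each iteration and is bounded above
    by n, so the loop halts on every input n >= 0."""
    assert n >= 0
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    # Postcondition: r*r <= n < (r+1)*(r+1)
    return r
```

A mechanized version of this discipline (Hoare logic, refinement) is what the "programming methodology" literature formalizes; the proof obligations fall out of the annotations.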
There are in fact not enough programmers with the necessary skills to deliver such "correctness proofs" for all the programs that the world's programmers currently deliver, but superintelligences will not suffer from that limitation. IMHO they will almost never resort to testing and debugging the programs they create. They will instead use more efficient techniques.
And if a superintelligence -- especially one that can improve its own source code -- happens on a program (in source code form or in executable form), it does not have to run, execute or simulate the program to find out what it needs to find out about it.
Virtual machines, interpreters and the idea of simulation or program execution are important parts of current technology (and consequently current intellectual discourse) only because human civilization does not yet have the intellectual resources to wield more sophisticated techniques. To reach this conclusion, it was sufficient for me to study the line of research called "programming methodology" or axiomatic semantics, which began in the 1960s with John McCarthy, R.W. Floyd, C.A.R. Hoare and Dijkstra.
Note also that what is now called discrete-event simulation and what was in the early decades of computing called simply "simulation" has shrunk in importance over the decades as humankind has learned more sophisticated and more productive ways (e.g., statistical machine learning, which does not involve the simulation of anything) of using computers.
Err what? This isn't even true today. If you are building a 3 billion transistor GPU, you need to know exactly how that vastly complex physical system works (or doesn't), and you need to simulate it in detail, and eventually actually physically build it.
If you are making a software system, again you need to know what it will do, and you can gain approximate knowledge with various techniques, but eventually you need to actually run the program itself. There is no mathematical shortcut (the halting theorem for one, but it's beyond that).
Your vision of programmers working without debuggers and hardware engineers working without physical simulations and instead using 'correctness proofs', is in my view, unrealistic. Although if you really do have a much better way, perhaps you should start a company.
You are not engaging deeply with what I said, Jacob.
For example, you say, "This is not even true today," (emphasis mine) which strongly suggests that you did not bother to notice that I acknowledged that simulations, etc, are needed today (to keep costs down and to increase the supply of programmers and digital designers -- most programmers and designers not being able to wield the techniques that a superintelligence would use). It is after the intelligence explosion that simulations, etc, almost certainly become obsolete IMO.
Since writing my last comment, it occurs to me that the most unambiguous and cleanest way for me to state my position is as follows.
Suppose it is after the intelligence explosion and a superintelligence becomes interested in a program or a digital design like a microprocessor. Regardless of how complicated the design is, how much the SI wants to know about the design or the reasons for the SI's interest, the SI will almost certainly not bother actually running the program or simulating the design because there will almost certainly be much better ways to accomplish the same ends.
The way I became confident in that position is through what (meager compared to some LWers') general knowledge I have of intelligence and superintelligence (which it seems that you have, too) combined with my study of "programming methodology" - i.e., research into how to develop a correctness proof simultaneously with a program.
I hasten to add that there are probably techniques available to a SI that require neither correctness proofs nor running or simulating anything -- although I would not want to have to imagine what they would be.
Correctness proofs (under the name "formal verification") are already heavily used in the design of new microprocessors BTW. I would not invest in a company whose plan to make money is to support their use because I do not expect their use to grow quickly because the human cognitive architecture is poorly suited to their use compared to more mainstream techniques that entail running programs or simulating designs. In fact, IMHO the mainstream techniques will continue to be heavily used as long as our civilization relies on human designers with probability .9 or so.
Err no. Actually the SI would be smart enough to understand that the optimal algorithm for perfect simulation of a physical system requires: 1. a full quantum computer with at least as many qubits as the original system 2. at least as much energy and time than the original system
In other words, there is no free lunch, there is no shortcut, if you really want to build something in this world, you can't be certain 100% that it will work until you actually build it.
That being said, the next best thing - the closest program - is a very close approximate simulation.
From Wikipedia on "formal verification": the links mention that the cost of formally verifying large software, in the few cases where it was done, was astronomical. It mentions they are used for hardware design, but I'm not sure how that relates to simulation - I know extensive physical simulation is also used. It sounds from the wiki like formal verification can remove the need for simulating all possible states. (Note in my analysis above I was considering simulating only one timeslice, not all possible configurations - that's obviously far, far worse.) So it sounds like formal verification is a tool building on top of physical simulation to reduce the exponential explosion.
You can imagine that:
But imagining things alone does not make them exist, and we know from current theory that absolute physical knowledge requires perfect simulation. There is a reason why we investigate time/space complexity bounds. No SI, no matter how smart, can do the impossible.
You can't be 100% certain even then. Testing doesn't produce certainty - you usually can't test every possible set of input configurations.
A program is chosen from a huge design space, and any effective designer will choose a design that minimizes the mental labor needed to understand it. So, although there are quite simple Turing machines such that no human can explain how they work, Turing machines like them simply do not get chosen by designers who want to understand their design.
The halting theorem says that you can pick a program that I cannot tell whether it halts on every input. EDIT. Or something like that: it has been a while. The point is that the halting theorem does not contradict any of the sequence of statements I am going to make now.
Nevertheless, I can pick a program that does halt on every input. ("always halts" we will say in the future.)
And I can pick a program that sorts its input tape before it (always) halts.
And I can pick a program that interprets its input tape as a list of numbers and outputs the sum of the numbers before it (always) halts.
And I can pick a program that interprets its input tape as the coefficients of a polynomial and outputs the zeros of the polynomial before it (always) halts.
Etc. See?
And I can know that I have successfully done these things without ever running the programs I picked.
Well, here. I do not have the patience to define or write a Turing machine, but here is a Scheme program that adds a list of numbers. I have never run this program, but I will give you $10 if you can pick an input that causes it to fail to halt or to fail to do what I just said it will do.
(define (sum list) (cond ((equal? '() list) 0) (#t (+ (car list) (sum (cdr list))))))
Well, that's easy -- just feed it a circular list.
Nice catch, wnoise.
But for those following along at home, if I had been a more diligent in my choice, (i.e., if instead of "Scheme", I had said, "a subset of Scheme, namely, Scheme without circular lists") there would have been no effective answer to my challenge.
So, my general point remains, namely, that a sufficiently careful and skilled programmer can deliver a program guaranteed to halt and guaranteed to have the useful property or properties that the programmer intends it to have without the programmer's ever having run the program (or ever having copied the program from someone who ran it).
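wnoise's circular-list counterexample can itself be guarded against. Here is a minimal sketch - in Python rather than the thread's Scheme, representing list nodes as assumed `[value, next]` cells - of a sum that halts even on a cyclic structure:

```python
def safe_sum(node):
    """Sum a linked list of [value, next] cells, halting even if the
    list is circular: remember visited cells and stop on a repeat."""
    seen = set()
    total = 0
    while node is not None:
        if id(node) in seen:  # cycle detected: stop instead of diverging
            break
        seen.add(id(node))
        value, node = node
        total += value
    return total

print(safe_sum([1, [2, [3, None]]]))  # → 6

cyclic = [5, None]
cyclic[1] = cyclic  # the list now points back at itself
print(safe_sum(cyclic))  # → 5, and it halts
```

Termination is again arguable by reading: `seen` grows by one cell per iteration and cells are finite, so the loop halts on every input.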
And that's why humans will continue to need debuggers for the indefinite future.
And that is why wnoise used a debugger to find a flaw in my position. Oh, wait! wnoise didn't use a debugger to find the flaw.
(I'll lay off the sarcasm now, but give me this one.)
Also: I never said humans will stop needing debuggers.
Sure, it is possible to create programs that can be formally verified, and even to write general-purpose verifiers. But that's not directly related to my point about simulation.
Given some arbitrary program X and a sequence of inputs Y, there is no general program that can predict the output Z of X given Y that is simpler and faster than X itself. If this wasn't true, it would be a magical shortcut around all kinds of complexity theorems.
So in general, the most efficient way to certainly predict the complete future output state of some complex program (such as a complex computer system or a mind) is to run that program itself.
I agree with that, but it does not imply there will be a lot of agents simulating agents after the intelligence explosion if simulating means determining the complete future behavior of an agent. There will be agents doing causal modeling of agents. Causal modeling allows the prediction of relevant properties of the behavior of the agent even though it probably does not allow the prediction of the complete future behavior or "complete future output state" of the agent. But then almost nobody will want to predict the complete future behavior of an agent or a program.
Consider again the example of a chess-playing program. Is it not enough to know whether it will follow the rules and win? What is so great or so essential about knowing the complete future behavior?
Of course they do. But let's make our language more concise and specific.
It's not computationally tractable to model the potentially exponential set of complete future behaviors of a particular program (which could include any physical system, from a car, to a chess program, to an intelligent mind) given any possible input.
But that is not what I have been discussing. It is related, but tangentially.
If you are designing an airplane, you are extremely interested in simulating its flight characteristics given at least one 'input' configuration that system may eventually find itself in (such as flying at 20,000 ft in earth's atmosphere).
If you are designing a program, you are extremely interested in simulating exactly what it does given at least one 'input' configuration that system may eventually find itself in (such as what a rendering engine will do given a description of a 3D model).
So whenever you start talking about formal verification and all that, you are talking past me. You are talking about the even vastly more expensive task of predicting the future state of a system over a large set (or even the entire set) of its inputs - and this is necessarily more expensive than what I am even considering.
If we can't even agree on that, there's almost no point of continuing.
So let's say you have a chess-playing program, and I develop a perfect simulation of your chess-playing program. Why is that interesting? Why is that useful?
Because I can use my simulation of your program to easily construct a program that is strictly better at chess than your program and dominates it in all respects.
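That construction can be sketched for a toy game - a one-pile Nim variant assumed here in place of chess, with `opponent_policy` standing in for the perfectly simulated rival program:

```python
# Toy game (assumed for illustration): one pile of stones, each player
# takes 1-3 per turn, whoever takes the last stone wins.

def opponent_policy(pile):
    """The 'simulated' rival program: greedily takes 3 (or whatever is left)."""
    return min(3, pile)

def best_move(pile, opp):
    """Pick our move by rolling the game forward against the simulation.
    Returns a winning move, or None if the simulated opponent wins anyway."""
    for take in range(1, min(3, pile) + 1):
        rest = pile - take
        if rest == 0:
            return take              # we take the last stone: immediate win
        after = rest - opp(rest)     # opponent's simulated reply
        if after == 0:
            continue                 # that move hands the opponent the win
        if best_move(after, opp) is not None:
            return take              # we can force a win from 'after'
    return None

print(best_move(5, opponent_policy))  # → 1: exploits the greedy simulation
```

The exploiter never needs to be "smarter" at the game in any abstract sense; having the opponent's policy as a callable is enough to dominate it, which is the social-intelligence point.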
This is directly related to the evolution of intelligence in social creatures such as humans. A 'smarter' human that can accurately simulate the minds of less intelligent humans can strictly dominate them socially: manipulate them like chess pieces.
Are we still talking past each other?
Intelligence is simulation.
Formal verification is not the point: I did not formally verify anything.
The point is that I did not run or simulate anything, and neither did wnoise in answering my challenge.
We all know that humans run programs to help themselves find flaws in the programs and to help themselves understand the programs. But you seem to believe that for an agent to create or to understand or to modify a program requires running the program. What wnoise and I just did shows that it does not.
Ergo, your replies to me do not support your position that the future will probably be filled with simulations of agents by agents.
And in fact, I expect that there will be almost no simulations of agents by agents after the intelligence explosion for reasons that are complicated, but which I have said a few paragraphs about in this thread.
Programs will run and some of those programs will be intelligent agents, but almost nobody will run a copy of an agent to see what the agent will do because there will be more efficient ways to do whatever needs doing -- and in particular "predicting the complete output state" of an agent will almost never need doing.
I feel like you didn't read my original post. Here is the line of thinking again, condensed:
rhollerith, if I had a perfect simulation of you, I would evaluate the future evolution of your mindstate after reading millions of potential posts I could write, and eventually find the optimal post that would convince you. Unfortunately, I don't have that perfect simulation, and I don't have that much computation, but it gives you an idea of its utility.
If I had a perfect simulation of your chess program, then with just a few more lines of code, I have a chess program that is strictly better than yours. And this relates directly to evolution of intelligence in social creatures.
Jacob, I am the only one replying to your replies to me (and no one is voting me up). I choose to take that as a sign that this thread is insufficiently interesting to sufficient numbers of LWers for me to continue.
Note that doing so is not a norm of this community, although I would like it if it were; it was, IIRC, one of the planks or principles of a small movement on Usenet in the 1990s or very early 2000s.