Very soon, Eliezer is supposed to start posting a new sequence, on "Open Problems in Friendly AI". After several years in which its activities were dominated by the topic of human rationality, this ought to mark the beginning of a new phase for the Singularity Institute, one in which it is visibly working on artificial intelligence once again. If everything comes together, then it will now be a straight line from here to the end.
I foresee that, once the new sequence gets going, it won't be that easy to question the framework in terms of which the problems are posed. So I consider this my last opportunity for some time, to set out an alternative big picture. It's a framework in which all those rigorous mathematical and computational issues still need to be investigated, so a lot of "orthodox" ideas about Friendly AI should carry across. But the context is different, and it makes a difference.
Begin with the really big picture. What would it take to produce a friendly singularity? You need to find the true ontology, find the true morality, and win the intelligence race. For example, if your Friendly AI was to be an expected utility maximizer, it would need to model the world correctly ("true ontology"), value the world correctly ("true morality"), and it would need to outsmart its opponents ("win the intelligence race").
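As a minimal sketch of how the first two requirements enter an expected utility maximizer, consider the toy below. It is only an illustration: the names `world_model`, `utilities`, and `choose_action`, and all the numbers, are hypothetical. The world model supplies probabilities of outcomes ("ontology"), the utility function supplies values ("morality"), and the agent picks the action with the highest expected utility. Winning the intelligence race is not modeled at all.

```python
from typing import Callable, Dict

# Hypothetical toy world model: for each action, a probability
# distribution over outcomes. This stands in for the "true ontology".
world_model: Dict[str, Dict[str, float]] = {
    "wait": {"safe": 1.0},
    "act": {"great": 0.5, "disaster": 0.5},
}

# Hypothetical utilities over outcomes. This stands in for the "true morality".
utilities: Dict[str, float] = {"safe": 1.0, "great": 3.0, "disaster": -10.0}


def expected_utility(outcome_probs: Dict[str, float],
                     utility: Callable[[str], float]) -> float:
    """Probability-weighted sum of the utilities of the possible outcomes."""
    return sum(p * utility(outcome) for outcome, p in outcome_probs.items())


def choose_action(model: Dict[str, Dict[str, float]],
                  utility: Callable[[str], float]) -> str:
    """Return the action whose expected utility is highest under the model."""
    return max(model, key=lambda action: expected_utility(model[action], utility))


best = choose_action(world_model, lambda outcome: utilities[outcome])
```

An error in either input changes the choice: if the model understates the probability of "disaster", or the utility function understates how bad it is, the same procedure will select "act".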
Now let's consider how SI will approach these goals.
The evidence says that the working ontological hypothesis of SI-associated researchers will be timeless many-worlds quantum mechanics, possibly embedded in a "Tegmark Level IV multiverse", with the auxiliary hypothesis that algorithms can "feel like something from inside" and that this is what conscious experience is.
The true morality is to be found by understanding the true decision procedure employed by human beings, and idealizing it according to criteria implicit in that procedure. That is, one would seek to understand conceptually the physical and cognitive causation at work in concrete human choices, both conscious and unconscious, with the expectation that there will be a crisp, complex, and specific answer to the question "why and how do humans make the choices that they do?" Undoubtedly there would be some biological variation, and there would also be significant elements of the "human decision procedure", as instantiated in any specific individual, which are set by experience and by culture, rather than by genetics. Nonetheless one expects that there is something like a specific algorithm or algorithm-template here, which is part of the standard Homo sapiens cognitive package and biological design; just another anatomical feature, particular to our species.
Having reconstructed this algorithm via scientific analysis of the human genome, brain, and behavior, one would then idealize it using its own criteria. This algorithm defines the de facto value system that human beings employ, but that is not necessarily the value system they would wish to employ; nonetheless, human self-dissatisfaction also arises from the use of this algorithm to judge ourselves. So it contains the seeds of its own improvement. The value system of a Friendly AI is to be obtained from the recursive self-improvement of the natural human decision procedure.
Finally, this is all for naught if seriously unfriendly AI appears first. It isn't good enough just to have the right goals; you must be able to carry them out. In the global race towards artificial general intelligence, SI might hope to "win" either by being the first to achieve AGI, or by having its prescriptions adopted by those who do first achieve AGI. They have some in-house competence regarding models of universal AI like AIXI, and they have many contacts in the world of AGI research, so they're at least engaged with this aspect of the problem.
Upon examining this tentative reconstruction of SI's game-plan, I find I have two major reservations. The big one, and the one most difficult to convey, concerns the ontological assumptions. In second place is what I see as an undue emphasis on the idea of outsourcing the methodological and design problems of FAI research to uploaded researchers and/or a proto-FAI which is simulating or modeling human researchers. This is supposed to be a way to finesse philosophical difficulties like "what is consciousness anyway"; you just simulate some humans until they agree that they have solved the problem. The reasoning goes that if the simulation is good enough, it will be just as good as if ordinary non-simulated humans solved it.
I also used to have a third major criticism, that the big SI focus on rationality outreach was a mistake; but it brought in a lot of new people, and in any case that phase is ending, with the creation of CFAR, a separate organization. So we are down to two basic criticisms.
First, "ontology". I do not think that SI intends to just program its AI with an a priori belief in the Everett multiverse, for two reasons. First, like anyone else, their ventures into AI will surely begin with programs that work within very limited and more down-to-earth ontological domains. Second, at least some of the AI's world-model ought to be obtained rationally. Scientific theories are supposed to be rationally justified, e.g. by their capacity to make successful predictions, and one would prefer that the AI's ontology results from the employment of its epistemology, rather than just being an axiom; not least because we want it to be able to question that ontology, should the evidence begin to count against it.
For this reason, although I have campaigned against many-worlds dogmatism on this site for several years, I'm not especially concerned about the possibility of SI producing an AI that is "dogmatic" in this way. For an AI to independently assess the merits of rival physical theories, the theories would need to be expressed with much more precision than they have been in LW's debates, and the disagreements about which theory is rationally favored would be replaced with objectively resolvable choices among exactly specified models.
The real problem, which is not just SI's problem, but a chronic and worsening problem of intellectual culture in the era of mathematically formalized science, is a dwindling of the ontological options to materialism, platonism, or an unstable combination of the two, and a similar restriction of epistemology to computation.
Any assertion that we need an ontology beyond materialism (or physicalism or naturalism) is liable to be immediately rejected by this audience, so I shall immediately explain what I mean. It's just the usual problem of "qualia". There are qualities which are part of reality - we know this because they are part of experience, and experience is part of reality - but which are not part of our physical description of reality. The problematic "belief in materialism" is actually the belief in the completeness of current materialist ontology, a belief which prevents people from seeing any need to consider radical or exotic solutions to the qualia problem. There is every reason to think that the world-picture arising from a correct solution to that problem will still be one in which you have "things with states" causally interacting with other "things with states", and a sensible materialist shouldn't find that objectionable.
What I mean by platonism is an ontology which reifies mathematical or computational abstractions, and says that they are the stuff of reality. Examples are assertions that reality is a computer program, or a Hilbert space. Once again, the qualia are absent; but in this case, instead of the deficient ontology being based on supposing that there is nothing but particles, it's based on supposing that there is nothing but the intellectual constructs used to model the world.
Although the abstract concept of a computer program (the abstractly conceived state machine which it instantiates) does not contain qualia, people often treat programs as having mind-like qualities, especially by imbuing them with semantics - the states of the program are conceived to be "about" something, just like thoughts are. And thus computation has been the way in which materialism has tried to restore the mind to a place in its ontology. This is the unstable combination of materialism and platonism to which I referred. It's unstable because it's not a real solution, though it can live unexamined for a long time in a person's belief system.
An ontology which genuinely contains qualia will nonetheless still contain "things with states" undergoing state transitions, so there will be state machines, and consequently, computational concepts will still be valid, they will still have a place in the description of reality. But the computational description is an abstraction; the ontological essence of the state plays no part in this description; only its causal role in the network of possible states matters for computation. The attempt to make computation the foundation of an ontology of mind is therefore proceeding in the wrong direction.
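The point that only causal role matters to a computational description can be illustrated with a small sketch. It is a toy, not a proof, and every name in it is hypothetical: `machine_a` is a two-state parity machine, and `machine_b` is the same machine with its internal states renamed to phrases that merely sound like experiences. No sequence of inputs can tell the two apart from their outputs.

```python
from itertools import product


def run(machine, start, inputs):
    """Feed inputs to a state machine and return the outputs it emits."""
    transitions, outputs = machine
    state = start
    observed = []
    for symbol in inputs:
        state = transitions[(state, symbol)]
        observed.append(outputs[state])
    return observed


def relabel(machine, mapping):
    """Rename the internal states, leaving the causal structure untouched."""
    transitions, outputs = machine
    new_transitions = {
        (mapping[src], symbol): mapping[dst]
        for (src, symbol), dst in transitions.items()
    }
    new_outputs = {mapping[state]: out for state, out in outputs.items()}
    return new_transitions, new_outputs


# A parity machine: reports whether it has seen an even or odd number of 1s.
machine_a = (
    {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s1", ("s1", 1): "s0"},
    {"s0": "even", "s1": "odd"},
)

# Hypothetical labels, chosen only to make the point vivid.
mapping = {"s0": "seeing red", "s1": "seeing green"}
machine_b = relabel(machine_a, mapping)

# Compare the two machines on every input sequence of length 0 to 5.
indistinguishable = all(
    run(machine_a, "s0", seq) == run(machine_b, mapping["s0"], seq)
    for n in range(6)
    for seq in product([0, 1], repeat=n)
)
```

Because the labels never reach the outputs, an inference procedure that sees only input and output data has no grounds for preferring one labeling over the other.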
But here we run up against the hazards of computational epistemology, which is playing such a central role in artificial intelligence. Computational epistemology is good at identifying the minimal state machine which could have produced the data. But it cannot by itself tell you what those states are "like". It can only say that X was probably caused by a Y that was itself caused by Z.
Among the properties of human consciousness are knowledge that something exists, knowledge that consciousness exists, and a long string of other facts about the nature of what we experience. Even if an AI scientist employing a computational epistemology managed to produce a model of the world which correctly identified the causal relations between consciousness, its knowledge, and the objects of its knowledge, the AI scientist would not know that its X, Y, and Z refer to, say, "knowledge of existence", "experience of existence", and "existence". The same might be said of any successful analysis of qualia, knowledge of qualia, and how they fit into neurophysical causality.
It would be up to human beings - for example, the AI's programmers and handlers - to ensure that entities in the AI's causal model were given appropriate significance. And here we approach the second big problem, the enthusiasm for outsourcing the solution of hard problems of FAI design to the AI and/or to simulated human beings. The latter is a somewhat impractical idea anyway, but here I want to highlight the risk that the AI's designers will have false ontological beliefs about the nature of mind, which are then implemented a priori in the AI. That strikes me as far more likely than implanting a wrong a priori belief about physics; computational epistemology can discriminate usefully between different mathematical models of physics, because it can judge one state machine model as better than another, and current physical ontology is essentially one of interacting state machines. But as I have argued, not only must the true ontology be deeper than state-machine materialism, there is no way for an AI employing computational epistemology to bootstrap to a deeper ontology.
In a phrase: to use computational epistemology is to commit to state-machine materialism as your a priori ontology. And the problem with state-machine materialism is not that it models the world in terms of causal interactions between things-with-states; the problem is that it can't go any deeper than that, yet apparently we can. Something about the ontological constitution of consciousness makes it possible for us to experience existence, to have the concept of existence, to know that we are experiencing existence, and similarly for the experience of color, time, and all those other aspects of being that fit so uncomfortably into our scientific ontology.
It must be that the true epistemology, for a conscious being, is something more than computational epistemology. And maybe an AI can't bootstrap its way to knowing this expanded epistemology - because an AI doesn't really know or experience anything, only a consciousness, whether natural or artificial, does those things - but maybe a human being can. My own investigations suggest that the tradition of thought which made the most progress in this direction was the philosophical school known as transcendental phenomenology. But transcendental phenomenology is very unfashionable now, precisely because of a priori materialism. People don't see what "categorial intuition" or "adumbrations of givenness" or any of the other weird phenomenological concepts could possibly mean for an evolved Bayesian neural network; and they're right, there is no connection. But the idea that a human being is a state machine running on a distributed neural computation is just a hypothesis, and I would argue that it is a hypothesis in contradiction with so much of the phenomenological data, that we really ought to look for a more sophisticated refinement of the idea. Fortunately, 21st-century physics, if not yet neurobiology, can provide alternative hypotheses in which complexity of state originates from something other than concatenation of parts - for example, from entanglement, or from topological structures in a field. In such ideas I believe we see a glimpse of the true ontology of mind, one which from the inside resembles the ontology of transcendental phenomenology; which in its mathematical, formal representation may involve structures like iterated Clifford algebras; and which in its biophysical context would appear to be describing a mass of entangled electrons in that hypothetical sweet spot, somewhere in the brain, where there's a mechanism to protect against decoherence.
Of course this is why I've talked about "monads" in the past, but my objective here is not to promote neo-monadology, that's something I need to take up with neuroscientists and biophysicists and quantum foundations people. What I wish to do here is to argue against the completeness of computational epistemology, and to caution against the rejection of phenomenological data just because it conflicts with state-machine materialism or computational epistemology. This is an argument and a warning that should be meaningful for anyone trying to make sense of their existence in the scientific cosmos, but it has a special significance for this arcane and idealistic enterprise called "friendly AI". My message for friendly AI researchers is not that computational epistemology is invalid, or that it's wrong to think about the mind as a state machine, just that all that isn't the full story. A monadic mind would be a state machine, but ontologically it would be different from the same state machine running on a network of a billion monads. You need to do the impossible one more time, and make your plans bearing in mind that the true ontology is something more than your current intellectual tools allow you to represent.
I will try to get across what I mean by calling states of consciousness "intrinsic", "objectively existing", and so forth, by describing what it would mean for them to not have these attributes.
It would mean that you only exist by convention or by definition. It would mean that there is no definite fact about whether your life is part of reality. It wouldn't just be that some models of reality acknowledge your existence and others don't; it would mean that you are nothing more than a fuzzy heuristic concept in someone else's model, and that if they switched models, you would no longer exist even in that limited sense.
I would like to think that you personally have a robust enough sense of your own reality to decisively reject such propositions. But by now, nothing would surprise me, coming from a materialist. It's been amply demonstrated that people can be willing to profess disbelief in anything and everything, if they think that's the price of believing in science. So I won't presume that you believe that you exist, I'll just hope that you do, because if you don't, it will be hard to have a sensible conversation about these topics.
But... if you do agree that you definitely exist, independently of any "model" that actual or hypothetical observers have, then it's a short step to saying that you must also have some of your properties intrinsically, rather than through model-dependent attribution. The alternative would be to say that you exist, you're a "thing", but not any particular thing; which is the sort of untenable objective vagueness that I was talking about.
The concept of an intrinsic property is arising somewhat differently here than it does in your discussion of squares and rectangles. The idealized geometrical figures have their intrinsic properties by definition, or by logical implication from the definition. But I can say that you have intrinsic properties, not by definition (or not just by definition), but because you exist, and to be is to be something. (Also known as the "law of identity".) It would make no sense to say that you are real, but otherwise devoid of ontological definiteness.
For exactly the same reason, it would make no sense to have a fundamentally vague "physical theory of you". Here I want to define "you" as narrowly as possible - this you, in this world, even just in this moment if necessary. I don't want the identity issues of a broadly defined "you" to interfere. I hope we have agreed that you-here-now exist, that you exist objectively, that you must have some identifying or individuating properties which are also held objectively and intrinsically; the properties which make you what you are.
If we are going to be ontological materialists about you-here-now, and we are also going to acknowledge you-here-now as completely and independently real, then there also can't be any vagueness or arbitrariness about which physical object is you-here-now. For every particle - if we have particles in our physical ontology - either it is definitely a part of you-here-now, or it definitely isn't.
At this point I'm already departing radically from the standard materialist account of personhood, which would say that we can be vague about whether a few atoms are a part of you or not. The reason we can't do that is precisely the objectivity of your existence. If you are an objectively existing entity, I can't at the same time say that you are an entity whose boundaries aren't objectively defined. For some broader notion, like "your body", sure, we can be vague about where its boundaries are. But there has to be a core notion of what you are that is correct, exact, fully objective; and the partially objective definitions of "you" come from watering down this core notion by adding inessential extra properties.
Now let's contrast this situation with the piece of lumber that is close to being a square but isn't a perfect square. My arguments against fundamental vagueness are not about insisting that the piece of lumber is a perfect square. I am merely insisting that it is what it is, and whatever it is, it is that, exactly and definitely.
The main difference between "you-here-now" and the piece of lumber is that we don't have the same reason to think that the lumber has a hard ontological core. It's an aggregate of atoms, electrons will be streaming off it, and there will be some arbitrariness about when such an electron stops being "part of the lumber". To find indisputably objective physical facts in this situation, you probably need to talk in terms of immediate relations between elementary particles.
The evidence for a hard core in you-here-now is primarily phenomenological and secondarily logical. The phenomenological evidence is what we call the unity of experience: what's happening to you in any moment is a gestalt; it's one thing happening to one person. Your experience of the world may have fuzzy edges to it, but it's still a whole and hence objectively a unity. The logical "evidence" is just the incoherence of supposing there can be a phenomenological unity without there being an ontological unity at any level. This experiential whole may have parts, but you can't use the existence of the parts to then turn around and deny the existence of the whole.
The evidence for an ontological hard core to you-here-now does not come from physics. Physically the brain looks like it should be just like the piece of lumber, an aggregate of very many very small things. This presumption is obviously why materialists often end up regarding their own existence as something less than objective, or why the search for a microphysically exact theory of the self sounds like a mistake. Instead we are to be content with the approximations of functionalism, because that's the most you could hope to do with such an entity.
I hope it's now very clear where I'm coming from. The phenomenological and ontological arguments for a "hard core" to the self are enough to override any counterargument from physics. They tell us that a mesoscopic theory of what's going on, like functionalism, is at best incomplete; it cannot be the final word. The task is to understand the conscious brain as a biophysical system, in terms of a physical ontology that can contain "real selves". And fortunately, it's no longer the 19th century, we have quantum mechanics and the ingredients for something more sophisticated than classic atomism.
I'm going back and forth on whether to tap out here. On the one hand I feel like I'm making progress in understanding your perspective. On the other hand the progress is clarifying that it would take a large amount of time and energy to derive a vocabulary to converse in a mutually transparent way about material truth claims in this area. It had not occurred to me that pulling on the word "intrinsic" would flip the conversation into a solipsistic zone by way of Cartesian skepticism. Ooof.
Perhaps we could schedule a few hours of IM or IRC to...