Some people like to assume that the cosmos is ours for the taking, even though this could make us special to the order of 1 in 10^80. The argument is that the cosmos could be transformed by technology - engineered on astronomical scales - but hasn't been thus transformed.
The most common alternative hypothesis is that "we are in a simulation". Perhaps we are. But there are other possibilities too.
One is that technological life usually destroys, not just its homeworld, but its whole bubble of space-time, by using high-energy physics to cause a "vacuum decay", in which physics changes in a way that makes space uninhabitable. For example, the mass of an elementary particle is essentially equal to the vacuum value of the Higgs field, times a quantity called a "yukawa coupling". If the Higgs field increased its vacuum value by orders of magnitude, but the yukawas stayed the same, matter as we know it would be destroyed, everywhere that the change spread.
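To make that quantitative, here is the standard textbook relation (this is the conventional standard-model formula, not something the original post spells out): a fermion's mass is set by the Higgs field's vacuum value, v ≈ 246 GeV, times its yukawa coupling y_f:

```latex
m_f = \frac{y_f \, v}{\sqrt{2}}
```

So if v jumped by several orders of magnitude while each y_f stayed fixed, every fermion mass would jump by the same factor, and atoms and chemistry as we know them would cease to exist wherever the new vacuum spread.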
Here I want to highlight a different possibility. The idea is that the universe contains very large lifeforms and very small lifeforms. We are among the small. The large ones are, let's say, mostly dark matter, galactic in scale, and stars and planets for them are like biomolecules for us; tiny functional elements which go together to make up the whole. And - the crucial part - they have immune systems which automatically crush anything which interferes with the natural celestial order.
This is why the skies are full of untamed stars rather than Dyson spheres - any small life which begins to act on that scale is destroyed by dark-matter antibodies. And it explains anthropically why you're human-size rather than galactic-size: small life is more numerous than large life, just not so numerous as cosmic colonization would imply.
Two questions arise - how did large life evolve, and, shouldn't anthropics favor universes which have no large life, just space-colonizing small life? I could spin a story about cosmological natural selection, and large life which uses small life to reproduce, but it doesn't really answer the second question, in particular. Still, I feel that this is a huge unexplored topic - the anthropic consequences of "biocosmic" ecology and evolution - and who knows what else is lurking here, waiting to be discovered?
Although Less Wrong regularly discusses the possibility of superintelligences with the power to transform the universe in the service of some value system - whether that value system is paperclip maximization or some elusive extrapolation of human values - it seems never to have systematically discussed the possibility that we are already within the domain of some superintelligence, and what that would imply. So how about it? What are the possibilities, what are the probabilities, and how should they affect our choices?
Quantum field theory (QFT) is the basic framework of particle physics. Particles arise from the quantized energy levels of field oscillations; Feynman diagrams are the simple tool for approximating their interactions. The "standard model", the success of which is capped by the recent observation of a Higgs boson lookalike, is a quantum field theory.
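As a reminder of what "particles from quantized field oscillations" means: each mode of a free field, with frequency ω, is mathematically a harmonic oscillator, whose allowed energies are

```latex
E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega , \qquad n = 0, 1, 2, \ldots
```

and the excitation number n of a mode is read as the number of particles (quanta) occupying it. This is the textbook picture, included here only for orientation.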
But just like everything mathematical, quantum field theory has hidden depths. For the past decade, new pictures of the quantum scattering process (in which particles come together, interact, and then fly apart) have incrementally been developed, and they presage a transformation in the understanding of what a QFT describes.
At the center of this evolution is "N=4 super-Yang-Mills theory", the maximally supersymmetric QFT in four dimensions. I want to emphasize that from a standard QFT perspective, this theory contains nothing but scalar particles (like the Higgs), spin-1/2 fermions (like electrons or quarks), and spin-1 "gauge fields" (like photons and gluons). The ingredients aren't something alien to real physics. What distinguishes an N=4 theory is that the particle spectrum and the interactions are arranged so as to produce a highly extended form of supersymmetry, in which particles have multiple partners (so many LWers should be comfortable with the notion).
In 1997, Juan Maldacena discovered that the N=4 theory is equivalent to a type of string theory in a particular higher-dimensional space. In 2003, Edward Witten discovered that it is also equivalent to a different type of string theory in a supersymmetric version of Roger Penrose's twistor space. Those insights didn't come from nowhere: they explained algebraic facts that had been known for many years, and they have led to a still-accumulating stockpile of discoveries about the properties of the N=4 field theory.
What we can say is that the physical processes appearing in the theory can be understood as taking place in either of two dual space-time descriptions. Each space-time has its own version of a particular large symmetry, "superconformal symmetry", and the superconformal symmetry of one space-time is invisible in the other. And now it is becoming apparent that there is a third description, which does not involve space-time at all, in which both superconformal symmetries are manifest, but in which space-time locality and quantum unitarity are not "visible" - that is, they are not manifest in the equations that define the theory in this third picture.
I cannot provide an authoritative account of how the new picture works. But here is my impression. In the third picture, the scattering processes of the space-time picture become a complex of polytopes - higher-dimensional polyhedra, joined at their faces - and the quantum measure becomes the volume of these polyhedra. Where you previously had particles, you now just have the dimensions of the polytopes; and the fact that in general, an n-dimensional space doesn't have n special directions suggests to me that multi-particle entanglements can be something more fundamental than the separate particles that we resolve them into.
It will be especially interesting to see whether this polytope combinatorics, which can give back the scattering probabilities calculated with Feynman diagrams in the usual picture, can work solely with ordinary probabilities. That was Penrose's objective, almost fifty years ago, when he developed the theory of "spin networks" as a new language for the angular momentum calculations of quantum theory - a step towards the twistor variables now playing an essential role in these new developments. If the probability calculus of quantum mechanics can be obtained from conventional probability theory applied to these structures that may underlie familiar space-time, then superposition would not need to be regarded as ontological.
I'm talking about this now because a group of researchers around Nima Arkani-Hamed, who are among the leaders in this area, this week released their first paper in a year. It's very new, and so arcane that, among physics bloggers, only Lubos Motl has talked about it.
This is still just one step in a journey. Not only does the paper focus on the N=4 theory - which is not the theory of the real world - but the results only apply to part of the N=4 theory, the so-called "planar" part, described by Feynman diagrams with a planar topology. (For an impressionistic glimpse of what might lie ahead, you could try this paper, whose author has been shouting from the wilderness for years that categorical knot theory is the missing piece of the puzzle.)
The N=4 theory is not reality, but the new perspective should generalize. Present-day calculations in QCD already employ truncated versions of the N=4 theory; and Arkani-Hamed et al specifically mention another supersymmetric field theory (known as ABJM after the initials of its authors), a deformation of which is holographically dual to a theory-of-everything candidate from 1983.
When it comes to seeing reality in this new way, we still only have, at best, a fruitful chaos of ideas and possibilities. But the solid results - the mathematical equivalences - will continue to pile up, and the end product really ought to be nothing less than a new conception of how physics works.
I visited #lesswrong on freenode yesterday and was able to get in some discussion of FAI-related matters. But that channel also exists to allow discussion of rationalist fanfiction, political opinions, and whatever else people want to talk about.
I would like for there to be a place on that network where the topic actually is Friendly AI - where you can go to brainstorm, and maybe you'll have to wait because they're already talking about cognitive neuroscience or automated theorem provers, but not because they're talking about ponies or politics.
Surely there are enough people with a serious, technical interest in FAI and related topics (and I don't just mean among LW regulars) to make such a channel sustainable. I'll bet that there are other people holding back from participation precisely because existing forums are so full of uninformed noise and conversational tangents. It's inevitable that entropy would set in after a while, but if the default baseline was still that even the chatter was technically informed and focused on what's coming - that would be mission accomplished.
I explored the freenode namespace a little. #FAI redirects to #unavailable, so it may be an abandoned project. #AGI exists but is invite-only. #AI exists but I'm told it's dull, and besides, the agenda here is meant to be, not just AI, but singularity-relevant AI. So there seems to be an opening. Or am I reinventing the wheel?
In discussing scenarios of the future, I speak of "slow futures" and "fast futures". A fast future is exemplified by what is now called a hard takeoff singularity: something bootstraps its way to superhuman intelligence in a short time. A slow future is a continuation of history as we know it: decades pass and the world changes, with new politics, culture, and technology. To some extent the Hanson vs Yudkowsky debate was about slow vs fast; Robin's future is fast-moving, but on the way there, there's never an event in which some single "agent" becomes all-powerful by getting ahead of all others.
The Singularity Institute does many things, but I take its core agenda to be about a fast scenario. The theoretical objective is to design an AI which would still be friendly if it became all-powerful. There is also the practical objective of ensuring that the first AI across the self-enhancement threshold is friendly. One way to do that is to be the one who makes it, but that's asking a lot. Another way is to have enough FAI design and FAI theory out there, that the people who do win the mind race will have known about it and will have taken it into consideration. Then there are mixed strategies, such as working on FAI theory while liaising with known AI projects that are contenders in the race and whose principals are receptive to the idea of friendliness.
I recently criticised a lot of the ideas that circulate in conjunction with the concept of friendly AI. The "sober" ideas and the "extreme" ideas have a certain correlation with slow-future and fast-future scenarios, respectively. The sober future is a slow one where AIs exist and posthumanity expands into space, but history, politics, and finitude aren't transcended. The extreme future is a fast one where one day the ingredients for a hard takeoff are brought together in one place, an artificial god is born, and, depending on its inclinations and on the nature of reality, something transcendental happens: everyone uploads to the Planck scale, our local overmind reaches out to other realities, we "live forever and remember it afterwards".
Although I have criticised such transcendentalism, saying that it should not be the default expectation of the future, I do think that the "hard takeoff" and the "all-powerful agent" would be among the strategic considerations in an ideal plan for the future, though in a rather broader sense than is usually discussed. The reason is that if one day Earth is being ruled by, say, a coalition of AIs with a particular value system, with natural humans reduced to the status of wildlife, then the functional equivalent of a singularity has occurred, even if these AIs have no intention of going on to conquer the galaxy; and I regard that as a quite conceivable scenario. It is fantastic (in the sense of mind-boggling), but it's not transcendental. All the scenario implies is that the human race is no longer at the top of the heap; it has successors and they are now in charge.
But we can view those successors as, collectively, the "all-powerful agent" that has replaced human hegemony. And we can regard the events, whatever they were, that first gave the original such entities their unbeatable advantage in power, as the "hard takeoff" of this scenario. So even a slow, sober future scenario can issue in a singularity where the basic premises and motivations of existing FAI research apply. It's just that one might need to be imaginative in anticipating how they are realized.
For example, perhaps hegemonic superintelligence could emerge, not from a single powerful AI research program, but from a particular clique of networked neurohackers who have the right combination of collaborative tools, brain interfaces, and concrete plans for achieving transhuman intelligence. They might go on to build an army of AIs, and subdue the world that way, but the crucial steps which made them the winners in the mind race, and which determined what they would do with their victory, would lie in their methods of brain modification, enhancement, and interfacing, and in the ends to which they applied those methods.
In such a scenario, we could speak of "FIA" - friendly intelligence augmentation. A basic idea of existing FAI discourse is that the true human utility function needs to be determined, and then the values that make an AI human-friendly would be extrapolated from that. Similar thinking can be applied to the prospect of brain modification and intelligence increase in human beings. Human brains work a certain way, modified or augmented human brains will work in specifically different ways, and we should want to know which modifications are genuinely enhancements, what sort of modifications stabilize value and which ones destabilize value, and so on.
If there were a mature and sophisticated culture of preparing for the singularity, then there would be FAI research, FIA research, and a lot of communication between the two fields. (For example, researchers in both fields need to figure out how the human brain works.) Instead, the biggest enthusiasts of FAI are a futurist subculture with a lot of conceptual baggage, and FIA is nonexistent. However, we can at least start thinking about and discussing how this broader culture of research into "friendly minds" could take shape.
Despite its flaws, the Singularity Institute stands alone as an organization concerned with the fast future scenario, the hard takeoff. I have argued that a sober futurology, while forecasting a slowly evolving future for some time to come, must ultimately concern itself with the emergence of a posthuman power arising from some cognitive technology, whether that is AI, neurotechnology, or a combination of these. So I have asked myself who, among "slow futurists", is best equipped to develop an outlook and a plan which is sober and realistic, yet also visionary enough to accommodate the really overwhelming responsibility of designing the architecture of friendly posthuman minds capable of managing a future that we would want.
At the moment, my favorites in this respect are the various branches, scattered around the world, of the Longevity Party that was started in Russia a few months ago. (It shouldn't be confused with "Evolution 2045", a big-budget rival backed by an Internet entrepreneur, that especially promotes mind uploading. For some reason, transhumanist politics has begun to stir in that country.) If the Singularity Institute falls short of the ideal, then the "longevity parties" are even further away from living up to their ambitious agenda. Outside of Russia, they are mostly just small Facebook groups; the most basic issues of policy and practice are still being worked out; no-one involved has much of a history of political achievement.
Nonetheless, if there were no prospect of a singularity, but science and technology were otherwise advancing as they are, the agenda here would look just about ideal. People age and decline until it kills them; an extrapolation of biomedical knowledge suggests that this is not a law of nature but just a sign of primitive technology; and the Longevity Party exists to rectify this situation. It's visionary, and despite the current immaturity and growing pains, an effective longevity politics must arise one day, simply because the advance of technology will force the issue on us! The human race cannot currently muster enough will to live to openly make rejuvenation a political goal, but the incremental pursuit of health and well-being is taking us in that direction anyway.
There's a vacuum of authority and intention in the realm of life extension, and transhuman technology generally, and these would-be longevity politicians are stepping into that vacuum. I don't think they are ready for all the issues that transhuman power entails, but the process has to start somewhere. Faced with the infinite possibilities of technological transformation, the basic affirmation of the desire to live as well as reality permits can serve as a founding principle against which to judge attitudes and approaches to all the more complicated "issues" that arise in a world where anyone can become anything.
Maria Konovalenko, a biomedical researcher and one of the prime movers behind the Russian Longevity Party, wrote an essay setting out her version of how the world ought to work. You'll notice that she manages to include friendly AI on her agenda. This is another example, a humble beginning, of the sort of conceptual development which I think needs to happen. The sort of approach to FAI that Eliezer has pioneered needs a context, a broader culture concerned with FIA and the interplay between neuroscience and pure AI, and we need realistic yet visionary political thinking which encompasses both the shocking potentials of a slow future, above all rejuvenation and the conquest of aging, and the singularity imperative.
Unless there is simply a catastrophe, one day someone, some thing, some coalition will wield transhuman power. It may begin as a corporation, or as a specific technological research subculture, or as the peak political body in a sovereign state. Perhaps it will be part of a broader global culture of "competitors in the mind race" who know about each other and recognize each other as contenders for the first across the line. Perhaps there will be coalitions in the race: contenders who agree on the need for friendliness and the form it should take, and others who are pursuing private power, or who are just pushing AI ahead without too much concern for the transformation of the world that will result. Perhaps there will be a war as one contender begins to visibly pull ahead, and others resort to force to stop them.
But without a final and total catastrophe, however much slow history there remains ahead of us, eventually someone or something will "win", and after that the world will be reshaped according to its values and priorities. We don't need to imagine this as "tiling the universe"; it should be enough to think of it as a ubiquitous posthuman political order, in which all intelligent agents are either kept so powerless as to not be a threat, or managed and modified so as to be reliably friendly to whatever the governing civilizational values are. I see no alternative to this if we are looking for a stable long-term way of living in which ultimate technological powers exist; the ultimate powers of coercion and destruction can't be left lying around, to be taken up by entities with arbitrary values.
So the supreme challenge is to conceive of a social and technological order where that power exists, and is used, but it's still a world that we want to live in. FAI is part of the answer, but so is FIA, and so is the development of political concepts and projects which can encompass such an agenda. The Singularity Institute and the Longevity Party are fledgling institutions, and if they live they will surely, eventually, form ties with older and more established bodies; but right now, they seem to be the crucial nuclei of the theoretical research and the political vision that we need.
And I don't mean that they must concern themselves with death in the sense of ending death, or removing its sting through mental backups, or delaying it to the later ages of the universe; or in the sense of working to decrease the probability of extinction risks and other forms of megadeath; or even in the sense of saving as many lives as possible, as efficiently as possible. All of that is legitimate and interesting. But I mean something far more down to earth.
First, let me specify more precisely who I am talking about. I mean people who are trying to maximize the general welfare; who are trying to achieve the greatest good for the greatest number; who are trying to do the best thing possible with their lives. When someone like that makes decisions, they are implicitly choosing among possible futures in a very radical way. They may be making judgments about whether a future with millions or billions of extra lives is better than some alternative. Whether anyone is ever in a position to make that much of a difference is another matter; but we can think of it like voting. You are at least making a statement about which sort of future you think you prefer, and then you do what you can, and that either makes a difference or it doesn't.
It seems to me that the discussions about the value of life among utilitarians are rather superficial. The typical notion is that we should maximize net pleasure and minimize net pain. Already that poses the question of whether a life of dull persistent happiness is better or worse than a life of extreme highs and lows. A more sophisticated notion is that we should just aspire to maximize "utility", where perhaps we don't even know what utility is yet. Certainly the CEV philosophy is that we don't yet know what utility really is for human beings. It would be interesting to see people who took that agnosticism to heart, people whose life-strategy amounted to (1) discovering true utility as soon as possible (2) living according to interim heuristics whose uncertainty is recognized, but which are adopted out of the necessity of having some sort of personal decision procedure.
So what I'm going to say pertains to (2). You may, if you wish, hold to the idea that the nature of true utility, like true friendliness, won't be known until the true workings of the human mind are known. What follows is something you should think on in order to refine your interim heuristics.
The first thing is that to create a life is to create a death. A life ends. And while the end of a life may not be its most important moment, it reminds us that a life is a whole. Any accurate estimation of the utility of a life is going to be a judgment of that whole.
So a utilitarian ought to contemplate the deaths of the world, and the lives that reach their ends in those deaths. Because the possible futures, that you wish to choose between, are distinguished by the number and nature of the whole lives that they contain. And all these dozens of people, all around the world of the present, ceasing to exist in every minute that passes, are examples of completed lives. Those lives weren't necessarily complete, in the sense of all personal desires and projects having come to their conclusion; but they came to their physical completion.
To choose one future over another is to prefer one set of completed lives to another set. It would be a godlike decision to truly be solely responsible for such a choice. In the real world, people hardly choose their own futures, let alone the future of the world; choice is a lifelong engagement with an evolving and partially known situation, not a once-off choice between several completely known scenarios; and even when a single person does end up being massively influential, they generally don't know what sort of future they're bringing about. The actual limitations on the knowledge and power of any individual may make the whole quest of the "ambitious utilitarian" seem quixotic. But a new principle, a new heuristic, can propagate far beyond one individual, so thinking big can have big consequences.
The main principle that I derive, from contemplating the completed lives of the world, is cautionary antinatalism. The badness of what can happen in a life, and the disappointing character of what usually happens, are what do it for me. I am all for the transhumanist quest and the struggle for a friendly singularity, and I support the desire of people who are already alive to make the most of that life. But I would recommend against the creation of life, at least until the current historical drama has played itself out - until the singularity, if I must use that word. We are in the process of gaining new powers and learning new things, there are obvious unknowns in front of us that we are on the way to figuring out, so at least hold off until they have been figured out and we have a better idea of what reality is about, and what we can really hope for, from existence.
However, the object of this post is not to argue for my special flavor of antinatalism. It is to encourage realistic consideration of what lives and futures are like. In particular, I would encourage more "story thinking", which has been criticized in favor of "systems thinking". Every actual life is a "story", in the sense of being a sequence of events that happens to someone. If you were judging the merit of a whole possible world on the basis of the whole lives that it contained, then you would be making a decision about whether those stories ought to actually occur. The biographical life-story is the building block of such possible worlds.
So an ambitious utilitarian, who aspires to have a set of criteria for deciding among whole possible worlds, really needs to understand possible lives. They need to know what sort of lives are likely under various circumstances; they need to know the nature of the different possible lives - what it's like to be that person; they need to know what sort of bad is going to accompany the sort of good that they decide to champion. They need to have some estimation of the value of a whole life, up to and including its death.
As usual, we are talking about a depth of knowledge that may in practice be impossible to attain. But before we go calling something impossible, and settling for a lesser ambition, let's at least try to grasp what the greater ambition truly entails. To truly choose a whole world would be to make the decision of a god, about the lives and deaths that will occur in that world. The future of our world, for some time to come, will repeat the sorts of lives and deaths that have already occurred in it. So if, in your world-planning, you don't just count on completely abolishing the present world and/or replacing it with a new one that works in a completely different way, you owe it to your cause to form a judgement about the totality of what has already happened here on Earth, and you need to figure out what you approve of, what you disapprove of, whether you can have the good without the bad, and how much badness is too much.
The project of Friendly AI would benefit from being approached in a much more down-to-earth way. Discourse about the subject seems to be dominated by a set of possibilities which are given far too much credence:
- A single AI will take over the world
- A future galactic civilization depends on 21st-century Earth
- 10^n-year lifespans are at stake, for n greater than or equal to 3
- We might be living in a simulation
- Acausal deal-making
- Multiverse theory
Add up all of that, and you have a great recipe for enjoyable irrelevance. Negate every single one of those ideas, and you have an alternative set of working assumptions that are still consistent with the idea that Friendly AI matters, and which are much more suited to practical success:
- There will always be multiple centers of power
- What's at stake is, at most, the future centuries of a solar-system civilization
- No assumption that individual humans can survive even for hundreds of years, or that they would want to
- Assume that the visible world is the real world
- Assume that life and intelligence are about causal interaction
- Assume that the single visible world is the only world we affect or have reason to care about
The simplest reason to care about Friendly AI is that we are going to be coexisting with AI, and so we should want it to be something we can live with. I don't see that anything important would be lost by strongly foregrounding the second set of assumptions, and treating the first set of possibilities just as possibilities, rather than as the working hypothesis about reality.
This article should really be called "Patching the argumentative flaw in the Sequences created by the Quantum Physics Sequence".
There's only one big thing wrong with that Sequence: the central factual claim is wrong. I don't mean the claim that the Many Worlds interpretation is correct; I mean the claim that the Many Worlds interpretation is obviously correct. I don't agree with the ontological claim either, but I especially don't agree with the epistemological claim. It's a strawman which reduces the quantum debate to Everett versus Bohr - well, it's not really Bohr, since Bohr didn't believe wavefunctions were physical entities. Everett versus Collapse, then.
I've complained about this from the beginning, simply because I've also studied the topic and profoundly disagree with Eliezer's assessment. What I would like to see discussed on this occasion is not the physics, but rather how to patch the arguments in the Sequences that depend on this wrong sub-argument. To my eyes, this is a highly visible flaw, but it's not a deep one. It's a detail, a bug. Surely it affects nothing of substance.
However, before I proceed, I'd better back up my criticism. So: consider the existence of single-world retrocausal interpretations of quantum mechanics, such as John Cramer's transactional interpretation, which is descended from Wheeler-Feynman absorber theory. There are no superpositions, only causal chains running forward in time and backward in time. The calculus of complex-valued probability amplitudes is supposed to arise from this.
The existence of the retrocausal tradition already shows that the debate has been represented incorrectly; it should at least be Everett versus Bohr versus Cramer. I would also argue that when you look at the details, many-worlds has no discernible edge over single-world retrocausality:
- Relativity isn't an issue for the transactional interpretation: causality forwards and causality backwards are both local; it's the existence of loops in time that creates the appearance of nonlocality.
- Retrocausal interpretations don't have an exact derivation of the Born rule, but neither does many-worlds.
- Many-worlds finds hope of such a derivation in a property of the quantum formalism: the resemblance of density matrix entries to probabilities. But single-world retrocausality finds such hope too: the Born probabilities can be obtained from the product of ψ with ψ*, its complex conjugate, and ψ* is the time reverse of ψ.
- Loops in time just fundamentally bug some people, but splitting worlds have the same effect on others.
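The point about the product of ψ with its conjugate can be written out explicitly. (The notation here is standard quantum mechanics, not something from the Sequence itself.) The Born probability density is:

```latex
% Born rule: probability density as the product of psi and its complex conjugate
P(x) \;=\; \psi^*(x)\,\psi(x) \;=\; |\psi(x)|^2
```

Under time reversal, ψ(x,t) maps to ψ*(x,−t), which is the sense in which ψ* can be read as "the time reverse of ψ" in retrocausal accounts: the probability formula then looks like a meeting of a forward-evolving and a backward-evolving factor.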
I am not especially an advocate of retrocausal interpretations. They are among the possibilities; they deserve consideration and they get it. Retrocausality may or may not be an element of the real explanation of why quantum mechanics works. Progress towards the discovery of the truth requires exploration on many fronts; that's happening, and we'll get there eventually. I have focused on retrocausal interpretations here just because they offer the clearest evidence that the big picture offered by the Sequence is wrong.
It's hopeless to suggest rewriting the Sequence; I don't think that would be a good use of anyone's time. But what I would like to have is a clear idea of the role that "the winner is ... Many Worlds!" plays in the overall flow of argument, in the great meta-sequence that is Less Wrong's foundational text; and I would also like a clear idea of how to patch the argument, so that it routes around this flaw.
The wiki states that "Cleaning up the old confusion about QM is used to introduce basic issues in rationality (such as the technical version of Occam's Razor), epistemology, reductionism, naturalism, and philosophy of science." So there we have it - a synopsis of the function that this Sequence is supposed to perform. Perhaps we need a working group that will identify each of the individual arguments, and come up with a substitute for each one.
Very soon, Eliezer is supposed to start posting a new sequence, on "Open Problems in Friendly AI". After several years in which its activities were dominated by the topic of human rationality, this ought to mark the beginning of a new phase for the Singularity Institute, one in which it is visibly working on artificial intelligence once again. If everything comes together, then it will now be a straight line from here to the end.
I foresee that, once the new sequence gets going, it won't be that easy to question the framework in terms of which the problems are posed. So I consider this my last opportunity for some time, to set out an alternative big picture. It's a framework in which all those rigorous mathematical and computational issues still need to be investigated, so a lot of "orthodox" ideas about Friendly AI should carry across. But the context is different, and it makes a difference.
Begin with the really big picture. What would it take to produce a friendly singularity? You need to find the true ontology, find the true morality, and win the intelligence race. For example, if your Friendly AI was to be an expected utility maximizer, it would need to model the world correctly ("true ontology"), value the world correctly ("true morality"), and it would need to outsmart its opponents ("win the intelligence race").
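The expected-utility framing in the previous paragraph can be made concrete with a minimal sketch. Everything here - the action set, the outcome model, and the utility function - is an illustrative placeholder of my own, not anything SI has specified:

```python
# Minimal expected-utility maximizer: pick the action whose
# probability-weighted utility over possible outcomes is highest.

def expected_utility(action, model, utility):
    """model(action) -> {outcome: probability}; utility(outcome) -> float."""
    return sum(p * utility(o) for o, p in model(action).items())

def best_action(actions, model, utility):
    """Return the action with maximal expected utility."""
    return max(actions, key=lambda a: expected_utility(a, model, utility))

# Illustrative toy world: two actions with different outcome distributions.
model = lambda a: {"good": 0.8, "bad": 0.2} if a == "cautious" else {"good": 0.5, "bad": 0.5}
utility = lambda o: 1.0 if o == "good" else -1.0

print(best_action(["cautious", "reckless"], model, utility))  # -> cautious
```

The three requirements map directly onto the three ingredients: `model` is the ontology, `utility` is the morality, and computing the argmax faster and better than rivals is the intelligence race.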
Now let's consider how SI will approach these goals.
The evidence says that the working ontological hypothesis of SI-associated researchers will be timeless many-worlds quantum mechanics, possibly embedded in a "Tegmark Level IV multiverse", with the auxiliary hypothesis that algorithms can "feel like something from inside" and that this is what conscious experience is.
The true morality is to be found by understanding the true decision procedure employed by human beings, and idealizing it according to criteria implicit in that procedure. That is, one would seek to understand conceptually the physical and cognitive causation at work in concrete human choices, both conscious and unconscious, with the expectation that there will be a crisp, complex, and specific answer to the question "why and how do humans make the choices that they do?" Undoubtedly there would be some biological variation, and there would also be significant elements of the "human decision procedure", as instantiated in any specific individual, which are set by experience and by culture, rather than by genetics. Nonetheless one expects that there is something like a specific algorithm or algorithm-template here, which is part of the standard Homo sapiens cognitive package and biological design; just another anatomical feature, particular to our species.
Having reconstructed this algorithm via scientific analysis of the human genome, brain, and behavior, one would then idealize it using its own criteria. This algorithm defines the de-facto value system that human beings employ, but that is not necessarily the value system they would wish to employ; nonetheless, human self-dissatisfaction also arises from the use of this algorithm to judge ourselves. So it contains the seeds of its own improvement. The value system of a Friendly AI is to be obtained from the recursive self-improvement of the natural human decision procedure.
Finally, this is all for naught if seriously unfriendly AI appears first. It isn't good enough just to have the right goals; you must be able to carry them out. In the global race towards artificial general intelligence, SI might hope to "win" either by being the first to achieve AGI, or by having its prescriptions adopted by those who do first achieve AGI. They have some in-house competence regarding models of universal AI like AIXI, and they have many contacts in the world of AGI research, so they're at least engaged with this aspect of the problem.
Upon examining this tentative reconstruction of SI's game-plan, I find I have two major reservations. The big one, and the one most difficult to convey, concerns the ontological assumptions. In second place is what I see as an undue emphasis on the idea of outsourcing the methodological and design problems of FAI research to uploaded researchers and/or a proto-FAI which is simulating or modeling human researchers. This is supposed to be a way to finesse philosophical difficulties like "what is consciousness anyway"; you just simulate some humans until they agree that they have solved the problem. The reasoning goes that if the simulation is good enough, it will be just as good as if ordinary non-simulated humans solved it.
I also used to have a third major criticism, that the big SI focus on rationality outreach was a mistake; but it brought in a lot of new people, and in any case that phase is ending, with the creation of CFAR, a separate organization. So we are down to two basic criticisms.
First, "ontology". I do not think that SI intends to just program its AI with an a priori belief in the Everett multiverse, for two reasons. First, like anyone else, their ventures into AI will surely begin with programs that work within very limited and more down-to-earth ontological domains. Second, at least some of the AI's world-model ought to be obtained rationally. Scientific theories are supposed to be rationally justified, e.g. by their capacity to make successful predictions, and one would prefer that the AI's ontology results from the employment of its epistemology, rather than just being an axiom; not least because we want it to be able to question that ontology, should the evidence begin to count against it.
For this reason, although I have campaigned against many-worlds dogmatism on this site for several years, I'm not especially concerned about the possibility of SI producing an AI that is "dogmatic" in this way. For an AI to independently assess the merits of rival physical theories, the theories would need to be expressed with much more precision than they have been in LW's debates, and the disagreements about which theory is rationally favored would be replaced with objectively resolvable choices among exactly specified models.
The real problem, which is not just SI's problem, but a chronic and worsening problem of intellectual culture in the era of mathematically formalized science, is a dwindling of the ontological options to materialism, platonism, or an unstable combination of the two, and a similar restriction of epistemology to computation.
Any assertion that we need an ontology beyond materialism (or physicalism or naturalism) is liable to be immediately rejected by this audience, so I shall immediately explain what I mean. It's just the usual problem of "qualia". There are qualities which are part of reality - we know this because they are part of experience, and experience is part of reality - but which are not part of our physical description of reality. The problematic "belief in materialism" is actually the belief in the completeness of current materialist ontology, a belief which prevents people from seeing any need to consider radical or exotic solutions to the qualia problem. There is every reason to think that the world-picture arising from a correct solution to that problem will still be one in which you have "things with states" causally interacting with other "things with states", and a sensible materialist shouldn't find that objectionable.
What I mean by platonism, is an ontology which reifies mathematical or computational abstractions, and says that they are the stuff of reality. Thus assertions that reality is a computer program, or a Hilbert space. Once again, the qualia are absent; but in this case, instead of the deficient ontology being based on supposing that there is nothing but particles, it's based on supposing that there is nothing but the intellectual constructs used to model the world.
Although the abstract concept of a computer program (the abstractly conceived state machine which it instantiates) does not contain qualia, people often treat programs as having mind-like qualities, especially by imbuing them with semantics - the states of the program are conceived to be "about" something, just like thoughts are. And thus computation has been the way in which materialism has tried to restore the mind to a place in its ontology. This is the unstable combination of materialism and platonism to which I referred. It's unstable because it's not a real solution, though it can live unexamined for a long time in a person's belief system.
An ontology which genuinely contains qualia will nonetheless still contain "things with states" undergoing state transitions, so there will be state machines, and consequently, computational concepts will still be valid, they will still have a place in the description of reality. But the computational description is an abstraction; the ontological essence of the state plays no part in this description; only its causal role in the network of possible states matters for computation. The attempt to make computation the foundation of an ontology of mind is therefore proceeding in the wrong direction.
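The claim that a computational description captures only causal role, never the intrinsic nature of a state, can be illustrated with a sketch of my own (not from the original argument): two state machines whose states carry entirely different labels are the same machine, formally, whenever some relabeling preserves the transition structure.

```python
# Two state machines are computationally identical if a bijection of their
# states preserves every transition. The "intrinsic nature" of a state
# (its label, its physical realization) plays no role in this description.

def isomorphic(trans_a, trans_b, mapping):
    """trans_*: {state: {input_symbol: next_state}}; mapping: states of A -> states of B."""
    for state, edges in trans_a.items():
        for symbol, nxt in edges.items():
            if trans_b.get(mapping[state], {}).get(symbol) != mapping[nxt]:
                return False
    return True

# Same transition structure, entirely different labels:
machine_red  = {"calm": {"tick": "excited"}, "excited": {"tick": "calm"}}
machine_blue = {"s0":   {"tick": "s1"},      "s1":      {"tick": "s0"}}

print(isomorphic(machine_red, machine_blue, {"calm": "s0", "excited": "s1"}))  # -> True
```

Whatever "calm" and "excited" are like in themselves is invisible to this formalism; only the network of transitions matters, which is exactly why a purely computational ontology of mind leaves the qualia out.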
But here we run up against the hazards of computational epistemology, which is playing such a central role in artificial intelligence. Computational epistemology is good at identifying the minimal state machine which could have produced the data. But it cannot by itself tell you what those states are "like". It can only say that X was probably caused by a Y that was itself caused by Z.
Among the properties of human consciousness are knowledge that something exists, knowledge that consciousness exists, and a long string of other facts about the nature of what we experience. Even if an AI scientist employing a computational epistemology managed to produce a model of the world which correctly identified the causal relations between consciousness, its knowledge, and the objects of its knowledge, the AI scientist would not know that its X, Y, and Z refer to, say, "knowledge of existence", "experience of existence", and "existence". The same might be said of any successful analysis of qualia, knowledge of qualia, and how they fit into neurophysical causality.
It would be up to human beings - for example, the AI's programmers and handlers - to ensure that entities in the AI's causal model were given appropriate significance. And here we approach the second big problem, the enthusiasm for outsourcing the solution of hard problems of FAI design to the AI and/or to simulated human beings. The latter is a somewhat impractical idea anyway, but here I want to highlight the risk that the AI's designers will have false ontological beliefs about the nature of mind, which are then implemented a priori in the AI. That strikes me as far more likely than implanting a wrong a priori about physics; computational epistemology can discriminate usefully between different mathematical models of physics, because it can judge one state machine model as better than another, and current physical ontology is essentially one of interacting state machines. But as I have argued, not only must the true ontology be deeper than state-machine materialism, but there is no way for an AI employing computational epistemology to bootstrap to a deeper ontology.
In a phrase: to use computational epistemology is to commit to state-machine materialism as your a priori ontology. And the problem with state-machine materialism is not that it models the world in terms of causal interactions between things-with-states; the problem is that it can't go any deeper than that, yet apparently we can. Something about the ontological constitution of consciousness makes it possible for us to experience existence, to have the concept of existence, to know that we are experiencing existence, and similarly for the experience of color, time, and all those other aspects of being that fit so uncomfortably into our scientific ontology.
It must be that the true epistemology, for a conscious being, is something more than computational epistemology. And maybe an AI can't bootstrap its way to knowing this expanded epistemology - because an AI doesn't really know or experience anything, only a consciousness, whether natural or artificial, does those things - but maybe a human being can. My own investigations suggest that the tradition of thought which made the most progress in this direction was the philosophical school known as transcendental phenomenology. But transcendental phenomenology is very unfashionable now, precisely because of a priori materialism. People don't see what "categorial intuition" or "adumbrations of givenness" or any of the other weird phenomenological concepts could possibly mean for an evolved Bayesian neural network; and they're right, there is no connection. But the idea that a human being is a state machine running on a distributed neural computation is just a hypothesis, and I would argue that it is a hypothesis in contradiction with so much of the phenomenological data that we really ought to look for a more sophisticated refinement of the idea. Fortunately, 21st-century physics, if not yet neurobiology, can provide alternative hypotheses in which complexity of state originates from something other than concatenation of parts - for example, from entanglement, or from topological structures in a field. In such ideas I believe we see a glimpse of the true ontology of mind, one which from the inside resembles the ontology of transcendental phenomenology; which in its mathematical, formal representation may involve structures like iterated Clifford algebras; and which in its biophysical context would appear to be describing a mass of entangled electrons in that hypothetical sweet spot, somewhere in the brain, where there's a mechanism to protect against decoherence.
Of course this is why I've talked about "monads" in the past, but my objective here is not to promote neo-monadology, that's something I need to take up with neuroscientists and biophysicists and quantum foundations people. What I wish to do here is to argue against the completeness of computational epistemology, and to caution against the rejection of phenomenological data just because it conflicts with state-machine materialism or computational epistemology. This is an argument and a warning that should be meaningful for anyone trying to make sense of their existence in the scientific cosmos, but it has a special significance for this arcane and idealistic enterprise called "friendly AI". My message for friendly AI researchers is not that computational epistemology is invalid, or that it's wrong to think about the mind as a state machine, just that all that isn't the full story. A monadic mind would be a state machine, but ontologically it would be different from the same state machine running on a network of a billion monads. You need to do the impossible one more time, and make your plans bearing in mind that the true ontology is something more than your current intellectual tools allow you to represent.
Celia Green is a figure who should interest some LW readers. If you can imagine Eliezer, not as an A.I. futurist in 2000s America, but as a parapsychologist in 1960s Britain - she must have been a little like that. She founded her own research institute in her mid-20s, invented psychological theories meant to explain why the human race was walking around resigned to mortality and ignorance, felt that her peers (who got all the research money) were doing everything wrong... I would say that her two outstanding books are The Human Evasion and Advice to Clever Children. The first book, while still very obscure, has slowly acquired a fanbase online; but the second book remains thoroughly unknown.
For a synopsis of what the books are about, I think something I wrote in 1993 (I've been promoting her work on the Internet for years) remains reasonable. They contain an analysis of the alleged deficiencies and hidden motivations of normal human psychology, a description of an alternative outlook, and an examination of various topics from that new perspective. There is some similarity to the rationalist ideal developed in the Sequences here, in that her alternative involves existential urgency, deep respect for uncertainty, and superhuman aspiration.
There are also prominent differences. Green's starting point is not Bayesian calculation, it's Humean skepticism. Green would agree that one should aspire to "think like reality", but for her this would mean, above all, being mindful of "total uncertainty". It's a fact that I don't know what comes next, that I don't know the true nature of reality, that I don't know what's possible if I try; I may have habitual opinions about these matters, but a moment's honest reflection shows that none of these opinions are knowledge in any genuine sense; even if they are correct, I don't know them to be correct. So if I am interested in thinking like reality, I can begin by acknowledging the radical uncertainty of my situation. I exist, I don't know why, I don't know what I am, I don't know what the world is or what it has planned for me. I may have my ideas, but I should be able to see them as ideas and hold them apart from the unknown reality.
If you are like me, you will enjoy the outlook of open-ended striving that Green develops in this intellectual context, but you will be jarred by her account of ordinary, non-striving psychology. Her answer to the question, why does the human race have such petty interests and limited ambitions, is that it is sunk in an orgy of mutual hatred, mostly disguised, and resulting from an attempt to evade the psychology of striving. More precisely, to be a finite human being is to be in a desperate and frustrating situation; and people attempt to solve this problem, not by overcoming their limitations, but by suppressing their reactions to the situation. Other people are central to the resulting psychological maneuvers. They are a way for you to distract yourself from your own situation, and they are a safe target if the existential frustration and desperation reassert themselves.
Celia Green's psychological ideas are the product of her personal confrontation with the mysterious existential situation, and also her confrontation with an uncomprehending society. I've thought for some time that her portrayal of universal human depravity results from overestimating the potential of the average human being; that in effect she has asked herself, if I were that person, how could I possibly lead the life I see them living, and say the things I hear them saying, unless I were that twisted up inside? Nonetheless, I do think she has described an aspect of human psychology which is real and largely unexamined, and also that her advice on how to avoid the resentful turning-away from reality, and live in the uncertainty, is quite profound. One reason I'm promoting these books is in the hope that some small part of the culture at large is finally ready to digest their contents and critically assess them. People ought to be doing PhDs on the thought of Celia Green, but she's unknown in that world.
As for Celia Green herself, she's still alive and still going. She has a blog and a personal website and an organization based near Oxford. She's an "academic exile", but true to her philosophy, she hasn't compromised one iota and hopes to start her own private university. She may especially be of interest to the metaphysically inclined faction of LW readers, identified by Yvain in a recent blog post.