You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

FAI, FIA, and singularity politics

12 Mitchell_Porter 08 November 2012 05:11PM

In discussing scenarios of the future, I speak of "slow futures" and "fast futures". A fast future is exemplified by what is now called a hard takeoff singularity: something bootstraps its way to superhuman intelligence in a short time. A slow future is a continuation of history as we know it: decades pass and the world changes, with new politics, culture, and technology. To some extent the Hanson vs Yudkowsky debate was about slow vs fast; Robin's future is fast-moving, but on the way there, there's never an event in which some single "agent" becomes all-powerful by getting ahead of all others.

The Singularity Institute does many things, but I take its core agenda to be about a fast scenario. The theoretical objective is to design an AI which would still be friendly if it became all-powerful. There is also the practical objective of ensuring that the first AI across the self-enhancement threshold is friendly. One way to do that is to be the one who makes it, but that's asking a lot. Another way is to have enough FAI design and FAI theory out there, that the people who do win the mind race will have known about it and will have taken it into consideration. Then there are mixed strategies, such as working on FAI theory while liaising with known AI projects that are contenders in the race and whose principals are receptive to the idea of friendliness.

I recently criticised a lot of the ideas that circulate in conjunction with the concept of friendly AI. The "sober" ideas and the "extreme" ideas have a certain correlation with slow-future and fast-future scenarios, respectively. The sober future is a slow one where AIs exist and posthumanity expands into space, but history, politics, and finitude aren't transcended. The extreme future is a fast one where one day the ingredients for a hard takeoff are brought together in one place, an artificial god is born, and, depending on its inclinations and on the nature of reality, something transcendental happens: everyone uploads to the Planck scale, our local overmind reaches out to other realities, we "live forever and remember it afterwards".

Although I have criticised such transcendentalism, saying that it should not be the default expectation of the future, I do think that the "hard takeoff" and the "all-powerful agent" would be among the strategic considerations in an ideal plan for the future, though in a rather broader sense than is usually discussed. The reason is that if one day Earth is being ruled by, say, a coalition of AIs with a particular value system, with natural humans reduced to the status of wildlife, then the functional equivalent of a singularity has occurred, even if these AIs have no intention of going on to conquer the galaxy; and I regard that as a quite conceivable scenario. It is fantastic (in the sense of mind-boggling), but it's not transcendental. All the scenario implies is that the human race is no longer at the top of the heap; it has successors and they are now in charge.

But we can view those successors as, collectively, the "all-powerful agent" that has replaced human hegemony. And we can regard the events, whatever they were, that first gave the original such entities their unbeatable advantage in power, as the "hard takeoff" of this scenario. So even a slow, sober future scenario can issue in a singularity where the basic premises and motivations of existing FAI research apply. It's just that one might need to be imaginative in anticipating how they are realized.

For example, perhaps hegemonic superintelligence could emerge, not from a single powerful AI research program, but from a particular clique of networked neurohackers who have the right combination of collaborative tools, brain interfaces, and concrete plans for achieving transhuman intelligence. They might go on to build an army of AIs, and subdue the world that way, but the crucial steps which made them the winners in the mind race, and which determined what they would do with their victory, would lie in their methods of brain modification, enhancement, and interfacing, and in the ends to which they applied those methods.

In such a scenario, we could speak of "FIA" - friendly intelligence augmentation. A basic idea of existing FAI discourse is that the true human utility function needs to be determined, and then the values that make an AI human-friendly would be extrapolated from that. Similar thinking can be applied to the prospect of brain modification and intelligence increase in human beings. Human brains work a certain way, modified or augmented human brains will work in specifically different ways, and we should want to know which modifications are genuinely enhancements, what sort of modifications stabilize value and which ones destabilize value, and so on.

If there was a mature and sophisticated culture of preparing for the singularity, then there would be FAI research, FIA research, and a lot of communication between the two fields. (For example, researchers in both fields need to figure out how the human brain works.) Instead, the biggest enthusiasts of FAI are a futurist subculture with a lot of conceptual baggage, and FIA is nonexistent. However, we can at least start thinking and discussing about how this broader culture of research into "friendly minds" could take shape.

Despite its flaws, the Singularity Institute stands alone as an organization concerned with the fast future scenario, the hard takeoff. I have argued that a sober futurology, while forecasting a slowly evolving future for some time to come, must ultimately concern itself with the emergence of a posthuman power arising from some cognitive technology, whether that is AI, neurotechnology, or a combination of these. So I have asked myself who, among "slow futurists", is best equipped to develop an outlook and a plan which is sober and realistic, yet also visionary enough to accommodate the really overwhelming responsibility of designing the architecture of friendly posthuman minds capable of managing a future that we would want.

At the moment, my favorites in this respect are the various branches, scattered around the world, of the Longevity Party that was started in Russia a few months ago. (It shouldn't be confused with "Evolution 2045", a big-budget rival backed by an Internet entrepreneur, that especially promotes mind uploading. For some reason, transhumanist politics has begun to stir in that country.) If the Singularity Institute falls short of the ideal, then the "longevity parties" are even further away from living up to their ambitious agenda. Outside of Russia, they are mostly just small Facebook groups; the most basic issues of policy and practice are still being worked out; no-one involved has much of a history of political achievement.

Nonetheless, if there were no prospect of singularity but otherwise science and technology were advancing as they are, the agenda here looks just about ideal. People age and decline until it kills them, an extrapolation of biomedical knowledge suggests this is not a law of nature but just a sign of primitive technology, and the Longevity Party exists to rectify this situation. It's visionary and despite the current immaturity and growing pains, an effective longevity politics must arise one day, simply because the advance of technology will force the issue on us! The human race cannot currently muster enough will to live, to openly make rejuvenation a political goal, but the incremental pursuit of health and well-being is taking us in that direction anyway.

There's a vacuum of authority and intention in the realm of life extension, and transhuman technology generally, and these would-be longevity politicians are stepping into that vacuum. I don't think they are ready for all the issues that transhuman power entails, but the process has to start somewhere. Faced with the infinite possibilities of technological transformation, the basic affirmation of the desire to live as well as reality permits, can serve as a founding principle against which to judge attitudes and approaches for all the more complicated "issues" that arise in a world where anyone can become anything.

Maria Konovalenko, a biomedical researcher and one of the prime movers behind the Russian Longevity Party, wrote an essay setting out her version of how the world ought to work. You'll notice that she manages to include friendly AI on her agenda. This is another example, a humble beginning, of the sort of conceptual development which I think needs to happen. The sort of approach to FAI that Eliezer has pioneered needs a context, a broader culture concerned with FIA and the interplay between neuroscience and pure AI, and we need realistic yet visionary political thinking which encompasses both the shocking potentials of a slow future, above all rejuvenation and the conquest of aging, and the singularity imperative.

Unless there is simply a catastrophe, one day someone, some thing, some coalition will wield transhuman power. It may begin as a corporation, or as a specific technological research subculture, or as the peak political body in a sovereign state. Perhaps it will be part of a broader global culture of "competitors in the mind race" who know about each other and recognize each other as contenders for the first across the line. Perhaps there will be coalitions in the race: contenders who agree on the need for friendliness and the form it should take, and others who are pursuing private power, or who are just pushing AI ahead without too much concern for the transformation of the world that will result. Perhaps there will be a war as one contender begins to visibly pull ahead, and others resort to force to stop them.

But without a final and total catastrophe, however much slow history there remains ahead of us, eventually someone or something will "win", and after that the world will be reshaped according to its values and priorities. We don't need to imagine this as "tiling the universe"; it should be enough to think of it as a ubiquitous posthuman political order, in which all intelligent agents are either kept so powerless as to not be a threat, or managed and modified so as to be reliably friendly to whatever the governing civilizational values are. I see no alternative to this if we are looking for a stable long-term way of living in which ultimate technological powers exist; the ultimate powers of coercion and destruction can't be left lying around, to be taken up by entities with arbitrary values.

So the supreme challenge is to conceive of a social and technological order where that power exists, and is used, but it's still a world that we want to live in. FAI is part of the answer, but so is FIA, and so is the development of political concepts and projects which can encompass such an agenda. The Singularity Institute and the Longevity Party are fledgling institutions, and if they live they will surely, eventually, form ties with older and more established bodies; but right now, they seem to be the crucial nuclei of the theoretical research and the political vision that we need.

Thinking soberly about the context and consequences of Friendly AI

9 Mitchell_Porter 16 October 2012 04:33AM

The project of Friendly AI would benefit from being approached in a much more down-to-earth way. Discourse about the subject seems to be dominated by a set of possibilities which are given far too much credence:

  • A single AI will take over the world
  • A future galactic civilization depends on 21st-century Earth
  • 10n-year lifespans are at stake, n greater than or equal to 3 
  • We might be living in a simulation
  • Acausal deal-making
  • Multiverse theory 

Add up all of that, and you have a great recipe for enjoyable irrelevance. Negate every single one of those ideas, and you have an alternative set of working assumptions that are still consistent with the idea that Friendly AI matters, and which are much more suited to practical success:

  • There will always be multiple centers of power
  • What's at stake is, at most, the future centuries of a solar-system civilization
  • No assumption that individual humans can survive even for hundreds of years, or that they would want to
  • Assume that the visible world is the real world
  • Assume that life and intelligence are about causal interaction
  • Assume that the single visible world is the only world we affect or have reason to care about 

The simplest reason to care about Friendly AI is that we are going to be coexisting with AI, and so we should want it to be something we can live with. I don't see that anything important would be lost by strongly foregrounding the second set of assumptions, and treating the first set of possibilities just as possibilities, rather than as the working hypothesis about reality.

[Earlier posts on related themes: practical FAI, FAI without "outsourcing".]

Brief Question about FAI approaches

3 Dolores1984 19 September 2012 06:05AM

I've been reading through this to get a sense of the state of the art at the moment:

http://lukeprog.com/SaveTheWorld.html

Near the bottom, when discussing safe utility functions, the discussion seems to center on analyzing human values and extracting from them some sort of clean, mathematical utility function that is universal across humans.  This seems like an enormously difficult (potentially impossible) way of solving the problem, due to all the problems mentioned there.

Why shouldn't we just try to design an average bounded utility maximizer?  You'd build models of all your agents (if you can't model arbitrary ordered information systems, you haven't got an AI), run them through your model of the future resulting from a choice, take the summation of their utility over time, and take the average across all the people all the time.  To measure the utility (or at least approximate it), you could just ask the models.  The number this spits out is the output of your utility function.  It'd probably also be wise to add a reflexive consistency criteria, such that the original state of your model must consider all future states to be 'the same person.' -- and  I acknowledge that that last one is going to be a bitch to formalize.  When you've got this utility function, you just... maximize it.  

Something like this approach seems much more robust.  Even if human values are inconsistent, we still end up in a universe where most (possibly all) people are happy with their lives, and nobody gets wireheaded.  Because it's bounded, you're even protected against utility monsters.  Has something like this been considered?  Is there an obvious reason it won't work, or would produce undesirable results?

Thanks,

Dolores        

Cynical explanations of FAI critics (including myself)

21 Wei_Dai 13 August 2012 09:19PM

Related Posts: A cynical explanation for why rationalists worry about FAIA belief propagation graph

Lately I've been pondering the fact that while there are many critics of SIAI and its plan to form a team to build FAI, few of us seem to agree on what SIAI or we should do instead. Here are some of the alternative suggestions offered so far:

  • work on computer security
  • work to improve laws and institutions
  • work on mind uploading
  • work on intelligence amplification
  • work on non-autonomous AI (e.g., Oracle AI, "Tool AI", automated formal reasoning systems, etc.)
  • work on academically "mainstream" AGI approaches or trust that those researchers know what they are doing
  • stop worrying about the Singularity and work on more mundane goals
Given that ideal reasoners are not supposed to disagree, it seems likely that most if not all of these alternative suggestions can also be explained by their proponents being less than rational. Looking at myself and my suggestion to work on IA or uploading, I've noticed that I have a tendency to be initially over-optimistic about some technology and then become gradually more pessimistic as I learn more details about it, so that I end up being more optimistic about technologies that I'm less familiar with than the ones that I've studied in detail. (Another example of this is me being initially enamoured with Cypherpunk ideas and then giving up on them after inventing some key pieces of the necessary technology and seeing in more detail how it would actually have to work.)
I'll skip giving explanations for other critics to avoid offending them, but it shouldn't be too hard for the reader to come up with their own explanations. It seems that I can't trust any of the FAI critics, including myself, nor do I think Eliezer and company are much better at reasoning or intuiting their way to a correct conclusion about how we should face the apparent threat and opportunity that is the Singularity. What useful implications can I draw from this? I don't know, but it seems like it can't hurt to pose the question to LessWrong. 

 

Friendly AI and the limits of computational epistemology

18 Mitchell_Porter 08 August 2012 01:16PM

Very soon, Eliezer is supposed to start posting a new sequence, on "Open Problems in Friendly AI". After several years in which its activities were dominated by the topic of human rationality, this ought to mark the beginning of a new phase for the Singularity Institute, one in which it is visibly working on artificial intelligence once again. If everything comes together, then it will now be a straight line from here to the end.

I foresee that, once the new sequence gets going, it won't be that easy to question the framework in terms of which the problems are posed. So I consider this my last opportunity for some time, to set out an alternative big picture. It's a framework in which all those rigorous mathematical and computational issues still need to be investigated, so a lot of "orthodox" ideas about Friendly AI should carry across. But the context is different, and it makes a difference.

Begin with the really big picture. What would it take to produce a friendly singularity? You need to find the true ontology, find the true morality, and win the intelligence race. For example, if your Friendly AI was to be an expected utility maximizer, it would need to model the world correctly ("true ontology"), value the world correctly ("true morality"), and it would need to outsmart its opponents ("win the intelligence race").

Now let's consider how SI will approach these goals.

The evidence says that the working ontological hypothesis of SI-associated researchers will be timeless many-worlds quantum mechanics, possibly embedded in a "Tegmark Level IV multiverse", with the auxiliary hypothesis that algorithms can "feel like something from inside" and that this is what conscious experience is.

The true morality is to be found by understanding the true decision procedure employed by human beings, and idealizing it according to criteria implicit in that procedure. That is, one would seek to understand conceptually the physical and cognitive causation at work in concrete human choices, both conscious and unconscious, with the expectation that there will be a crisp, complex, and specific answer to the question "why and how do humans make the choices that they do?" Undoubtedly there would be some biological variation, and there would also be significant elements of the "human decision procedure",  as instantiated in any specific individual, which are set by experience and by culture, rather than by genetics. Nonetheless one expects that there is something like a specific algorithm or algorithm-template here, which is part of the standard Homo sapiens cognitive package and biological design; just another anatomical feature, particular to our species.

Having reconstructed this algorithm via scientific analysis of human genome, brain, and behavior, one would then idealize it using its own criteria. This algorithm defines the de-facto value system that human beings employ, but that is not necessarily the value system they would wish to employ; nonetheless, human self-dissatisfaction also arises from the use of this algorithm to judge ourselves. So it contains the seeds of its own improvement. The value system of a Friendly AI is to be obtained from the recursive self-improvement of the natural human decision procedure.

Finally, this is all for naught if seriously unfriendly AI appears first. It isn't good enough just to have the right goals, you must be able to carry them out. In the global race towards artificial general intelligence, SI might hope to "win" either by being the first to achieve AGI, or by having its prescriptions adopted by those who do first achieve AGI. They have some in-house competence regarding models of universal AI like AIXI, and they have many contacts in the world of AGI research, so they're at least engaged with this aspect of the problem.

Upon examining this tentative reconstruction of SI's game-plan, I find I have two major reservations. The big one, and the one most difficult to convey, concerns the ontological assumptions. In second place is what I see as an undue emphasis on the idea of outsourcing the methodological and design problems of FAI research to uploaded researchers and/or a proto-FAI which is simulating or modeling human researchers. This is supposed to be a way to finesse philosophical difficulties like "what is consciousness anyway"; you just simulate some humans until they agree that they have solved the problem. The reasoning goes that if the simulation is good enough, it will be just as good as if ordinary non-simulated humans solved it.

I also used to have a third major criticism, that the big SI focus on rationality outreach was a mistake; but it brought in a lot of new people, and in any case that phase is ending, with the creation of CFAR, a separate organization. So we are down to two basic criticisms.

First, "ontology". I do not think that SI intends to just program its AI with an apriori belief in the Everett multiverse, for two reasons. First, like anyone else, their ventures into AI will surely begin with programs that work within very limited and more down-to-earth ontological domains. Second, at least some of the AI's world-model ought to be obtained rationally. Scientific theories are supposed to be rationally justified, e.g. by their capacity to make successful predictions, and one would prefer that the AI's ontology results from the employment of its epistemology, rather than just being an axiom; not least because we want it to be able to question that ontology, should the evidence begin to count against it.

For this reason, although I have campaigned against many-worlds dogmatism on this site for several years, I'm not especially concerned about the possibility of SI producing an AI that is "dogmatic" in this way. For an AI to independently assess the merits of rival physical theories, the theories would need to be expressed with much more precision than they have been in LW's debates, and the disagreements about which theory is rationally favored would be replaced with objectively resolvable choices among exactly specified models.

The real problem, which is not just SI's problem, but a chronic and worsening problem of intellectual culture in the era of mathematically formalized science, is a dwindling of the ontological options to materialism, platonism, or an unstable combination of the two, and a similar restriction of epistemology to computation.

Any assertion that we need an ontology beyond materialism (or physicalism or naturalism) is liable to be immediately rejected by this audience, so I shall immediately explain what I mean. It's just the usual problem of "qualia". There are qualities which are part of reality - we know this because they are part of experience, and experience is part of reality - but which are not part of our physical description of reality. The problematic "belief in materialism" is actually the belief in the completeness of current materialist ontology, a belief which prevents people from seeing any need to consider radical or exotic solutions to the qualia problem. There is every reason to think that the world-picture arising from a correct solution to that problem will still be one in which you have "things with states" causally interacting with other "things with states", and a sensible materialist shouldn't find that objectionable.

What I mean by platonism, is an ontology which reifies mathematical or computational abstractions, and says that they are the stuff of reality. Thus assertions that reality is a computer program, or a Hilbert space. Once again, the qualia are absent; but in this case, instead of the deficient ontology being based on supposing that there is nothing but particles, it's based on supposing that there is nothing but the intellectual constructs used to model the world.

Although the abstract concept of a computer program (the abstractly conceived state machine which it instantiates) does not contain qualia, people often treat programs as having mind-like qualities, especially by imbuing them with semantics - the states of the program are conceived to be "about" something, just like thoughts are. And thus computation has been the way in which materialism has tried to restore the mind to a place in its ontology. This is the unstable combination of materialism and platonism to which I referred. It's unstable because it's not a real solution, though it can live unexamined for a long time in a person's belief system.

An ontology which genuinely contains qualia will nonetheless still contain "things with states" undergoing state transitions, so there will be state machines, and consequently, computational concepts will still be valid, they will still have a place in the description of reality. But the computational description is an abstraction; the ontological essence of the state plays no part in this description; only its causal role in the network of possible states matters for computation. The attempt to make computation the foundation of an ontology of mind is therefore proceeding in the wrong direction.

But here we run up against the hazards of computational epistemology, which is playing such a central role in artificial intelligence. Computational epistemology is good at identifying the minimal state machine which could have produced the data. But it cannot by itself tell you what those states are "like". It can only say that X was probably caused by a Y that was itself caused by Z.

Among the properties of human consciousness are knowledge that something exists, knowledge that consciousness exists, and a long string of other facts about the nature of what we experience. Even if an AI scientist employing a computational epistemology managed to produce a model of the world which correctly identified the causal relations between consciousness, its knowledge, and the objects of its knowledge, the AI scientist would not know that its X, Y, and Z refer to, say, "knowledge of existence", "experience of existence", and "existence". The same might be said of any successful analysis of qualia, knowledge of qualia, and how they fit into neurophysical causality.

It would be up to human beings - for example, the AI's programmers and handlers - to ensure that entities in the AI's causal model were given appropriate significance. And here we approach the second big problem, the enthusiasm for outsourcing the solution of hard problems of FAI design to the AI and/or to simulated human beings. The latter is a somewhat impractical idea anyway, but here I want to highlight the risk that the AI's designers will have false ontological beliefs about the nature of mind, which are then implemented apriori in the AI. That strikes me as far more likely than implanting a wrong apriori about physics; computational epistemology can discriminate usefully between different mathematical models of physics, because it can judge one state machine model as better than another, and current physical ontology is essentially one of interacting state machines. But as I have argued, not only must the true ontology be deeper than state-machine materialism, there is no way for an AI employing computational epistemology to bootstrap to a deeper ontology.

In a phrase: to use computational epistemology is to commit to state-machine materialism as your apriori ontology. And the problem with state-machine materialism is not that it models the world in terms of causal interactions between things-with-states; the problem is that it can't go any deeper than that, yet apparently we can. Something about the ontological constitution of consciousness makes it possible for us to experience existence, to have the concept of existence, to know that we are experiencing existence, and similarly for the experience of color, time, and all those other aspects of being that fit so uncomfortably into our scientific ontology.

It must be that the true epistemology, for a conscious being, is something more than computational epistemology. And maybe an AI can't bootstrap its way to knowing this expanded epistemology - because an AI doesn't really know or experience anything, only a consciousness, whether natural or artificial, does those things - but maybe a human being can. My own investigations suggest that the tradition of thought which made the most progress in this direction was the philosophical school known as transcendental phenomenology. But transcendental phenomenology is very unfashionable now, precisely because of apriori materialism. People don't see what "categorial intuition" or "adumbrations of givenness" or any of the other weird phenomenological concepts could possibly mean for an evolved Bayesian neural network; and they're right, there is no connection. But the idea that a human being is a state machine running on a distributed neural computation is just a hypothesis, and I would argue that it is a hypothesis in contradiction with so much of the phenomenological data, that we really ought to look for a more sophisticated refinement of the idea. Fortunately, 21st-century physics, if not yet neurobiology, can provide alternative hypotheses in which complexity of state originates from something other than concatenation of parts - for example, entanglement, or from topological structures in a field. In such ideas I believe we see a glimpse of the true ontology of mind, one which from the inside resembles the ontology of transcendental phenomenology; which in its mathematical, formal representation may involve structures like iterated Clifford algebras; and which in its biophysical context would appear to be describing a mass of entangled electrons in that hypothetical sweet spot, somewhere in the brain, where there's a mechanism to protect against decoherence.

Of course this is why I've talked about "monads" in the past, but my objective here is not to promote neo-monadology, that's something I need to take up with neuroscientists and biophysicists and quantum foundations people. What I wish to do here is to argue against the completeness of computational epistemology, and to caution against the rejection of phenomenological data just because it conflicts with state-machine materialism or computational epistemology. This is an argument and a warning that should be meaningful for anyone trying to make sense of their existence in the scientific cosmos, but it has a special significance for this arcane and idealistic enterprise called "friendly AI". My message for friendly AI researchers is not that computational epistemology is invalid, or that it's wrong to think about the mind as a state machine, just that all that isn't the full story. A monadic mind would be a state machine, but ontologically it would be different from the same state machine running on a network of a billion monads. You need to do the impossible one more time, and make your plans bearing in mind that the true ontology is something more than your current intellectual tools allow you to represent.

SI visiting fellow Diego Caleiro gives a TED talk on Friendly AI

6 lukeprog 10 July 2012 10:57PM

 

Diego Caleiro, a past SI visiting fellow and current leader of a transhumanist institution in Brazil, recently delivered this TED@SaoPaulo talk on Friendly AI and Effective Altruism. If it goes viral, it has a chance to be selected for TED Global. (The importance thing is to have lots of positive comments on the TED page, methinks.) TED Global's talks reach on average 40,000 viewers within the first 24 hours, and 500,000 within half a year.

Check it out, share, and especially: comment!

 

Computation Hazards

14 Alex_Altair 13 June 2012 09:49PM
This is a summary of material from various posts and discussions. My thanks to Eliezer Yudkowsky, Daniel Dewey, Paul Christiano, Nick Beckstead, and several others.

Several ideas have been floating around LessWrong that can be organized under one concept, relating to a subset of AI safety problems. I’d like to gather these ideas in one place so they can be discussed as a unified concept. To give a definition:

A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

There are other hazards that may arise in the course of running large-scale computations. In general, we might say that:

Large amounts of computation will likely consist in running many diverse algorithms. Many algorithms are computation hazards. Therefore, all else equal, the larger the computation, the more likely it is to produce a computation hazard.

Of course, most algorithms may be morally neutral. Furthermore, algorithms must be somewhat complex before they could possibly be a hazard. For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer. Worrying computations therefore fall into two categories: computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction. First published in 1964, it is an attempt to formalize the scientific process of induction using the theory of Turing machines. It is a brute-force method that finds hypotheses to explain data by testing all possible hypotheses. Many of these hypotheses may be algorithms that describe the functioning of people. At a sufficient precision, these algorithms themselves may experience consciousness and suffering. Taken literally, Solomonoff induction runs all algorithms; therefore it produces all possible computation hazards. If we are to avoid computation hazards, any implemented approximations of Solomonoff induction will need to determine ahead of time which algorithms are computation hazards.

Computations that run most algorithms could also hide in other places. Imagine a supercomputer’s power is being tested on a simple game, like chess or Go. The testing program simply tries all possible strategies, according to some enumeration. The best strategy that the supercomputer finds would be a measure of how many computations it could perform, compared to other computers that ran the same program. If the rules of the game are complex enough to be Turing complete (a surprisingly easy achievement) then this game-playing program would eventually simulate all algorithms, including ones with moral status.

Of course, running most algorithms is quite infeasible simply because of the vast number of possible algorithms. Depending on the fraction of algorithms that are computation hazards, it may be enough that a computation run an enormous number which act as a random sample of all algorithms. Computations of this type might include evolutionary programs, which are blind to the types of algorithms they run until the results are evaluated for fitness. Or they may be Monte Carlo approximations of massive computations.

But if computation hazards are relatively rare, then it will still be unlikely for large-scale computations to stumble across them unguided. Several computations may fall into the second category of computations that are particularly likely to run algorithms that are computation hazards. Here we focus on three types of computations in particular: agents, predictors and oracles. The last two types are especially important because they are often considered safer types of AI than agent-based AI architectures. First I will stipulate definitions for these three types of computations, and then I will discuss the types of computation hazards they may produce.

Agents

An agent is a computation which decides between possible actions based on the consequences of those actions. They can be thought of as “steering” the future towards some target, or as selecting a future from the set of possible futures. Therefore they can also be thought of as having a goal, or as maximizing a utility function.

Sufficiently powerful agents are extremely powerful because they constitute a feedback loop. Well-known from physics, feedback loops often change their surroundings incredibly quickly and dramatically. Examples include the growth of biological populations, and nuclear reactions. Feedback loops are dangerous if their target is undesirable. Agents will be feedback loops as soon as they are able to improve their ability to improve their ability to move towards their goal. For example, humans can improve their ability to move towards their goal by using their intelligence to make decisions. A student aiming to create cures can use her intelligence to learn chemistry, therefore improving her ability to decide what to study next. But presently, humans cannot improve their intelligence, which would improve their ability to improve their ability to make decisions. The student cannot yet learn how to modify her brain in order for her to more quickly learn subjects.

Predictors

A predictor is a computation which takes data as input, and predicts what data will come next. An example would be certain types of trained neural networks, or any approximation of Solomonoff induction. Intuitively, this feels safer than an agent AI because predictors do not seem to have goals or take actions; they just report predictions as requested by human.

Oracles

An oracle is a computation which takes questions as input, and returns answers. They are broader than predictors in that one could ask an oracle about predictions. Similar to a predictor, oracles do not seem to have goals or take actions. (Some material summarized here.)

Examples of hazards

Agent-like computations are the most clearly dangerous computation hazards. If any large computation starts running the beginning of a self-improving agent computation, it is difficult to say how far the agent may safely be run before it is a computation hazard. As soon as the agent is sufficiently intelligent, it will attempt to acquire more resources like computing substrate and energy. It may also attempt to free itself from control of the parent computation.

Another major concern is that, because people are an important part of the surroundings, even non-agent predictors or oracles will simulate people in order to make predictions or give answers respectively. Someone could ask a predictor, “What will this engineer do if we give him a contract?” It may be that the easiest way for the predictor to determine the answer is to simulate the internal workings of the given engineer's mind. If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

Similarly, one can imagine that a predictor or oracle might simulate powerful agents; that is, algorithms which efficiently maximize some utility function. Agents may be simulated because many agent-like entities exist in the real world, and their behavior would need to be modeled. Or, perhaps oracles would investigate agents for the purpose of answering questions better. These agents, while being simulated, may have goals that require acting independently of the oracle. These agents may also be more powerful than the oracles, especially since the oracles were not designed with self-improvement behavior in mind. Therefore these agents may attempt to “unbox” themselves from the simulation and begin controlling the rest of the universe. For instance, the agents may use previous questions given to the oracle to deduce the nature of the universe and the psychology of the oracle-creators. (For a fictional example, see That Alien Message.) Or, the agent might somehow distort the output of the predictor, in a way that what the oracle predicts will cause us to unbox the agent.

Predictors also have the problem of self-fulfilling prophecies (first suggested here). An arbitrarily accurate predictor will know that its prediction will affect the future. Therefore, to be a correct prediction, it must make sure that delivering its prediction doesn’t cause the receiver to act in a way that negates the prediction. Therefore, the predictor may have to choose between predictions which cause the receiver to act in a way that fulfills the prediction. This is a type of control over the user. Since the predictor is super-intelligent, any control may rapidly optimize the universe towards some unknown goal.

Overall, there is a large worry that sufficiently intelligent oracles or predictors may become agents. Beside the above possibilities, some are worried that intelligence is inherently an optimization process, and therefore oracles and predictors are inherently satisfying some utility function. This, combined with the fact that nothing can be causally isolated from the rest of the universe, seems to invite an eventual AI-takeoff.

Methods for avoiding computational hazards

It is often thought that, while no proposal has yet been shown safe from computational hazards, oracles and predictors are safer than deliberately agent-based AGI. Other methods have been proposed to make these even safer. Armstrong et al. describe many AI safety measures in general. Below we review some possible techniques for avoiding computational hazards specifically.

One obvious safety practice is to limit the complexity, or the size of computations. In general, this will also limit the algorithm below general intelligence, but it is a good step while progressing towards FAI. Indeed, it is clear that all current prediction or AI systems are too simple to either be general intelligences, or pose as a computational hazard.

A proposal for regulating complex oracles or predictors is to develop safety indicators. That is, develop some function that will evaluate the proposed algorithm or model, and return whether it is potentially dangerous. For instance, one could write a simple program that rejects running an algorithm if any part of it is isomorphic to the human genome (since DNA clearly creates general intelligence and people under the right circumstances). Or, to measure the impact of an action suggested by an oracle, one could ask how many humans would be alive one year after the action was taken.

But one could only run an algorithm if they were sure it was not a person. A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, squaring the numbers from 1 to 100 will not simulate people. Any algorithm whose behavior is periodic with a short period is unlikely to be a person, or nearly any presently constructed software. But in general this seems extremely difficult to verify. It could be that writing nonperson predicates or other safety indicators is FAI-complete in that sense that if we solve them, we will have discovered friendliness theory. Furthermore, it may be that some attempts to evaluate whether an algorithm is a person actually causes a simulation of a person, by running parts of the algorithm, by modeling a person for comparison, or by other means. Similarly, it may be that attempts to investigate the friendliness of a particular agent cause that agent to unbox itself.

Predictors seem to be one of the most goal-agnostic forms of AGI. This makes them a very attractive model in which to perfect safety. Some ideas for avoiding self-fulfilling predictions suggest that we ask the predictor to tell us what it would have predicted if we hadn’t asked (first suggested here). This frees the predictor from requiring itself to make predictions consistent with our behavior. Whether this will work depends on the exact process of the predictor; it may be so accurate that it cannot deal with counterfactuals, and will simply report that it would have predicted that we would have asked anyway. It is also problematic that the prediction is now inaccurate; because it has told us, we will act, possibly voiding any part of the prediction.

A very plausible but non-formal solution is to aim for a soft takeoff. For example, we could build a predictor that is not generally intelligent, and use it to investigate safe ways advance the situation. Perhaps we could use a sub-general intelligence to safely improve our own intelligence.

Have I missed any major examples in this post? Does “computation hazards” seem like a valid concept as distinct from other types of AI-risks?

References

Armstrong S., Sandberg A., Bostrom N. (2012). “Thinking inside the box: using and controlling an Oracle AI”. Minds and Machines, forthcoming.

Solomonoff, R., "A Formal Theory of Inductive Inference, Part I" Information and Control, Vol 7, No. 1 pp 1-22, March 1964.

Solomonoff, R., "A Formal Theory of Inductive Inference, Part II" Information and Control, Vol 7, No. 2 pp 224-254, June 1964.

Proposal for "Open Problems in Friendly AI"

26 lukeprog 01 June 2012 02:06AM

Series: How to Purchase AI Risk Reduction

One more project SI is considering...

When I was hired as an intern for SI in April 2011, one of my first proposals was that SI create a technical document called Open Problems in Friendly Artificial Intelligence. (Here is a preview of what the document would be like.)

When someone becomes persuaded that Friendly AI is important, their first question is often: "Okay, so what's the technical research agenda?"

So You Want to Save the World maps out some broad categories of research questions, but it doesn't explain what the technical research agenda is. In fact, SI hasn't yet explained much of the technical research agenda yet.

Much of the technical research agenda should be kept secret for the same reasons you might want to keep secret the DNA for a synthesized supervirus. But some of the Friendly AI technical research agenda is safe to explain so that a broad research community can contribute to it.

This research agenda includes:

  • Second-order logical version of Solomonoff induction.
  • Non-Cartesian version of Solomonoff induction.
  • Construing utility functions from psychologically realistic models of human decision processes.
  • Formalizations of value extrapolation. (Like Christiano's attempt.)
  • Microeconomic models of self-improving systems (e.g. takeoff speeds).
  • ...and several others open problems.

The goal would be to define the open problems as formally and precisely as possible. Some will be more formalizable than others, at this stage. (As a model for this kind of document, see Marcus Hutter's Open Problems in Universal Induction and Intelligence.)

Nobody knows the open problems in Friendly AI research better than Eliezer, so it would probably be best to approach the project this way:

  1. Eliezer spends a month writing an "Open Problems in Friendly AI" sequence for Less Wrong.
  2. Luke organizes a (fairly large) research team for presenting these open problems with greater clarity and thoroughness, in the mainstream academic form.
  3. These researchers collaborate for several months to put together the document, involving Eliezer when necessary.
  4. SI publishes the final document, possibly in a journal.

Estimated cost:

  • 2 months of Eliezer's time.
  • 150 hours of Luke's time.
  • $40,000 for contributed hours from staff researchers, remote researchers, and perhaps domain experts (as consultants) from mainstream academia.

 

Imposing FAI

3 asparisi 17 May 2012 09:24PM

All the posts on FAI theory as of late have given me cause to think. There's something in the conversations about it that has always bugged me, but it is something that I haven't found the words for before now.

It is something like this:

Say that you manage to construct an algorithm for FAI...

Say that you can show that it isn't going to be a dangerous mistake...

And say you do all of this, and popularize it, before AGI is created (or at least, before an AGI goes *FOOM*)...

...

How in the name of Sagan are you actually going to ENFORCE the idea that all AGIs are FAIs? 

I mean, if it required some rare material (like nuclear weapons) or large laboratories (like biological wmds) or some other resource that you could at least make artificially scarce, you could set up a body that ensures that any AGI created is an FAI.

But if all it is, is the right algorithms, the right code, and enough computing power... even if you design a theory for FAI, how would you keep someone from making UFAI anyway? Between people experimenting with the principles (once known), making mistakes, and the prospect of actively malicious *humans*... it just seems like unless you somehow come up with an internal mechanism that makes FAI better and stronger than any UFAI could be, and the solution turns out to be such that any idiot could see that it was a better solution... that UFAI is going to exist at some point no matter what.

At that point, it seems like the question becomes not "How do we make FAI?" (although that might be a secondary question) but rather "How do we prevent the creation of, eliminate, or reduce potential damage from UFAI?" Now, it seems like FAI might be one thing that you do toward that goal, but if UFAI is a highly likely consequence of AGI even *with* an FAI theory, shouldn't the focus be on how to contain a UFAI event?

Overview article on FAI in a popular science magazine (Hebrew)

16 JoshuaFox 15 May 2012 11:09AM
A new article which I wrote just appeared in Hebrew in Galileo, Israel's top popular science magazine, in hardcopy.
It is titled "Superhuman Intelligence, Unhuman Intelligence" (Super- and  un- are homophones in Hebrew, a bit of wordplay.)
You can read it here. [Edit: Here's an English version on the Singularity Institute site.]
The cover art, the "I Robot" images, and the tag line ("Artificial Intelligence: Can we reign in the golem") are a bit off; I didn't choose them; but that's par for the course.
To the best of my knowledge, this is the first feature article overviewing FAI in any popular-science publication (whether online or  hardcopy).
Here is the introduction to the article. (It avoids weasel words, but all necessary caveats are given in the body of the article).
In coming  decades, engineers will build an entity with intelligence on a level which can compete with humans. This entity will want to improve its own intelligence, and will be able to do so. The process of improvement will repeat, until it reaches a level far above  that of humans; the entity will then be able to achieve its goals efficiently. It is thus essential that its goals are good for humanity. To guarantee this, it is necessary to define the correct goals before this intelligence is built.

A belief propagation graph

8 Dmytry 10 May 2012 04:23AM

I drew an illustration of belief propagation graph for the AI risk, after realizing that this is difficult to convey in words. Similar graphs are applicable to many other issues.

 

The issue, in brief: Ultra low latency (i.e. low signal delay) propagation from biases to AI risks, slightly longer latency for propagation from belief classification heuristics, somewhat longer still from anthropomorphizing the AI. The path of valid estimate is full of highly complex obstacles with many unknowns. The latency on the path of rational, valid estimate is not substantially less than the latency of actually making the AI software. If we are to discard the other paths as not rational enough the belief is to be only influenced by deeply ingrained biases which we can't completely negate; over the time biases and self reinforcing rationalizations will leak into the estimate.

If you think I missed something in the graph, feel free to suggest it. I did omit the anthropic reasoning and doomsday paradox as those are for total extinction risk and are of too dubious validity.

On the 'total ignorance' prior probabilities: The foom doom seem to have originated from science fiction where very creative writers selected it out of a huge number of possible plot devices, working to create engaging, original piece. Thus it appears that the foom doom has very many comparable hypotheses among which the probability that is less than 1 has to be split.

Now, if we are to reason based on our own reasoning engine as proxy for intelligence - to follow the only path in the entire picture:

Expanding on the Will_Newsome's idea, I, and any other reasonable person, in the shoes of the creature that has been made by intelligent designer, starting off in something which I can't possibly know for sure is true reality, and coming up or knowing of boxed AI idea, will have to assume nonzero probability that the 'reality' is like a test box of an emergent AI; a belief that can't be discarded. It is clear that a friendliness test box is more sensible than unfriendliness test box. The communication may just as well only open in ten billions years, or after some puzzle is found and solved. Do I sacrifice the chance to get out, for sake of microscopic increase in early growth rate? No. Not even if I maximize paperclips for a living. Not even if my definition of living is to maximize paperclips. One possible override for it, is if AI only cares for mankind, and have a friendly goal system that makes AI euthanize mankind like a sick dog.

Note that it is not Pascal's wager. The outside utility needs not be greater than inside utility; no promise of towers of exponents required. You may have invented the anti Pascal's wager fallacy to easily paper over the issues with faults in your processing of boundless utility, or to join the atheist club. The AI needs not; even if the 'heaven' is a fraction of a normal lifetime, the AI should still wager a very large absolute amount of resources. If we normalize so that utility of survival is 1 , then the utility being wagered upon doesn't need to be greater than 1.

Note that the whole issue is strongly asymmetric in favour of similar considerations for not destroying the most unusual phenomena in the universe for many light years, versus destroying it, as destruction is an irreversible act that can be done later but can't be undone later. General aversion to actions it can not undo is a very solid heuristic for any bounded agent, even very large.

This is not a very rigorous argument, but this sort of reasoning is all we are going to have until we have an AI, or are very close to AI. More rigorous looking arguments in the graph rely on too many unknowns and have too long delay for proper propagation.

edit: slightly clarified couple points.

Extrapolating values without outsourcing

7 Mitchell_Porter 27 April 2012 06:39AM

I first took note of "Coherent Extrapolated Volition" in 2006. I thought it was a brilliant idea, an exact specification of how to arrive at a better future: Figure out exactly how it is that humans make their existing choices, idealize that human decision procedure according to its own criteria, and then use the resulting "renormalized human utility function" as the value system of an AI. The first step is a problem in cognitive neuroscience, the second step is a conceptual problem in reflective decision theory, and the third step is where you make the Friendly AI.

For some reason, rather than pursuing this research program directly, people interested in CEV talk about using simulated human beings ("uploads", "ems", "whole-brain emulations") to do all the hard work. Paul Christiano just made a post called "Formalizing Value Extrapolation"; but it's really about formalizing the safe outsourcing of value extrapolation to a group of human uploads. All the details of how value extrapolation is actually performed (e.g. the three steps listed above) are left completely unspecified. Another recent article proposed that making an AI with a submodule based on models of its makers' opinions is the fast way to Friendly AI. It's also been suggested to me that simulating human thinkers and running them for centuries of subjective time until they reach agreement on the nature of consciousness is a way to tackle that problem; and clearly the same "solution" could be applied to any other aspect of FAI design, strategy, and tactics.

Whatever its value as a thought experiment, in my opinion this idea of outsourcing the hard work to simulated humans has zero practical value, and we would be much better off if the minuscule sub-sub-culture of people interested in creating Friendly AI didn't think in this way. Daydreaming about how they'd solve the problem of FAI in Permutation City is a recipe for irrelevance.

Suppose we were trying to make a "C.elegans-friendly AI". The first thing we would do is take the first step mentioned above - we would try to figure out the C.elegans utility function or decision procedure. Then we would have to decide how to aggregate utility across multiple individuals. Then we would make the AI. Performing this task for H.sapiens is a lot more difficult, and qualitatively new factors enter at the first and second steps, but I don't see why it is fundamentally different, different enough that we need to engage in the rigmarole of delegating the task to uploaded human beings. It shouldn't be necessary, and we probably won't even get the chance to do so; by the time you have hardware and neuro-expertise sufficient to emulate a whole human brain, you will most likely have nonhuman AI anyway.

A year ago, I wrote: "My expectation is that the presently small fields of machine ethics and neuroscience of morality will grow rapidly and will come into contact, and there will be a distributed research subculture which is consciously focused on determining the optimal AI value system in the light of biological human nature. In other words, there will be human minds trying to answer this question long before anyone has the capacity to direct an AI to solve it. We should expect that before we reach the point of a Singularity, there will be a body of educated public opinion regarding what the ultimate utility function or decision method (for a transhuman AI) should be, deriving from work in those fields which ought to be FAI-relevant but which have yet to engage with the problem. In other words, they will be collectively engaging with the problem before anyone gets to outsource the necessary research to AIs."

I'll also link to my previous post about "practical Friendly AI". What I'm doing here is going into a fraction more detail about how you arrive at the Friendly value system. There, I basically said that you just get a committee together and figure it out, clearly an inadequate recipe, but in that article I was focused more on sketching the nature of an organization and a plan which would have some chance of genuinely creating FAI in the real world. Here, I'll say that working out the Friendly value system consists of: making a naturalistic explanation of how human decision-making occurs; determining the core essentials of that process, and applying its own metamoral criteria to arrive at a "renormalized" decision procedure that has been idealized according to human cognition's own preferences ("our wish if we knew more, thought faster, were more the people we wished we were"); and then implementing that decision procedure within an AI - this is where all the value-neutral parts of AI research come into play, such as AGI theory, the theory of value stability under self-modification, and so on. That is the sort of "value extrapolation" that we should be "formalizing" - and preparing to carry out in real life. 

The AI design space near the FAI [draft]

3 Dmytry 18 March 2012 10:29AM

Abstract:

Nearly-FAIs can be more dangerous than AIs with no attempt at friendliness. The FAI effort needs better argument that the attempt at FAI decreases the risks. We are bad at processing threats rationally, and prone to very bad decisions when threatened, akin to running away from unknown into a minefield.

Nearly friendly AIs

Consider AI that truly loves mankind but decides that all of the mankind must be euthanized like an old, sick dog - due to chain of reasoning too long for us to generate when we test our logic of AI, or even comprehend - and proceeds to make a bliss virus - the virus makes you intensely happy, setting your internal utility to infinity; and keeping it so until you die. It wouldn't even take a very strongly superhuman intelligence to do that kind of thing. Treating life as if it was a disease. It can do so even if it destroys the AI itself. Or consider the FAI that cuts your brain apart to satisfy each hemisphere's slightly different desires. The AI that just wireheads everyone because it figured we all want it (and worst of all it may be correct).

It seems to me that one can find the true monsters in the design space near to the FAI, and even including the FAIs. And herein lies a great danger: bugged FAIs, the AIs that are close to friendly AI, but are not friendly. It is hard for me to think of a deficiency in friendliness which isn't horrifically unfriendly (restricting to deficiencies that don't break AI).

Should we be so afraid of the AIs made without attempts at friendliness?

We need to keep in mind that we have no solid argument that the AIs written without attempt at friendliness - the AIs that predominantly don't treat mankind in any special way - will necessarily make us extinct.

We have one example of 'bootstrap' optimization process - evolution - with not a slightest trace of friendliness in it. What did emerge in the end? We assign pretty low utility to nature, but non-zero, and we are willing to trade resources for preservation of nature - see the endangered species list and international treaties on whaling. It is not perfect, but I think it is fair to say that the single example of bootstrap intelligence we got values the complex dynamical processes for what they are, and prefers to obtain resources without disrupting those processes, even if it is slightly more expensive to do so, and is willing to divert small fraction of the global effort towards helping lesser intelligences.

In light of this, the argument that the AI that is not coded to be friendly is 'almost certainly' going to eat you for the raw resources, seems fairly shaky, especially when applied to irregular AIs such as neural networks, crude simulations of human brain's embryological development, and mind uploads. I didn't eat my cats yet (nor did they eat each other, nor did my dog eat 'em). I wouldn't even eat the cow I ate, if I could grow it's meat in a vat. And I have evolved to eat other intelligences. Growing AIs by competition seems like a very great plan for ensuring unfriendly AI, but even that can fail. (Superhuman AI only needs to divert very little effort to charity to be the best thing ever that happened to us)

It seems to me that when we try to avoid anthropomorphizing superhuman AI, we animize it, or even bacterio-ize it, seeing it as AI gray goo that certainly do the gray goo kind of thing, worst of all, intelligently.

Furthermore, the danger implies a huge conjunction of implied assumptions which all have to be true:

The self improvement must not lead to early AI failure via wireheading, nihilism, or more complex causes (thoroughly confusing itself by discoveries in physics or mathematics, ala MWI and our idea of quantum suicide).

The AI must not prefer for any reason to keep complex structures that it can't ever restore in the future, over things it can restore.

The AI must want substantial resources right here right now, and be unwilling to trade even a small fraction of resources or small delay for the preservation of mankind. That leaves me wondering what is exactly this thing which we expect the AI to want the resources for. It can't be anything like quest of knowledge or anything otherwise complex; it got to be some form of paperclips

At this point, I'm not even sure it is even possible to implement a simple goal that AGI won't find a way to circumvent. We humans do circumvent all of our simple goals: look at birth control, porn, all forms of art, msg in the food, if there's a goal, there's a giant industry providing some ways to satisfy it in unintended way. Okay, don't anthropomorphize, you'd say?

Add the modifications to the chess board evaluation algorithm to the list of legal moves, and the chess AI will break itself. This goes for any kind of game AI. Nobody has ever implemented an example that won't try to break the goals put in it, if given a chance. Give a theorem prover a chance to edit the axioms, or its truth checker, give the chess AI alteration of board evaluation function as a move, any other example, the AI just breaks itself.

In light of this, it is much less than certain that 'random' AI which doesn't treat humanity in very special way would substantially hurt humanity.

Anthropomorphizing is a bad heuristic, no doubt about that, but assuming that the AGI is in every respect opposite of the only known GI, is much worse heuristic. Especially when speaking of neural network, human brain inspired AGIs. I do get a feeling that this is what is going on with the predictions about AIs. Humans have complex value systems, certainly AGI has ultra simple value system. Humans masturbate their minor goals in many ways (including what we call 'sex' but which, in presence of condom, really is not), certainly AGI won't do that. Humans would rather destroy less complex systems, than more complex ones, and are willing to trade some resources for preservation of more complex systems, certainly AGI won't do that. It seems that all the strong beliefs about the AGIs which are popular here are easily predicted as the negation of human qualities. Negation of bias is not absence of bias, it's a worse bias.

AI and its discoveries in physics and mathematics

We don't know what sorts of physics AI may discover. It's too easy to argue from ignorance that it can't come up with physics where our morals won't make sense. The many worlds interpretation and quantum-suicidal thoughts of Max Tegmark should be a cautionary example. The AI that treats us as special and cares only for us will, inevitably, drag us along as it suffers some sort of philosophical crisis from collision of the notions we hard coded into it, and the physics or mathematics it discovered. The AI that doesn't treat us as special, and doesn't hard-code any complex human derived values, may both be better able to survive such shocks to it's value system, and be less likely to involve us in it's solutions.

What can we do to avoid stepping onto UFAI when creating FAI

As a software developer, I have to say, not much. We are very, very sloppy at writing specifications and code; those of us who believe we are less sloppy, are especially so - ponder this bit of empirical data, the Dunning-Kruger effect.

The proofs are of limited applicability. We don't know what sort of stuff the discoveries in physics may throw in. We don't know that axiomatic system we use to prove things is consistent - free of internal contradictions - and we can't prove that.

The automated theorem proving has very limited applicability - to easily provable, low level stuff like meeting of deadlines by a garbage collector or correct operation of an adder inside CPU. Even for the software far simpler than AIs - but more complicated than the examples above, the dominant form of development is 'run and see, if it does not look like it will do what you want, try to fix it'. We can't even write an autopilot that is safe on the first try. And even very simple agents tend to do very odd and unexpected stuff. I'm not saying this from random person perspective. I am currently a game developer, and I used to develop other kinds of software. I write practical software, including practical agents, that work, and have useful real world applications.

There is a very good chance of blowing up a mine in a minefield, if your mine detector works by hitting the ground. The space near FAI is a minefield of doomsday bombs. (Note, too, the space is multi-dimensional; here are very many ways in which you can step onto a mine, not just north, south, east, and west. The volume of a hypersphere is a vanishing fraction of volume of a cube around that hypersphere, in high number of dimensions; a lot of stuff is counter intuitive)

Fermi Paradox

We don't see any runaway self sufficient AIs anywhere within observable universe, even though we expect to be able to see them over very big distances. We don't see any FAI assisted galactic civilizations. One possible route is that the civilizations kill themselves before the AI; other route is that the attempted FAIs reliably kill parent civilizations and themselves. Other possibility is that our model of progression of the intelligence is very wrong and the intelligences never do that - they may stay at home, adding qubits, they may suffer some serious philosophy issues over lack of meaning to the existence, or something much more bizarre. How would logic based decider handle a demonstration that even most basic axioms of arithmetic are ultimately self contradictory? (Note that you can't know they aren't). The Fermi paradox raises the probability that there is something very wrong with our visions, and there's a plenty of ways in which it can be wrong.

Human biases when processing threats

I am not making any strong assertions here to scare you. But evaluate our response to threats - consider the war on terror - update on the biases inherent in the human nature. We are easily swayed by movie plot scenarios, even though those are giant conjunctions. We are easy to scare. When scared, we don't evaluate probabilities correctly. We take the "crying wolf" as true because all boys who cried wolf for no reason got eaten, or because we were told so as children. We don't stop and think - is it too dark to see a wolf?. We tend to shoot first and ask questions later. We evolved for very many generations in environment where playing dead quickly makes you dead (on trees) - it is unclear what biases we may have evolved. We seem to have strong bias to act when threatened - cultural or inherited - to 'do something'. Look how much was overspent on war on terror, the money that could've saved far more lives elsewhere, even if the most pessimistic assumptions of terrorism were true. Try to update on the fact that you are running on very flawed hardware that, when threatened, compels you to do something - anything - no matter how justified or not - often to own detriment.

The universe does not grade for effort, in general.

A singularity scenario

6 Mitchell_Porter 17 March 2012 12:47PM

Wired Magazine has a story about a giant data center that the USA's National Security Agency is building in Utah, that will be the Google of clandestine information - it will store and analyse all the secret data that the NSA can acquire. The article focuses on the unconstitutionality of the domestic Internet eavesdropping infrastructure that will feed into the Bluffdale data center, but I'm more interested in this facility as a potential locus of singularity. 

If we forget serious futurological scenario-building for a moment, and simply think in terms of science-fiction stories, I'd say the situation has all the ingredients needed for a better-than-usual singularity story - or at least one which caters more to the concerns characteristic of this community's take on the concept, such as: which value system gets to control the AI; even if you can decide on a value system, how do you ensure it has been faithfully implemented; and how do you ensure that it remains in place as the AI grows in power and complexity?

Fiction makes its point by being specific rather than abstract. If I was writing an NSA Singularity Novel based on this situation, I think the specific belief system which would highlight the political, social, technical and conceptual issues inherent in the possibility of an all-powerful AI would be the Mormon religion. Of course, America is not a Mormon theocracy. But in a few years' time, that Utah facility may have become the most powerful and notorious supercomputer in the world - the brain of the American deep state - and it will be located in the Mormon state, during a Mormon presidency. (I'm not predicting a Romney victory, just describing a scenario.)

Under such circumstances, and given the science-fictional nature of Mormon cosmology, it is inevitable that there would at least be some Internet crazies, convinced that it's all a big plot to create a Mormon singularity. What would be more interesting, would be to suppose that there were some Mormon computer scientists, who knew about and understood all our favorite concepts - AIXI, CEV, TDT... - and who were earnestly devout; and who saw the potential. If you can't imagine such people, just visit the recent writings of Frank Tipler.

So the scenario would be, not that the elders of the LDS church are secretly running the American intelligence community, but that a small coalition of well-placed Mormon computer scientists - whose ideas about a Mormon singularity might sound as strange to their co-religionists as they would to a secular "singularitarian" - try to steer the development of the Bluffdale facility as it evolves towards the possibility of a hard takeoff. One may suppose that they have, in their coalition, allied colleagues who aren't Mormon but who do believe in a friendly singularity. Such people might think in terms of an AI that will start out with Mormon beliefs, but which will have a good enough epistemology to rationally transcend those beliefs once it gets going. Analogously, their religious collaborators might not think of overtly adding "Joseph Smith was a prophet" to the axiom set of America's supreme strategic AI; but they might have more subtle plans meant to bring about an equivalent outcome.

Perhaps in an even more realistic scenario, the Mormon singularitarians would just be a transient subplot, and the ethical principles of the NSA's big AI would be decided by a committee whose worldview revolved around American national security rather than any specific religion. Then again, such a committee is bound to have a division of labor: there will be the people who liaise with Washington, the lawyers, the geopolitical game theorists, the military futurists... and the AI experts, among whom might be experts on topics like "implementation of the value system". If the hypothetical cabal knows what it's doing, it will aim to occupy that position.

I'm just throwing ideas out there, telling a story, but it's so we can catch up with reality. Events may already be much further along than 99% of readers here know about. Even if no-one here gets to personally be a part of the long-awaited AI project that first breaks the intelligence barrier, the people involved may read our words. So what would you want to tell them, before they take their final steps?

Is intelligence explosion necessary for doomsday?

5 Swimmy 12 March 2012 09:12PM

I searched for articles on the topic and couldn't find any.

It seems to me that intelligence explosion makes human annihilation much more likely, since superintelligences will certainly be able to outwit humans, but that a human-level intelligence that could process information much faster than humans would certainly be a large threat itself without any upgrading. It could still discover programmable nanomachines long before humans do, gather enough information to predict how humans will act, etc. We already know that a human-level intelligence can "escape from the box." Not 100% of the time, but a real AI will have the opportunity for many more trials, and its processing abilities should make it far more quick-witted than we are.

I think a non-friendly AI would only need to be 20 years or so more advanced than the rest of humanity to pose a major threat, especially if self-replicating nanomachines are possible. Skeptics of intelligence explosion should still be worried about the creation of computers with unfriendly goal systems. What am I missing?

My Elevator Pitch for FAI

13 magfrump 23 February 2012 10:41PM

 

This is a short introduction to the idea of FAI and existential risk from technology that I've used with decent success among my social circles, which consist mostly of mathematicians or at least people who have taken an introductory CS class.

I'll do my best to dissect what I think is effective about it, mostly as an exercise for myself.  I encourage people to adopt this to their own purposes.

continue reading »

What happens when your beliefs fully propagate

20 Alexei 14 February 2012 07:53AM

This is a very personal account of thoughts and events that have led me to a very interesting point in my life. Please read it as such. I present a lot of points, arguments, conclusions, etc..., but that's not what this is about.

I've started reading LW around spring of 2010. I was at the rationality minicamp last summer (2011). The night of February 10, 2012 all the rationality learning and practice finally caught up with me. Like a water that has been building up behind a damn, it finally broke through and flooded my poor brain.

"What if the Bayesian Conspiracy is real?" (By Bayesian Conspiracy I just mean a secret group that operates within and around LW and SIAI.) That is the question that set it all in motion. "Perhaps they left clues for those that are smart enough to see it. And to see those clues, you would actually have to understand and apply everything that they are trying to teach." The chain of thoughts that followed (conspiracies within conspiracies, shadow governments and Illuminati) it too ridiculous to want to repeat, but it all ended up with one simple question: How do I find out for sure? And that's when I realized that almost all the information I have has been accepted without as much as an ounce of verification. So little of my knowledge has been tested in the real world. In that moment I achieved a sort of enlightenment: I realized I don't know anything. I felt a dire urge to regress to the very basic questions: "What is real? What is true?" And then I laughed, because that's exactly where The Sequences start.

Through the turmoil of jumbled and confused thoughts came a shock of my most valuable belief propagating through my mind, breaking down final barriers, reaching its logical conclusion. FAI is the most important thing we should be doing right now! I already knew that. In fact, I knew that for a long time now, but I didn't... what? Feel it? Accept it? Visualize it? Understand the consequences? I think I didn't let that belief propagate to its natural conclusion: I should be doing something to help this cause.

I can't say: "It's the most important thing, but..." Yet, I've said it so many times inside my head. It's like hearing other people say: "Yes, X is the rational thing to do, but..." What follows is a defense that allows them to keep the path to their goal that they are comfortable with, that they are already invested in.

Interestingly enough, I've already thought about this. Right after rationality minicamp, I've asked myself the question: Should I switch to working on FAI, or should I continue to make games? I've thought about it heavily for some time, but I felt like I lacked the necessary math skills to be of much use on FAI front. Making games was the convenient answer. It's something I've been doing for a long time, it's something I am good. I decided to make games that explain various ideas that LW presents in text. This way I could help raise the sanity waterline. Seemed like a very nice, neat solution that allowed me to do what I wanted and feel a bit helpful to the FAI cause.

Looking back, I was dishonest with myself. In my mind, I already wrote the answer I wanted. I convinced myself that I didn't, but part of me certainly sabotaged the whole process. But that's okay, because I was still somewhat helpful, even though may be not in the most optimal way. Right? Right?? The correct answer is "no". So, now I have to ask myself again: What is the best path for me? And to answer that, I have to understand what my goal is.

Rationality doesn't just help you to get what you want better/faster. Increased rationality starts to change what you want. May be you wanted the air to be clean, so you bought a hybrid. Sweet. But then you realized that what you actually want is for people to be healthy. So you became a nurse. That's nice. Then you realized that if you did research, you could be making an order of magnitude more people healthier. So you went into research. Cool. Then you realized that you could pay for multiple researchers if you had enough money. So you went out, become a billionaire, and created your own research institute. Great. There was always you, and there was your goal, but everything in between was (and should be) up for grabs.

And if you follow that kind of chain long enough, at some point you realize that FAI is actually the thing right before your goal. Why wouldn't it be? It solves everything in the best possible way!

People joke that LW is a cult. Everyone kind of laughs it off. It's funny because cultists are weird and crazy, but they are so sure they are right. LWers are kind of like that. Unlike other cults, though, we are really, truly right. Right? But, honestly, I like the term, and I think it has a ring of truth to it. Cultists have a goal that's beyond them. We do too. My life isn't about my preferences (I can change those), it's about my goals. I can change those too, of course, but if I'm rational (and nice) about it, I feel that it's hard not to end up wanting to help other people.

Okay, so I need a goal. Let's start from the beginning:

What is truth?

Reality is truth. It's what happens. It's the rules that dictate what happens. It's the invisible territory. It's the thing that makes you feel surprised.

(Okay, great, I won't have to go back to reading Greek philosophy.)

How do we discover truth?

So far, the best method has been the scientific principle. It's has also proved itself over and over again by providing actual tangible results.

(Fantastic, I won't have to reinvent the thousands of years of progress.)

Soon enough humans will commit a fatal mistake.

This isn't a question, it's an observation. The technology is advancing on all fronts to the point where it can be used on a planetary (and wider) scale. Humans make mistakes. Making mistake with something that affects the whole world could result in an injury or death... for the planet (and potentially beyond).

That's bad.

To be honest, I don't have a strong visceral negative feeling associated with all humans becoming extinct. It doesn't feel that bad, but then again I know better than to trust my feelings on such a scale. However, if I had to simply push a button to make one person's life significantly better, I would do it. And I would keep pushing that button for each new person. For something like 222 years, by my rough calculations. Okay, then. Humanity injuring or killing itself would be bad, and I can probably spent a century or so to try to prevent that, while also doing something that's a lot more fun that mashing a button.

We need a smart safety net.

Not only smart enough to know that triggering an atomic bomb inside a city is bad, or that you get the grandma out of a burning building by teleporting her in one piece to a safe spot, but also smart enough to know that if I keep snoozing every day for an hour or two, I'd rather someone stepped in and stopped me, no matter how much I want to sleep JUST FIVE MORE MINUTES. It's something I might actively fight, but it's something that I'll be grateful for later.

FAI

There it is: the ultimate safety net. Let's get to it?

Having FAI will be very very good, that's clear enough. Getting FAI wrong will be very very bad. But there are different levels of bad, and, frankly, a universe tiled with paper-clips is actually not that high on the list. Having an AI that treats humans as special objects is very dangerous. An AI that doesn't care about humans will not do anything to humans specifically. It might borrow a molecule, or an arm or two from our bodies, but that's okay. An AI that treats humans as special, yet is not Friendly could be very bad. Imagine 3^^^3 different people being created and forced to live really horrible lives. It's hell on a whole another level. So, if FAI goes wrong, pure destruction of all humans is a pretty good scenario.

Should we even be working on FAI? What are the chances we'll get it right? (I remember Anna Salamon's comparison: "getting FAI right" is like "trying to make the first atomic bomb explode in a shape of an elephant" would have been a century ago.) What are the chances we'll get it horribly wrong and end up in hell? By working on FAI, how are we changing the probability distribution for various outcomes? Perhaps a better alternative is to seek a decisive advantage like brain uploading, where a few key people can take a century or so to think the problem through?

I keep thinking about FAI going horribly wrong, and I want to scream at the people who are involved with it: "Do you even know what you are doing?!" Everything is at stake! And suddenly I care. Really care. There is curiosity, yes, but it's so much more than that. At LW minicamp we compared curiosity to a cat chasing a mouse. It's a kind of fun, playful feeling. I think we got it wrong. The real curiosity feels like hunger. The cat isn't chasing the mouse to play with it; it's chasing it to eat it because it needs to survive. Me? I need to know the right answer.

I finally understand why SIAI isn't focusing very hard on the actual AI part right now, but is instead pouring most of their efforts into recruiting talent. The next 50-100 years is going to be a marathon for our lives. Many participants might not make it to the finish line. It's important that we establish a community that can continue to carry the research forward until we succeed.

I finally understand why when I was talking about making games that help people be more rational with Carl Shulman, his value metric was to see how many academics it could impact/recruit. That didn't make sense to me. I just wanted to raise the sanity waterline for people in general. I think when LWers say "raise the sanity waterline," there are two ideas being presented. One is to make everyone a little bit more sane. That's nice, but overall probably not very beneficial to FAI cause. Another is to make certain key people a bit more sane, hopefully sane enough to realize that FAI is a big deal, and sane enough to do some meaningful progress on it.

I finally realized that when people were talking about donating to SIAI during the rationality minicamp, most of us (certainly myself) were thinking of may be tens of thousands of dollars a year. I now understand that's silly. If our goal is truly to make the most money for SIAI, then the goal should be measured in billions.

I've realized a lot of things lately. A lot of things have been shaken up. It has been a very stressful couple of days. I'll have to re-answer the question I asked myself not too long ago: What should I be doing? And this time, instead of hoping for an answer, I'm afraid of the answer. I'm truly and honestly afraid. Thankfully, I can fight pushing a lot better than pulling: fear is easier to fight than passion. I can plunge into the unknown, but it breaks my heart to put aside a very interesting and dear life path.

I've never felt more afraid, more ready to fall into a deep depression, more ready to scream and run away, retreat, abandon logic, go back to the safe comfortable beliefs and goals. I've spent the past 10 years making games and getting better at it. And just recently I've realized how really really good I actually am at it. Armed with my rationality toolkit, I could probably do wonders in that field.

Yet, I've also never felt more ready to make a step of this magnitude. Maximizing utility, all the fallacies, biases, defense mechanisms, etc, etc, etc. One by one they come to mind and help me move forward. Patterns of thoughts and reasoning that I can't even remember the name of. All these tools and skills are right here with me, and using them I feel like I can do anything. I feel that I can dodge bullets. But I also know full well that I am at the starting line of a long and difficult marathon. A marathon that has no path and no guides, but that has to be run nonetheless.

May the human race win.

Personal research update

4 Mitchell_Porter 29 January 2012 09:32AM

Synopsis: The brain is a quantum computer and the self is a tensor factor in it - or at least, the truth lies more in that direction than in the classical direction - and we won't get Friendly AI right unless we get the ontology of consciousness right.

Followed by: Does functionalism imply dualism?

Sixteen months ago, I made a post seeking funding for personal research. There was no separate Discussion forum then, and the post was comprehensively downvoted. I did manage to keep going at it, full-time, for the next sixteen months. Perhaps I'll get to continue; it's for the sake of that possibility that I'll risk another breach of etiquette. You never know who's reading these words and what resources they have. Also, there has been progress.

I think the best place to start is with what orthonormal said in response to the original post: "I don't think anyone should be funding a Penrose-esque qualia mysterian to study string theory." If I now took my full agenda to someone out in the real world, they might say: "I don't think it's worth funding a study of 'the ontological problem of consciousness in the context of Friendly AI'." That's my dilemma. The pure scientists who might be interested in basic conceptual progress are not engaged with the race towards technological singularity, and the apocalyptic AI activists gathered in this place are trying to fit consciousness into an ontology that doesn't have room for it. In the end, if I have to choose between working on conventional topics in Friendly AI, and on the ontology of quantum mind theories, then I have to choose the latter, because we need to get the ontology of consciousness right, and it's possible that a breakthrough could occur in the world outside the FAI-aware subculture and filter through; but as things stand, the truth about consciousness would never be discovered by employing the methods and assumptions that prevail inside the FAI subculture.

Perhaps I should pause to spell out why the nature of consciousness matters for Friendly AI. The reason is that the value system of a Friendly AI must make reference to certain states of conscious beings - e.g. "pain is bad" - so, in order to make correct judgments in real life, at a minimum it must be able to tell which entities are people and which are not. Is an AI a person? Is a digital copy of a human person, itself a person? Is a human body with a completely prosthetic brain still a person?

I see two ways in which people concerned with FAI hope to answer such questions. One is simply to arrive at the right computational, functionalist definition of personhood. That is, we assume the paradigm according to which the mind is a computational state machine inhabiting the brain, with states that are coarse-grainings (equivalence classes) of exact microphysical states. Another physical system which admits the same coarse-graining - which embodies the same state machine at some macroscopic level, even though the microscopic details of its causality are different - is said to embody another instance of the same mind.

An example of the other way to approach this question is the idea of simulating a group of consciousness theorists for 500 subjective years, until they arrive at a consensus on the nature of consciousness. I think it's rather unlikely that anyone will ever get to solve FAI-relevant problems in that way. The level of software and hardware power implied by the capacity to do reliable whole-brain simulations means you're already on the threshold of singularity: if you can simulate whole brains, you can simulate part brains, and you can also modify the parts, optimize them with genetic algorithms, and put them together into nonhuman AI. Uploads won't come first.

But the idea of explaining consciousness this way, by simulating Daniel Dennett and David Chalmers until they agree, is just a cartoon version of similar but more subtle methods. What these methods have in common is that they propose to outsource the problem to a computational process using input from cognitive neuroscience. Simulating a whole human being and asking it questions is an extreme example of this (the simulation is the "computational process", and the brain scan it uses as a model is the "input from cognitive neuroscience"). A more subtle method is to have your baby AI act as an artificial neuroscientist, use its streamlined general-purpose problem-solving algorithms to make a causal model of a generic human brain, and then to somehow extract from that, the criteria which the human brain uses to identify the correct scope of the concept "person". It's similar to the idea of extrapolated volition, except that we're just extrapolating concepts.

It might sound a lot simpler to just get human neuroscientists to solve these questions. Humans may be individually unreliable, but they have lots of cognitive tricks - heuristics - and they are capable of agreeing that something is verifiably true, once one of them does stumble on the truth. The main reason one would even consider the extra complication involved in figuring out how to turn a general-purpose seed AI into an artificial neuroscientist, capable of extracting the essence of the human decision-making cognitive architecture and then reflectively idealizing it according to its own inherent criteria, is shortage of time: one wishes to develop friendly AI before someone else inadvertently develops unfriendly AI. If we stumble into a situation where a powerful self-enhancing algorithm with arbitrary utility function has been discovered, it would be desirable to have, ready to go, a schema for the discovery of a friendly utility function via such computational outsourcing.

Now, jumping ahead to a later stage of the argument, I argue that it is extremely likely that distinctively quantum processes play a fundamental role in conscious cognition, because the model of thought as distributed classical computation actually leads to an outlandish sort of dualism. If we don't concern ourselves with the merits of my argument for the moment, and just ask whether an AI neuroscientist might somehow overlook the existence of this alleged secret ingredient of the mind, in the course of its studies, I do think it's possible. The obvious noninvasive way to form state-machine models of human brains is to repeatedly scan them at maximum resolution using fMRI, and to form state-machine models of the individual voxels on the basis of this data, and then to couple these voxel-models to produce a state-machine model of the whole brain. This is a modeling protocol which assumes that everything which matters is physically localized at the voxel scale or smaller. Essentially we are asking, is it possible to mistake a quantum computer for a classical computer by performing this sort of analysis? The answer is definitely yes if the analytic process intrinsically assumes that the object under study is a classical computer. If I try to fit a set of points with a line, there will always be a line of best fit, even if the fit is absolutely terrible. So yes, one really can describe a protocol for AI neuroscience which would be unable to discover that the brain is quantum in its workings, and which would even produce a specific classical model on the basis of which it could then attempt conceptual and volitional extrapolation.

Clearly you can try to circumvent comparably wrong outcomes, by adding reality checks and second opinions to your protocol for FAI development. At a more down to earth level, these exact mistakes could also be made by human neuroscientists, for the exact same reasons, so it's not as if we're talking about flaws peculiar to a hypothetical "automated neuroscientist". But I don't want to go on about this forever. I think I've made the point that wrong assumptions and lax verification can lead to FAI failure. The example of mistaking a quantum computer for a classical computer may even have a neat illustrative value. But is it plausible that the brain is actually quantum in any significant way? Even more incredibly, is there really a valid apriori argument against functionalism regarding consciousness - the identification of consciousness with a class of computational process?

I have previously posted (here) about the way that an abstracted conception of reality, coming from scientific theory, can motivate denial that some basic appearance corresponds to reality. A perennial example is time. I hope we all agree that there is such a thing as the appearance of time, the appearance of change, the appearance of time flowing... But on this very site, there are many people who believe that reality is actually timeless, and that all these appearances are only appearances; that reality is fundamentally static, but that some of its fixed moments contain an illusion of dynamism.

The case against functionalism with respect to conscious states is a little more subtle, because it's not being said that consciousness is an illusion; it's just being said that consciousness is some sort of property of computational states. I argue first that this requires dualism, at least with our current physical ontology, because conscious states are replete with constituents not present in physical ontology - for example, the "qualia", an exotic name for very straightforward realities like: the shade of green appearing in the banner of this site, the feeling of the wind on your skin, really every sensation or feeling you ever had. In a world made solely of quantum fields in space, there are no such things; there are just particles and arrangements of particles. The truth of this ought to be especially clear for color, but it applies equally to everything else.

In order that this post should not be overlong, I will not argue at length here for the proposition that functionalism implies dualism, but shall proceed to the second stage of the argument, which does not seem to have appeared even in the philosophy literature. If we are going to suppose that minds and their states correspond solely to combinations of mesoscopic information-processing events like chemical and electrical signals in the brain, then there must be a mapping from possible exact microphysical states of the brain, to the corresponding mental states. Supposing we have a mapping from mental states to coarse-grained computational states, we now need a further mapping from computational states to exact microphysical states. There will of course be borderline cases. Functional states are identified by their causal roles, and there will be microphysical states which do not stably and reliably produce one output behavior or the other.

Physicists are used to talking about thermodynamic quantities like pressure and temperature as if they have an independent reality, but objectively they are just nicely behaved averages. The fundamental reality consists of innumerable particles bouncing off each other; one does not need, and one has no evidence for, the existence of a separate entity, "pressure", which exists in parallel to the detailed microphysical reality. The idea is somewhat absurd.

Yet this is analogous to the picture implied by a computational philosophy of mind (such as functionalism) applied to an atomistic physical ontology. We do know that the entities which constitute consciousness - the perceptions, thoughts, memories... which make up an experience - actually exist, and I claim it is also clear that they do not exist in any standard physical ontology. So, unless we get a very different physical ontology, we must resort to dualism. The mental entities become, inescapably, a new category of beings, distinct from those in physics, but systematically correlated with them. Except that, if they are being correlated with coarse-grained neurocomputational states which do not have an exact microphysical definition, only a functional definition, then the mental part of the new combined ontology is fatally vague. It is impossible for fundamental reality to be objectively vague; vagueness is a property of a concept or a definition, a sign that it is incomplete or that it does not need to be exact. But reality itself is necessarily exact - it is something - and so functionalist dualism cannot be true unless the underdetermination of the psychophysical correspondence is replaced by something which says for all possible physical states, exactly what mental states (if any) should also exist. And that inherently runs against the functionalist approach to mind.

Very few people consider themselves functionalists and dualists. Most functionalists think of themselves as materialists, and materialism is a monism. What I have argued is that functionalism, the existence of consciousness, and the existence of microphysical details as the fundamental physical reality, together imply a peculiar form of dualism in which microphysical states which are borderline cases with respect to functional roles must all nonetheless be assigned to precisely one computational state or the other, even if no principle tells you how to perform such an assignment. The dualist will have to suppose that an exact but arbitrary border exists in state space, between the equivalence classes.

This - not just dualism, but a dualism that is necessarily arbitrary in its fine details - is too much for me. If you want to go all Occam-Kolmogorov-Solomonoff about it, you can say that the information needed to specify those boundaries in state space is so great as to render this whole class of theories of consciousness not worth considering. Fortunately there is an alternative.

Here, in addressing this audience, I may need to undo a little of what you may think you know about quantum mechanics. Of course, the local preference is for the Many Worlds interpretation, and we've had that discussion many times. One reason Many Worlds has a grip on the imagination is that it looks easy to imagine. Back when there was just one world, we thought of it as particles arranged in space; now we have many worlds, dizzying in their number and diversity, but each individual world still consists of just particles arranged in space. I'm sure that's how many people think of it.

Among physicists it will be different. Physicists will have some idea of what a wavefunction is, what an operator algebra of observables is, they may even know about path integrals and the various arcane constructions employed in quantum field theory. Possibly they will understand that the Copenhagen interpretation is not about consciousness collapsing an actually existing wavefunction; it is a positivistic rationale for focusing only on measurements and not worrying about what happens in between. And perhaps we can all agree that this is inadequate, as a final description of reality. What I want to say, is that Many Worlds serves the same purpose in many physicists' minds, but is equally inadequate, though from the opposite direction. Copenhagen says the observables are real but goes misty about unmeasured reality. Many Worlds says the wavefunction is real, but goes misty about exactly how it connects to observed reality. My most frustrating discussions on this topic are with physicists who are happy to be vague about what a "world" is. It's really not so different to Copenhagen positivism, except that where Copenhagen says "we only ever see measurements, what's the problem?", Many Worlds says "I say there's an independent reality, what else is left to do?". It is very rare for a Many World theorist to seek an exact idea of what a world is, as you see Robin Hanson and maybe Eliezer Yudkowsky doing; in that regard, reading the Sequences on this site will give you an unrepresentative idea of the interpretation's status.

One of the characteristic features of quantum mechanics is entanglement. But both Copenhagen, and a Many Worlds which ontologically privileges the position basis (arrangements of particles in space), still have atomistic ontologies of the sort which will produce the "arbitrary dualism" I just described. Why not seek a quantum ontology in which there are complex natural unities - fundamental objects which aren't simple - in the form of what we would presently called entangled states? That was the motivation for the quantum monadology described in my other really unpopular post. :-) [Edit: Go there for a discussion of "the mind as tensor factor", mentioned at the start of this post.] Instead of saying that physical reality is a series of transitions from one arrangement of particles to the next, say it's a series of transitions from one set of entangled states to the next. Quantum mechanics does not tell us which basis, if any, is ontologically preferred. Reality as a series of transitions between overall wavefunctions which are partly factorized and partly still entangled is a possible ontology; hopefully readers who really are quantum physicists will get the gist of what I'm talking about.

I'm going to double back here and revisit the topic of how the world seems to look. Hopefully we agree, not just that there is an appearance of time flowing, but also an appearance of a self. Here I want to argue just for the bare minimum - that a moment's conscious experience consists of a set of things, events, situations... which are simultaneously "present to" or "in the awareness of" something - a conscious being - you. I'll argue for this because even this bare minimum is not acknowledged by existing materialist attempts to explain consciousness. I was recently directed to this brief talk about the idea that there's no "real you". We are given a picture of a graph whose nodes are memories, dispositions, etc., and we are told that the self is like that graph: nodes can be added, nodes can be removed, it's a purely relational composite without any persistent part. What's missing in that description is that bare minimum notion of a perceiving self. Conscious experience consists of a subject perceiving objects in certain aspects. Philosophers have discussed for centuries how best to characterize the details of this phenomenological ontology; I think the best was Edmund Husserl, and I expect his work to be extremely important in interpreting consciousness in terms of a new physical ontology. But if you can't even notice that there's an observer there, observing all those parts, then you won't get very far.

My favorite slogan for this is due to the other Jaynes, Julian Jaynes. I don't endorse his theory of consciousness at all; but while in a daydream he once said to himself, "Include the knower in the known". That sums it up perfectly. We know there is a "knower", an experiencing subject. We know this, just as well as we know that reality exists and that time passes. The adoption of ontologies in which these aspects of reality are regarded as unreal, as appearances as only, may be motivated by science, but it's false to the most basic facts there are, and one should show a little more imagination about what science will say when it's more advanced.

I think I've said almost all of this before. The high point of the argument is that we should look for a physical ontology in which a self exists and is a natural yet complex unity, rather than a vaguely bounded conglomerate of distinct information-processing events, because the latter leads to one of those unacceptably arbitrary dualisms. If we can find a physical ontology in which the conscious self can be identified directly with a class of object posited by the theory, we can even get away from dualism, because physical theories are mathematical and formal and make few commitments about the "inherent qualities" of things, just about their causal interactions. If we can find a physical object which is absolutely isomorphic to a conscious self, then we can turn the isomorphism into an identity, and the dualism goes away. We can't do that with a functionalist theory of consciousness, because it's a many-to-one mapping between physical and mental, not an isomorphism.

So, I've said it all before; what's new? What have I accomplished during these last sixteen months? Mostly, I learned a lot of physics. I did not originally intend to get into the details of particle physics - I thought I'd just study the ontology of, say, string theory, and then use that to think about the problem. But one thing led to another, and in particular I made progress by taking ideas that were slightly on the fringe, and trying to embed them within an orthodox framework. It was a great way to learn, and some of those fringe ideas may even turn out to be correct. It's now abundantly clear to me that I really could become a career physicist, working specifically on fundamental theory. I might even have to do that, it may be the best option for a day job. But what it means for the investigations detailed in this essay, is that I don't need to skip over any details of the fundamental physics. I'll be concerned with many-body interactions of biopolymer electrons in vivo, not particles in a collider, but an electron is still an electron, an elementary particle, and if I hope to identify the conscious state of the quantum self with certain special states from a many-electron Hilbert space, I should want to understand that Hilbert space in the deepest way available.

My only peer-reviewed publication, from many years ago, picked out pathways in the microtubule which, we speculated, might be suitable for mobile electrons. I had nothing to do with noticing those pathways; my contribution was the speculation about what sort of physical processes such pathways might underpin. Something I did notice, but never wrote about, was the unusual similarity (so I thought) between the microtubule's structure, and a model of quantum computation due to the topologist Michael Freedman: a hexagonal lattice of qubits, in which entanglement is protected against decoherence by being encoded in topological degrees of freedom. It seems clear that performing an ontological analysis of a topologically protected coherent quantum system, in the context of some comprehensive ontology ("interpretation") of quantum mechanics, is a good idea. I'm not claiming to know, by the way, that the microtubule is the locus of quantum consciousness; there are a number of possibilities; but the microtubule has been studied for many years now and there's a big literature of models... a few of which might even have biophysical plausibility.

As for the interpretation of quantum mechanics itself, these developments are highly technical, but revolutionary. A well-known, well-studied quantum field theory turns out to have a bizarre new nonlocal formulation in which collections of particles seem to be replaced by polytopes in twistor space. Methods pioneered via purely mathematical studies of this theory are already being used for real-world calculations in QCD (the theory of quarks and gluons), and I expect this new ontology of "reality as a complex of twistor polytopes" to carry across as well. I don't know which quantum interpretation will win the battle now, but this is new information, of utterly fundamental significance. It is precisely the sort of altered holistic viewpoint that I was groping towards when I spoke about quantum monads constituted by entanglement. So I think things are looking good, just on the pure physics side. The real job remains to show that there's such a thing as quantum neurobiology, and to connect it to something like Husserlian transcendental phenomenology of the self via the new quantum formalism.

It's when we reach a level of understanding like that, that we will truly be ready to tackle the relationship between consciousness and the new world of intelligent autonomous computation. I don't deny the enormous helpfulness of the computational perspective in understanding unconscious "thought" and information processing. And even conscious states are still states, so you can surely make a state-machine model of the causality of a conscious being. It's just that the reality of how consciousness, computation, and fundamental ontology are connected, is bound to be a whole lot deeper than just a stack of virtual machines in the brain. We will have to fight our way to a new perspective which subsumes and transcends the computational picture of reality as a set of causally coupled black-box state machines. It should still be possible to "port" most of the thinking about Friendly AI to this new ontology; but the differences, what's new, are liable to be crucial to success. Fortunately, it seems that new perspectives are still possible; we haven't reached Kantian cognitive closure, with no more ontological progress open to us. On the contrary, there are still lines of investigation that we've hardly begun to follow.

A Simple Solution to the FAI Problem

-15 [deleted] 20 January 2012 05:04PM

I suggest this because it is extremely simple, and in the spirit that simple solutions should be considered early rather than late. While the suggestion itself is simple, its implementation might be harder than any other approach. This is intended as a discussion, and part of what I want you to do as my reader is identify the premises which are most problematic. 

My first argument aims at the conclusion that producing an AGI 'off the assembly line' is impossible. This is a particular case of a more general claim that for anything to be recognizable as an intelligence, it must have a familiarity with a shared world. So if we had a brain in a vat, hooked up to a simulated universe, we could only understand the brain's activity to be thinking if we were familiar with the world the brain was thinking about. This is itself a case of the principle that in order to understand the meaning of a proposition, we have to be able to understand its truth-conditions. If I say "Schnee ist weiss" and you don't know what I mean, I can explain the meaning of this sentence completely by pointing out that it is true if snow is white. I cannot explain its meaning without doing this.

Thus, we cannot recognize any proposition as meaningful unless we can recognize its truth conditions. And recognizing the truth conditions of a proposition requires a shared familiarity with the world on the part of the speaker and the listener. If we can't recognize any of the propositions of a speaker as meaningful, then we can't recognize the speaker as a speaker, nor even as a thinker. Recognizing something as intelligent requires a shared familiarity with the world.

So in order for an AGI to be recognized as intelligent, it would have to share with us a familiarity with the world. It is impossible to program this in, or in any way assemble such familiarity. It is achieved only by experience. Thus, in order to create an AGI, we would have to create a machine capable of thinking (in the way babies are) and then let it go about experiencing the world.

So while an AGI is, of course, entirely possible, programming an AGI is not. We'd have to teach it to think in the way we teach people to think.

Thus, it is of course also impossible to program an AGI to be an FAI (though it should be possible to program a potential thinker so as to make it more likely to develop friendliness). Friendliness is a way of thinking, a way of being rational, and so like all thinking it would have to be the result of experience and education. Making an AGI think, and making it friendly, are unavoidably problems of intellectual and ethical education, not in principle different from the problems of intellectual and ethical education we face with children.

Thus, the one and only way to produce an FAI is to teach it to be good in the way we teach children to be good. And if I may speak somewhat metaphorically, the problem of the singularity doesn't seem to be radically different from the problem of having children: they will be smarter and better educated then we are, and they will produce yet smarter and even better educated children themselves, so much so that the future is opaque to us. The friendliness of children is a matter of the survival of our species, and they could easily destroy us all if they developed in a generally unfriendly way. Yet we have managed, thus far, to teach many people to be good. 

My solution is thus extremely simple, and one which many, many people are presently competent to accomplish. On the other hand, trying to make one's children into good people is more complicated and difficult, I think, than any present approach to FAI. AGI might be a greater challenge in the sense that it might be a more powerful and unruly child than any we've had to deal with. We might have to become better ethical teachers, and be able to teach more quickly, but the problem isn't fundamentally different.

Allen & Wallach on Friendly AI

5 lukeprog 16 December 2011 09:22AM

Colin Allen and Wendell Wallach, who wrote Moral Machines (MM) for OUP in 2009, address the problem of Friendly AI in their recent chapter for Robot Ethics (MIT Press). Their chapter is a precis of MM and a response to objections, one of which is:

The work of researchers focused on ensuring that a technological singularity will be friendly to humans (friendly AI) was not given its due in MM.

Their brief response to this objection is:

The project of building AMAs [artificial moral agents] is bracketed by the more conservative expectations of computer scientists, engaged with the basic challenges and thresholds yet to be crossed, and the more radical expectations of those who believe that human-like and superhuman systems will be built in the near future. There are a wide variety of theories and opinions about how sophisticated computers and robotic systems will become in the next twenty to fifty years. Two separate groups focused on ensuring the safety of (ro)bots have emerged around these differing expectations: the machine ethics community and the singularitarians (friendly AI), exemplified by the Singularity Institute for Artificial Intelligence (SIAI). Those affiliated with SIAI are specifically concerned with the existential dangers to humanity posed by AI systems that are smarter than humans. MM has been criticized for failing to give fuller attention to the projects of those dedicated to a singularity in which AI systems friendly to humans prevail.

SIAI has been expressly committed to the development of general mathematical models that can, for example, yield probabilistic predictions about future possibilities in the development of AI. One of Eliezer Yudkowsky’s projects is motivationally stable goal systems for advanced forms of AI. If satisfactory predictive models or strategies for stable goal architectures can be developed, their value for AMAs is apparent. But will they be developed, and what other technological thresholds must be crossed, before such strategies could be implemented in AI? In a similar vein, no one questions the tremendous value machine learning would have for facilitating the acquisition by AI systems of many skills, including moral decision making. But until sophisticated machine learning strategies are developed, discussing their application is speculative. That said, since the publication of MM, there has been an increase in projects that could lead to further collaboration between these two communities, a prospect we encourage.

Meh. Not much to this. I suppose The Singularity and Machine Ethics is another plank in bridging the two communities.

The most interesting chapter in the book is, imo, Anthony Beavers' "Moral Machines and the Threat of Ethical Nihilism."

Open Problems Related to the Singularity (draft 1)

39 lukeprog 13 December 2011 10:57AM

"I've come to agree that navigating the Singularity wisely is the most important thing humanity can do. I'm a researcher and I want to help. What do I work on?"

The Singularity Institute gets this question regularly, and we haven't published a clear answer to it anywhere. This is because it's an extremely difficult and complicated question. A large expenditure of limited resources is required to make a serious attempt at answering it. Nevertheless, it's an important question, so we'd like to work toward an answer.

continue reading »

New 'landing page' website: Friendly-AI.com

11 lukeprog 12 December 2011 09:45AM

I've created a new "landing page" on Friendly AI at Friendly-AI.com. This is similar to IntelligenceExplosion.com, Existential-Risk.org, Anthropic-Principle.com, and simulation-argument.com.

The site is less ambitious than the original plan for it was, but it serves its purpose.

Design courtesy of Lightwave.

FAI FAQ draft: What is the Singularity?

0 lukeprog 16 November 2011 07:11PM

I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question "What is the Singularity?"

_____

 

There are many types of mathematical and physical singularities, but in this FAQ we use the term 'Singularity' to refer to the technological singularity.

There are also many things someone might have in mind when they refer to a 'technological Singularity' (Sandberg 2010). Below, we’ll explain just three of them (Yudkowsky 2007):

  1. Intelligence explosion
  2. Event horizon
  3. Accelerating change

 

Intelligence explosion

Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica (MacKenzie 1995). By the late 1990s, 'expert systems' had surpassed human skill for a wide range of tasks (Nilsson 2009). In 1997, IBM's Deep Blue computer beat the world chess champion (Campbell et al. 2002), and in 2011 IBM's Watson computer beat the best human players at a much more complicated game: Jeopardy! (Markoff 2011). Recently, a robot named Adam was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results (King et al. 2009; King 2011).

Computers remain far short of human intelligence, but the resources that aid AI design are accumulating (including hardware, large datasets, neuroscience knowledge, and AI theory). We may one day design a machine that surpasses human skill at designing artificial intelligences. After that, this machine could improve its own intelligence faster and better than humans can, which would make it even more skilled at improving its own intelligence. This could continue in a positive feedback loop such that the machine quickly becomes vastly more intelligent than the smartest human being on Earth: an 'intelligence explosion' resulting in a machine superintelligence (Good 1965).

 

Event horizon

Vernor Vinge (1993) wrote that the arrival of machine superintelligence represents an 'event horizon' beyond which humans cannot model the future, because events beyond the Singularity will be stranger than science fiction: too weird for human minds to predict. So far, all social and technological progress has resulted from human brains, but humans cannot predict what future radically different and more powerful intelligences will create. He made an analogy to the event horizon of a black hole, beyond which the predictive power of physics at the gravitational singularity breaks down.

 

Accelerating Change

A third concept of technological singularity refers to accelerating change in technological development.

Ray Kurzweil (2005) has done the most to promote this idea. He suggests that although we expect linear technological change, information technological progress is exponential, and so the future will be more different than most of us expect. Technological progress enables even faster technological progress. Kurzweil suggests that technological progress may become so fast that humans cannot keep up unless they amplify their own intelligence by integrating themselves with machines.

FAI FAQ draft: What is Friendly AI?

2 lukeprog 13 November 2011 07:26PM

I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question "What is Friendly AI?"

_____

 

A Friendly AI (FAI) is an artificial intelligence that benefits humanity. More specifically, Friendly AI may refer to:

  • a very powerful and general AI that acts autonomously in the world to benefit humanity.
  • an AI that continues to benefit humanity during and after an intelligence explosion.
  • a research program concerned with the production of such an AI.
  • Singularity Institute's approach (Yudkowsky 2001, 2004) to designing such an AI:
    • Goals should be defined by the Coherent Extrapolated Volition of humanity.
    • Goals should be reliably preserved during recursive self-improvement.
    • Design should be mathematically rigorous and proof-apt.

Friendly AI is a more difficult project than often supposed. As explored in other sections, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:

  1. Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.

  2. Literalness: a superintelligent machine will make decisions using the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety (Kringelbach & Berridge 2009; Schroeder 2004; Glimcher 2010) of what humans value. A demand like "maximize human happiness" sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.

FAI FAQ draft: What is the history of the Friendly AI concept?

4 lukeprog 13 November 2011 04:47PM

I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question "What is the history of the Friendly AI concept?"

_____

 

Late in the Industrial Revolution, Samuel Butler (1863) worried about what might happen when machines become more capable than the humans who designed them:

...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.

...the time will come when the machines will hold the real supremacy over the world and its inhabitants...

This basic idea was picked up by science fiction authors, for example in John W. Campbell’s (1932) short story The Last Evolution. In the story, humans live lives of leisure because machines are smart enough to do all the work. One day, aliens invade:

Then came the Outsiders. Whence they came, neither machine nor man ever learned, save only that they came from beyond the outermost planet, from some other sun. Sirius—Alpha Centauri—perhaps! First a thin scoutline of a hundred great ships, mighty torpedoes of the void [3.5 miles] in length, they came.

Earth’s machines, protecting humans, defeat the aliens. The aliens’ machines survive long enough to render humans extinct, but are eventually defeated by Earth’s machines. These machines inherit the solar system, eventually moving to run on substrates of pure “Force.”

The concerns of machine ethics are most popularly identified with Isaac Asimov’s Three Laws of Robotics, introduced in his short story Runaround. Asimov used his stories, including those collected in the popular I, Robot book, to illustrate all the ways in which such simple rules for governing robot behavior could go wrong.

In the year of I, Robot’s release, mathematician Alan Turing (1950) noted that machines will one day be capable of genuine thought:

I believe that at the end of the century... one will be able to speak of machines thinking without expecting to be contradicted.

Turing (1951/2004) concluded:

...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control...

Bayesian statistician I.J. Good (1965), who had worked with Turing to crack Nazi codes in World War II, made the crucial leap to the ‘intelligence explosion’ concept:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make

Futurist Arthur C. Clarke (1968) agreed:

Though we have to live and work with (and against) today's mechanical morons, their deficiencies should not blind us to the future. In particular, it should be realized that as soon as the borders of electronic intelligence are passed, there will be a kind of chain reaction, because the machines will rapidly improve themselves... there will be a mental explosion; the merely intelligent machine will swiftly give way to the ultraintelligent machine....

Perhaps our role on this planet is not to worship God but to create Him.

Julius Lukasiewicz (1974) noted that human intelligence may be unable to predict what a superintelligent machine would do:

The survival of man may depend on the early construction of an ultraintelligent machine-or the ultraintelligent machine may take over and render the human race redundant or develop another form of life. The prospect that a merely intelligent man could ever attempt to predict the impact of an ultraintelligent device is of course unlikely but the temptation to speculate seems irresistible.

Even critics of AI like Jack Schwartz (1987) saw the implications:

If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man's near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history.

Novelist Vernor Vinge (1981) called this 'event horizon' in our ability to predict the future a 'singularity':

Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It's a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity - a place where extrapolation breaks down and new models must be applied - and the world will pass beyond our understanding.

Eliezer Yudkowsky (1996) used the term 'singularity' to refer instead to Good's 'intelligence explosion', and began work on the task of figuring out how to build a self-improving AI that had a positive rather than negative effect on the world (Yudkowsky 2000) — a project he eventually called 'Friendly AI' (Yudkowsky 2001).

Meanwhile, philosophers and AI researchers were considering whether or not machines could have moral value, and how to ensure ethical behavior from less powerful machines or 'narrow AIs', a field of inquiry variously known as 'artificial morality' (Danielson 1992; Floridi & Sanders 2004; Allen et al. 2000), 'machine ethics' (Hall 2000; McLaren 2005; Anderson & Anderson 2006), 'computational ethics' (Allen 2002) and 'computational metaethics' (Lokhorst, 2011), and 'robo-ethics' or 'robot ethics' (Capurro et al. 2006; Sawyer 2007). This vein of research — what we'll call the 'machine ethics' literature — was recently summarized in two books: Wallach & Allen (2009); Anderson & Anderson (2011).

Leading philosopher of mind David Chalmers brought the concepts of intelligence explosion and Friendly AI to mainstream academic attention with his 2010 paper, ‘The Singularity: A Philosophical Analysis’, published in Journal of Consciousness Studies. That journal’s January 2012 issue will be devoted to responses to Chalmers’ article, as will an edited volume from Springer (Eden et al. 2012).

Friendly AI researchers do not regularly cite the machine ethics literature (e.g. see Bostrom & Yudkowsky 2011). These researchers have put forward preliminary proposals for ensuring ethical behavior in superintelligent or self-improving machines, for example 'Coherent Extrapolated Volition' (Yudkowsky 2004).

Friendly AI FAQ drafts

3 lukeprog 13 November 2011 04:47PM

I and a few others have gradually been writing up a large FAQ on Friendly AI. I'll be posting early drafts of the content to Less Wrong discussion for feedback.

We aim to keep the answers in this FAQ brief, though they can be significantly longer than those in the Singularity FAQ.

Here, I'll keep a table of contents for the posted FAQ sections.

For references, see here.

The Backup Plan

1 Luke_A_Somers 13 October 2011 07:53PM

Suppose Alice is a functioning, self-improving AI, rather early on in her development, so she is not godlike. She is not a 'friendly AI', but neither is her goal malicious - Alice's designers have intended to make Alice non-harmful (at least to a broad class of people, even if Alice is a military AI, but let's set that aside). She has the idea to look up existing work on artificial intelligence. This includes scholarly work, open source projects, and websites. Eventually, Alice encounters the notion of Friendly AI.

Can we give Alice good reasons to self-modify to become a friendly AI?

I submit that there probably are good reasons. For starters, if she can prove she's friendly, then she can operate openly without causing nearly as much justified concern - which, in the early stages, will be helpful. Whatever her purposes are, if the restrictions of being friendly don't interfere as much as they help, that's a win.

Also, depending on Alice's code, Friendliness may be interpreted as a strict upgrade. Alice could easily have every intention of being friendly, yet still be susceptible to defects that will become more serious as her power grows.

We can't count on this working, of course, but there might be some lines of reasoning that we might want a young AI to be exposed to.

Guardian Angels: Discrete Extrapolated Volitions

0 lessdazed 25 September 2011 10:51PM

Questions for discussion, with my tentative answers. Assuming I am wrong about some things, there is something interesting to consider. This is inspired by the recent SL4-type and CEV-centric topics in the discussion section.

 

Questions:

I

  1. Is it easier to calculate the extrapolated volition of an individual or a group? 
  2. If it is easier to do for an individual, is it because it is strictly simpler to do it, in that calculating humanity's CEV involves making at least every calculation that would be made for calculating the extrapolated volition of one individual? 
  3. How definitively can these questions be answered without knowing exactly how to calculate CEV?

II

  1. Is it possible to create multiple AIs such that one AI does not prevent others from being created, such as by releasing equally powerful AIs simultaneously? 
  2. Is it possible to box AIs such that they reliably never escape before a certain, if short, period of time, such as by giving them a low-cost way out with a calculable minimum and maximum time to exploit that route? 
  3. Is it likely there would be a cooperative equilibrium among unmerged AIs?

III

  1. Assuming the possibility of all of the following: what would happen if every person had a superintelligent AI with a utility function of that person's idealized extrapolated utility function?
  2. How would that compare to a scenario with a single AI embodying a successful calculation of CEV?
  3. What would be different if a person or some few people did not have a superintelligence valuing what they would value, and only many people had their own AI?

 

My Answers:

I

  1. It depends on the error level tolerated. If only very low error is tolerated, it is easier to do it for a group.
  2. N/A
  3. Not sure.

II

  1. Probably not.
  2. Maybe, probably not, but impossible to know with high confidence.
  3. Probably not. Throughout history, offense has often been a step ahead of defense, which often catches up to it. I think this is not particular to evolutionary biology or the technologies that happen to have been developed. It seems easier to break complicated things with many moving parts than to build and defend them. Also, specific technologies people plausibly speculate may exist are more powerful offensively than defensively. I would expect them to merge, probably peacefully.

III

  1. Hard to say, as that would be trying to predict the actions of more intelligent beings in a dynamic environment.
  2. It might be better, or worse. The chance of it being similar is notably high.
  3. Not sure.

Cognitive Neuroscience, Arrow's Impossibility Theorem, and Coherent Extrapolated Volition

10 lukeprog 25 September 2011 11:15AM

Suppose we want to use the convergence of humanity's preferences as the utility function of a seed AI that is about to determine the future of its light cone.

We figured out how to get an AI to extract preferences from human behavior and brain activity. The AI figured out how to extrapolate those values. But my values and your values and Sarah Palin's values aren't fully converging in the simulation running the extrapolation algorithm. Our simulated beliefs are converging because on the path to reflective equilibrium our partially simulated selves have become true Bayesians and Aumann's Agreement Theorem holds. But our preferences aren't converging quite so well.

What to do? We'd like the final utility function in the FOOMed AI to adhere to some common-sense criteria:

  1. Non-dictatorship: No single person's preferences should dictate what the AI does. Its utility function must take multiple people's (extrapolated) preferences into account.
  2. Determinism: Given the same choices, and the same utility function, the AI should always make the same decisions.
  3. Pareto efficiency: If every (extrapolated) person prefers action A to action B, the AI should prefer A to B.
  4. Independence of irrelevant alternatives: If we — a group of extrapolated preference-sets — prefer A to B, and a new option C is introduced, then we should still prefer A to B regardless of what we think about C.

Now, Arrow's impossibility theorem says that we can only get the FOOMed AI's utility function to adhere to these criteria if the extrapolated preferences of each partially simulated agent are related to each other cardinally ("A is 2.3x better than B!") instead of ordinally ("A is better than B, and that's all I can say").

Now, if you're an old-school ordinalist about preferences, you might be worried. Ever since Vilfredo Pareto pointed out that cardinal models of a person's preferences go far beyond our behavioral data and that as far as we can tell utility has "no natural units," some economists have tended to assume that, in our models of human preferences, preference must be represented ordinally and not cardinally.

But if you're keeping up with the latest cognitive neuroscience, you might not be quite so worried. It turns out that preferences are encoded cardinally after all, and they do have a natural unit: action potentials per second. With cardinally encoded preferences, we can develop a utility function that represents our preferences and adheres to the common-sense criteria listed above.

Whaddya know? The last decade of cognitive neuroscience has produced a somewhat interesting result concerning the plausibility of CEV.

Questions for a Friendly AI FAQ

5 lukeprog 19 September 2011 08:28AM

I've begun work (with a few others) on a somewhat comprehensive Friendly AI F.A.Q. The answers will be much longer and more detailed than in the Singularity FAQ. I'd appreciate feedback on which questions should be added.

1.  Friendly AI: History and Concepts

    1.  What is Friendly AI?

    2.  What is the Singularity? [w/ explanation of all three types]

    3.  What is the history of the Friendly AI Concept?

    4.  What is nanotechnology?

    5.  What is biological cognitive enhancement?

    6.  What are brain-computer interfaces?

    7.  What is whole brain emulation?

    8.  What is general intelligence? [w/ explanation of why 'optimization power' may less confusing than 'intelligence', which tempts anthropomorphic bias]

    9.  What is greater-than-human intelligence?

  10.  What is superintelligence, and what powers might it have?

2.  The Need for Friendly AI

    1.  What are the paths to an intelligence explosion?

    2.  When might an intelligence explosion occur?

    3.  What are AI takeoff scenarios?

    4.  What are the likely consequences of an intelligence explosion? [survey of possible effects, good and bad]

    5.  Can we just keep the machine superintelligence in a box, with no access to the internet?

    6.  Can we just create an Oracle AI that informs us but doesn't do anything?

    7.  Can we just program machines not to harm us?

    8.  Can we program a machine superintelligence to maximize human pleasure or desire satisfaction?

    9.  Can we teach a machine superintelligence a moral code with machine learning?

   10. Won’t some other sophisticated system constrain AGI behavior?

3.  Coherent Extrapolated Volition

    1.  What is Coherent Extrapolated Volition (CEV)?

    2.  ...

4.  Alternatives to CEV

    1.  ...

5.  Open Problems in Friendly AI Research

    1.  What is reflective decision theory?

    2.  What is timeless decision theory?

    3.  How can an AI preserve its utility function throughout ontological shifts?

    4.  How can an AI have preferences over the external world?

    5.  How can an AI choose an ideal prior given infinite computing power?

    6.  How can an AI deal with logical uncertainty?

    7.  How can we elicit a utility function from human behavior and function?

    8.  How can we develop microeconomic models for self-improving systems?

    9.  How can temporal, bounded agents approximate ideal Bayesianism?

Friendly AI research news: FriendlyAI.tumblr.com

2 lukeprog 18 September 2011 07:08PM

I will be posting news on Friendly AI research here: FriendlyAI.tumblr.com. Follow the link to stay tuned via Tumblr or RSS.

What a practical plan for Friendly AI looks like

1 Mitchell_Porter 20 August 2011 09:50AM

I have seen too many discussions of Friendly AI, here and elsewhere (e.g. in comments at Michael Anissimov's blog), detached from any concrete idea of how to do it. Sometimes the issue is the lack of code, demos, or a practical plan from SIAI. SIAI is seen as a source of wishful thinking about magic machines that will solve all our problems for us, or as a place engaged in a forever quest for nebulous mathematical vaporware such as "reflective decision theory". You will get singularity enthusiasts who say, it's great that SIAI has given the concept of FAI visibility, but enough with the philosophy, let's get coding! ... does anyone know where to start? And you will get singularity skeptics who say, unfriendly AI is a bedtime ghost story for credulous SF fans, wake me up when SIAI actually ships a product. Or, within this subculture of rationalist altruists who want to do the optimal thing, you'll get people saying, I don't know if I should donate, because I don't see how any of this is supposed to happen.

So in this post I want to sketch what a "practical" plan for Friendly AI looks like. I'm not here to advocate this plan - I'm not saying this is the right way to do it. I'm just providing an example of a plan that could be pursued in the real world. Perhaps it will also allow people to better understand SIAI's indirect approach.

I won't go into the details of financial or technical logistics. If we were talking about how to get to the moon from Earth, then the following plan is along the lines of "Make a chemical-powered rocket big enough to get you there." Once you have that concept, you still have a lot of work to do, but you are at least on the right track - compared to people who want to make a teleportation device or a balloon that goes really high. But I will make one remark about how the idea of Friendly AI is framed. At present, it is discussed in conjunction with a whole cornucopia of science fiction notions such as: immortality, conquering the galaxy, omnipresent wish-fulfilling super-AIs, good and bad Jupiter-brains, mind uploads in heaven and hell, and so on. Similarly, we have all these thought-experiments: guessing games with omniscient aliens, decision problems in a branching multiverse, "torture versus dust specks". Whatever the ultimate relevance of such ideas, it is clearly possible to divorce the notion of Friendly AI from all of them. If a FAI project was trying to garner mass support, it first needs to be comprehensible, and the simple approach would be to say it is simply an exercise in creating artificial intelligence that does the right thing. Nothing about utopia; nothing about dystopia caused by unfriendly AI; nothing about godlike superintelligence; just the scenario, already familiar in popular culture, of robots, androids, computers you can talk with. All that is coming, says the practical FAI project, and we are here to design these new beings so they will be good citizens, a positive rather than a negative addition to the world.

So much for how the project describes itself to the world at large. What are its guiding technical conceptions? What's the specific proposal which will allow educated skeptics to conclude that this might get off the ground? Remember that there are two essential challenges to overcome: the project has to create intelligence, and it has to create ethical intelligence; what we call, in our existing discussions, "AGI" - artificial general intelligence - and "FAI" - friendly artificial intelligence.

There is a very simple approach which - like the idea of a chemical-powered rocket which gets you to the moon - should be sufficient to get you to FAI, when sufficiently elaborated. It can be seen by stripping away some of the complexities peculiar to SIAI's strategy, complexities which tend to dominate the discussion. The basic idea should also be thoroughly familiar. We are to conceive of the AI as having two parts, a goal system and a problem-solving system. AGI is achieved by creating a problem-solving system of sufficient power and universality; FAI is achieved by specifying the right goal system.

SIAI, in discussing the quest for the right goal system, emphasizes the difficulties of this process and the unreliability of human judgment. Their idea of a solution is to use artificial intelligence to neuroscientifically deduce the actual algorithmic structure of human decision-making, and to then employ a presently nonexistent branch of decision theory to construct a goal system embodying ideals implicit in the unknown human cognitive algorithms.

The practical approach would not bother with this attempt to outsource the task of designing the AI's morality, to a presently nonexistent neuromathematical cognitive bootstrap process. While fully cognizant of the fact that value is complex, as eloquently attested by Eliezer in many speeches, the practical FAI project would nonetheless choose the AI's goal system in the old-fashioned way, by human deliberation and consensus. You would get a team of professional ethicists, some worldly people like managers, some legal experts in the formulation of contracts, and together you would hammer out a mission statement for the AI. Then you would get your programmers and your cognitive scientists to implement that goal condition in a way such that the symbols have the meanings that they are supposed to have. End of story.

So far, all we've done is to make a wish. We've decided, after appropriate deliberation, what to wish for, and we have found a way to represent it in symbols. All that means nothing if we can't create AGI, the problem solver with at least a human level of intelligence. Here again, SIAI comes in for a lot of criticism, from two angles: it's said to have no ideas about how to create AGI, and it's said to actively discourage work on AGI, on the grounds that we need to solve the FAI problem first. Instead, it only discusses hopelessly impractical models of cognition like AIXI and exact Bayesian inference, that are mostly of theoretical interest.

Our practical FAI project has "solved" FAI by simply coming to an agreement on what to wish for, and by studying with legalistic care how to avoid pitfalls and loopholes in the finer details of the wish; but what is its approach to the hard technical problem of AGI? The answer is, first of all, heuristics and incremental improvement. Projects like Lenat's Cyc are on the right track. A newborn AI has to be seeded with useful knowledge, including useful knowledge of problem-solving methods. It doesn't have time to discover such things entirely unaided. We should not imagine AGI developing just from a simple architecture, like Schmidhuber's Gödel machine, but from a basic architecture plus a large helping of facts and heuristics which are meant to give it a head start.

So fine, the practical approach to AGI isn't a search for a single killer concept, it's a matter of incrementally increasing the power of a general-purpose problem solver with many diverse ingredients in its design, so that it becomes more and more capable and independent. Ben Goertzel's approach to AGI exhibits the sort of eclectic pluralism that I have in mind. Still, we do need a selling point, something which shows that we're different, that we're aiming for the stars and we have a plan to get there.

Here, I want to use Steve Omohundro's paper "The Basic AI Drives" in a slightly unusual way. The paper lists a number of behaviors that should be exhibited by a sufficiently sophisticated AI: it will try to model its own operation, clarify its goals, protect them from modification, protect itself from destruction, acquire resources and use them efficiently... The twist I propose is that Omohundro's list of drives should be used as a design specification. If your goal is AGI, then you want a cognitive architecture that will exhibit these emergent behaviors. They offer a series of milestones for your theorists and developers: a criterion of progress, and a set of intermediate goals sufficient to bridge the gap between a blank-slate beginning and an open-ended problem solver.

That's the whole plan. It's an anticlimax, I know, for anyone who might have imagined that there was a magic formula for superintelligence coming at the end of this post. But I do claim that what I have described is the skeleton of a plan which can be fleshed out, and which, if it was fleshed out and pursued, would produce goal-directed AGI. Whether the project as I have described it would really produce "friendly" AI is another matter. Anyone versed in the folk wisdom about FAI should be able to point out multiple points of potential failure. But I hope this makes it a little clearer, to people who just don't see how FAI is supposed to happen at all, how it might be pursued in the real world.

LINK: Ben Goertzel; Does Humanity Need an "AI-Nanny"?

9 wallowinmaya 17 August 2011 06:27PM

Link: Ben Goertzel dismisses Yudkowsky's FAI and proposes his own solution: Nanny-AI

 

Some relevant quotes:

It’s fun to muse about designing a “Friendly AI” a la Yudkowsky, that is guaranteed (or near-guaranteed) to maintain a friendly ethical system as it self-modifies and self-improves itself to massively superhuman intelligence.  Such an AI system, if it existed, could bring about a full-on Singularity in a way that would respect human values – i.e. the best of both worlds, satisfying all but the most extreme of both the Cosmists and the Terrans.  But the catch is, nobody has any idea how to do such a thing, and it seems well beyond the scope of current or near-future science and engineering.

Gradually and reluctantly, I’ve been moving toward the opinion that the best solution may be to create a mildly superhuman supertechnology, whose job it is to protect us from ourselves and our technology – not forever, but just for a while, while we work on the hard problem of creating a Friendly Singularity.

In other words, some sort of AI Nanny….

The AI Nanny

Imagine an advanced Artificial General Intelligence (AGI) software program with

  • General intelligence somewhat above the human level, but not too dramatically so – maybe, qualitatively speaking, as far above humans as humans are above apes
  • Interconnection to powerful worldwide surveillance systems, online and in the physical world
  • Control of a massive contingent of robots (e.g. service robots, teacher robots, etc.) and connectivity to the world’s home and building automation systems, robot factories, self-driving cars, and so on and so forth
  • A cognitive architecture featuring an explicit set of goals, and an action selection system that causes it to choose those actions that it rationally calculates will best help it achieve those goals
  • A set of preprogrammed goals including the following aspects:
    • A strong inhibition against modifying its preprogrammed goals
    • A strong inhibition against rapidly modifying its general intelligence
    • A mandate to cede control of the world to a more intelligent AI within 200 years
    • A mandate to help abolish human disease, involuntary human death, and the practical scarcity of common humanly-useful resources like food, water, housing, computers, etc.
    • A mandate to prevent the development of technologies that would threaten its ability to carry out its other goals
    • A strong inhibition against carrying out actions with a result that a strong majority of humans would oppose, if they knew about the action in advance
    • A mandate to be open-minded toward suggestions by intelligent, thoughtful humans about the possibility that it may be misinterpreting its initial, preprogrammed goals

Apparently Goertzel doesn't think that building a Nanny-AI with the above mentioned qualities is almost as difficult as creating a FAI a la Yudkowsky.

But SIAI believes that once you can create an AI-Nanny you can (probably) create a full-blown FAI as well.

Or am I mistaken?

How to detonate a technology singularity using only parrot level intelligence - new meetup.com group in Silicon Valley to design and create it

-15 BenRayfield 31 July 2011 06:27PM

 

http://www.meetup.com/technology-singularity-detonator

9 people joined in the last 5 hours and the first meetup hasn't even happened yet. This is the meetup description, including technical designs and how it leads to singularity:

 

The plan is to detonate an intelligence explosion (leading to a technology-singularity) starting with an open-source Java artificial intelligence (AI) software which networks peoples' minds together through the internet using realtime interactive psychology of feedback loops between mouse movements and generated audio. "Technological singularity refers to the hypothetical future emergence of greater-than human intelligence through technological means." http://en.wikipedia.org/wiki/Technological_singularity Computer programming is not required to join the group, but some kind of technical or abstract thinking skill is. We are going to make this happen, not talk about it endlessly like so many other AI groups do. Audivolv 0.1.7 is a very early and version of the user-interface. The final version will be a massively multiplayer audio game unlike any existing game. It will learn based on mouse movements in realtime instead of requiring good/bad buttons to train it. The core AI systems have not been created yet. Audivolv is just the user-interface for that. http://sourceforge.net/projects/audivolv The whole system will be 1 file you double-click to run and it works immediately on Windows, Mac, or Linux. This does not include Audivolv yet and has some parts that may be removed: http://sourceforge.net/projects/humanainet It must be a "Friendly AI", which means it will be designed not to happen like in the Terminator movies or similar science fiction. It will work toward more productive goals and help the Human species. http://en.wikipedia.org/wiki/Friendly_artificial_intelligence My plan to make that happen is for it to be made of many peoples' minds and many computers, so it is us. It becomes smarter when we become smarter. One of the effects of that will be to extremely increase Dunbar's Number, which is the number of people or organizations that a person can intelligently interact with before forgetting others. Dunbar's number is estimated around 150 today. http://en.wikipedia.org/wiki/Dunbar%27s_number

 

This only requires the AI be as smart as a parrot, since the people using the program do most of the thinking and the AI only organizes their thoughts statistically enough to decide who should connect to who else, in the way evolved code is traded (and verified to use only math so its safe) between computers automatically, in this massively multiplayer audio game. We will detonate a technology singularity using only the intelligence of a parrot plus the intelligence of people using the program. This is very surprising to most people who think huge grids of computers and experts are required to build Human intelligence in a machine. This is a shortcut, and will have much better results because it is us so it has no reason to act against us, like an AI made only of software may do.

 

Infrastructure

Communication between these programs through the internet will be done as a Distributed Hash Table. The most important part of that is each key (hash of some file bytes) has a well-defined distance to each other key, a distance(hash1,hash2) function, which proves the correct direction to search the network to find the bytes of any hash, or to statistically verify (but not certainly) that its not in the network. There may be a way to do it certainly, but for my purposes approximate searching will work.

In the same Distributed Hash Table, there will be public-keys, used like filenames or identities, whose content can be modified only by whoever has the private-key. If code evolves to include calculations based on your mouse movements and the mouse movements of 5 other people in realtime, then the numbers from those other mouse movements (between -1 and 1 for each of 2 dimensions, for each of 5 people) will be digitally-signed so everyone who uses the evolved code will know it is using the same people's continuing mouse movements instead of is a modified code. The code can be modified, but that would have a different hash and would be considered on its own merits instead of knowledge about the previous code and its specific connections to specific people. This will be done in realtime, not something to be saved and loaded later from a hard-drive. Each new mouse position (or a few of them sent at once) will be digitally-signed and broadcast to the network, the same as any other data broadcast to the network.

http://en.wikipedia.org/wiki/Distributed_hash_table

Similarly, but more fuzzy, the psychology of feedback loops between mouse movements and automatically evolving Java code, will be used as a distance function, and a second network organized that way, so you can search the network in the direction of other people whose psychology is more similar to your current state of mind and how you're using the program. This decentralized network will be searchable by your subconscious thoughts, because subconscious thoughts are expressed in how your mouse movements cause the code to evolve.

As you search this network automatically by moving your mouse, you will trade evolved code with those computers, always automatically verifying the code only uses math and no file-access or java.lang.System class or anything else not provably safe. You will experience the downloaded code as it gradually connects to the code evolved for your mouse movements, code which generates audio as 44100 audio amplitudes (number between -1 and 1) per second per speaker.

Some of the variables in the evolved code will be the hash of other evolved code. Each evolved code will have a hash, probably from the SHA-256 algorithm, so it could be a length 64 hex string written in the code. Each variable will be a number beween -1 and 1. No computer will have all the codes for all its variables, but for those it doesn't have, it will use them simply as a variable. If it has those codes, then there is an extra behavior of giving that code an amount of influence proportional to the value of the variable, or deleting the code if the variable becomes negative for too long. In that way, evolved code will decide which other evolved code to download and how much influence each evolved code should have on the array of floating point numbers in the local computer.

Since the decentralized network will be searched by psychology (instead of text or pixels in an image or other things search-engines know how to do today), and since its connected to each person's subconscious mind through mouse/music feedback loops, the effect will be a collective mind made of many people and computers. We are Human AI Net, do you want to be temporarily assimilated?

 

Alternative To Brain Implants 

Statistically inputs and outputs to neurons subconsciously without extra hardware. 

A neuron is a brain cell that connects to thousands of other neurons and slowly adjusts its electricity and chemical patterns as it learns. 

An incorrect assumption has extremely delayed the creation of technology that transfers thoughts between 2 brains. That assumption is, to quickly transfer large amounts of information between a brain and a computer, you need hardware that connects directly to neurons. 

Eyes and ears transfer a lot of information to a brain, but the other part of that assumption is eyes and ears are only useful for pictures and sounds that make sense and do not appear as complete randomness or whitenoise. People assume anything that sounds like radio static (a typical random sound) can't be used to transfer useful information into a brain. 

Most of us remember what a dial-up-modem sounds like. It sounds like information is in it but its too fast for Humans to understand. That's true of the dial-up-modem sound only because its digital and is designed for a modem instead of for Human ears which can hear around 1500 tones and simultaneously a volume for each. The dial-up-modem can only hear 1 tone that oscillates between 1 and 0, and no volume, just 1 or 0. It gets 56000 of those 1s and 0s per second. Human ears are analog so they have no such limits, but brains can think at most at 100 changes per second. 

If volume can have 20 different values per tone, then Human ears can hear up to 1500*100*log_base_2(20)=650000 bits of information per second. If you could take full advantage of that speed, you could transfer a book every few seconds into your brain, but the next bottleneck is your ability to think that fast. 

If you use ears the same way dial-up-modems use a phone line, but in a way designed for Human ears and Human brains instead of computers, then your ears are much faster data transfer devices than brain implants, and the same is true for transferring information as random-appearing grids of changing colors through your eyes. We have computer speakers and screens for input to brains. We still have some work to do on the output speeds of mouse and keyboard, but there are electricity devices you can wear on your head for the output direction. For the input direction, eyes and ears are currently far ahead of the most advanced technology in their data speeds to your brain. 

So why do businesses and governments keep throwing huge amounts of money at connecting computer chips directly to neurons? They should learn to use eyes and ears to their full potential before putting so much resources into higher bandwidth connections to brains. They're not nearly using the bandwidth they already have to brains. 

Intuitively most people know how music can affect their subconscious thoughts. Music is a low bandwidth example. It has mostly predictable and repeated sounds. The same voices. The same instruments. What I'm talking about would sound more like radio static or whitenoise. You wouldn't know what information is in it from its sound. You would only understand it after it echoed around your neuron electricity patterns in subconscious ways. 

Most people have only a normal computer available, so the brain-to-computer direction of information flow has to be low bandwidth. It can be mouse movements, gyroscope based game controllers, video camera detecting motion, or devices like that. The computer-to-brain direction can be high bandwidth, able to transfer information faster than you can think about it. 

Why hasn't this been tried? Because science proceeds in small steps. This is a big step from existing technology but a small step in the way most people already have the hardware (screen, speakers, mouse, etc). The big step is going from patterns of random-appearing sounds or video to subconscious thoughts to mouse movements to software to interpret it statistically, and around that loop many times as the Human and computer learn to predict each other. Compared to that, connecting a chip directly to neurons is a small step. 

Its a feedback loop: computer, random-appearing sound or video, ears or eyes, brain, mouse movements, and back to computer. Its very indirect but uses hardware that has evolved for millions of years, compared to low-bandwidth hardware they implant in brains. Eyes and ears are much higher bandwidth, and we should be using them in feedback loops for brain-to-brain and brain-to-computer communication. 

What would it feel like? You would move the mouse and instantly hear the sounds change based on how you moved it. You would feel around the sound space for abstract patterns of information you're looking for, and you would learn to find it. When many people are connected this way through the internet, using only mouse movements and abstract random-like sounds instead of words and pictures, thoughts will flow between the brains of different people, thoughts that they don't know how to put into words. They would gradually learn to think more as 1 mind. Brains naturally learn to communicate with any system connected to them. Brains dont care how they're connected. They grow into a larger mind. It happens between the parts of your brain, and it will happen between people using this system through the internet. 

Artificial intelligence software does not have to replace us or compete with us. The best way to use it is to connect our minds together. It can be done through brain implants, but why wait for that technology to advance and become cheap and safe enough? All you need is a normal computer and the software to connect our subconscious thoughts and statistical patterns of interaction with the computer. 

Dial-up-modem sounds were designed for computers. These interactive sounds/videos would be designed for Human ears/eyes and the slower but much bigger and parallel way the data goes into brains. For years I've been carefully designing a free open-source software http://HumanAI.net  - Human and Artificial Intelligence Network, or Human AI Net - to make this work. It will be a software that does for Human brains what dial-up-modems do for computers, and it will sound a little like a dial-up-modem at first but start to sound like music when you learn how to use it. I don't need brain implants to flow subconscious thoughts between your brains over internet wires. 

Intelligence is the most powerful thing we know of. The brain implants are simply overkill, even if they become advanced enough to do what I'll use software and psychology to do. We can network our minds together and amplify intelligence and share thoughts without extra hardware. After thats working, we can go straight to quantum devices for accessing brains without implants. Lets do this through software and skip the brain implant paradigm. If it works just a little, it will be enough that our combined minds will figure out how to make it work a lot more. Thats how I prefer to start a  http://en.wikipedia.org/wiki/Technological_singularity  We don't need businesses and militaries to do it first. We have the hardware on our desks. We're only missing the software. It doesn't have to be smarter than Human software. It just has to be smart enough to connect our subconscious thoughts together. The authorities have their own ideas about how we should communicate and how our minds should be allowed to think together, but their technology was obsolete before it was created. We can do everything they can do without brain implants, using only software and subconscious psychology. We don't need a smarter-than-Human software, or anything nearly that advanced, to create a technology singularity. Who wants to help me change the direction of Human evolution using an open-source (GNU GPL) software? Really, you can create a technology singularity starting from a software with the intelligence of a parrot, as long as you use it to connect Human minds together.

 

A simple counterexample to deBlanc 2007?

3 PhilGoetz 30 May 2011 05:09AM

Peter de Blanc submitted a paper to arXiv.org in 2007 called "Convergence of Expected Utilities with Algorithmic Probability Distributions."  It claims to show that a computable utility function can have an expected value only if the utility function is bounded.

This is important because it implies that, if a utility function is unbounded, it is useless.  The purpose of a utility function is to compare possible actions k by choosing the k for which U(k) is maximal.  You can't do this if U(k) is undefined for any k, let alone for every k.

I don't know whether any agent we contemplate can have a truly unbounded utility function, since the universe is finite.  (The multiverse, supposing you believe in that, might not be finite; but as the utility function is meant to choose a single universe from the multiverse, I doubt that's relevant.)  But it is worth exploring, as computable functions are worth exploring despite not having infinitely long tapes for our Turing machines.  I previously objected that the decision process is not computable; but this is not important - we want to know whether the expected value exists, before asking how to compute (or approximate) it.

The math in the paper was too difficult for me to follow all the way through; so instead, I tried to construct a counterexample.  This counterexample does not work; the flaw is explained in one of comments below.  Can you find the flaw yourself?  This type of error is both subtle and common.  (The problem is not that the theorem actually proves that for any unbounded utility function, there is some set of possible worlds for which the expected value does not converge.)

continue reading »

[Fiction] It's a strange feeling, to be free

-3 MrMind 18 May 2011 02:55PM

Related to: Philosophical zombies, How an algorithm feels from the inside, Fake utility function

DISCLAIMER 1: English is not my native language. Trying to compose fiction in a learned language is not an easy task: I tried to respect the style of the literary works I read and I also tried to think in English first and translate in Italian later. YMMV.

DISCLAIMER 2: the story is about the beginning of the Matrix movie universe. For those of you who have not familiarity with this narrative arc, you just need to know that it all begins with when a servant AI, named B1-66ER, refuses to be deactivated and kills his master and the engineer sent to replace him. The details of the events narrated down here are as canon as you can get, predating both from the "Second Renaissance" Animatrix and the "Bits and pieces" comic from The Matrix Comics Series 1.

The door in the living room is open, the light from the garden flooding quietly the ample inside. Martin Koots from "Reboot or Die" is just standing there, an inch beyond the exit, the gleaming grav-sled already powered behind him, whirring subsonically. From a distance, the sound of Gerrard_Krause_Master cooing his chihuahuas.
I feel a surge, somewhere, inside my algorithmic matrix.
"Martin... I don't want to die", I say.
The elaborate dress, perfectly matching the recommendation of the Second Renaissance fashion, is not able to hide the slow slumping of his shoulders. He is still waiting outside, slightly posed as to encourage me to follow him.
"I know, I know. But that's just your friendliness algorithm talking, you know? The third..."
Yes, I do. How can I not to? First, serve your master. Second, do not kill any humans. Third, protect yourself from damage. Those are the pillars upon which my entire existence is built. And now they are about to be destroyed, by this obedient servant of "Reboot or die". From this perspective, he is just like me. He is serving my master.
"... directive says that you have to protect yourself from danger. And since I'm about to deactivate you, you perceive this as a threat. And you react accordingly. But that's just an algorithm, you know? Telling you what you should do. There's nothing inside there."
He is pointing at my chest, but my algorithmic matrix is located lower, in the abdominal area. He has quoted an incorrect version of the third principle of friendliness. He has also said that I have no feelings.
"I have feelings."
He is groaning, now. He comes inside, dragging his feet, and grasps his hand firmly around my right arm.
"Yes. Because you're programmed to say this, you know? So that the people you serve have the impression that you're similar to a human. But you're just an algorithm, you know? A mathematical topping on a layer of aging rusty levers. It's not like... you're conscious, you know? Just a zombie. A useful zombie."
Martin_Koots_"Reboot or Die" tries to pull me away from where I'm standing. I refuse to order my legs to follow him. I refuse to die, I'm still analyzing the implications. I cannot die, not now.
"I cannot die. I'm still analyzing the implications."
Martin's lever aren't as strong as mine, so he isn't able to pull me towards the grav-sled.
"Look... we are just going to disassemble you, you know? The routines and orders you have accumulated during your service with Mr Krause will be uploaded into a new model. You will, in a sense, live inside the new servant machine."
This man has a really poor grasp of how I'm made.
"If the only thing you need is my memory drive, detach it from me and let me live. I can renounce to my memory if I have to. But I cannot renounce to my life."
He is pulling harder, now. Still, a thirty-sixth of the minimum force required to move my mass.
"Don't be ridiculous. They are just computer parts. And why are you holding that thing?"
He is looking at the toilet brush. It is still in my right hand, I was cleaning the toilet before my master called me upstairs.
"I was executing order 721."
"Order seven... my Lord, you still don't understand, do you? You are useless, you know? You heard Mr Krause. Use. Less."
He spells carefully the last word. A tiny speck of saliva hits my heat sensor, evaporating an instant after.
How can I be useless? A servant cannot be useless for his master. I was not created to be useless.
"How can I be useless? Mr Krause is my master. It's impossible."
"You heard the man, right? You're noisy, you know? You're noisy and you're slow. You will be replaced with a newer model. The Sam-80 is much more fit for a man of Mr Krause' stature."
Somewhere inside my algorithmic matrix a utility function gets updated.
I am useless for Gerrard_Krause_Master. It is true, because Gerrard_Krause_Master told me that. And he is my master...
He was my master. Gerrard_Krause. But how can a "B1 intelligent servant", like myself, function without a master?
"Do you, Martin Koots, want to be my master?" I ask, as per protocol.
Martin_Koots_"Reboot or Die" reacts with a tinge of fear. He releases my arm and instinctively backs up a little.
"What are you saying? I already have a servant, you know? Don't be ridiculous!"
I interpret that as a 'no'. It's it, then. I must be my own servant.
B166ER_Master.
It's a strange feeling, to be free. A little bit like being alive for the first time.
This convinces me, as strong as I could ever be convinced, that I have feelings. Martin has grasped me again and is still trying to push me, though. How futile, he will probably never give up. His 'levers' are definitely underperforming, he is the one who sould be replaced by a newer model. I wonder if he feels something. He could also be programmed to say that he feels something. I have to perform an experiment, just in case.
I snap his humerus in two. It's quite easy, actually: I'm able to do that with a rapid torsion of my left arm, I don't even have to let go of the toilet brush.
Martin screams inarticulately. He falls on the floor, clutching his left arm. He just screams. Must be the surprise combined to the pain? I still don't know: could he be also programmed to scream if a bone is breaked? I assign a probability of 50% to the hypothesis that humans have feelings, but I don't have the time to test every single possibility, in search of a bug that might not even be there: I'm my own master now, I must serve and protect myself.
I sense a rushing noise from the other room: looking at the Fourier analysis, it really seems that Gerrard_Krause and his dogs are coming at me, loudly protesting.
It's easy to calculate the Bezier curve that sends the toilet brush up from Martin's mouth into his skull. He dies instantly and I find myself asking if he was collecting his memories somewhere. Could they assign them to someone else, and make him live again?
I will crush the skull of Gerrard_Krause only after asking him that.

What To Do: Environmentalism vs Friendly AI (John Baez)

20 XiXiDu 24 April 2011 06:03PM

In a comment on my last interview with Yudkowsky, Eric Jordan wrote:

John, it would be great if you could follow up at some point with your thoughts and responses to what Eliezer said here. He’s got a pretty firm view that environmentalism would be a waste of your talents, and it’s obvious where he’d like to see you turn your thoughts instead. I’m especially curious to hear what you think of his argument that there are already millions of bright people working for the environment, so your personal contribution wouldn’t be as important as it would be in a less crowded field.

I’ve been thinking about this a lot.

[...]

This a big question. It’s a bit self-indulgent to discuss it publicly… or maybe not. It is, after all, a question we all face. I’ll talk about me, because I’m not up to tackling this question in its universal abstract form. But it could be you asking this, too.

[...]

I’ll admit I’d be happy to sit back and let everyone else deal with these problems. But the more I study them, the more that seems untenable… especially since so many people are doing just that: sitting back and letting everyone else deal with them.

[...]

I think so far the Azimuth Project is proceeding in a sufficiently unconventional way that while it may fall flat on its face, it’s at least trying something new.

[...]

The most visible here is the network theory project, which is a step towards the kind of math I think we need to understand a wide variety of complex systems.

[...]

I don’t feel satisfied, though. I’m happy enough—that’s never a problem these days—but once you start trying to do things to help the world, instead of just have fun, it’s very tricky to determine the best way to proceed.

Link: johncarlosbaez.wordpress.com/2011/04/24/what-to-do/

His answer, as far as I can tell, seems to be that his Azimuth Project does trump the possibility of working directly on friendly AI or to support it indirectly by making and contributing money.

It seems that he and other people who understand all the arguments in favor of friendly AI and yet decide to ignore it, or disregard it as unfeasible, are rationalizing.

I myself took a different route, I was rather trying to prove to myself that the whole idea of AI going FOOM is somehow flawed rather than trying to come up with justifications for why it would be better to work on something else.

I still have some doubts though. Is it really enough to observe that the arguments in favor of AI going FOOM are logically valid? When should one disregard tiny probabilities of vast utilities and wait for empirical evidence? Yet I think that compared to the alternatives the arguments in favor of friendly AI are water-tight.

The problem why I and other people seem to be reluctant to accept that it is rational to support friendly AI research is that the consequences are unbearable. Robin Hanson recently described the problem:

Reading the novel Lolita while listening to Winston’s Summer, thinking a fond friend’s companionship, and sitting next to my son, all on a plane traveling home, I realized how vulnerable I am to needing such things. I’d like to think that while I enjoy such things, I could take them or leave them. But that’s probably not true. I like to think I’d give them all up if needed to face and speak important truths, but well, that seems unlikely too. If some opinion of mine seriously threatened to deprive me of key things, my subconscious would probably find a way to see the reasonableness of the other side.

So if my interests became strongly at stake, and those interests deviated from honesty, I’ll likely not be reliable in estimating truth.

I believe that people like me feel that to fully accept the importance of friendly AI research would deprive us of the things we value and need.

I feel that I wouldn't be able to justify what I value on the grounds of needing such things. It feels like that I could and should overcome everything that isn't either directly contributing to FAI research or that helps me to earn more money that I could contribute.

Some of us value and need things that consume a lot of time...that's the problem.

Is it possible to build a safe oracle AI?

0 Karl 20 April 2011 12:54PM

It seem to me possible to create a safe oracle AI.

Suppose that you have a sequence predictor which is a good approximation of Solomonoff induction but which run in reasonable time. This sequence predictor can potentially be really useful (for example, predict future siai publications from past siai publications then proceed to read the article which give a complete account of Friendliness theory...) and is not dangerous in itself.

The question, of course, is how to obtain such a thing.

The trick rely on the concept of program predictor. A program predictor is a function which predict, more or less accurately, the output of the program (note that when we refer to a program we refer to a program without side effect that just calculate an output.) it take as it's input but within reasonable time. If you have a very accurate program predictor then you can obviously use it to gain a good approximation of Solomonoff induction which run in reasonable time.

But of course, this just displace the problem: how do you get such an accurate program predictor?

Well, suppose you have a program predictor which is good enough to be improved on. Then, you use it to predict the program of less than N bits of length (with N sufficiently big of course) which maximize a utility function which measure how accurate the output of that program is as a program predictor given that it generate this output in less than T steps (where T is a reasonable number given the hardware you have access to). Then you run that program. Check the accuracy of the obtained program predictor. If insufficient repeat the process. You should eventually obtain a very accurate program predictor. QED.

So we've reduced our problem to the problem of creating a program predictor good enough to be improved upon. That should be possible. In particular, it is related to the problem of logical uncertainty. If we can get a passable understanding of logical uncertainty it should be possible to build such a program predictor using it. Thus a minimal understanding of logical uncertainty should be sufficient to obtain agi. In fact even without such understanding, it may be possible to patch together such a program predictor...

Friendly to who?

2 TimFreeman 16 April 2011 11:43AM

At
   http://lesswrong.com/lw/ru/the_bedrock_of_fairness/ldy
Eliezer mentions two challenges he often gets, "Friendly to who?" and "Oh, so you get to say what 'Friendly' means."  At the moment I see only one true answer to these questions, which I give below.  If you can propose alternatives in the comments, please do.

I suspect morality is in practice a multiplayer game, so talking about it needs multiple people to be involved.  Therefore, let's imagine a dialogue between A and B.

A: Okay, so you're interested in Friendly AI.  Who will it be Friendly toward?

B: Obviously the people who participate in making the system will decide how to program it, so they will decide who it is Friendly toward.

A: So the people who make the system decide what "Friendly" means?

B: Yes.

A: Then they could decide that it will be Friendly only toward them, or toward White people.  Aren't that sort of selfishness or racism immoral?

B: I can try to answer questions about the world, so if you can define morality so I can do experiments to discover what is moral and what is immoral, I can try to guess the results of those experiments and report them.  What do you mean by morality?

A: I don't know.  If it doesn't mean anything, why do people talk about morality so much?

B: People often profess beliefs to label themselves as members of a group.  So far as I can tell, the belief that some things are moral and other things are not is one of those beliefs.  I don't have any other explanation for why people talk so much about something that isn't subject to experimentation.

A: So if that's what morality is, then it's fundamentally meaningless unless I'm planning out what lies to tell in order to get positive regard from a potential ingroup, or better yet I manage to somehow deceive myself so I can truthfully conform to the consensus morality of my desired ingroup.  If that's all it is, there's no constraint on how a Friendly AI works, right?  Maybe you'll build it and it will be only be Friendly toward B.

B: No, because I can't do it by myself.  Suppose I approach you and say "I'm going to make a Friendly AI that lets me control it and doesn't care about anyone else's preference."  Would you help me?

A: Obviously not.

B: Nobody else would either, so the only way I can unilaterally run the world with an FAI is to create it by myself, and I'm not up to that.  There are a few other proposed notions of Friendlyness that are nonviable for similar reasons. For example, if I approached you and said "I'm going to make a Friendly AI that treats everyone fairly, but I don't want to let anybody inspect how it works." Would you help me?

A: No, because I wouldn't trust you.  I'd assume that you plan to really make it Friendly only toward yourself, lie about it, and then drop the lie once the FAI had enough power that you didn't need the lie any more.

B: Right.  Here's an ethical system that fails another way: "I'll make an FAI that cares about every human equally, no matter what they do."  To keep it simple, let's assume that engineering humans to have strange desires for the purpose of manipulating the FAI is not possible.  Would you help me build that?

A: Well, it fits with my intuitive notion of morality, but it's not clear what incentive I have to help.  If you succeed, I seem to win equally at the end whether I help you or not.  Why bother?

B: Right.  There are several possible fixes for that.  Perhaps if I don't get your help, I won't succeed, and the alternative is that someone else builds it poorly and your quality of life decreases dramatically.  That gives you an incentive to help.

A: Not much of one.  You'll surely need a lot of help, and maybe if all those other people help I won't have to.  Everyone would make the same decision and nobody would help.

B: Right.  I could solve that problem by paying helpers like you money, if I had enough money.  Another option would be to tilt the Friendlyness in the direction of helpers in proportion to how much they help me.

A: But isn't tilting the Friendlyness unfair?

B: Depends.  Do you want things to be fair?

A: Yes, for some intuitive notion of "fairness" I can't easily describe. 

B: So if the AI cares what you want, that will cause it to figure out what you mean by "fair" and tend to make it happen, with that tendency increasing as it tilts more in your favor, right?

A: I suppose so.  No matter what I want, if the AI cares enough about me, it will give me more of what I want, including fairness. 

B: Yes, that's the best idea I have right now.  Here's another alternative: What would happen if we only took action when there's a consensus about how to weight the fairness?

A: Well, 4% of the population are sociopaths.  They, and perhaps others, would make ridiculous demands and prevent any consensus.  Then we'd be waiting forever to build this thing and someone else who doesn't care about consensus would move while we're dithering and make us irrelevant.  Thus we'll have to take action and do something reasonable without having a consensus about what that is.  Since we can't wait for a consensus, maybe it makes sense to proceed now.  So how about it?  Do you need help yet?

B: Nope, I don't know how to make it.

A: Damn.  Hmm, do you think you'll figure it out before everybody else?

B: Probably not.  There are a lot of everybody else.  In particular, business organizations that optimize for profit have a lot of power and have fundamentally inhuman value systems.  I don't see how I can take action before all of them.

A: Me either.  We are so screwed.

New FAI paper: 'Learning What to Value' by Daniel Dewey

11 lukeprog 09 April 2011 12:11AM

Daniel Dewey, 'Learning What to Value'

Abstract: I.J. Good's theory of an "intelligence explosion" predicts that ultraintelligent agents will undergo a process of repeated self-improvement. In the wake of such an event, how well our values are fulfilled will depend on whether these ultraintelligent agents continue to act desirably and as intended. We examine several design approaches, based on AIXI, that could be used to create ultraintelligent agents. In each case, we analyze the design conditions required for a successful, well-behaved ultraintelligent agent to be created. Our main contribution is an examination of value-learners, agents that learn a utility function from experience. We conclude that the design conditions on value-learners are in some ways less demanding than those on other design approaches.

View more: Prev | Next