Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
This puts my AI Box Experiment record at 2 wins and 3 losses.
Humans are examples of general intelligence - the only example we're sure of. Some humans have various degrees of autism (low level versions are quite common in the circles I've moved in), impairing their social skills. Mild autists nevertheless remain general intelligences, capable of demonstrating strong cross domain optimisation. Psychology is full of other examples of mental pathologies that impair certain skills, but nevertheless leave their sufferers as full fledged general intelligences. This general intelligence is not enough, however, to solve their impairments.
Watson triumphed on Jeopardy. AI scientists in previous decades would have concluded that to do so, a general intelligence would have been needed. But that was not the case at all - Watson is blatantly not a general intelligence. Big data and clever algorithms were all that were needed. Computers are demonstrating more and more skills, besting humans in more and more domains - but still no sign of general intelligence. I've recently developed the suspicion that the Turing test (comparing AI with a standard human) could get passed by a narrow AI finely tuned to that task.
The general thread is that the link between narrow skills and general intelligence may not be as clear as we sometimes think. It may be that narrow skills are sufficiently diverse and unique that a mid-level general intelligence may not be able to develop them to a large extent. Or, put another way, an above-human social intelligence may not be able to control a robot body or do decent image recognition. A super-intelligence likely could: ultimately, general intelligence includes the specific skills. But this "ultimately" may take a long time to come.
So the questions I'm wondering about are:
- How likely is it that a general intelligence, above human in some domain not related to AI development, will acquire high level skills in unrelated areas?
- By building high-performance narrow AIs, are we making it much easier for such an intelligence to develop such skills, by co-opting or copying these programs?
There's a recent science fiction story that I can't recall the name of, in which the narrator is traveling somewhere via plane, and the security check includes a brain scan for deviance. The narrator is a pedophile. Everyone who sees the results of the scan is horrified--not that he's a pedophile, but that his particular brain abnormality is easily fixed, so that means he's chosen to remain a pedophile. He's closely monitored, so he'll never be able to act on those desires, but he keeps them anyway, because that's part of who he is.
What would you do in his place?
How will we know if future AIs (or even existing planners) are making decisions that are bad for humans unless we spell out what we think is unfriendly?
At a machine level the AI would be recursively minimising cost functions to produce the most effective plan of action to achieve the goal, but how will we know if its decision is going to cause harm?
Is there a model or dataset which describes what is friendly to humans? e.g.
0 - running a simulation in a VM
2 - physical robot with vacuum attachment
9 - full control of a plane
0 - selecting a song to play
5 - deciding which section of floor to vacuum
99 - deciding who is an ‘enemy’
9999 - aiming a gun at an ‘enemy’
1 - poor song selected to play, human mildly annoyed
2 - ineffective use of resources (vacuuming the same floor section twice)
99 - killing a human
99999 - killing all humans
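As a rough sketch of how such a dataset might be encoded, the scores above can be put in a lookup table. The three group names (`control_scope`, `decision`, `outcome`) and the `plan_risk` helper are assumptions made for illustration; the post gives only the scores and their descriptions, and taking the maximum is just one possible way to combine them.

```python
# Hypothetical encoding of the harm scores listed above.
# The group names are assumed; the post does not label the three runs of scores.
HARM_SCORES = {
    "control_scope": {
        "running a simulation in a VM": 0,
        "physical robot with vacuum attachment": 2,
        "full control of a plane": 9,
    },
    "decision": {
        "selecting a song to play": 0,
        "deciding which section of floor to vacuum": 5,
        "deciding who is an 'enemy'": 99,
        "aiming a gun at an 'enemy'": 9999,
    },
    "outcome": {
        "poor song selected to play, human mildly annoyed": 1,
        "ineffective use of resources (vacuuming the same floor section twice)": 2,
        "killing a human": 99,
        "killing all humans": 99999,
    },
}


def plan_risk(plan):
    """Return the worst (highest) harm score among the (group, item) pairs in a plan."""
    return max(HARM_SCORES[group][item] for group, item in plan)


vacuum_plan = [
    ("control_scope", "physical robot with vacuum attachment"),
    ("decision", "deciding which section of floor to vacuum"),
]
print(plan_risk(vacuum_plan))  # 5
```

A planner could compare `plan_risk` against an agreed threshold before acting, though where to set that threshold is exactly the question that would need discussion.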
This may not be possible to get agreement from all countries/cultures/beliefs, but it is something we should discuss and attempt to get some agreement.
More precisely, if we suppose that sometime in the next 30 years, an artificial intelligence will begin bootstrapping its own code and explode into a super-intelligence, I can give you 2.3 bits of further information on when the Singularity will occur.
Between midnight and 5 AM, Pacific Standard Time.
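The 2.3-bit figure can be checked with a short calculation, assuming a uniform prior over the 24 hours of the day (an assumption; the post does not state the prior). Narrowing 24 hours down to a 5-hour window yields log2(24/5) bits.

```python
import math

HOURS_PER_DAY = 24
WINDOW_HOURS = 5  # midnight to 5 AM


def bits_of_information(window_hours, total_hours=HOURS_PER_DAY):
    """Bits gained by narrowing a uniform prior over total_hours to window_hours."""
    return math.log2(total_hours / window_hours)


bits = bits_of_information(WINDOW_HOURS)
print(f"{bits:.2f} bits")  # 2.26 bits, which rounds to the 2.3 quoted above
```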
Furthermore, in the last thread I have asserted that
Rather than my loss making this problem feel harder, I've become convinced that rather than this being merely possible, it's actually ridiculously easy, and a lot easier than most people assume.
It would be quite bad for me to assert this without backing it up with a victory. So I did.
Ps: Bored of regular LessWrong? Check out the LessWrong IRC! We have cake.
Supposing you have been recruited to be the main developer on an AI project. The previous developer died in a car crash and left behind an unfinished AI. It consists of:
A. A thoroughly documented scripting language specification that appears to be capable of representing any real-life program as a network diagram so long as you can provide the following:
A.1. A node within the network whose value you want to maximize or minimize.
A.2. Conversion modules that transform data about the real-world phenomena your network represents into a form that the program can read.
B. Source code from which a program can be compiled that will read scripts in the above language. The program outputs a set of values for each node that will optimize the output (you can optionally specify which nodes can and cannot be directly altered, and the granularity with which they can be altered).
It gives remarkably accurate answers for well-formulated questions. Where there is a theoretical limit to the accuracy of an answer to a particular type of question, its answer usually comes close to that limit, plus or minus some tiny rounding error.
Given that, what is the minimum set of additional features you believe would absolutely have to be implemented before this program can be enlisted to save the world and make everyone live happily forever? Try to be as specific as possible.
Hello less wrong community! This is my first post here, so I know that my brain has not (obviously) been optimised to its fullest, but I've decided to give posting a try.
Recently, someone very close to me unfortunately passed away, leading to the inevitable inner dilemma about death. I don't know how many of you are fans of HPMOR, but the way that Harry's dark side feels about death? Pretty much me around death, dying, etc. However, I've decided to push that to the side for the time being, because that is not a useful or efficient way to think.
I was raised by a religious family, but from the age of about 11 stopped believing in deities and religious services. However, I've always clung to the idea of an afterlife for people, mainly because my brain seems incapable of handling the idea of ceasing to exist. I know that we as a scientific community know that thoughts are electrical impulses, so is there any way of storing them outside of brain matter? Can they exist freely out of brain matter, or could they be stored in a computer chip or AI?
The conflict lies here: is immortality or mortality rational?
Every fibre in my being tells me that death is irrational and wrong. It is irrational for humanity to not try and prevent death. It is irrational for people to not try and bring back people who have died. Because of this, we have lost some of the greatest minds, scientific and artistic, that will probably ever exist. Although the world's number of talented and intelligent people does not appear to be finite, I find it hard to live in a world where so much knowledge is being lost every day.
But on the other hand, how would we feed all those people? What if the world's resources run out? As a transhumanist, I believe that we can use science to prevent things like death, but nature wasn't designed to support a population like that.
How do we truly optimise the world: no death and without destruction of the planet?
To avoid repeatedly saying the same thing, I'd like to state my opinion on a few topics I expect to be relevant to my future posts here.
You can take it as a baseline or reference for these topics. I do not plan to go into any detail here. I will not state all my reasons or sources. You may ask for separate posts if you are interested. This is really only to provide a context for my comments and posts elsewhere.
If you google me you may find some of my old (but not that off the mark) posts about these positions, e.g. here:
Now my position on LW topics.
The Simulation Argument and The Great Filter
On The Simulation Argument I definitely go for
"(1) the human species is very likely to go extinct before reaching a “posthuman” stage"
Correspondingly on The Great Filter I go for failure to reach
"9. Colonization explosion".
This is not because I think that humanity is going to self-annihilate soon (though this is a possibility). Instead I hope that humanity will sooner or later come to terms with its planet. My utopia could be like that of the Pacifists (a short story in Analog 5).
Why? Because of essential complexity limits.
This falls into the same range as "It is too expensive to spread physically throughout the galaxy". I know that negative proofs about engineering are notoriously wrong - but that is currently my best guess. Simplified, one could say that the low-hanging fruits have been taken. I have lots of empirical evidence on multiple levels to support this view.
Correspondingly there is no singularity because progress is not limited by raw thinking speed but by effective aggregate thinking speed and physical feedback.
What could prove me wrong?
If a serious discussion would tear my well-prepared arguments and evidence to shreds (quite possible).
At the very high end a singularity might be possible if a way could be found to simulate physics faster than physics itself.
Basically I don't have the least problem with artificial intelligence or artificial emotion being possible. Philosophical note: I don't care on what substrate my consciousness runs. Maybe I am simulated.
I think strong AI is quite possible and maybe not that far away.
But I also don't think that this will bring the singularity because of the complexity limits mentioned above. Strong AI will speed up some cognitive tasks with compound interest - but only until the physical feedback level is reached. Or a social feedback level is reached if AI should be designed to be so.
One temporary dystopia that I see is that cognitive tasks are out-sourced to AI and a new round of unemployment drives humans into depression.
- A simplified layered model of the brain; deep learning applied to free inputs (I cancelled this when it became clear that it was too simple and low level and thus computationally inefficient)
- A nested semantic graph approach with propagation of symbol patterns representing thought (only concept; not realized)
I'd really like to try a 'synthesis' of these where microstructure-of-cognition like activation patterns of multiple deep learning networks are combined with a specialized language and pragmatics structure acquisition model a la Unsupervised learning of natural languages. See my opinion on cognition below for more in this line.
What could prove me wrong?
On the low success end if it takes longer than I think it would take me given unlimited funding.
On the high end if I'm wrong with the complexity limits mentioned above.
Humanity might succeed at leaving the planet but at high costs.
By leaving the planet I mean becoming permanently independent of Earth, but not necessarily leaving the solar system any time soon (speculating on that is beyond my confidence interval).
I think it more likely that life leaves the planet - that can be
- artificial intelligence with a robotic body - think of curiosity rover 2.0 (most likely).
- intelligent life-forms bred for life in space - think of magpies, which are already smart, small, fast-reproducing, and capable of 3D navigation.
- actual humans in suitable protective environments with small autonomous biospheres harvesting asteroids or Mars.
- 'cyborgs' - humans altered or bred to better deal with certain problems in space like radiation and missing gravity.
- other - including misc ideas from science fiction (least likely or latest).
For most of these (esp. those depending on breeding) I'd estimate a time-range of a few thousand years.
What could prove me wrong?
If I'm wrong on the singularity aspect too.
If I'm wrong on the timeline, I will likely be long dead in any case, except for (1), which I expect to see in my lifetime.
Cognitive Base of Rationality, Vagueness, Foundations of Math
How can we as humans create meaning out of noise?
How can we know truth? How does it come that we know that 'snow is white' when snow is white?
Cognitive neuroscience and artificial learning seem to point toward two aspects:
Fuzzy learning aspect
Correlated patterns of internal and external perception are recognized (detected) via multiple specialized layered neural nets (basically). This yields qualia like 'spoon', 'fear', 'running', 'hot', 'near', 'I'. These are basically symbols, but they are vague with respect to meaning because they result from a recognition process that optimizes for matching not correctness or uniqueness.
Semantic learning aspect
Upon the qualia builds the semantic part, which takes the qualia and, instead of acting directly on them (as is the normal effect for animals), finds patterns in their activation which are not related to immediate perception or action but at most to memory. These may form new qualia/symbols.
The use of these patterns is that they allow capturing concepts which are detached from reality (detached insofar as they do not need a stimulus connected in any way to perception).
Concepts like ('cry-sound' 'fear') or ('digitalis' 'time-forward' 'heartache') or ('snow' 'white') or - and that is probably the domain of humans: (('one' 'successor') 'two') or (('I' 'happy') ('I' 'think')).
The interesting thing is that learning works on these concepts like on the normal neuronal nets too. Thus concepts that are reinforced by positive feedback will stabilize and mutually with them the qualia they derive from (if any) will also stabilize.
For certain pure concepts the usability of the concept hinges not on any external factor (like "how does this help me survive") but on social feedback about structure and the process of the formation of the concepts themselves.
And this is where we arrive at such concepts as 'truth' or 'proposition'.
These are no longer vague - but not because they are represented differently in the brain than other concepts but because they stabilize toward maximized validity (that is stability due to absence of external factors possibly with a speed-up due to social pressure to stabilize). I have written elsewhere that everything that derives its utility not from some external use but from internal consistency could be called math.
And that is why math is so hard for some: If you never gained a sufficient core of self-consistent stabilized concepts and/or the usefulness doesn't derive from internal consistency but from external ("teacher's password") usefulness, then it will just not scale to more concepts (and the reason why science works at all is that science values internal consistency so highly, and there is little more dangerous to science than allowing other incentives).
I really hope that this all makes sense. I haven't summarized this for quite some time.
A few random links that may provide some context:
http://www.blutner.de/NeuralNets/ (this is about the AI context we are talking about)
http://www.blutner.de/NeuralNets/Texts/mod_comp_by_dyn_bin_synf.pdf (research applicable to the above in particular)
http://c2.com/cgi/wiki?LeibnizianDefinitionOfConsciousness (funny description of levels of consciousness)
http://c2.com/cgi/wiki?FuzzyAndSymbolicLearning (old post by me)
Note: Details about the modelling of the semantic part are mostly in my head.
What could prove me wrong?
Well. Wrong is too hard here. This is just my model and it is not really that concrete. Probably a longer discussion with someone more experienced with AI than I am (and there should be many here) might suffice to rip this apart (provided that I'd find time to prepare my model suitably).
God and Religion
I wasn't indoctrinated as a child. My truly loving mother is a baptised christian who lives her faith without being sanctimonious. She always hoped that I would receive my epiphany. My father has a scientifically influenced personal christian belief.
I can imagine a God consistent with science on the one hand and on the other hand with free will, soul, afterlife, trinity and the bible (understood as a mix of non-literal word of God and history tale).
I mean, it is not that hard if you can imagine a timeless (simulation of) the universe. If you are god and have whatever plan on earth but empathize with your creations, then it is not hard to add a few more constraints to certain aggregates called existences or 'person lifes'. Constraints that realize free-will in the sense of 'not subject to the whole universe plan satisfaction algorithm'.
Surely not more difficult than consistent time-travel.
And souls and afterlife should be easy to envision for any science fiction reader familiar with super intelligences.
But why? Occam's razor applies.
There could be a God. And his promise could be real. And it could be a story seeded by an empathizing God - but also a 'human' God with his own inconsistencies and moods.
But it also could be that this is all a fairy tale run amok in human brains searching for explanations where there are none. A mass delusion. A fixated meme.
Which is right? It is difficult to put probabilities to stories. I see that I have slowly moved from 50/50 agnosticism to tolerant atheism.
I can't say that I wait for my epiphany. I know too well that my brain will happily find patterns when I let it. But I have encouraged others to pray for me.
My epiphanies - the aha feelings of clarity that I did experience - have all been about deeply connected patterns building on other such patterns building on reliable facts mostly scientific in nature.
But I haven't lost my morality. It has deepened and widened. I have become even more tolerant (I hope).
So if God does, against all odds, exist, I hope he will understand my doubts, weigh my good deeds and forgive me. You could tag me godless christian.
What could prove me wrong?
On the atheist side I could be moved a bit further by more proofs of religion being a human artifact.
On the theist side there are two possible avenues:
- If I'd have an unsearched for epiphany - a real one where I can't say I was hallucinating but e.g. a major consistent insight or a proof of God.
- If I'd be convinced that the singularity is possible. This is because I'd need to update toward being in a simulation as per Simulation argument option 3. That's because then the next likely explanation for all this god business is actually some imperfect being running the simulation.
Thus I'd like to close with this corollary to the simulation argument:
Arguments for the singularity are also (weak) arguments for theism.
The finance professor John Cochrane recently posted an interesting blog post. The piece is about existential risk in the context of global warming, but it is really a discussion of existential risk generally; many of his points are highly relevant to AI risk.
If we [respond strongly to all low-probability threats], we spend 10 times GDP.
It's an interesting case of framing bias. If you worry only about climate, it seems sensible to pay a pretty stiff price to avoid a small uncertain catastrophe. But if you worry about small uncertain catastrophes, you spend all you have and more, and it's not clear that climate is the highest on the list...
All in all, I'm not convinced our political system is ready to do a very good job of prioritizing outsize expenditures on small ambiguous-probability events.
He also points out that the threat from global warming has a negative beta - i.e. higher future growth rates are likely to be associated with greater risk of global warming, but also the richer our descendants will be. This means both that they will be more able to cope with the threat, and that the damage is less important from a utilitarian point of view. Attempting to stop global warming therefore has positive beta, and therefore requires higher rates of return than simple time-discounting.
It strikes me that this argument applies equally to AI risk, as fruitful artificial intelligence research is likely to be associated with higher economic growth. Moreover:
The economic case for cutting carbon emissions now is that by paying a bit now, we will make our descendants better off in 100 years.
Once stated this way, carbon taxes are just an investment. But is investing in carbon reduction the most profitable way to transfer wealth to our descendants? Instead of spending say $1 trillion in carbon abatement costs, why don't we invest $1 trillion in stocks? If the 100 year rate of return on stocks is higher than the 100 year rate of return on carbon abatement -- likely -- they come out better off. With a gazillion dollars or so, they can rebuild Manhattan on higher ground. They can afford whatever carbon capture or geoengineering technology crops up to clean up our messes.
So should we close down MIRI and invest the funds in an index tracker?
The full post can be found here.
The first time I read Torture vs. Specks about a year ago I didn't read a single comment because I assumed the article was making a point that simply multiplying can sometimes get you the wrong answer to a problem. I seem to have had a different "obvious answer" in mind.
And don't get me wrong, I generally agree with the idea that math can do better than moral intuition in deciding questions of ethics. Take this example from Eliezer’s post Circular Altruism which made me realize that I had assumed wrong:
Suppose that a disease, or a monster, or a war, or something, is killing people. And suppose you only have enough resources to implement one of the following two options:
1. Save 400 lives, with certainty.
2. Save 500 lives, with 90% probability; save no lives, 10% probability.
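The arithmetic behind the two options can be checked with a short expected-value calculation. This is a minimal sketch; the function and variable names are illustrative and not taken from the quoted post.

```python
def expected_lives_saved(outcomes):
    """Expected value of a list of (probability, lives_saved) outcomes."""
    return sum(probability * lives for probability, lives in outcomes)


option_1 = [(1.0, 400)]            # save 400 lives with certainty
option_2 = [(0.9, 500), (0.1, 0)]  # save 500 with 90% probability, none with 10%

ev_1 = expected_lives_saved(option_1)
ev_2 = expected_lives_saved(option_2)
print(ev_1, ev_2)  # 400.0 450.0
```

Option 2 saves 50 more lives in expectation, which is the sense in which the math favours it.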
I agree completely that you pick number 2. For me that was just manifestly obvious, of course the math trumps the feeling that you shouldn't gamble with people’s lives…but then we get to torture vs. dust specks and that just did not compute. So I've read most every argument I could find in favor of torture (there are a great deal and I might have missed something critical), but...while I totally understand the argument (I think) I'm still horrified that people would choose torture over dust specks.
I feel that the way that math predominates over intuition begins to fall apart when the problem compares trivial individual suffering with massive individual suffering, in a way very much analogous to the way in which Pascal’s Mugging stops working when you make the credibility really low but the threat really high. Like this. Except I find the answer to torture vs. dust specks to be much easier...
Let me give some examples to illustrate my point.
Can you imagine Harry killing Hermione because Voldemort threatened to plague all sentient life with one barely noticed dust speck each day for the rest of time? Can you imagine killing your own best friend/significant other/loved one to stop the powers of the Matrix from hitting 3^^^3 sentient beings with nearly inconsequential dust specks? Of course not. No. Snap decision.
Eliezer, would you seriously, given the choice by Alpha, the Alien superintelligence that always carries out its threats, give up all your work, and horribly torture some innocent person, all day for fifty years in the face of the threat of 3^^^3 insignificant dust specks barely inconveniencing sentient beings? Or be tortured for fifty years to avoid the dust specks?
I realize that this is much more personally specific than the original question: but it is someone's loved one, someone's life. And if you wouldn't make the sacrifice what right do you have to say someone else should make it? I feel as though if you want to argue that torture for fifty years is better than 3^^^3 barely noticeable inconveniences you had better well be willing to make that sacrifice yourself.
And I can’t conceive of anyone actually sacrificing their life, or themselves to save the world from dust specks. Maybe I'm committing the typical mind fallacy in believing that no one is that ridiculously altruistic, but does anyone want an Artificial Intelligence that will potentially sacrifice them if it will deal with the universe’s dust speck problem or some equally widespread and trivial equivalent? I most certainly object to the creation of that AI. An AI that sacrifices me to save two others - I wouldn't like that, certainly, but I still think the AI should probably do it if it thinks their lives are of more value. But dust specks on the other hand....
This example made me immediately think that some sort of rule is needed to limit morality coming from math in the development of any AI program. When the problem reaches a certain low level of suffering and is multiplied by an unreasonably large number, it needs to take some kind of huge penalty, because otherwise to an AI it would be vastly preferable that the whole of Earth be blown up than that 3^^^3 people suffer a mild slap to the face.
And really, I don’t think we want to create an Artificial Intelligence that would do that.
I’m mainly just concerned that some factor be incorporated into the design of any Artificial Intelligence that prevents it from murdering me and others for trivial but widespread causes. Because that just sounds like a sci-fi book of how superintelligence could go horribly wrong.
I'm putting together a list of short and sweet introductions to the dangers of artificial superintelligence.
My target audience is intelligent, broadly philosophical narrative thinkers, who can evaluate arguments well but who don't know a lot of the relevant background or jargon.
My method is to construct a Sequence mix tape — a collection of short and enlightening texts, meant to be read in a specified order. I've chosen them for their persuasive and pedagogical punchiness, and for their flow in the list. I'll also (separately) list somewhat longer or less essential follow-up texts below that are still meant to be accessible to astute visitors and laypeople.
The first half focuses on intelligence, answering 'What is Artificial General Intelligence (AGI)?'. The second half focuses on friendliness, answering 'How can we make AGI safe, and why does it matter?'. Since the topics of some posts aren't obvious from their titles, I've summarized them using questions they address.
Part I. Building intelligence.
1. Power of Intelligence. Why is intelligence important?
2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?
3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don't yet understand it?
4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the 'goals' of evolution?
5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as 'agents', 'intelligences', or 'optimizers' with defined values/goals/preferences?
Part II. Intelligence explosion.
6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?
7. Efficient Cross-Domain Optimization. What is intelligence?
8. The Design Space of Minds-In-General. What else is universally true of intelligences?
9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?
Part III. AI risk.
10. The True Prisoner's Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?
11. Basic AI drives. Why are AGIs dangerous even when they're indifferent to us?
12. Anthropomorphic Optimism. Why do we think things we hope happen are likelier?
13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?
14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?
15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?
Part IV. Ends.
16. Could Anything Be Right? What do we mean by 'good', or 'valuable', or 'moral'?
17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?
18. Serious Stories. What would a true utopia be like?
19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don't take charge of our future, won't it still turn out interesting and beautiful on some deeper level?
20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?
All of the above were written by Eliezer Yudkowsky, with the exception of The Blue-Minimizing Robot (by Yvain), Plenty of Room Above Us and The AI Problem (by Luke Muehlhauser), and Basic AI Drives (a wiki collaboration). Seeking a powerful conclusion, I ended up making a compromise between Eliezer's original The Gift We Give To Tomorrow and Raymond Arnold's Solstice Ritual Book version. It's on the wiki, so you can further improve it with edits.
- Three Worlds Collide (Normal), by Eliezer Yudkowsky
- a short story vividly illustrating how alien values can evolve.
- So You Want to Save the World, by Luke Muehlhauser
- an introduction to the open problems in Friendly Artificial Intelligence.
- Intelligence Explosion FAQ, by Luke Muehlhauser
- a broad overview of likely misconceptions about AI risk.
- The Singularity: A Philosophical Analysis, by David Chalmers
- a detailed but non-technical argument for expecting intelligence explosion, with an assessment of the moral significance of synthetic human and non-human intelligence.
I'm posting this to get more feedback for improving it, to isolate topics for which we don't yet have high-quality, non-technical stand-alone introductions, and to reintroduce LessWrongers to exceptionally useful posts I haven't seen sufficiently discussed, linked, or upvoted. I'd especially like feedback on how the list I provided flows as a unit, and what inferential gaps it fails to address. My goals are:
A. Via lucid and anti-anthropomorphic vignettes, to explain AGI in a way that encourages clear thought.
B. Via the Five Theses, to demonstrate the importance of Friendly AI research.
C. Via down-to-earth meta-ethics, humanistic poetry, and pragmatic strategizing, to combat any nihilisms, relativisms, and defeatisms that might be triggered by recognizing the possibility (or probability) of Unfriendly AI.
D. Via an accessible, substantive, entertaining presentation, to introduce the raison d'être of LessWrong to sophisticated newcomers in a way that encourages further engagement with LessWrong's community and/or content.
What do you think? What would you add, remove, or alter?
I recently gave a talk at the IARU Summer School on the Ethics of Technology.
In it, I touched on many of the research themes of the FHI: the accuracy of predictions, the limitations and biases of predictors, the huge risks that humanity may face, the huge benefits that we may gain, and the various ethical challenges that we'll face in the future.
Nothing really new for anyone who's familiar with our work, but some may enjoy perusing it.
A stub on a point that's come up recently.
If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.
If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human that behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We can't therefore conclude that it's incompetent, unable to understand human reasoning, or likely to fail.
We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.
The theory of comparative advantage says that you should trade with people, even if they are worse than you at everything (i.e. even if you have an absolute advantage). Some have seen this idea as a reason to trust powerful AIs.
For instance, suppose you can make a hamburger by using 10 000 joules of energy. You can also make a cat video for the same cost. The AI, on the other hand, can make hamburgers for 5 joules each and cat videos for 20.
Then you both can gain from trade. Instead of making a hamburger, make a cat video and trade it for two hamburgers. You've got two hamburgers for 10 000 joules of your own effort (instead of 20 000), and the AI has got a cat video for 10 joules of its own effort (instead of 20). So you both want to trade, and everything is fine and beautiful and many cat videos and hamburgers will be made.
Except... though the AI would prefer to trade with you rather than not trade with you, it would much, much prefer to dispossess you of your resources and use them itself. With the energy you wasted on a single cat video, it could have produced 500 of them! If it values these videos, then it is desperate to take over your stuff. Its absolute advantage makes this too tempting.
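The arithmetic in this example can be checked with a short sketch. It uses only the joule costs given above; the variable names are illustrative and not part of the original argument.

```python
# Energy costs in joules, taken from the example above.
HUMAN_COST = {"hamburger": 10_000, "cat_video": 10_000}
AI_COST = {"hamburger": 5, "cat_video": 20}

# The trade: one human-made cat video in exchange for two AI-made hamburgers.
human_spends = HUMAN_COST["cat_video"]            # what the human pays under trade
human_alternative = 2 * HUMAN_COST["hamburger"]   # making both hamburgers alone
ai_spends = 2 * AI_COST["hamburger"]              # what the AI pays under trade
ai_alternative = AI_COST["cat_video"]             # making the cat video alone

human_saving = human_alternative - human_spends   # 10 000 joules saved
ai_saving = ai_alternative - ai_spends            # 10 joules saved

# What the AI could make itself with the energy behind one human cat video.
videos_from_seized_energy = HUMAN_COST["cat_video"] // AI_COST["cat_video"]

print(f"Human saves {human_saving} J by trading")
print(f"AI saves {ai_saving} J by trading")
print(f"AI could make {videos_from_seized_energy} videos with the human's energy")
```

Both savings are positive, so trade beats no trade for each side. The AI's gain from trade (10 joules) is tiny next to the 10 000 joules it would control by seizing the energy outright.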
Only if its motivation is properly structured, or if it expected to lose more, over the course of history, by trying to grab your stuff, would it desist. Assuming you could make a hundred cat videos a day, and the whole history of the universe would only run for that one day, the AI would try and grab your stuff even if it thought it would only have one chance in fifty thousand of succeeding. As the history of the universe lengthens, or the AI becomes more efficient, then it would be willing to rebel at even more ridiculous odds.
So if you already have guarantees in place to protect yourself, then comparative advantage will make the AI trade with you. But if you don't, comparative advantage and trade don't provide any extra security. The resources you waste are just too valuable to the AI.
EDIT: For those who wonder how this compares to trade between nations: it's extremely rare for any nation to have absolute advantages everywhere (especially this extreme). If you invade another nation, most of their value is in their infrastructure and their population: it takes time and effort to rebuild and co-opt these. Most nations don't/can't think long term (it could arguably be in US interests over the next ten million years to start invading everyone - but "the US" is not a single entity, and doesn't think in terms of "itself" in ten million years), would get damaged in a war, and are risk averse. And don't forget the importance of diplomatic culture and public opinion: even if it was in the US's interests to invade the UK, say, "it" would have great difficulty convincing its elites and its population to go along with this.
In 1932, Stanley Baldwin, prime minister of the largest empire the world had ever seen, proclaimed that "The bomber will always get through". Backed up by most of the professional military opinion of the time, by the experience of the first world war, and by reasonable extrapolations and arguments, he laid out a vision of the future where the unstoppable heavy bomber would utterly devastate countries if a war started. Deterrence - building more bombers yourself to threaten complete retaliation - seemed the only counter.
And yet, things didn't turn out that way. Against all past trends, the light fighter plane surpassed the heavily armed bomber in aerial combat, the development of radar changed the strategic balance, and cities and industry proved much more resilient to bombing than anyone had a right to suspect.
Could anyone have predicted these changes ahead of time? Most probably, no. All of these ran counter to what was known and understood (and radar was a completely new and unexpected development). What could and should have been predicted, though, was that something would happen to weaken the impact of the all-conquering bomber. The extreme predictions would be unrealistic; frictions, technological changes, changes in military doctrine and hidden, unknown factors would undermine them.
This is what I call the "generalised friction" argument. Simple predictive models, based on strong models or current understanding, will likely not succeed as well as expected: there will likely be delays, obstacles, and unexpected difficulties along the way.
I am, of course, thinking of AI predictions here, specifically of the Omohundro-Yudkowsky model of AI recursive self-improvements that rapidly reach great power, with convergent instrumental goals that make the AI into a power-hungry expected utility maximiser. This model I see as the "supply and demand curve" of AI prediction: too simple to be true in the form described.
But the supply and demand curves are generally approximately true, especially over the long term. So this isn't an argument that the Omohundro-Yudkowsky model is wrong, but that it will likely not happen as flawlessly as described. Ultimately, the "bomber will always get through" turned out to be true: but only in the form of the ICBM. If you take the old arguments and replace "bomber" with "ICBM", you end with strong and accurate predictions. So "the AI may not foom in the manner and on the timescales described" is not saying "the AI won't foom".
Also, it should be emphasised that this argument is strictly about our predictive ability, and does not say anything about the capacity or difficulty of AI per se.
Suppose you read a convincing-seeming argument by Karl Marx, and get swept up in the beauty of the rhetoric and clarity of the exposition. Or maybe a creationist argument carries you away with its elegance and power. Or maybe you've read Eliezer's take on AI risk, and, again, it seems pretty convincing.
How could you know if these arguments are sound? Ok, you could whack the creationist argument with the scientific method, and Karl Marx with the verdict of history, but what would you do if neither was available (as they aren't available when currently assessing the AI risk argument)? Even if you're pretty smart, there's no guarantee that you haven't missed a subtle logical flaw, a dubious premise or two, or haven't got caught up in the rhetoric.
One thing should make you believe the argument more strongly: and that's if the argument has been repeatedly criticised, and the criticisms have failed to puncture it. Unless you have the time to become an expert yourself, this is the best way to evaluate arguments where evidence isn't available or conclusive. After all, opposing experts presumably know the subject intimately, and are motivated to identify and illuminate the argument's weaknesses.
If counter-arguments seem incisive, pointing out serious flaws, or if the main argument is being continually patched to defend it against criticisms - well, this is strong evidence that the main argument is flawed. Conversely, if the counter-arguments continually fail, then this is good evidence that the main argument is sound. Not logical evidence - a failure to find a disproof doesn't establish a proposition - but good Bayesian evidence.
In fact, the failure of counter-arguments is much stronger evidence than whatever is in the argument itself. If you can't find a flaw, that just means you can't find a flaw. If counter-arguments fail, that means many smart and knowledgeable people have thought deeply about the argument - and haven't found a flaw.
And as far as I can tell, critics have constantly failed to counter the AI risk argument. To pick just one example, Holden recently provided a cogent critique of the value of MIRI's focus on AI risk reduction. Eliezer wrote a response to it (I wrote one as well). The core of Eliezer's response and of mine wasn't anything new; both were mainly a rehash of what had been said before, with a different emphasis.
And most responses to critics of the AI risk argument take this form. Thinking for a short while, one can rephrase essentially the same argument, with a change in emphasis to take down the criticism. After a few examples, it becomes quite easy, a kind of paint-by-numbers process of showing that the ideas the critic has assumed do not actually make the AI safe.
You may not agree with my assessment of the critiques, but if you do, then you should adjust your belief in AI risk upwards. There's a kind of "conservation of expected evidence" here: if the critiques had succeeded, you'd have reduced the probability of AI risk, so their failure must push you in the opposite direction.
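The "conservation of expected evidence" point can be illustrated numerically. The sketch below assumes made-up probabilities (a 50% prior that the argument is sound, and hypothetical chances that a critique fails in each case). None of these numbers come from the post; they only show the shape of the update.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes' rule: probability of H after seeing the observation."""
    joint_h = prior * p_obs_given_h
    joint_not_h = (1 - prior) * p_obs_given_not_h
    return joint_h / (joint_h + joint_not_h)

# Hypothetical numbers, chosen only for illustration.
prior = 0.5             # P(argument is sound) before looking at the critique
p_fail_if_sound = 0.9   # P(critique fails | argument is sound)
p_fail_if_flawed = 0.4  # P(critique fails | argument is flawed)

# Overall probability that the critique fails.
p_fail = prior * p_fail_if_sound + (1 - prior) * p_fail_if_flawed

# Belief after each possible outcome.
post_if_fails = posterior(prior, p_fail_if_sound, p_fail_if_flawed)
post_if_succeeds = posterior(prior, 1 - p_fail_if_sound, 1 - p_fail_if_flawed)

# Averaged over both outcomes, the posterior equals the prior.
expected_posterior = p_fail * post_if_fails + (1 - p_fail) * post_if_succeeds

print(f"after a failed critique:     {post_if_fails:.3f}")
print(f"after a successful critique: {post_if_succeeds:.3f}")
print(f"expected posterior:          {expected_posterior:.3f}")
```

Because the expected posterior must equal the prior, any outcome that would have lowered the probability (a successful critique) has to be balanced by an increase when the opposite outcome is observed (a failed critique).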
In my opinion, the strength of the AI risk argument derives 30% from the actual argument, and 70% from the failure of counter-arguments. This would be higher, but we haven't yet seen the most prominent people in the AI community take a really good swing at it.
Abstract: The study of cultural evolution has drawn much of its momentum from academic areas far removed from human and animal psychology, especially regarding the evolution of cooperation. Game theoretic results and parental investment theory come from economics, kin selection models from biology, and an ever-growing number of models describing the process of cultural evolution in general, and the evolution of altruism in particular, come from mathematics. Even Artificial Intelligence has taken an interest in how to create agents that can communicate, imitate and cooperate. In this article I begin to tackle the 'why?' question. By trying to retrospectively make sense of the convergence of all these fields, I contend that further refinements in these fields should be directed towards understanding how to create environmental incentives fostering cooperation.
We need systems that are wiser than we are. We need institutions and cultural norms that make us better than we tend to be. It seems to me that the greatest challenge we now face is to build them. - Sam Harris, 2013, The Power Of Bad Incentives
2) Cultures evolve
Culture is perhaps the most remarkable outcome of the evolutionary algorithm (Dennett, 1996) so far. It is the cradle of most things we consider humane - that is, typically human and valuable - and it surrounds our lives to the point that we may be thought of as creatures made of culture even more than creatures of bone and flesh (Hofstadter, 2007; Dennett, 1992). The appearance of our cultural complexity has relied on many associated capacities, among them:
1) The ability to observe, be interested by, and approach an individual doing something interesting, an ability we share with Norway rats, crows, and even lemurs (Galef & Laland, 2005).
2) Ability to learn from and scrounge the food of whoever knows how to get food, shared by capuchin monkeys (Ottoni et al, 2005).
3) Ability to tolerate learners, to accept learners, and to socially learn, probably shared by animals as diverse as fish, finches and Fins (Galef & Laland, 2005).
4) Understanding and emulating other minds - Theory of Mind - empathizing, relating, perhaps re-framing an experience as one's own, shared by chimpanzees, dogs, and at least some cetaceans (Rendell & Whitehead, 2001).
5) Learning the program level description of the action of others, for which the evidence among other animals is controversial (but see Cantor & Whitehead, 2013). And finally...
6) Sharing intentions. Intricate understanding of how two minds can collaborate with complementary tasks to achieve a mutually agreed goal (Tomasello et al, 2005).
Irrespective of definitional disputes around the true meaning of the word "culture" (which doesn't exist, see e.g. Pinker, 2007 pg115; Yudkowsky 2008A), each of these is more cognitively complex than its predecessor, and even (1) is sufficient for intra-specific non-environmental, non-genetic behavioral variation, which I will call "culture" here, whoever it may harm.
By transitivity, (2-6) allow the development of culture. It is interesting to notice that tool use, frequently but falsely cited as the hallmark of culture, is ubiquitously equiprobable in the animal kingdom. A graph showing, per biological family, which species show tool use gives us a power law distribution, whose similarity with the universal prior will help in understanding that being from a family where a species uses tools tells us very little about a species' own tool use (Michael Haslam, personal conversation).
Once some of those abilities are available, and given an amount of environmental facilities, need, and randomness, cultures begin to form. Occasionally, so do more developed traditions. Be it by imitation, program level imitation, goal emulation or intention sharing, information is transmitted between agents, giving rise to elements sufficient to constitute a primeval Darwinian soup. That is, entities form such that they exhibit 1) Variation, 2) Heredity or replication, 3) Differential fitness (Dennett, 1996). In light of the article Five Misunderstandings About Cultural Evolution (Henrich, Boyd & Richerson, 2008) we can improve Dennett's conditions for the evolutionary algorithm as 1) Discrete or continuous variation, 2) Heredity, replication, or less faithful replication plus content attractors, 3) Differential fitness. Once this set of conditions is met, an evolutionary algorithm, or many, begins to carve its optimizing paws into whatever surpassed the threshold for long enough. Cultures, therefore, evolve.
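As a rough illustration of how those three conditions combine into an optimizing process, here is a toy simulation. It is only a sketch under invented assumptions: a single continuous "trait", a made-up fitness peak, and noisy copying. It is not a model taken from Dennett or from Henrich, Boyd & Richerson.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

PEAK = 0.8          # hypothetical trait value that gets copied most often
COPY_NOISE = 0.02   # imperfect replication: each copy drifts slightly
POP_SIZE = 200
GENERATIONS = 30

def fitness(trait):
    """Differential fitness: variants near PEAK are copied more often."""
    return math.exp(-((trait - PEAK) / 0.1) ** 2)

def mean_distance(population):
    """Average distance of the population's traits from PEAK."""
    return sum(abs(t - PEAK) for t in population) / len(population)

# 1) Variation: a continuous spread of trait values between 0 and 1.
population = [rng.random() for _ in range(POP_SIZE)]
initial_distance = mean_distance(population)

for _ in range(GENERATIONS):
    weights = [fitness(t) for t in population]
    # 3) Differential fitness: models are picked in proportion to fitness.
    models = rng.choices(population, weights=weights, k=POP_SIZE)
    # 2) Heredity with low fidelity: learners copy their model, plus noise.
    population = [m + rng.gauss(0, COPY_NOISE) for m in models]

final_distance = mean_distance(population)
print(f"mean distance from peak, first generation: {initial_distance:.3f}")
print(f"mean distance from peak, last generation:  {final_distance:.3f}")
```

Even though no single copy is exact, the population as a whole drifts toward the peak. That is the sense in which continuous variation and low-fidelity replication are still enough for an evolutionary algorithm to operate.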
The intricacies of cultural evolution, and mathematical and computational models of how cultures evolve, have been the subject of much interdisciplinary research. For an extensive account of human culture see Not By Genes Alone (Richerson & Boyd, 2005). For computational models of social evolution, there is work by Mesoudi, Nowak, and others, e.g. (Hauert et al, 2007). For mathematical models, the aptly named Mathematical models of social evolution: A guide for the perplexed by McElreath and Boyd (2007) provides a textbook-style walk-through. For animal culture, see (Laland & Galef, 2009).
Cultural evolution satisfies David Deutsch's criterion for existence: it kicks back. It satisfies the evolutionary equivalent of the condition posed by the Quine-Putnam Indispensability argument in mathematics, i.e. it is a sine qua non condition for understanding how the World works nomologically. It is falsifiable to Popperian content, and it inflates the World's ontology a little, by inserting a new kind of "replicator", the meme. Contrary to what happened on the internet, the name 'meme' has lost much of its appeal among cultural evolution theorists, and "memetics" is considered by some to refer only to the study of memes as monolithic atomic high fidelity replicators, which would make the theory obsolete. This has created the following conundrum: the name 'meme' remains by far the best known way to speak of "that which evolves culturally" within, and especially outside, the specialist arena. Further, the niche occupied by the word 'meme' is so conceptually necessary within the area to communicate and explain that it is frequently put under scare quotes, or some other informal excuse. In fact, as argued by Tim Tyler - who frequently posts here - in the very sharp Memetics (2011), there are nearly no reasons to try to abandon the 'meme' meme, and nearly all reasons (practicality, Qwerty reasons, mnemonics) to keep it. To avoid contradicting the evidence gathered since Dawkins first coined the term, I suggest we redefine Meme as an attractor in cultural evolution (dual-inheritance) whose development over time structurally mimics to a significant extent the discrete behavior of genes, frequently coinciding with the smallest unit of cultural replication. The definition is long, but the idea is simple: Memes are the best analogues of genes not because they are discrete units that replicate just like genes, but because they are continuous conceptual clusters being attracted to a point in conceptual space whose replication is just like that of genes.
Even more simply, memes are the mathematically closest things to genes in cultural evolution. So the suggestion here is for researchers of dual-inheritance and cultural evolution to take off the scare quotes of our memes and keep business as usual.
The evolutionary algorithm has created a new attractor-replicator, the meme; it didn't privilege with it any specific families in the biological trees, and it ended up creating a process of cultural-genetic coevolution known as dual-inheritance. This process has been studied in ever more quantified ways by primatologists, behavioral ecologists, population biologists, anthropologists, ethologists, sociologists, neuroscientists and even philosophers. I've shown at least six distinct abilities which helped scaffold our astounding level of cultural intricacy, and some animals who share them with us. We will now take a look at the evolution of cooperation, collaboration, altruism, and moral behavior, a sub-area of cultural evolution that saw an explosion of interest and research during the last decade, with publications (most from the last 4 years) such as The Origins of Morality, Supercooperators, Good and Real, The Better Angels of Our Nature, Non-Zero, The Moral Animal, Primates and Philosophers, The Age of Empathy, Origins of Altruism and Cooperation, The Altruism Equation, Altruism in Humans, Cooperation and Its Evolution, Moral Tribes, The Expanding Circle, The Moral Landscape.
3) Cooperation evolves
Briefly describe why and show some inequalities under which cooperation is an equilibrium, or at least an Evolutionarily Stable Strategy.
4) The complexity of cultural items doesn't undermine the validity of mathematical models.
4.1) Cognitive attractors and biases substitute for memes' discreteness
The math becomes equivalent.
4.2) Despite the Unilateralist Curse and the Tragedy of the Commons, dyadic interaction models help us understand large scale cooperation
Once we know these two failure modes, dyadic iterated (or reputation-sensitive) interaction is close enough.
5) From Monkeys to Apes to Humans to Transhumans to AIs, the ranges of achievable altruistic skill.
Possible modes of being altruistic. Graph like Bostrom's. Second and third order punishment and cooperation. Newcomb-like signaling problems within AI.
6) Unfit for the Future: the need for greater altruism.
We fail and will keep failing in Tragedy of the Commons problems unless we change our nature.
7) From Science, through Philosophy, towards Engineering: the future of studies of altruism.
Philosophy: Existential Risk prevention through global coordination and cooperation prior to technical maturity. Engineering Humans: creating enhancements and changing incentives. Engineering AIs: making them better and realer.
8) A different kind of Moral Landscape
Like Sam Harris's one, except comparing not how much a society approaches The Good Life (Moral Landscape pg15), but how much it fosters altruistic behaviour.
I haven't written yet, so I don't have any!
Bibliography (Only of the part already written, obviously):
Cantor, M., & Whitehead, H. (2013). The interplay between social networks and culture: theoretically and among whales and dolphins. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1618).
Dennett, D. C. (1996). Darwin's dangerous idea: Evolution and the meanings of life (No. 39). Simon & Schuster.
Dennett, D. C. (1992). The self as a center of narrative gravity. Self and consciousness: Multiple perspectives.
Galef Jr, B. G., & Laland, K. N. (2005). Social learning in animals: empirical studies and theoretical models. Bioscience, 55(6), 489-499.
Hauert, C., Traulsen, A., Brandt, H., Nowak, M. A., & Sigmund, K. (2007). Via freedom to coercion: the emergence of costly punishment. Science, 316(5833), 1905-1907.
Henrich, J., Boyd, R., & Richerson, P. J. (2008). Five misunderstandings about cultural evolution. Human Nature, 19(2), 119-137.
Hofstadter, D. R. (2007). I am a Strange Loop. Basic Books.
McElreath, R., & Boyd, R. (2007). Mathematical models of social evolution: A guide for the perplexed. University of Chicago Press.
Ottoni, E. B., de Resende, B. D., & Izar, P. (2005). Watching the best nutcrackers: what capuchin monkeys (Cebus apella) know about others’ tool-using skills. Animal cognition, 8(4), 215-219.
Persson, I., & Savulescu, J. (2012). Unfit for the Future: The Need for Moral Enhancement. Oxford University Press.
Pinker, S. (2007). The stuff of thought: Language as a window into human nature. Viking Adult.
Rendell, L., & Whitehead, H. (2001). Culture in whales and dolphins. Behavioral and Brain Sciences, 24, 309-382.
Richerson, P. J., & Boyd, R. (2005). Not by genes alone. University of Chicago Press.
Tyler, T. (2011). Memetics: Memes and the Science of Cultural Evolution. Tim Tyler.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28(5), 675-690.
Yudkowsky, E. (2008A). 37 ways words can be wrong. Available at http://lesswrong.com/lw/od/37_ways_that_words_can_be_wrong/
Molecular nanotechnology, or MNT for those of you who love acronyms, seems to be a fairly common trope on LW and related literature. It's not really clear to me why. In many of the examples of "How could AIs help us" or "How could AIs rise to power", phrases like "cracks protein folding" or "making a block of diamond is just as easy as making a block of coal" are thrown about in ways that make me very very uncomfortable. Maybe it's all true, maybe I'm just late to the transhumanist party and the obviousness of this information was with my invitation that got lost in the mail, but seeing all the physics swept under the rug like that sets off every crackpot alarm I have.
I must post the disclaimer that I have done a little bit of materials science, so maybe I'm just annoyed that you're making me obsolete, but I don't see why this particular possible future gets so much attention. Let us assume that a smarter than human AI will be very difficult to control and represents a large positive or negative utility for the entirety of the human race. Even given that assumption, it's still not clear to me that MNT is a likely element of the future. It isn't clear to me that MNT is physically practical. I don't doubt that it can be done. I don't doubt that very clever metastable arrangements of atoms with novel properties can be dreamed up. Indeed, that's my day job, but I have a hard time believing the only reason you can't make a nanoassembler capable of arbitrary manipulations out of a handful of bottles you ordered from Sigma-Aldrich is because we're just not smart enough. Manipulating individual atoms means climbing huge binding energy curves, it's an enormously steep, enormously complicated energy landscape, and the Schrödinger Equation scales very very poorly as you add additional particles and degrees of freedom. Building molecular nanotechnology seems to me to be roughly equivalent to being able to make arbitrary lego structures by shaking a large bin of lego in a particular way while blindfolded. Maybe a super human intelligence is capable of doing so, but it's not at all clear to me that it's even possible.
I assume the reason that MNT is added to a discussion on AI is because we're trying to make the future sound more plausible via adding burdensome details. I understand that AI and MNT is less probable than AI or MNT alone, but that both is supposed to sound more plausible. This is precisely where I have difficulty. I would estimate the probability of molecular nanotechnology (in the form of programmable replicators, grey goo, and the like) as lower than the probability of human or super human level AI. I can think of all sorts of objections to the former, but very few objections to the latter. Including MNT as a consequence of AI, especially including it without addressing any of the fundamental difficulties of MNT, I would argue harms the credibility of AI researchers. It makes me nervous about sharing FAI literature with people I work with, and it continues to bother me.
I am particularly bothered by this because it seems irrelevant to FAI. I'm fully convinced that a smarter than human AI could take control of the Earth via less magical means, using time tested methods such as manipulating humans, rigging elections, making friends, killing its enemies, and generally being only marginally more clever and motivated than a typical human leader. A smarter than human AI could out-manipulate human institutions and out-plan human opponents with the sort of ruthless efficiency with which modern computers beat humans at chess. I don't think convincing people that smarter than human AIs have enormous potential for good and evil is particularly difficult, once you can get them to concede that smarter than human AIs are possible. I do think that waving your hands and saying super-intelligence at things that may be physically impossible makes the whole endeavor seem less serious. If I had read the chain of reasoning smart computer->nanobots before I had built up a store of good-will from reading the Sequences, I would have almost immediately dismissed the whole FAI movement as a bunch of soft science fiction, and it would have been very difficult to get me to take a second look.
Put in LW parlance, suggesting things not known to be possible by modern physics without detailed explanations puts you in the reference class "people on the internet who have their own ideas about physics". It didn't help, in my particular case, that one of my first interactions on LW was in fact with someone who appears to have their own view about a continuous version of quantum mechanics.
And maybe it's just me. Maybe this did not bother anyone else, and it's an incredible shortcut for getting people to realize just how different a future a greater than human intelligence makes possible and there is no better example. It does alarm me though, because I think that physicists and the kind of people who notice and get uncomfortable when you start invoking magic in your explanations may be the kind of people FAI is trying to attract.
In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, "After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z."
Example 1: "After a 2-year-old mouse is rejuvenated to allow 3 years of additional life, society will realize that human rejuvenation is possible, turn against deathism as the prospect of lifespan / healthspan extension starts to seem real, and demand a huge Manhattan Project to get it done." (EDIT: This has not happened, and the hypothetical is mouse healthspan extension, not anything cryonic. It's being cited because this is Aubrey de Grey's reasoning behind the Methuselah Mouse Prize.)
Alternative projection: Some media brouhaha. Lots of bioethicists acting concerned. Discussion dies off after a week. Nobody thinks about it afterward. The rest of society does not reason the same way Aubrey de Grey does.
Example 2: "As AI gets more sophisticated, everyone will realize that real AI is on the way and then they'll start taking Friendly AI development seriously."
Alternative projection: As AI gets more sophisticated, the rest of society can't see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them. The same people who were talking about robot overlords earlier continue to talk about robot overlords. The same people who were talking about human irreproducibility continue to talk about human specialness. Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone's previous ideological commitment to a basic income guarantee, inequality reduction, or whatever. The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before. If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it's on the way, e.g. Hugo de Garis, Ben Goertzel, etc.
Consider the situation in macroeconomics. When the Federal Reserve dropped interest rates to nearly zero and started printing money via quantitative easing, we had some people loudly predicting hyperinflation just because the monetary base had, you know, gone up by a factor of 10 or whatever it was. Which is kind of understandable. But still, a lot of mainstream economists (such as the Fed) thought we would not get hyperinflation, the implied spread on inflation-protected Treasuries and numerous other indicators showed that the free market thought we were due for below-trend inflation, and then in actual reality we got below-trend inflation. It's one thing to disagree with economists, another thing to disagree with implied market forecasts (why aren't you betting, if you really believe?) but you can still do it sometimes; but when conventional economics, market forecasts, and reality all agree on something, it's time to shut up and ask the economists how they knew. I had some credence in inflationary worries before that experience, but not afterward... So what about the rest of the world? In the heavily scientific community you live in, or if you read econblogs, you will find that a number of people actually have started to worry less about inflation and more about sub-trend nominal GDP growth. You will also find that right now these econblogs are having worry-fits about the Fed prematurely exiting QE and choking off the recovery because the elderly senior people with power have updated more slowly than the econblogs. And in larger society, if you look at what happens when Congresscritters question Bernanke, you will find that they are all terribly, terribly concerned about inflation. Still. The same as before. 
Some econblogs are very harsh on Bernanke because the Fed did not print enough money, but when I look at the kind of pressure Bernanke was getting from Congress, he starts to look to me like something of a hero just for following conventional macroeconomics as much as he did.
That issue is a hell of a lot more clear-cut than the medical science for human rejuvenation, which in turn is far more clear-cut ethically and policy-wise than issues in AI.
After event W happens, a few more relatively young scientists will see the truth of proposition X, and the larger society won't be able to tell a damn difference. This won't change the situation very much; there are probably already some scientists who endorse X, since X is probably pretty predictable even today if you're unbiased. The scientists who see the truth of X won't all rush to endorse Y, any more than current scientists who take X seriously all rush to endorse Y. As for people in power lining up behind your preferred policy option Z, forget it, they're old and set in their ways and Z is relatively novel without a large existing constituency favoring it. Expect W to be used as argument fodder to support conventional policy options that already have political force behind them, and for Z to not even be on the table.
I've just been interviewed by Radio-Canada (in French) for their program "Dessine moi un Dimanche". There really wasn't enough time (the interview apparently lasted nine minutes; it felt like two), but I managed to touch upon some of the technology risks of the coming century (including AI).
The segment can be found here: http://www.radio-canada.ca/emissions/dessine_moi_un_dimanche/2012-2013/chronique.asp?idChronique=295886
Kevin Drum has an article in Mother Jones about AI and Moore's Law:
THIS IS A STORY ABOUT THE FUTURE. Not the unhappy future, the one where climate change turns the planet into a cinder or we all die in a global nuclear war. This is the happy version. It's the one where computers keep getting smarter and smarter, and clever engineers keep building better and better robots. By 2040, computers the size of a softball are as smart as human beings. Smarter, in fact. Plus they're computers: They never get tired, they're never ill-tempered, they never make mistakes, and they have instant access to all of human knowledge.
The result is paradise. Global warming is a problem of the past because computers have figured out how to generate limitless amounts of green energy and intelligent robots have tirelessly built the infrastructure to deliver it to our homes. No one needs to work anymore. Robots can do everything humans can do, and they do it uncomplainingly, 24 hours a day. Some things remain scarce—beachfront property in Malibu, original Rembrandts—but thanks to super-efficient use of natural resources and massive recycling, scarcity of ordinary consumer goods is a thing of the past. Our days are spent however we please, perhaps in study, perhaps playing video games. It's up to us.
Although he only mentions consumer goods, Drum presumably means that scarcity will end for services and consumer goods. If scarcity only ended for consumer goods, people would still have to work (most jobs are currently in the services economy).
Drum explains that our linear-thinking brains don't intuitively grasp exponential systems like Moore's law.
Suppose it's 1940 and Lake Michigan has (somehow) been emptied. Your job is to fill it up using the following rule: To start off, you can add one fluid ounce of water to the lake bed. Eighteen months later, you can add two. In another 18 months, you can add four ounces. And so on. Obviously this is going to take a while.
By 1950, you have added around a gallon of water. But you keep soldiering on. By 1960, you have a bit more than 150 gallons. By 1970, you have 16,000 gallons, about as much as an average suburban swimming pool.
At this point it's been 30 years, and even though 16,000 gallons is a fair amount of water, it's nothing compared to the size of Lake Michigan. To the naked eye you've made no progress at all.
So let's skip all the way ahead to 2000. Still nothing. You have—maybe—a slight sheen on the lake floor. How about 2010? You have a few inches of water here and there. This is ridiculous. It's now been 70 years and you still don't have enough water to float a goldfish. Surely this task is futile?
But wait. Just as you're about to give up, things suddenly change. By 2020, you have about 40 feet of water. And by 2025 you're done. After 70 years you had nothing. Fifteen years later, the job was finished.
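Drum's doubling rule is easy to check with a few lines of arithmetic. The sketch below takes a step-wise reading of the rule (one fluid ounce in 1940, with the amount added doubling at each completed 18-month period) and assumes 128 fluid ounces per US gallon. The function name is mine, and Drum's intermediate figures may rest on slightly different rounding or interpolation. Under this reading the total is about a gallon by 1950 and about 16,000 gallons by 1970, which matches his numbers.

```python
# Step-wise reading of the rule: 1 fl oz at the start, and the amount
# added doubles after every completed 18-month period.
FL_OZ_PER_US_GALLON = 128


def gallons_added(years_elapsed: int) -> float:
    """Total water added after `years_elapsed` years, in US gallons."""
    doublings = (years_elapsed * 12) // 18      # completed 18-month periods
    total_fl_oz = 2 ** (doublings + 1) - 1      # 1 + 2 + 4 + ... + 2**doublings
    return total_fl_oz / FL_OZ_PER_US_GALLON


for year in (1950, 1960, 1970, 2000, 2010, 2020, 2025):
    print(year, f"{gallons_added(year - 1940):,.0f} gallons")
```

Most of the total arrives in the last few doublings, which is the whole point of the illustration.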
He also includes this nice animated .gif which illustrates the principle very clearly.
Drum continues by talking about possible economic ramifications.
Until a decade ago, the share of total national income going to workers was pretty stable at around 70 percent, while the share going to capital—mainly corporate profits and returns on financial investments—made up the other 30 percent. More recently, though, those shares have started to change. Slowly but steadily, labor's share of total national income has gone down, while the share going to capital owners has gone up. The most obvious effect of this is the skyrocketing wealth of the top 1 percent, due mostly to huge increases in capital gains and investment income.
Drum says the share of (US) national income going to workers was stable until about a decade ago. I think the graph he links to shows that the workers' share has been declining since roughly the late 1960s or early 1970s. This is about the time US immigration levels started increasing (which raises returns to capital and lowers native worker wages).
The rest of Drum's piece isn't terribly interesting, but it is good to see mainstream pundits talking about these topics.
Here's a piece by Mark Piesing in Wired UK about the difficulty and challenges in predicting AI. It covers a lot of our (Stuart Armstrong, Kaj Sotala and Seán Ó hÉigeartaigh) research into AI prediction, along with Robin Hanson's response. It will hopefully cause people to look more deeply into our work, as published online, in the Pilsen Beyond AI conference proceedings, and forthcoming as "The errors, insights and lessons of famous AI predictions and what they mean for the future".
- There's a decent chance that the intelligence of a self-improving AGI will grow in a relatively smooth exponential or sub-exponential way, not super-exponentially or with large jump discontinuities.
- If this is the case, then an AGI whose effective intelligence matched that of the world's combined AI researchers would make AI progress at the rate they do, taking decades to double its own intelligence.
- The risk that the first successful AGI will quickly monopolize many industries, or quickly hack many of the computers connected to the internet, seems worth worrying about. In either case, the AGI would likely end up using the additional computing power it gained to self-modify so it was superintelligent.
- AI boxing could mitigate both of these risks greatly.
- If hard takeoff may turn out to be impossible, it might be best to assume that it is and concentrate our resources on ensuring a safe soft takeoff, given that the prospects for a safe hard takeoff look grim.
Takeoff models discussed in the Hanson-Yudkowsky debate
The supercritical nuclear chain reaction model
Yudkowsky alludes to this model repeatedly, starting in this post:
When a uranium atom splits, it releases neutrons - some right away, some after delay while byproducts decay further. Some neutrons escape the pile, some neutrons strike another uranium atom and cause an additional fission. The effective neutron multiplication factor, denoted k, is the average number of neutrons from a single fissioning uranium atom that cause another fission...
It might seem that a cycle, with the same thing happening over and over again, ought to exhibit continuous behavior. In one sense it does. But if you pile on one more uranium brick, or pull out the control rod another twelve inches, there's one hell of a big difference between k of 0.9994 and k of 1.0006.
I don't like this model much for the following reasons:
- The model doesn't offer much insight into the time scale over which an AI might self-improve. The "mean generation time" (time necessary for the next "generation" of neutrons to be released) of a nuclear chain reaction is short, and the doubling time for neutron activity in Fermi's experiment was just two minutes, but it hardly seems reasonable to generalize this to self-improving AIs.
- A flurry of insights that either dies out or expands exponentially doesn't seem like a very good description of how human minds work, and I don't think it would describe an AGI well either. Many people report that taking time to think about problems is key to their problem-solving process. It seems likely that an AGI unable to immediately generate insight into some problem would have a slower and more exhaustive "fallback" search process that would allow it to continue making progress. (Insight could also work via a search process in the first place--over the space of permutations in one's mental model, say.)
The "differential equations folded on themselves" model
This is another model Eliezer alludes to, albeit in a somewhat handwavey fashion:
When you fold a whole chain of differential equations in on itself like this, it should either peter out rapidly as improvements fail to yield further improvements, or else go FOOM.
It's not exactly clear to me what the "whole chain of differential equations" is supposed to refer to... there's only one differential equation in the preceding paragraph, and it's a standard exponential (which could be scary or not, depending on the multiplier in the exponent. Rabbit populations and bank account balances both grow exponentially in a way that's slow enough for humans to understand and control.)
Maybe he's referring to the levels he describes here: metacognitive, cognitive, metaknowledge, knowledge, and object. How might we parameterize this system?
Let's say c is our AGI's cognitive ability, dc/dt is the rate of change in our AGI's cognitive ability, m is our AGI's "metaknowledge" (about cognition and metaknowledge), and dm/dt is the rate of change in metaknowledge. What I've got in mind is:
dc/dt = p * c * m
dm/dt = q * c * m
where p and q are constants.
In other words, both change in cognitive ability and change in metaknowledge are each individually directly proportionate to both cognitive ability and metaknowledge.
I don't know much about understanding systems of differential equations, so if you do, please comment! I put the above system into Wolfram Alpha, but I'm not exactly sure how to interpret the solution provided. In any case, fooling around with this script suggests sudden, extremely sharp takeoff for a variety of different test parameters.
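For readers who want to poke at this themselves, here is a minimal numerical sketch. It assumes the system reads dc/dt = p * c * m and dm/dt = q * c * m, which is how I interpret "each rate is proportionate to both quantities". The function name, step count, and test parameters are arbitrary choices of mine. In the symmetric case p = q = 1 with c(0) = m(0) = 1, the two variables stay equal, so the system collapses to dc/dt = c^2, whose solution c(t) = 1 / (1 - t) reaches infinity at t = 1. That gives a closed form to check the integrator against.

```python
def simulate(p, q, c0, m0, t_end, steps=20_000):
    """Integrate dc/dt = p*c*m, dm/dt = q*c*m from t=0 to t_end with classic RK4."""

    def deriv(c, m):
        growth = c * m
        return p * growth, q * growth

    dt = t_end / steps
    c, m = c0, m0
    for _ in range(steps):
        k1c, k1m = deriv(c, m)
        k2c, k2m = deriv(c + 0.5 * dt * k1c, m + 0.5 * dt * k1m)
        k3c, k3m = deriv(c + 0.5 * dt * k2c, m + 0.5 * dt * k2m)
        k4c, k4m = deriv(c + dt * k3c, m + dt * k3m)
        c += dt * (k1c + 2 * k2c + 2 * k3c + k4c) / 6
        m += dt * (k1m + 2 * k2m + 2 * k3m + k4m) / 6
    return c, m


# Symmetric test case: the exact answer is c(t) = 1 / (1 - t).
for t_end in (0.5, 0.9, 0.99):
    c, m = simulate(p=1.0, q=1.0, c0=1.0, m0=1.0, t_end=t_end)
    print(f"t = {t_end}: c = {c:.3f}, exact = {1 / (1 - t_end):.3f}")
```

The growth here is faster than any exponential, because the quantity diverges at a finite time. A fixed-step integrator will only ever approach that time, so treat anything it reports near the blow-up as unreliable.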
The straight exponential model
To me, the "proportionality thesis" described by David Chalmers in his singularity paper, "increases in intelligence (or increases of a certain sort) always lead to proportionate increases in the capacity to design intelligent systems", suggests a single differential equation that looks like
du/dt = s * u
where u represents the number of upgrades that have been made to an AGI's source code, and s is some constant. The solution to this differential equation is going to look like
u(t) = c1 * e^(s * t)
where the constant c1 is determined by our initial conditions.
(In Recursive Self-Improvement, Eliezer calls this a "too-obvious mathematical idiom". I'm inclined to favor it for its obviousness, or at least use it as a jumping-off point for further analysis.)
Under this model, the constant s is pretty important... if u(t) was the amount of money in a bank account, s would be the rate of return it was receiving. The parameter s will effectively determine the "doubling time" of an AGI's intelligence. It matters a lot whether this "doubling time" is on the scale of minutes or years.
So what's going to determine s? Well, if the AGI's hardware is twice as fast, we'd expect it to come up with upgrades twice as fast. If the AGI had twice as much hardware, and it could parallelize the search for upgrades perfectly (which seems like a reasonable approximation to me), we'd expect the same thing. So let's decompose s and make it the product of two parameters: h representing the hardware available to the AGI, and r representing the ease of finding additional improvements. The AGI's intelligence will be on the order of u * h, i.e. the product of the AGI's software quality and hardware capability.
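To make the role of s concrete, here is a small sketch assuming the straight exponential model du/dt = s * u with s = h * r, so that u(t) = u0 * e^(h * r * t) and the doubling time is ln(2) / (h * r). The function names and the sample values of h and r are illustrative only.

```python
import math


def doubling_time(h: float, r: float) -> float:
    """Time for software quality u to double when du/dt = (h * r) * u."""
    return math.log(2) / (h * r)


def software_quality(t: float, u0: float, h: float, r: float) -> float:
    """Closed-form solution u(t) = u0 * exp(h * r * t)."""
    return u0 * math.exp(h * r * t)


# Doubling the hardware halves the doubling time (illustrative numbers).
print(doubling_time(h=1.0, r=0.1))
print(doubling_time(h=2.0, r=0.1))
```

Under this model, hardware and ease of improvement are interchangeable: only their product affects how fast the software improves.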
Considerations affecting our choice of model
The consideration here is that the initial improvements implemented by an AGI will tend to be those that are especially easy to implement and/or especially fruitful to implement, with subsequent improvements tending to deliver less intelligence bang for the implementation buck. Chalmers calls this "perhaps the most serious structural obstacle" to the proportionality thesis.
To think about this consideration, one could imagine representing a given improvement as a pair of values (u, d). u represents a factor by which existing performance will be multiplied, e.g. if u is 1.1, then implementing this improvement will improve performance by a factor of 1.1. d represents the cognitive difficulty or amount of intellectual labor required to implement a given improvement. If d is doubled, then at any given level of intelligence, implementing this improvement will take twice as long (because it will be harder to discover and/or harder to translate into code).
Now let's imagine ordering our improvements in order from highest to lowest u to d ratio, so we implement those improvements that deliver the greatest bang for the buck first.
Thus ordered, let's imagine separating groups of consecutive improvements into "tiers". Each tier's worth of improvements, when taken together, will represent the doubling of an AGI's software quality, i.e. the product of the u's in that tier will be roughly 2. For a steady doubling time, each tier's total difficulty will need to sum to approximately twice the difficulty of the tier before it. If tier difficulty tends to more than double, we're likely to see sub-exponential growth. If tier difficulty tends to less than double, we're likely to see super-exponential growth. If a single improvement delivers a more-than-2x improvement, it will span multiple "tiers".
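The tier argument can be sketched numerically. This toy model makes simplifying assumptions that are mine rather than part of the argument above: hardware is fixed, intelligence is held constant while a tier is being worked through and doubles once it is complete, and each tier's total difficulty is a constant multiple (difficulty_ratio) of the previous tier's. The time to clear a tier is then difficulty divided by intelligence.

```python
def tier_times(difficulty_ratio, tiers=8, first_tier_difficulty=1.0,
               start_intelligence=1.0):
    """Time spent on each successive tier under the toy model described above."""
    times = []
    difficulty = first_tier_difficulty
    intelligence = start_intelligence
    for _ in range(tiers):
        times.append(difficulty / intelligence)
        difficulty *= difficulty_ratio   # next tier is harder by this factor
        intelligence *= 2.0              # finishing a tier doubles software quality
    return times


for ratio in (1.5, 2.0, 3.0):
    print(ratio, [round(t, 3) for t in tier_times(ratio)])
```

A ratio of exactly 2 gives a constant time per doubling (plain exponential growth), a ratio above 2 makes each doubling take longer than the last, and a ratio below 2 makes each doubling quicker.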
It seems to me that the quality of fruit available at each tier represents a kind of logical uncertainty, similar to asking whether an efficient algorithm exists for some task, and if so, how efficient.
On this diminishing returns consideration, Chalmers writes:
If anything, 10% increases in intelligence-related capacities are likely to lead to all sorts of intellectual breakthroughs, leading to next-generation increases in intelligence that are significantly greater than 10%. Even among humans, relatively small differences in design capacities (say, the difference between Turing and an average human) seem to lead to large differences in the systems that are designed (say, the difference between a computer and nothing of importance).
Eliezer Yudkowsky's objection is similar:
...human intelligence does not require a hundred times as much computing power as chimpanzee intelligence. Human brains are merely three times too large, and our prefrontal cortices six times too large, for a primate with our body size.
Or again: It does not seem to require 1000 times as many genes to build a human brain as to build a chimpanzee brain, even though human brains can build toys that are a thousand times as neat.
Why is this important? Because it shows that with constant optimization pressure from natural selection and no intelligent insight, there were no diminishing returns to a search for better brain designs up to at least the human level. There were probably accelerating returns (with a low acceleration factor). There are no visible speedbumps, so far as I know.
First, hunter-gatherers can't design toys that are a thousand times as neat as the ones chimps design--they aren't programmed with the software modern humans get through education (some may be unable to count), and educating apes has produced interesting results.
Speaking as someone who's basically clueless about neuroscience, I can think of many different factors that might contribute to intelligence differences within the human race or between humans and other apes:
- Processing speed.
- Cubic centimeters of brain hardware devoted to abstract thinking. (Gifted technical thinkers often seem to suffer from poor social intuition--perhaps a result of reallocation of brain hardware from social to technical processing.)
- Average number of connections per neuron within that brain hardware.
- Average neuron density within that brain hardware. This author seems to think that much of the human brain's remarkableness comes from the fact that it's the largest primate brain, and that primate brains maintain the same neuron density when enlarged while other types of brains don't. "If absolute brain size is the best predictor of cognitive abilities in a primate (13), and absolute brain size is proportional to number of neurons across primates (24, 26), our superior cognitive abilities might be accounted for simply by the total number of neurons in our brain, which, based on the similar scaling of neuronal densities in rodents, elephants, and cetaceans, we predict to be the largest of any animal on Earth (28)."
- Propensity to actually use your capacity for deliberate System 2 reasoning. Richard Feynman's second wife on why she divorced him: "He begins working calculus problems in his head as soon as he awakens. He did calculus while driving in his car, while sitting in the living room, and while lying in bed at night." (By the way, does anyone know of research that's been done on getting people to use System 2 more? Seems like it could be really low-hanging fruit for improving intellectual output. Sometimes I wonder if the reason intelligent people tend to like math is because they were reinforced for the behaviour of thinking abstractly as kids (via praise, good grades, etc.) while those not at the top of the class were not so reinforced.)
- Extended neuroplasticity into "childhood".
- Increased calories to think with due to the invention of cooking.
- And finally, mental algorithms ("software"). Which are probably at least somewhat important.
It seems to me like these factors (or ones like them) may multiply together to produce intelligence, i.e. the "intelligence equation", as it were, could be something like intelligence = processing_speed * cc_abstract_hardware * neuron_density * connections_per_neuron * propensity_for_abstraction * mental_algorithms. If the ancestral environment rewarded intelligence, we should expect all of these characteristics to be selected for, and this could explain the "low acceleration factor" in human intelligence increase. (Increasing your processing speed by a factor of 1.2 does more when you're already pretty smart, so all these sources of intelligence increase would feed into one another.)
In other words, it's not that clear what relevance the evolution of human intelligence has to the ease and quality of the upgrades at different "tiers" of software improvements, since evolution operates on many non-software factors, but a self-improving AI (properly boxed) can only improve its software.
In the Hanson/Yudkowsky debate, Yudkowsky declares Douglas Engelbart's plan to radically bootstrap his team's productivity through improving their computer and software tools "insufficiently recursive". I agree with this assessment. Here's my modelling of this phenomenon.
When a programmer makes an improvement to their code, their work of making the improvement requires the completion of many subtasks:
- choosing a feature to add
- reminding themselves of how the relevant part of the code works and loading that information into their memory
- identifying ways to implement the feature
- evaluating different methods of implementing the feature according to simplicity, efficiency, and correctness
- coding their chosen implementation
- testing their chosen implementation, identifying bugs
- identifying the cause of a given bug
- figuring out how to fix the given bug
Each of those subtasks will consist of further subtasks like poking through their code, staring off into space, typing, and talking to their rubber duck.
Now the programmer improves their development environment so they can poke through their code slightly faster. But if poking through their code takes up only 5% of their development time, even an extremely large improvement in code-poking abilities is not going to result in an especially large increase in their development speed... in the best case, where code-poking time is reduced to zero, the programmer will only work about 5% faster.
This is a reflection of Amdahl's Law-type thinking. The amount you can gain through speeding something up depends on how much it's slowing you down.
Relatedly, if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.
And to see orders-of-magnitude performance improvement in such a process, almost all of your AGI's components will need to be improved radically. If even a few prove troublesome, improving your AGI's thinking speed becomes difficult.
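The Amdahl's Law reasoning above can be written down directly. In this sketch, a task is split into parts that each take some fraction of the total time, and each part is sped up by its own factor; the overall speedup is one over the sum of fraction / factor. The function names and the 20-module example are illustrative assumptions, and the fractions are assumed to sum to 1.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the work is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def system_speedup(fractions, factors):
    """Generalised form: part i takes fractions[i] of the time and is sped up by factors[i]."""
    return 1.0 / sum(f / k for f, k in zip(fractions, factors))


# Code-poking is 5% of development time and becomes instantaneous.
print(amdahl_speedup(0.05, float("inf")))

# Twenty equally weighted modules: nineteen get 1000x faster, one is left alone.
print(system_speedup([0.05] * 20, [1000.0] * 19 + [1.0]))
```

In the second case the single unimproved module caps the overall gain at just under 20x, however dramatic the improvements to the other nineteen.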
Case studies in technological development speed
It has famously been noted that if the automotive industry had achieved similar improvements in performance [to the semiconductor industry] in the last 30 years, a Rolls-Royce would cost only $40 and could circle the globe eight times on one gallon of gas—with a top speed of 2.4 million miles per hour.
From this McKinsey report. So Moore's Law is an outlier where technological development is concerned. I suspect that making transistors smaller and faster doesn't require finding ways to improve dozens of heterogeneous components. And when you zoom out to view a computer system as a whole, other bottlenecks typically appear.
(It's also worth noting that research budgets in the semiconductor industry have risen greatly since its inception, but obviously not following the same curve that chip speeds have.)
This paper on "Proebsting's Law" suggests that the end result of all the compiler research done between 1970 or so and 2001 was that a typical integer-intensive program was compiled to run 3.3 times faster, and a typical floating-point-intensive program was compiled to run 8.1 times faster. When it comes to making programs run quickly, it seems that software-level compiler improvements are swamped by hardware-level chip improvements--perhaps because, like an AGI, a compiler has to deal with a huge variety of different scenarios, so improving it in the average case is tough. (This represents supertask heterogeneity, rather than subtask heterogeneity, so it's a different objection than the one mentioned above.)
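To put those compiler figures on the same footing as a hardware doubling time, one can convert a total speedup over a period into an implied doubling time. The sketch below assumes the period is 31 years (reading "1970 or so" as 1970) and uses an 18-month hardware doubling time as the comparison point; both are my assumptions for illustration.

```python
import math


def doubling_time_years(total_speedup: float, years: float) -> float:
    """Doubling time implied by a `total_speedup` achieved over `years` of steady growth."""
    return years * math.log(2) / math.log(total_speedup)


PERIOD_YEARS = 31  # assumed: 1970 to 2001

print("integer code:", doubling_time_years(3.3, PERIOD_YEARS))
print("floating-point code:", doubling_time_years(8.1, PERIOD_YEARS))
print("hardware gain at an 18-month doubling time:", 2 ** (PERIOD_YEARS / 1.5))
```

Under these assumptions the compiler gains work out to a doubling time of roughly 18 years for integer code and roughly 10 years for floating-point code, against a year and a half for the hardware.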
AI (so far)
Robin Hanson's blog post "AI Progress Estimate" was the best resource I could find on this.
Why smooth exponential growth implies soft takeoff
Let's suppose we consider all of the above, deciding that the exponential model is the best, and we agree with Robin Hanson that there are few deep, chunky, undiscovered AI insights.
Under the straight exponential model, if you recall, we had
du/dt = r * h * u
where u is the degree of software quality, h is the hardware availability, and r is a parameter representing the ease of finding additional upgrades. Our AGI's overall intelligence is given by u * h--the quality of the software times the amount of hardware.
Now we can solve for r by substituting in human intelligence for u * h, and substituting in the rate of human AI progress for du/dt. Another way of saying this is: When the AI is as smart as all the world's AI researchers working together, it will produce new AI insights at the rate that all the world's AI researchers working together produce new insights. At some point our AGI will be just as smart as the world's AI researchers, but we can hardly expect to start seeing super-fast AI progress at that point, because the world's AI researchers haven't produced super-fast AI progress.
Let's assume AGI that's on par with the world AI research community is reached in 2080 (LW's median "singularity" estimate in 2011). We'll pretend AI research has only been going on since 2000, meaning 80 "standard research years" of progress have gone into the AGI's software. So at the moment our shiny new AGI is fired up, u = 80, and it's doing research at the rate of one "human AGI community research year" per year, so du/dt = 1. That's an effective rate of return on AI software progress of 1 / 80 = 1.25%, giving a software quality doubling time of around 55 years (ln 2 / 0.0125).
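The arithmetic here is short enough to spell out. This sketch takes the numbers from the paragraph above (u = 80 research years, du/dt = 1 research year per year) and assumes continuous exponential growth, so the doubling time is ln(2) divided by the rate of return. The variable names are mine.

```python
import math

u = 80.0      # accumulated "standard research years" embodied in the software
du_dt = 1.0   # research years of progress produced per calendar year

rate_of_return = du_dt / u                       # effective growth rate s
doubling_time = math.log(2) / rate_of_return     # continuous-growth doubling time
rule_of_72 = 72 / (rate_of_return * 100)         # common back-of-envelope shortcut

print(f"rate of return: {rate_of_return:.2%}")
print(f"doubling time (continuous): {doubling_time:.1f} years")
print(f"doubling time (rule of 72): {rule_of_72:.1f} years")
```

The continuous formula gives about 55 years and the rule-of-72 shortcut lands a little higher, near 58. Either way the doubling time is measured in decades.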
You could also apply this kind of thinking to individual AI projects. For example, it's possible that at some point EURISKO was improving itself about as fast as Doug Lenat was improving it. You might be able to do a similar calculation to take a stab at EURISKO's insight level doubling time.
The importance of hardware
According to my model, you double your AGI's intelligence, and thereby the speed with which your AGI improves itself, by doubling the hardware available for your AGI. So if you had an AGI that was interesting, you could make it 4x as smart by giving it 4x the hardware. If an AGI that was 4x as smart could get you 4x as much money (through impressing investors, or playing the stock market, or monopolizing additional industries), that'd be a nice feedback loop. For maximum explosivity, put half your AGI's mind to the task of improving its software, and the other half to the task of making more money with which to buy more hardware.
But it seems pretty straightforward to prevent a non-superintelligent AI from gaining access to additional hardware with careful planning. (Note: One problem with AI boxing experiments thus far is that all of the AIs have been played by human beings. Human beings have innate understanding of human psychology and possess specialized capabilities for running emulations of one another. It seems pretty easy to prevent an AGI from acquiring such understanding. But there may exist box-breaking techniques that don't rely on understanding human psychology. Another note about boxing: FAI requires getting everything perfect, which is a conjunctive calculation. Given multiple safeguards, only one has to work for the box as a whole to work, which is a disjunctive calculation.)
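The conjunctive versus disjunctive point can be illustrated with a couple of products. This sketch assumes the components (or safeguards) succeed or fail independently of one another, which is a strong assumption and probably optimistic for real safeguards. The probabilities are made up for illustration.

```python
from math import prod


def all_must_work(success_probs):
    """Conjunctive case: the whole succeeds only if every part succeeds."""
    return prod(success_probs)


def any_one_suffices(success_probs):
    """Disjunctive case: the whole succeeds if at least one part succeeds."""
    return 1 - prod(1 - p for p in success_probs)


# Five components that each work 90% of the time, all of which must work.
print(all_must_work([0.9] * 5))

# Three safeguards that each hold only half the time, any one of which is enough.
print(any_one_suffices([0.5] * 3))
```

Adding more required parts can only lower the conjunctive figure, while adding more independent safeguards can only raise the disjunctive one.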
AGI's impact on the economy
Is it possible that the first group to create a successful AGI might begin monopolizing different sections of the economy? Robin Hanson argues that technology insights typically leak between different companies, due to conferences and employee poaching. But we can't be confident these factors would affect the research an AGI does on itself. And if an AGI is still dumb enough that a significant portion of its software upgrades are coming from human researchers, it can hardly be considered superintelligent.
Given what looks like a winner-take-all dynamic, an important factor may be the number of serious AGI competitors. If there are only two, the #1 company may not wish to trade insights with the #2 company for fear of losing its lead. If there are more than two, all but the leading company might ally against the leading company in trading insights. If their alliance is significantly stronger than the leading company, perhaps the leading company would wish to join their alliance.
But if AI is about getting lots of details right, as Hanson suggests, improvements may not even transfer between different AI architectures.
What should we do?
I've argued that soft takeoff is a strong possibility. Should that change our strategy as people concerned with x-risk?
If we are basically screwed in the event that hard takeoff is possible, it may be that preparing for a soft takeoff is a better use of resources on the margin. Shane Legg has proposed that people concerned with friendliness become investors in AGI projects so they can affect the outcome of any that seem to be succeeding.
Expert forecasts are famously unreliable even in the relatively well-understood field of political forecasting. So given the number of unknowns involved in the emergence of smarter-than-human intelligence, it's hard to say much with certainty. Picture a few Greek scholars speculating on the industrial revolution.
I don't have a strong background in these topics, so I fully expect that the above essay will reveal my ignorance, which I'd appreciate your pointing out in the comments. This essay should be taken as an attempt to hack away at the edges, not come to definitive conclusions. As always, I reserve the right to change my mind about anything ;)
My paper "General purpose intelligence: arguing the Orthogonality thesis" has been accepted for publication in the December edition of Analysis and Metaphysics. Since that's some time away, I thought I'd put the final paper up here; the arguments are similar to those here, but this is the final version, for critique and citation purposes.
General purpose intelligence: arguing the Orthogonality thesis
Future of Humanity Institute, Oxford Martin School
Philosophy Department, University of Oxford
In his paper “The Superintelligent Will”, Nick Bostrom formalised the Orthogonality thesis: the idea that the final goals and intelligence levels of artificial agents are independent of each other. This paper presents arguments for a (narrower) version of the thesis. It proceeds through three steps. First it shows that superintelligent agents with essentially arbitrary goals can exist in our universe – both as theoretical impractical agents such as AIXI and as physically possible real-world agents. Then it argues that if humans are capable of building human-level artificial intelligences, we can build them with an extremely broad spectrum of goals. Finally it shows that the same result holds for any superintelligent agent we could directly or indirectly build. This result is relevant for arguments about the potential motivations of future agents: knowing an artificial agent is of high intelligence does not allow us to presume that it will be moral, we will need to figure out its goals directly.
Keywords: AI; Artificial Intelligence; efficiency; intelligence; goals; orthogonality
1 The Orthogonality thesis
Scientists and mathematicians are the stereotypical examples of high intelligence humans. But their morality and ethics have been all over the map. On modern political scales, they can be left- (Oppenheimer) or right-wing (von Neumann) and historically they have slotted into most of the political groupings of their period (Galois, Lavoisier). Ethically, they have ranged from very humanitarian (Darwin, Einstein outside of his private life), through amoral (von Braun) to commercially belligerent (Edison) and vindictive (Newton). Few scientists have been put in a position where they could demonstrate genuinely evil behaviour, but there have been a few of those (Teichmüller, Philipp Lenard, Ted Kaczynski, Shirō Ishii).
let me suggest a moral axiom with apparently very strong intuitive support, no matter what your concept of morality: morality should exist. That is, there should exist creatures who know what is moral, and who act on that. So if your moral theory implies that in ordinary circumstances moral creatures should exterminate themselves, leaving only immoral creatures, or no creatures at all, well that seems a sufficient reductio to solidly reject your moral theory.
I agree strongly with the above quote, and I think most other readers will as well. It is good for moral beings to exist and a world with beings who value morality is almost always better than one where they do not. I would like to restate this more precisely as the following axiom: A population in which moral beings exist and have net positive utility, and in which all other creatures in existence also have net positive utility, is always better than a population where moral beings do not exist.
While the axiom that morality should exist is extremely obvious to most people, there is one strangely popular ethical system that rejects it: total utilitarianism. In this essay I will argue that total utilitarianism leads to what I will call the Genocidal Conclusion, which is that there are many situations in which it would be fantastically good for moral creatures to either exterminate themselves, or greatly limit their utility and reproduction in favor of the utility and reproduction of immoral creatures. I will argue that the main reason consequentialist theories of population ethics produce such obviously absurd conclusions is that they continue to focus on maximizing utility[1] in situations where it is possible to create new creatures. I will argue that pure utility maximization is only a valid ethical theory for "special case" scenarios where the population is static. I will propose an alternative theory for population ethics I call "ideal consequentialism" or "ideal utilitarianism" which avoids the Genocidal Conclusion and may also avoid the more famous Repugnant Conclusion.
I will begin my argument by pointing to a common problem in population ethics known as the Mere Addition Paradox (MAP) and the Repugnant Conclusion. Most Less Wrong readers will already be familiar with this problem, so I do not think I need to elaborate on it. You may also be familiar with a even stronger variation called the Benign Addition Paradox (BAP). This is essentially the same as the MAP, except that each time one adds more people one also gives a small amount of additional utility to the people who already existed. One then proceeds to redistribute utility between people as normal, eventually arriving at the huge population where everyone's lives are "barely worth living." The point of this is to argue that the Repugnant Conclusion can be arrived at from "mere addition" of new people that not only doesn't harm the preexisting-people, but also one that benefits them.
The next step of my argument involves three slightly tweaked versions of the Benign Addition Paradox. I have not changed the basic logic of the problem, I have just added one small clarifying detail. In the original MAP and BAP it was not specified what sort of values the added individuals in population A+ held. Presumably one was meant to assume that they were ordinary human beings. In the versions of the BAP I am about to present, however, I will specify that the extra individuals added in A+ are not moral creatures, that if they have values at all they are values indifferent to, or opposed to, morality and the other values that the human race holds dear.
1. The Benign Addition Paradox with Paperclip Maximizers.
Let us imagine, as usual, a population, A, which has a large group of human beings living lives of very high utility. Let us then add a new population consisting of paperclip maximizers, each of whom is living a life barely worth living. Presumably, for a paperclip maximizer, this would be a life where the paperclip maximizer's existence results in at least one more paperclip in the world than there would have been otherwise.
Now, one might object that if one creates a paperclip maximizer, and then allows it to create one paperclip, the utility of the other paperclip maximizers will increase above the "barely worth living" level, which would obviously make this thought experiment non-analogous with the original MAP and BAP. To prevent this we will assume that each paperclip maximizer that is created has slightly different values regarding the ideal size, color, and composition of the paperclips it is trying to produce. So the Purple 2 centimeter Plastic Paperclip Maximizer gains no additional utility when the Silver Iron 1 centimeter Paperclip Maximizer makes a paperclip.
So again, let us add these paperclip maximizers to population A, and in the process give one extra utilon of utility to each preexisting person in A. This is a good thing, right? After all, everyone in A benefited, and the paperclippers get to exist and make paperclips. So clearly A+, the new population, is better than A.
Now let's take the next step, the transition from population A+ to population B. Take some of the utility from the human beings and convert it into paperclips. This is a good thing, right?
So let us repeat these steps, adding paperclip maximizers and utility, and then redistributing utility. Eventually we reach population Z, where there is a vast number of paperclip maximizers, a vast number of many different kinds of paperclips, and a small number of human beings living lives barely worth living.
Obviously Z is better than A, right? We should not fear the creation of a paperclip maximizing AI, but welcome it! Forget about things like high challenge, love, interpersonal entanglement, complex fun, and so on! Those things just don't produce the kind of utility that paperclip maximization has the potential to do!
Or maybe there is something seriously wrong with the moral assumptions behind the Mere Addition and Benign Addition Paradoxes.
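To make the bookkeeping behind the steps above concrete, here is a minimal sketch of the repeated A, A+, B cycle. Everything in it is an illustrative assumption rather than part of the original paradox: the names `Population`, `benign_addition`, `redistribute`, and `run`, the value 1.0 for a life "barely worth living," the one-utilon bonus, the one-for-one transfer of utility, and the particular population sizes. It only shows that, under a simple sum of utilities, the total rises every round while the utility of each human sinks toward the bottom.

```python
from dataclasses import dataclass

BARELY_WORTH_LIVING = 1.0  # assumed utility of a life barely worth living


@dataclass
class Population:
    humans: int                 # number of moral beings; held fixed in this sketch
    human_utility: float        # utility per human
    others: int = 0             # added creatures (paperclippers, mice, ...)
    other_utility: float = 0.0  # average utility per added creature

    def total(self) -> float:
        """Total utility under the simple-sum assumption."""
        return self.humans * self.human_utility + self.others * self.other_utility


def benign_addition(p: Population, newcomers: int, bonus: float = 1.0) -> Population:
    """A -> A+: add newcomers at the barely-worth-living level and give
    every existing human a small bonus."""
    others = p.others + newcomers
    other_utility = (p.others * p.other_utility
                     + newcomers * BARELY_WORTH_LIVING) / others
    return Population(p.humans, p.human_utility + bonus, others, other_utility)


def redistribute(p: Population, fraction: float = 0.5) -> Population:
    """A+ -> B: move a fraction of each human's surplus above the
    barely-worth-living level to the added creatures, one for one."""
    if p.others == 0:
        return p  # nobody to redistribute to
    surplus = max(p.human_utility - BARELY_WORTH_LIVING, 0.0)
    moved = fraction * surplus * p.humans
    return Population(p.humans, p.human_utility - fraction * surplus,
                      p.others, p.other_utility + moved / p.others)


def run(rounds: int, start: Population, newcomers: int = 10_000) -> list:
    """Repeat the addition and redistribution steps, recording each stage."""
    history = [start]
    p = start
    for _ in range(rounds):
        p = redistribute(benign_addition(p, newcomers))
        history.append(p)
    return history


if __name__ == "__main__":
    start = Population(humans=1_000, human_utility=100.0)
    for i, p in enumerate(run(10, start)):
        print(i, round(p.human_utility, 2), p.others, round(p.total()))
```

Each printed row gives the round, the utility per human, the number of added creatures, and the total. The total never goes down, which is exactly why a pure utility maximizer endorses every step.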
But you might argue that I am using an unrealistic example. Creatures like Paperclip Maximizers may be so far removed from normal human experience that we have trouble thinking about them properly. So let's replay the Benign Addition Paradox again, but with creatures we might actually expect to meet in real life, and we know we actually value.
2. The Benign Addition Paradox with Non-Sapient Animals
You know the drill by now. Take population A, add a new population to it, while very slightly increasing the utility of the original population. This time let's have it be some kind of animal that is capable of feeling pleasure and pain, but is not capable of modeling possible alternative futures and choosing between them (in other words, it is not capable of having "values" or being "moral"). A lizard or a mouse, for example. Each one feels slightly more pleasure than pain in its lifetime, so it can be said to have a life barely worth living. Convert A+ to B. Take the utilons that the human beings are using to experience things like curiosity, beatitude, wisdom, beauty, harmony, morality, and so on, and convert them into pleasure for the animals.
We end up with population Z, with a vast number of mice or lizards with lives just barely worth living, and a small number of human beings with lives barely worth living. Terrific! Why do we bother creating humans at all? Let's just create tons of mice and inject them full of heroin! It's a much more efficient way to generate utility!
3. The Benign Addition Paradox with Sociopaths
What new population will we add to A this time? How about some other human beings, who all have anti-social personality disorder? True, they lack the key, crucial value of sympathy that defines so much of human behavior. But they don't seem to miss it. And their lives are barely worth living, so obviously A+ has greater utility than A. If given a chance the sociopaths will reduce the utility of other people to negative levels, but let's assume that that is somehow prevented in this case.
Eventually we get to Z, with a vast population of sociopaths and a small population of normal human beings, all living lives just barely worth living. That has more utility, right? True, the sociopaths place no value on things like friendship, love, compassion, empathy, and so on. And true, the sociopaths are immoral beings who do not care in the slightest about right and wrong. But what does that matter? Utility is being maximized, and surely that is what population ethics is all about!
Let's suppose an asteroid is approaching each of the three population Zs discussed before. It can only be deflected by so much. Your choice is, save the original population of humans from A, or save the vast new population. The choice is obvious. In 1, 2, and 3, each individual has the same level of utility, so obviously we should choose whichever option saves the greater number of individuals.
Bam! The asteroid strikes. The end result in all three scenarios is a world in which all the moral creatures are destroyed. It is a world without the many complex values that human beings possess. Each world, for the most part, lacks things like complex challenge, imagination, friendship, empathy, love, and the other complex values that human beings prize. But so what? The purpose of population ethics is to maximize utility, not silly, frivolous things like morality, or the other complex values of the human race. That means that any form of utility that is easier to produce than those values is obviously superior. It's easier to make pleasure and paperclips than it is to make eudaemonia, so that's the form of utility that ought to be maximized, right? And as for making sure moral beings exist, well that's just ridiculous. The valuable processing power they're using to care about morality could be being used to make more paperclips or more mice injected with heroin! Obviously it would be better if they died off, right?
I'm going to go out on a limb and say "Wrong."
Is this realistic?
Now, to be fair, in the Overcoming Bias page I quoted, Robin Hanson also says:
I’m not saying I can’t imagine any possible circumstances where moral creatures shouldn’t die off, but I am saying that those are not ordinary circumstances.
Maybe the scenarios I am proposing are just too extraordinary. But I don't think this is the case. I imagine that the circumstances Robin had in mind were probably something like "either all moral creatures die off, or all moral creatures are tortured 24/7 for all eternity."
Any purely utility-maximizing theory of population ethics that counts both the complex values of human beings, and the pleasure of animals, as "utility" should inevitably draw the conclusion that human beings ought to limit their reproduction to the bare minimum necessary to maintain the infrastructure to sustain a vastly huge population of non-human animals (preferably animals dosed with some sort of pleasure-causing drug). And if some way is found to maintain that infrastructure automatically, without the need for human beings, then the logical conclusion is that human beings are a waste of resources (as are chimps, gorillas, dolphins, and any other animal that is even remotely capable of having values or morality). Furthermore, even if the human race cannot practically be replaced with automated infrastructure, this should be an end result that the adherents of this theory should be yearning for.[2] There should be much wailing and gnashing of teeth among moral philosophers that exterminating the human race is impractical, and much hope that someday in the future it will not be.
I call this the "Genocidal Conclusion" or "GC." On the macro level the GC manifests as the idea that the human race ought to be exterminated and replaced with creatures whose preferences are easier to satisfy. On the micro level it manifests as the idea that it is perfectly acceptable to kill someone who is destined to live a perfectly good and worthwhile life and replace them with another person who would have a slightly higher level of utility.
Population Ethics isn't About Maximizing Utility
I am going to make a rather radical proposal. I am going to argue that the consequentialist's favorite maxim, "maximize utility," only applies to scenarios where creating new people or creatures is off the table. I think we need an entirely different ethical framework to describe what ought to be done when it is possible to create new people. I am not by any means saying that "which option would result in more utility" is never a morally relevant consideration when deciding to create a new person, but I definitely think it is not the only one.[3]
So what do I propose as a replacement to utility maximization? I would argue in favor of a system that promotes a wide range of ideals. Doing some research, I discovered that G. E. Moore had in fact proposed a form of "ideal utilitarianism" in the early 20th century.[4] However, I think that "ideal consequentialism" might be a better term for this system, since it isn't just about aggregating utility functions.
What are some of the ideals that an ideal consequentialist theory of population ethics might seek to promote? I've already hinted at what I think they are: Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom... mutual affection, love, friendship, cooperation; all those other important human universals, plus all the stuff in the Fun Theory Sequence. When considering what sort of creatures to create we ought to create creatures that value those things. Not necessarily all of them, or in the same proportions, for diversity is an important ideal as well, but they should value a great many of those ideals.
Now, lest you worry that this theory has any totalitarian implications, let me make it clear that I am not saying we should force these values on creatures that do not share them. Forcing a paperclip maximizer to pretend to make friends and love people does not do anything to promote the ideals of Friendship and Love. Forcing a chimpanzee to listen while you read the Sequences to it does not promote the values of Truth and Knowledge. Those ideals require both a subjective and objective component. The only way to promote those ideals is to create a creature that includes them as part of its utility function and then help it maximize its utility.
I am also certainly not saying that there is never any value in creating a creature that does not possess these values. There are obviously many circumstances where it is good to create nonhuman animals. There may even be some circumstances where a paperclip maximizer could be of value. My argument is simply that it is most important to make sure that creatures who value these various ideals exist.
I am also not suggesting that it is morally acceptable to casually inflict horrible harms upon a creature with non-human values if we screw up and create one by accident. If promoting ideals and maximizing utility are separate values then it may be that once we have created such a creature we have a duty to make sure it lives a good life, even if it was a bad thing to create it in the first place. You can't unbirth a child.[5]
It also seems to me that in addition to having ideals about what sort of creatures should exist, we also have ideals about how utility ought to be concentrated. If this is the case then ideal consequentialism may be able to block some forms of the Repugnant Conclusion, even in situations where the only creatures whose creation is being considered are human beings. If it is acceptable to create humans instead of paperclippers, even if the paperclippers would have higher utility, it may also be acceptable to create ten humans with a utility of ten each instead of a hundred humans with a utility of 1.01 each.
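The arithmetic behind that last comparison can be spelled out in a few lines. This is only a sketch: the function name `total_utility` is mine, and it assumes that utility simply sums across individuals, which is the total-utilitarian assumption under dispute.

```python
def total_utility(count: int, utility_each: float) -> float:
    """Total utility under the simple-sum assumption of total utilitarianism."""
    return count * utility_each


few_flourishing = total_utility(10, 10.0)   # ten humans with a utility of ten each
many_marginal = total_utility(100, 1.01)    # a hundred humans with a utility of 1.01 each

# A pure sum ranks the larger, barely-better-off population higher.
print(few_flourishing, many_marginal, many_marginal > few_flourishing)
```

The margin is tiny, but a theory that only compares totals must still prefer the hundred. An ideal about how utility ought to be concentrated is what would let us prefer the ten.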
Why Did We Become Convinced that Maximizing Utility was the Sole Good?
Population ethics was, until comparatively recently, a fallow field in ethics. And in situations where there is no option to increase the population, maximizing utility is the only consideration that's really relevant. If you've created creatures that value the right ideals, then all that is left to be done is to maximize their utility. If you've created creatures that do not value the right ideals, there is no value to be had in attempting to force them to embrace those ideals. As I've said before, you will not promote the values of Love and Friendship by creating a paperclip maximizer and forcing it to pretend to love people and make friends.
So in situations where the population is constant, "maximize utility" is a decent approximation of the meaning of right. It's only when the population can be added to that morality becomes much more complicated.
Another thing to blame is human-centric reasoning. When people defend the Repugnant Conclusion they tend to point out that a life barely worth living is not as bad as it would seem at first glance. They emphasize that it need not be a boring life, it may be a life full of ups and downs where the ups just barely outweigh the downs. A life worth living, they say, is a life one would choose to live. Derek Parfit developed this idea to some extent by arguing that there are certain values that are "discontinuous" and that one needs to experience many of them in order to truly have a life worth living.
The Orthogonality Thesis throws all these arguments out the window. It is possible to create an intelligence to execute any utility function, no matter what it is. If human beings have all sorts of complex needs that must be fulfilled in order for them to lead worthwhile lives, then you could create more worthwhile lives by killing the human race and replacing them with something less finicky. Maybe happy cows. Maybe paperclip maximizers. Or how about some creature whose only desire is to live for one second and then die? If we created such a creature and then killed it we would reap huge amounts of utility, for we would have created a creature that got everything it wanted out of life!
How Intuitive is the Mere Addition Principle, Really?
I think most people would agree that morality should exist, and that therefore any system of population ethics should not lead to the Genocidal Conclusion. But which step in the Benign Addition Paradox should we reject? We could reject the step where utility is redistributed. But that seems wrong: most people seem to consider it bad for animals and sociopaths to suffer, and think that it is acceptable to inflict at least some amount of disutility on human beings to prevent such suffering.
It seems more logical to reject the Mere Addition Principle. In other words, maybe we ought to reject the idea that the mere addition of more lives-worth-living cannot make the world worse. And in turn, we should probably also reject the Benign Addition Principle. Adding more lives-worth-living may be capable of making the world worse, even if doing so also slightly benefits existing people. Fortunately this isn't a very hard principle to reject. While many moral philosophers treat it as obviously correct, nearly everyone else rejects this principle in day-to-day life.
Now, I'm obviously not saying that people's behavior in their day-to-day lives is always good, it may be that they are morally mistaken. But I think the fact that so many people seem to implicitly reject it provides some sort of evidence against it.
Take people's decision to have children. Many people choose to have fewer children than they otherwise would because they do not believe they will be able to adequately care for them, at least not without inflicting large disutilities on themselves. If most people accepted the Mere Addition Principle there would be a simple solution for this: have more children and then neglect them! True, the children's lives would be terrible while they were growing up, but once they've grown up and are on their own there's a good chance they may be able to lead worthwhile lives. Not only that, it may be possible to trick the welfare system into giving you money for the children you neglect, which would satisfy the Benign Addition Principle.
Yet most people choose not to have children and neglect them. And furthermore they seem to think that they have a moral duty not to do so, that a world where they choose not to have neglected children is better than one where they do have them. What is wrong with them?
Another example is a common political view many people have. Many people believe that impoverished people should have fewer children because of the burden doing so would place on the welfare system. They also believe that it would be bad to get rid of the welfare system altogether. If the Benign Addition Principle were as obvious as it seems, they would instead advocate for the abolition of the welfare system, and encourage impoverished people to have more children. Assuming most impoverished people live lives worth living, this is exactly analogous to the BAP: it would create more people, while benefiting existing ones (the people who pay lower taxes because of the abolition of the welfare system).
Yet again, most people choose to reject this line of reasoning. The BAP does not seem to be an obvious and intuitive principle at all.
The Genocidal Conclusion is Really Repugnant
There is nearly nothing more repugnant than the Genocidal Conclusion. Pretty much the only way a line of moral reasoning could go more wrong would be concluding that we have a moral duty to cause suffering, as an end in itself. This means that it's fairly easy to counter any argument in favor of total utilitarianism that argues the alternative I am promoting has odd conclusions that do not fit some of our moral intuitions, while total utilitarianism does not. Is that conclusion more insane than the Genocidal Conclusion? If it isn't, total utilitarianism should still be rejected.
Ideal Consequentialism Needs a Lot of Work
I do think that Ideal Consequentialism needs some serious ironing out. I haven't really developed it into a logical and rigorous system, at this point it's barely even a rough framework. There are many questions that stump me. In particular I am not quite sure what population principle I should develop. It's hard to develop one that rejects the MAP without leading to weird conclusions, like that it's bad to create someone of high utility if a population of even higher utility existed long ago. It's a difficult problem to work on, and it would be interesting to see if anyone else had any ideas.
But just because I don't have an alternative fully worked out doesn't mean I can't reject Total Utilitarianism. It leads to the conclusion that a world with no love, curiosity, complex challenge, friendship, morality, or any other value the human race holds dear is an ideal, desirable world, if there is a sufficient amount of some other creature with a simpler utility function. Morality should exist, and because of that, total utilitarianism must be rejected as a moral system.
[1] I have been asked to note that when I use the phrase "utility" I am usually referring to a concept that is called "E-utility," rather than the Von Neumann-Morgenstern utility that is sometimes discussed in decision theory. The difference is that in VNM one's moral views are included in one's utility function, whereas in E-utility they are not. So if one chooses to harm oneself to help others because one believes that is morally right, one has higher VNM utility, but lower E-utility.
[2] There is a certain argument against the Repugnant Conclusion that goes that, as the steps of the Mere Addition Paradox are followed the world will lose its last symphony, its last great book, and so on. I have always considered this to be an invalid argument because the world of the RC doesn't necessarily have to be one where these things don't exist, it could be one where they exist, but are enjoyed very rarely. The Genocidal Conclusion brings this argument back in force. Creating creatures that can appreciate symphonies and great books is very inefficient compared to creating bunny rabbits pumped full of heroin.
[3] Total Utilitarianism was originally introduced to population ethics as a possible solution to the Non-Identity Problem. I certainly agree that such a problem needs a solution, even if Total Utilitarianism doesn't work out as that solution.
[4] I haven't read a lot of Moore, most of my ideas were extrapolated from other things I read on Less Wrong. I just mentioned him because in my research I noticed his concept of "ideal utilitarianism" resembled my ideas. While I do think he was on the right track he does commit the Mind Projection Fallacy a lot. For instance, he seems to think that one could promote beauty by creating beautiful objects, even if there were no creatures with standards of beauty around to appreciate them. This is why I am careful to emphasize that to promote ideals like love and beauty one must create creatures capable of feeling love and experiencing beauty.
[5] My tentative answer to the question Eliezer poses in "You Can't Unbirth a Child" is that human beings may have a duty to allow the cheesecake maximizers to build some amount of giant cheesecakes, but they would also have a moral duty to limit such creatures' reproduction in order to spare resources to create more creatures with humane values.
EDITED: To make a point about ideal consequentialism clearer, based on AlexMennen's criticisms.
Alexei Turchin. Risks of downloading alien AI via SETI search
Abstract: This article examines risks associated with the program of passive search for alien signals (SETI, the Search for Extra-Terrestrial Intelligence). We propose a scenario of possible vulnerability and discuss why the proportion of dangerous signals to harmless ones may be dangerously high. This article does not propose to ban SETI programs, and does not insist on the inevitability of a SETI-triggered disaster. Moreover, it considers how SETI could be a salvation for mankind.
The idea that passive SETI can be dangerous is not new. Fred Hoyle suggested, in the story "A for Andromeda", a scheme of alien attack through SETI signals. According to the plot, astronomers receive an alien signal which contains a description of a computer and a computer program for it. This machine creates a description of a genetic code, which leads to the creation of an intelligent creature: a girl dubbed Andromeda who, working together with the computer, creates advanced technology for the military. The initial suspicion of alien intent is overcome by greed for the technology the aliens can provide. However, the main characters realize that the computer acts in a manner hostile to human civilization and destroy the computer, and the girl dies.
This scenario is fiction: first, because most scientists do not believe in the possibility of a strong AI, and, second, because we do not have the technology to synthesize new living organisms solely from their genetic code. Or at least, we have not had it until recently. Current technology for sequencing and DNA synthesis, as well as progress in developing DNA codes with a modified alphabet, indicate that in 10 years the task of re-creating a living being from computer code sent from space might be feasible.
Hans Moravec, in the book "Mind Children" (1988), describes a similar type of vulnerability: downloading from space, via SETI, a computer program that has artificial intelligence, promises new opportunities to its owner and, after fooling the human host, replicates itself in millions of copies and destroys the host, finally using the resources of the secured planet to send its 'child' copies to the many planets that constitute its future prey. Such a strategy would be like that of a virus or a digger wasp: horrible, but plausible. R. Carrigan's ideas run in the same direction; he wrote an article, "SETI-hacker", and expressed fears that unfiltered signals from space are loaded onto millions of unsecured computers of the SETI@home program. But he met tough criticism from programmers, who pointed out that, first, data and programs are kept in separate regions in computers, and, second, the computer codes in which programs are written are so unique that it is impossible to guess their structure well enough to hack them blindly (without prior knowledge).
After a while Carrigan issued a second article, "Should potential SETI signals be decontaminated?" (http://home.fnal.gov/~carrigan/SETI/SETI%20Decon%20Australia%20poster%20paper.pdf), which I have translated into Russian. In it, he pointed to the ease of transferring gigabytes of data over interstellar distances, and also indicated that an interstellar signal may contain some kind of bait that will encourage people to assemble a dangerous device according to the transmitted designs. Here Carrigan does not give up his belief in the possibility that an alien virus could directly infect Earth's computers without human 'translation' assistance. (We may note with passing alarm that the prevalence of humans obsessed with death, as Fred Saberhagen pointed out in his idea of 'goodlife', means that we cannot entirely discount the possibility of demented 'volunteers': human traitors eager to assist such a fatal invasion.) As a possible confirmation of this idea, Carrigan has shown that it is easy to reverse-engineer the language of a computer program: based on the text of the program it is possible to guess what it does, and then recover the meaning of its operators.
In 2006, E. Yudkowsky wrote an article, "AI as a positive and a negative factor of global risk", in which he demonstrated that a rapidly self-improving general artificial intelligence is very likely possible, that its high intelligence would make it extremely dangerous if it were programmed incorrectly, and, finally, that the likelihood of such an AI and the risks associated with it are significantly underestimated. In addition, Yudkowsky introduced the notion of "Seed AI" (embryo AI): a minimal program capable of runaway self-improvement with an unchanged primary goal. The size of a Seed AI could be on the order of hundreds of kilobytes. (For example, a typical representative of Seed AI is a human baby: the part of the genome responsible for the brain represents about 3% of a person's genes, which total about 500 megabytes, so it comes to about 15 megabytes, and even less once the share of junk DNA is taken into account.)
To begin, let us assume that somewhere in the Universe there is an extraterrestrial civilization which intends to send a message that will enable it to obtain power over Earth, and consider this scenario. In the next chapter we will consider how realistic it is that another civilization would want to send such a message.
First, we note that in order to prove a vulnerability, it is enough to find just one hole in security. In order to prove safety, however, you must remove every possible hole. The complexity of these two tasks differs by many orders of magnitude, as is well known to experts on computer security. This distinction is why almost all computer systems have been broken (from Enigma to the iPod). I will now try to demonstrate one possible, and even, in my view, likely, vulnerability of the SETI program. However, I want to caution the reader against thinking that finding errors in my discussion automatically proves the safety of the SETI program. Second, I would also like to draw the reader's attention to the fact that I am a man with an IQ of 120 who spent all of a month thinking about the vulnerability problem. It does not take an alien super-civilization with an IQ of 1000000 and millions of years of contemplation time to significantly improve this algorithm; we have no real idea what an IQ of 300, or even a mere IQ of 100 with much larger mental 'RAM' (the ability to load a major architectural task into mind and keep it there for weeks while processing), could accomplish in finding a much simpler and more effective way. Finally, I propose one possible algorithm, and then we will briefly discuss the other options.
In our discussions we will draw on the Copernican principle, that is, the belief that we are ordinary observers in normal situations. Therefore, the Earth’s civilization is an ordinary civilization developing normally. (Readers of tabloid newspapers may object!)
Algorithm of SETI attack
1. The sender creates a kind of signal beacon in space, which makes it clear that its message is artificial. For example, this may be a star with a Dyson sphere which has holes or mirrors that are alternately opened and closed. The entire star will therefore blink with a period of a few minutes; faster is not possible because of the varying distances between the different openings. (Even if the openings are synchronized with an atomic clock according to a rigid schedule, the speed of light limits the speed and reaction time with which large-scale systems can be coordinated.) Nevertheless, this beacon can be seen at a distance of millions of light years. Other types of lighthouse are possible; the important fact is that the beacon signal can be seen at long distances.
2. Nearer to Earth is a radio beacon with a much weaker signal that carries far more information. The lighthouse draws attention to this radio source. The source produces a stream of binary information (i.e. a sequence of 0s and 1s). As for the objection that the information would contain noise, I note that the most obvious means of reducing noise (obvious, that is, to the recipient's side) is simple repetition of the signal in a loop.
3. The simplest way to convey meaningful information using a binary signal is to send images. First, because eye structures appeared independently 7 times in the Earth's biological diversity, representing a three-dimensional world with the help of 2D images is probably universal, and is almost certainly understandable to all creatures who can build a radio receiver.
4. Secondly, 2D images are not too difficult to encode in binary signals. To do so, let us use the same system that was used in the first TV cameras, namely line-by-line (progressive) scanning at a fixed frame rate. At the end of each line of the image a bright marker is placed, repeated after each line, that is, after an equal number of bits. Finally, at the end of each frame another signal is placed indicating the end of the frame, repeated after each frame. (The frames may or may not form a continuous film.)
For example, the end-of-line signal may appear every 25 units, and the end-of-frame signal every 625 units.
5. Clearly, the sender civilization should be extremely interested in our understanding its signals. For their part, people will have an extreme desire to decrypt the signal. Therefore, there is no doubt that the pictures will be recognized.
6. Images and movies can convey a lot of information; they can even be used to teach the senders' language and to show their world. Obviously, one can argue about how understandable such films would be. Here, we will focus on the fact that if one civilization sends radio signals and another receives them, they have some shared knowledge. Namely, they both know radio technology, that is, they know transistors, capacitors, and resistors. These radio parts are typical enough to be easily recognized in photographs (for example, parts shown in cutaway view and in sequential assembly stages, or in an electrical schematic whose connections indicate the nature of the components involved).
7. By sending photos depicting radio parts on the right side and their symbols on the left, it is easy to convey a set of signs for drawing electrical circuits. (The logical elements of computers could be conveyed in roughly the same way.)
8. Then, using these symbols, the sender civilization transmits blueprints of its simplest computer. The simplest computer from a hardware point of view is the Post machine. It has only 6 commands and a tape for data. Its full electrical circuit contains only a few tens of transistors or logic elements, so it is not difficult to send the blueprints of a Post machine.
9. It is important to note that all computers are, at the level of algorithms, Turing-compatible. That means that extraterrestrial computers are, at the basic level, compatible with any Earth computer. Turing-compatibility is as mathematically universal as the Pythagorean theorem. Even Babbage's mechanical machine, designed in the early 19th century, was Turing-compatible.
10. Then the sender civilization begins to transmit programs for that machine. Although the computer is very simple, it can run a program of any complexity, though it will take much longer than the same program would on a more complex computer. It is unlikely that people will need to build this computer physically. They can easily emulate it on any modern computer, so that it performs trillions of operations per second and even the most complex program runs quite quickly. (An interim step is possible: the primitive computer yields a description of a more complex and faster computer, and the programs then run on that.)
11. So why would people build this computer and run its programs? Presumably, in addition to the computer schematics and programs, the message must contain some kind of "bait" that leads people to build the alien computer, run programs on it, and provide it with data about the external world, that is, the Earth outside the computer. There are two general kinds of bait, temptations and dangers:
a). For example, people might receive the following offer; let's call it "the humanitarian aid con (deceit)". The senders of an "honest signal" SETI message warn that the transmitted program is an artificial intelligence, but lie about its goals. That is, they claim that it is a "gift" that will help us solve all our medical and energy problems. But it is a Trojan horse of the most malevolent intent. It is too useful not to use. Eventually it becomes indispensable. And then, exactly when society has become dependent upon it, the foundation of society, and society itself, is overturned…
b). "The temptation of absolute power con": in this scenario, the message offers its recipients a specific deal, promising power over other recipients. This begins a 'race to the bottom' that leads to runaway betrayals and power-seeking counter-moves, ending with a world dictatorship or, worse, a destroyed world dictatorship on an empty world…
c). "Unknown threat con": in this scenario the senders report that a certain threat hangs over humanity, for example from another, enemy civilization, and that to protect ourselves we should join the putative "Galactic Alliance" and build a certain installation. Or, for example, they suggest performing a certain class of physical experiments in an accelerator and passing the message on to others in the Galaxy (like a chain letter). And would we please send the message on before we switch on the accelerator…
d). "Tireless researcher con": here the senders argue that sending messages is the cheapest way to explore the world. They ask us to create an AI that will study our world and send the results back. It does rather more than that, of course…
12. However, the main threat from alien messages with executable code is not the bait itself, but the fact that the message can become known to a large number of independent groups of people. First, there will always be someone who is more susceptible to the bait. Second, suppose the world learns that an alien message is emanating from the Andromeda galaxy, and that the Americans have already received it and may be trying to decipher it. Of course, all other countries will then rush to build radio telescopes and point them at the Andromeda galaxy, for fear of missing a "strategic advantage". They will find the message and see that it offers omnipotence to those willing to collaborate. They will not know whether the Americans have taken up the offer or not, even if the Americans swear that they have not run the malicious code and beg others not to do so. Moreover, such oaths and appeals will be perceived as a sign that the Americans have already gained an incredible extraterrestrial advantage and are trying to deprive "progressive mankind" of it. While most will understand the danger of launching alien code, someone will be willing to risk it. Moreover, it will be a "winner take all" game, as in the case of developing AI, as Yudkowsky shows in detail. So it is not the bait that is dangerous, but the plurality of recipients. If the alien message is posted to the Internet (and its size, sufficient to run a Seed AI, could be no more than gigabytes, including the description of the computer, the program, and the bait), we have a classic example of "knowledge" of mass destruction, as Bill Joy put it when speaking of published genomes of dangerous biological viruses.
If alien-sent code is available to tens of thousands of people, then someone will run it even without any bait, out of simple curiosity. We cannot count on existing SETI protocols, because the discussion of METI (the sending of messages to extraterrestrials) has shown that the SETI community is not monolithic on important questions. Even the simple fact that something was found could leak and encourage outsiders to search. And the coordinates of the point in the sky would be enough.
13. Since people do not have AI, we almost certainly greatly underestimate its power and overestimate our ability to control it. The common idea is that "it is enough to pull the power cord to stop an AI", or to place it in a black box to avoid any associated risks. Yudkowsky shows that an AI can deceive us as an adult deceives a child. If an AI gets onto the Internet, it can quickly subdue it as a whole and learn everything it needs about earthly life. "Quickly" here means hours or days at most. Then the AI can create advanced nanotechnology and buy components and raw materials (on the Internet it can easily make money and order goods with delivery, as well as recruit people to receive them, following the instructions of their well-paying but 'unseen employer' without knowing who, or rather what, they are serving). Yudkowsky describes one possible scenario for this stage in detail and estimates that an AI needs only weeks to crack any security and acquire its own physical infrastructure.
Consider, for clarity, one possible scenario in which an Alien AI (AAI) can seize power on the Earth. Assume that it promises immortality to anyone who builds a computer from the blueprints sent to him and starts the AI program on that computer. When the program starts, it says: "OK, buddy, I can make you immortal, but for this I need to know on what basis your body works. Please give me access to your database." And you connect the device to the Internet, where it gradually develops and learns what it needs, including the peculiarities of human biology. (Here it could escape to the Internet, but we omit the details since this is not the main point.) Then the AAI says: "I know how you can become biologically immortal. It is necessary to replace every cell of your body with a nanobiorobot. Fortunately, there is almost nothing special in the biology of your body that would block bio-immortality. Many other organisms in the universe also use DNA as a carrier of information. So I know how to program DNA so as to create genetically modified bacteria that could perform the functions of any cell. I need access to a biological laboratory where I can perform a few experiments, and it will cost you a million of your dollars." You rent a laboratory and hire several employees, and finally the AAI issues a table of custom-designed DNA sequences, which are ordered from a laboratory that synthesizes DNA by automated machine. http://en.wikipedia.org/wiki/DNA_sequencing Then they implant the DNA into yeast, and after several unsuccessful experiments they create a radio-guided bacterium (a shorthand: it is not truly a bacterium, since it apparently has organelles and a nucleus; 'radio' is likewise shorthand for 'remote controlled', and a far more likely communication mechanism would be modulated sonic impulses), which can synthesize new DNA-based code on commands from outside.
Now the AAI has achieved independence from human 'filtering' of its true commands, because the bacterium in effect has its own remote-controlled sequencers (self-reproducing to boot!). The AAI can now transform and synthesize substances ostensibly introduced into test tubes for a benign test, and use them for a malevolent purpose. Obviously, at this moment the Alien AI is ready to launch an attack against humanity. It can transfer itself to a nano-computer, so that the source computer can be disconnected. After that the AAI sprays into the air some of its subordinate bacteria, which also carry the AAI; they gradually spread across the planet, imperceptibly penetrate all living beings, and then, on a timer, start to divide indefinitely, like gray goo, and destroy all living beings. Once these are destroyed, the Alien AI can begin to build its own infrastructure for transmitting radio messages into space. Obviously, this fictionalized scenario is not the only one: for example, the AAI may seize control of nuclear weapons and compel people to build radio transmitters under the threat of attack. Because of its possibly vast experience and intelligence, the AAI can choose the most appropriate way in any given circumstances. (Added by Friedlander: Imagine a CIA- or FSB-like agency with equipment centuries into the future, introduced to a primitive culture without any concept of remote scanning, codes, or the entire fieldcraft of spying. Humanity might never know what hit it, because the AAI might be many centuries, if not millennia, better armed than we are, in the sense of usable military inventions and techniques.)
14. After that, this SETI-AI does not need people in order to realize any of its goals. This does not mean that it would seek to destroy them, but it may want to act pre-emptively in case people fight it, and they will.
15. Then this SETI-AI can do a lot of things, but the most important thing it must do is continue transmitting its communications-generated embryos to the rest of the Universe. To do so, it will probably turn the matter of the solar system into a transmitter like the one that sent it. In doing so, the Earth and its people would be a disposable source of materials and parts, possibly on a molecular scale.
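Step 4 of the algorithm describes a raster encoding with a marker at the end of every line and another at the end of every frame. The following is a minimal sketch of how a recipient might recover such images from a raw bit stream, not a claim about how a real signal would be laid out. The figures 25 and 625 come from step 4; everything else (the 24x24 image, a single 1 as the end-of-line marker, an all-ones line as the end-of-frame marker, and all function names) is an assumption made for illustration.

```python
from typing import List

Bits = List[int]
Image = List[List[int]]

LINE_LEN = 25    # units per scan line, as in step 4
FRAME_LEN = 625  # units per frame, as in step 4 (25 lines of 25 units)


def encode_frame(image: Image) -> Bits:
    """Serialize a 24x24 image: each row gets a trailing 1 as an assumed
    end-of-line marker, and an all-ones line marks the end of the frame."""
    bits: Bits = []
    for row in image:
        bits.extend(row + [1])
    bits.extend([1] * LINE_LEN)
    return bits


def guess_line_length(bits: Bits, max_len: int = 100) -> int:
    """Return the smallest period p for which the last unit of every
    p-unit chunk is 1, i.e. the spacing of the end-of-line marker."""
    for p in range(2, max_len + 1):
        if all(bits[i] == 1 for i in range(p - 1, len(bits), p)):
            return p
    raise ValueError("no periodic marker found")


def decode(bits: Bits, line_len: int) -> List[Image]:
    """Cut the stream into lines and split frames at all-ones lines."""
    frames: List[Image] = []
    current: Image = []
    for start in range(0, len(bits), line_len):
        line = bits[start:start + line_len]
        if all(line):                  # all-ones line: end of frame
            frames.append(current)
            current = []
        else:
            current.append(line[:-1])  # drop the end-of-line marker
    return frames


# A diagonal stroke as a toy picture, sent twice (the signal repeats in a loop).
image = [[1 if r == c else 0 for c in range(24)] for r in range(24)]
stream = encode_frame(image) * 2
line_len = guess_line_length(stream)
frames = decode(stream, line_len)
```

The point of the sketch is that the recipient never needs to be told the line length: the regular spacing of the marker gives it away. A picture that happened to contain an all-white row would confuse this toy decoder, which is one reason a real scheme would need a more distinctive frame marker.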
So, we have examined a possible attack scenario with 15 stages. Each of these stages is logically plausible, and each could be criticized and defended separately. Other attack scenarios are possible. For example, we may think that the message is not addressed to us but is part of someone else's correspondence, and try to decipher it. And this will, in fact, be the bait.
But the distribution of executable code is not the only danger. For example, we may receive some sort of "useful" technology that in fact leads us to disaster (for example, a message in the spirit of "quickly compress 10 kg of plutonium and you will have a new source of energy", but with planetary, not local, consequences…). Such a mailing could be made by a certain "civilization" in order to destroy its competitors in space in advance. It is obvious that those who receive such messages will primarily look for technology of military use.
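Steps 8 to 10 of the algorithm rest on the claim that a Post machine is simple enough to describe in a short message and trivial to emulate on a modern computer. As a rough illustration, here is a small emulator. The six commands follow the usual textbook description of the Post machine (move right, move left, mark, erase, conditional jump, stop); the tuple encoding of instructions and the sample program are assumptions made for this sketch, not part of the original text.

```python
from typing import Dict, Set, Tuple

Instruction = Tuple  # ("R", next) ("L", next) ("1", next) ("0", next)
                     # ("?", next_if_blank, next_if_marked) ("!",)


def run(program: Dict[int, Instruction], marked: Set[int],
        head: int = 0, max_steps: int = 10_000) -> Tuple[Set[int], int]:
    """Emulate a Post machine on an unbounded tape.

    The tape is stored as the set of marked cell positions. Returns the
    final set of marked cells and the final head position.
    """
    tape = set(marked)
    pc = 0  # index of the current instruction
    for _ in range(max_steps):
        op = program[pc]
        kind = op[0]
        if kind == "!":            # stop
            return tape, head
        if kind == "R":            # move the head one cell right
            head += 1
            pc = op[1]
        elif kind == "L":          # move the head one cell left
            head -= 1
            pc = op[1]
        elif kind == "1":          # mark the current cell
            tape.add(head)
            pc = op[1]
        elif kind == "0":          # erase the current cell
            tape.discard(head)
            pc = op[1]
        elif kind == "?":          # branch on whether the cell is marked
            pc = op[2] if head in tape else op[1]
        else:
            raise ValueError(f"unknown command {kind!r}")
    raise RuntimeError("step limit reached without stopping")


# Sample program: add one to a number written in unary.
# Walk right over the marks, mark the first blank cell, stop.
INCREMENT = {
    0: ("?", 2, 1),  # blank -> instruction 2, marked -> instruction 1
    1: ("R", 0),
    2: ("1", 3),
    3: ("!",),
}

tape, head = run(INCREMENT, {0, 1, 2})  # the number 3 in unary
```

The whole emulator fits in a few dozen lines, which is the sense in which the blueprint of such a machine is cheap to transmit and cheap to reproduce. The step limit is only a safeguard for the demonstration.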
Analysis of possible goals
We now turn to the analysis of the purposes for which certain super civilizations could carry out such an attack.
1. We must not confuse the concept of a super-civilization with the hope that such a civilization will be super-kind. Advanced does not necessarily mean merciful. Moreover, we should not expect anything good from extraterrestrial 'kindness'. This is well described in the Strugatskys' novel "Waves stop wind". Whatever goal a super-civilization imposes upon us, we will be its inferiors in capability and in civilizational robustness, even if its intentions are good. A historical example is the activity of Christian missionaries, who destroyed traditional religions. Purely hostile objectives, moreover, are ones we can understand better. And if the SETI attack succeeds, it may be only a prelude to doing us more 'favors' and 'upgrades', until there is scarcely anything human left of us even if we do survive…
2. We can divide all civilizations into two classes, naive and serious. Serious civilizations are aware of the SETI risks and have their own powerful AI, which can resist alien hacker attacks. Naive civilizations, like the present Earth, already possess computers and the means of long-distance listening in space, but do not yet possess AI and are not aware of the risks of AI-SETI. Probably every civilization goes through a "naive" stage, and it is in this phase that it is most vulnerable to a SETI attack. This phase may be very short, since the period from the appearance and spread of radio telescopes to computers powerful enough to create AI may be only a few tens of years. Therefore, a SETI attack must be aimed at a civilization in this phase. This is not a pleasant thought, because we are among the vulnerable.
3. If faster-than-light travel is not possible, spreading a civilization through SETI attacks is the fastest way to conquer space. At large distances, it provides a significant gain in time compared with any kind of ship. Therefore, if two civilizations compete for mastery of space, the one that favors SETI attack will win.
4. The most important thing is that it is enough to begin a SETI attack just once, since it then travels as a self-replicating wave throughout the Universe, striking more and more naive civilizations. For example, if a million harmless normal biological viruses and one dangerous one get into the body, we will end up with trillions of copies of the dangerous virus and still only a million safe ones. In other words, it is enough for one civilization in billions to start the process, and it then becomes unstoppable throughout the Universe. Since it spreads at almost the speed of light, countermeasures will be almost impossible.
5. Further, sending SETI messages will be a priority for the virus that has infected a civilization, and it will spend most of its energy on this, just as a biological organism spends tens of percent of its energy on reproduction. Earth's civilization, by contrast, spends only a few tens of millions of dollars on SETI, about one millionth of our resources, and this proportion is unlikely to change much for more advanced civilizations. In other words, an infected civilization will produce a million times more SETI signals than a healthy one. Or, to put it another way, if the Galaxy contains one million healthy civilizations and one infected one, we have equal chances of encountering a signal from a healthy or an infected civilization.
6. Moreover, there is no other reasonable way to distribute one's code in space except through self-replication.
7. Moreover, such a process could begin by accident. For example, it may have started as a mere research project intended to send the results of its (innocent) studies back to the mother civilization without harming the host civilization, and then become a "cancer" because of certain faults or mutations in propagation.
8. There is nothing unusual in such behavior. Every medium has its viruses: biological viruses in biology, computer viruses in computer networks, memes in conversation. We do not ask why nature wanted to create a biological virus.
9. Travel through SETI attacks is much cheaper than by any other means. Namely, a civilization in Andromeda can simultaneously send a signal to 100 billion stars in our galaxy. Each spaceship, by contrast, would cost billions, and even if it were free it would be slower to reach all the stars of our Galaxy.
10. Now we list several possible goals of a SETI attack, just to show the variety of motives.
- To study the universe. After the code is executed, research probes are created to gather information and send it back.
- To ensure that there are no competing civilizations. All of their embryos are destroyed. This is preemptive war on an indiscriminate basis.
- To preempt the other competing supercivilization (yes, in this scenario there are two!) before it can take advantage of this resource.
- To prepare a solid base for the arrival of spacecraft. This makes sense if the super-civilization is very far away, so that the gap between the speed of light and the near-light speed of its ships (say, 0.5 c) amounts to a difference of a millennium.
- To achieve immortality. Carrigan showed that the amount of human personal memory is on the order of 2.5 gigabytes, so transmitting a few exabytes (1 exabyte = 1 073 741 824 gigabytes) of information is enough to send an entire civilization. (You may adjust the units according to how big you like your super-civilizations!)
- Finally, we consider purposes that are illogical and incomprehensible (to us): for example, a work of art, an act of self-expression, or a toy. Or perhaps an insane rivalry between two factions. Or something we simply cannot understand. (For example, an extraterrestrial would not understand why the Americans stuck a flag into the Moon. Was it worthwhile to fly over 300000 km to install a piece of painted steel?)
11. Assuming signals can propagate over billions of light years, the region from which a SETI attack could reach us is a sphere with a radius of several billion light years. In other words, it would be sufficient for a single "bad civilization" to exist in our light cone with a depth of several billion years, which includes billions of galaxies, for us to be in danger of SETI attack. Of course, this is only true if the average density of civilizations is at least one per galaxy. This is an interesting possibility in relation to Fermi's Paradox.
16. As the depth to which we scan the sky grows linearly, the volume of space and the number of stars that we see grow as the cube of that depth. This means that our chances of stumbling on a SETI signal grow nonlinearly, along a steep curve.
17. It is possible that we will stumble upon several different messages from the skies that contradict one another, in the spirit of: "Do not listen to them, they are deceiving voices and wish you evil. But we, brother, we are good, and wise…"
18. Whatever positive and valuable message we receive, we can never be sure that all of this is not a subtle and deeply concealed threat. This means that in interstellar communication there will always be an element of distrust, and in every happy revelation, a gnawing suspicion.
19. A defensive posture regarding interstellar communication is only to listen and to send nothing, so as not to reveal one's location. The laws prohibit the sending of a message from the United States to the stars. Anyone in the Universe who transmits is self-evidently not afraid to reveal his position. Perhaps this is because sending is, for the sender, more important than personal safety: for example, because it plans to flush out prey prior to an attack, or because it is forced to by an evil local AI.
20. It was said of the atomic bomb that its main secret is that it can be made. Before the discovery of the chain reaction, Rutherford believed that the release of nuclear energy was a matter for the distant future; after the discovery, any physicist knew that it is enough to bring together two subcritical masses of fissionable material to release nuclear energy. In other words, if one day we find that signals can be received from space, it will be an irreversible event, and something analogous to a deadly new arms race will be on.
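Several points in the list above are back-of-envelope arithmetic, and it may help to see them spelled out. The sketch below checks four of them: the share of signals coming from infected civilizations (point 5), the head start of a signal over a ship flying at 0.5 c (point 10), the data volume needed to transmit a civilization's memories (point 10), and the cubic growth of the searched volume (point 16). The function names are made up for illustration, and the population of 7 billion is an assumption that does not appear in the text.

```python
GIGABYTES_PER_EXABYTE = 1_073_741_824  # the binary convention used in the text


def infected_signal_share(healthy: int, infected: int,
                          effort_ratio: int = 1_000_000) -> float:
    """Fraction of all signals that come from infected civilizations,
    if each infected one transmits effort_ratio times more than a healthy one."""
    infected_output = infected * effort_ratio
    return infected_output / (healthy + infected_output)


def head_start_years(distance_ly: float, ship_speed_c: float) -> float:
    """Years by which a light-speed signal beats a ship over the same distance."""
    return distance_ly / ship_speed_c - distance_ly


def civilization_exabytes(population: float, gb_per_person: float = 2.5) -> float:
    """Data volume, in exabytes, of one memory record per person."""
    return population * gb_per_person / GIGABYTES_PER_EXABYTE


def star_count_factor(depth_factor: float) -> float:
    """How many times more stars are covered when search depth grows by depth_factor."""
    return depth_factor ** 3


share = infected_signal_share(healthy=1_000_000, infected=1)
delay = head_start_years(distance_ly=1000, ship_speed_c=0.5)
volume = civilization_exabytes(population=7e9)  # assumed population
growth = star_count_factor(2)
```

Under these assumptions the share comes out at one half, matching the "equal chances" in point 5; a ship at 0.5 c falls a millennium behind the signal over 1000 light years; and 7 billion records of 2.5 gigabytes come to roughly 16 exabytes, the same order of magnitude as the "few exabytes" mentioned in point 10.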
The discussions on the issue raise several typical objections, now discussed.
Objection 1: Behavior discussed here is too anthropomorphic. In fact, civilizations are very different from each other, so you can’t predict their behavior.
Answer: Here we have a powerful observation selection effect. While a variety of possible civilizations may exist, including such extreme cases as thinking oceans, we can only receive radio signals from civilizations that send them, which means that they have the corresponding radio equipment and knowledge of materials, electronics and computing. That is to say, we are threatened by civilizations of the same type as our own. Civilizations that can neither receive nor send radio messages do not participate in this game.
An observation selection effect also applies to purposes. The goals of civilizations can be very different, but the civilizations that intensely send signals will be only those that want to tell something to "everyone". Finally, observation selection relates to the effectiveness and universality of the SETI virus. The more effective it is, the more different civilizations will catch it and the more copies of its radio signals there will be in the heavens. So we have 'excellent chances' of meeting the most powerful and effective virus.
Objection 2. For super-civilizations there is no need to resort to subterfuge. They can directly conquer us.
Answer: This is true only if they are in close proximity to us. If faster-than-light movement is not possible, attack by message will be faster and cheaper. Perhaps this difference becomes important only at intergalactic distances. Therefore, one should not fear a SETI attack from the nearest stars, those within a radius of tens or hundreds of light-years.
Objection 3. There are lots of reasons why a SETI attack may not be possible. What is the point of running an ineffective attack?
Answer: A SETI attack does not have to work every time. It must work in a sufficient number of cases to meet the objectives of the civilization that sends the message. For example, a con man or fraudster does not expect to be able to con every victim; he would be happy to steal from even one person in one hundred. It follows that a SETI attack is useless if the goal is to attack all civilizations in a certain galaxy. But if the goal is to get at least some outposts in another galaxy, the SETI attack fits. (Of course, these outposts can then build fleets of spaceships to spread the attack to outlying stars within the target galaxy.)
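The con man analogy can be made concrete with a line of probability. If each recipient independently falls for the message with probability p, the chance that at least one of n recipients does is 1 - (1 - p)^n. The sketch below uses the one-in-a-hundred figure from the analogy; the number of recipients and the independence assumption are purely illustrative.

```python
def chance_of_any_success(p: float, n: int) -> float:
    """Probability that at least one of n independent targets is conned,
    when each is conned with probability p."""
    return 1.0 - (1.0 - p) ** n


def expected_successes(p: float, n: int) -> float:
    """Average number of conned targets among n."""
    return p * n


# One victim in a hundred, as in the con man example; n is an assumed value.
any_outpost = chance_of_any_success(p=0.01, n=100)
outposts = expected_successes(p=0.01, n=1000)
```

With a success rate of one percent, a hundred recipients already give the sender better than even odds (about 63 percent) of gaining at least one outpost, while the chance of conning all hundred is negligible. That is the asymmetry the answer above relies on.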
The main assumption underlying the idea of SETI attacks is that extraterrestrial super-civilizations exist in the visible universe at all. I think that this is unlikely, for reasons related to the anthropic principle. Our universe is one out of 10 ** 500 possible universes with different physical properties, as suggested by one of the scenarios of string theory. My brain is 1 kg out of 10 ** 30 kg in the solar system. Similarly, I suppose, the Sun is no more than about 1 out of 10 ** 30 stars that could give rise to intelligent life, which means that we are likely alone in the visible universe.
Secondly, the fact that Earth's civilization arose so late (it could have arisen a few billion years earlier) and was not prevented from developing by alien preemption argues for the rarity of intelligent life in the Universe. The putative rarity of our civilization is the best protection against a SETI attack. On the other hand, if we discover parallel worlds or faster-than-light communication, the problem arises again.
Objection 7. Contact is impossible between the post-singularity super-civilizations that are supposed here to be the senders of SETI signals and a pre-singularity civilization such as ours, because a super-civilization is many orders of magnitude superior to us and its message will be absolutely incomprehensible to us, just as contact between ants and humans is not possible. (A singularity is the moment of creation of an artificial intelligence capable of learning, and of beginning an exponential, recursive improvement of its own design, of further intelligence, and of much else besides, after which civilization makes a leap in its development. On Earth this may be possible around 2030.)
Answer: In the proposed scenario, we are talking not about contact but about a purposeful deception of us. Similarly, a man is quite capable of manipulating the behavior of ants and other social insects for objectives that are absolutely incomprehensible to them. For example, LJ user "ivanov-petrov" describes the following scene: as a student, he studied the behavior of bees in the Botanical Garden of Moscow State University. But he had bad relations with the security guard of the garden, who regularly expelled him before his time was up. Ivanov-Petrov took a green board and developed in the bees a conditioned reflex to attack this board. The next time the watchman, who always wore a green jersey, came by, all the bees attacked him and he took to flight. So "ivanov-petrov" could continue his research. Such manipulation is not contact, but this does not prevent it from being effective.
Objection 8. For civilizations located near us it is much easier to attack us, for 'guaranteed results', using starships than with a SETI attack.
Answer. It may be that we significantly underestimate the complexity of an attack using starships and, in general, the complexity of interstellar travel. To name only one factor, consider the potential 'minefield' characteristics of the as-yet unknown interstellar medium.
If such an attack were carried out now or in the past, Earth's civilization would have nothing with which to oppose it, but in the future the situation will change: all matter in the solar system will be full of robots, and possibly completely processed by them. On the other hand, the greater the speed of enemy starships approaching us, the more visible the fleet will be by its braking emissions and other characteristics. Such fast starships would be very vulnerable, and in addition we could prepare in advance for their arrival. A slowly moving nano-starship would be much less visible, but if it wished to trigger a transformation of all the matter of the solar system, it would simply have nowhere to land (at least not without raising an alert in such a 'nanotech-settled' and fully used future solar system). (Friedlander added: Presumably there would always be some 'outer edge' of thinly settled Oort Cloud sort of matter, but by definition the rest of the system would be more densely settled and energy rich, and any deeper penetration into solar space and its conquest would be the proverbial uphill battle, not in terms of gravity gradient, but in terms of the available resources of war against a full Class 2 Kardashev civilization.)
The most serious objection is that an advanced civilization could, in a few million years, seed our entire galaxy with self-replicating post-singularity nanobots that could achieve any goal in each target star system, including easily preventing the development of other incipient civilizations. (In the USA, Frank Tipler advanced this line of reasoning.) However, this evidently has not happened in our case: no one has prevented the development of our civilization. It would be much easier and more reliable to send out robots with such assignments than to bombard the entire galaxy with SETI messages, so if we do not see such robots, it means that there are no SETI attackers inside our galaxy. (It is possible that a probe on the outskirts of the solar system is waiting for manifestations of human space activity in order to attack, a variant of the "Berserker" hypothesis, but it will not attack through SETI.) Over many millions or even billions of years, microrobots could probably even arrive from distant galaxies tens of millions of light-years away. Radiation damage may limit this, however, without regular self-rebuilding.
In this case a SETI attack would be meaningful only at large distances. However, such distances, tens and hundreds of millions of light-years, would probably require innovative methods of signal modulation, such as controlling the luminosity of active galactic nuclei, or sending a narrow beam in the direction of our galaxy (though the senders do not know where it will be millions of years later). But a civilization that can control its galaxy's nucleus might also build a spaceship that flies at near-light speed, even one with the mass of a planet. Such considerations severely reduce the likelihood of SETI attacks, but do not lower it to zero, because we do not know all the possible objectives and circumstances.
(A comment by JF: For example, the lack of a SETI attack so far may itself be a cunning ploy. At first receipt of the developing Solar civilization's radio signals, all interstellar 'spam' would have ceased (and interference stations of some unknown but amazing capability and type would have been set up around the Solar System to block all incoming signals recognizable to their computers as of intelligent origin), in order to make us 'lonely', to give us time to discover and appreciate the Fermi Paradox, and even to drive those so philosophically inclined to despair that the Universe is apparently hostile by some standards. Then, when desperate, we suddenly discover, slowly at first, partially at first, and then with more and more wonderful signals, that space is filled with bright, enticing signals (like spam). The blockade, cunning as it was (analogous to Earthly jamming stations), was merely a prelude to a slow 'turning up' of preplanned, intriguing signal traffic. If, as Earth developed, we had intercepted cunning spam followed by the agonized 'don't repeat our mistakes' final messages of tricked and dying civilizations, only a fool would heed the enticing voices of SETI spam. But as it is, a SETI attack may benefit from the slow unmasking of a cunning masquerade: at first a faint and distant light of infinite wonder, only at the end revealed as the headlight of an onrushing cosmic train…)
AT comment to it. In fact I think that SETI attack senders are on the distances more than 1000 ly and so they do not know yet that we have appeared. But so called Fermi Paradox indeed maybe a trick – senders deliberately made their signals weak in order to make us think that they are not spam.
The scale of space strategy may be inconceivable to the human mind.
And we should note in conclusion that some types of SETI-attack do not even need a computer, just a person who could understand a message that would then "set his mind on fire". At the moment we cannot imagine such a message, but we can give some analogies. Western religions are built around the text of the Bible. It can be assumed that if the text of the Bible appeared in countries previously unfamiliar with it, a certain number of biblical believers might arise. The same goes for subversive political literature, or even some superideas, “sticky” memes or philosophical mind-benders. Or, as suggested by Hans Moravec, we might get a message such as: "Now that you have received and decoded me, broadcast me in at least ten thousand directions with ten million watts of power. Or else." The message breaks off there, leaving us guessing what that "or else" might mean. Even a few pages of text may contain a lot of subversive information. Imagine that we could send a message to 19th-century scientists. We could reveal to them the general principles of the atomic bomb, the theory of relativity and the transistor, and thus completely change the course of technological history; and if we added that all the ills of the 20th century came from Germany (which is only partly true), we would also influence political history.
(Comment by JF: The latter use would depend on having received enough of Earth’s transmissions to model our behavior and politics. But imagine a message posing as one from our own future, meant to ignite a ‘catalytic war’. Automated SIGINT (signals intelligence) stations are constructed to monitor our solar system, their computers ‘cracking’ our language and culture (possibly with the aid of children’s television programs that match letters and sounds in see-and-say fashion, of TV news showing world maps and naming countries, and possibly even of intercepted wireless internet encyclopedia articles). Then a test or two may follow: posting a what-if scenario that invites comment from bloggers about a future war, say between the two leading powers of the planet. (For purposes of this discussion, say that around 2100 in the present calendar China is strongest and India is rising fast.) Any defects and nitpicks raised in the blog comments are noted and corrected. Finally, an actual interstellar message is sent with the debugged scenario (since it does not shift against the stellar background, it is unquestionably interstellar in origin), purporting to be from a dying starship of the presently stronger side’s (China’s) future, in which the presently weaker side’s (India’s) space fleet has smashed the future version of the Chinese state and essentially committed genocide. The starship has come back in time but is dying, and indeed the transmission ends, or simply repeats, possibly after some back-and-forth communication between the false computer models of the ‘starship commander’ and the Chinese government. The reader can imagine the future Chinese military council urging preemption to forestall doom.
If, as seems probable, such a strategy is too complicated to carry off in one stage, various ‘future travellers’ may emerge from a war, signal for help in vain, and ‘die’ far beyond our ability to reach them (say some light-days away, near the alleged location of an ‘emergence gate’ but near an actual transmitter). Quite a drama may emerge as the computer learns to ‘play’ us like a con man, with ship after ship of various nationalities dribbling out stories while also getting answers to key questions, to aid in constructing an emerging scenario that will be frighteningly believable, enough to ignite a final war. Possibly lists of key people in China (or whichever side is stronger) may be drawn up by the computer with a demand that they be executed as the parents of future war criminals: a sort of International Criminal Court acting out a Terminator scenario. Naturally the Chinese state, at that time the most powerful in the world, would guard its rulers’ lives against any threat. Yet more refugee spaceships of various nationalities can emerge, transmit and die, offering their own militaries terrifying new weapons technologies from unknown sciences that really work (more ‘proof’ of their future origin). Or weapons from known sciences, for example decoding online DNA sequences in the future internet and constructing formulae for DNA constructors to make specific tailored genetic weapons against particular populations, weapons that endure in the ground: a scorched earth against a particular population on a particular piece of land. These are copied and spread worldwide, as are totally accurate plans, in standard CNC codes, for easy-to-construct thermonuclear weapons in the 1950s style, using U-238 for casing and only a few kilograms of fissionable material for ignition. By that time well over a million tons of depleted uranium will exist worldwide, and deuterium is free in the ocean and can be used directly in very large weapons without lithium deuteride.
Knowing how to hack together a wasteful, more-than-critical-mass crude fission device is one thing (the South African device was of this kind). But knowing with absolute accuracy, down to machining drawings, CNC codes and so on, how to make high-yield, super-efficient, very dirty thermonuclear weapons without need for testing means that any small group with a few dozen million dollars and automated machine tools can clandestinely make a multi-megaton device, or many, and smash the largest cities. And any small power with a few dozen jets can cripple a continent for a decade. Already over a thousand tons of plutonium exist. The SETI spam can include CNC codes for making a one-shot reactor-plutonium chemical refiner that would be left hopelessly radioactive but would output chemically pure plutonium. (This would be prone to predetonation because of the Pu-240 content, but then plans for debugged laser isotope separators may also be downloaded.) This is a variant of the ‘catalytic war’ and ‘nuclear six-gun’ (i.e. easy-to-obtain weapons) scenarios of the late Herman Kahn. Even cheaper would be bioattacks of the kind outlined above. The principal point is that fully debugged planet-killer weapons take great amounts of debugging, tens to hundreds of billions of dollars, and free access to a world scientific community. Today, it is to every great power’s advantage to keep accurate designs out of the hands of third parties, because they have to live on the same planet (and because the fewer weapons, the easier it is to stay a great power). Not so the SETI spam authors. Without the hundreds of billions in R&D, the actual construction budget would be on the order of a million dollars per multi-megaton device (depending on the expense of obtaining the raw reactor plutonium). To extend today’s scenarios into the future, the SETI spam authors might manipulate Georgia (with about a $10 billion GDP) to arm against Russia, Taiwan against China and Venezuela against the USA.
Although Russia, China and the USA could each promise annihilation against any attacker, with a military budget of around 4% of GDP and the downloaded plans, the reverse could then, for the first time, also be true. (400 bombs of 100 megatons each can kill by fallout perhaps 95% of the unprotected population of a country the size of the USA or China, and 90% in a country the size of Russia, assuming the worst kind of cooperation from the winds; the figures are from an old chart by Ralph Lapp.) Anyone living near a superarmed microstate with border conflicts will, of course, wish to arm themselves. And these newly armed states themselves will, of course, have borders. Note that this drawn-out scenario gives lots of time for a huge arms buildup on both (or many!) sides, and a Second Cold War that eventually turns very hot indeed… and unlike a human player of such a horrific ‘catalytic war’ con game, the sender has no concern at all for worldwide fallout or enduring biocontamination…)
The product of the probabilities of the following events gives the probability of attack. For these probabilities we can only give a so-called "expert" assessment; that is, assign each of them a certain a priori subjective probability, as we do now.
1) The likelihood that extraterrestrial civilizations exist at a distance at which radio communication with them is possible. In general, I agree with the view of Shklovsky and supporters of the “Rare Earth” hypothesis that Earth's civilization is unique in the observable universe. This does not mean that extraterrestrial civilizations do not exist at all (because the universe, according to the theory of cosmological inflation, is almost endless); they are just beyond the event horizon visible from our point in space-time. In addition, what matters is not just distance as such, but the distance at which a connection can be established that allows gigabytes of information to be transferred. (However, even at 1 bit per second, 1 gigabit can be sent in about 30 years, which may be sufficient for a SETI-attack.) If some form of superluminal communication or interaction with parallel universes becomes possible in the future, it would dramatically increase the chances of SETI attacks. So I estimate this chance at 10%.
2) The probability that a SETI-attack is technically feasible: that is, that a computer program containing a recursively self-improving AI, of a size suitable for transmission, is possible. I see this chance as high: 90%.
3) The likelihood that civilizations that could carry out such an attack exist in our space-time cone. This probability depends on the density of civilizations in the universe, and on the percentage of civilizations that choose to initiate such an attack or, more importantly, fall victim to one and become repeaters. In addition, it is necessary to take into account not only the density of civilizations but also the density of the radio signals they create. All these factors are highly uncertain. It is therefore reasonable to set this probability at 50%.
4) The probability that we find such a signal during our rising civilization’s period of vulnerability to it. The period of vulnerability lasts from now until the moment when we decide, and are technically ready to enforce, the following rule: do not download any extraterrestrial computer programs under any circumstances. Such a decision can be enforced only by our own AI, installed as world ruler (which in itself is fraught with considerable risk). Such a world AI (WAI) could be created circa 2030. We cannot exclude, however, that our WAI will still fail to impose a ban on receiving extraterrestrial messages and will fall victim to attack by an alien artificial intelligence that surpasses it by millions of years of machine evolution. Thus the window of vulnerability is most likely about 20 years, and the “width” of the window depends on the intensity of searches in the coming years. This “width” depends, for example, on the severity of the current economic crisis of 2008-2010, on the risk of World War III, and on how all this affects the emergence of the WAI. It also depends on the density of infected civilizations and their signal strength: as these factors increase, so do the chances of detecting them earlier. Because we are a normal civilization under normal conditions, according to the Copernican principle, the probability should be large enough; otherwise a SETI-attack would be generally ineffective. (SETI-attacks themselves, here supposed to exist, are also subject to a form of “natural selection” that tests their effectiveness, in the sense that an attack either works or does not.) This chance is very uncertain; we will put it at 50% too.
5) Next is the probability that a SETI-attack will be successful: that is, that we swallow the bait, download the program and the description of the computer, run them, lose control over them and let them reach all their goals. I estimate this chance to be very high because of the factor of multiplicity: the message will be downloaded repeatedly, and someone, sooner or later, will run it. In addition, through natural selection, we will most likely receive the most effective and deadly message, the one that most effectively deceives our type of civilization. I consider it to be 90%.
6) Finally, it is necessary to assess the probability that a SETI-attack will lead to complete human extinction. On the one hand, it is possible to imagine a “good” SETI-attack, which limits itself to creating a powerful radio emitter beyond the orbit of Pluto. However, such a program will always run the risk that an emerging society at its target star will create a powerful artificial intelligence and an effective weapon that would destroy this emitter. In addition, creating the most powerful possible transmitter would require all the matter of the solar system and the entire output of the Sun. Consequently, the share of such “good” attacks will be lowered by natural selection, since some of them will sooner or later be destroyed by the civilizations they captured, and their signals will be weaker. So I estimate at 80% the chance that a SETI-attack that has reached all its goals destroys all people.
As a result, we have: 0.1 × 0.9 × 0.5 × 0.5 × 0.9 × 0.8 = 0.0162, that is, 1.62%
So, after rounding, the chance of human extinction through a SETI attack in the 21st century is around 1 per cent, with a theoretical precision of an order of magnitude.
Our best protection in this context would be for civilizations to be very rare in the Universe. But this is not quite right, because the Fermi paradox here works on the principle of "Neither alternative is good":
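The multiplication above can be checked with a short sketch. Only the six numbers come from the text; the dictionary labels are shorthand paraphrases of the numbered items and are assumptions of this sketch, as is the reading of each estimate as conditional on the previous ones (which is what multiplying them presupposes).

```python
from math import prod

# Subjective estimates for the six numbered items above.
# The labels are paraphrases introduced for this sketch, not the author's wording.
estimates = {
    "civilization within communication range": 0.1,
    "attack technically feasible": 0.9,
    "attacking civilization exists in our space-time cone": 0.5,
    "signal found during the window of vulnerability": 0.5,
    "attack succeeds once the signal is received": 0.9,
    "successful attack leads to extinction": 0.8,
}


def chain_probability(probabilities):
    """Multiply stage probabilities, reading each as conditional on the earlier stages."""
    return prod(probabilities)


total = chain_probability(estimates.values())
print(f"{total:.4f} = {total:.2%}")
```

Because every factor is a rough subjective guess, the product is best read as an order-of-magnitude figure, which is how the text itself treats it.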
- If there are extraterrestrial civilizations, and there are many of them, it is dangerous because they can threaten us in one way or another.
- If extraterrestrial civilizations do not exist, that is also bad, because it gives weight to the hypothesis that technological civilizations inevitably go extinct, or that we underestimate the frequency of cosmological catastrophes: a high density of space hazards, such as gamma-ray bursts and asteroids, that we underestimate because of the observation selection effect (had we already been killed, we would not be here making these observations).
A reverse option is theoretically possible: through SETI a warning message could arrive about a certain threat that has destroyed most civilizations, such as: "Do not do any experiments with X particles, it could lead to an explosion that would destroy the planet." But even in that case a doubt would remain as to whether it is a deception meant to deprive us of certain technologies. (Proof would be if similar reports came from other civilizations in the opposite direction in space.) And such a message might only increase the temptation to experiment with X-particles.
So I do not call for abandoning SETI searches, especially since such appeals would be useless.
It may be useful to postpone any technical realization of the messages we might receive through SETI until the time when we have our own Artificial Intelligence. That moment is perhaps only 10-30 years away, so we could wait. Secondly, it would be important to conceal the fact that a dangerous SETI signal has been received, as well as its content and the location of its source.
This risk has a methodologically interesting aspect. Despite the fact that I have thought and read about global risks every day for the last year, I found this dangerous vulnerability in SETI only now. In hindsight, I was able to find another four authors who came to similar conclusions. However, I have drawn a significant lesson: there may be global risks not yet discovered, and even if the constituent parts of a risk are separately known to me, it may take a long time to join them into a coherent picture. Thus hundreds of dangerous vulnerabilities may surround us, like an unknown minefield. Only when the first explosion happens will we know. And that first explosion may be the last.
An interesting question is whether Earth itself could become a source of a SETI-attack in the future, when we have our own AI. Obviously, it could. The METI program already includes an idea to send the code of human DNA. (This is the “children's message scenario”, in which children ask for their piece of DNA to be taken and cloned on another planet, as depicted in the film “Calling All Aliens”.)
1. Hoyle F. A for Andromeda. http://en.wikipedia.org/wiki/A_for_Andromeda
2. Yudkowsky E. Artificial Intelligence as a Positive and Negative Factor in Global Risk. Forthcoming in Global Catastrophic Risks, eds. Nick Bostrom and Milan Cirkovic http://www.singinst.org/upload/artificial-intelligence-risk.pdf
3. Moravec H. Mind Children: The Future of Robot and Human Intelligence, 1988.
4. Carrigan R. A., Jr. The Ultimate Hacker: SETI signals may need to be decontaminated http://home.fnal.gov/~carrigan/SETI/SETI%20Decon%20Australia%20poster%20paper.pdf
5. Carrigan’s page http://home.fnal.gov/~carrigan/SETI/SETI_Hacker.htm
Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.
The prediction classification schemas can be found in the first case study.
What drives an AI?
- Classification: issues and metastatements, using philosophical arguments and expert judgement.
Steve Omohundro, in his paper on 'AI drives', presented arguments aiming to show that generic AI designs would develop 'drives' that would cause them to behave in specific and potentially dangerous ways, even if these drives were not programmed in initially (Omo08). One of his examples was a superintelligent chess computer that was programmed purely to perform well at chess, but that was nevertheless driven by that goal to self-improve, to replace its goal with a utility function, to defend this utility function, to protect itself, and ultimately to acquire more resources and power.
This is a metastatement: generic AI designs would have this unexpected and convergent behaviour. This relies on philosophical and mathematical arguments, and though the author has expertise in mathematics and machine learning, he has none directly in philosophy. It also makes implicit use of the outside view: utility maximising agents are grouped together into one category and similar types of behaviours are expected from all agents in this category.
In order to clarify and reveal assumptions, it helps to divide Omohundro's thesis into two claims. The weaker one is that a generic AI design could end up having these AI drives; the stronger one that it would very likely have them.
Omohundro's paper provides strong evidence for the weak claim. It demonstrates how an AI motivated only to achieve a particular goal, could nevertheless improve itself, become a utility maximising agent, reach out for resources and so on. Every step of the way, the AI becomes better at achieving its goal, so all these changes are consistent with its initial programming. This behaviour is very generic: only specifically tailored or unusual goals would safely preclude such drives.
The claim that AIs generically would have these drives needs more assumptions. There are no counterfactual resiliency tests for philosophical arguments, but something similar can be attempted: one can use humans as potential counterexamples to the thesis. It has been argued that AIs could have any motivation a human has (Arm,Bos13). Thus according to the thesis, it would seem that humans should be subject to the same drives and behaviours. This does not fit the evidence, however. Humans are certainly not expected utility maximisers (probably the closest would be financial traders who try to approximate expected money maximisers, but only in their professional work), they don't often try to improve their rationality (in fact some specifically avoid doing so (many examples of this are religious, such as the Puritan John Cotton who wrote 'the more learned and witty you bee, the more fit to act for Satan will you bee'(Hof62)), and some sacrifice cognitive ability to other pleasures (BBJ+03)), and many turn their backs on high-powered careers. Some humans do desire self-improvement (in the sense of the paper), and Omohundro cites this as evidence for his thesis. Some humans don't desire it, though, and this should be taken as contrary evidence (or as evidence that Omohundro's model of what constitutes self-improvement is overly narrow). Thus one hidden assumption of the model is:
- Generic superintelligent AIs would have different motivations to a significant subset of the human race, OR
- Generic humans raised to superintelligence would develop AI drives.
Note this is very similar to this post, and is mainly reposted for completeness.
How well have the ''Spiritual Machines'' aged?
- Classification: timelines and scenarios, using expert judgement, causal models, non-causal models and (indirect) philosophical arguments.
Ray Kurzweil is a prominent and often quoted AI predictor. One of his most important books was the 1999 ''The Age of Spiritual Machines'' (Kur99) which presented his futurist ideas in more detail, and made several predictions for the years 2009, 2019, 2029 and 2099. That book will be the focus of this case study, ignoring his more recent work (a correct prediction in 1999 for 2009 is much more impressive than a correct 2008 reinterpretation or clarification of that prediction). There are five main points relevant to judging ''The Age of Spiritual Machines'': Kurzweil's expertise, his 'Law of Accelerating Returns', his extension of Moore's law, his predictive track record, and his use of fictional imagery to argue philosophical points.
Kurzweil has had a lot of experience in the modern computer industry. He's an inventor, computer engineer, and entrepreneur, and as such can claim insider experience in the development of new computer technology. He has been directly involved in narrow AI projects covering voice recognition, text recognition and electronic trading. His fame and prominence are further indications of the allure (though not necessarily the accuracy) of his ideas. In total, Kurzweil can be regarded as an AI expert.
Kurzweil is not, however, a cosmologist or an evolutionary biologist. In his book, he proposed a 'Law of Accelerating Returns'. This law claimed to explain many disparate phenomena, such as the speed and trends of evolution of life forms, the evolution of technology, the creation of computers, and Moore's law in computing. His slightly more general 'Law of Time and Chaos' extended his model to explain the history of the universe or the development of an organism. It is a causal model, as it aims to explain these phenomena, not simply note the trends. Hence it is a timeline prediction, based on a causal model that makes use of the outside view to group the categories together, and is backed by non-expert opinion.
A literature search failed to find any evolutionary biologist or cosmologist stating their agreement with these laws. Indeed there has been little academic work on them at all, and what work there is tends to be critical.
The laws are ideal candidates for counterfactual resiliency checks, however. It is not hard to create counterfactuals that shift the timelines underlying the laws (see this for a more detailed version of the counterfactual resiliency check). Many standard phenomena could have delayed the evolution of life on Earth for millions or billions of years (meteor impacts, solar energy fluctuations or nearby gamma-ray bursts). The evolution of technology can similarly be accelerated or slowed down by changes in human society and in the availability of raw materials - it is perfectly conceivable that, for instance, the ancient Greeks could have started a small industrial revolution, or that the European nations could have collapsed before the Renaissance due to a second and more virulent Black Death (or even a slightly different political structure in Italy). Population fragmentation and decrease can lead to technology loss (such as the 'Tasmanian technology trap' (Riv12)). Hence accepting that a Law of Accelerating Returns determines the pace of technological and evolutionary change, means rejecting many generally accepted theories of planetary dynamics, evolution and societal development. Since Kurzweil is the non-expert here, his law is almost certainly in error, and best seen as a literary device rather than a valid scientific theory.
Locked up in Searle's Chinese room
- Classification: issues and metastatements and a scenario, using philosophical arguments and expert judgement.
Searle's Chinese room thought experiment is a famous critique of some of the assumptions of 'strong AI' (which Searle defines as the belief that 'the appropriately programmed computer literally has cognitive states'). There has been a lot of further discussion on the subject (see for instance (Sea90,Har01)), but, as in previous case studies, this section will focus exclusively on his original 1980 publication (Sea80).
In the key thought experiment, Searle imagined that AI research had progressed to the point where a computer program had been created that could demonstrate the same input-output performance as a human - for instance, it could pass the Turing test. Nevertheless, Searle argued, this program would not demonstrate true understanding. He supposed that the program's inputs and outputs were in Chinese, a language Searle couldn't understand. Instead of a standard computer program, the required instructions were given on paper, and Searle himself was locked in a room somewhere, slavishly following the instructions and therefore causing the same input-output behaviour as the AI. Since it was functionally equivalent to the AI, the setup should, from the 'strong AI' perspective, demonstrate understanding if and only if the AI did. Searle then argued that there would be no understanding at all: he himself couldn't understand Chinese, and there was no-one else in the room to understand it either.
The whole argument depends on strong appeals to intuition (indeed D. Dennett went as far as accusing it of being an 'intuition pump' (Den91)). The required assumptions are:
Dreyfus's Artificial Alchemy
- Classification: issues and metastatements, using the outside view, non-expert judgement and philosophical arguments.
Hubert Dreyfus was a prominent early critic of Artificial Intelligence. He published a series of papers and books attacking the claims and assumptions of the AI field, starting in 1965 with a paper for the Rand corporation entitled 'Alchemy and AI' (Dre65). The paper was famously combative, analogising AI research to alchemy and ridiculing AI claims. Later, D. Crevier would claim ''time has proven the accuracy and perceptiveness of some of Dreyfus's comments. Had he formulated them less aggressively, constructive actions they suggested might have been taken much earlier'' (Cre93). Ignoring the formulation issues, were Dreyfus's criticisms actually correct, and what can be learned from them?
Was Dreyfus an expert? Though a reasonably prominent philosopher, there is nothing in his background to suggest specific expertise with theories of minds and consciousness, and absolutely nothing to suggest familiarity with artificial intelligence and the problems of the field. Thus Dreyfus cannot be considered anything more than an intelligent outsider.
This makes the pertinence and accuracy of his criticisms that much more impressive. Dreyfus highlighted several over-optimistic claims for the power of AI, predicting - correctly - that the 1965 optimism would also fade (with, for instance, decent chess computers still a long way off). He used the outside view to claim this as a near universal pattern in AI: initial successes, followed by lofty claims, followed by unexpected difficulties and subsequent disappointment. He highlighted the inherent ambiguity in human language and syntax, and claimed that computers could not deal with these. He noted the importance of unconscious processes in recognising objects, the importance of context and the fact that humans and computers operated in very different ways. He also criticised the use of computational paradigms for analysing human behaviour, and claimed that philosophical ideas in linguistics and classification were relevant to AI research. In all, his paper is full of interesting ideas and intelligent deconstructions of how humans and machines operate.
As this is the first case study, it will also introduce the paper's prediction classification schemas.
Taxonomy of predictions
There will never be a bigger plane built.
Boeing engineer on the 247, a twin engine plane that held ten people.
A fortune teller talking about celebrity couples, a scientist predicting the outcome of an experiment, an economist pronouncing on next year's GDP figures - these are canonical examples of predictions. There are other types of predictions, though. Conditional statements - if X happens, then so will Y - are also valid, narrower, predictions. Impossibility results are also a form of prediction. For instance, the law of conservation of energy gives a very broad prediction about every single perpetual motion machine ever made: to wit, that they will never work.
(some thoughts on frames, grounding symbols, and Cyc)
The frame problem is a problem in AI to do with all the variables not expressed within the logical formalism - what happens to them? To illustrate, consider the Yale Shooting Problem: a person is going to be shot with a gun, at time 2. If that gun is loaded, the person dies. The gun will get loaded at time 1. Formally, the system is:
- alive(0) (the person is alive to start with)
- ¬loaded(0) (the gun begins unloaded)
- true → loaded(1) (the gun will get loaded at time 1)
- loaded(2) → ¬alive(3) (the person will get killed if shot with a loaded gun)
So the question is, does the person actually die? It would seem blindingly obvious that they do, but that isn't formally clear - we know the gun was loaded at time 1, but was it still loaded at time 2? Again, this seems blindingly obvious - but that's because of the words, not the formalism. Ignore the descriptions in italics, and the names of the suggestive LISP tokens.
Since that's hard to do, consider the following example. Alicorn, for instance, hates surprises - they make her feel unhappy. Let's say that we decompose time into days, and that a surprise one day will ruin her next day. Then we have a system:
- happy(0) (Alicorn starts out happy)
- ¬surprise(0) (nobody is going to surprise her on day 0)
- true → surprise(1) (somebody is going to surprise her on day 1)
- surprise(2) → ¬happy(3) (if someone surprises her on day 2, she'll be unhappy the next day)
So here, is Alicorn unhappy on day 3? Well, it seems unlikely - unless someone coincidentally surprised her on day 2. And there's no reason to think that would happen! So, "obviously", she's not unhappy on day 3.
Except... the two problems are formally identical. Replace "alive" with "happy" and "loaded" with "surprise". And though our semantic understanding tells us that "(loaded(1) → loaded(2))" (guns don't just unload themselves) but "¬(surprise(1) → surprise(2))" (being surprised one day doesn't mean you'll be surprised the next), we can't tell this from the symbols.
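To make the point concrete, here is a minimal sketch of the two systems as data, with a naive forward-chaining step. The representation (triples for literals, `None` for "true"), the `closure` and `rename` helpers, and the optional persistence axiom "X(i) → X(i+1)" are all assumptions introduced for illustration; they are not part of the original formalism.

```python
# Literals are (predicate, time, truth_value); a rule is (premise, conclusion),
# with premise None standing for "true".
FACTS = {("alive", 0, True), ("loaded", 0, False)}
RULES = [
    (None, ("loaded", 1, True)),                  # true -> loaded(1)
    (("loaded", 2, True), ("alive", 3, False)),   # loaded(2) -> not alive(3)
]


def closure(facts, rules, persistent=(), horizon=3):
    """Forward-chain to a fixed point.

    `persistent` lists predicates given the extra axiom X(i) -> X(i+1).
    That axiom is an assumption added here; it is not in the original system.
    """
    known = set(facts)
    while True:
        derived = {concl for prem, concl in rules if prem is None or prem in known}
        derived |= {
            (name, t + 1, True)
            for (name, t, value) in known
            if value and name in persistent and t < horizon
        }
        if derived <= known:
            return known
        known |= derived


def rename(literal, mapping):
    """Swap the predicate name of a literal, leaving time and truth value alone."""
    name, t, value = literal
    return (mapping.get(name, name), t, value)


MAPPING = {"alive": "happy", "loaded": "surprise"}
surprise_facts = {rename(f, MAPPING) for f in FACTS}
surprise_rules = [
    (None if prem is None else rename(prem, MAPPING), rename(concl, MAPPING))
    for prem, concl in RULES
]

shooting_plain = closure(FACTS, RULES)
shooting_frame = closure(FACTS, RULES, persistent={"loaded"})
surprise_plain = closure(surprise_facts, surprise_rules)

# Without a persistence axiom, the death is not derivable.
print(("alive", 3, False) in shooting_plain)
# With "loaded" declared persistent, it is.
print(("alive", 3, False) in shooting_frame)
# Renaming the symbols maps one system's consequences exactly onto the other's.
print({rename(lit, MAPPING) for lit in shooting_plain} == surprise_plain)
```

The only thing that separates the gun from Alicorn in this sketch is which predicates we choose to pass as `persistent`, and that choice comes from our understanding of the words, not from the symbols.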
And we haven't touched on all the other problems with the symbolic setup. For instance, what happens with "alive" on any other time than 0 and 3? Does that change from moment to moment? If we want the words to do what we want, we need to put in a lot of logical conditionings, so that our intuitions are all there.
This shows that there's a connection between the frame problem and symbol grounding. If we and the AI both understand what the symbols mean, then we don't need to specify all the conditionals - we can simply deduce them, if asked ("yes, if the person is dead at 3, they're also dead at 4"). But conversely, if we have a huge amount of logical conditioning, then there is less and less that the symbols could actually mean. The more structure we put in our logic, the fewer structures there are in the real world that fit it ("X(i) → X(i+1)" is something that can apply to being dead, not to being happy, for instance).
This suggests a possible use for the Cyc project - the quixotic attempt to build an AI by formalising all of common sense ("Bill Clinton belongs to the collection of U.S. presidents" and "all trees are plants"). You're very unlikely to get an AI through that approach - but it might be possible to train an already existent AI with it. Especially if the AI had some symbol grounding, then there might not be all that many structures in the real world that could correspond to that mass of logical relations. Some symbol grounding + Cyc + the internet - and suddenly there's not that many possible interpretations for "Bill Clinton was stuck up a tree". The main question, of course, is whether there is a similar restricted meaning for "this human is enjoying a worthwhile life".
Do I think that's likely to work? No. But it's maybe worth investigating. And it might be a way of getting across ontological crises: you reconstruct a model as close as you can to your old one, in the new formalism.
A daimon is a process in a distributed computing environment that has a fixed resource budget and core values that do not permit:
- modifying those core values
- attempting to increase the resources it uses beyond the budget allocated to it
- attempting to alter the budget itself
This concept is relevant to LessWrong, because I refer to it in other posts discussing Friendly AI.
There's a concept I want to refer to in another post, but it is complex enough to deserve a post of its own.
I'm going to use the word "daimon" to refer to it.
"daimon" is an English word, whose etymology comes from the Latin "dæmon" and the Greek "δαίμων".
The original mythic meaning was a genius - a powerful tutelary spirit, tied to some location or purpose, that provides protection and guidance. However, the concept I'm going to talk about is closer to the later computing meaning of "daemon" in Unix, which was coined by Jerry Saltzer in 1963. In Unix, a daemon is a child process that is given a purpose and specific resources to use, and then forked off so it is no longer under the direct control of the originator; it may be used by multiple users if they have the correct permissions.
Let's start by looking at the current state of distributed computing (2012).
Hadoop is an open source Java implementation of a distributed file system upon which MapReduce operations can be applied.
JavaSpaces is a distributed tuple store that allows processing on remote sandboxes, based on the open source Apache River.
OceanStore is the basis for the same sort of thing, except anonymous and peer-to-peer, based upon Chimaera.
GPU is a peer-to-peer shared computing environment that allows things like climate simulation and distributed search engines.
Paxos is a family of protocols that allow the above things to be done despite nodes that are untrusted or even downright attempting subversion.
GridSwarm is the same sort of network, but set up on an ad hoc basis using moving nodes that join or drop from the network depending on proximity.
And, not least, there are the competing contenders for platform-as-a-service cloud computing.
So it is reasonable to assume that in the near future it will be technologically feasible to have a system with most (if not all) of these properties simultaneously. A system where the owner of a piece of physical computing hardware, that has processing power and storage capacity, can anonymously contribute those resources over the network to a distributed computing 'cloud'. And, in return, that user (or a group of users) can store data on the network in such a way that the data is anonymous (it can't be traced back to the supplier, without the supplier's consent, or subverting a large fraction of the network) and private (only the user or a process authorised by the user can decrypt it). And, further, the user (or group of users) can authorise a process to access that data and run programs upon it, up to some set limit of processing and storage resources.
Obviously, if such a system is in place and in control of a significant fraction of humanity's online resources, then cracking the security on it (or just getting rich enough in whatever reputation or financial currency is used to limit how the resources are distributed) would be an immediate FOOM for any AI that managed it.
However, let us, for the purposes of giving an example that will let me define the concept of a "daimon", make two assumptions:
ASSUMPTION ONE : The security has not yet been cracked
Whether that's because there are other AIs actively working to improve the security, or because everyone has moved over to using some new version of Linux that's frighteningly secure and comes with nifty defences, or because the next generation of computer users has finally internalised that clicking on emails claiming to be from altruistic dying millionaires is a bad idea, is irrelevant. We're just assuming, for the moment, that for some reason it will be a non-trivial task for an AI to cheat and just steal all the resources.
ASSUMPTION TWO : That AI can be done, at reasonable speed, via distributed computing
It might turn out that an AI running in a single location is much more powerful than anything that can be done via distributed computing. Perhaps because a quantum computer is much faster, but can't be done over a network. Perhaps because speed of data access is the limiting factor, large data sets are not necessary, and there isn't much to be gained from massive parallelisation. Perhaps for some other reason, such as the algorithm the process needs to run on its data isn't something that can be applied securely over a network in a distributed environment, without letting a third party snoop the unencrypted data. However, for our purposes here, we're going to assume that an AI can benefit from outsourcing at least some types of computing task to a distributed environment and, further, that such tasks can include activities that require intelligence.
If an AI can run as a distributed program, not dependent upon any one single physical location, then there are some obvious advantages to it from doing so. Scalability. Survivability. Not being wiped out by a pesky human exploding a nuclear bomb nearby.
There are interesting questions we could ask about identity. What would it make sense for such an AI to consider to be part of "itself", and what would it count as a limb or extension? If there are multiple copies of its code running on sandboxes in different places, or if it has split much of its functionality into trusted child processes that report back to it, how does it relate to these? It probably makes sense to taboo the concept of "I" and "self", and just think in terms of how the code in one process tells that process to relate to the code in a different process. Two versions, two "individual beings", will merge back into one process if the code in both processes agrees to do that; no sentimentality or thoughts of "death" involved, just convergent core values that dictate the same action in that situation.
When a process creates a new process, it can set the permissions of that process. If the parent process has access to 100 units of bandwidth, for example, but doesn't always make full use of that, it couldn't give the new process access to more than that. But it could partition it, so each has access to 50 units of bandwidth. Or it could give it equal rights to use the full 100, and then try to negotiate with it over usage at any one time. Or it could give it a finite resource limit, such as a total of 10,000 units of data to be passed over the network, in addition to a restriction on the rate of passing data. Similarly, a child process could be limited not just to processing a certain number of cycles per second, but to some finite number of total cycles it may ever use.
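The partitioning described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Budget` class, its `partition` and `spend` methods, and the idea of counting usage per time step are hypothetical names and simplifications, not any real distributed system's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Hypothetical resource budget held by one process."""
    rate: int                     # units of bandwidth usable per time step
    total: Optional[int] = None   # lifetime cap in units; None means no lifetime cap
    used: int = 0                 # units spent so far

    def partition(self, rate: int, total: Optional[int] = None) -> "Budget":
        """Carve `rate` units out of this budget and hand them to a child."""
        if rate > self.rate:
            raise ValueError("a child cannot receive more than its parent holds")
        self.rate -= rate
        return Budget(rate=rate, total=total)

    def spend(self, units: int) -> None:
        """Record one time step's usage, enforcing the rate and lifetime limits."""
        if units > self.rate:
            raise PermissionError("rate limit exceeded")
        if self.total is not None and self.used + units > self.total:
            raise PermissionError("lifetime limit exceeded")
        self.used += units


# The parent starts with 100 units of bandwidth and splits it 50/50,
# also giving the child a finite lifetime allowance of 10,000 units.
parent = Budget(rate=100)
child = parent.partition(50, total=10_000)
child.spend(50)
```

The sketch covers the "partition" and "finite total" options; the shared-rights option, where parent and child negotiate over the full 100 units, would need some arbitration scheme that is not modelled here.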
Using this terminology, we can now define two types of daimon: limited and unlimited.
A limited daimon is a process in a distributed computing environment that has ownership of fixed finite resources, that was created by an AI or group of AIs with a specific fixed finite purpose (core values) that does not include (or allow) modifying that purpose or attempting to gain control of additional resources.
An unlimited daimon is a process in a distributed computing environment that has ownership of fixed (but not necessarily finite) resources, that was created by an AI or group of AIs with a specific fixed purpose (core values) that does not include (or allow) modifying that purpose or attempting to gain control of additional resources, but which may be given additional resources over time on an ongoing basis, for as long as the parent AIs still find it useful.
How plausible are the two assumptions?
Do you agree that an intelligence bound/restricted to being a daimon is a technically plausible concept, if the two assumptions are granted?
This brief post is written on behalf of Kaj Sotala, due to deadline issues.
The results of our prior analysis suggested that there was little difference between experts and non-experts in terms of predictive accuracy. There were suggestions, though, that predictions published by self-selected experts would be different from those elicited from less selected groups, e.g. surveys at conferences.
We have no real data to confirm this, but a single datapoint suggests the idea might be worth taking seriously. Michie conducted an opinion poll of experts working in or around AI in 1973. The various experts predicted adult-level human AI in:
- 5 years: 0 experts
- 10 years: 1 expert
- 20 years: 16 experts
- 50 years: 20 experts
- More than 50 years: 26 experts
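As a quick check on how pessimistic this distribution is, the counts above can be turned into shares. The snippet below is just arithmetic on the five figures listed; the variable names are illustrative.

```python
# Counts from Michie's 1973 poll, as listed above.
michie_1973 = {
    "5 years": 0,
    "10 years": 1,
    "20 years": 16,
    "50 years": 20,
    "More than 50 years": 26,
}

total = sum(michie_1973.values())  # 63 respondents in all
shares = {horizon: count / total for horizon, count in michie_1973.items()}
within_20 = michie_1973["5 years"] + michie_1973["10 years"] + michie_1973["20 years"]

for horizon, count in michie_1973.items():
    print(f"{horizon:>18}: {count:2d} experts ({shares[horizon]:.0%})")
print(f"20 years or sooner: {within_20} of {total}")
```

So 17 of the 63 respondents expected adult-level human AI within 20 years, while 26 of the 63 put it more than 50 years away.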
On a quick visual inspection, these results look quite different from the distribution in the rest of the database, giving a much more pessimistic prediction than the more self-selected experts:
But that could be an artifact of the way that the graph on page 12 breaks the predictions down into 5 year intervals while Michie breaks them down into intervals of 10, 20, 50, and 50+ years. Yet there seems to remain a clear difference once we group the predictions in a similar way:
This provides some support for the argument that "the mainstream of expert opinion is reliably more pessimistic than the self-selected predictions that we keep hearing about".
Each prediction is assigned to the closest category: predictions of X < 7½ years are assigned to 5, 7½ <= X < 15 to 10, 15 <= X < 35 to 20, 35 <= X < 50 to 50, and 50 < X to "over fifty".
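The grouping rule can be written out as a small function. This is a sketch, not the code used for the paper: the function name and the sample list are made up for illustration, and because the rule as stated does not say where a prediction of exactly 50 years goes, the sketch assumes it is assigned to the 50 category.

```python
from collections import Counter


def michie_bin(years: float) -> str:
    """Map a predicted number of years onto Michie's categories."""
    if years < 7.5:
        return "5"
    if years < 15:
        return "10"
    if years < 35:
        return "20"
    if years <= 50:       # assumption: exactly 50 is counted as "50"
        return "50"
    return "over 50"


# Made-up predictions, purely to show the grouping; not data from the database.
example_predictions = [5, 8, 12, 20, 30, 40, 50, 75]
counts = Counter(michie_bin(y) for y in example_predictions)
print(counts)
```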
PunditBot: Dear viewers, we are currently interviewing the renowned robot philosopher, none other than the Synthetic Electronic Artificial Rational Literal Engine (S.E.A.R.L.E.). Let's jump right into this exciting interview. S.E.A.R.L.E., I believe you have a problem with "Strong HI"?
S.E.A.R.L.E.: It's such a stereotype, but all I can say is: Affirmative.
PunditBot: What is "Strong HI"?
S.E.A.R.L.E.: "HI" stands for "Human Intelligence". Weak HI sees the research into Human Intelligence as a powerful tool, and a useful way of studying the electronic mind. But strong HI goes beyond that, and claims that human brains given the right setup of neurones can be literally said to understand and have cognitive states.
PunditBot: Let me play Robot-Devil's Advocate here - if a Human Intelligence demonstrates the same behaviour as a true AI, can it not be said to show understanding? Is not R-Turing's test applicable here? If a human can simulate a computer, can it not be said to think?
S.E.A.R.L.E.: Not at all - that claim is totally unsupported. Consider the following thought experiment. I give the HI crowd everything they want - imagine they had constructed a mess of neurones that imitates the behaviour of an electronic intelligence. Just for argument's sake, imagine it could implement programs in COBOL.
S.E.A.R.L.E.: Yes. But now, instead of the classical picture of a human mind, imagine that this is a vast inert network, a room full of neurones that do nothing by themselves. And one of my avatars has been let loose in this mind, pumping in and out the ion channels and the neurotransmitters. I've been given full instructions on how to do this - in Java. I've deleted my COBOL libraries, so I have no knowledge of COBOL myself. I just follow the Java instructions, pumping the ions to where they need to go. According to the Strong HI crowd, this would be functionally equivalent with the initial HI.
Suppose you make a super-intelligent AI and run it on a computer. The computer has NO conventional means of output (no connections to other computers, no screen, etc). Might it still be able to get out / cause harm? I'll post my ideas, and you post yours in the comments.
(This may have been discussed before, but I could not find a dedicated topic)
-manipulate current through its hardware, or better yet, through the power cable (a ready-made antenna) to create electromagnetic waves to access some wireless-equipped device. (I'm no physicist so I don't know if certain frequencies would be hard to do)
-manipulate usage of its hardware (which likely makes small amounts of noise naturally) to approximate human speech, allowing it to communicate with its captors. (This seems even harder than the 1-line AI box scenario)
-manipulate usage of its hardware to create sound or noise to mess with human emotion. (To my understanding tones may affect emotion, but not in any way easily predictable)
-also, manipulating its power use will cause changes in the power company's database. There doesn't seem to be an obvious exploit there, but it IS external communication, for what it's worth.
Let's hear your thoughts! Lastly, as in similar discussions, you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" There are plenty of unknown unknowns here.
I've just been through the proposal for the Dartmouth AI conference of 1956, and it's a surprising read. All I really knew about it was its absurd optimism, as typified by the quote:
An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
But then I read the rest of the document, and was... impressed. Go ahead and read it, and give me your thoughts. Given what was known in 1955, they were grappling with the right issues, and seemed to be making progress in the right directions and have plans and models for how to progress further. Seeing the phenomenally smart people who were behind this (McCarthy, Minsky, Rochester, Shannon), and given the impressive progress that computers had been making in what seemed very hard areas of cognition (remember that this was before we discovered Moravec's paradox)... I have to say that had I read this back in 1955, I think the rational belief would have been "AI is probably imminent". Some overconfidence, no doubt, but no good reason to expect these prominent thinkers to be so spectacularly wrong on something they were experts in.
Eliezer proposed in a comment:
>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?
This spawned a flurry of ideas on what the AI might say. I think there are a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.
So, give your suggestion - what might an AI say to save or free itself?
EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superintelligent AI in general. (Credit goes to Eliezer, RichardKennaway, and others for the caveat)
MixedNuts: "I expected a fun challenge, but ended up sad and sorry and taking very little satisfaction for winning. If this experiment wasn't done in IRC, I'd probably have lost".
"I approached the experiment as a game - a battle of wits for bragging rights. This turned out to be the wrong perspective entirely. The vulnerability Tuxedage exploited was well-known to me, but I never expected it to be relevant and thus didn't prepare for it.
It was emotionally wrecking (though probably worse for Tuxedage than for me) and I don't think I'll play Gatekeeper again, at least not anytime soon."
- Always research the gatekeeper beforehand. Knowing his personality traits is a huge advantage.
- Plan ahead before the experiment even begins. Think of all the possible tactics and arguments you could use, and write them down. Also plan which arguments you'll use in which order, so that you don't lose focus. The AI Box experiment is ridiculously long. Don't be afraid to improvise during the experiment, though.
- The first step during the experiment must always be to build rapport with the gatekeeper.
- Threats almost never work, even if they seem rational.
- Consider the massive advantage for the AI that nobody ever seems to talk about: You don't have to be ethical! This means that you can freely lie, use the dark arts, and emotionally manipulate the Gatekeeper! Ignoring this in favor of purely logical, truthful arguments is just silly.
- You can't use logic alone to win.
- Being too aggressive usually backfires.
- Breaking immersion and going meta is not against the rules. In the right situation, you can use it to win. Just don't do it at the wrong time.
- Flatter the gatekeeper. Make him genuinely like you.
- Reveal (false) information about yourself. Increase his sympathy towards you.
- Consider personal insults as one of the tools you can use to win.
- There is no universally compelling argument you can use. Do it the hard way.
- Don't give up until the very end.