[Link] If we knew about all the ways an Intelligence Explosion could go wrong, would we be able to avoid them?
https://www.reddit.com/r/LessWrong/comments/2icm8m/if_we_knew_about_all_the_ways_an_intelligence/
I submitted this a while back to the lesswrong subreddit, but it occurs to me now that most LWers probably don't actually check the sub. So here it is again in case anyone that's interested didn't see it.
The Truth and Instrumental Rationality
One of the central focuses of LW is instrumental rationality. It's been suggested, rather famously, that this isn't about having true beliefs, but rather it's about "winning". Systematized winning. True beliefs are often useful to this goal, but an obsession with "truthiness" is seen as counter-productive. The brilliant scientist or philosopher may know the truth, yet be ineffective. This is seen as unacceptable to many who see instrumental rationality as the critical path to achieving one's goals. Should we all discard our philosophical obsession with the truth and become "winners"?
The River Instrumentus
You are leading a group of five people away from a deadly threat which is slowly advancing behind you. You come to a river. It looks too dangerous to wade through, but through the spray of the water you see a number of stones. They are dotted across the river in a way that might allow you to cross. However, the five people you are helping are extremely nervous and in order to convince them to cross, you will not only have to show them it's possible to cross, you will also need to look calm enough after doing it to convince them that it's safe. All five of them must cross, as they insist on living or dying together.
Just as you are about to step out onto the first stone it splutters and moves in the mist of the spraying water. It looks a little different from the others, now you think about it. After a moment you realise it's actually a person, struggling to keep their head above water. Your best guess is that this person would probably drown if they got stepped on by five more people. You think for a moment, and decide that, being a consequentialist concerned primarily with the preservation of life, it is ultimately better that this person dies so the others waiting to cross might live. After all, what is one life compared with five?
However, given your need for calm and the horror of their imminent death at your hands (or feet), you decide it is better not to think of them as a person, and so you instead imagine them being simply a stone. You know you'll have to be really convincingly calm about this, so you look at the top of the head for a full hour until you utterly convince yourself that the shape you see before you is factually indicative not of a person, but of a stone. In your mind, tops of heads aren't people - now they're stones. This is instrumentally rational - when you weigh things up the self-deception ultimately increases the number of people who will likely live, and there is no specific harm you can identify as a result.
After you have finished convincing yourself you step out onto the per... stone... and start crossing. However, as you step out onto the subsequent stones, you notice they all shift a little under your feet. You look down and see the stones spluttering and struggling. You think to yourself "lucky those stones are stones and not people, otherwise I'd be really upset". You lead the five very grateful people over the stones and across the river. Twenty dead stones drift silently downstream.
When we weigh situations on pure instrumentality, a small self-deception makes sense. The only problem is, in an ambiguous and complex world, self-deceptions have a notorious way of compounding each other, and leave a gaping hole for cognitive bias to work its magic. Many false but deeply-held beliefs throughout human history have been quite justifiable on these grounds. Yet when we forget the value of truth, we can be instrumental, but we are not instrumentally rational. Rationality implies, or ought to imply, a value of the truth.
Winning and survival
In the jungle of our evolutionary childhood, humanity formed groups to survive. In these groups there was a hierarchy of importance, status and power. Predators, starvation, rival groups and disease all took the weak on a regular basis, but the groups afforded a partial protection. However, a violent or unpleasant death still remained a constant threat. It was of particular threat to the lowest and weakest members of the group. Sometimes these individuals were weak because they were physically weak. However, over time groups that allowed and rewarded things other than physical strength became more successful. In these groups, discussion played a much greater role in power and status. The truly strong individuals, the winners in this new arena, were the ones who could direct conversation in their favour - conversations about who would do what, about who got what, and about who would be punished for what. Debates were fought with words, but they could end in death all the same.
In this environment, one's social status was intertwined with one's ability to win. In a debate, it was not so much a matter of what was true, but of what facts and beliefs achieved one's goals. Supporting the factual position that suited one's own goals was most important. Even where the stakes were low or irrelevant, it paid to prevail socially, because one's reputation guided others' limited cognition about who was best to listen to. Winning didn't mean knowing the most, it meant social victory. So when competition bubbled to the surface, it paid to ignore what one's opponent said and instead focus on appearing superior in any way possible. Sure, truth sometimes helped, but for the charismatic it was strictly optional. Politics was born.
Yet as groups got larger, and as technology began to advance for the first time, there appeared a new phenomenon. Where a group's power dynamics meant that it systematically had false beliefs, it became more likely to fail. The group that believed that fire spirits guided a fire's advancement fared poorly compared with those who checked the wind and planned their means of escape accordingly. The truth finally came into its own. Yet truth, as opposed to simple belief by politics, could not be so easily manipulated for personal gain. The truth had no master. In this way it was both dangerous and liberating. And so slowly but surely the capacity for complex truth-pursuit became evolutionarily impressed upon the human blueprint.
However, in evolutionary terms there was little time for the completion of this new mental state. Some people had it more than others. It also required the right circumstances for it to rise to the forefront of human thought. And other conditions could easily destroy it. For example, should a person's thoughts be primed with an environment of competition, the old ways came bubbling up to the surface. When a person's environment is highly competitive, their mind reverts to its primitive state. Learning and updating of views becomes increasingly difficult, because to the more primitive aspects of a person's social brain, updating one's views is a social defeat.
When we focus an organisation's culture on winning, there can be many benefits. It can create an air of achievement, to a degree. Hard work and the challenging of norms can be increased. However, we also prime the brain for social conflict. We create an environment where complexity and subtlety in conversation, and consequently in thought, is greatly reduced. In organisations where the goals and means are largely intellectual, a competitive environment creates useless conversations, meaningless debates, pointless tribalism, and little meaningful learning. There are many great examples, but I think you'd be best served watching our elected representatives at work to gain a real insight.
Rationality and truth
Rationality ought to contain an implication of truthfulness. Without it, our little self-deceptions start to gather and compound one another. Slowly but surely, they start to reinforce, join, and form an unbreakable, unchallengeable yet utterly false belief system. I need not point out the more obvious examples, for in human society, there are many. To avoid this on LW and elsewhere, truthfulness of belief ought to inform all our rational decisions, methods and goals. Of course true beliefs do not guarantee influence or power or achievement, or anything really. In a world of half-evolved truth-seeking equipment, why would we expect that? What we can expect is that, if our goals are anything to do with the modern world in all its complexity, the truth isn't sufficient, but it is necessary.
Instrumental rationality is about achieving one's goals, but in our complex world goals manifest in many ways - and we can never really predict how a false belief will distort our actions to utterly destroy our actual achievements. In the end, without truth, we never really see the stones floating down the river for what they are.
A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more)
Friendly AI is an idea that I find to be an admirable goal. While I'm not yet sure an intelligence explosion is likely, or whether FAI is possible, I've found myself often thinking about it, and I'd like for my first post to share a few of those thoughts on FAI with you.
Safe AGI vs Friendly AGI
-Let's assume an Intelligence Explosion is possible for now, and that an AGI with the ability to improve itself somehow is enough to achieve it.
-Let's define a safe AGI as an above-human general AI that does not threaten humanity or terran life (eg. FAI, Tool AGI, possibly Oracle AGI)
-Let's define a Friendly AGI as one that *ensures* the continuation of humanity and terran life.
-Let's say an unsafe AGI is all other AGIs.
-Safe AGIs must suppress unsafe AGIs in order to be considered Friendly. Here's why:
-An unsafe AGI is likely to be built at that point because:
-Some people will find the safe AGI's goals unacceptable
-Some people will rationalise, or simply be mistaken, that their AGI design is safe when it is not
-Some people will not care if their AGI design is safe, because they do not care about other people, or because they hold some extreme beliefs
-Therefore, if a safe AGI does not prevent unsafe AGIs from coming into existence, humanity will very likely be destroyed.
-The AGI most likely to prevent unsafe AGIs from being created is one that actively predicts their development and terminates that development before or on completion.
-So to summarise:
-Oracle and Tool AGIs are not Friendly AIs, they are just safe AIs, because they don't suppress anything.
-Oracle and Tool AGIs are a bad plan for AI if we want to prevent the destruction of humanity, because hostile AGIs will surely follow.
(**On reflection I cannot be certain of this specific point, but I assume it would take a fairly restrictive regime for this to be wrong. Further comments on this very welcome.)
Other minds problem - Why we should be philosophically careful when attempting to theorise about FAI
I read quite a few comments in AI discussions that I'd probably characterise as "the best utility function for a FAI is one that values all consciousness". I'm quite concerned that this persists as a deeply held and largely unchallenged assumption amongst some FAI supporters. I think in general I find consciousness to be an extremely contentious, vague and inconsistently defined concept, but here I want to talk about some specific philosophical failures.
My first concern is that while many AI theorists like to say that consciousness is a physical phenomenon, which seems to imply Monist/Physicalist views, they at the same time don't seem to understand that consciousness is a Dualist concept that is coherent only in a Dualist framework. A Dualist believes there is a thing called a "subject" (very crudely this equates with the mind) and then things called objects (the outside "empirical" world interpreted by that mind). Most of this reasoning begins with Descartes' cogito ergo sum or similar starting points ( https://en.wikipedia.org/wiki/Cartesian_dualism ). Subjective experience, qualia and consciousness make sense if you accept that framework. But if you're a Monist, this arbitrary distinction between a subject and object is generally something you don't accept. In the case of a Physicalist, there's just matter doing stuff. A proper Physicalist doesn't believe in "consciousness" or "subjective experience", there's just brains and the physical human behaviours that occur as a result. Your life exists from a certain point of view, I hear you say? The Physicalist replies, "well a bunch of matter arranged to process information would say and think that, wouldn't it?".
I don't really want to get into whether Dualism or Monism is correct/true, but I want to point out that even if you try to avoid this by deciding Dualism is right and consciousness is a thing, there's yet another more dangerous problem. The core of the problem is that logically or empirically establishing the existence of minds other than your own is extremely difficult (impossible according to many). They could just be physical things walking around acting similar to you, but by virtue of something purely mechanical - without actual minds. In philosophy this is called the "other minds problem" ( https://en.wikipedia.org/wiki/Problem_of_other_minds or http://plato.stanford.edu/entries/other-minds/). I recommend a proper read of it if the idea seems crazy to you. It's a problem that's been around for centuries, and yet to date we don't really have any convincing solution (there are some attempts but they are highly contentious and IMHO also highly problematic). I won't get into it more than that for now, suffice to say that not many people accept that there is a logical/empirical solution to this problem.
Now extrapolate that to an AGI, and the design of its "safe" utility functions. If your AGI is designed as a Dualist (which is necessary if you wish to incorporate "consciousness", "experience" or the like into your design), then you build in a huge risk that the AGI will decide that other minds are unprovable or do not exist. In this case your friendly utility function designed to protect "conscious beings" fails and the AGI wipes out humanity because it poses a non-zero threat to the only consciousness it can confirm - its own. For this reason I feel "consciousness", "awareness", "experience" should be left out of FAI utility functions and designs, regardless of the truth of Monism/Dualism, in favour of more straightforward definitions of organisms, intelligence, observable emotions and intentions. (I personally favour conceptualising any AGI as a sort of extension of biological humanity, but that's a discussion for another day.) My greatest concern is that there is such strong cultural attachment to the concept of consciousness that researchers will be unwilling to properly question the concept at all.
What if we're not alone?
It seems a little unusual to throw alien life into the mix at this point, but I think it's justified because an intelligence explosion really puts an interstellar existence well within our civilisation's grasp. Because it seems that an intelligence explosion implies a very high rate of change, it makes sense to start considering even the long-term implications early, particularly if the consequences are very serious, as I believe they may be in this realm of things.
Let's say we successfully achieved a FAI. In order to fulfill its mission of protecting humanity and the biosphere, it begins expanding, colonising and terraforming other planets for potential habitation by Earth-originating life. I would expect this expansion wouldn't really have a limit, because the more numerous the colonies, the less likely it is we could be wiped out by some interstellar disaster.
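To make the "more colonies, less risk" intuition a bit more concrete, here is a minimal sketch. It rests on an assumption that is mine and not part of the argument above: each colony is lost to a disaster independently, with the same probability. Under that assumption the chance of losing every colony at once is just that probability raised to the number of colonies. The function name and the numbers are purely illustrative.

```python
def extinction_probability(per_colony_loss: float, colonies: int) -> float:
    """Chance that every colony is lost, ASSUMING independent, equally likely losses."""
    if not 0.0 <= per_colony_loss <= 1.0:
        raise ValueError("per_colony_loss must be a probability in [0, 1]")
    if colonies < 1:
        raise ValueError("colonies must be at least 1")
    # Independent events: multiply the per-colony loss probability once per colony.
    return per_colony_loss ** colonies


if __name__ == "__main__":
    # Illustrative numbers only; nobody knows the real per-colony risk.
    for n in (1, 2, 5, 10):
        print(n, extinction_probability(0.1, n))
```

The result shrinks with every extra colony but never reaches zero, which is one way to read the "no limit" point. A disaster that hits many colonies at once (correlated risk) would break the independence assumption and make the real number worse than this sketch suggests.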
Of course, we can't really rule out the possibility that we're not alone in the universe, or even the galaxy. If we make it as far as AGI, then it's possible another alien civilisation might reach a very high level of technological advancement too. Or there might be many. If our FAI is friendly to us but basically treats them as paperclip fodder, then potentially that's a big problem. Why? Well:
-Firstly, while a species' first loyalty is to itself, we should consider that it might be morally undesirable to wipe out alien civilisations, particularly as they might be in some distant way "related" (see panspermia) to our own biosphere.
-Secondly, there are conceivable scenarios where alien civilisations might respond to this by destroying our FAI/Earth/the biosphere/humanity. The reason is fairly obvious when you think about it. An expansionist AGI could be reasonably viewed as an attack or possibly an act of war.
Let's go into a tiny bit more detail. Given that we've not been destroyed by any alien AGI just yet, I can think of a number of possible interstellar scenarios:
(1) There is no other advanced life
(2) There is advanced life, but it is inherently non-expansive (expand inwards, or refuse to develop dangerous AGI)
(3) There is advanced life, but they have not discovered AGI yet. There could potentially be a race-to-the-finish (FAI) scenario on.
(4) There are already expanding AGIs, but due to physical limits on the expansion rate, we are not aware of them yet. (this could use further analysis)
(5) One civilisation, or an allied group of civilisations, has developed FAIs and is dominant in the galaxy. They could be either:
(6) Dominators that tolerate civilisations so long as they remain primitive and non-threatening by comparison.
(7) Some sort of interstellar community that allows safe civilisations to join (this community still needs to stomp on dangerous potential rival AGIs)
In the case of (6) or (7), developing a FAI that isn't equipped to deal with alien life will probably result in us being liquidated, or at least partially sanitised in some way. In (1) (2) or (5), it probably doesn't matter what we do in this regard, though in (2) we should consider being nice. In (3) and probably (4) we're going to need a FAI capable of expanding very quickly and disarming potential AGIs (or at least ensuring they are FAIs from our perspective).
The upshot of all this is that we probably want to design safety features into our FAI so that it doesn't destroy alien civilisations/life unless it's a significant threat to us. I think the understandable reaction to this is something along the lines of "create an FAI that values all types of life" or "intelligent life" or something along these lines. I don't exactly disagree, but I think we must be cautious in how we formulate this too.
Say there are many different civilisations in the galaxy. What sort of criteria would ensure that, given some sort of zero-sum scenario, Earth life wouldn't be destroyed? Let's say there was some sort of tiny but non-zero probability that humanity could evade the FAI's efforts to prevent further AGI development. Or perhaps there was some loophole in the types of AGIs that humans were allowed to develop. Wouldn't it be sensible, in this scenario, for a universalist FAI to wipe out humanity to protect the countless other civilisations? Perhaps that is acceptable? Or perhaps not? Or less drastically, how does the FAI police warfare or other competition between civilisations? A slight change in the way life is quantified and valued could drastically change the outcome for humanity. I'd probably suggest we want to weight the FAI's values to start with human and Earth biosphere primacy, but then still give some non-zero weighting to other civilisations. There is probably more thought to be done in this area too.
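As a toy illustration of how sensitive this is to the weighting, here is a rough sketch. Everything in it is an assumption made up for the example: welfare is squashed into a single number per civilisation, the FAI simply picks whichever outcome has the higher weighted sum, and humanity's continued existence is imagined to cost every other civilisation a little welfare (0.9 instead of 1.0). It is not a proposal for a real utility function.

```python
def outcome_value(human_welfare: float, alien_welfare: float, n_aliens: int,
                  human_weight: float, alien_weight: float) -> float:
    """Weighted sum of welfare across humanity and n_aliens identical alien civilisations."""
    return human_weight * human_welfare + alien_weight * alien_welfare * n_aliens


def preferred_outcome(n_aliens: int, human_weight: float, alien_weight: float) -> str:
    """Compare two hypothetical outcomes and return the one with the higher weighted value."""
    # Outcome A: humanity survives, but (by assumption) poses a small risk to everyone else.
    keep = outcome_value(1.0, 0.9, n_aliens, human_weight, alien_weight)
    # Outcome B: humanity is removed and the other civilisations are fully safe.
    remove = outcome_value(0.0, 1.0, n_aliens, human_weight, alien_weight)
    return "keep humanity" if keep >= remove else "remove humanity"


if __name__ == "__main__":
    # Same galaxy of 100 alien civilisations, two different alien weightings.
    print(preferred_outcome(100, human_weight=1.0, alien_weight=0.05))
    print(preferred_outcome(100, human_weight=1.0, alien_weight=0.2))
```

With these made-up numbers, humanity is kept only while its weight exceeds one tenth of the combined alien weight, so moving the per-civilisation alien weight from 0.05 to 0.2 flips the decision. The non-zero weighting still matters in less extreme cases, but the primacy has to be large enough to survive being outnumbered.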
Simulation
I want to also briefly note that one conceivable way we might safely test Friendly AI designs is to simulate worlds/universes of less complexity than our own, make it likely that their inhabitants invent an AGI or FAI, and then closely study the results of these simulations. Then we could study failed FAI attempts with much greater safety. It also occurred to me that if we consider the possibility of our universe being a simulated one, then this is a conceivable scenario under which our simulation might be created. After all, if you're going to simulate something, why not something vital like modelling existential risks? I'm not yet sure of the implications exactly. Maybe we need to consider how it relates to our universe's continued existence, or perhaps it's just another case of Pascal's Mugging. Anyway I thought I'd mention it and see what people say.
A playground for FAI theories
I want to lastly mention this link (https://www.reddit.com/r/LessWrongLounge/comments/2f3y53/the_ai_game/). Basically it's a challenge for people to briefly describe an FAI goal-set, and for others to respond by telling them how that will all go horribly wrong. I want to suggest this is a very worthwhile discussion, not because its content will include rigorous theories that are directly translatable into utility functions, because very clearly it won't, but because a well developed thread of this kind would be a mixing pot of ideas and a good introduction to commonly known mistakes in thinking about FAI. We should encourage a slightly more serious version of this.
Thanks
FAI and AGI are very interesting topics. I don't consider myself able to really discern whether such things will occur, but it's an interesting and potentially vital topic. I'm looking forward to a bit of feedback on my first LW post. Thanks for reading!