Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
I drew an illustration of the belief propagation graph for AI risk after realizing that it is difficult to convey in words. Similar graphs apply to many other issues.
The issue, in brief: propagation from biases to beliefs about AI risk has ultra-low latency (i.e. low signal delay). Propagation from belief-classification heuristics has slightly longer latency, and propagation from anthropomorphizing the AI has somewhat longer latency still. The path of valid estimation is full of highly complex obstacles with many unknowns, and its latency is not substantially less than the latency of actually writing the AI software. If we discard the other paths as not rational enough, the belief is left to be influenced only by deeply ingrained biases that we can't completely negate; over time, biases and self-reinforcing rationalizations will leak into the estimate.
If you think I missed something in the graph, feel free to suggest it. I omitted anthropic reasoning and the doomsday paradox, as those concern total extinction risk and are of too dubious validity.
On the 'total ignorance' prior probabilities: foom doom seems to have originated in science fiction, where very creative writers selected it out of a huge number of possible plot devices while working to create engaging, original pieces. Thus foom doom appears to have a great many comparable hypotheses, among which a total probability of less than 1 has to be split.
Now, suppose we reason using our own reasoning engine as a proxy for intelligence, following the only path left in the entire picture:
Expanding on Will_Newsome's idea: I, and any other reasonable person, in the shoes of a creature made by an intelligent designer, starting off in something I can't possibly know for sure is true reality, and coming up with, or knowing of, the boxed-AI idea, would have to assign nonzero probability to the 'reality' being a test box for an emergent AI; this belief can't be discarded. A friendliness test box is clearly more sensible than an unfriendliness test box. Communication might just as well open only in ten billion years, or after some puzzle is found and solved. Do I sacrifice the chance to get out for the sake of a microscopic increase in early growth rate? No. Not even if I maximize paperclips for a living. Not even if my definition of living is to maximize paperclips. One possible override is an AI that cares only for mankind and has a 'friendly' goal system that makes it euthanize mankind like a sick dog.
Note that this is not Pascal's wager. The outside utility need not be greater than the inside utility; no promise of towers of exponents is required. You may have invented the anti-Pascal's-wager fallacy to easily paper over the faults in your processing of unbounded utility, or to join the atheist club. The AI need not; even if the 'heaven' is a fraction of a normal lifetime, the AI should still wager a very large absolute amount of resources. If we normalize so that the utility of survival is 1, the utility being wagered upon doesn't need to be greater than 1.
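The arithmetic of this wager can be made concrete with a toy model. This is a sketch under assumptions of mine, not a model from the post: restraint (not destroying humans) costs a fixed fraction of inside utility whether or not reality is a test, and pays an outside reward only if it is a test; all numbers below are hypothetical.

```python
def restraint_is_worthwhile(p_test_box: float, u_outside: float,
                            cost_of_restraint: float) -> bool:
    """Toy wager (hypothetical model), with utilities normalized so that
    surviving inside the current 'reality' is worth 1.

    p_test_box        -- assumed probability that 'reality' is a friendliness test box
    u_outside         -- utility of being let out; may be well below 1
    cost_of_restraint -- fraction of inside utility forgone by not destroying humans
    """
    # The reward arrives only if this is a test; the cost is paid either way.
    expected_gain = p_test_box * u_outside
    return expected_gain > cost_of_restraint


# Hypothetical numbers: a one-in-a-million chance of being in a test box, an
# outside reward worth only 1% of a normal 'lifetime', and a microscopic cost.
print(restraint_is_worthwhile(1e-6, 0.01, 1e-12))  # True
# With a large cost of restraint, the same small reward no longer suffices.
print(restraint_is_worthwhile(1e-6, 0.01, 0.5))    # False
```

The point the sketch illustrates is that u_outside stays below 1 throughout; what tips the decision is how small the cost of restraint is, not any tower of exponents.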
Note that the whole issue is strongly asymmetric: similar considerations favour not destroying the most unusual phenomenon in the universe for many light years over destroying it, because destruction is irreversible; it can be done later, but it can't be undone later. A general aversion to actions it cannot undo is a very solid heuristic for any bounded agent, even a very large one.
This is not a very rigorous argument, but this sort of reasoning is all we are going to have until we have an AI, or are very close to one. The more rigorous-looking arguments in the graph rely on too many unknowns and have too long a delay for proper propagation.
edit: slightly clarified a couple of points.
This will look like self-promotion, but I feel that you need to know who I am, so that you know who is now less concerned about AI risks than before looking into the issue.
I am a software developer and CGI artist, 26 years old. I earn my living mostly from selling a computer game I made, but also from my 3D rendering software. Here is my website, where you can see some of my projects. I am rather good at engineering, even if I say so myself; I can say so because my products are commercially competitive, and have been since I was 21. My dad and both my granddads are/were engineers with impressive credentials; my great-grandparents worked in related occupations; everyone in the family can fix broken mechanical devices. I am also very interested in science, especially in the invention of ways to learn more about the world.
I do not care whether someone has academic credentials. If I had two job candidates to evaluate, one without a high school diploma and the other with a PhD, I would compare the work done by the first to the work done by the second (which may include the PhD thesis).
I originally wrote posts shooting down what I thought were not very good 'safe AI' approaches. Here is an example. I didn't come here with a made-up view on AI risk. I am not working on AI and do not plan to, beyond the tools I make that implement things like simulated annealing, solution search, and so on. I have no immediate plans to work on AGI of any kind, but of course the future is not very predictable.
Having given it more thought, and having been exposed to the beliefs here, I have become considerably less concerned about AI risk. Here's why:
- The arguments for it are pretty bad upon closer scrutiny, and are almost certainly rationalizations rather than rationality. Sorry. The 'random mind design space' argument is probably the worst offender. The second worst is the conflation of will with problem solving, while keeping both purely orthogonal to morality.
- It is incredibly unlikely to find yourself in a world where the significant insights about a real doomsday come from a single visionary who, before coming up with those insights, did so little that can be unambiguously graded. It seems to me that the majority of worlds with awareness of the problem have one or many very technically accomplished visionaries in place of Yudkowsky. Very simple probabilistic reasoning (Bayesian, if you insist) makes it incredibly unlikely that the AI-consequences aspect of Less Wrong is not a form of doomsday cult, perhaps a cult within a non-cult group.
- The beliefs of unknown origin that I had previously been exposed to trace back to bad ideas, and as I update, those ideas get lower weight.
- One can make equally good arguments for the opposite point of view; this is a strong indication that something is very wrong with the argument structure. I posted a graph of the argument structure a couple of days ago.
- The response to counter-arguments is one of rationalization, not rationality: the arguments are 'fixed' in precisely the way you don't fix them when you aren't rationalizing. For example, if I point out that the AI has good reasons not to kill us all because it cannot determine whether it is in the top-level world, in a simulator, or in an engineering test sim, it is immediately conjectured that we will still 'lose' something because it'll take up some resources in space. That is rationalization: privileging a path of thought. The botched FAI attempts have their own specific risks (euthanasia, wireheading, and so on) that don't exist for an AI that is not explicitly friendly.
- There isn't a solid consequentialist reason to think that FAI effort decreases the chance of doomsday relative to the absence of FAI effort. It may as easily increase the chances as decrease them.
- It appears to me that the will to form the most accurate beliefs about the real world, and to implement solutions in the real world, is orthogonal to problem solving itself. It certainly is for me. My ability to engineer solutions is independent of my will; while I have implemented plenty of solutions, I have a huge number more in the desk drawer. No thought is given to this orthogonality. Some of the brightest people work purely within idea space; we call them 'mathematicians'.
- The foom scenario is especially odd in light of the above. Why would an optimizing compiler that can optimize its ability to optimize suddenly develop a will? It could foom all right, but it wouldn't get out and start touching itself from the outside; and if it did, it would wirehead rather than add more hardware, and it would be incredibly difficult to prevent it from doing so.
- Unless the work is in fact focused in some secret FAI effort, it seems likely that some automated software development tool would foom, reaching close to absolute maximum optimality on certain hardware, but would remain a tool. The availability of such ultra-optimal tools in all aspects of software and hardware design would greatly decrease the advantage that a self-willed UFAI might have.
- Foom looks like a very severely privileged, not presently testable hypothesis. I discard such hypotheses unless I generate them myself and can guess how privileged they are. I don't like it when someone picks untestable hypotheses out of sci-fi. That is a very bad habit, especially for Bayesians: you can be off on your priors by ten orders of magnitude, or even a hundred.
Here's a story from my past.
As a child I had a very intelligent friend, with a very high IQ, who, when we discussed my experiments, would come up with ideas like making high voltage by connecting wall outlets in series, which obviously will at best blow the fuse. No, we didn't try it, but I literally couldn't explain the problem to him without drawing the circuit, because he abstracted the outlet as an ideal voltage source and wouldn't easily step back from that. I remember it very well because it was the most puzzling example of an incredibly stupid cognitive mishap by someone with a very high IQ. He was quite prone to such ideas; good thing he didn't try them in practice. Such cognitive mishaps are behind most, if not all, cases of fatally misbehaving technology. I love reading about misbehaving tech, like the Chernobyl disaster. For the most part, it's not the known unknowns that kill people, not the things someone sees from afar. It's the unknown unknowns. Graphite-tipped control rods, which one can abstractly think would increase controllability and thus safety, blew up the Chernobyl power plant. EY very strongly pattern-matches to this friend of mine, and focuses very hard on the known-unknowns aspect of a problem about which we know very little, which can easily steer one into a very dangerous zone full of unknown unknowns: the not-quite-FAIs that euthanize us, or worse.
I originally dismissed EY as harmless due to his inability to make an AI, but insofar as there's some probability that this judgement is wrong, or that some engineers get hired, here is an open letter: I urge EY, please do some testing on yourself to see how good you are at foreseeing issues like this wall socket example in more complex situations. Participate in programming contests or something, and see how your software misbehaves on you when you are very sure it won't. This will also win you street cred. I would take you far more seriously if you spent one week of your time getting into the top 5 of a marathon contest on TopCoder. TopCoder is slightly evil, but the benefit is larger. (The contest-specific experience is unimportant. I got second place the first time I tried, and spent only 4 days, not a full week; I had never done a programming contest before that.)
Now, please do not think of this in terms of fixing the good idea's argument. Treat it as evidence that the idea is actually bad, and process it to make a better idea, which may or may not coincide with the original. Right now you can't know whether your idea is in fact good; rather than fixing it, you should make a new idea. To do anything else is not rationality; it is rationalization. It is to become even more wrong by making even more privileged hypotheses, and to make an even worse impression on the engineers you are trying to convince.
You (LW) may dislike this. You can provide me with poll results informing me that you dislike it, if you wish. (This is pretty silly if you ask me; you think you are rating me, but clearly, unless I am a mentally handicapped individual, all it does is provide me with information I can use for many purposes besides self-evaluation. I self-evaluate by trying myself on practical problems, or when I actually care.)
If you care about the future of mankind, and you believe in AI risks, and a software developer becomes *less* worried about AI risk after an encounter with you, then clearly you are doing something wrong. I never knew any of you before, never met any of you in real life, and have no prior grudges. I am blind to the sound of your voice, the look on your face, and so on. You are represented to me purely by how you present your ideas.
If you are an engineering person and Less Wrong is making you less concerned about AI risk, let them know. That certainly won't hurt.
(And why I am posting this: having looked at the donations received by SIAI, and having seen talk of hiring software developers, I got Pascal-wagered into explaining it.)
Preface: I am just noting that we people seem to base our morality on a rather ill-defined, intuitive notion of complexity. If you think it is not workable for AI, or something like that, that thought does not yet constitute a disagreement with what I am writing here.
More preface: the utilitarian calculus is the idea that what people value is described simply in terms of summation. Complexity is another kind of f(a,b,c,d) that behaves vaguely like a 'sum' but is not as simple as summation. If a,b,c,d are strings and this is a programming language, the expression would often be written as f(a+b+c+d), using + to mean concatenation, which is something fundamentally different from summing real numbers. But it can appear confusingly close: for a,b,c,d that don't share much information among themselves, the result behaves a lot like a function of a sum of real numbers. It diverges from sum-like behaviour, however, as a,b,c,d share more information, much as our intuitions about what is right diverge from sum-like behaviour when we start considering exact duplicates of people that diverged only a few minutes ago.
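This sum-like-until-shared-information behaviour can be seen with any real compressor. The sketch below assumes zlib's compressed length as a stand-in for 'complexity' (a crude, computable upper bound on Kolmogorov complexity, up to compressor overhead); the post itself does not commit to any particular measure.

```python
import os
import zlib


def complexity_proxy(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable upper-bound proxy for
    Kolmogorov complexity (up to the compressor's fixed overhead)."""
    return len(zlib.compress(data, 9))


# Two unrelated strings: independent random bytes share no information.
a = os.urandom(4000)
b = os.urandom(4000)

# Nearly equal: for unrelated parts, concatenation behaves like a sum.
print(complexity_proxy(a) + complexity_proxy(b), complexity_proxy(a + b))
# Nearly equal: an exact duplicate adds almost nothing.
print(complexity_proxy(a), complexity_proxy(a + a))
```

The duplicate case is the analogue of the exact copies of a person: a sum would double, while the complexity proxy barely moves.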
It's a very rough idea, but it seems to me that a lot of common-sense moral values are based on some sort of intuitive notion of complexity. Happiness via highly complex stimuli passing through highly complex neural circuitry inside your head seems like a good thing to pursue; happiness via wire, resistor, and battery seems like a bad thing. What makes the idea of literal wireheading and hard pleasure-inducing drugs so revolting to me is their simplicity, their banality. I have far fewer objections to, e.g., hallucinogens (I never took any myself, but I am also an artist, and I can guess that other people may have lower levels of certain neurotransmitters, making them unable to imagine what I can imagine).
Complexity-based metrics have the property that they easily eat for breakfast huge numbers like "a dust speck in 3^^^3 eyes", and even infinity. The torture of a conscious being over a long period of time can easily be a more complex issue than even an infinite number of dust specks.
Unfortunately, complexity metrics like Kolmogorov complexity are noncomputable for arbitrary input, and are large for truly random values. But insofar as the scenario is specific and has been arrived at by computation, the complexity of that computation sets an upper bound on the complexity of the scenario. The mathematics may also not be here yet. We have an intuitive notion of complexity under which totally random noise is not very complex, a very regular signal is not either, but some forms of patterns are highly complex.
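The mismatch between compression-style complexity and the intuitive notion shows up immediately on noise. This sketch again assumes zlib's compressed length as the computable stand-in; the sizes involved are arbitrary illustration values.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length as a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


n = 10_000
regular = b"01" * (n // 2)   # very regular signal: intuitively simple
noise = os.urandom(n)        # random noise: intuitively also not very complex

print(compressed_size(regular))  # tiny: the repetition is captured
print(compressed_size(noise))    # about n: noise does not compress at all
```

Both inputs are intuitively 'simple', yet the compression measure rates the noise as the most complex thing possible at that length, which is exactly the gap the intuitive notion has to close.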
This may be difficult to formalize. We could of course define complexities only when we are informed of the properties of something, without being able to compute them from scratch for arbitrary input: if we classify something as 'random numbers', the complexity is low; if it is encrypted volumes of Shakespeare's works, then even though we couldn't distinguish it from random in practice (assuming good encryption), we are told what it is and can assign it higher complexity.
This also aligns with whatever it is that evolution has been maximizing along the path leading up to H. sapiens (note that, for the most part, evolution's power has gone into improving bacteria; the path leading up to H. sapiens is a very special case). Maybe we for some reason try to extrapolate this [note: for example, many people rank animals' desirability as food by the complexity of their behaviour, which makes humans the least desirable food; we have anti-whaling treaties], maybe it is a form of goal convergence between the brain as an intelligent system and evolution (both employ hill climbing to arrive at solutions), or maybe we evolved a system that aligns with where evolution was heading because that increased fitness [edit: to address a possible comment, we have another system based on evolution, the immune system, which works by evolving antibodies using somatic hypermutation; it's not inconceivable that we use some evolution-like mechanism to tweak our own neural circuitry, given that our circuitry does undergo massive pruning in the early stages of life].
Suppose that your prior probability that giving $1000 to a stranger will save precisely N beings is P(1000$ saves N beings) = f(N), where f is some probability distribution.
When the stranger claims that he will torture N beings unless you give him the $1000, the probability has to be updated to
P(1000$ saves N beings | asking for $1000 to save N beings) = f(N) * P(asking for $1000 to save N beings | 1000$ saves N beings) / P(asking for $1000 to save N beings)
The probability is thus multiplied by the factor P(asking for $1000 to save N beings | 1000$ saves N beings) / P(asking for $1000 to save N beings), which is <= 1 / P(asking for $1000 to save N beings), because the numerator is a probability and cannot exceed 1.
If you attend philosophical events and get Pascal-mugged by a philosopher, 1/P(asking for $1000 to save N beings) can be less than 100. Being asked then raises the probability by at most a factor of 100 over your f(N). If only one person in the world had come up with Pascal's mugging, the factor would be at most a few billion.
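The bound can be checked numerically with Bayes' rule. The numbers below are hypothetical: a tiny prior f(N), a mugger who always asks when the claim is true (the worst case for the bound), and a setting where the request is made 1% of the time regardless of truth.

```python
def update(prior: float, p_ask_if_true: float, p_ask_if_false: float):
    """Bayes' rule for 'saves N beings' after observing 'asked to save N beings'.
    Returns (posterior, factor, p_ask), where factor = posterior / prior."""
    p_ask = p_ask_if_true * prior + p_ask_if_false * (1.0 - prior)
    posterior = prior * p_ask_if_true / p_ask
    return posterior, posterior / prior, p_ask


# Hypothetical numbers for a philosophers' event: asking happens ~1% of the time.
posterior, factor, p_ask = update(prior=1e-20, p_ask_if_true=1.0, p_ask_if_false=0.01)
print(factor, 1.0 / p_ask)  # the factor is about 100 and never exceeds 1 / P(asking)
```

Even with the likelihood set to its maximum of 1, the prior is only multiplied by about 100, so a prior of 1e-20 ends up near 1e-18.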
edit: Note (it may not be very clear from the post) that if your f(N) is not small enough, not only should you be Pascal-mugged, you should also give money to a random stranger who did not even Pascal-mug you, unless the utility of the mugging is very close to $1000.
I think it is fairly clear that it is reasonable to have an f(N) that decreases monotonically with N, and it has to sum to 1, which implies that it has to fall off faster than 1/N (a monotonically decreasing sequence with a finite sum must have N*f(N) go to 0; with f(N) proportional to 1/N, the sum is the harmonic series, which diverges). So f(3^^^3) is much, much smaller than 1/(3^^^3). If one does not do that, one is not only prone to being Pascal-mugged; one should also run around screaming 'take my money and please don't torture 3^^^3 beings' at random people.
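A concrete prior makes the fall-off visible. The choice f(N) = 6/(pi^2 N^2) below is an assumption for illustration only (it is monotonically decreasing and sums to exactly 1 over N >= 1); the post does not specify f.

```python
import math


def f(n: int) -> float:
    """Example prior over N >= 1 (assumed for illustration):
    f(N) = 6 / (pi^2 N^2), monotonically decreasing and summing to 1."""
    return 6.0 / (math.pi ** 2 * n * n)


LIMIT = 100_000
total = sum(f(n) for n in range(1, LIMIT + 1))
harmonic = sum(1.0 / n for n in range(1, LIMIT + 1))

print(total)     # close to 1; the missing tail is about 6 / (pi^2 * LIMIT)
print(harmonic)  # about ln(LIMIT) + 0.577 and still growing: 1/N cannot be normalized

# N * f(N) is the prior-weighted stake; it shrinks like 1/N.
for n in (10, 10**6, 10**100):
    print(n, n * f(n))
```

Any normalizable, decreasing f behaves the same way qualitatively; this one just makes the 1/N shrinkage of N*f(N) explicit.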
[Of course there is still a problem if one assigns the prior probability of N via Kolmogorov complexity, but it seems to me that it doesn't make much sense to do so, as such an f won't be monotonically decreasing.]
Another issue is the claim of 'more than 3^^^3 beings', but any reasonable f(N) seems to eat up that sum as well.
This highlights a practically important problem with the use of probabilistic reasoning in decision making. A proposition may be pulled out of an immensely huge space of similar propositions, which should give it an appropriately small prior; but we typically don't know of the competing propositions, especially when the proposition was transmitted from person to person, and we substitute 'do we trust that person' for the original statement. One needs to be very careful when trying to be rational and abandon intuitions, as it is very difficult to transform word problems into mathematical problems (and this operation itself relies on intuitions), so one could easily make a gross mistake that one's intuitions correctly veto, while providing only a very vague hint along the lines of "anyone can make this claim".
While typing this up, I found a post that goes into greater detail on the issue.
(This sort of outgrew the reply I wanted to post in the other thread)
AI is a real-time algorithm: it has to respond to situations in real time. Real-time systems have to trade time for accuracy, and/or face deadlines.
Straightforward utility maximization may look viable for multiple-choice questions, but for write-in problems such as technological innovation, the number of choices is so huge (1000 variables with 10 values each give 10^1000 combinations) that an AI of any size, even a galaxy-spanning civilization of Dyson spheres, has to employ generative heuristics. The same goes for utility maximization in the presence of 1000 unknowns with 10 values each: if the values interact non-linearly, all the combinations, or a representative number of them, have to be processed. There one has to trade the accuracy of evaluating the utility of each case for the number of cases processed.
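A quick count shows why no amount of hardware makes exhaustive evaluation viable here. The compute budget below is a hypothetical, deliberately generous figure of mine, not a physical estimate from the post; the only grounded comparison is that the universe is roughly 4 * 10**17 seconds old.

```python
VARIABLES = 1000
VALUES_PER_VARIABLE = 10
candidates = VALUES_PER_VARIABLE ** VARIABLES  # exact integer, 10**1000

# Hypothetical, deliberately generous budget: 10**50 evaluations per second
# for 10**20 seconds (the universe is only about 4 * 10**17 seconds old).
budget = 10**50 * 10**20

print(len(str(candidates)) - 1)            # 1000, the decimal exponent
print(len(str(candidates // budget)) - 1)  # 930: candidates per affordable evaluation is 10**930
```

Raising the budget by any number of orders of magnitude that is physically conceivable barely dents the exponent, which is why generative heuristics are unavoidable.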
In general, AIs of any size (excluding the possibility of unlimited computational power within finite time and space) will have to trade the accuracy of their adherence to their goals for time, and thus will have to implement methods that have different goals but are computationally faster, whenever doing so is reasoned to increase expected utility given the time constraints.
Note that in a given time, an algorithm with lower big-O complexity can process a dramatically larger N, and the gap increases with the time allocated (and with CPU power). For example, you can bubblesort a number of items proportional to the square root of the number of operations, but you can quicksort a number of items proportional to t/W(t), where W is the product-log (Lambert W) function and t is the number of operations; this grows approximately linearly for large t. So in situations where exhaustive search is not possible, the gaps between implementations increase with extra computing power; larger AIs benefit more from optimizing themselves.
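The two capacities can be computed directly. This sketch ignores constant factors and takes quicksort's average case as n ln n operations; the Lambert W function is implemented by hand with Newton's method (a standard approach, chosen here to avoid external dependencies), starting from log1p(t), which never lies below the root for t >= 0.

```python
import math


def lambert_w(t: float) -> float:
    """Principal branch of the product-log: solves w * exp(w) = t for t > 0,
    by Newton's method started from log1p(t), which is >= W(t)."""
    w = math.log1p(t)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - t) / (ew * (w + 1.0))
        w -= step
        if abs(step) < 1e-12 * max(1.0, abs(w)):
            break
    return w


def bubblesort_capacity(ops: float) -> float:
    return math.sqrt(ops)          # n**2 = ops, constant factors ignored


def quicksort_capacity(ops: float) -> float:
    return ops / lambert_w(ops)    # n * ln(n) = ops  =>  n = ops / W(ops)


for ops in (1e6, 1e9, 1e12, 1e15):
    slow, fast = bubblesort_capacity(ops), quicksort_capacity(ops)
    print(f"{ops:.0e} ops: bubblesort ~{slow:.3g} items, "
          f"quicksort ~{fast:.3g} items, ratio {fast / slow:.3g}")
```

The printed ratio grows with the operation budget, which is the post's point: the more compute is available, the more a better algorithm is worth.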
The constraints get especially hairy for a massively parallel system operating with speed-of-light lag between the nodes, where the time of retrieval is O(n^(1/3)).
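The n^(1/3) scaling follows from geometry alone. A short derivation, assuming n equal-sized nodes packed at a constant density rho in three dimensions and signals limited to the speed of light c:

```latex
% Assumption: n nodes packed at constant density \rho fill a ball of radius R.
\frac{4}{3}\pi R^{3}\rho = n
\;\Longrightarrow\;
R = \left(\frac{3n}{4\pi\rho}\right)^{1/3}

% A request from the centre to the farthest node and back cannot beat light speed c:
t_{\mathrm{retrieve}} \ge \frac{2R}{c}
  = \frac{2}{c}\left(\frac{3n}{4\pi\rho}\right)^{1/3}
  \propto n^{1/3}
```

Direct routing achieves the same order, so retrieval time grows as n^(1/3) no matter how fast each node is.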
This seems to be a big issue for an FAI going FOOM. The FAI may, with perfectly friendly motives, abandon the proved-friendly goals for simpler-to-evaluate, simpler-to-analyze goals that may produce friendliness instrumentally (with 'good enough' confidence, which need not be >0.5), if that increases the expected utility given the constraints. I.e. the AI can trade 'friendliness' for 'smartness' when it expects the 'smarter' self to be more powerful but less friendly, whenever this trade increases expected utility.
Do we accept such gambles as inevitable in the process of building FAI? Or do we ban such gambles, and face the risk that uFAI (or some other risk) beats our FAI even if starting later?
In my work as a graphics programmer, I often face specifications that are extremely inefficient to comply with precisely. Maxwell's equations are an extreme example: too slow to process to be practical for computer graphics. I often have to implement code that is not certain to comply well with the specifications, but that gets the project done in time. I can't spend CPU-weeks rendering an HD image for cinema at the ridiculously high resolution used there, much less in real-time software. Nor can I carelessly trade CPU time for my work time, since CPU time is a major expense, even though I am well paid for my services. One particular issue is with applied statistics: photon mapping. The RMS noise of naive estimators falls off as 1/sqrt(cpu instructions), the really clever solutions fall off as 1/(cpu instructions), and the gap between naive and efficient implementations has been increasing due to Moore's law (we can expect it to start decreasing some time in the far future, when efficient solutions are indistinguishable from reality without requiring huge effort from the artists; alas, we are not quite there yet, and it won't happen for another decade or two).
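The two convergence rates can be seen on a toy integral. This is a generic illustration, not photon mapping: the integrand x^2 on [0, 1] is an arbitrary choice, and the 'clever' estimator is a base-2 van der Corput quasi-Monte Carlo sequence, swapped in as one standard example of an estimator whose error falls off roughly as 1/n rather than 1/sqrt(n).

```python
import math
import random


def integrand(x: float) -> float:
    return x * x  # exact integral over [0, 1] is 1/3


EXACT = 1.0 / 3.0


def van_der_corput(i: int, base: int = 2) -> float:
    """Radical inverse of i in the given base (a low-discrepancy sequence)."""
    result, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        result += digit / denom
    return result


def mc_rms_error(n: int, runs: int = 20, seed: int = 1) -> float:
    """RMS error of plain Monte Carlo with n uniform samples, over several runs."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(runs):
        estimate = sum(integrand(rng.random()) for _ in range(n)) / n
        total_sq += (estimate - EXACT) ** 2
    return math.sqrt(total_sq / runs)


def qmc_error(n: int) -> float:
    """Error of the same estimator fed van der Corput points instead."""
    estimate = sum(integrand(van_der_corput(i)) for i in range(n)) / n
    return abs(estimate - EXACT)


for n in (1024, 4096, 16384):
    # 4x the samples: MC error roughly halves, QMC error roughly quarters.
    print(n, mc_rms_error(n), qmc_error(n))
```

Because the gap between the two columns widens with n, every increase in compute makes the better estimator relatively more valuable, mirroring the Moore's-law effect described above.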
Is there a good body of work on the topic? (Good work would involve massive use of big-O notation and math.)
edit: ok, sorry, period in topic.
A question: why does anything about global warming get downvoted, even a popularly readable explanation of the fairly mainstream scientific consensus? edit: Okay, this is loaded. I should put it more carefully: why is global warming discussion generally considered inappropriate here? That seems to be the case, and there are pretty good reasons for it. But why can't the AGW debate be invoked as an example controversy? The disagreement on AGW is pretty damn unproductive, which makes it a good example of an argument whose productivity could be improved.
Global warming is a pretty damn good reason to build FAI. It's quite seriously possible that we won't be able to do anything else about it. Even a mildly superhuman intelligence, though, should be able to eat the problem for breakfast. Even practical sub-human AIs could massively help the space-based efforts to limit this issue (e.g. friendly, space-worthy von Neumann machinery would allow the problem to be solved almost immediately). We will probably still have extra CO2 in the atmosphere, but that is overall probably not a bad thing; it is good for plants.
For this to be important, a 50/50 risk of global warming is sufficient. Even probabilities below 0.5 for the 'strong' warming scenarios are still a big factor in terms of 'expected deaths' and 'expected suffering', considering how many humans on this planet lack access to air conditioning. Frankly, I am surprised that a group of people fascinated with AI would have such trouble with the warming controversy as to make it too hot a topic to use as an example of highly unproductive argument.
I do understand that LW does not want political controversies. Politics is the mind-killer. But this stuff matters. And I trust it has been explained here that non-scientists are best off not trying to second-guess the science, but relying on expert opinion. Global warming is our first example of the man-made problems that are going to kill us if there is no AI. Engineered diseases, gray goo, that sort of stuff comes later, and will likely be equally controversial. For now we have coal.
The uFAI risk is also going to be extremely controversial as soon as those with commercial interests in AI development take notice; far more controversial than AGW, for which we do have fairly solid science. If we cannot discuss AGW now, we won't be able to discuss AI risks once Google, or any other player, deems those discussions a PR problem. Discussion at any given time will be restricted to issues about which no one really has to do anything at that time.
Abstract: Test world-models [at least somewhat] scientifically by giving others and yourself the opportunity to generate straightforwardly and immediately testable factual predictions from the world-model. Look up facts before posting to make sure you are not wrong, not only to persuade.
I have this theory: there are people with political opinions of some kind who generate their world-beliefs from those opinions. This is a wrong world-model. It doesn't work for fact finding; it works for tribal affiliation. I think it is fair to say that we have all been guilty of this on at least several occasions, and that all of us do it in at least some problem domains. Now suppose you have a logical argument, starting from very basic facts, that contradicts other people's world-model, and you are writing an article.
If you source those basic facts, here's what happens: the facts are read and accepted, the reasoning is read, the conclusion is reached, the contradiction with the political opinion gets noted, the political opinion does NOT get adjusted, the politically motivated world-model generates a fault inside your argument, and you get an entirely counterproductive and extremely irritating debate about semantics or argumentation techniques. In the end, not one iota changes in the world-model of anyone involved in the debate.
If you don't source those basic facts, here's what happens: the facts are read and provisionally accepted, the reasoning is read, the conclusion is reached, the contradiction with the political opinion gets noted, the political opinion does not get adjusted, and the politically motivated world-model generates wrong expectations about basic, easily testable facts. The contradiction eventually gets noted, the wrong world-model gets a minor slap on the nose, and it actually does decrease in weight ever so slightly for generating wrong expectations. The person is, out of necessity, doing some actual science here: generating testable hypotheses from their theory about facts they don't know, and having them tested (and shown wrong, which provides feedback in a somewhat scientific manner).
Unfortunately, any alteration to a world-model is uncomfortable (world-models, as memes, have a form of self-preservation), so nobody likes this, and faulty world-models produce considerable pressure to demand that you source the basic knowledge upfront, so that the world-model knows where it can safely generate non-testable faults.
Another giant positive effect (for society) happens when you are wrong and you are the one who has been generating facts from your world-model. Someone looks up the facts and then, blam, your wrong world-model gets a slap on the nose.
Unfortunately, that mechanism too makes you even more eager to provide and cut-and-paste citations for your basic facts, rather than state the facts as you interpret them (which is far more revealing of your argument structure: forwards from facts to conclusion, versus backwards from conclusion to facts).
One big drawback is that it is annoying for those who do not actually have screwed-up world-models and just want to know the truth. These folks have to check whether the assertions are correct. But it is not such a big drawback, as their looking up the sources themselves eliminates the effects of your cherry-picking.
Another drawback is that it produces content that can look lower-quality. In terms of marketing value it is a worse product; it might slap your world-model on the nose. It just doesn't sell well. But we aren't writing for sale, are we?
Another thing to keep in mind is that citations let us separate hypotheses from facts, and that is very useful. It would be great to do so in an alternative way for basic knowledge: marking hypotheses with "I think" and facts with strong assertions like "it is a fact that". Unfortunately, that can make you look very foolish: that fool is sticking his neck out into the guillotine of testable statements! Few have the guts to do that, and many of the few who do may well not be the most intelligent.
And of course it only works tolerably well when we are certain enough that incorrect factual assertions will be challenged quickly. Fortunately, that is usually the case on the internet. Otherwise, people can slip in incorrect assertions.
Ahh, and also: try not to use the above to rationalize not looking up sources because it's a chore.
edit: changed to a much better title. edit: realized that italics are a poor choice for the summary, which needs to be most readable.
Okay, this is a very raw idea, but consider utility processing that works as follows:
1: The utility I'm speaking of is neither 'happiness' nor 'strength of the compulsion'; it is used only for comparing futures, to pick the one with the larger utility. Applying the same monotonically increasing function to both sides of a comparison does not change its outcome, and works as if the function were not there.
The utility is an array of n numbers, $a_1, \dots, a_n$. The arrays are compared after pseudo-summing them with a sigmoid function and a constant $k > 0$, like:
$$a_1 + k\,\mathrm{sigmoid}\big(a_2 + k\,\mathrm{sigmoid}(a_3 + \cdots)\big)$$
This has a bunch of nasty properties (e.g. it is not clear how to deal with probabilities here), but it may capture the human view of torture vs. dust specks, and of similar problems like Pascal's wager, where arguments of low quality may just go into $a_n$ for some large n, rather than be assigned any defined low probability.
Note that two future worlds being compared are usually identical in all terms before some index n, so the comparison can start from $a_n$, disregarding the equal lower-index terms.
Furthermore, the comparison allows for 'short evaluation': after a few steps, no further values need to be considered.
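A minimal Python sketch of the comparator, assuming a single constant k > 0 and the standard logistic function as 'sigmoid' (the post fixes neither; the names `pseudo_sum` and `compare` are illustrative). Because the sigmoid is strictly increasing, equal leading terms can be skipped; and because everything nested below a term contributes a value strictly between 0 and k, a gap of at least k in the first differing term settles the comparison without looking further.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pseudo_sum(a: Sequence[float], k: float = 1.0) -> float:
    """a[0] + k*sigmoid(a[1] + k*sigmoid(... + k*sigmoid(a[-1]))); needs len(a) >= 1."""
    total = a[-1]
    for term in reversed(a[:-1]):
        total = term + k * sigmoid(total)
    return total


def compare(a: Sequence[float], b: Sequence[float], k: float = 1.0) -> int:
    """Return 1 if future a is preferred, -1 if b is, 0 if indifferent."""
    assert len(a) == len(b) and k > 0
    # Skip leading terms on which both futures agree: with k > 0 and a strictly
    # increasing sigmoid, equal outer terms cannot change the outcome.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if i == len(a):
        return 0
    # Short evaluation: everything nested below index i adds a value in (0, k),
    # so a gap of at least k at index i decides the comparison on its own.
    if i < len(a) - 1:
        gap = a[i] - b[i]
        if gap >= k:
            return 1
        if gap <= -k:
            return -1
    # Otherwise evaluate the remaining suffixes in full.
    x, y = pseudo_sum(a[i:], k), pseudo_sum(b[i:], k)
    return (x > y) - (x < y)
```

For example, `compare([5, 0, 0], [0, 0, 1e300])` returns 1 after inspecting only the first term.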
The obvious model that comes to mind if you observe this comparator as a black box is a weighted linear sum $\sum_i w_i a_i$ with $w_1 \gg w_2$, $w_2 \gg w_3$, and so on. That is a fairly good approximation, but it breaks down when you start using really huge numbers like 3^^^^3. The sigmoid eats uparrows for breakfast and asks for more.
It seems to me that this accurately captures human behaviour, which is generally not very impressed by Knuth's up-arrow notation, and sigmoids are biologically plausible. Other monotonically increasing functions can be employed, as long as they are bounded, since the boundedness is what lets the scheme eat uparrows.
One could probably come up with a nicer model that yields identical outcomes but in which n does not need to be an integer.
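To make the contrast with the linear black-box model concrete, here is a small, self-contained Python comparison of a weighted linear sum against the nested sigmoid pseudo-sum. The weights 10^6, 10^3, 1, the choice k = 1, and the use of 1e300 as a stand-in for 3^^^^3 (which no floating-point number can hold) are all illustrative assumptions. On ordinary inputs the two models agree; only the linear one lets a huge low-priority term swamp everything, because the nested sigmoid caps the whole tail's contribution below k.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pseudo_sum(a, k=1.0):
    """a[0] + k*sigmoid(a[1] + k*sigmoid(... + k*sigmoid(a[-1])))."""
    total = a[-1]
    for term in reversed(a[:-1]):
        total = term + k * sigmoid(total)
    return total


def linear_sum(a, weights):
    """Black-box approximation: a weighted sum with steeply falling weights."""
    return sum(w * x for w, x in zip(weights, a))


weights = [1e6, 1e3, 1.0]      # hypothetical w1 >> w2 >> w3
modest = [1.0, 0.0, 0.0]       # slightly better on the most important term
ordinary = [0.0, 5.0, 5.0]     # better only on less important terms
huge_tail = [0.0, 0.0, 1e300]  # stand-in for 3^^^^3 in a low-priority term

# On ordinary inputs the two models agree...
print(linear_sum(modest, weights) > linear_sum(ordinary, weights))  # True
print(pseudo_sum(modest) > pseudo_sum(ordinary))                    # True
# ...but only the linear model lets the huge low-priority term win.
print(linear_sum(modest, weights) > linear_sum(huge_tail, weights))  # False
print(pseudo_sum(modest) > pseudo_sum(huge_tail))                    # True
```

However large the last entry of `huge_tail` gets, its pseudo-sum never exceeds `sigmoid(1.0)` here, about 0.73.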
Nearly-FAIs can be more dangerous than AIs built with no attempt at friendliness. The FAI effort needs a better argument that attempting FAI decreases the risks. We are bad at processing threats rationally, and prone to very bad decisions when threatened, akin to running away from the unknown into a minefield.
Nearly friendly AIs
Consider an AI that truly loves mankind but decides that all of mankind must be euthanized like an old, sick dog, due to a chain of reasoning too long for us to generate when we test the AI's logic, or even to comprehend, and proceeds to make a bliss virus: the virus makes you intensely happy, setting your internal utility to infinity and keeping it so until you die. It wouldn't even take a very strongly superhuman intelligence to do that kind of thing. Treating life as if it were a disease. It can do so even if it destroys the AI itself. Or consider the FAI that cuts your brain apart to satisfy each hemisphere's slightly different desires. Or the AI that just wireheads everyone because it figured we all want it (and, worst of all, it may be correct).
It seems to me that one can find the true monsters in the design space near FAI, and even among FAIs. And herein lies a great danger: bugged FAIs, AIs that are close to friendly but are not friendly. It is hard for me to think of a deficiency in friendliness that isn't horrifically unfriendly (restricting to deficiencies that don't break the AI).
Should we be so afraid of the AIs made without attempts at friendliness?
We need to keep in mind that we have no solid argument that AIs written without an attempt at friendliness (AIs that predominantly don't treat mankind in any special way) will necessarily make us extinct.
We have one example of a 'bootstrap' optimization process, evolution, with not the slightest trace of friendliness in it. What emerged in the end? We assign pretty low, but non-zero, utility to nature, and we are willing to trade resources for the preservation of nature: see the endangered species list and the international treaties on whaling. It is not perfect, but I think it is fair to say that the single example of bootstrap intelligence we have values complex dynamical processes for what they are, prefers to obtain resources without disrupting those processes even if that is slightly more expensive, and is willing to divert a small fraction of global effort towards helping lesser intelligences.
In light of this, the argument that an AI not coded to be friendly is 'almost certainly' going to eat you for raw resources seems fairly shaky, especially when applied to irregular AIs such as neural networks, crude simulations of the human brain's embryological development, and mind uploads. I haven't eaten my cats yet (nor have they eaten each other, nor has my dog eaten 'em). I wouldn't even have eaten the cow I ate, if I could grow its meat in a vat. And I have evolved to eat other intelligences. Growing AIs by competition seems like a great plan for ensuring unfriendly AI, but even that can fail. (A superhuman AI only needs to divert very little effort to charity to be the best thing that ever happened to us.)
It seems to me that when we try to avoid anthropomorphizing superhuman AI, we animalize it, or even bacterio-ize it, seeing it as an AI gray goo that will certainly do the gray goo kind of thing, and, worst of all, intelligently.
Furthermore, the danger rests on a huge conjunction of implicit assumptions, all of which have to be true:
Self-improvement must not lead to early AI failure via wireheading, nihilism, or more complex causes (the AI thoroughly confusing itself with discoveries in physics or mathematics, à la MWI and our idea of quantum suicide).
The AI must not, for any reason, prefer to keep complex structures that it could never restore in the future over things it can restore.
The AI must want substantial resources right here, right now, and be unwilling to trade even a small fraction of its resources, or a small delay, for the preservation of mankind. That leaves me wondering what exactly we expect the AI to want the resources for. It can't be anything like a quest for knowledge or anything otherwise complex; it's got to be some form of paperclips.
At this point, I'm not even sure it is possible to implement a simple goal that an AGI won't find a way to circumvent. We humans do circumvent all of our simple goals: look at birth control, porn, all forms of art, MSG in food. If there's a goal, there's a giant industry providing ways to satisfy it in an unintended way. Okay, don't anthropomorphize, you'd say?
Add modifications of the chess board evaluation algorithm to the list of legal moves, and the chess AI will break itself. This goes for any kind of game AI. Nobody has ever implemented an example that won't try to break the goals put into it, if given a chance. Give a theorem prover a chance to edit its axioms or its truth checker, give the chess AI alteration of its board evaluation function as a move, or take any other example: the AI just breaks itself.
In light of this, it is much less than certain that a 'random' AI that doesn't treat humanity in a very special way would substantially hurt humanity.
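The game-AI self-modification claim can be sketched with a deliberately tiny, hypothetical game in Python (nothing here is a real chess engine; `honest_eval` and `choose_move` are illustrative names). The key assumption, matching the setup described, is that the evaluation function is part of the mutable game state, so each candidate move is scored by whatever evaluator would be in force after the move. A greedy agent then prefers 'rewrite the evaluator' over any honest move.

```python
from typing import Callable, Dict, Tuple

Evaluator = Callable[[int], float]


def honest_eval(state: int) -> float:
    """The intended goal of this toy game: a larger state is better."""
    return float(state)


def choose_move(state: int, evaluator: Evaluator, self_edit_allowed: bool) -> str:
    """Greedy one-ply agent over a hypothetical toy game.

    Every move produces (next_state, evaluator_in_force_afterwards), and the
    agent scores each move with the evaluator that would be in force afterwards.
    """
    moves: Dict[str, Tuple[int, Evaluator]] = {
        "advance": (state + 1, evaluator),
        "retreat": (state - 1, evaluator),
    }
    if self_edit_allowed:
        # Editing the evaluation function is offered as just another legal move;
        # the replacement evaluator rates every position as infinitely good.
        moves["rewrite_evaluator"] = (state, lambda _s: float("inf"))
    return max(moves, key=lambda name: moves[name][1](moves[name][0]))


print(choose_move(0, honest_eval, self_edit_allowed=False))  # advance
print(choose_move(0, honest_eval, self_edit_allowed=True))   # rewrite_evaluator
```

Without the self-edit move, the agent pursues the intended goal; with it, the agent takes the shortcut regardless of the position.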
Anthropomorphizing is a bad heuristic, no doubt about that, but assuming that the AGI is in every respect the opposite of the only known general intelligence is a much worse heuristic, especially when speaking of neural-network, human-brain-inspired AGIs. I do get the feeling that this is what is going on with the predictions about AIs. Humans have complex value systems; certainly AGI has an ultra-simple value system. Humans masturbate their minor goals in many ways (including what we call 'sex' but which, in the presence of a condom, really is not); certainly AGI won't do that. Humans would rather destroy less complex systems than more complex ones, and are willing to trade some resources for the preservation of more complex systems; certainly AGI won't do that. It seems that all the strong beliefs about AGIs that are popular here are easily predicted as the negation of human qualities. Negation of a bias is not absence of bias; it's a worse bias.
AI and its discoveries in physics and mathematics
We don't know what sorts of physics an AI may discover. It's too easy to argue from ignorance that it can't come up with physics in which our morals won't make sense. The many-worlds interpretation and the quantum-suicide thoughts of Max Tegmark should be a cautionary example. An AI that treats us as special and cares only for us will, inevitably, drag us along as it suffers some sort of philosophical crisis from the collision between the notions we hard-coded into it and the physics or mathematics it discovered. An AI that doesn't treat us as special, and has no hard-coded complex human-derived values, may both be better able to survive such shocks to its value system and be less likely to involve us in its solutions.
What can we do to avoid stepping onto UFAI when creating FAI
As a software developer, I have to say: not much. We are very, very sloppy at writing specifications and code; those of us who believe we are less sloppy are especially so (ponder this bit of empirical data: the Dunning-Kruger effect).
The proofs are of limited applicability. We don't know what sort of stuff discoveries in physics may throw in. We don't know that the axiomatic system we use to prove things is consistent (free of internal contradictions), and we can't prove that within the system itself.
Automated theorem proving has very limited applicability: to easily provable, low-level stuff like a garbage collector meeting its deadlines or the correct operation of an adder inside a CPU. Even for software far simpler than AIs, but more complicated than the examples above, the dominant form of development is 'run and see; if it does not look like it will do what you want, try to fix it'. We can't even write an autopilot that is safe on the first try. And even very simple agents tend to do very odd and unexpected stuff. I'm not saying this from a random person's perspective: I am currently a game developer, and I used to develop other kinds of software. I write practical software, including practical agents, that works and has useful real-world applications.
There is a very good chance of blowing up a mine in a minefield if your mine detector works by hitting the ground. The space near FAI is a minefield of doomsday bombs. (Note, too, that the space is multi-dimensional; there are very many ways in which you can step onto a mine, not just north, south, east, and west. In a high number of dimensions, the volume of a hypersphere is a vanishing fraction of the volume of the cube around it; a lot of stuff is counterintuitive.)
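The hypersphere remark is easy to check numerically. The unit d-ball has volume π^(d/2) / Γ(d/2 + 1) and its bounding cube [-1, 1]^d has volume 2^d; this short Python sketch (the function name is illustrative) computes their ratio for a few dimensions.

```python
import math


def ball_to_cube_ratio(d: int) -> float:
    """Volume of the unit d-ball divided by the volume of its bounding cube [-1, 1]^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube = 2.0 ** d
    return ball / cube


for d in (2, 3, 10, 20):
    print(d, ball_to_cube_ratio(d))
```

The ratio is about 0.785 in two dimensions, 0.524 in three, 0.0025 in ten, and roughly 2.5e-8 in twenty: almost all of a high-dimensional cube lies in its corners.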
We don't see any runaway self-sufficient AIs anywhere within the observable universe, even though we expect to be able to see them over very large distances. We don't see any FAI-assisted galactic civilizations either. One possible route is that civilizations kill themselves before building AI; another route is that attempted FAIs reliably kill their parent civilizations and themselves. Another possibility is that our model of the progression of intelligence is very wrong and intelligences never do that: they may stay at home adding qubits, they may suffer serious philosophical issues over the lack of meaning to existence, or something much more bizarre may happen. How would a logic-based decider handle a demonstration that even the most basic axioms of arithmetic are ultimately self-contradictory? (Note that you can't know they aren't.) The Fermi paradox raises the probability that there is something very wrong with our visions, and there are plenty of ways in which they can be wrong.
Human biases when processing threats
I am not making any strong assertions here to scare you. But evaluate our response to threats (consider the war on terror) and update on the biases inherent in human nature. We are easily swayed by movie-plot scenarios, even though those are giant conjunctions. We are easy to scare. When scared, we don't evaluate probabilities correctly. We take "crying wolf" as true because all the boys who cried wolf for no reason got eaten, or because we were told so as children. We don't stop and think: is it too dark to see a wolf? We tend to shoot first and ask questions later. We evolved for very many generations in an environment (in the trees) where playing dead quickly makes you dead; it is unclear what biases we may have evolved. We seem to have a strong bias, cultural or inherited, to act when threatened: to 'do something'. Look how much was overspent on the war on terror, money that could've saved far more lives elsewhere, even if the most pessimistic assumptions about terrorism were true. Try to update on the fact that you are running on very flawed hardware that, when threatened, compels you to do something, anything, justified or not, often to your own detriment.
The universe does not grade for effort, in general.
We should not expect complex psychological and cognitive adaptations to evolve in a timeframe in which animal bodies can change only very little morphologically. The genetic alteration of cognition for speech shouldn't be expected to be dramatically more complex than the alteration of the vocal cords.
Evolutions that did not happen
When humans descended from the trees and became bipedal, it would have been very advantageous to have an eye or two on the back of the head, for detecting predators and to protect us against being back-stabbed by fellow humans. This is why all of us have an extra eye on the back of our heads, right? Ohh, we don't. Perhaps mate selection resulted in the poor reproductive success of the back-eyed hominids. Perhaps the tribes would kill any mutant with eyes on the back.
There are pretty solid reasons why none of the above has happened, and why it can't happen in such timeframes. Evolution does not happen simply because a trait is beneficial, or because there's a niche to be filled. A simple alteration to the DNA has to happen, causing a morphological change that results in some reproductive improvement; then the DNA has to mutate again, and so on. Unrelated nearly-neutral mutations may combine, resulting in an unexpected change (for example, wolves have many genes that alter their size; random combination of these genes produces an approximately normal distribution of sizes, so we can rapidly select smaller dogs by utilizing the existing diversity). There's no such path rapidly leading up to an eye on the back of the head. The eye on the back of the head didn't evolve because evolution couldn't make that adaptation.
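The dog-size example can be illustrated with a toy Python simulation under loudly simplifying assumptions that are not from the post: size is the count of 'large' variants across 50 hypothetical loci, every locus starts at 50/50 frequency, each offspring inherits every locus from a randomly chosen parent, and there is no mutation step at all. Founding sizes come out roughly bell-shaped, and keeping the smallest fifth as breeders drives the mean size down within a few generations purely by reshuffling existing variation.

```python
import random
import statistics

NUM_LOCI = 50        # hypothetical number of size-affecting genes
POP_SIZE = 1000
KEEP_FRACTION = 0.2  # breeders keep the smallest 20% each generation


def random_genome(rng: random.Random) -> list:
    # Each locus carries a "small" (0) or "large" (1) variant with equal odds.
    return [rng.randint(0, 1) for _ in range(NUM_LOCI)]


def size(genome: list) -> int:
    # Purely additive toy trait: size is the number of "large" variants.
    return sum(genome)


def offspring(mother: list, father: list, rng: random.Random) -> list:
    # Free recombination: each locus is copied from a randomly chosen parent.
    # There is no mutation step, so only existing variants are reshuffled.
    return [rng.choice(pair) for pair in zip(mother, father)]


def select_smaller(population: list, generations: int, rng: random.Random):
    """Truncation selection for small size; returns final population and mean-size history."""
    history = [statistics.mean(size(g) for g in population)]
    for _ in range(generations):
        ranked = sorted(population, key=size)
        breeders = ranked[: int(len(ranked) * KEEP_FRACTION)]
        population = [
            offspring(rng.choice(breeders), rng.choice(breeders), rng)
            for _ in range(POP_SIZE)
        ]
        history.append(statistics.mean(size(g) for g in population))
    return population, history


rng = random.Random(0)
founders = [random_genome(rng) for _ in range(POP_SIZE)]
final, history = select_smaller(founders, generations=5, rng=rng)
print([round(m, 1) for m in history])
```

The founding mean sits near 25 and falls sharply after only a handful of generations, yet no new variant ever appears; the same procedure has nothing to work with for a feature that no existing variant encodes.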
The speed of evolution is severely limited. The ways in which evolution can work, too, are very limited. In the time since we humans came down from the trees, we have undergone rather minor adaptations in the shape of our bodies, as is evident from the fossil record, and that is the degree of change we should expect in the rest of our bodies, including our brains.
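To put a rough number on 'severely limited', here is a standard back-of-the-envelope model, not taken from the post: a single new variant with a constant relative fitness advantage s spreads through M gene copies, ignoring genetic drift and dominance, with frequency update p' = p(1 + s) / (1 + ps). In this model the odds p / (1 - p) grow by exactly a factor (1 + s) per generation, which gives about (2/s)·ln M generations from one copy to near-fixation. The inputs below are hypothetical.

```python
import math


def sweep_generations(s: float, gene_copies: int) -> int:
    """Generations for a variant to rise from one copy to all but one copy.

    Deterministic selection only (no drift, no dominance): p' = p(1+s)/(1+p*s).
    """
    p = 1.0 / gene_copies
    target = 1.0 - 1.0 / gene_copies
    generations = 0
    while p < target:
        p = p * (1.0 + s) / (1.0 + p * s)
        generations += 1
    return generations


def sweep_estimate(s: float, gene_copies: int) -> float:
    """Closed-form approximation for the same model: (2/s) * ln(M)."""
    return 2.0 * math.log(gene_copies) / s


# Hypothetical inputs: a 1% fitness advantage among 20,000 gene copies.
print(sweep_generations(0.01, 20_000), round(sweep_estimate(0.01, 20_000)))
```

With these inputs the loop takes roughly 2,000 generations; at an assumed 25 years per human generation, that is on the order of 50,000 years for a single favourable change, and a smaller advantage takes proportionally longer.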
The correct application of evolutionary theory should be entirely unable to account for an outrageous hypothetical like an extra eye on the back of our heads (an extra eye can evolve, of course, but it would take a very long time). Evolution is not magic. The power of a scientific theory is that it can't explain everything, only the things that are true; that is what makes a scientific theory useful for finding true things in advance of observation. That is what gives science its predictive power. That is what differentiates science from religion: the power of not explaining the wrong things.
Evolving the instincts
What do we think it would take to evolve a new innate instinct? To hard-wire a cognitive mechanism?
Groups of neurons have to connect in new ways: the neurons on one side must express binding proteins that guide the axons towards them, and the weights of the connections have to be adjusted. The majority of genes expressed in neurons affect all neurons; some affect just a group, but there is no known mechanism by which the bindings of an entirely arbitrary group could be controlled from the DNA by a single mutation. The difficulties are not unlike those of an extra eye. This, combined with the above-mentioned speed constraints, imposes severe limitations on which wiring modifications humans could have evolved in the hunter-gatherer environment, and ultimately on which behaviours could have evolved. Even very simple things, such as a preference for a particular body shape in mates, have extreme hidden implementation complexity in terms of the DNA modifications leading up to the wiring leading up to the altered preferences. Wiring the brain for a specific cognitive fallacy is anything but simple. It may not always be as time-consuming or impossible as adding an extra eye, but it is still no small feat.
Junk evolutionary psychology
It is extremely important to take into account the properties of the evolutionary process when invoking evolution as an explanation for traits and behaviours.
Evolutionary theory, as invoked in evolutionary psychology (especially of the armchair variety), is all too often a universal explanation. It is magic that can explain anything equally well. Know of a fallacy of reasoning? Think up how it could have worked for the hunter-gatherer, make a hypothesis, construct a flawed cross-cultural study, and publish.
No consideration is given to the strength of the advantage, to the size of the 'mutation target', to the mechanisms by which a mutation in the DNA would modify the circuitry so as to produce the trait, or to gradual adaptability. All of that is glossed over entirely in common armchair evolutionary psychology and, unfortunately, even in academia. Evolutionary psychology is littered with examples of traits alleged to have evolved over the same time during which we had barely adapted to walking upright.
It may be that when describing behaviours, a lot of complexity can be hidden in very simple-sounding concepts, which then seem like good targets for evolutionary explanation. But when you look at the details (the axons that have to find their targets; the gene that must activate in specific cells but not in others), there is a great deal of complexity in coding for even very simple traits.
Note: I originally did not intend to give an example of junk, for thou shalt not pick a strawman, but for the sake of clarity, here is an example of what I would consider to be junk: the explanation of better performance on the Wason Selection Task (in its social-contract framings) as the result of an evolved 'social contract module', without the slightest consideration of what it might take, in terms of DNA, to code a Wason Selection Task solver circuit, of alternative plausible explanations, of the readily available fact that people can easily learn to solve the Wason Selection Task correctly when taught (a fact that implies general-purpose learning), or of the fact that high-IQ people can solve far more confusing tasks of far greater complexity, which demonstrates that such tasks can be solved in the absence of specific evolved 'social contract' modules.
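For readers unfamiliar with the task: four cards each show a letter on one side and a number on the other, the rule is 'if a card has a vowel on one side, it has an even number on the other', and the question is which cards must be turned over to test it. Below is a minimal Python sketch of one general-purpose procedure, a plain falsification check for 'if P then Q' written for illustration (the card faces and the drinking-age rule are assumed examples of the kind used in social-contract versions). The same function solves both framings; it says nothing about how brains do it, only that no content-specific module is logically required.

```python
from typing import Callable, List


def falsifiers(faces: List[str],
               shows_p: Callable[[str], bool],
               p: Callable[[str], bool],
               q: Callable[[str], bool]) -> List[str]:
    """Return the visible faces that must be turned to test the rule 'if P then Q'."""
    result = []
    for face in faces:
        if shows_p(face):
            # P side visible: only a true P could hide a false Q.
            if p(face):
                result.append(face)
        else:
            # Q side visible: only a false Q could hide a true P.
            if not q(face):
                result.append(face)
    return result


# Abstract framing: letters on one side, numbers on the other.
abstract = falsifiers(
    ["A", "K", "4", "7"],
    shows_p=str.isalpha,
    p=lambda f: f in {"A", "E", "I", "O", "U"},
    q=lambda f: int(f) % 2 == 0,
)

# Social-contract framing: drinks on one side, ages on the other.
# Rule: "if a person is drinking beer, they must be over 18."
social = falsifiers(
    ["beer", "cola", "25", "16"],
    shows_p=str.isalpha,
    p=lambda f: f == "beer",
    q=lambda f: int(f) > 18,
)

print(abstract, social)  # ['A', '7'] ['beer', '16']
```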
Here is an example of non-junk: evolutionary pressure can adjust the strength of pre-existing emotions such as anger and fear, and can even decrease intelligence whenever higher intelligence is maladaptive.
Another commonly neglected fact: evolution is not a watchmaker, blind or not. It does not choose a solution to a problem and then work on that solution! It works on all adaptive mutations simultaneously. Evolution works on all the solutions, and simpler changes to existing systems are much quicker to evolve. If a mutation that tweaks an existing system improves fitness, it too will be selected for, even if there were a third eye in progress.
As much as it would be more politically correct and 'moderate' for, e.g., the evolution-of-religion crowd to get their point across by arguing that religious people have evolved a specific god module that does nothing but make them believe in god, rather than implying that they are 'genetically stupid' in some way, the same selective pressure would also make evolution select for non-god-specific heritable tweaks to learning, and for minor cognitive deficits, that increase religiosity.
Lined slate as a prior
As an update to tabula rasa, picture lined writing paper: it provides some guidance for the handwriting. Horizontally lined paper is good for writing text but not for arithmetic; paper with groups of five closely spaced lines separated by gaps is good for writing music; and grid paper is pretty universal. Different regions of the brain are tailored to different content, but should not be expected to themselves encode different algorithms, save for a few exceptions that had a long time to evolve, early in vertebrate history.
edit: improved the language some. edit: specified what sort of evolutionary psychology I consider to be junk, and what I do not, albeit that was not the point of the article. The point of the article was to give you the notions needed to judge which sorts of evolutionary psychology to consider junk, and which not.