
Yudkowsky's brain is the pinnacle of evolution

-27 Yudkowsky_is_awesome 24 August 2015 08:56PM

Here's a simple problem: there is a runaway trolley barreling down the railway tracks. Ahead, on the tracks, there are 3^^^3 people tied up and unable to move. The trolley is headed straight for them. You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different set of tracks. However, you notice that there is one person, Eliezer Yudkowsky, on the side track. You have two options: (1) Do nothing, and the trolley kills the 3^^^3 people on the main track. (2) Pull the lever, diverting the trolley onto the side track where it will kill Yudkowsky. Which is the correct choice?

The answer:

Imagine two ant philosophers talking to each other. "Imagine," they said, "some being with such intense consciousness, intellect, and emotion that it would be morally better to destroy an entire ant colony than to let that being suffer so much as a sprained ankle."

Humans are such a being. I would rather see an entire ant colony destroyed than have a human suffer so much as a sprained ankle. And this isn't just human chauvinism either - I can support my feelings on this issue by pointing out how much stronger feelings, preferences, and experiences humans have than ants do.

How does this relate to the trolley problem? There exists a creature as far beyond us ordinary humans as we are beyond ants, and I think we would all agree that its preferences are vastly more important than those of humans.

Yudkowsky will save the world, not just because he's the one who happens to be making the effort, but because he's the only one who can make the effort.

The world was on its way to doom until September 11, 1979, a date that will later be made a national holiday and will replace Christmas as the biggest holiday. This was, of course, the day when the most important being that has ever existed or will ever exist was born.

Yudkowsky did for the field of AI risk what Newton did for the field of physics. There was literally no research done on AI risk on the scale of what Yudkowsky has done in the 2000s. The same can be said about the field of ethics: ethics had been an open problem in philosophy for thousands of years, yet Plato, Aristotle, and Kant don't really compare to the wisest person who has ever existed. Yudkowsky has come closer to solving ethics than anyone before him. Yudkowsky is what turned our world away from certain extinction and towards utopia.

We all know that Yudkowsky has an IQ so high that it's unmeasurable, so basically something higher than 200. After Yudkowsky gets the Nobel Prize in Literature, owing to the recognition from his Hugo Award, a special council will be organized to study Yudkowsky's intellect, and we will finally know by how many orders of magnitude Yudkowsky's IQ exceeds that of the most intelligent people in history.

Unless Yudkowsky's brain FOOMs first, MIRI will eventually build an FAI with the help of Yudkowsky's extraordinary intelligence. When that FAI uses the coherent extrapolated volition of humanity to decide what to do, it will eventually reach the conclusion that the best thing to do is to tile the whole universe with copies of Eliezer Yudkowsky's brain. In fact, in the process of computing this CEV, even Yudkowsky's harshest critics will reach such an understanding of Yudkowsky's extraordinary nature that they will beg and cry for the tiling to begin as soon as possible, and there will be mass suicides because people will want to give away their resources and the atoms of their bodies for Yudkowsky's brains. As we all know, Yudkowsky is an incredibly humble man, so he will be the last person to protest this course of events, but even he, with his vast intellect, will understand and accept that it is truly the best thing to do.

AI is Software is AI

-42 AndyWood 05 June 2014 06:15PM

Turing's Test is from 1950. We don't judge dogs only by how human they are. Judging software by a human ideal is like a species bias.

Software is the new System. It errs. Some errors are jokes (witness funny auto-correct). Driver-less cars don't crash like we do. Maybe a few will.

These processes are our partners now (Siri). Whether or not a singleton ever evolves rapidly, software is evolving continuously, now.

 

Crocker's Rules

Jews and Nazis: a version of dust specks vs torture

16 shminux 07 September 2012 08:15PM

This is based on a discussion in #lesswrong a few months back, and I am not sure how to resolve it.

Setup: suppose the world is populated by two groups of people: one just wants to be left alone (labeled Jews); the other hates the first group with a passion and wants them dead (labeled Nazis). The second group is otherwise just as "good" as the first (loves its relatives and its country, and is known to be in general quite rational). They just can't help but hate the other guys (this condition is meant to forestall objections like "the Nazis ought to change their terminal values"). Maybe the shape of Jewish noses just creeps the hell out of them, or something. Let's just assume, for the sake of argument, that there is no changing that hatred.

Is it rational to exterminate the Jews to improve the Nazis' quality of life? Well, this seems like a silly question. Of course not! Now, what if there are many more Nazis than Jews? Is there a number large enough that exterminating the Jews would be a net positive utility for the world? Umm... Not sure... I'd like to think that probably not; human life is sacred! And what if their society some day invents immortality? Then every death is like an extremely large (infinite?) negative utility!

Fine then, no extermination. Just send them all to concentration camps, where they will suffer in misery and probably have a shorter lifespan than they would otherwise. This is not an ideal solution from the Nazi point of view, but it makes them feel a little bit better. And now the utilities are unquestionably comparable, so if there are billions of Nazis and only a handful of Jews, the overall suffering decreases when the Jews are sent to the camps.

This logic is completely analogous to that in the dust specks vs torture discussions; only my "little XML labels", to quote Eliezer, make it more emotionally charged. Thus, if you are a utilitarian anti-specker, you ought to decide that, barring a change in the Nazis' terminal value of hating Jews, the rational behavior is to herd the Jews into concentration camps, or possibly even exterminate them, provided there are enough Nazis in the world who benefit from it.
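
To make the aggregation step explicit, here is a minimal sketch of the naive total-utilitarian sum the analogy relies on, with made-up numbers (my own illustration, not part of the original discussion): N agents each gain a tiny utility while M agents each lose a large utility, and the sign of the total flips once N times the tiny gain exceeds M times the large loss.

    # Minimal sketch of naive total-utilitarian aggregation. All numbers are made up.
    def net_utility(n_beneficiaries, gain_each, n_victims, loss_each):
        """Sum of many small per-capita gains minus a few large per-capita losses."""
        return n_beneficiaries * gain_each - n_victims * loss_each

    # With enough beneficiaries, the sum comes out positive...
    print(net_utility(n_beneficiaries=10**9, gain_each=0.001,
                      n_victims=1000, loss_each=100.0))   # 900000.0 > 0
    # ...with fewer beneficiaries, the sign flips.
    print(net_utility(n_beneficiaries=10**7, gain_each=0.001,
                      n_victims=1000, loss_each=100.0))   # -90000.0 < 0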

This is quite a repugnant conclusion, and I don't see a way of fixing it the way the original one is fixed (to paraphrase Eliezer, "only lives worth celebrating are worth creating").

EDIT: Thanks to CronoDAS for pointing out that this is known as the 1000 Sadists problem. Once I had this term, I found that lukeprog had mentioned it on his old blog.

 

Dealing with meta-discussion and the signal to noise ratio

-13 metatroll 01 September 2012 12:50AM

Meta-discussion is nasty. Allegedly, troll-feeding was flooding the comments. Verifiably, meta-discussion is flooding the comments. Keep it simple stupid!

Correcting errors and karma

-5 rebellionkid 29 April 2012 05:03PM

An easy way to win cheap karma on LW:

  1. Publicly make a mistake.
  2. Wait for people to call you on it.
  3. Publicly retract your errors and promise to improve.
Post (1) gets you negative karma, post (3) gets you positive karma. Anecdotally, the net result is generally very positive.
This doesn't seem quite sane. Yes, it is good for us to reward people for changing their minds based on evidence. But it's still better not to have made the error the first time round. At the very least you should get less net karma for changing your mind towards the correct answer than you would for stating the correct thing the first time.
Questions:
Is there an advantage to this signalling-approval-for-updates that outweighs the value of karma as indicator-of-general-correctness-of-posts?
If so then can some other signal of general correctness be devised?
If not then what karma etiquette should we impose to ensure this effect doesn't happen?

Global warming is a better test of irrationality than theism

-2 Stuart_Armstrong 16 March 2012 05:10PM

Theism is often a default test of irrationality on Less Wrong, but I propose that global warming denial would make a much better candidate.

Theism is a symptom of excess compartmentalisation, of not realising that absence of evidence is evidence of absence, of belief in belief, of privileging the hypothesis, and similar failings. But these are not intrinsically huge problems. Indeed, someone with a mild case of theism can have the same anticipations as someone without, and update on evidence in the same way. If they have moved their belief beyond refutation, in theory it thus fails to constrain their anticipations at all; and often this is the case in practice.

Contrast that with someone who denies the existence of anthropogenic global warming (AGW). This has all the signs of hypothesis privileging, but also reeks of fake justification, motivated skepticism, massive overconfidence (if they are truly ignorant of the facts of the debate), and simply the raising of politics above rationality. If I knew someone was a global warming skeptic, then I would expect them to be wrong in their beliefs and their anticipations, and to refuse to update when evidence worked against them. I would expect their judgement to be much more impaired than a theist's.

Of course, reverse stupidity isn't intelligence: simply because one accepts AGW, doesn't make one more rational. I work in England, in a university environment, so my acceptance of AGW is the default position and not a sign of rationality. But if someone is in a milieu that discouraged belief in AGW (one stereotype being heavily Republican areas of the US) and has risen above this, then kudos to them: their acceptance of AGW is indeed a sign of rationality.

Risks from AI and Charitable Giving

2 XiXiDu 13 March 2012 01:54PM

If you’re interested in being on the right side of disputes, you will refute your opponents' arguments. But if you're interested in producing truth, you will fix your opponents' arguments for them. To win, you must fight not only the creature you encounter; you [also] must fight the most horrible thing that can be constructed from its corpse.

-- Black Belt Bayesian

This is an informal post meant as a reply to a post by user:utilitymonster, 'What is the best compact formalization of the argument for AI risk from fast takeoff?'

I hope to find the mental strength to put more effort into improving it in the future. But since nobody else seems willing to take a critical look at the overall topic, I feel that doing what I can is better than doing nothing.

Please review the categories 'Further Reading' and 'Notes and References'.


Abstract

In this post I just want to take a look at a few premises (P#) that need to be true simultaneously in order to make SIAI a worthwhile charity from the point of view of someone trying to do as much good as possible by contributing money. I am going to show that the case for risks from AI is strongly conjunctive, that without a concrete and grounded understanding of AGI an abstract analysis of the issues is going to be very shaky, and that therefore SIAI is likely to be a bad choice as a charity. In other words, that which speaks in favor of SIAI consists mainly of highly specific, conjunctive, non-evidence-backed speculations about possible bad outcomes.

Requirements for an Intelligence Explosion

P1 Fast, and therefore dangerous, recursive self-improvement is logically possible.

It took almost four hundred years to prove Fermat’s Last Theorem. The final proof is over a hundred pages long. Over a hundred pages! And we are not talking about something like an artificial general intelligence that can magically make itself smart enough to prove such theorems and many more that no human being would be capable of proving. Fermat’s Last Theorem simply states “no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two.”

Even artificial intelligence researchers admit that "there could be non-linear complexity constraints meaning that even theoretically optimal algorithms experience strongly diminishing intelligence returns for additional compute power." [1] We just don't know.

Other possible problems include the impossibility of a stable utility function and a reflective decision theory, the intractability of real-world expected utility maximization, and the fact that expected utility maximizers stumble over Pascal's mugging, among other things [2].

For an AI to be capable of recursive self-improvement it also has to guarantee that its goals will be preserved when it improves itself. It is still questionable whether it is possible to conclusively prove that improvements to an agent's intelligence or decision procedures maximize expected utility. If this isn't possible, it won't be rational, or even possible, to undergo explosive self-improvement.

P1.b The fast computation of a simple algorithm is sufficient to outsmart and overpower humanity.

Imagine a group of 100 world-renowned scientists and military strategists.

  • The group is analogous to the initial resources of an AI.
  • The knowledge that the group has is analogous to what an AI could come up with by simply "thinking" about it given its current resources.

Could such a group easily wipe out the Roman Empire if beamed back in time?

  • The Roman empire is analogous to our society today.

Even if you gave all of them a machine gun, the Romans would quickly adapt and the people from the future would run out of ammunition.

  • Machine guns are analogous to the supercomputer it runs on.

Consider that it takes a whole technological civilization to produce a modern smartphone.

You can't just say "with more processing power you can do more different things"; that would be analogous to saying that the 100 people from today could just build more machine guns. But they can't! They can't use all their knowledge and magic from the future to defeat the Roman Empire.

A lot of assumptions have to turn out to be correct for humans to discover, overnight, simple algorithms that can then be improved to self-improve explosively.

You can also compare this to the idea of a Babylonian mathematician discovering modern science and physics if he were uploaded into a supercomputer (a possibility that is in and of itself already highly speculative). It assumes that he could brute-force conceptual revolutions.

Even if he was given a detailed explanation of how his mind works and the resources to understand it, self-improving to achieve superhuman intelligence assumes that throwing resources at the problem of intelligence will magically allow him to pull improved algorithms from solution space as if they were signposted.

But unknown unknowns are not signposted. It's rather like finding a needle in a haystack. Evolution is great at doing that, and assuming that one could speed up evolution considerably is yet another assumption about technological feasibility and real-world resources.

That conceptual revolutions are just a matter of computational resources is pure speculation.

If one were to speed up the whole Babylonian world and accelerate cultural evolution, obviously one would arrive at some insights more quickly. But how much more quickly? How much do many insights depend on experiments, to yield empirical evidence, that can't be sped up considerably? And what is the return? Is the payoff proportional to the resources that are necessary?

If you were going to speed up a chimp brain a million times, would it quickly reach human-level intelligence? If not, why then would it be different for a human-level intelligence trying to reach transhuman intelligence? It seems like a nice idea when formulated in English, but would it work?

Being able to state that an AI could use some magic to take over the earth does not make it a serious possibility.

Magic has to be discovered, adapted and manufactured first. It doesn't just emerge out of nowhere from the computation of certain algorithms. It emerges from a society of agents with various different goals and heuristics like "Treating Rare Diseases in Cute Kittens". It is an evolutionary process that relies on massive amounts of real-world feedback and empirical experimentation. Assuming that all of that can happen because some simple algorithm is being computed is like believing it will emerge 'out of nowhere'; it is magical thinking.

Unknown unknowns are not signposted. [3]

If people like Benoît B. Mandelbrot had never decided to research fractals, then many modern movies wouldn't be possible, as they rely on fractal landscape algorithms. Yet at the time Benoît B. Mandelbrot conducted his research, it was not foreseeable that his work would have any real-world applications.

Important discoveries are made because many routes with low or no expected utility are explored at the same time [4]. And to do so efficiently it takes random mutation, a whole society of minds, a lot of feedback and empirical experimentation.

"Treating rare diseases in cute kittens" might or might not provide genuine insights and open up new avenues for further research. As long as you don't try it you won't know.

The idea that a rigid consequentialist with simple values can think up insights and conceptual revolutions simply because it is instrumentally useful to do so is implausible.

Complex values are the cornerstone of diversity, which in turn enables creativity and drives the exploration of various conflicting routes. A singleton with a stable utility-function lacks the feedback provided by a society of minds and its cultural evolution.

You need to have various different agents with different utility-functions around to get the necessary diversity that can give rise to enough selection pressure. A "singleton" won't be able to predict the actions of new and improved versions of itself by just running sandboxed simulations. Not just because of logical uncertainty but also because it is computationally intractable to predict the real-world payoff of changes to its decision procedures.

You need complex values to give rise to the necessary drives to function in a complex world. You can't just tell an AI to protect itself. What would that even mean? What changes are illegitimate? What constitutes "self"? Those are all unsolved problems that are just assumed to be solvable when talking about risks from AI.

An AI with simple values will simply lack the creativity, due to a lack of drives, to pursue the huge spectrum of research that a society of humans does pursue. It will be able to solve some well-defined narrow problems, but it will be unable to make use of the broad range of synergetic effects of cultural evolution. Cultural evolution is a result of the interaction of a wide range of utility-functions.

Yet even if we assume that there is one complete theory of general intelligence, and that once it is discovered one just has to throw more resources at it: such a system might be able to incorporate all human knowledge, adapt it and find new patterns. But would it really be vastly superior to human society and its expert systems?

Can intelligence itself be improved apart from solving well-defined problems and making more accurate predictions on well-defined classes of problems? The discovery of unknown unknowns does not seem to be subject to any heuristic other than natural selection. Without well-defined goals, terms like "optimization" have no meaning.

P2 Fast, and therefore dangerous, recursive self-improvement is physically possible.

Even if it could be proven that explosive recursive self-improvement is logically possible, e.g. that there are no complexity constraints, the question remains if it is physically possible.

Our best theories about intelligence are highly abstract and their relation to real world human-level general intelligence is often wildly speculative [5][6].

P3 Fast, and therefore dangerous, recursive self-improvement is economically feasible.

To exemplify the problem, take the science-fictional idea of using antimatter as the explosive for weapons. It is physically possible to produce antimatter and use it for large-scale destruction. The equivalent of the Hiroshima atomic bomb would take only half a gram of antimatter. But it would take 2 billion years to produce that amount of antimatter [7].

We simply don't know whether increasing intelligence is instrumentally useful or quickly hits diminishing returns [8].

P3.b AGI is able to create (or acquire) resources, empowering technologies or civilizational support [9].

We are already at a point where we have to build billion dollar chip manufacturing facilities to run our mobile phones. We need to build huge particle accelerators to obtain new insights into the nature of reality.

An AI would either have to rely on the help of a whole technological civilization or be in control of advanced nanotech assemblers.

And if an AI was to acquire the necessary resources on its own, its plan for world-domination would have to go unnoticed. This would require the workings of the AI to be opaque to its creators yet comprehensible to itself.

But an AI capable of efficient recursive self-improvement must be able to

  1. comprehend its own workings
  2. predict how improvements, or improved versions of itself, are going to act, to ensure that its values are preserved

So if the AI can do that, why wouldn't humans be able to use the same algorithms to predict what the initial AI is going to do? And if the AI can't do that, how is it going to maximize expected utility if it is unable to predict what it is going to do?

Any AI capable of efficient self-modification must be able to grasp its own workings and make predictions about improvements to various algorithms and its overall decision procedure. If an AI can do that, why would the humans who build it be unable to notice any malicious intentions? Why wouldn't the humans who created it not be able to use the same algorithms that the AI uses to predict what it will do? If humans are unable to predict what the AI will do, how is the AI able to predict what improved versions of itself will do?

And even if an AI were somehow able to acquire large amounts of money, it would not be easy to use that money. You can't "just" build huge companies with fake identities, or through a straw man, to create revolutionary technologies easily. Running companies with real people takes a lot of real-world knowledge, interactions and feedback. But most importantly, it takes a lot of time. An AI could not simply create a new Intel or Apple over a few years without its creators noticing anything.

The goals of an AI will be under scrutiny at any time. It seems very implausible that scientists, a company or the military are going to create an AI and then just let it run without bothering about its plans. An artificial agent is not a black box, like humans are, where one is only able to guess its real intentions.

A plan for world domination seems like something that can't be concealed from its creators. Lying is no option if your algorithms are open to inspection.

P4 Dangerous recursive self-improvement is the default outcome of the creation of artificial general intelligence.

Complex goals need complex optimization parameters (the design specifications of the subject of the optimization process against which it will measure its success of self-improvement).

Even the creation of paperclips is a much more complex goal than telling an AI to compute as many decimal digits of Pi as possible.

For an AGI that was designed to design paperclips to pose an existential risk, its creators would have to be capable enough to enable it to take over the universe on its own, yet forget, or fail, to define time, space and energy bounds as part of its optimization parameters. Therefore, given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), the nonhazardous subset of all possible outcomes might be much larger than the subset where the AGI works perfectly yet fails to halt before it could wreak havoc.

And even given a rational utility maximizer, it is possible to maximize paperclips in a lot of different ways. How it does so depends fundamentally on its utility-function and how precisely it was defined.

If there are no constraints in the form of design and goal parameters then it can maximize paperclips in all sorts of ways that don't demand recursive self-improvement.

"Utility" does only become well-defined if we precisely define what it means to maximize it. Just maximizing paperclips doesn't define how quickly and how economically it is supposed to happen.

The problem is that "utility" has to be defined. To maximize expected utility does not imply certain actions, efficiency and economic behavior, or the drive to protect yourself. You can also rationally maximize paperclips without protecting yourself if it is not part of your goal parameters.

You can also assign utility to maximizing paperclips for as long as nothing turns you off, without caring about being turned off. If an AI is not explicitly programmed to care about being turned off, then it won't.

Without well-defined goals in the form of a precise utility-function, it might be impossible to maximize expected "utility". Concepts like "efficient", "economic" or "self-protection" all have a meaning that is inseparable from an agent's terminal goals. If you just tell it to maximize paperclips, then this can be realized in an infinite number of ways, all of which would be rational given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise out of a state of chaos due to random fluctuations? That wouldn't be irrational. To have an AI take over the universe as fast as possible you would have to explicitly design it to do so.
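
As a toy illustration of the point about goal parameters (my own sketch, with made-up numbers and function names): the "same" instruction to make paperclips corresponds to very different objective functions depending on whether bounds are written into the specification, and only one of them rewards unbounded expansion.

    # Toy sketch, illustrative only: two ways to formalize "maximize paperclips".
    # Which one is written down determines whether unbounded expansion is even rewarded.

    def unbounded_utility(paperclips_made):
        # No bounds: utility grows without limit, so more resources are always better.
        return paperclips_made

    def bounded_utility(paperclips_made, target=10**6, deadline_met=True):
        # Explicit target and time bound: once the target is met on time,
        # further expansion adds nothing.
        return min(paperclips_made, target) if deadline_met else 0

    print(unbounded_utility(10**12) > unbounded_utility(10**6))   # True: keep expanding
    print(bounded_utility(10**12) == bounded_utility(10**6))      # True: no reason to expand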

But for the sake of a thought experiment assume that the default case was recursive self-improvement. Now imagine that a company like Apple wanted to build an AI that could answer every question (an Oracle).

If Apple was going to build an Oracle it would anticipate that other people would also want to ask it questions. Therefore it can't just waste all resources on looking for an inconsistency arising from the Peano axioms when asked to solve 1+1. It would not devote additional resources on answering those questions that are already known to be correct with a high probability. It wouldn't be economically useful to take over the universe to answer simple questions.

Nor would it be rational to look for an inconsistency arising from the Peano axioms while solving 1+1. To answer questions, an Oracle needs a good amount of general intelligence. And concluding that asking it to solve 1+1 implies looking for an inconsistency arising from the Peano axioms does not seem reasonable. It also does not seem reasonable to suspect that humans desire answers to their questions that approach infinite certainty. Why would someone build such an Oracle in the first place?

A reasonable Oracle would quickly yield good solutions by trying to find, within a reasonable time, answers that are with high probability just 2–3% away from the optimal solution. I don't think anyone would build an answering machine that throws the whole universe at the first sub-problem it encounters.

P5 The human development of artificial general intelligence will take place quickly.

What evidence do we have that there is some principle that, once discovered, allows us to grow superhuman intelligence overnight?

If the development of AGI takes place slowly, a gradual and controllable development, we might be able to learn from small-scale mistakes, or have enough time to develop friendly AI, while having to face other existential risks.

This might, for example, be the case if intelligence cannot be captured by a discrete algorithm, or is modular, and therefore never allows us to reach a point where we can suddenly build the smartest thing ever, which then just extends itself indefinitely.

Therefore the probability of an AI undergoing explosive recursive self-improvement, P(FOOM), is the probability of the conjunction of its premises:

P(FOOM) = P(P1∧P2∧P3∧P4∧P5)
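
As a minimal numerical illustration of why the conjunction matters (the individual probabilities below are made up for the example and assume independence; they are not estimates from this post): even if each premise is judged fairly likely on its own, the product can be small.

    # Illustrative only: made-up premise probabilities, assumed independent.
    premise_probabilities = {"P1": 0.8, "P2": 0.8, "P3": 0.7, "P4": 0.6, "P5": 0.5}

    p_foom = 1.0
    for p in premise_probabilities.values():
        p_foom *= p          # P(FOOM) = P(P1 ∧ P2 ∧ P3 ∧ P4 ∧ P5) under independence

    print(round(p_foom, 3))  # 0.134 -- far lower than any single premise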

Of course, there are many more premises that need to be true in order to enable an AI to go FOOM, e.g. that each level of intelligence can effectively handle its own complexity, or that most AGI designs can somehow self-modify their way up to massive superhuman intelligence. But I believe that the above points are enough to show that the case for a hard takeoff is not disjunctive, but rather strongly conjunctive.

Requirements for SIAI to constitute an optimal charity

In this section I will assume the truth of all premises in the previous section.

P6 SIAI can solve friendly AI.

Say you believe that unfriendly AI will wipe us out with a probability of 60%, and that there is another existential risk that will wipe us out with a probability of 10% even if unfriendly AI turns out to be no risk, or in all possible worlds where it comes later. Both risks have the same utility x (if we don't assume that an unfriendly AI could also wipe out aliens etc.), so .6x > .1x. But if the ratio of the probability of solving friendly AI (call it A) to the probability of solving the second risk (call it B) is such that A ≤ (1/6)B, then the expected utility of working on friendly AI is at best equal to that of working on the other existential risk, because .6Ax ≤ .1Bx.
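
To make the comparison concrete, here is a minimal worked example. The 60% and 10% figures come from the paragraph above; the specific values chosen for A and B are my own placeholders, picked only so that the stated condition A ≤ (1/6)B holds.

    # Illustrative only. x is the utility at stake, common to both risks.
    p_ufai  = 0.6    # probability that unfriendly AI wipes us out
    p_other = 0.1    # probability that the other existential risk wipes us out
    A       = 0.05   # assumed probability that friendly AI gets solved
    B       = 0.50   # assumed probability that the other risk gets solved

    # Expected fraction of x saved by working on each risk:
    value_fai_work   = p_ufai  * A   # 0.03 * x
    value_other_work = p_other * B   # 0.05 * x

    print(A <= B / 6)                          # True: the condition in the text holds
    print(value_fai_work < value_other_work)   # True: the other risk wins despite 0.6 > 0.1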

Consider that one order of magnitude more utility could easily be outweighed or trumped by an underestimation of the complexity of friendly AI.

So how hard is it to solve friendly AI?

Take, for example, Pascal's mugging: if you can't solve it, then you need to implement a hack that is largely based on human intuition. Therefore, in order to estimate the possibility of solving friendly AI, one needs to account for the difficulty of solving all of its sub-problems.

Consider that we don't even know "how one would start to research the problem of getting a hypothetical AGI to recognize humans as distinguished beings." [10]

P7 SIAI does not increase risks from AI.

By trying to solve friendly AI, SIAI has to think about a lot of issues related to AI in general and might have to solve problems that will make it easier to create artificial general intelligence.

It is far from clear that SIAI is able to protect its findings against intrusion, betrayal, or industrial espionage.

P8 SIAI does not increase negative utility.

There are several possibilities by which SIAI could actually cause a direct increase in negative utility.

1) Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are a product of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not-quite-friendly AI might create a living hell for the rest of time, increasing negative utility dramatically [11].

2) Humans are not provably friendly. Given the power to shape the universe, SIAI might fail to act altruistically and might deliberately implement an AI with selfish motives or horrible strategies [12].

P9 It makes sense to support SIAI at this time [13].

Therefore the probability of SIAI being a worthwhile charity, P(CHARITY), is the probability of the conjunction of its premises:

P(CHARITY) = P(P6∧P7∧P8∧P9)

As before, there are many more premises that need to be true in order for SIAI to be the best choice for someone who wants to maximize doing good by contributing money to a charity.

Further Reading

The following posts and resources elaborate on many of the above points and hint at a lot of additional problems.

Notes and References

[1] Q&A with Shane Legg on risks from AI

[2] http://lukeprog.com/SaveTheWorld.html

[3] "In many ways, this is a book about hindsight. Pythagoras could not have imagined the uses to which his equation would be put (if, indeed, he ever came up with the equation himself in the first place). The same applies to almost all of the equations in this book. They were studied/discovered/developed by mathematicians and mathematical physicists who were investigating subjects that fascinated them deeply, not because they imagined that two hundred years later the work would lead to electric light bulbs or GPS or the internet, but rather because they were genuinely curious."

17 Equations that changed the world

[4] Here is my list of "really stupid, frivolous academic pursuits" that have led to major scientific breakthroughs.

  • Studying monkey social behaviors and eating habits led to insights into HIV (Radiolab: Patient Zero)
  • Research into how algae move toward light paved the way for optogenetics: using light to control brain cells (Nature 2010 Method of the Year).
  • Black hole research gave us WiFi (ICRAR award)
  • Optometry informs architecture and saved lives on 9/11 (APA Monitor)
  • Certain groups HATE SETI, but SETI's development of cloud-computing service SETI@HOME paved the way for citizen science and recent breakthroughs in protein folding (Popular Science)
  • Astronomers provide insights into medical imaging (TEDxBoston: Michell Borkin)
  • Basic physics experiments and the Fibonacci sequence help us understand plant growth and neuron development

http://blog.ketyov.com/2012/02/basic-science-is-about-creating.html

[5] "AIXI is often quoted as a proof of concept that it is possible for a simple algorithm to improve itself to such an extent that it could in principle reach superhuman intelligence. AIXI proves that there is a general theory of intelligence. But there is a minor problem, AIXI is as far from real world human-level general intelligence as an abstract notion of a Turing machine with an infinite tape is from a supercomputer with the computational capacity of the human brain. An abstract notion of intelligence doesn’t get you anywhere in terms of real-world general intelligence. Just as you won’t be able to upload yourself to a non-biological substrate because you showed that in some abstract sense you can simulate every physical process."

Alexander Kruel, Why an Intelligence Explosion might be a Low-Priority Global Risk

[6] "…please bear in mind that the relation of Solomonoff induction and “Universal AI” to real-world general intelligence of any kind is also rather wildly speculative… This stuff is beautiful math, but does it really have anything to do with real-world intelligence? These theories have little to say about human intelligence, and they’re not directly useful as foundations for building AGI systems (though, admittedly, a handful of scientists are working on “scaling them down” to make them realistic; so far this only works for very simple toy problems, and it’s hard to see how to extend the approach broadly to yield anything near human-level AGI). And it’s not clear they will be applicable to future superintelligent minds either, as these minds may be best conceived using radically different concepts."

Ben Goertzel, 'Are Prediction and Reward Relevant to Superintelligences?'

[7] http://public.web.cern.ch/public/en/spotlight/SpotlightAandD-en.html

[8] "If any increase in intelligence is vastly outweighed by its computational cost and the expenditure of time needed to discover it then it might not be instrumental for a perfectly rational agent (such as an artificial general intelligence), as imagined by game theorists, to increase its intelligence as opposed to using its existing intelligence to pursue its terminal goals directly or to invest its given resources to acquire other means of self-improvement, e.g. more efficient sensors."

Alexander Kruel, Why an Intelligence Explosion might be a Low-Priority Global Risk

[9] Section 'Necessary resources for an intelligence explosion', Why an Intelligence Explosion might be a Low-Priority Global Risk, Alexander Kruel

[10] http://lesswrong.com/lw/3aa/friendly_ai_research_and_taskification/

[11] http://lesswrong.com/r/discussion/lw/ajm/ai_risk_and_opportunity_a_strategic_analysis/5ylx

[12] http://lesswrong.com/lw/8c3/qa_with_new_executive_director_of_singularity/5y77

[13] "I think that if you're aiming to develop knowledge that won't be useful until very very far in the future, you're probably wasting your time, if for no other reason than this: by the time your knowledge is relevant, someone will probably have developed a tool (such as a narrow AI) so much more efficient in generating this knowledge that it renders your work moot."

Holden Karnofsky in a conversation with Jaan Tallinn

A case study in fooling oneself

-2 Mitchell_Porter 15 December 2011 05:25AM

Note: This post assumes that the Oxford version of Many Worlds is wrong, and speculates as to why this isn't obvious. For a discussion of the hypothesis itself, see Problems of the Deutsch-Wallace version of Many Worlds.

smk asks how many worlds are produced in a quantum process where the outcomes have unequal probabilities; Emile says there's no exact answer, just like there's no exact answer for how many ink blots are in the messy picture; Tetronian says this analogy is a great way to demonstrate what a "wrong question" is; Emile has (at this writing) 9 upvotes, and Tetronian has 7.

My thesis is that Emile has instead provided an example of how to dismiss a question and thereby fool oneself; Tetronian provides an example of treating an epistemically destructive technique of dismissal as epistemically virtuous and fruitful; and the upvotes show that this isn't just their problem. [edit: Emile and Tetronian respond.]

I am as tired as anyone of the debate over Many Worlds. I don't expect the general climate of opinion on this site to change except as a result of new intellectual developments in the larger world of physics and philosophy of physics, which is where the question will be decided anyway. But the mission of Less Wrong is supposed to be the refinement of rationality, and so perhaps this "case study" is of interest, not just as another opportunity to argue over the interpretation of quantum mechanics, but as an opportunity to dissect a little bit of irrationality that is not only playing out here and now, but which evidently has a base of support.

The question is not just, what's wrong with the argument, but also, how did it get that base of support? How was a situation created where one person says something irrational (or foolish, or however the problem is best understood), and a lot of other people nod in agreement and say, that's an excellent example of how to think?

On this occasion, my quarrel is not with the Many Worlds interpretation as such; it is with the version of Many Worlds which says there's no actual number of worlds. Elsewhere in the thread, someone says there are uncountably many worlds, and someone else says there are two worlds. At least those are meaningful answers (although the advocate of "two worlds" as the answer, then goes on to say that one world is "stronger" than the other, which is meaningless).

But the proposition that there is no definite number of worlds, is as foolish and self-contradictory as any of those other contortions from the history of thought that rationalists and advocates of common sense like to mock or boggle at. At times I have wondered how to place Less Wrong in the history of thought; well, this is one way to do it - it can have its own chapter in the history of intellectual folly; it can be known by its mistakes.

Then again, this "mistake" is not original to Less Wrong. It appears to be one of the defining ideas of the Oxford-based approach to Many Worlds associated with David Deutsch and David Wallace; the other defining idea being the proposal to derive probabilities from rationality, rather than vice versa. (I refer to the attempt to derive the Born rule from arguments about how to behave rationally in the multiverse.) The Oxford version of MWI seems to be very popular among thoughtful non-physicist advocates of MWI - even though I would regard both its defining ideas as nonsense - and it may be that its ideas get a pass here, partly because of their social status. That is, an important faction of LW opinion believes that Many Worlds is the explanation of quantum mechanics, and the Oxford school of MWI has high status and high visibility within the world of MWI advocacy, and so its ideas will receive approbation without much examination or even much understanding, because of the social and psychological mechanisms which incline people to agree with, defend, and laud their favorite authorities, even if they don't really understand what these authorities are saying or why they are saying it.

However, it is undoubtedly the case that many of the LW readers who believe there's no definite number of worlds, believe this because the idea genuinely makes sense to them. They aren't just stringing together words whose meaning isn't known, like a Taliban who recites the Quran without knowing a word of Arabic; they've actually thought about this themselves; they have gone through some subjective process as a result of which they have consciously adopted this opinion. So from the perspective of analyzing how it is that people come to hold absurd-sounding views, this should be good news. It means that we're dealing with a genuine failure to reason properly, as opposed to a simple matter of reciting slogans or affirming allegiance to a view on the basis of something other than thought.

At a guess, the thought process involved is very simple. These people have thought about the wavefunctions that appear in quantum mechanics, at whatever level of technical detail they can muster; they have decided that the components or substructures of these wavefunctions which might be identified as "worlds" or "branches" are clearly approximate entities whose definition is somewhat arbitrary or subject to convention; and so they have concluded that there's no definite number of worlds in the wavefunction. And the failure in their thinking occurs when they don't take the next step and say, is this at all consistent with reality? That is, if a quantum world is something whose existence is fuzzy and which doesn't even have a definite multiplicity - that is, we can't even say if there's one, two, or many of them - if those are the properties of a quantum world, then is it possible for the real world to be one of those? It's the failure to ask that last question, and really think about it, which must be the oversight allowing the nonsense-doctrine of "no definite number of worlds" to gain a foothold in the minds of otherwise rational people.

If this diagnosis is correct, then at some level it's a case of "treating the map as the territory" syndrome. A particular conception of the quantum-mechanical wavefunction is providing the "map" of reality, and the individual thinker is perhaps making correct statements about what's on their map, but they are failing to check the properties of the map against the properties of the territory. In this case, the property of reality that falsifies the map is, the fact that it definitely exists, or perhaps the corollary of that fact, that something which definitely exists definitely exists at least once, and therefore exists with a definite, objective multiplicity.

Trying to go further in the diagnosis, I can identify a few cognitive tendencies which may be contributing. First is the phenomenon of bundled assumptions which have never been made distinct and questioned separately. I suppose that in a few people's heads, there's a rapid movement from "science (or materialism) is correct" to "quantum mechanics is correct" to "Many Worlds is correct" to "the Oxford school of MWI is correct". If you are used to encountering all of those ideas together, it may take a while to realize that they are not linked out of logical necessity, but just contingently, by the narrowness of your own experience.

Second, it may seem that "no definite number of worlds" makes sense to an individual, because when they test their own worldview for semantic coherence, logical consistency, or empirical adequacy, it seems to pass. In the case of "no-collapse" or "no-splitting" versions of Many Worlds, it seems that it often passes the subjective making-sense test, because the individual is actually relying on ingredients borrowed from the Copenhagen interpretation. A semi-technical example would be the coefficients of a reduced density matrix. In the Copenhagen interpretation, they are probabilities. Because they have the mathematical attributes of probabilities (by this I just mean that they lie between 0 and 1), and because they can be obtained by strictly mathematical manipulations of the quantities composing the wavefunction, Many Worlds advocates tend to treat these quantities as inherently being probabilities, and use their "existence" as a way to obtain the Born probability rule from the ontology of "wavefunction yes, wavefunction collapse no". But just because something is a real number between 0 and 1, doesn't yet explain how it manages to be a probability. In particular, I would maintain that if you have a multiverse theory, in which all possibilities are actual, then a probability must refer to a frequency. The probability of an event in the multiverse is simply how often it occurs in the multiverse. And clearly, just having the number 0.5 associated with a particular multiverse branch is not yet the same thing as showing that the events in that branch occur half the time.
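
For readers who want the semi-technical example spelled out, here is a minimal numpy sketch (my own, with made-up amplitudes): for the entangled state a|00> + b|11>, the reduced density matrix of the first qubit has diagonal entries |a|^2 and |b|^2, which are the numbers between 0 and 1 that Many Worlds advocates then read as branch probabilities.

    import numpy as np

    # Toy two-qubit state |psi> = a|00> + b|11>; amplitudes are made up for illustration.
    a, b = np.sqrt(0.3), np.sqrt(0.7)
    psi = np.array([a, 0.0, 0.0, b])        # basis order: |00>, |01>, |10>, |11>
    rho = np.outer(psi, psi.conj())         # full density matrix |psi><psi|

    # Reduced density matrix of the first qubit: trace out the second qubit.
    rho_A = np.zeros((2, 2), dtype=complex)
    for i in range(2):
        for j in range(2):
            for k in range(2):              # sum over the second qubit's basis states
                rho_A[i, j] += rho[2 * i + k, 2 * j + k]

    print(np.real(np.diag(rho_A)))          # approximately [0.3, 0.7]: numbers between
                                            # 0 and 1, but nothing here yet shows that
                                            # they are frequencies of anything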

I don't have a good name for this phenomenon, but we could call it "borrowed support", in which a belief system receives support from considerations which aren't legitimately its own to claim. (Ayn Rand apparently talked about a similar notion of "borrowed concepts".)

Third, there is a possibility among people who have a capacity for highly abstract thought, to adopt an ideology, ontology, or "theory of everything" which is only expressed in those abstract terms, and to then treat that theory as the whole of reality, in a way that reifies the abstractions. This is a highly specific form of treating the map as the territory, peculiar to abstract thinkers. When someone says that reality is made of numbers, or made of computations, this is at work. In the case at hand, we're talking about a theory of physics, but the ontology of that theory is incompatible with the definiteness of one's own existence. My guess is that the main psychological factor at work here is intoxication with the feeling that one understands reality totally and in its essence. The universe has bowed to the imperial ego; one may not literally direct the stars in their courses, but one has known the essence of things. Combine that intoxication, with "borrowed support" and with the simple failure to think hard enough about where on the map the imperial ego itself might be located, and maybe you have a comprehensive explanation of how people manage to believe theories of reality which are flatly inconsistent with the most basic features of subjective experience.

I should also say something about Emile's example of the ink blots. I find it rather superficial to just say "there's no definite number of blots". To say that the number of blots depends on definition is a lot closer to being true, but that undermines the argument, because that opens the possibility that there is a right definition of "world", and many wrong definitions, and that the true number of worlds is just the number of worlds according to the right definition.

Emile's picture can be used for the opposite purpose. All we have to do is to scrutinize, more closely, what it actually is. It's a JPEG that is 314 pixels by 410 pixels in size. Each of those pixels will have an exact color coding. So clearly we can be entirely objective in the way we approach this question; all we have to do is be precise in our concepts, and engage with the genuine details of the object under discussion. Presumably the image is a scan of a physical object, but even in that case, we can be precise - it's made of atoms, they are particular atoms, we can make objective distinctions on the basis of contiguity and bonding between these atoms, and so the question will have an objective answer, if we bother to be sufficiently precise. The same goes for "worlds" or "branches" in a wavefunction. And the truly pernicious thing about this version of Many Worlds is that it prevents such inquiry. The ideology that tolerates vagueness about worlds serves to protect the proposed ontology from necessary scrutiny.

The same may be said, on a broader scale, of the practice of "dissolving a wrong question". That is a gambit which should be used sparingly and cautiously, because it easily serves to instead justify the dismissal of a legitimate question. A community trained to dismiss questions may never even notice the gaping holes in its belief system, because the lines of inquiry which lead towards those holes are already dismissed as invalid, undefined, unnecessary. smk came to this topic fresh, and without a head cluttered with ideas about what questions are legitimate and what questions are illegitimate, and as a result managed to ask something which more knowledgeable people had already prematurely dismissed from their own minds.

Pascal's wager re-examined

-8 PhilGoetz 05 October 2011 08:43AM

Let P(chr) = the probability that the statements attributed to Jesus of Nazareth and Paul of Tarsus regarding salvation and the afterlife are factually mostly correct; and let U(C) be the utility of action C, where C is in {Christianity, Islam, Judaism, atheism}.

Two of the key criticisms of Pascal's wager are that

  • the limit of P(chr)·U(Christianity) as U(Christianity)→∞ and P(chr)→0 is undefined, and
  • invoking infinite utilities isn't fair.

If, however, P(chr) is not infinitesimal, and U(Christianity) is merely very large, these counter-arguments fail.
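
A minimal sketch of the finite version of the wager (the numbers are placeholders of my own, not estimates from this post, and the comparison is deliberately simplified to two terms): with a non-infinitesimal P(chr) and a merely very large but finite U, the expected-utility comparison is perfectly well defined.

    # Illustrative only: finite stand-ins for "not infinitesimal" and "merely very large".
    p_chr = 1e-6            # assumed probability that the salvation claims are mostly correct
    u_christianity = 1e9    # assumed utility of salvation: large but finite
    u_atheism = 1.0         # assumed utility of the alternative if the claims are false

    # Simplified two-term comparison (cross terms ignored for brevity):
    eu_christianity = p_chr * u_christianity      # 1000.0
    eu_atheism      = (1 - p_chr) * u_atheism     # ~1.0

    # Nothing is undefined and no infinities are invoked; the conclusion just
    # depends on the actual finite numbers one plugs in.
    print(eu_christianity > eu_atheism)           # True for these made-up values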


Only self-immolate if you care about what foreigners think

-15 CharlieSheen 21 July 2011 10:25PM

Someone self-immolates and explicitly states it is a form of political protest in Megdad. What a crazy regime!
Someone self-immolates and explicitly states it is a form of political protest in Hometown. What a crazy person!


Edit: What, -5 already? Why is giving an example of how people never take the outside view of their own society such a bad topic for the discussion section? Also, disclaimer: both Hometown State and the Megdadistan Republic are fictional countries and no actual examples were given, to avoid mind-killers.

2nd Edit: Wow, I really need to spell this out? The media of Hometown are more likely to treat an immolation in Megdad as due to a legitimate grievance worthy of attention, and to downplay any mental health problems or details that might paint the person in an unflattering light, compared to someone who self-immolates in Hometown. And I think this effect is mostly not due to government-enforced censorship or pressure.

 

Noble act of defiant self-sacrifice is far. Suicidal crazies are near.

 

The only way to get good coverage to achieve social change is to count on foreign media to paint a kind picture of you. And that supposes your people care about what the media of Megdad say about your country.

3rd Edit: -15. Pretty clear that I'm wrong.
