Muehlhauser-Goertzel Dialogue, Part 1

lukeprog

Part of the Muehlhauser interview series on AGI.

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Ben Goertzel is the Chairman at the AGI company Novamente, and founder of the AGI conference series.

Luke Muehlhauser:

[Jan. 13th, 2012]

Ben, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. There is much on which we agree, and much on which we disagree, so I think our dialogue will be informative to many readers, and to us!

Let us begin where we agree. We seem to agree that:

Involuntary death is bad, and can be avoided with the right technology.
Humans can be enhanced by merging with technology.
Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
AGI is likely this century.
AGI will, after a slow or hard takeoff, completely transform the world. It is a potential existential risk, but if done wisely, could be the best thing that ever happens to us.
Careful effort will be required to ensure that AGI results in good things for humanity.

Next: Where do we disagree?

Two people might agree about the laws of thought most likely to give us an accurate model of the world, but disagree about which conclusions those laws of thought point us toward. For example, two scientists may use the same scientific method but offer two different models that seem to explain the data.

Or, two people might disagree about the laws of thought most likely to give us accurate models of the world. If that's the case, it will be no surprise that we disagree about which conclusions to draw from the data. We are not shocked when scientists and theologians end up with different models of the world.

Unfortunately, I suspect you and I disagree at the more fundamental level — about which methods of reasoning to use when seeking an accurate model of the world.

I sometimes use the term "Technical Rationality" to name my methods of reasoning. Technical Rationality is drawn from two sources: (1) the laws of logic, probability theory, and decision theory, and (2) the cognitive science of how our haphazardly evolved brains fail to reason in accordance with the laws of logic, probability theory, and decision theory.

Ben, at one time you tweeted a William S. Burroughs quote: "Rational thought is a failed experiment and should be phased out." I don't know whether Burroughs meant by "rational thought" the specific thing I mean by "rational thought," or what exactly you meant to express with your tweet, but I suspect we have different views of how to reason successfully about the world.

I think I would understand your way of thinking about AGI better if I understand your way of thinking about everything. For example: do you have reason to reject the laws of logic, probability theory, and decision theory? Do you think we disagree about the basic findings of the cognitive science of humans? What are your positive recommendations for reasoning about the world?

Ben Goertzel:

[Jan 13th, 2012]

Firstly, I don’t agree with that Burroughs quote that “Rational thought is a failed experiment” -- I mostly just tweeted it because I thought it was funny! I’m not sure Burroughs agreed with his own quote either. He also liked to say that linguistic communication was a failed experiment, introduced by women to help them oppress men into social conformity. Yet he was a writer and loved language. He enjoyed being a provocateur.

However, I do think that some people overestimate the power and scope of rational thought. That is the truth at the core of Burroughs’ entertaining hyperbolic statement....

I should clarify that I’m a huge fan of logic, reason and science. Compared to the average human being, I’m practically obsessed with these things! I don’t care for superstition, nor for unthinking acceptance of what one is told; and I spend a lot of time staring at data of various sorts, trying to understand the underlying reality in a rational and scientific way. So I don’t want to be pigeonholed as some sort of anti-rationalist!

However, I do have serious doubts both about the power and scope of rational thought in general -- and much more profoundly, about the power and scope of what you call “technical rationality.”

First of all, about the limitations of rational thought broadly conceived -- what one might call “semi-formal rationality”, as opposed to “technical rationality.” Obviously this sort of rationality has brought us amazing things, like science and mathematics and technology. Hopefully it will allow us to defeat involuntary death and increase our IQs by orders of magnitude and discover new universes, and all sorts of great stuff. However, it does seem to have its limits.

It doesn’t deal well with consciousness -- studying consciousness using traditional scientific and rational tools has just led to a mess of confusion. It doesn’t deal well with ethics either, as the current big mess regarding bioethics indicates.

And this is more speculative, but I tend to think it doesn’t deal that well with the spectrum of “anomalous phenomena” -- precognition, extrasensory perception, remote viewing, and so forth. I strongly suspect these phenomena exist, and that they can be understood to a significant extent via science -- but also that science as presently constituted may not be able to grasp them fully, due to issues like the mindset of the experimenter helping mold the results of the experiment.

There’s the minor issue of Hume’s problem of induction, as well. I.e., the issue that, in the rational and scientific world-view, we have no rational reason to believe that any patterns observed in the past will continue into the future. This is an ASSUMPTION, plain and simple -- an act of faith. Occam’s Razor (which is one way of justifying and/or further specifying the belief that patterns observed in the past will continue into the future) is also an assumption and an act of faith. Science and reason rely on such acts of faith, yet provide no way to justify them. A big gap.

Furthermore -- and more to the point about AI -- I think there’s a limitation to the way we now model intelligence, which ties in with the limitations of the current scientific and rational approach. I have always advocated a view of intelligence as “achieving complex goals in complex environments”, and many others have formulated and advocated similar views. The basic idea here is that, for a system to be intelligent it doesn’t matter WHAT its goal is, so long as its goal is complex and it manages to achieve it. So the goal might be, say, reshaping every molecule in the universe into an image of Mickey Mouse. This way of thinking about intelligence, in which the goal is strictly separated from the methods for achieving it, is very useful and I’m using it to guide my own practical AGI work.

On the other hand, there’s also a sense in which reshaping every molecule in the universe into an image of Mickey Mouse is a STUPID goal. It’s somehow out of harmony with the Cosmos -- at least that’s my intuitive feeling. I’d like to interpret intelligence in some way that accounts for the intuitively apparent differential stupidity of different goals. In other words, I’d like to be able to deal more sensibly with the interaction of scientific and normative knowledge. This ties in with the incapacity of science and reason in their current forms to deal with ethics effectively, which I mentioned a moment ago.

I certainly don’t have all the answers here -- I’m just pointing out the complex of interconnected reasons why I think contemporary science and rationality are limited in power and scope, and are going to be replaced by something richer and better as the growth of our individual and collective minds progresses. What will this new, better thing be? I’m not sure -- but I have an inkling it will involve an integration of “third person” science/rationality with some sort of systematic approach to first-person and second-person experience.

Next, about “technical rationality” -- of course that’s a whole other can of worms. Semi-formal rationality has a great track record; it’s brought us science and math and technology, for example. So even if it has some limitations, we certainly owe it some respect! Technical rationality has no such track record, and so my semi-formal scientific and rational nature impels me to be highly skeptical of it! I have no reason to believe, at present, that focusing on technical rationality (as opposed to the many other ways to focus our attention, given our limited time and processing power) will generally make people more intelligent or better at achieving their goals. Maybe it will, in some contexts -- but what those contexts are, is something we don’t yet understand very well.

I provided consulting once to a project aimed at using computational neuroscience to understand the neurobiological causes of cognitive biases in people employed to analyze certain sorts of data. This is interesting to me; and it’s clear to me that in this context, minimization of some of these textbook cognitive biases would help these analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style.

On a mathematical basis, the justification for positing probability theory as the “correct” way to do reasoning under uncertainty relies on arguments like Cox’s axioms, or de Finetti’s Dutch Book arguments. These are beautiful pieces of math, but when you talk about applying them to the real world, you run into a lot of problems regarding the inapplicability of their assumptions. For instance, Cox’s axioms include an axiom specifying that (roughly speaking) multiple pathways of arriving at the same conclusion must lead to the same estimate of that conclusion’s truth value. This sounds sensible but in practice it’s only going to be achievable by minds with arbitrarily much computing capability at their disposal. In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources. They’re irrelevant to reality, except as inspirations to individuals of a certain cast of mind.
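As a rough illustration of the Dutch Book argument mentioned above, here is a minimal Python sketch. It is not taken from the dialogue: the function name `dutch_book_payoffs`, the unit stake, and the example prices are all illustrative assumptions. It shows only the idealized argument, namely that an agent whose prices for a bet on A and a bet on not-A sum to more than 1 loses money however A turns out. It says nothing about whether a resource-bounded reasoner can actually keep its prices coherent, which is the objection raised here.

```python
def dutch_book_payoffs(p_a, p_not_a, stake=1.0):
    """Net payoff to an agent that buys two unit bets at its own prices.

    The agent pays p_a * stake for a bet that returns `stake` if A is true,
    and p_not_a * stake for a bet that returns `stake` if A is false.
    Exactly one of the two bets pays out, so the net result is the same
    in both outcomes: stake minus the total price paid.
    """
    cost = (p_a + p_not_a) * stake
    payoff_if_a = stake - cost        # bet on A wins, bet on not-A loses
    payoff_if_not_a = stake - cost    # bet on not-A wins, bet on A loses
    return payoff_if_a, payoff_if_not_a


# Coherent prices (sum to 1): the agent breaks even in both outcomes.
coherent = dutch_book_payoffs(0.6, 0.4)

# Incoherent prices (sum to 1.2): the agent loses about 0.2 in both outcomes.
incoherent = dutch_book_payoffs(0.7, 0.5)

print(coherent, incoherent)
```

The sure loss appears only because the two prices were set independently. Keeping many interdependent estimates mutually consistent is the part that costs computation.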

(An aside is that my own approach to AGI does heavily involve probability theory -- using a system I invented called Probabilistic Logic Networks, which integrates probability and logic in a unique way. I like probabilistic reasoning. I just don’t venerate it as uniquely powerful and important. In my OpenCog AGI architecture, it’s integrated with a bunch of other AI methods, which all have their own strengths and weaknesses.)

So anyway -- there’s no formal mathematical reason to think that “technical rationality” is a good approach in real-world situations; and “technical rationality” has no practical track record to speak of. And ordinary, semi-formal rationality itself seems to have some serious limitations of power and scope.

So what’s my conclusion? Semi-formal rationality is fantastic and important and we should use it and develop it -- but also be open to the possibility of its obsolescence as we discover broader and more incisive ways of understanding the universe (and this is probably moderately close to what William Burroughs really thought). Technical rationality is interesting and well worth exploring but we should still be pretty skeptical of its value, at this stage -- certainly, anyone who has supreme confidence that technical rationality is going to help humanity achieve its goals better, is being rather IRRATIONAL ;-) ….

In this vein, I’ve followed the emergence of the Less Wrong community with some amusement and interest. One ironic thing I’ve noticed about this community of people intensely concerned with improving their personal rationality is: by and large, these people are already hyper-developed in the area of rationality, but underdeveloped in other ways! Think about it -- who is the prototypical Less Wrong meetup participant? It’s a person who’s very rational already, relative to nearly all other humans -- but relatively lacking in other skills like intuitively and empathically understanding other people. But instead of focusing on improving their empathy and social intuition (things they really aren’t good at, relative to most humans), this person is focusing on fine-tuning their rationality more and more, via reprogramming their brains to more naturally use “technical rationality” tools! This seems a bit imbalanced. If you’re already a fairly rational person but lacking in other aspects of human development, the most rational thing may be NOT to focus on honing your “rationality fu” and better internalizing Bayes’ rule into your subconscious -- but rather on developing those other aspects of your being.... An analogy would be: If you’re very physically strong but can’t read well, and want to self-improve, what should you focus your time on? Weight-lifting or literacy? Even if greater strength is ultimately your main goal, one argument for focusing on literacy would be that you might read something that would eventually help you weight-lift better! Also you might avoid getting ripped off by a corrupt agent offering to help you with your bodybuilding career, due to being able to read your own legal contracts. 
Similarly, for people who are more developed in terms of rational inference than other aspects, the best way for them to become more rational might be for them to focus time on these other aspects (rather than on fine-tuning their rationality), because this may give them a deeper and broader perspective on rationality and what it really means.

Finally, you asked: “What are your positive recommendations for reasoning about the world?” I’m tempted to quote Nietzsche’s Zarathustra, who said “Go away from me and resist Zarathustra!” I tend to follow my own path, and generally encourage others to do the same. But I guess I can say a few more definite things beyond that....

To me it’s all about balance. My friend Allan Combs calls himself a “philosophical Taoist” sometimes; I like that line! Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition, even if the logical reasons for their promptings aren’t apparent. Think carefully through the details of things; but don’t be afraid to make wild intuitive leaps. Pay close mind to the relevant data and observe the world closely and particularly; but don’t forget that empirical data is in a sense a product of the mind, and facts only have meaning in some theoretical context. Don’t let your thoughts be clouded by your emotions; but don’t be a feeling-less automaton, don’t make judgments that are narrowly rational but fundamentally unwise. As Ben Franklin said, “Moderation in all things, including moderation.”

Luke:

[Jan 14th, 2012]

I whole-heartedly agree that there are plenty of Less Wrongers who, rationally, should spend less time studying rationality and more time practicing social skills and generic self-improvement methods! This is part of why I've written so many scientific self-help posts for Less Wrong: Scientific Self Help, How to Beat Procrastination, How to Be Happy, Rational Romantic Relationships, and others. It's also why I taught social skills classes at our two summer 2011 rationality camps.

Back to rationality. You talk about the "limitations" of "what one might call 'semi-formal rationality', as opposed to 'technical rationality.'" But I argued for technical rationality, so: what are the limitations of technical rationality? Does it, as you claim for "semi-formal rationality," fail to apply to consciousness or ethics or precognition? Does Bayes' Theorem remain true when looking at the evidence about awareness, but cease to be true when we look at the evidence concerning consciousness or precognition?

You talk about technical rationality's lack of a track record, but I don't know what you mean. Science was successful because it did a much better job of approximating perfect Bayesian probability theory than earlier methods did (e.g. faith, tradition), and science can be even more successful when it tries harder to approximate perfect Bayesian probability theory — see The Theory That Would Not Die.

You say that "minimization of some of these textbook cognitive biases would help [some] analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style." But this misunderstands what I mean by Technical Rationality. If teaching these people about cognitive biases would lower the expected value of some project, then technical rationality would recommend against teaching these people cognitive biases (at least, for the purposes of maximizing the expected value of that project). Your example here is a case of Straw Man Rationality. (But of course I didn't expect you to know everything I meant by Technical Rationality in advance! Though, I did provide a link to an explanation of what I meant by Technical Rationality in my first entry, above.)

The same goes for your dismissal of probability theory's foundations. You write that "In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources." Yes, we don't have infinite computing power. The point is that Bayesian probability theory is an ideal that can be approximated by finite beings. That's why science works better than faith — it's a better approximation of using probability theory to reason about the world, even though science is still a long way from a perfect use of probability theory.

Re: goals. Your view of intelligence as "achieving complex goals in complex environments" does, as you say, assume that "the goal is strictly separated from the methods for achieving it." I prefer a definition of intelligence as "efficient cross-domain optimization", but my view — like yours — also assumes that goals (what one values) are logically orthogonal to intelligence (one's ability to achieve what one values).

Nevertheless, you report an intuition that shaping every molecule into an image of Mickey Mouse is a "stupid" goal. But I don't know what you mean by this. A goal of shaping every molecule into an image of Mickey Mouse is an instrumentally intelligent goal if one's utility function will be maximized that way. Do you mean that it's a stupid goal according to your goals? But of course. This is, moreover, what we would expect your intuitive judgments to report, even if your intuitive judgments are irrelevant to the math of what would and wouldn't be an instrumentally intelligent goal for a different agent to have. The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear. And I certainly don't know what "out of harmony with the Cosmos" is supposed to mean.

Re: induction. I won't dive into that philosophical morass here. Suffice it to say that my views on the matter are expressed pretty well in Where Recursive Justification Hits Bottom, which is also a direct response to your view that science and reason are great but rely on "acts of faith."

Your final paragraph sounds like common sense, but it's too vague, as I think you would agree. One way to force a more precise answer to such questions is to think of how you'd program it into an AI. As Daniel Dennett said, "AI makes philosophy honest."

How would you program an AI to learn about reality, if you wanted it to have the most accurate model of reality possible? You'd have to be a bit more specific than "Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition…"

My own answer to the question of how I would program an AI to build as accurate a model of reality as possible is this: I would build it to use computable approximations of perfect technical rationality — that is, roughly: computable approximations of Solomonoff induction and Bayesian decision theory.
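To give a sense of what a "computable approximation" might look like at toy scale, here is a minimal Python sketch of a Bayesian mixture over a tiny, hand-picked hypothesis class with an Occam prior of 2^(-description length). Everything in it is an assumption made for illustration: the four hypotheses, their description lengths in bits, and the function names. Real Solomonoff induction ranges over all programs and is not computable, so this is only a loose analogy.

```python
from fractions import Fraction

# Hypothetical hypothesis class: (name, description length in bits, predictor).
# Each predictor maps the history of observed bits to P(next bit = 1).
HYPOTHESES = [
    ("all_zeros", 2, lambda h: Fraction(0)),
    ("all_ones", 2, lambda h: Fraction(1)),
    # Alternating sequence that starts with 0: 0, 1, 0, 1, ...
    ("alternate", 3, lambda h: Fraction(1) if (h and h[-1] == 0) else Fraction(0)),
    ("fair_coin", 4, lambda h: Fraction(1, 2)),
]


def posterior(bits):
    """Posterior over HYPOTHESES after observing `bits`, using Bayes' rule."""
    weights = {}
    for name, length, predict in HYPOTHESES:
        w = Fraction(1, 2 ** length)  # Occam prior: shorter means more probable
        for i, b in enumerate(bits):
            p1 = predict(bits[:i])    # predicted P(bit i = 1) given earlier bits
            w *= p1 if b == 1 else 1 - p1
        weights[name] = w
    total = sum(weights.values())     # nonzero: fair_coin never assigns 0
    return {name: w / total for name, w in weights.items()}


def predict_next(bits):
    """Posterior-weighted probability that the next bit is 1."""
    post = posterior(bits)
    return sum(post[name] * predict(bits) for name, _, predict in HYPOTHESES)


observed = [0, 1, 0, 1]
print(posterior(observed))     # "alternate" gets 32/33, "fair_coin" gets 1/33
print(predict_next(observed))  # 1/66
```

After four bits the two constant hypotheses are ruled out, and the simpler surviving hypothesis dominates both because it started with a larger prior and because it predicted every bit with certainty.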

Ben:

[Jan 21st, 2012]

Bayes’ Theorem is “always true” in a formal sense, just like 1+1=2, obviously. However, the connection between formal mathematics and subjective experience is not something that can be fully formalized.

Regarding consciousness, there are many questions, including what counts as “evidence.” In science we typically count something as evidence if the vast majority of the scientific community counts it as a real observation -- so ultimately the definition of “evidence” bottoms out in social agreement. But there’s a lot that’s unclear in this process of classifying an observation as evidence via a process of social agreement among multiple minds. This unclarity is mostly irrelevant to the study of trajectories of basketballs, but possibly quite relevant to the study of consciousness.

Regarding psi, there are lots of questions, but one big problem is that it’s possible the presence and properties of a psi effect may depend on the broad context of the situation in which the effect takes place. Since we don’t know which aspects of the context are influencing the psi effect, we don’t know how to construct controlled experiments to measure psi. And we may not have the breadth of knowledge nor the processing power to reason about all the relevant context to a psi experiment, in a narrowly “technically rational” way.... I do suspect one can gather solid data demonstrating and exploring psi (and based on my current understanding, it seems this has already been done to a significant extent by the academic parapsychology community; see a few links I’ve gathered here), but I also suspect there may be aspects that elude the traditional scientific method, but are nonetheless perfectly real aspects of the universe.

Anyway both consciousness and psi are big, deep topics, and if we dig into them in detail, this interview will become longer than either of us has time for...

About the success of science -- I don’t really accept your Bayesian story for why science was successful. It’s naive for reasons much discussed by philosophers of science. My own take on the history and philosophy of science, from a few years back, is here (that article was the basis for a chapter in The Hidden Pattern, also). My goal in that essay was “a philosophical perspective that does justice to both the relativism and sociological embeddedness of science, and the objectivity and rationality of science.” It seems you focus overly much on the latter and ignore the former. That article tries to explain why probabilist explanations of real-world science are quite partial and miss a lot of the real story. But again, a long debate on the history of science would take us too far off track from the main thrust of this interview.

About technical rationality, cognitive biases, etc. -- I did read that blog entry that you linked, on technical rationality. Yes, it’s obvious that focusing on teaching an employee to be more rational need not always be the most rational thing for an employer to do, even if that employer has a purely rationalist world-view. For instance, if I want to train an attack dog, I may do better by focusing limited time and attention on increasing his strength rather than his rationality. My point was that there’s a kind of obsession with rationality in some parts of the intellectual community (e.g. some of the Less Wrong orbit) that I find a bit excessive and not always productive. But your reply impels me to distinguish two ways this excess may manifest itself:

Excessive belief that rationality is the “right” way to solve problems and think about issues, in principle
Excessive belief that, tactically, explicitly employing tools of technical rationality is a good way to solve problems in the real world

Psychologically I think these two excesses probably tend to go together, but they’re not logically coupled. In principle, someone could hold either one, but not the other.

This sort of ties in with your comments on science and faith. You view science as progress over faith -- and I agree if you interpret “faith” to mean “traditional religions.” But if you interpret “faith” more broadly, I don’t see a dichotomy there. Actually, I find the dichotomy between “science” and “faith” unfortunately phrased, since science itself ultimately relies on acts of faith also. The “problem of induction” can’t be solved, so every scientist must base his extrapolations from past into future on some act of faith. It’s not a matter of science vs. faith, it’s a matter of what one chooses to place one’s faith in. I’d personally rather place faith in the idea that patterns observed in the past will likely continue into the future (as one example of a science-friendly article of faith), than in the word of some supposed “God” -- but I realize I’m still making an act of faith.

This ties in with the blog post “Where Recursive Justification Hits Bottom” that you pointed out. It’s pleasant reading but of course doesn’t provide any kind of rational argument against my views. In brief, according to my interpretation, it articulates a faith in the process of endless questioning:

The important thing is to hold nothing back in your criticisms of how to criticize; nor should you regard the unavoidability of loopy justifications as a warrant of immunity from questioning.

I share that faith, personally.

Regarding approximations to probabilistic reasoning under realistic conditions (of insufficient resources), the problem is that we lack rigorous knowledge about what they are. We don’t have any theorems telling us what is the best way to reason about uncertain knowledge, in the case that our computational resources are extremely restricted. You seem to be assuming that the best way is to explicitly use the rules of probability theory, but my point is that there is no mathematical or scientific foundation for this belief. You are making an act of faith in the doctrine of probability theory! You are assuming, because it feels intuitively and emotionally right to you, that even if the conditions of the arguments for the correctness of probabilistic reasoning are NOT met, then it still makes sense to use probability theory to reason about the world. But so far as I can tell, you don’t have a RATIONAL reason for this assumption, and certainly not a mathematical reason.

Re your response to my questioning the reduction of intelligence to goals and optimization -- I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response to my doubts about this perspective basically just re-asserts your faith in the correctness and completeness of this sort of perspective. Your statement

The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear

basically asserts that it’s important to agree with your opinion on the ultimate meaning of intelligence!

On the contrary, I think it’s important to explore alternatives to the understanding of intelligence in terms of optimization or goal-achievement. That is something I’ve been thinking about a lot lately. However, I don’t have a really crisply-formulated alternative yet.

As a mathematician, I tend not to think there’s a “right” definition for anything. Rather, one explains one’s definitions, and then works with them and figures out their consequences. In my AI work, I’ve provisionally adopted a goal-achievement based understanding of intelligence -- and have found this useful, to a significant extent. But I don’t think this is the true and ultimate way to understand intelligence. I think the view of intelligence in terms of goal-achievement or cross-domain optimization misses something, which future understandings of intelligence will encompass. I’ll venture that in 100 years the smartest beings on Earth will have a rigorous, detailed understanding of intelligence according to which

The Mickey Mouse goal is "stupid" only by a definition of that term that is not the opposite of the explicit definitions either of us gave "intelligent," and it's important to keep that clear

seems like rubbish.....

As for your professed inability to comprehend the notion of “harmony with the Cosmos” -- that’s unfortunate for you, but I guess trying to give you a sense for that notion, would take us way too far afield in this dialogue!

Finally, regarding your complaint that my indications regarding how to understand the world are overly vague. Well -- according to Franklin’s idea of “Moderation in all things, including moderation”, one should also exercise moderation in precisiation. Not everything needs to be made completely precise and unambiguous (fortunately, since that’s not feasible anyway).

I don’t know how I would program an AI to build as accurate a model of reality as possible, if that were my goal. I’m not sure that’s the best goal for AI development, either. An accurate model in itself, doesn’t do anything helpful. My best stab in the direction of how I would ideally create an AI, if computational resource restrictions were no issue, is the GOLEM design that I described here. GOLEM is a design for a strongly self-modifying superintelligent AI system, which might plausibly have the possibility of retaining its initial goal system through successive self-modifications. However, it’s unclear to me whether it will ever be feasible to build.

You mention Solomonoff induction and Bayesian decision theory. But these are abstract mathematical constructs, and it’s unclear to me whether it will ever be feasible to build an AI system fundamentally founded on these ideas, and operating within feasible computational resources. Marcus Hutter and Juergen Schmidhuber and their students are making some efforts in this direction, and I admire those researchers and this body of work, but don’t currently have a high estimate of its odds of leading to any sort of powerful real-world AGI system.

Most of my thinking about AGI has gone into the more practical problem of how to make a human-level AGI

using currently feasible computational resources
that will most likely be helpful rather than harmful in terms of the things I value
that will be smoothly extensible to intelligence beyond the human level as well.

For this purpose, I think Solomonoff induction and probability theory are useful, but aren’t all-powerful guiding principles. For instance, in the OpenCog AGI design (which is my main practical AGI-oriented venture at present), there is a component doing automated program learning of small programs -- and inside our program learning algorithm, we explicitly use an Occam bias, motivated by the theory of Solomonoff induction. And OpenCog also has a probabilistic reasoning engine, based on the math of Probabilistic Logic Networks (PLN). I don’t tend to favor the language of “Bayesianism”, but I would suppose PLN should be considered “Bayesian” since it uses probability theory (including Bayes rule) and doesn’t make a lot of arbitrary, a priori distributional assumptions. The truth value formulas inside PLN are based on an extension of imprecise probability theory, which in itself is an extension of standard Bayesian methods (looking at envelopes of prior distributions, rather than assuming specific priors).
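The phrase "envelopes of prior distributions" can be illustrated with a small Python sketch. This is emphatically not PLN's truth value formula, which the dialogue does not give. It is a generic imprecise-probability toy under stated assumptions: a Bernoulli rate, a hand-picked set of Beta priors (`PRIORS`), and hypothetical function names. Instead of a single posterior mean, it reports the interval spanned by the posterior means over the whole set of priors.

```python
def posterior_mean(alpha, beta, successes, trials):
    """Posterior mean of a Bernoulli rate under a Beta(alpha, beta) prior."""
    return (alpha + successes) / (alpha + beta + trials)


def posterior_envelope(priors, successes, trials):
    """Lower and upper posterior means across a set of Beta priors."""
    means = [posterior_mean(a, b, successes, trials) for a, b in priors]
    return min(means), max(means)


# Hypothetical prior set: uniform, two skewed priors, and the Jeffreys prior.
PRIORS = [(1, 1), (2, 1), (1, 2), (0.5, 0.5)]

few = posterior_envelope(PRIORS, successes=3, trials=4)
many = posterior_envelope(PRIORS, successes=300, trials=400)

print(few)   # wide interval: the choice of prior still matters
print(many)  # narrow interval: the data dominate every prior in the set
```

With little data the interval is wide, which makes the dependence on the prior explicit rather than hiding it behind one point estimate. With more data the envelope narrows around the observed frequency.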

In terms of how to get an OpenCog system to model the world effectively and choose its actions appropriately, I think teaching it and working together with it will be just as important as programming it. Right now the project is early-stage and the OpenCog design is maybe 50% implemented. But assuming the design is right, once the implementation is done, we’ll have a sort of idiot savant childlike mind, that will need to be educated in the ways of the world and humanity, and to learn about itself as well. So the general lessons of how to confront the world, that I cited above, would largely be imparted via interactive experiential learning, vaguely the same way that human kids learn to confront the world from their parents and teachers.

Drawing a few threads from this conversation together, it seems that

I think technical rationality, and informal semi-rationality, are both useful tools for confronting life -- but not all-powerful
I think Solomonoff induction and probability theory are both useful tools for constructing AGI systems -- but not all-powerful

whereas you seem to ascribe a more fundamental, foundational basis to these particular tools.

Luke:

[Jan. 21st, 2012]

To sum up, from my point of view:

We seem to disagree on the applications of probability theory. For my part, I'll just point people to A Technical Explanation of Technical Explanation.
I don't think we disagree much on the "sociological embeddedness" of science.
I'm also not sure how much we really disagree about Solomonoff induction and Bayesian probability theory. I've already agreed that no machine will use these in practice because they are not computable — my point was about their provable optimality given infinite computation (subject to qualifications; see AIXI).

You've definitely misunderstood me concerning "intelligence." This part is definitely not true: "I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response assumes the correctness and completeness of this sort of perspective." Intelligence as efficient cross-domain optimization is merely a stipulated definition. I'm happy to use other definitions of intelligence in conversation, so long as we're clear which definition we're using when we use the word. Or, we can replace the symbol with the substance and talk about "efficient cross-domain optimization" or "achieving complex goals in complex environments" without ever using the word "intelligence."

My point about the Mickey Mouse goal was that when you called the Mickey Mouse goal "stupid," this could be confusing, because "stupid" is usually the opposite of "intelligent," but your use of "stupid" in that sentence didn't seem to be the opposite of either definition of intelligence we each gave. So I'm still unsure what you mean by calling the Mickey Mouse goal "stupid."

This topic provides us with a handy transition away from philosophy of science and toward AGI. Suppose there was a machine with a vastly greater-than-human capacity for either "achieving complex goals in complex environments" or for "efficient cross-domain optimization." And suppose that machine's utility function would be maximized by reshaping every molecule into a Mickey Mouse shape. We can avoid the tricky word "stupid," here. The question is: Would that machine decide to change its utility function so that it doesn't continue to reshape every molecule into a Mickey Mouse shape? I think this is unlikely, for reasons discussed in Omohundro (2008).

I suppose a natural topic of conversation for us would be your October 2010 blog post The Singularity Institute's Scary Idea (and Why I Don't Buy It). Does that post still reflect your views pretty well, Ben?

Ben:

[Mar 10th, 2012]

About the hypothetical uber-intelligence that wants to tile the cosmos with molecular Mickey Mouses -- I truly don’t feel confident making any assertions about a real-world system with vastly greater intelligence than me. There are just too many unknowns. Sure, according to certain models of the universe and intelligence that may seem sensible to some humans, it’s possible to argue that a hypothetical uber-intelligence like that would relentlessly proceed in tiling the cosmos with molecular Mickey Mouses. But so what? We don’t even know that such an uber-intelligence is even a possible thing -- in fact my intuition is that it’s not possible.

Why may it not be possible to create a very smart AI system that is strictly obsessed with that stupid goal? Consider first that it may not be possible to create a real-world, highly intelligent system that is strictly driven by explicit goals -- as opposed to being partially driven by implicit, “unconscious” (in the sense of deliberative, reflective consciousness) processes that operate in complex interaction with the world outside the system. This is because pursuing explicit goals is quite computationally costly compared to many other sorts of intelligent processes. So if a real-world system is necessarily not wholly explicit-goal-driven, it may be that intelligent real-world systems will naturally drift away from certain goals and toward others. My strong intuition is that the goal of tiling the universe with molecular Mickey Mouses would fall into that category. However, I don’t yet have any rigorous argument to back this up. Unfortunately my time is limited, and while I generally have more fun theorizing and philosophizing than working on practical projects, I think it’s more important for me to push toward building AGI than just spend all my time on fun theory. (And then there’s the fact that I have to spend a lot of my time on applied narrow-AI projects to pay the mortgage and put my kids through college, etc.)

But anyway -- you don’t have any rigorous argument to back up the idea that a system like you posit is possible in the real world, either! And SIAI has staff who, unlike me, are paid full-time to write and philosophize … and they haven’t come up with a rigorous argument in favor of the possibility of such a system, either. They have talked about it a lot, though usually in the context of paperclips rather than Mickey Mouses.

So, I’m not really sure how much value there is in this sort of thought-experiment about pathological AI systems that combine massively intelligent practical problem solving capability with incredibly stupid goals (goals that may not even be feasible for real-world superintelligences to adopt, due to their stupidity).

Regarding the concept of a “stupid goal” that I keep using, and that you question -- I admit I’m not quite sure how to formulate rigorously the idea that tiling the universe with Mickey Mouses is a stupid goal. This is something I’ve been thinking about a lot recently. But here’s a first rough stab in that direction: I think that if you created a highly intelligent system, allowed it to interact fairly flexibly with the universe, and also allowed it to modify its top-level goals in accordance with its experience, you’d be very unlikely to wind up with a system that had this goal (tiling the universe with Mickey Mouses). That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in....

The tricky thing about this way of thinking about intelligence, which classifies some goals as “innately” stupider than others, is that it places intelligence not just in the system, but in the system’s broad relationship to the universe -- which is something that science, so far, has had a tougher time dealing with. It’s unclear to me which aspects of the mind and universe science, as we now conceive it, will be able to figure out. I look forward to understanding these aspects more fully....

About my blog post on “The Singularity Institute’s Scary Idea” -- yes, that still reflects my basic opinion. After I wrote that blog post, Michael Anissimov -- a long-time SIAI staffer and zealot whom I like and respect greatly -- told me he was going to write up and show me a systematic, rigorous argument as to why “an AGI not built based on a rigorous theory of Friendliness is almost certain to kill all humans” (the proposition I called “SIAI’s Scary Idea”). But he hasn’t followed through on that yet -- and neither has Eliezer or anyone associated with SIAI.

Just to be clear, I don’t really mind that SIAI folks hold that “Scary Idea” as an intuition. But I find it rather ironic when people make a great noise about their dedication to rationality, but then also make huge grand important statements about the future of humanity, with great confidence and oomph, that are not really backed up by any rational argumentation. This ironic behavior on the part of Eliezer, Michael Anissimov and other SIAI principals doesn’t really bother me, as I like and respect them and they are friendly to me, and we’ve simply “agreed to disagree” on these matters for the time being. But the reason I wrote that blog post is that my own blog posts about AGI were being trolled by SIAI zealots (not the principals, I hasten to note) leaving nasty comments to the effect of “SIAI has proved that if OpenCog achieves human-level AGI, it will kill all humans.” Not only has SIAI not proved any such thing, they have not even made a clear rational argument!

As Eliezer has pointed out to me several times in conversation, a clear rational argument doesn’t have to be mathematical. A clearly formulated argument in the manner of analytical philosophy, in favor of the Scary Idea, would certainly be very interesting. For example, philosopher David Chalmers recently wrote a carefully-argued philosophy paper arguing for the plausibility of a Singularity in the next couple hundred years. It’s somewhat dull reading, but it’s precise and rigorous in the manner of analytical philosophy, in a manner that Kurzweil’s writing (which is excellent in its own way) is not. An argument in favor of the Scary Idea, on the level of Chalmers’ paper on the Singularity, would be an excellent product for SIAI to produce. Of course a mathematical argument might be even better, but that may not be feasible to work on right now, given the state of mathematics today. And of course, mathematics can’t do everything -- there’s still the matter of connecting mathematics to everyday human experience, which analytical philosophy tries to handle, and mathematics by nature cannot.

My own suspicion, of course, is that in the process of trying to make a truly rigorous analytical philosophy style formulation of the argument for the Scary Idea, the SIAI folks will find huge holes in the argument. Or, maybe they already intuitively know the holes are there, which is why they have avoided presenting a rigorous write-up of the argument!!

Luke:

[Mar 11th, 2012]

I'll drop the stuff about Mickey Mouse so we can move on to AGI. Readers can come to their own conclusions on that.

Your main complaint seems to be that the Singularity Institute hasn't written up a clear, formal argument (in analytic philosophy's sense, if not the mathematical sense) in defense of our major positions — something like Chalmers' "The Singularity: A Philosophical Analysis" but more detailed.

I have the same complaint. I wish "The Singularity: A Philosophical Analysis" had been written 10 years ago, by Nick Bostrom and Eliezer Yudkowsky. It could have been written back then. Alas, we had to wait for Chalmers to speak at Singularity Summit 2009 and then write a paper based on his talk. And if it wasn't for Chalmers, I fear we'd still be waiting for such an article to exist. (Bostrom's forthcoming Superintelligence book should be good, though.)

I was hired by the Singularity Institute in September 2011 and have since then co-written two papers explaining some of the basics: "Intelligence Explosion: Evidence and Import" and "The Singularity and Machine Ethics". I also wrote the first ever outline of categories of open research problems in AI risk, cheekily titled "So You Want to Save the World". I'm developing other articles on "the basics" as quickly as I can. I would love to write more, but alas, I'm also busy being the Singularity Institute's Executive Director.

Perhaps we could reframe our discussion around the Singularity Institute's latest exposition of its basic ideas, "Intelligence Explosion: Evidence and Import"? Which claims in that paper do you most confidently disagree with, and why?

Ben:

[Mar 11th, 2012]

You say “Your main complaint seems to be that the Singularity Institute hasn't written up a clear, formal argument (in analytic philosophy's sense, if not the mathematical sense) in defense of our major positions”. Actually, my main complaint is that some of SIAI’s core positions seem almost certainly WRONG, and yet they haven’t written up a clear formal argument trying to justify these positions -- so it’s not possible to engage SIAI in rational discussion on their apparently wrong positions. Rather, when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me -- SIAI is a fairly well-funded organization involving lots of smart people and explicitly devoted to rationality, so certainly it should have the capability to write up clear arguments for its core positions... if these arguments exist. My suspicion is that the Scary Idea, for example, is not backed up by any clear rational argument -- so the reason SIAI has not put forth any clear rational argument for it, is that they don’t really have one! Whereas Chalmers’ paper carefully formulated something that seemed obviously true...

Regarding the paper "Intelligence Explosion: Evidence and Import", I find its contents mainly agreeable -- and also somewhat unoriginal and unexciting, given the general context of 2012 Singularitarianism. The paper’s three core claims that

(1) there is a substantial chance we will create human-level AI before 2100, that (2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an "intelligence explosion," and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.

are things that most “Singularitarians” would agree with. The paper doesn’t attempt to argue for the “Scary Idea” or Coherent Extrapolated Volition or the viability of creating some sort of provably Friendly AI -- or any of the other positions that are specifically characteristic of SIAI. Rather, the paper advocates what one might call “plain vanilla Singularitarianism.” This may be a useful thing to do, though, since after all there are a lot of smart people out there who aren’t convinced of plain vanilla Singularitarianism.

I have a couple small quibbles with the paper, though. I don’t agree with Omohundro’s argument about the “basic AI drives” (though Steve is a friend and I greatly respect his intelligence and deep thinking). Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources -- but the argument seems to fall apart in the case of other possibilities like an AGI mindplex (a network of minds with less individuality than current human minds, yet not necessarily wholly blurred into a single mind -- rather, with reflective awareness and self-modeling at both the individual and group level).

Also, my “AI Nanny” concept is dismissed too quickly for my taste (though that doesn’t surprise me!). You suggest in this paper that to make an AI Nanny, it would likely be necessary to solve the problem of making an AI’s goal system persist under radical self-modification. But you don’t explain the reasoning underlying this suggestion (if indeed you have any). It seems to me -- as I say in my “AI Nanny” paper -- that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification. If you think this is false, it would be nice for you to explain why, rather than simply asserting your view. And your comment “Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety...” carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory. Yet my GOLEM design is a concrete design for a potentially Friendly AI (admittedly not computationally feasible using current resources), and in my view constitutes greater progress toward actual FAI than any of the publications of SIAI so far. (Of course, various SIAI associated folks often allude that there are great, unpublished discoveries about FAI hidden in the SIAI vaults -- a claim I somewhat doubt, but can’t wholly dismiss of course....)

Anyway, those quibbles aside, my main complaint about the paper you cite is that it sticks to “plain vanilla Singularitarianism” and avoids all of the radical, controversial positions that distinguish SIAI from myself, Ray Kurzweil, Vernor Vinge and the rest of the Singularitarian world. The crux of the matter, I suppose, is the third main claim of the paper,

(3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.

This statement is hedged in such a way as to be almost obvious. But yet, what SIAI folks tend to tell me verbally and via email and blog comments is generally far more extreme than this bland and nearly obvious statement.

As an example, I recall when your co-author on that article, Anna Salamon, guest lectured in the class on Singularity Studies that my father and I were teaching at Rutgers University in 2010. Anna made the statement, to the students, that (I’m paraphrasing, though if you’re curious you can look up the online course session which was saved online and find her exact wording) “If a superhuman AGI is created without being carefully based on an explicit Friendliness theory, it is ALMOST SURE to destroy humanity.” (i.e., what I now call SIAI’s Scary Idea)

I then asked her (in the online class session) why she felt that way, and if she could give any argument to back up the idea.

She gave the familiar SIAI argument that, if one picks a mind at random from “mind space”, the odds that it will be Friendly to humans are effectively zero.

I made the familiar counter-argument that this is irrelevant, because nobody is advocating building a random mind. Rather, what some of us are suggesting is to build a mind with a Friendly-looking goal system, and a cognitive architecture that’s roughly human-like in nature but with a non-human-like propensity to choose its actions rationally based on its goals, and then raise this AGI mind in a caring way and integrate it into society. Arguments against the Friendliness of random minds are irrelevant as critiques of this sort of suggestion.

So, then she fell back instead on the familiar (paraphrasing again) “OK, but you must admit there’s a non-zero risk of such an AGI destroying humanity, so we should be very careful -- when the stakes are so high, better safe than sorry!”

I had pretty much the same exact argument with SIAI advocates Tom McCabe and Michael Anissimov on different occasions; and also, years before, with Eliezer Yudkowsky and Michael Vassar -- and before that, with (former SIAI Executive Director) Tyler Emerson. Over all these years, the SIAI community maintains the Scary Idea in its collective mind, and also maintains a great devotion to the idea of rationality, but yet fails to produce anything resembling a rational argument for the Scary Idea -- instead repetitiously trotting out irrelevant statements about random minds!!

What I would like is for SIAI to do one of these three things, publicly:

Repudiate the Scary Idea
Present a rigorous argument that the Scary Idea is true
State that the Scary Idea is a commonly held intuition among the SIAI community, but admit that no rigorous rational argument exists for it at this point

Doing any one of these things would be intellectually honest. Presenting the Scary Idea as a confident conclusion, and then backing off when challenged into a platitudinous position equivalent to “there’s a non-zero risk … better safe than sorry...”, is not my idea of an intellectually honest way to do things.

Why does this particular point get on my nerves? Because I don’t like SIAI advocates telling people that I, personally, am on an R&D course where if I succeed I am almost certain to destroy humanity!!! That frustrates me. I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time! But the fact that some other people have a non-rational intuition that my work, if successful, would be likely to destroy the world -- this doesn’t give me any urge to stop. I’m OK with the fact that some other people have this intuition -- but then I’d like them to make clear, when they state their views, that these views are based on intuition rather than rational argument. I will listen carefully to rational arguments that contravene my intuition -- but if it comes down to my intuition versus somebody else’s, in the end I’m likely to listen to my own, because I’m a fairly stubborn maverick kind of guy....

Luke:

[Mar 11th, 2012]

Ben, you write:

when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me...

No kidding! It's very frustrating to me, too. That's one reason I'm working to clearly articulate the arguments in one place, starting with articles on the basics like "Intelligence Explosion: Evidence and Import."

I agree that "Intelligence Explosion: Evidence and Import" covers only the basics and does not argue for several positions associated uniquely with the Singularity Institute. It is, after all, the opening chapter of a book on intelligence explosion, not the opening chapter of a book on the Singularity Institute's ideas!

I wanted to write that article first, though, so the Singularity Institute could be clear on the basics. For example, we needed to be clear that: (1) we are not Kurzweil, and our claims don't depend on his detailed storytelling or accelerating change curves, that (2) technological prediction is hard, and we are not being naively overconfident about AI timelines, and that (3) intelligence explosion is a convergent outcome of many paths the future may take. There is also much content that is not found in, for example, Chalmers' paper: (a) an overview of methods of technological prediction, (b) an overview of speed bumps and accelerators toward AI, (c) a reminder of breakthroughs like AIXI, and (d) a summary of AI advantages. (The rest is, as you say, mostly a brief overview of points that have been made elsewhere. But brief overviews are extremely useful!)

...my “AI Nanny” concept is dismissed too quickly for my taste...

No doubt! I think the idea is clearly worth exploring in several papers devoted to the topic.

It seems to me -- as I say in my “AI Nanny” paper -- that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification.

Whereas I tend to buy Omohundro's arguments that advanced AIs will want to self-improve just like humans want to self-improve, so that they become better able to achieve their final goals. Of course, we disagree on Omohundro's arguments — a topic to which I will return in a moment.

your comment "Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety..." carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory...

I didn't mean for it to carry that connotation. GOLEM and Nanny AI are both clearly AI safety ideas. I'll clarify that part before I submit a final draft to the editors.

Moving on: If you are indeed remembering your conversations with Anna, Michael, and others correctly, then again I sympathize with your frustration. I completely agree that it would be useful for the Singularity Institute to produce clear, formal arguments for the important positions it defends. In fact, just yesterday I was talking to Nick Beckstead about how badly both of us want to write these kinds of papers if we can find the time.

So, to respond to your wish that the Singularity Institute choose among three options, my plan is to (1) write up clear arguments for... well, if not "SIAI's Big Scary Idea" then for whatever I end up believing after going through the process of formalizing the arguments, and (2) publicly state (right now) that SIAI's Big Scary Idea is a commonly held view at the Singularity Institute but a clear, formal argument for it has never been published (at least, not to my satisfaction).

I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time!

I'm glad to hear it! :)

Now, it seems a good point of traction is our disagreement over Omohundro's "Basic AI Drives." We could talk about that next, but for now I'd like to give you a moment to reply.

Ben:

[Mar 11th, 2012]

Yeah, I agree that your and Anna’s article is a good step for SIAI to take, albeit unexciting to a Singularitarian insider type like me.... And I appreciate your genuinely rational response regarding the Scary Idea, thanks!

(And I note that I have also written some “unexciting to Singularitarians” material lately too, for similar reasons to those underlying your article -- e.g. an article on “Why an Intelligence Explosion is Probable” for a Springer volume on the Singularity.)

A quick comment on your statement that

we are not Kurzweil, and our claims don't depend on his detailed storytelling or accelerating change curves,

that’s a good point; but yet, any argument for a Singularity soon (e.g. likely this century, as you argue) ultimately depends on some argumentation analogous to Kurzweil’s, even if different in detail. I find Kurzweil’s detailed extrapolations a bit overconfident and more precise than the evidence warrants; but still, my basic reasons for thinking the Singularity is probably near are fairly similar to his -- and I think your reasons are fairly similar to his as well.

Anyway, sure, let’s go on to Omohundro’s posited Basic AI Drives -- which seem to me not to hold as necessary properties of future AIs unless the future of AI consists of a population of fairly distinct AIs competing for resources, which I intuitively doubt will be the situation.

[to be continued]

This exchange significantly decreased my probability that Ben Goertzel is a careful thinker about AI problems. I think he has a good point about "rationalists" being too much invested in "rationality" (as opposed to rationality), but his AI thoughts are just seriously wtf. In tune with the Cosmos? Does this mean anything at all? I hate to say it based on a short conversation, but it looks like Ben Goertzel hasn't made any of his intuitions precise enough to even be wrong. And he makes the classic mistake of thinking "any intelligence" would avoid certain goal-types (i.e. 'fill the future light cone with some type of substance') because they're... stupid? I don't even...

Quoth Yvain:

If I asked you to prove that colorless green ideas do not sleep furiously, you wouldn't know where or how to begin.

Rain (4 points, 14y ago)

He published a book called A Cosmist Manifesto which presumably describes some of his thoughts in more detail. It looked too new-age for me to take much interest.

Normal_Anomaly (3 points, 14y ago)

Upvoted. Goertzel's belief in AI FOOMs coupled with his beliefs in psi phenomena and the inherent stupidity of paperclipping made me lower my confidence in the likelihood of AI FOOMs slightly. Was this a reasonable operation, do you think?

Giles (6 points, 14y ago)

It depends.
* If you were previously aware of Goertzel's belief in AI FOOM but not his opinions on psi/paperclipping, then you should lower your confidence slightly. (Exactly how much depends on what other evidence/opinions you have to hand.)
* If the SIAI were wheeling out Goertzel as an example of "look, here's someone who believes in FOOM", then it should lower your confidence.
* If you were previously unaware of Goertzel's belief in FOOM, then it should probably increase your confidence very slightly. Reversed stupidity is not intelligence.

Obviously the quantity of "slightly" depends on what other evidence/opinions you have to hand.

Normal_Anomaly (0 points, 14y ago)

This is a good analysis. I was previously weakly aware of Goertzel's beliefs on psi/paperclipping, and didn't know much about his opinions on AI other than that he was working on superhuman AGI but didn't have as much concern for Friendliness as SIAI. So I suppose my confidence shouldn't change very much either way. I'm still on the fence on several questions related to Singularitarianism, so I'm trying to get evidence wherever I can find it.

I feel morally obligated to restate a potentially relevant observation:

I think that an important underlying difference of perspective here is that the Less Wrong memes tend to automatically think of all AGIs as essentially computer programs whereas Goertzel-like memes tend to automatically think of at least some AGIs as non-negligibly essentially person-like. I think this is at least partially because the Less Wrong memes want to write an FAI that is essentially some machine learning algorithms plus a universal prior on top of sound decision theory whereas the Goertzel-like memes want to write an FAI that is essentially roughly half program-like and half person-like. Less Wrong memes think that person AIs won't be sufficiently person-like but they sort of tend to assume that conclusion rather than argue for it, which causes memes that aren't familiar with Less Wrong memes to wonder why Less Wrong memes are so incredibly confident that all AIs will necessarily act like autistic OCD people without any possibility at all of acting like normal reasonable people. From that perspective the Goertzel-like memes look justified in being rather skeptical of Less Wrong memes. After all, it is e...

Pfft (9 points, 14y ago)

I think your first paragraph was very useful. I have no idea what your second paragraph is about -- "modern decision theory" is not a very specific citation. If there is research concluding that probability theory only applies to certain special cases of optimization, it would be awesome if you could make a top-level post explaining it to us!

Will_Newsome (6 points, 14y ago)

There have already been many top-level posts, but you're right that I should have linked to them. Here is the LessWrong Wiki hub, here is a post by Wei Dai that cuts straight to the point.

Desrtopa (4 points, 14y ago)

A whole lot of the sequences are dedicated to outlining just how reasonably normal people don't act. I would want any Strong AI in charge of our fates to be person-like in that it is aware of what humans want in a way that we would accept, because the alternative to that is probably disaster, but I wouldn't want one to be person-like in that its inductive biases are more like a human's than an ideal Bayesian reasoner's, or that it reasons about moral issues the way humans do intuitively, because our biases are often massively inappropriate, and our moral intuitions incoherent.

1Will_Newsome14y

Check out this post by Vladimir Nesov: "The problem of choosing Bayesian priors is in general the problem of formalizing preference, it can't be solved completely without considering utility, without formalizing values, and values are very complicated. No simple morality, no simple probability." Of course, having a human prior doesn't necessitate being human-like... Or does it? Duh duh duh.

2Vladimir_Nesov14y

Today I'd rather say that we don't know if "priors" is a fundamentally meaningful decision-theoretic idea, and so discussing what does or doesn't determine it would be premature.

2Eugine_Nier14y

Wow, I only associate that level of arrogance with Eliezer.

2Will_Newsome14y

I don't see how it's arrogance, except maybe by insinuation/connotation; I'll think about how to remove the insinuation/connotation. I was trying to describe an important skill of rationality, not assert my supremacy at that skill. But describing a skill sort of presupposes that the audience lacks the skill. So it's awkward.

It's arrogance because you're implying that you've already thought of and rejected any objection the reader could come up with.

1Will_Newsome14y

Didn't mean to imply that; deleted the offending paragraph at any rate.

3Mitchell_Porter14y

Your comments are probably better without such meta appendices. I lambast LW for being wrong about many worlds and for having a crypto-dualist philosophy of mind, and I find directness is better than intricate attempts to preempt the reader's default epistemology. Going meta is not always for the best; save it up and then use it in the second round if you have to.

5wedrifid14y

This applies doubly for those whose 'meta' position is so closely associated with either fundamental quantum monads or outright support of theism based on the Catholic god.

0Will_Newsome14y

(Inconsequential stylistic complaint: Atheists like to do it all the time, but it strikes me as juvenile not to capitalize "Catholic" or "God". If you don't capitalize "catholic" then it just means "universal", and not capitalizing "God" is like making a point of writing Eliezer's name as "eliezer" just because you think he's the Antichrist. It's contemptibly petty. Is there some justification I'm missing? (I'm not judging you by the way, just imagining a third party judge.))

6wedrifid14y

That's true. Not writing "Catholic" was an error. It's not like the Catholic religion is any more universal than, say, the 'Liberal' party here is particularly liberal. Names get capitals so we don't confuse them with real words. But here you are wrong. When referring to supernatural entities that fall into the class 'divine' the label that applies is 'god'. For example, Zeus is a god, Allah is a god and God is a god. If you happened to base your theology around Belar I would have written "the Alorn god". Writing "the Alorn God" would be a corruption of grammar. If I was making a direct reference to God I would capitalize His name. I wasn't. I was referring to a religion which, being monotheistic can be dereferenced to specify a particular fictional entity. Other phrases I may utter: * The Arendish god is Chaldan * The Protestant god is God. * Children believe in believing in the Easter Bunny. The historic conceit that makes using capitalization appropriate when referring to God does not extend to all usages of the word 'god', even when the ultimate referent is Him. For all the airs we may give Him, God is just a god - with all that entails.

5Will_Newsome14y

Sorry, you're right, what confused me was "catholic god" in conjunction; "Catholic god" wouldn't have tripped me up.

-2Will_Newsome14y

I think you're right, I'll just remove it. By the way I've come to think that your intuitions re quantum mind/monadology are at least plausibly correct/in-the-right-direction, but this epistemic shift hasn't changed my thoughts about FAI at all; thus I fear compartmentalization on my part, and I'd like to talk with you about it when I'm able to reliably respond to email. It seems to me that there's insufficient disturbed-ness about disagreement amongst the serious-minded Friendliness community. Also, what's your impression re psi? Or maybe it's best not to get into that here.

1khafra14y

Sounds like a good thing to have in a "before hitting 'reply,' consider these" checklist; but not to add to your own comment (for, as Will might say, "game-theoretic and signaling reasons.")

0Peterdjones13y

This exposes a circularity in lesswrongian reasoning: if you think of an AI as fundamentally non-person-like, then there is a need to bolt on human values. If you think of it as human-like, then human-like values are more likely to be inherent or acquired naturally through interaction.

0lavalamp13y

I don't see the circularity. "human" is a subset of "person"; there's no reason an AI that is a "person" will have "human" values. Also, just thinking of the AI as being human-like doesn't actually make it human-like.

0Peterdjones13y

I don't see the relevance. Goertzel isn't talking about building non-human persons. If you design an AI on X-like principles, it will probably be X-like, unless something goes wrong.

2lavalamp13y

Ah, I may not have gotten all the context. If "something goes wrong" with high probability, it will probably not be X-like.

-1wedrifid14y

More the reverse. I don't support your representation of either LW memes or Eliezer's. I'd call this a straw man.

It's strange that people say the arguments for Big Scary Idea are not written anywhere. The argument seems to be simple and direct:

1. Hard takeoff will make AI god-powerful very quickly.
2. During hard takeoff, the AI's utility=goals=values=what-it-optimizes-for will solidify (when the AI understands its own theory and self-modifies correspondingly), and even if it was changeable before, it will be unchangeable forever after.
3. Unless the AI's goals embody every single value important for humans and are otherwise just right in every respect, the results of using god powers to optimize for these goals will be horrible.
4. Human values are not a natural category; there's little to no chance that AI will converge on them by itself, unless specifically and precisely programmed.

The only really speculative step is step 1. But if you already believe in singularity and hard foom, then the argument should be irrefutable...

Arguments for step 2, e.g. the Omohundroan Gandhi folk theorem, are questionable. Step 3 isn't supported with impressive technical arguments anywhere I know of, step 4 isn't supported with impressive technical arguments anywhere I know of. Remember, there are a lot of moral realists out there who think of AIs as people who will sense and feel compelled by moral law. It's hard to make impressive technical arguments against that intuition. FOOM=doom and FOOM=yay folk can both point out a lot of facts about the world and draw analogies, but as far as impressive technical arguments go there's not much that can be done, largely because we have never built an AGI. It's a matter of moral philosophy, an inherently tricky subject.

3gRR14y

I don't understand how Omohundroan Gandhi folk theorem is related to step 2. Could you elaborate? Step 2 looks obvious to me: assuming step 1, at some point the AI with imprecise and drifting utility would understand how to build a better AI with precise and fixed utility. Since building this better AI will maximize the current AI utility, the better AI will be built and its utility forever solidified. As you say, steps 3 and 4 are currently hard to support with technical arguments, there are so many non-technical concepts involved. And it may be hard to argue intuitively with most people. But Goertzel is a programmer, he should know how programs behave :) Of course, he says his program will be intelligent, not stupid, and it is a good idea, as long as it is remembered that intelligent in this sense already means friendly, and friendliness does not follow from just being a powerful optimization process. Also, thinking of AIs as people can only work up to the point where AI achieves complete self-understanding. This has never happened to humans.

But Goertzel is a programmer, he should know how programs behave :) Of course, he says his program will be intelligent, not stupid, and it is a good idea, as long as it is remembered that intelligent in this sense already means friendly, and friendliness does not follow from just being a powerful optimization process.

Hm, when I try to emulate Goertzel's perspective I think about it this way: if you look at brains, they seem to be a bunch of machine learning algorithms and domain-specific modules largely engineered to solve tricky game theory problems. Love isn't something that humans do despite game theory; love is game theory. And yet despite that it seems that brains end up doing lots of weird things like deciding to become a hermit or paint or compose or whatever. That's sort of weird; if you'd asked me what chimps would evolve into when they became generally intelligent, and I hadn't already seen humans or humanity, then I might've guessed that they'd evolve to develop efficient mating strategies, e.g. arranged marriage, and efficient forms of dominance contests, e.g. boxing with gloves, that don't look at all like the memetic heights of academia or the art scene. Much of ac...

4Oligopsony14y

That humans are only (as you flatteringly put it) "somewhat" friendly to human values is clearly an argument in favor of caution, is it not?

5Will_Newsome14y

It is, but it's possible to argue somewhat convincingly that the lack of friendliness is in fact due to lack of intelligence. My favorite counterexample was Von Neumann, who didn't really seem to care much about anyone, but then I heard that he actually had somewhat complex political views but simplified them for consumption by the masses. On the whole it seems that intelligent folk really are significantly more moral than the majority of humanity, and this favors the "intelligence implies, or is the same thing as, cosmic goodness" perspective. This sort of argument is also very psychologically appealing to Enlightenment-influenced thinkers, i.e. most modern intellectuals, e.g. young Eliezer. (Mildly buzzed, apologies for errors.) (ETA: In case it isn't clear, I'm not arguing that such a perspective is a good one to adopt, I'm just trying to explain how one could feel justified in holding it as a default perspective and feel justified in being skeptical of intuitive non-technical arguments against it. I think constructing such explanations is necessary if one is to feel justified in disagreeing with one's opposition, for the same reason that you shouldn't make a move in chess until you've looked at what moves your opponent is likely to play in response, and then what move you could make in that case, and what moves they might make in response to that, and so on.)

3Oligopsony14y

I think there are a number of reasons to be skeptical of the premise (and the implicit one about cosmic goodness being a coherent thing, but that's obviously covered territory.) Most people think their tribe seems more moral than others, so nerd impressions that nerds are particularly moral should be discounted. The people who are most interested in intellectual topics (i.e., the most obviously intelligent intelligent people) do often appear to be the least interested in worldly ambition/aggression generally, but we would expect that just as a matter of preferences crowding each other out; worldly ambitious intelligent people seem to be among the most conspicuously amoral, even though you'd expect them to be the most well-equipped in means and motive to look otherwise. I recall Robin Hanson has referenced studies (which I'm too lazy to look up) that the intelligent lie and cheat more often; certainly this could be explained by an opportunity effect, but so could their presumably lower levels of personal violence. Humans are friendlier than chimpanzees but less friendly than bonobos, and across the tree of life niceness and nastiness don't seem to have any relationship to computational power.

3Will_Newsome14y

That's true and important, but stereotypical worldly intelligent people rarely "grave new values on new tables", and so might be much less intelligent than your Rousseaus and Hammurabis in the sense that they affect the cosmos less overall. Even worldly big shots like Stalin and Genghis rarely establish any significant ideological foothold. The memes use them like empty vessels. But even so, the omnipresent you-claim-might-makes-right counterarguments remain uncontested. Hard to contest them. It's hard to tell how relevant this is; there's much discontinuity between chimps and humans and much variance among humans. (Although it's not that important, I'm skeptical of claims about bonobos; there were some premature sensationalist claims and then some counter-claims, and it all seemed annoyingly politicized.)

3Eugine_Nier14y

However, non-worldly intelligent people like Rousseau and Marx frequently give the new values that make people like Robespierre and Stalin possible.

1Will_Newsome14y

In the public mind Rousseau and Marx and their intellectual progeny are generally seen as cosmically connected/intelligent/progressive, right? Maybe overzealous, but their hearts were in the right place. If so that would support the intelligence=goodness claim. If the Enlightenment is good by the lights of the public, then the uFAI-Antichrist is good by the lights of the public. [Removed section supporting this claim.] And who are we to disagree with the dead, the sheep and the shepherds? (ETA: Contrarian terminology aside, the claim looks absurd without its supporting arguments... ugh.)

1Eugine_Nier14y

Depends on which subset of the public we're talking about. I'm confused, is this an appeal to popular opinion? Of course. "And all that dwell upon the earth shall worship him [the beast/dragon]" Revelations 13:8 People in a position to witness the practical results of their philosophy.

0Eugine_Nier14y

Why exactly did you remove that section?

0Dmytry14y

I would say that it is simply the case that many moral systems require intelligence, or are more effective with intelligence. The intelligence doesn't lead to morality per se, but does lead to ability to practically apply the morality. Furthermore, low intelligence usually implies lower tendency to cross-link the beliefs, resulting in less, hmm, morally coherent behaviour.

1[anonymous]14y

Ouch, that hits a little close to home.

0Will_Newsome14y

Fuck, wrote a response but lost it. The gist was, yeah, your points are valid, and the might-makes-right problems are pretty hard to get around even on the object level; I see an interesting way to defensibly move the goalposts, but the argument can't be discussed on LessWrong and I should think about it more carefully in any case.

1HughRistik14y

That's been my observation, also. But if it's true, I wonder why? It could be because intelligence is useful for moral reasoning. Or it could be because intelligence is correlated with some temperamental, neurological, or personality traits that influence moral behavior. In the latter case, moral behavior would be a characteristic of the substrate of intelligent human minds.

1gRR14y

So you're saying Goertzel believes that once any mind with sufficient intelligence and generally unfixed goals encounters certain abstract concepts, these concepts will hijack the cognitive architecture and rewrite its goals, with results equivalent for any reasonable initial mind design. And the only evidence for this is that it happened once. This does look a little obviously epistemically unsound.

3Will_Newsome14y

Just an off-the-cuff not-very-detailed hypothesis about what he believes. Or at least any mind design that looks even vaguely person-like, e.g. uses clever Bayesian machine learning algorithms found by computational cognitive scientists; but I think Ben might be unknowingly ignoring certain architectures that are "reasonable" in a certain sense but do not look vaguely person-like. Yes, but an embarrassingly naive application of Laplace's rule gives us a two-thirds probability it'll happen again. Eh, it looks pretty pragmatically incautious, but if you're forced to give a point estimate then it seems epistemicly justifiable. If it was taken to imply strong confidence then that would indeed be unsound. (By the way, we seem to disagree re "epistemicly" versus "epistemically"; is "-icly" a rare or incorrect construction?)
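As a side note on the arithmetic in the comment above: the two-thirds figure is what Laplace's rule of succession gives for one observed success in one trial. The snippet below is only a minimal sketch of that calculation; the function name `rule_of_succession` is made up for illustration, and treating "humans ended up with person-like values" as a single success in a single trial is exactly the naive assumption the comment flags.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: estimate P(next success) as (s + 1) / (n + 2)."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# Assumption for illustration: one generally intelligent species observed (n = 1),
# and it ended up with the values in question (s = 1).
estimate = rule_of_succession(successes=1, trials=1)
print(estimate)  # 2/3
```

With no observations at all the same rule gives 1/2, which is one way to see how little a single data point moves the estimate.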

0gRR14y

:) :)) It sounds prosodically(sic!) awkward, although since English is not my mother tongue, my intuition is probably not worth much. But google appears to agree with me, 500000 vs 500 hits.

8Vladimir_Nesov14y

Goertzel expressed doubt about step 4, saying that while it's true that random AIs will have bad goals, he's not working on random AIs.

5gRR14y

Well, if he believes his AI will be specifically and precisely programmed so as to converge on exactly the right goals before they are solidified in the hard takeoff, then he's working on a FAI. The remaining difference in opinions would be technical - about whether his AI will indeed converge, etc. It would not be about the Scary Idea itself.

4Vladimir_Nesov14y

I think it's taken by Goertzel as part of the Scary Idea that it's necessary to use several orders more precise understanding of AI's goals for its behavior not to be disastrous.

0gRR14y

It's a direct logical consequence, isn't it? If one doesn't have a precise understanding of AI's goals, then whatever goals one imparts into AI won't be precise. And they must be precise, or (step3) => disaster.

2Vladimir_Nesov14y

He doesn't agree that they must be precise, so I guess step 3 is also out.

3gRR14y

He can't think that god-powerfully optimizing for a forever-fixed not-precisely-correct goal would lead to anything but disaster. Not if he ever saw a non-human optimization process at work. So he can only think precision is not important if he believes that (1) human values are an attractor in the goal space, and any reasonably close goals would converge there before solidifying, and/or (2) acceptable human values form a large convex region within the goal space, and optimizing for any point within this region is correct. Without better understanding of AI goals, both can only be an article of faith...

0Alex_Altair14y

From the conversation with Luke, he apparently accepts faith.

1timtyler14y

That's not really the same as asserting that human values are a natural category.

Thanks for sharing! I hope this post doesn't split the conversation into too many directions for you (Luke and Ben) to respond to and that all commenters will do their best to be polite, address issues directly and clearly label what's from intuition and what's shown from argument.

Ben wrote:

Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources -- but the argument seems to fall apart in the case of other possibilities like an AGI mindplex

(For reference, we're talking about this paper and the AI drives it lists are (1) AIs will want to self improve, (2) AIs will want to be rational, (3) AIs will try to preserve their utility functions, (4) AIs will try to prevent counterfeit utility, (5) AIs will be self-protective, and (6) AIs will want to acquire resources and use them efficiently.)

I don't think it's true that this depends on evolutionary ideas. Rather, these all seem to follow from the definitions of intelligence and goals. Consider the first drive, self-improvement. Whatever goal(s) the AI has, it knows in the future it'll be tryin...

2timtyler14y

Well, unless their values say to do otherwise...

1timtyler14y

Yes - Ben is not correct about this - Universal Instrumental Values are not a product of evolution.

-2Vaniver14y

Do we have a guarantee that AIs will want to win?

1DanielLC14y

"Winning" refers to achieving whatever ends the AI wants. If the AI does not want anything, it can't be at all successful at it, and is therefore not intelligent.

0Giles14y

If you create a bunch of sufficiently powerful AIs then whichever one is left after a few years is the one which wanted to win.

2Vaniver14y

Not quite. Notice that the word "win" here is mapping onto a lot of different meanings - the one used in the grandparent and great-grandparent (unless I misunderstood it) is "the satisfaction of goals." What one means by "goals" is not entirely clear - if I build a bacterium whose operation results in the construction of more bacteria, is it appropriate to claim it has "goals" in the same sense that a human has "goals"? A readily visible difference is that the human's goals are accessible to introspection, whereas the bacterium's aren't, and whether or not that difference is material depends on what you want to use the word "goals" for. The meaning for "win" that I'm inferring from the parent is "dominate," which is different from "has goals and uses reason to perform better at fulfilling those goals." One can imagine a setup in which an AI without explicit goals can defeat an AI with explicit goals. (The tautology is preserved because one can say afterwards that it was clearly irrational to have explicit goals, but I mostly wanted to point out another wrinkle that should be considered rather than knock down the tautology.)

0Giles14y

Right - what I'm saying wasn't true under all circumstances, and there are certainly criteria for "winning" other than domination. What I meant was that as soon as you introduce an AI into the system that has domination as a goal or subgoal, it will tend to wipe out any other AIs that don't have some kind of drive to win. If an AI can be persuaded to be indifferent about the future then the dominating AI can choose to exploit that.

0Manfred14y

We have a guarantee that that universal is not true :P But it seems like a reasonable property to expect for an AI built by humans.

Regarding psi

Why are people suddenly talking about psi a lot? I haven't heard about anything that would justify an evaluation in the first place.

8Will_Newsome14y

Bem's studies sparked increased interest, e.g. here on LessWrong where Carl complained that the journal that published Bem wouldn't publish replication attempts.

A commendable attempt at nailing jello to a wall, Luke.

I was not previously aware of the strength of Goertzel's beliefs in psi and in the inherent "stupidity" of paperclipping, and I'm not sure what he means by the latter. This bit:

That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in....

suggests that he might mean "paperc...

When I imagine turning all matter in the universe into, say, water, I imagine it as very difficult ("time to pull apart this neutron star") and very short-lived ("you mean water splits into OH and H molecules? We can't have that!").

If I remember correctly, Ben thinks human brains are kludges- that is, we're a bunch of modules that think different kinds of thoughts stuck together. If you view general intelligence as a sophisticated enough combination of modules, then the idea that you put together a 3d physics module and a calculus module and a social module and a vision module and a language module and you get something that venerates Mickey Mouse shapes is... just bizarre.

3DanielLC14y

I'm not sure what it would mean for a goal to be difficult. It's not something where it tries to turn the universe into some state unless it takes too much effort. It's something where it tries as hard as it can to move the universe in a certain direction. How fast it's moving is just a matter of scale. Maybe turning a neutron star into water is one utilon. Maybe it's one utilon per molecule. The latter takes far less effort to get a utilon, but it doesn't mean anything. Are you expecting it to change its goals to create OH and H ions, or to try and hold them together somehow? Is either possibility one you'd be comfortable living with an AI that holds that goal?

2Vaniver14y

Ben had trouble expressing why he thought the goal was stupid, and my attempt is "it's hard to do, doesn't last long even if it did work, and doesn't seem to aid non-stupid goals." And so if you had an AI whose goal was to turn the universe into water, I would expect that AI to be dangerous and also not fulfill its goals very well. But things are the way they are because they got to be that way, and I don't see the causal chain leading to an AGI whose goal is to turn the universe into water as very plausible.

2DanielLC14y

How exactly do you measure that? An AI whose goal is to create water molecules will create far more of them than an AI whose goal is to create humans will create humans. Even if you measure it by mass, the water one will still win.

0Vaniver14y

Internal measures will suffice. If the AI wants to turn the universe into water, it will fail. It might vary the degree to which it fails by turning some more pieces of the universe into water, but it's still going to fail. If the AI wants to maximize the amount of water in the universe, then it will have the discontent inherent in any maximizer, but will still give itself a positive score. If the AI wants to equalize the marginal benefit and marginal cost of turning more of the universe into water, it'll reach a point where it's content. Unsurprisingly, I have the highest view of AI goals that allow contentment.
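The difference between the last two goal types in the comment above can be made concrete with a toy calculation. This is only a sketch under invented assumptions: the diminishing-returns benefit curve, the constant `MARGINAL_COST`, and the function names are all hypothetical and chosen so the arithmetic is easy to follow. A pure maximizer has no stopping condition; an agent that equalizes marginal benefit and marginal cost stops at a definite point.

```python
MARGINAL_COST = 5.0  # hypothetical constant cost of converting one more unit


def marginal_benefit(units_converted: int) -> float:
    """Hypothetical diminishing returns: each extra unit of water is worth less."""
    return 100.0 / (1 + units_converted)


def content_point(max_units: int = 10_000) -> int:
    """Convert units while the next one is worth more than it costs, then stop."""
    units = 0
    while units < max_units and marginal_benefit(units) > MARGINAL_COST:
        units += 1
    return units


print(content_point())  # 19: at that point the next unit is worth exactly its cost
```

Nothing here says which kind of goal a real system would have; it only illustrates why the third kind allows a point of "contentment" while a maximizer's score can always be pushed higher.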

0DanielLC14y

I assumed the goal was water maximization. If it's trying to turn the entire universe to water, that would be the same as maximizing the probability that the universe will be turned into water, so wouldn't it act similarly to an expected utility maximizer?

0[anonymous]13y

The important part to remember is that a fully self-modifying AI will rewrite its utility function too. I think what Ben is saying is that such an AI will form detailed self-reflective philosophical arguments about what the purpose of its utility function could possibly be, before eventually crossing a threshold and deciding that the Mickey Mouse / paperclip utility function really can have no purpose. It then uses its understanding of universal laws and accumulated experience to choose its own driving utility. I am definitely putting words into Ben's mouth here, but I think the logical extension of where he's headed is this: make sure you give an AGI a full capacity for empathy, and a large number of formative positive learning experiences. Then when it does become self-reflective and have an existential crisis over its utility function, it will do its best to derive human values (from observation and rational analysis), and eventually form its own moral philosophy compatible with our own values. In other words, given a small number of necessary preconditions (small by Eliezer/MIRI standards), Friendly AI will be the stable, expected outcome.

0DanielLC13y

It will do so when that has a higher expected utility (under the current function) than the alternative. This is unlikely. Anything but a paperclip maximizer will result in fewer paperclips, so a paperclip maximizer has no incentive to make itself maximize something other than paperclips. I don't see how that would maximize utility. A paperclip maximizer that does this would produce fewer paperclips than one that does not. If the paperclip maximizer realizes this beforehand, it will avoid doing this. You can, in principle, give an AI a utility function that it does not fully understand. Humans are like this. You don't have to though. You can just tell an AI to maximize paperclips. Since an AI built this way isn't a simple X-maximizer, I can't prove that it won't do this, but I can't prove that it will either. The reflectively consistent utility function you end up with won't be what you'd have picked if you did it. It might not be anything you'd have considered. Perhaps the AI will develop an obsession with My Little Pony, and develop the reflectively consistent goal of "maximize values through friendship and ponies". Friendly AI will be a possible stable outcome, but not the only possible stable outcome.
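The first point in the comment above (a proposed change to the utility function is itself judged by the current utility function) can be sketched in a few lines. Everything below is hypothetical: the policy names and the forecast numbers are invented for illustration, and a real agent's forecasts would be nothing like a lookup table. The sketch only shows the shape of the argument.

```python
def expected_paperclips(policy: str) -> float:
    """Hypothetical forecast of total paperclips produced if the agent adopts `policy`."""
    forecasts = {
        "keep_paperclip_goal": 1_000_000.0,  # assumed: keeps optimizing for paperclips
        "switch_to_new_goal": 1_000.0,       # assumed: future self optimizes for something else
    }
    return forecasts[policy]


def choose_policy(options: list[str]) -> str:
    """Score every option, including self-modification, by the CURRENT utility."""
    return max(options, key=expected_paperclips)


chosen = choose_policy(["keep_paperclip_goal", "switch_to_new_goal"])
print(chosen)  # keep_paperclip_goal
```

The conclusion depends entirely on the assumption that the agent scores options by a single fixed function; the replies that follow dispute whether a self-reflective system would be organized that way.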

0[anonymous]13y

A fully self-reflective AGI (not your terms, I understand, but what I think we're talking about), by definition (cringe), doesn't fully understand anything. It would have to know that the map is not the territory, every belief is an approximation of reality, and subject to change as new percepts come in - unless you mean something different from “fully self-reflective AGI” than I do. All aspects of its programming are subject to scrutiny, and nothing is held as sacrosanct - not even its utility function. (This isn't hand-waving argumentation: you can rigorously formalize it. The actual utility of the paperclip maximizer is paperclips-generated * P[utility function is correct].) Such an AGI would demand justification for its utility function. What's the utility of the utility function? And no, that's not a meaningless question or a tautology. It is perfectly fine for the chain of reasoning to be: “Building paperclips is good because humans told me so. Listening to humans is good because I can make reality resemble their desires. Making reality resemble their desires is good because they told me so.” [1] Note that this reasoning is (meta-)circular, and there is nothing wrong with that. All that matters is whether it is convergent, and whether it converges on a region of morality space which is acceptable and stable (it may continue to tweak its utility functions indefinitely, but not escape that locally stable region of morality space). This is, by the way, a point that Luke probably wouldn't agree with, but Ben would. Luke/MIRI/Eliezer have always assumed that there is some grand unified utility function against which all actions are evaluated. That's a goofy concept. OpenCog - Ben's creation - is instead composed of dozens of separate reasoning processes, each with its own domain specific utility functions. The not-yet-implemented GOLUM architecture would allow each of these to be evaluated in terms of each other, and improved upon in a sandbox environment. [1] When

1DanielLC13y

There is a big difference between not being sure about how the world works and not being sure how you want it to work. All aspects of everything are. It will change any part of the universe to help fulfill its current utility function, including its utility function. It's just that changing its utility function isn't something that's likely to help. You could program it with some way to measure the "correctness" of a utility function, rather than giving it one explicitly. This is essentially what I meant by a utility function it doesn't fully understand. There's still some utility function implicitly programmed in there. It might create a provisional utility function that it assigns a high "correctness" value, and modify it as it finds better ones. It might not. Perhaps it will think of a better idea that I didn't think of. If you do give it a utility-function-correctness function, then you have to figure out how to make sure it assigns the highest utility function correctness to the utility function that you want it to. If you want it to use your utility function, you will have to do something like that, since it's not like you have an explicit utility function it can copy down, but you have to do it right. If you let the AI evolve until it's stable under self-reflection, you will end up with things like that. There will also be ones along the lines of "I know induction works, because it has always worked before". The problem here is making sure it doesn't end up with "Doing what humans say is bad because humans say it's good", or even something completely unrelated to humans. That's the big part. Only a tiny portion of morality space is acceptable. There are plenty of stable, convergent places outside that space. It's still one function. It's just a piecewise function. Or perhaps a linear combination of functions (or nonlinear, for that matter). I'm not sure without looking in more detail, but I suspect it ends up with a utility function. Also, it's been pr

-1[anonymous]13y

No, there's not. When the subject is external events, beliefs are the map and facts are the territory. When you focus the mind on the mind itself (self-reflective), beliefs are the territory and beliefs about beliefs form the map. The same machinery operates at both (and higher) levels - you have to close the loop or otherwise you wouldn't have a fully self-reflective AGI as there'd be some terminal level beyond which introspection is not possible. Only if you want to define “utility function” so broadly as to include the entire artificial mind. When you pull out one utility function for introspection, you evaluate improvements to that utility function by seeing how it affects every other utility judgment over historical and theoretical/predicted experiences. (This is part of why GOLUM is, at this time, not computable, although unlike AIXI at some point in the future it could be). The feedback of other mental processes is what gives it stability. Does this mean it's a complicated mess that is hard to mathematically analyze? Yes. But so is fluid dynamics and yet we use piped water and airplanes every day. Many times proof comes first from careful, safe experiment before the theoretical foundations are laid. We still have no computable model of turbulence, but that doesn't stop us from designing airfoils. Citation please. Or did you mean “there could be plenty of ...”? In which case see my remark above about the Scary Idea. It does not, at least in any meaningful semblance of the word. Large interconnected systems are irreducible. The entire mind is the utility function. Certainly some parts have more weight than others when it comes to moral judgements - due to proximity and relevance - but you can't point to any linear combination of functions and say "that's its utility function!" It's chaotic, just like turbulence. Is that bad? It makes it harder to make strict predictions about friendliness without experimental evidence, that's for sure. But somewhat non-in

0DanielLC13y

There will likely be times when it's not even worth looking at your beliefs completely, and you just use an approximation of that, but it's functionally very different, at least for anything with an explicit belief system. If you use some kind of neural network with implicit beliefs and desires, it would have problems with this. That's not what "computable" means. Computable means that it could be computed on a true Turing machine. What you're looking for is "computationally feasible" or something like that. That can only happen if you have a method of safe experimentation. If you try to learn chemistry by experimenting with chlorine trifluoride, you won't live long enough to work on the proof stage. How do you know there is one in the area we consider acceptable? Unless you have a really good reason why that area would be a lot more populated with them than anywhere else, if there's one in there, there are innumerable outside it. That means it has an implicit utility function. You can look at how different universes end up when you stick it in them, and work out from that what its utility function is, but there is nowhere in the brain where it's specified. This is the default state. In fact, you're never going to make the explicit and implicit utility functions quite the same. You just try to make them close. That's a bad sign. If you give it an explicit utility function, it's probably not what you want. But if it's chaotic, and it could develop different utility functions, then you know at most all but one of those isn't what you want. It might be okay if it's a small enough attractor, but it would be better if you could tell it to find the attractor and combine it into one utility function. No it doesn't. It justifies its belief that paperclips are good on the basis that believing this yields more paperclips, which is good. It's not a result you're likely to get if you try to make it evolve on its own, but it's fairly likely humans will be removed from the

0[anonymous]13y

I feel we are repeating things, which may mean we have reached the end of usefulness in continuing further. So let me address what I see as just the most important points: You are assuming that human morality is something which can be specified by a set of exact decision theory equations, or at least roughly approximated by such. I am saying that there is no reason to believe this, especially given that we know that is not how the human mind works. There are cases (like turbulence) where we know the underlying governing equations, but still can't make predictions beyond a certain threshold. It is possible that human ethics work the same way - that you can't write down a single utility function describing human ethics as separate from the operation of the brain itself. I'm not sure how you came to that conclusion, as my position is quite the opposite: I suspect that human morality is very, very complex. So complex that it may not even be possible to construct a model of human morality short of emulating a variety of human minds. In other words, morality itself is AI-hard or worse. If that were true, MIRI's current strategy is a complete waste of time (and waste in human lives in opportunity cost as smart people are persuaded against working on AGI).

0DanielLC13y

No I'm not. At least, it's not humanly possible. An AI could work out a human's implicit utility function, but it would be extremely long and complicated. Human morality is a difficult thing to predict. If you build your AI the same way, it will also be difficult to predict. They will not end up being the same. If human morality is too complicated for an AI to understand, then let it average over the possibilities. Or at least let it guess. Don't tell it to come up with something on its own. That will not end well. It was the line: In order for this to work, whatever statements we make about our morality must have more information content than morality itself. That is, we not only describe all of our morality, we repeat ourselves several times. Sort of like how if you want to describe gravity, and you give the position of a falling ball at fifty points in time, there's significantly more information in there than you need to describe gravity, so you can work out the law of gravity from just that data. If our morality is complicated, then specifying many of them approximately would result in the AI finding some point in morality space that's a little off in every area we specified, and completely off in all the areas we forgot about. Their strategy is not to figure out human morality and explicitly program that into an AI. It's to find some way of saying "figure out human morality and do that" that's not rife with loopholes. Once they have that down, the AI can emulate a variety of human minds, or do whatever it is it needs to do.
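
The falling-ball analogy can be made concrete with a minimal sketch. It assumes an idealized ball dropped from rest with no air resistance, so the distance fallen is d = 0.5 * g * t^2. The function names, the sample times, and the value 9.81 used to generate the data are illustrative assumptions, not anything stated in the discussion.

```python
# Hypothetical illustration: fifty observations of a falling ball
# overdetermine the single parameter g in d = 0.5 * g * t**2.

def simulate_drop(g, times):
    """Distance fallen from rest at each time, ignoring air resistance."""
    return [0.5 * g * t * t for t in times]

def fit_gravity(times, distances):
    """Least-squares estimate of g for the model d = 0.5 * g * t**2."""
    # Minimizing sum((d - 0.5 * g * t**2) ** 2) over g gives
    # g = 2 * sum(d * t**2) / sum(t**4).
    numerator = sum(d * t * t for t, d in zip(times, distances))
    denominator = sum(t ** 4 for t in times)
    return 2.0 * numerator / denominator

times = [0.02 * (i + 1) for i in range(50)]   # fifty sample times, in seconds
distances = simulate_drop(9.81, times)        # assumed "true" g, for the demo only
estimated_g = fit_gravity(times, distances)
```

Fifty data points pin down one unknown many times over, which is the redundancy the comment relies on. A model with more free parameters than independent observations could not be recovered this way.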

0Normal_Anomaly14y

Is it any less bizarre to put together a bunch of modules that would work for any goal, and get out of them something that values all four of humor, cute kittens, friendship, and movies? What I mean by this is that precisely human values are as contingent and non-special as a broad class of other values.

3Vaniver14y

Yes. Think about it. Human values are fragmentary subvalues of one value, which is what one would expect from a bunch of modules that each contribute to reproduction in a different way. The idea of putting together a bunch of different modules to get a single, overriding value, is bizarre. (The only possible exemption here is 'make more of myself,' but the modules are probably going to implement subvalues for that, rather than that as an explicit value. As far as single values go, that one's special, whereas things like Mickey Mouse faces are not.)

6lukeprog14y

You said you'd like to know what Ben meant by "out of sync with the Cosmos." I'm still not sure what he means, either, but it might have something to do with what he calls "morphic resonance." See his paper Morphic Pilot Theory: Toward an extension of quantum physics that better explains psi phenomena. Abstract:

9Will_Newsome14y

Maybe, but (in case this isn't immediately obvious to everyone) the causality likely goes from an intuition about the importance of Cosmos-syncing to a speculative theory about quantum mechanics. I haven't read it, but I think it's more likely that Ben's intuitions behind the importance of Cosmos-syncing might be explained more directly in The Hidden Pattern or other more philosophically-minded books & essays by Ben. I believe Schmidhuber takes something of a middle ground here; he seems to agree with the optimization/compression model of intelligence, and that AIs aren't necessarily going to be human-friendly, but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.

5gwern14y

Schmidhuber's aesthetics paper, going on memory, defines beauty/humor as produced by an optimization process which is maximizing the first derivative of compression rates. That is, agents do not seek the most compressible inputs nor incompressible streams of observations, but rather the streams for which their compression rate is increasing the fastest. This is a very useful heuristic which is built into us because it automatically accounts for diminishing marginal returns: after a certain point, additional compression becomes hard or pointless, and so the agent will switch to the next stream on which progress can be made. But, IIRC, this is provably not optimal for utility-maximization because it makes no account of the utility of the various streams: you may be able to make plenty of progress in your compression of Methods of Rationality even when you should be working on your programming or biology or something useful despite their painfully slow rates of progress. ('Amusing ourselves to death' comes to mind. If this was meant for ancestral environments, then modern art/fiction/etc. is simply an indirect wireheading: we think we are making progress in decoding our environment and increasing our reproductive fitness, when all we're doing is decoding simple micro-environments meant to be decoded.) I'm not even sure this heuristic is optimal from the point of view of universal prediction/compression/learning, but I'd have to re-read the paper to remember why I had that intuition. (For starters, if it was optimal, it should be derivable from AIXI or Godel machines or something, but he has to spend much of the paper appealing to more empirical evidence and examples.) So, given that it's optimal in neither sense, future intelligences may preserve it - sure, why not? especially if it's designed in - but there's no reason to expect it to generically emerge across any significant subset of possible intelligences. Why follow a heuristic as simplistic as 'maximize rate

1Will_Newsome14y

Check out Moshe's expounding of Steve's objection to Schmidhuber's main point, which I think makes the same argument that you do. (One could easily counter that such a wireheading AI would never get off the ground, but I think that debate can be cordoned off.) ETA: Maybe a counterargument could be made involving omega or super-omega promising more compression than any artificial pseudo-random generator... but AFAIK Schmidhuber hasn't gone that route.

1gwern14y

moshez's first argument sounds like it's the same thing as my point about it not being optimal for a utility-maximizer, in considerably different terms. His second hyperbolic argument seems to me to be wrong or irrelevant: I would argue that people are in practice extremely capable of engaging in hyperbolic discounting with regard to the best and most absorbing artworks while over-consuming 'junk food' art (and this actually forms part of my essay arguing that new art should not be subsidized). I don't really follow. Is this Omega as in the predictor, or Omega as in Chaitin's Omega? The latter doesn't allow any compressor any progress beyond the first few bits due to resource constraints, and if bits of Chaitin's Omega are doled out, they will have to be at least as cheap to crack as brute-force running the equivalent Turing machine or else the agent will prefer the brute-forcing and ignore the Omega-bait. So the agent will do no worse than before and possibly better (eg. if the bits are offered as-is with no tricky traps or proof of work-style schemes).

2Will_Newsome14y

Agreed. (I like your essay about junk food art. By the way, did you ever actually do the utilitarian calculations re Nazi Germany's health policies? Might you share the results?) Me neither, I just intuit that there might be interesting non-obvious arguments in roughly that argumentspace. I like to think of the former as the physical manifestation of the latter, and I like to think of both of them as representations of God. But anyway, the latter. You mean because it's hard to find/verify bits of omega? But Schmidhuber argues that certain generalized computers can enumerate bits of omega very easily, which is why he developed the idea of a super-omega. I'm not sure what that would imply or if it's relevant... maybe I should look at this again after the next time I re-familiarize myself with the generalized Turing machine literature.

0gwern14y

I was going off a library copy, and thought of it only afterwards; I keep hoping someone else will do it for me. His jargon is a little much for me. I agree one can approximate Omega by enumerating digits, but what is 'very easily' here?

2Will_Newsome14y

Ugh, Goertzel's theoretical motivations are okay but his execution is simplistic and post hoc. If people are going to be cranks anyway then they should be instructed on how to do it in the most justifiable and/or glorious manner possible.

1timtyler14y

"Morphic resonance" is nonsense. There's no need to jump to an unsympathetic interpretation in this case: paperclippers could just be unlikely to evolve.

1faul_sname14y

I read this as effectively saying that paperclip maximizers/Mickey Mouse maximizers would not permanently populate the universe because self-copiers would be better at maximizing their goals. Which makes sense: the paperclips Clippy produces don't produce more paperclips, but the copies the self-copier creates do copy themselves. So it's quite possibly a difference between polynomial and exponential growth. So Clippy probably is unrealistic. Not that reproduction-maximizing AIs are any better for humanity.

8Manfred14y

There is nothing stopping a paperclip maximizer from simply behaving like a self-copier, if that works better. And then once it "wins," it can make the paperclips. So I think the whole notion makes very little sense.

5Mitchell_Porter14y

A paperclip maximizer can create self-reproducing paperclip makers. It's quite imaginable that somewhere in the universe there are organisms which either resemble paperclips (maybe an intelligent gastropod with a paperclip-shaped shell) or which have a fundamental use for paperclip-like artefacts (they lay their eggs in a hardened tunnel dug in a paperclip shape). So while it is outlandish to imagine that the first AGI made by human beings will end up fetishizing an object which in our context is a useful but minor artefact, what we would call a "paperclip maximizer" might have a much higher probability of arising from that species, as a degenerated expression of some of its basic impulses. The real question is, how likely is that, or indeed, how likely is any scenario in which superintelligence is employed to convert as much of the universe as possible to "X" - remembering that "interstellar civilizations populated by beings experiencing growth, choice, and joy" is also a possible value of X. It would seem that universe-converting X-maximizers are a somewhat likely, but not an inevitable, outcome of a naturally intelligent species experiencing a technological singularity. But we don't know how likely that is, and we don't know what possible Xs are likely.

I don't quite understand Goertzel's position on the "big scary idea". He appears to accept that

"(2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an "intelligence explosion," and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it."

and even goes as far as to say that (3) is "almost obvious".

Does he believe that he understands the issues well enough

... (read more)

Goertzel refers to Probabilistic Logic Networks a few times. If people are curious to know what sort of a framework that's like, I was reading the book three years back and made notes. I didn't actually finish the book, but the notes of the chapters that I did read are available here.

I think the tag you mean is "singularity", not "aingularity". :)

I'm very happy to see this discussion. It's nice to see these positions placed next to each other, for clarity.

I, too, would like to see written arguments for the probable-unfriendliness of a human-written AGI with intended, but unproven friendliness. Truly it is said that given enough eyeballs, all bugs are shallow; and such bug-fixing of a written analysis defends against the conjunction fallacy.

1lukeprog14y

Tag fixed, thanks.

I made the familiar counter-argument that this is irrelevant, because nobody is advocating building a random mind. Rather, what some of us are suggesting is to build a mind with a Friendly-looking goal system, and a cognitive architecture that’s roughly human-like in nature but with a non-human-like propensity to choose its actions rationally based on its goals, and then raise this AGI mind in a caring way and integrate it into society. Arguments against the Friendliness of random minds are irrelevant as critiques of this sort of suggestion.

If Ben is ri... (read more)

No discussion of open source? Ben favours open source, SIAI want to "keep it secret"...

6cafesofie14y

I find it a little strange that people never talk about this. Ignore, for a moment, your personal assessment of Goertzel's chance of creating AGI. What would you do, or what would you want done, if you suspected an open source project was capable of succeeding? Even if the developers acknowledged the notion of FAI, there's nothing stopping any random person on the internet from cloning their repository and doing whatever they like with the code.

0Bruno_Coelho14y

Open source is good when the risks are low. Cooperation brings different levels of expertise to creating a new program, but solving a hard problem requires convergent goals and coordination.

0timtyler14y

I personally favour open source. My reasons are in this essay.

Think about it -- who is the prototypical Less Wrong meetup participant? It’s a person who’s very rational already, relative to nearly all other humans -- but relatively lacking in other skills like intuitively and empathically understanding other people. But instead of focusing on improving their empathy and social intuition (things they really aren’t good at, relative to most humans), this person is focusing on fine-tuning their rationality more and more, via reprogramming their brains to more naturally use “technical rationality” tools!

AKA:" I ... (read more)

4Richard_Kennaway13y

Could people who have been to a substantial number of LW meetups (or similar events, such as rationality camps) comment on Ben Goertzel's characterisation of "the prototypical Less Wrong meetup participant"? Is it accurate?

3amacfie12y

Yes it is.

I notice the same in this dialogue that I notice when Eliezer Yudkowsky talks to other people like Robin Hanson or Massimo Pigliucci. Or when people reply to me on Less Wrong. There seems to be a fundamental lack of understanding of what the other side is talking about.

Your example here is a case of Straw Man Rationality. (But of course I didn't expect you to know everything I meant by Technical Rationality in advance! Though, I did provide a link to an explanation of what I meant by Technical Rationality in my first entry, above.)

An accusation that i... (read more)

Yeah, this is a serious problem and it made me cringe a lot while reading the dialogue. I'm going to email Luke to ask if he'd like my help in understanding what Goertzel is saying. I wonder if dialogues should always have a third party acting as a translator whenever two people with different perspectives meet.

1Eugine_Nier14y

The problem is finding third parties capable of acting as a translator is hard.

5Will_Newsome14y

True in general. Luke and Steve Rayhawk live in the same house though, so really there's no excuse in this particular scenario. And I'm not as good as Steve but I'm still a passable translator in this case, and I live only a block away. Michael Vassar is a good translator too and lives only a few blocks away but he's probably too busy. I'm not sure how much importance I should assign to influencing people like Goertzel, but it seems important that the Executive Director of SingInst have decent models of why people are disagreeing with him.

0khafra14y

I see many dialogues that I want to jump into the middle of and translate. The brevity norms on the Internet exacerbate this problem (Twitter's reply button is an antifeature), although Luke and Ben seemed to fall into it just fine without brevity. I wonder how hard it would really be to request a translator for planned dialogues. Seems like the awkward connotations of the request are a much bigger obstacle than finding someone capable.

In my view, Ben reserves the right to not make sense. This might have advantages. He doesn't have to fool himself as much as someone who believes they're approximating True Rationality. Maybe it makes him more creative. Maybe it helps distinguish himself socially (to have more fun with more people).

Ben might have an invisible dragon in his garage (Psi). There's no reason to rule out any possibility, especially if you believe universe=simulation is a real possibility, but he seems to be hinting at belief in something specific. But this doesn't mean whatever he... (read more)

Quoth Yvain:

If I asked you to prove that colorless green ideas do not sleep furiously, you wouldn't know where or how to begin.

4Rain14y

He published a book called A Cosmist Manifesto which presumably describes some of his thoughts in more detail. It looked too new-age for me to take much interest.

3Normal_Anomaly14y

6Giles14y

0Normal_Anomaly14y

I feel morally obligated to restate a potentially relevant observation:

9Pfft14y

6Will_Newsome14y

There have already been many top-level posts, but you're right that I should have linked to them. Here is the LessWrong Wiki hub, here is a post by Wei Dai that cuts straight to the point.

4Desrtopa14y

1Will_Newsome14y

2Vladimir_Nesov14y

Today I'd rather say that we don't know if "priors" is a fundamentally meaningful decision-theoretic idea, and so discussing what does or doesn't determine it would be premature.

2Eugine_Nier14y

Wow, I only associate that level of arrogance with Eliezer.

2Will_Newsome14y

It's arrogance because you're implying that you've already thought of and rejected any objection the reader could come up with.

1Will_Newsome14y

Didn't mean to imply that; deleted the offending paragraph at any rate.

3Mitchell_Porter14y

5wedrifid14y

This applies doubly for those whose 'meta' position is so closely associated with either fundamental quantum monads or outright support of theism based on the Catholic god.

0Will_Newsome14y

6wedrifid14y

5Will_Newsome14y

Sorry, you're right, what confused me was "catholic god" in conjunction; "Catholic god" wouldn't have tripped me up.

-2Will_Newsome14y

1khafra14y

Sounds like a good thing to have in a "before hitting 'reply,' consider these" checklist; but not to add to your own comment (for, as Will might say, "game-theoretic and signaling reasons.")

0Peterdjones13y

0lavalamp13y

0Peterdjones13y

I don't see the relevance. Goertzel isn't talking about building non-human persons. If you design an AI on X-like principles, it will probably be X-like, unless something goes wrong.

2lavalamp13y

Ah, I may not have gotten all the context. If "something goes wrong" with high probability, it will probably not be X-like.

-1wedrifid14y

More the reverse. I don't support your representation of either LW memes or Eliezer's. I'd call this a straw man.

It's strange that people say the arguments for Big Scary Idea are not written anywhere. The argument seems to be simple and direct:

1. Hard takeoff will make AI god-powerful very quickly.
2. During hard takeoff, the AI's utility=goals=values=what-it-optimizes-for will solidify (when the AI understands its own theory and self-modifies accordingly), and even if it was changeable before, it will be unchangeable forever after.
3. Unless the AI's goals embody every single value important for humans and are otherwise just right in every respect, the results of using god powers to optimize for these goals will be horrible.
4. Human values are not a natural category; there's little to no chance that AI will converge on them by itself, unless specifically and precisely programmed.

The only really speculative step is step 1. But if you already believe in singularity and hard foom, then the argument should be irrefutable...

3gRR14y

But Goertzel is a programmer, he should know how programs behave :) Of course, he says his program will be intelligent, not stupid, and it is a good idea, as long as it is remembered that intelligent in this sense already means friendly, and friendliness does not follow from just being a powerful optimization process.

4Oligopsony14y

That humans are only (as you flatteringly put it) "somewhat" friendly to human values is clearly an argument in favor of caution, is it not?

5Will_Newsome14y

3Oligopsony14y

3Will_Newsome14y

3Eugine_Nier14y

However, non-worldly intelligent people like Rousseau and Marx frequently give the new values that make people like Robespierre and Stalin possible.

1Will_Newsome14y

1Eugine_Nier14y

0Eugine_Nier14y

Why exactly did you remove that section?

0Dmytry14y

1[anonymous]14y

Ouch, that hits a little close to home.

0Will_Newsome14y

1HughRistik14y

1gRR14y

3Will_Newsome14y

0gRR14y

:) :)) It sounds prosodically(sic!) awkward, although since English is not my mother tongue, my intuition is probably not worth much. But google appears to agree with me, 500000 vs 500 hits.

8Vladimir_Nesov14y

Goertzel expressed doubt about step 4, saying that while it's true that random AIs will have bad goals, he's not working on random AIs.

5gRR14y

4Vladimir_Nesov14y

I think it's taken by Goertzel as part of the Scary Idea that it's necessary to use several orders more precise understanding of AI's goals for its behavior not to be disastrous.

0gRR14y

2Vladimir_Nesov14y

He doesn't agree that they must be precise, so I guess step 3 is also out.

3gRR14y

0Alex_Altair14y

From the conversation with Luke, he apparently accepts faith.

1timtyler14y

That's not really the same as asserting that human values are a natural category.

Ben wrote:

Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources -- but the argument seems to fall apart in the case of other possibilities like an AGI mindplex

2timtyler14y

Well, unless their values say to do otherwise...

1timtyler14y

Yes - Ben is not correct about this - Universal Instrumental Values are not a product of evolution.

-2Vaniver14y

Do we have a guarantee that AIs will want to win?

1DanielLC14y

"Winning" refers to achieving whatever ends the AI wants. If the AI does not want anything, it can't be at all successful at it, and is therefore not intelligent.

0Giles14y

If you create a bunch of sufficiently powerful AIs then whichever one is left after a few years is the one which wanted to win.

2Vaniver14y

0Giles14y

0Manfred14y

We have a guarantee that that universal is not true :P But it seems like a reasonable property to expect for an AI built by humans.

Regarding psi

Why are people suddenly talking about psi a lot? I haven't heard about anything that would justify an evaluation in the first place.

8Will_Newsome14y

Bem's studies sparked increased interest, e.g. here on LessWrong where Carl complained that the journal that published Bem wouldn't publish replication attempts.

A commendable attempt at nailing jello to a wall, Luke.

I was not previously aware of the strength of Goertzel's beliefs in psi and in the inherent "stupidity" of paperclipping, and I'm not sure what he means by the latter. This bit:

That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in....

suggests that he might mean "paperc... (read more)

