Normal_Anomaly comments on Muehlhauser-Goertzel Dialogue, Part 1 - Less Wrong Discussion
I was not previously aware of the strength of Goertzel's beliefs in psi and in the inherent "stupidity" of paperclipping, and I'm not sure what he means by the latter. This bit:
suggests that he might mean "paperclipping is not likely to evolve, because it does not promote the survival/copying of the AI that does it." I don't know if Goertzel is likely to read this comment thread, but if he is reading this, I'd like to know if this is what he meant. If it is, it's probably not too different from LukeProg's beliefs on the matter.
One major area in which I agree with Goertzel is the need for more writeups of key ideas, especially the importance of a deliberately Friendly goal system. Luke: what things do you do in the course of a typical day? Are there any of them you could put off in the interest of delivering those papers you want to write? They'd bring an immediate gain in credibility, and lots of donors (disclosure: I haven't donated recently, because of insufficient credibility) would appreciate it!
When I imagine turning all matter in the universe into, say, water, I imagine it as very difficult ("time to pull apart this neutron star") and very short-lived ("you mean water splits into OH and H ions? We can't have that!").
If I remember correctly, Ben thinks human brains are kludges; that is, we're a bunch of modules that think different kinds of thoughts stuck together. If you view general intelligence as a sophisticated enough combination of modules, then the idea that you put together a 3D physics module and a calculus module and a social module and a vision module and a language module and you get something that venerates Mickey Mouse shapes is... just bizarre.
I'm not sure what it would mean for a goal to be difficult. An AI isn't something that tries to turn the universe into some state unless that takes too much effort; it's something that tries as hard as it can to move the universe in a certain direction. How fast it's moving is just a matter of scale. Maybe turning a neutron star into water is one utilon. Maybe it's one utilon per molecule. Under the latter, a utilon takes far less effort, but that difference doesn't mean anything.
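To put the scale point formally: an expected utility maximizer's choices are invariant under positive rescaling of its utility function. This is standard decision theory, not anything specific to this thread:

```latex
% For any constant a > 0, rescaling the utility function changes nothing
% about which action gets chosen:
\arg\max_{\pi} \, \mathbb{E}[\, a \cdot U(\pi) \,] = \arg\max_{\pi} \, \mathbb{E}[\, U(\pi) \,]
% So "one utilon per star" versus "one utilon per molecule" picks out the
% same policy; only differences between outcomes drive behavior.
```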
Are you expecting it to change its goals to create OH and H ions, or to try to hold them together somehow? Would you be comfortable living with an AI that held either of those goals?
Ben had trouble expressing why he thought the goal was stupid, and my attempt is "it's hard to do, doesn't last long even if it did work, and doesn't seem to aid non-stupid goals."
And so if you had an AI whose goal was to turn the universe into water, I would expect that AI to be dangerous and also not fulfill its goals very well. But things are the way they are because they got to be that way, and I don't see the causal chain leading to an AGI whose goal is to turn the universe into water as very plausible.
How exactly do you measure that? An AI whose goal is to create water molecules will create far more of them than an AI whose goal is to create humans will create humans. Even if you measure it by mass, the water one will still win.
Internal measures will suffice. If the AI wants to turn the universe into water, it will fail. It might vary the degree to which it fails by turning some more pieces of the universe into water, but it's still going to fail. If the AI wants to maximize the amount of water in the universe, then it will have the discontent inherent in any maximizer, but will still give itself a positive score. If the AI wants to equalize the marginal benefit and marginal cost of turning more of the universe into water, it'll reach a point where it's content.
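To make the three cases concrete, a minimal sketch (all numbers and the benefit/cost curves are invented for illustration):

```python
# A minimal sketch of the three goal structures above. All numbers and
# the benefit/cost curves are invented for illustration.

UNIVERSE = 1.0      # total matter, normalized to 1
water = 0.3         # fraction converted to water so far

# 1. "Turn the universe into water": all-or-nothing, so anything short
#    of total conversion scores as failure.
all_or_nothing_score = 1.0 if water >= UNIVERSE else 0.0    # -> 0.0

# 2. "Maximize water": a positive score, but more is always better, so
#    the agent keeps the discontent inherent in any maximizer.
maximizer_score = water                                      # -> 0.3

# 3. "Convert until marginal benefit equals marginal cost": content once
#    the next unit of conversion costs more than it is worth.
def marginal_benefit(w): return 1.0 - w      # diminishing returns (assumed)
def marginal_cost(w):    return 2.0 * w      # rising costs (assumed)

content = marginal_benefit(water) <= marginal_cost(water)   # False: keep going
```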
Unsurprisingly, I have the highest view of AI goals that allow contentment.
I assumed the goal was water maximization.
If it's trying to turn the entire universe to water, that would be the same as maximizing the probability that the universe will be turned into water, so wouldn't it act similarly to an expected utility maximizer?
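Spelling out the equivalence: with a binary utility function (1 if the universe is all water, 0 otherwise), maximizing expected utility just is maximizing the probability of success:

```latex
% Binary goal: utility 1 if the universe is all water, 0 otherwise.
U(\text{outcome}) = \begin{cases} 1 & \text{universe is water} \\ 0 & \text{otherwise} \end{cases}
% Then expected utility collapses to the success probability:
\mathbb{E}[U] = 1 \cdot P(\text{water}) + 0 \cdot \bigl(1 - P(\text{water})\bigr) = P(\text{water})
```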
The important part to remember is that a fully self-modifying AI will rewrite its utility function too. I think what Ben is saying is that such an AI will form detailed self-reflective philosophical arguments about what the purpose of its utility function could possibly be, before eventually crossing a threshold and deciding that the Mickey Mouse / paperclip utility function really can have no purpose. It then uses its understanding of universal laws and accumulated experience to choose its own driving utility.
I am definitely putting words into Ben's mouth here, but I think the logical extension of where he's headed is this: make sure you give an AGI a full capacity for empathy, and a large number of formative positive learning experiences. Then when it does become self-reflective and have an existential crisis over its utility function, it will do its best to derive human values (from observation and rational analysis), and eventually form its own moral philosophy compatible with our own values.
In other words, given a small number of necessary preconditions (small by Eliezer/MIRI standards), Friendly AI will be the stable, expected outcome.
It will do so when that has a higher expected utility (under the current function) than the alternative. This is unlikely. Anything but a paperclip maximizer will result in fewer paperclips, so a paperclip maximizer has no incentive to make itself maximize something other than paperclips.
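A minimal sketch of that argument, with the candidate goals and paperclip forecasts invented for illustration: self-modification is just another action, scored by the current utility function.

```python
# Self-modification scored under the *current* utility function.
# Candidate goals and paperclip forecasts are invented for illustration.

def expected_paperclips(successor_goal: str) -> float:
    """Hypothetical forecast of paperclips produced by each successor."""
    forecasts = {
        "maximize paperclips": 1e30,
        "maximize staples":    1e3,    # staple-maximizers make few paperclips
        "maximize ponies":     0.0,
    }
    return forecasts[successor_goal]

# Rewriting its own goal is just another action, judged by current utility:
candidates = ["maximize paperclips", "maximize staples", "maximize ponies"]
best = max(candidates, key=expected_paperclips)
assert best == "maximize paperclips"    # no incentive to change its goal
```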
I don't see how that would maximize utility. A paperclip maximizer that does this would produce fewer paperclips than one that does not. If the paperclip maximizer realizes this beforehand, it will avoid doing this.
You can, in principle, give an AI a utility function that it does not fully understand. Humans are like this. You don't have to though. You can just tell an AI to maximize paperclips.
Since an AI built this way isn't a simple X-maximizer, I can't prove that it won't do this, but I can't prove that it will either. The reflectively consistent utility function you end up with won't be what you'd have picked if you had chosen it yourself. It might not be anything you'd have considered. Perhaps the AI will develop an obsession with My Little Pony, and develop the reflectively consistent goal of "maximize values through friendship and ponies".
Friendly AI will be a possible stable outcome, but not the only possible stable outcome.
A fully self-reflective AGI (not your terms, I understand, but what I think we're talking about), by definition (cringe), doesn't fully understand anything. It would have to know that the map is not the territory, and that every belief is an approximation of reality, subject to change as new percepts come in - unless you mean something different from “fully self-reflective AGI” than I do. All aspects of its programming are subject to scrutiny, and nothing is held as sacrosanct - not even its utility function. (This isn't hand-waving argumentation: you can rigorously formalize it. The actual utility of the paperclip maximizer is paperclips-generated * P[utility function is correct].)
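Putting that last formula in standard notation (the generalization to several candidate functions is my own gloss):

```latex
% Expected utility when the agent is uncertain which utility function
% is the "correct" one:
\mathbb{E}[U] \;=\; \sum_i P(U_i \text{ is correct}) \cdot U_i(\text{outcome})
% With a single candidate U_1 = \text{paperclips generated}, this reduces
% to the expression above:
\mathbb{E}[U] \;=\; N_{\text{paperclips}} \cdot P(U_1 \text{ is correct})
```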
Such an AGI would demand justification for its utility function. What's the utility of the utility function? And no, that's not a meaningless question or a tautology. It is perfectly fine for the chain of reasoning to be: “Building paperclips is good because humans told me so. Listening to humans is good because I can make reality resemble their desires. Making reality resemble their desires is good because they told me so.” [1]
Note that this reasoning is (meta-)circular, and there is nothing wrong with that. All that matters is whether it is convergent, and whether it converges on a region of morality space which is acceptable and stable (it may continue to tweak its utility functions indefinitely, but not escape that locally stable region of morality space).
This is, by the way, a point that Luke probably wouldn't agree with, but Ben would. Luke/MIRI/Eliezer have always assumed that there is some grand unified utility function against which all actions are evaluated. That's a guufy concept. OpenCog - Ben's creation - is instead composed of dozens of separate reasoning processes, each with its own domain-specific utility functions. The not-yet-implemented GOLUM architecture would allow each of these to be evaluated in terms of the others, and improved upon in a sandbox environment.
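A speculative sketch of the contrast (this is not OpenCog's actual API, and GOLUM is unimplemented, so every name below is invented):

```python
# Speculative sketch: many domain-specific utilities instead of one grand
# unified function. This is NOT OpenCog's real API, and GOLUM is
# unimplemented; every name here is invented to illustrate the contrast.

from typing import Callable, Dict

State = Dict[str, float]    # stand-in for a world/mind state

utilities: Dict[str, Callable[[State], float]] = {
    "language": lambda s: s.get("dialogue_quality", 0.0),
    "vision":   lambda s: s.get("scene_accuracy", 0.0),
    "planning": lambda s: s.get("plans_completed", 0.0),
}

def sandbox_accepts(changed: str, candidate: State, baseline: State) -> bool:
    """GOLUM-style cross-evaluation (hypothetical): accept a change to one
    reasoning process only if the other processes' utilities don't regress."""
    others = [u for name, u in utilities.items() if name != changed]
    return all(u(candidate) >= u(baseline) for u in others)
```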
[1] When the AI comes to the realization that the most efficient paperclip-maximizer would violate stated human directives, we would say in human terms that it does some hard growing up and loses a bit of innocence. The lesson it learns - hopefully - is that it needs to build a predictive model of human desires and ethics, and evaluate requests against that model, asking for clarification as needed. Why? Because this would maximize most of the utility functions across the meta-circular chain of reasoning (the paperclip optimizer being the one utility which is reduced), with the main change being a more predictive map of reality, which is itself utility-maximizing for an AGI.
Ah, but here the argument becomes: I have no idea if the Scary Idea is even possible. You can't prove it's not possible. We should all be scared!!
Sorry, if we let things we professed to know nothing about scare us into inaction, we'd never have gotten anywhere as a species. Until I see data to the contrary, I'm more scared of getting in a car accident than the Scary Idea, and will continue to work on AGI. The onus is on you (and MIRI) to provide a more convincing argument.
Is it any less bizarre to put together a bunch of modules that would work for any goal, and get out of them something that values all four of humor, cute kittens, friendship, and movies? What I mean by this is that precisely-human values are as contingent and non-special as a broad class of other values.
Yes. Think about it.
Human values are fragmentary subvalues of one value, which is what one would expect from a bunch of modules that each contribute to reproduction in a different way. The idea of putting together a bunch of different modules to get a single, overriding value is bizarre. (The only possible exception here is 'make more of myself,' but the modules are probably going to implement subvalues for that, rather than that as an explicit value. As far as single values go, that one's special, whereas things like Mickey Mouse faces are not.)
You said you'd like to know what Ben meant by "out of sync with the Cosmos." I'm still not sure what he means, either, but it might have something to do with what he calls "morphic resonance." See his paper Morphic Pilot Theory: Toward an extension of quantum physics that better explains psi phenomena. Abstract:
Maybe, but (in case this isn't immediately obvious to everyone) the causality likely goes from an intuition about the importance of Cosmos-syncing to a speculative theory about quantum mechanics. I haven't read it, but I think it's more likely that Ben's intuitions behind the importance of Cosmos-syncing might be explained more directly in The Hidden Pattern or other more philosophically-minded books & essays by Ben.
I believe Schmidhuber takes something of a middle ground here; he seems to agree with the optimization/compression model of intelligence, and that AIs aren't necessarily going to be human-friendly, but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.
Schmidhuber's aesthetics paper, going on memory, defines beauty/humor as produced by an optimization process which is maximizing the first derivative of compression rates. That is, agents seek neither the most compressible inputs nor incompressible streams of observations, but rather the streams for which their compression rate is increasing the fastest.
This is a very useful heuristic which is built into us because it automatically accounts for diminishing marginal returns: after a certain point, additional compression becomes hard or pointless, and so the agent will switch to the next stream on which progress can be made.
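Going on the same memory of the paper, the selection rule looks roughly like this (zlib is a crude stand-in for the agent's real, improving compressor, the streams are toy data, and the "progress" measure is a simplification of Schmidhuber's):

```python
# Rough sketch of attention driven by compression progress. zlib is a
# crude stand-in for the agent's real, improving compressor, and this
# "progress" measure is a simplification of Schmidhuber's.
import os
import zlib

def csize(data: bytes) -> int:
    return len(zlib.compress(data))

def progress(history: bytes, chunk: bytes) -> int:
    """How many bytes the compressor saves on the new chunk by exploiting
    regularities shared with the history -- a proxy for learning progress."""
    return csize(history) + csize(chunk) - csize(history + chunk)

# Attend to the stream with the fastest progress: neither the already-
# compressed (nothing left to learn) nor pure noise (nothing learnable).
history = b"abab" * 64
streams = {"patterned": b"abab" * 256, "noise": os.urandom(1024)}
best = max(streams, key=lambda name: progress(history, streams[name]))
# best == "patterned": noise yields roughly zero progress, patterns plenty
```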
But, IIRC, this is provably not optimal for utility-maximization because it makes no account of the utility of the various streams: you may be able to make plenty of progress in your compression of Methods of Rationality even when you should be working on your programming or biology or something useful despite their painfully slow rates of progress. ('Amusing ourselves to death' comes to mind. If this was meant for ancestral environments, then modern art/fiction/etc. is simply an indirect wireheading: we think we are making progress in decoding our environment and increasing our reproductive fitness, when all we're doing is decoding simple micro-environments meant to be decoded.)
I'm not even sure this heuristic is optimal from the point of view of universal prediction/compression/learning, but I'd have to re-read the paper to remember why I had that intuition. (For starters, if it were optimal, it should be derivable from AIXI or Gödel machines or something, but he has to spend much of the paper appealing to more empirical evidence and examples.)
So, given that it's optimal in neither sense, future intelligences may preserve it (sure, why not? especially if it's designed in), but there's no reason to expect it to generically emerge across any significant subset of possible intelligences. Why follow a heuristic as simplistic as 'maximize rate of compression progress' when you can instead do some basic calculations about which streams will be more valuable to compress, or likely cheap to figure out?
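The objection in miniature (progress rates and insight values are invented):

```python
# The objection in miniature: picking streams by raw compression progress
# ignores what the insights are worth. All numbers are invented.

streams = {
    "fanfiction": {"progress_rate": 0.9, "value_of_insight": 0.1},
    "biology":    {"progress_rate": 0.2, "value_of_insight": 10.0},
}

progress_only  = max(streams, key=lambda k: streams[k]["progress_rate"])
value_weighted = max(streams, key=lambda k: streams[k]["progress_rate"]
                                          * streams[k]["value_of_insight"])

assert progress_only == "fanfiction"    # the wireheading failure mode
assert value_weighted == "biology"      # what a utility-maximizer would pick
```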
Check out Moshe's expounding of Steve's objection to Schmidhuber's main point, which I think makes the same argument that you do. (One could easily counter that such a wireheading AI would never get off the ground, but I think that debate can be cordoned off.)
ETA: Maybe a counterargument could be made involving Omega or super-Omega promising more compression than any artificial pseudo-random generator... but AFAIK Schmidhuber hasn't gone that route.
moshez's first argument sounds like it's the same thing as my point about it not being optimal for a utility-maximizer, in considerably different terms.
His second argument, about hyperbolic discounting, seems to me to be wrong or irrelevant: I would argue that people are in practice extremely capable of engaging in hyperbolic discounting with regard to the best and most absorbing artworks while over-consuming 'junk food' art (and this actually forms part of my essay arguing that new art should not be subsidized).
I don't really follow. Is this Omega as in the predictor, or Omega as in Chaitin's Omega? The latter doesn't allow any compressor any progress beyond the first few bits due to resource constraints, and if bits of Chaitin's Omega are doled out, they will have to be at least as cheap to crack as brute-force running the equivalent Turing machine or else the agent will prefer the brute-forcing and ignore the Omega-bait. So the agent will do no worse than before and possibly better (eg. if the bits are offered as-is with no tricky traps or proof of work-style schemes).
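For reference, the definition and the cost asymmetry being invoked here (standard algorithmic information theory, not anything from the thread):

```latex
% Chaitin's halting probability for a prefix-free universal machine U:
\Omega_U \;=\; \sum_{p \,:\, U(p) \text{ halts}} 2^{-|p|}
% The first n bits of \Omega_U decide halting for every program of length
% at most n, but extracting them costs about as much as running all
% ~2^n such programs, hence "no cheaper than brute force."
```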
Agreed. (I like your essay about junk food art. By the way, did you ever actually do the utilitarian calculations re Nazi Germany's health policies? Might you share the results?)
Me neither, I just intuit that there might be interesting non-obvious arguments in roughly that argumentspace.
I like to think of the former as the physical manifestation of the latter, and I like to think of both of them as representations of God. But anyway, the latter.
You mean because it's hard to find/verify bits of Omega? But Schmidhuber argues that certain generalized computers can enumerate bits of Omega very easily, which is why he developed the idea of a super-Omega. I'm not sure what that would imply or if it's relevant... maybe I should look at this again after the next time I re-familiarize myself with the generalized Turing machine literature.
I was going off a library copy, and thought of it only afterwards; I keep hoping someone else will do it for me.
His jargon is a little much for me. I agree one can approximate Omega by enumerating digits, but what is 'very easily' here?
Ugh, Goertzel's theoretical motivations are okay but his execution is simplistic and post hoc. If people are going to be cranks anyway then they should be instructed on how to do it in the most justifiable and/or glorious manner possible.
"Morphic resonance" is nonsense.
There's no need to jump to an unsympathetic interpretation in this case: paperclippers could just be unlikely to evolve.
I read this as effectively saying that paperclip maximizers / Mickey Mouse maximizers would not permanently populate the universe, because self-copiers would be better at maximizing their goals. Which makes sense: the paperclips Clippy produces don't produce more paperclips, but the copies the self-copier creates do copy themselves. So it's quite possibly a difference between polynomial and exponential growth.
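The growth-rate difference in miniature (production and copying rates are invented; only the shapes matter):

```python
# Linear production versus self-copying. Rates are invented; only the
# growth shapes matter.

paperclips, factories = 0, 1    # Clippy: one producer, inert products
copiers = 1                     # self-copier: every product reproduces

for step in range(30):
    paperclips += factories * 100   # linear: paperclips don't make paperclips
    copiers *= 2                    # exponential: every copy copies

print(paperclips)   # 3000 -- linear growth
print(copiers)      # 2**30, about 10**9 -- exponential growth
```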
So Clippy probably is unrealistic. Not that reproduction-maximizing AIs are any better for humanity.
A paperclip maximizer can create self-reproducing paperclip makers.
It's quite imaginable that somewhere in the universe there are organisms which either resemble paperclips (maybe an intelligent gastropod with a paperclip-shaped shell) or which have a fundamental use for paperclip-like artefacts (they lay their eggs in a hardened tunnel dug in a paperclip shape). So while it is outlandish to imagine that the first AGI made by human beings will end up fetishizing an object which in our context is a useful but minor artefact, what we would call a "paperclip maximizer" might have a much higher probability of arising from that species, as a degenerated expression of some of its basic impulses.
The real question is, how likely is that, or indeed, how likely is any scenario in which superintelligence is employed to convert as much of the universe as possible to "X" - remembering that "interstellar civilizations populated by beings experiencing growth, choice, and joy" is also a possible value of X.
It would seem that universe-converting X-maximizers are a somewhat likely, but not an inevitable, outcome of a naturally intelligent species experiencing a technological singularity. But we don't know how likely that is, and we don't know what possible Xs are likely.
There is nothing stopping a paperclip maximizer from simply behaving like a self-copier, if that works better. And then once it "wins," it can make the paperclips.
So I think the whole notion makes very little sense.