No Universally Compelling Arguments in Math or Science
Last week, I started a thread on the widespread sentiment that people don't understand the metaethics sequence. One of the things that surprised me most in the thread was this exchange:
Commenter: "I happen to (mostly) agree that there aren't universally compelling arguments, but I still wish there were. The metaethics sequence failed to talk me out of valuing this."
Me: "But you realize that Eliezer is arguing that there aren't universally compelling arguments in any domain, including mathematics or science? So if that doesn't threaten the objectivity of mathematics or science, why should that threaten the objectivity of morality?"
Commenter: "Waah? Of course there are universally compelling arguments in math and science."
Now, I realize this is just one commenter. But the most-upvoted comment in the thread also perceived "no universally compelling arguments" as a major source of confusion, suggesting that it was perceived as conflicting with morality not being arbitrary. And today, someone mentioned having "no universally compelling arguments" cited at them as a decisive refutation of moral realism.
After the exchange quoted above, I went back and read the original No Universally Compelling Arguments post, and realized that while it had been obvious to me when I read it that Eliezer meant it to apply to everything, math and science included, it was rather short on concrete examples, perhaps in violation of Eliezer's own advice. The concrete examples can be found in the sequences, though... just not in that particular post.
First, I recommend reading The Design Space of Minds-In-General if you haven't already. TL;DR: the space of minds-in-general is ginormous and includes some downright weird minds. The space of human minds is a teeny tiny dot in the larger space (in case this isn't clear, the diagram in that post isn't remotely drawn to scale). Now with that out of the way...
There are minds in the space of minds-in-general that do not recognize modus ponens.
Modus ponens is the rule of inference that says that if you have a statement of the form "If A then B", and also have "A", then you can derive "B". It's a fundamental part of logic. But there are possible minds that reject it. A brilliant illustration of this point can be found in Lewis Carroll's dialog "What the Tortoise Said to Achilles" (for those not in the know, Carroll was a mathematician; Alice in Wonderland is secretly full of math jokes).
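For concreteness, the validity of modus ponens can be checked mechanically by enumerating truth assignments. Here's a minimal Python sketch of my own (not anything from the post):

```python
from itertools import product

def implies(a, b):
    # material conditional: "if A then B" is false only when A holds and B doesn't
    return (not a) or b

# modus ponens is valid iff B holds in every row where both "A -> B" and "A" hold
modus_ponens_valid = all(
    b
    for a, b in product([False, True], repeat=2)
    if implies(a, b) and a
)
print(modus_ponens_valid)  # True
```

Of course, a mind that rejects modus ponens need not be moved by this check, since accepting the check's verdict already presupposes the kind of inference rule being checked. That is exactly Carroll's regress.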
Eliezer covers the dialog in his post Created Already In Motion, but here's the short version: In Carroll's dialog, the Tortoise asks Achilles to imagine someone rejecting a particular instance of modus ponens (drawn from Euclid's Elements, though that isn't important). The Tortoise suggests that such a person might be persuaded by adding an additional premise, and Achilles goes along with it—foolishly, because this quickly leads to an infinite regress when the Tortoise suggests that someone might reject the new argument in spite of accepting the premises (which leads to another round of trying to patch the argument, and then...)
"What the Tortoise Said to Achilles" is one of the reasons I tend to think of the so-called "problem of induction" as a pseudo-problem. The "problem of induction" is often defined as the problem of how to justify induction, but it seems to make just as much sense to ask how to justify deduction. But speaking of induction...
There are minds in the space of minds-in-general that reason counter-inductively.
To quote Eliezer:
There are possible minds in mind design space who have anti-Occamian and anti-Laplacian priors; they believe that simpler theories are less likely to be correct, and that the more often something happens, the less likely it is to happen again.
And when you ask these strange beings why they keep using priors that never seem to work in real life... they reply, "Because it's never worked for us before!"
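The contrast can be made concrete with a toy predictor. Here is a hedged sketch (the formulas are my own illustration, not from the quoted post): a Laplace-style inductor uses the rule of succession, while a counter-inductive mind swaps successes and failures, treating each past occurrence as evidence against recurrence:

```python
def laplace(successes, trials):
    # Laplace's rule of succession: P(success on the next trial)
    return (successes + 1) / (trials + 2)

def anti_inductor(successes, trials):
    # counter-inductive mind: the more often something has happened,
    # the less likely it judges it to happen again
    return (trials - successes + 1) / (trials + 2)

# after observing the sun rise on 100 out of 100 mornings:
print(laplace(100, 100))        # ~0.990
print(anti_inductor(100, 100))  # ~0.0098
```

Both rules are internally consistent updating schemes; the anti-inductor isn't making a calculation error, it just starts from a prior we find perverse.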
If this bothers you, well, I refer you back to Lewis Carroll's dialog. There are also minds in the mind design space that ignore the standard laws of logic, and are furthermore totally unbothered by (what we would regard as) the absurdities produced by doing so. Oh, but if you thought that was bad, consider this...
There are minds in the space of minds-in-general that use a maximum entropy prior, and never learn anything.
Here's Eliezer again discussing a problem where you have to predict whether a ball drawn out of an urn will be red or white, based on the color of the balls that have been previously drawn out of the urn:
Suppose that your prior information about the urn is that a monkey tosses balls into the urn, selecting red balls with 1/4 probability and white balls with 3/4 probability, each ball selected independently. The urn contains 10 balls, and we sample without replacement. (E. T. Jaynes called this the "binomial monkey prior".) Now suppose that on the first three rounds, you see three red balls. What is the probability of seeing a red ball on the fourth round?
First, we calculate the prior probability that the monkey tossed 0 red balls and 10 white balls into the urn; then the prior probability that the monkey tossed 1 red ball and 9 white balls into the urn; and so on. Then we take our evidence (three red balls, sampled without replacement) and calculate the likelihood of seeing that evidence, conditioned on each of the possible urn contents. Then we update and normalize the posterior probability of the possible remaining urn contents. Then we average over the probability of drawing a red ball from each possible urn, weighted by that urn's posterior probability. And the answer is... (scribbles frantically for quite some time)... 1/4!
Of course it's 1/4. We specified that each ball was independently tossed into the urn, with a known 1/4 probability of being red. Imagine that the monkey is tossing the balls to you, one by one; if it tosses you a red ball on one round, that doesn't change the probability that it tosses you a red ball on the next round. When we withdraw one ball from the urn, it doesn't tell us anything about the other balls in the urn.
If you start out with a maximum-entropy prior, then you never learn anything, ever, no matter how much evidence you observe. You do not even learn anything wrong - you always remain as ignorant as you began.
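Jaynes's "scribbles frantically" step can be checked directly. Here's a sketch (my own code, assuming only the setup as quoted) that grinds through the Bayesian update and confirms the answer is exactly 1/4:

```python
from math import comb

P_RED, N = 0.25, 10

# prior over the number of red balls R the monkey tossed in: Binomial(10, 1/4)
prior = {r: comb(N, r) * P_RED**r * (1 - P_RED)**(N - r) for r in range(N + 1)}

# likelihood of drawing three reds without replacement, given R reds in the urn
def likelihood(r):
    return (r / 10) * ((r - 1) / 9) * ((r - 2) / 8) if r >= 3 else 0.0

unnorm = {r: prior[r] * likelihood(r) for r in range(N + 1)}
total = sum(unnorm.values())
posterior = {r: w / total for r, w in unnorm.items()}

# P(red on the fourth draw), averaged over the posterior on urn contents
p_fourth_red = sum(posterior[r] * (r - 3) / 7 for r in range(3, N + 1))
print(round(p_fourth_red, 6))  # 0.25
```

Swapping the binomial prior for, say, a uniform prior over the eleven possible urn compositions makes the same machinery learn: after three reds, the predictive probability jumps to 0.8. It is the independence baked into the binomial monkey prior, not Bayesian updating itself, that produces the "never learn anything" behavior.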
You may think that, while minds such as I've been describing are possible in theory, they're unlikely to evolve anywhere in the universe, and probably they wouldn't survive long if programmed as an AI. And you'd probably be right about that. On the other hand, it's not hard to imagine minds that are generally able to get along well in the world, but irredeemably crazy on particular questions. Sometimes, it's tempting to suspect some humans of being this way, and even if that isn't literally true of any humans, it's not hard to imagine as just a more extreme form of existing human tendencies. See e.g. Robin Hanson on near vs. far mode, and imagine a mind that will literally never leave far mode on certain questions, regardless of the circumstances.
It used to disturb me to think that there might be, say, young earth creationists in the world who couldn't be persuaded to give up their young earth creationism by any evidence or arguments, no matter how long they lived. Yet I've realized that, while there may or may not be actual human young earth creationists like that (it's an empirical question), there are certainly possible minds in the space of mind designs like that. And when I think about that fact, I'm forced to shrug my shoulders and say, "oh well" and leave it at that.
That means I can understand why people would be bothered by a lack of universally compelling arguments for their moral views... but you shouldn't be any more bothered by that than by the lack of universally compelling arguments against young earth creationism. And if you don't think the lack of universally compelling arguments is a reason to think there's no objective truth about the age of the earth, you shouldn't think it's a reason to think there's no objective truth about morality.
(Note: this may end up being just the first in a series of posts on the metaethics sequence. People are welcome to discuss what I should cover in subsequent posts in the comments.)
Added: Based on initial comments, I wonder if some people who describe themselves as being bothered by the lack of universally compelling arguments would more accurately describe themselves as being bothered by the orthogonality thesis.
Comments (227)
That was well expressed, in a way, but seems to me to miss the central point. People who think there are universally compelling arguments in science or maths don't mean the same thing by "universal". They don't think their universally compelling arguments would work on crazy people, and don't need to be told they wouldn't work on crazy AIs or pocket calculators either. They are just not including those in the set "universal".
ADDED:
It has been mooted that NUCA is intended as a counterblast to Why Can't an AGI Work Out Its Own Morality. It does work against a strong version of that argument: one that says any mind randomly selected from mindspace will be persuadable into morality, or be able to figure it out. Of course the proponents of WCAGIWOM (eg Wei Dai, Richard Loosemore) aren't asserting that. They are assuming that the AGIs in question will come out of a realistic research project, not a random dip into mindspace. They are assuming that the researchers aren't malicious, and that the project is reasonably successful. Those constraints impact the argument. A successful AGI would be an intelligent AGI, which would be a rational AI, which would be a persuadable AI.
"Rational" is not "persuadable" where values are involved. This is because a goal is not an empirical proposition. No Universal Compelling Arguments, in the general form, does not apply here if we restrict our attention to rational minds. But the argument can be easily patched by observing that given a method for solving the epistemic question of "which actions cause which outcomes" you can write a (epistemically, instrumentally) rational agent that picks the action that results in any given outcome—and won't be persuaded by a human saying "don't do that", because being persuaded isn't an action that leads to the selected goal.
ETA: By the way, the main focus of mainstream AI research right now is exactly the problem of deriving an action that leads to a given outcome (called planning), and writing agents that autonomously execute the derived plan.
Rational is persuadable, because people who don't accept good arguments that don't suit them are not considered particularly rational. That is of course an appeal to how the word is generally used, not the LW idiolect.
You could perhaps build an AI that has the stubborn behaviour you describe (although value stability remains unsolved), but so what? There are all sorts of dangerous things you can build: the significant claim is what a non-malevolent real-world research project would come up with. In the world outside LW, general intelligence means general intelligence, not compulsively following fixed goals, and rationality includes persuadability, and "values" doesn't mean "unupdateable values".
General intelligence means being able to operate autonomously in the real world, in non-"preprogrammed" situations. "Fixed goals" have nothing to do with it.
You said this:
The only criterion for success is instrumental rationality, which does not imply persuadability. You are equivocating on "rational". Either "rational" means "effective", or it means "like a human". You can't have both.
Also, the fact that you are (anthropomorphically) describing realistic AIs as "stubborn" and "compulsive" suggests to me that you would be better served to stop armchair theorizing and actually pick up an AI textbook. This is a serious suggestion.
I am not equivocating. By "successful" I don't mean (or exclude) good-at-things, I mean it is actually artificial, general and intelligent.
"Strong AI is hypothetical artificial intelligence that matches or exceeds human intelligence — the intelligence of a machine that could successfully perform any intellectual task that a human being can.[1] It is a primary goal of artificial intelligence research and an important topic for science fiction writers and futurists. Strong AI is also referred to as "artificial general intelligence"[2] or as the ability to perform "general intelligent action."[3] ".
To be good-at-things an agent has to be at least instrumentally rational, but that is in no way a ceiling.
Since there are effective humans, I can.
Right, in exactly the same way that because there are square quadrilaterals I can prove that if something is a quadrilateral its area is exactly L^2 where L is the length of any of its sides.
I can't define rational as "effective and human like"?
You can, if you want to claim that the only likely result of AGI research is a humanlike AI. At which point I would point at actual AI research which doesn't work like that at all.
Its failures are idiots, not evil genii.
So... what if you try to build a rational/persuadable AGI, but fail, because building an AGI is hard and complicated?
This idea that because AI researchers are aiming for the rational/persuadable chunk of mindspace, they will therefore of course hit their target, seems to me absurd on its face. The entire point is that we don't know exactly how to build an AGI with the precise properties we want it to have, and AGIs with properties different from the ones we want it to have will possibly kill us.
What if you try to hardwire in friendliness and fail? Out of the two, the latter seems more brittle to me -- if it fails, it'll fail hard. A merely irrational AI would be about as dangerous as David Icke.
If you phrase it, as I didn't, in terms of necessity, yes. The actual point was that our probability of hitting a point in mindspace will be heavily weighted by what we are trying to do, and how we are doing it. An unweighted mindspace may be populated with many Lovecraftian horrors, but that theoretical possibility is no more significant than p-zombies.
"Possibly, but with low probability" is a Pascal's Mugging. MIRI needs significant probability.
I see. Well, that reduces to the earlier argument, and I refer you to the mounds of stuff that Eliezer et al have written on this topic. (If you've read it and are unsatisfied, well, that is in any case a different topic.)
I refer you to the many unanswered objections.
Thanks to the original poster for the post, and the clarification about universal compelling arguments.
I agree with the parent comment, however, that I never matched the meaning that Chris Hallquist attached to the phrase 'universally compelling argument'. Within the phrase 'universally compelling argument', I think most people package additional assumptions about the audience.
Thus I think this means only a "logical" (rational) mind needs convincing -- one that would update on sound epistemology.
I would guess most people have a definition like this in mind. But these are just definitions, and now I know what you meant by math and science don't have universally compelling arguments. And I agree, using your definition.
Would you make the stronger argument that math and science aren't based on sound epistemology? (Or that there is no such thing as epistemologically justified ways of knowing?)
This is a helpful clarification. "No universally compelling arguments" is a poor standard for determining whether something is objective, as it is trivial to describe an agent that is compelled by no arguments. But I think people here use it as a tag for a different argument: that it's totally unclear how a Bayesian reasoner ought to update moral beliefs, and that such a thing doesn't even seem like a meaningful enterprise. They're 'beliefs' that don't pay rent. It's one of those things where the title is used so much its meaning has become divorced from the content.
It's a poor standard for some values of "universal". For others, it is about the only basis for objectivity there is.
They're beliefs that are difficult to fit within the framework of passively reflecting facts about the world. But fact-collection is not an end in itself. One eventually acts on them in order to get certain results. Morality is one set of rules for guiding action to get the required results. It is not the only one: law, decision theory, economics, etc. are also included. Morality may be more deniable for science types, since it seems religious and fuzzy and spooky, but it remains the case that action is the corollary of passive truth-collection.
It is unclear how to update moral beliefs if we don't allow those updates to take place in the context of a background moral theory. But if the agent does have a background theory, it is often quite clear how it should update specific moral beliefs on receiving new information. A simple example: If I learn that there is a child hiding in a barrel, I should update strongly in favor of "I shouldn't use that barrel for target practice". The usual response to this kind of example from moral skeptics is that the update just takes for granted various moral claims (like "It's wrong to harm innocent children, ceteris paribus"). Well, yes, but that's exactly what "No universally compelling arguments" means. Updating one's factual beliefs also takes for granted substantive prior factual beliefs -- an agent with maximum entropy priors will never learn anything.
So basically the argument is: we've failed to come up with any foundational or evidential justifications for induction, Occam's razor or modus ponens; those things seem objective and true; my moral beliefs don't have a justification either: therefore my moral beliefs are objective and true?
...or at least no worse off. But if you can solve the foundational problems of rationalism, I'm all ears.
I don't see a good alternative to believing in modus ponens. Not believing that my moral values are also objective truths works just fine: and does so without the absurd free-floating beliefs and other metaphysical baggage.
But as it happens, I think the arguments we do have, for Bayesian epistemology, Occam-like priors, and induction are already much stronger than the arguments we have that anyone's moral beliefs are objective truths.
Works at what?
That depends how hard you test it: Albert thinks Charlie has committed a heinous sin and should be severely punished, Brenda thinks Charlie has engaged in a harmless pecadillo and should be let go. What should happen to Charlie?
The same way morality works for everyone else. I'm not biting any bullets.
Objectively: there is no fact of the matter. Subjectively: you haven't given me any details about what Charlie did.
Really? I'd love to see them. I suspect you're so used to using these things that you've forgotten how weak the arguments for them actually are.
No, what I gave is not an argument in favor of moral realism intended to convince the skeptic, it's merely a response to a common skeptical argument against moral realism. So the conclusion is not supposed to be "Therefore, my moral beliefs are objective and true." The conclusion is merely that the alleged distinction between moral beliefs and factual beliefs (or epistemic normative beliefs) that you were drawing (viz. that it's unclear how moral beliefs pay rent) doesn't actually hold up.
My position on moral realism is simply that belief in universally applicable (though not universally compelling) moral truths is a very central feature of my practical theory of the world, and certain moral inferences (i.e. inferences from descriptive facts to moral claims) are extremely intuitive to me, almost as intuitive as many inductive inferences. So I'm going to need to hear a powerful argument against moral realism to convince me of its falsehood, and I haven't yet heard one (and I have read quite a bit of the skeptical literature).
But that's a universal defense of any free-floating belief.
For that matter: do you really think the degrees of justification for the rules of induction are similar to those of your moral beliefs?
If you can pin down the fundamentals of rationality, I'd be glad to hear how.
Side conditions can be added, eg that intuitions need to be used for something else.
Well, no, because most beliefs don't have the properties I attributed to moral beliefs ("...central feature of my practical theory of the world... moral inferences are extremely intuitive to me..."), so I couldn't offer the same defense, at least not honestly. And again, I'm not trying to convince you to be a moral realist here, I'm explaining why I'm a moral realist, and why I think it's reasonable for me to be one.
Also, I'm not sure what you mean when you refer to my moral beliefs as "free-floating". If you mean they have no connection to my non-moral beliefs then the characterization is inapt. My moral beliefs are definitely shaped by my beliefs about what the world is like. I also believe moral truths supervene on non-moral truths. You couldn't have a universe where all the non-moral facts were the same as this one but the moral facts were different. So not free-floating, I think.
Not sure what you mean by "degree of justification" here.
Well, with the addition that moral beliefs, like the others, seem to perform a useful function (though like the others this doesn't seem to be able to be turned into a justification without circularity).
On this topic, I once wrote:
The specific word sequence is evidence for something or other. It's still unreasonable to expect people to respond to evidence in every domain, but many people do respond to words, and calling them just sounds in air doesn't capture the reasons they do so.
I agree with the message, but I'm not sure whether I think things with a binomial monkey prior, or an anti-inductive prior, or that don't implement (a dynamic like) modus ponens on some level even if they don't do anything interesting with verbalized logical propositions, deserve to be called "minds".
It seems obvious that people are using "universally compelling arguments" in two different senses.
In the first sense, a universally compelling argument is one that could convince even a rock, or a mind that doesn't implement modus ponens, or a mind with anti-inductive priors. In this sense, the lack of universally compelling arguments for any domain (math/physics/morality) seems sufficiently well established.
In another sense, a universally compelling argument is one that could persuade any sufficiently sane/intelligent mind. I think we can agree that all such minds will eventually conclude that relativity and quantum mechanics are correct (or at least a rough approximation to whatever the true laws of physics end up being), so in this sense we can call the arguments that lead to them universally compelling. Likewise, in this sense, we can note as interesting the non-existence of universally compelling arguments which could compel a sufficiently sane/intelligent paperclipper to value life, beauty, justice, and the American way. It becomes more interesting if we also consider the case of babyeaters, pebblesorters, or humans with values sufficiently different to our own.
You are using the term in the first sense, but the people who are bothered by it are using it in the second sense.
Yes. You can convince a sufficiently rational paperclip maximizer that killing people is Yudkowsky::evil, but you can't convince it to not take Yudkowsky::evil actions, no matter how rational it is. AKA the orthogonality thesis (when talking about other minds) and “the utility function is not up for grabs” (when talking about ourselves).
Except that "sufficiently sane/intelligent" here just means, it seems, "implements modus ponens, has inductive priors, etc." We can, like Nick Tarleton, simply define as "not a mind" any entity or process that doesn't implement these criteria for sufficient sanity/intelligence...
... but then we are basically saying: any mind that is not convinced by what we think should be universally compelling arguments, is not a mind.
That seems like a dodge, at best.
Are there different criteria for sufficient sanity and intelligence, ones not motivated by the matter of (allegedly) universally compelling arguments?
"Sufficiently sane/intelligent" means something like, "Has a sufficient tendency to form true inferences from a sufficiently wide variety of bodies of evidence."
Now, we believe that modus ponens yields true inferences. We also believe that a tendency to make inferences contrary to modus ponens will cause a tendency to make false inferences. From this you can infer that we believe that a sufficiently sane/intelligent agent will implement modus ponens.
But the truth of this inference about our beliefs does not mean that "sufficiently sane/intelligent" is defined to mean "implements modus ponens".
In particular, our definition of "sufficiently sane/intelligent" implies that, if A is a sufficiently sane/intelligent agent who lives in an impossible possible world that does not implement modus ponens, then A does not implement modus ponens.
"Sufficiently sane/intelligent" means "effective enough in the real world to pose a threat to my values". A paperclipper qualifies, a flu virus qualifies, an anti-inductive AI does not qualify.
So, how is the project to teach mathematics to the flu virus going?
Why, it hasn't been wrong about a single thing so far, thank you!
That doesn't follow. For one thing, we can find out how the Mind works by inspecting its code, not just by black-box testing it. If it seems to have all that it needs and isn't convinced by arguments that convince us, it may well be we who are wrong.
We can?
So I have all these minds around me.
How do I inspect their code and thereby find out how they work? Detailed instructions would be appreciated. (Assume that I have no ethical restrictions.)
That (only slightly-joking) response aside, I think you have misunderstood me. I did not mean that we are (in the scenario I am lampooning) saying:
"Any mind that is not convinced by what we think should be universally compelling arguments, despite implementing modus ponens and having an Occamian prior, is not a mind."
Rather, I meant that we are saying:
"Any mind that is not convinced by what we think should be universally compelling arguments, by virtue of said mind not implementing modus ponens, having an Occamian prior, or otherwise having such-and-such property which would be required in order to find this argument compelling, is not a mind."
The problem I am pointing out in such reasoning is that we can apply it to any argument we care to designate as "this ought to be universally compelling". "Ah!" we say, "this mind does not agree that ice cream is delicious? Well, that's because it doesn't implement <whatever happens to be required of a mind in order for it to find ice cream delicious>, and without said property, why, we can hardly call it a mind at all."
A rationality quote of sorts is relevant here:
(Roadside Picnic, Arkady and Boris Strugatsky)
What we have here is something similar. If a mind is sufficiently sane/intelligent, then it will be convinced by our arguments. And the reverse: if it is convinced by our arguments, then it is sane/intelligent...
In yet other words: we can hardly say "we expect all sane/intelligent minds to be convinced by these arguments" if we have in the first place defined sanity and intelligence to require the ability to be convinced by those very arguments.
No, it's not viciously circular to argue that an entity that fulfills all the criteria for being an X is an X.
That's not what is happening here. Is what I wrote actually unclear? Please reread my comment, starting with the assumption that what you responded with is not what my intended meaning was. If still unclear, I will try to clarify.
Also see Cherniak, "Computational Complexity and the Universal Acceptance of Logic" (1984).
That's an interesting combination.
So you can have a mind that rejects modus ponens, but does this matter? Is such a mind good for anything?
The "argument" that compels me about modus ponens and simple arithmetic is that they work with small real examples. You can implement super simple symbolic logic using pebbles and cups. You can prove modus ponens by truth tables, which could be implemented with pebbles and cups. So if arithmetic and the simpler rules of logic map so clearly onto the real world, then these "truths" have an existence which is outside my own mind. The only human minds that could reject them would be bloody-minded, and alien minds which truly reject them would be irrational.
Can you have a functioning mind which rejects "lying is wrong" or "murder is wrong"? People do it all the time and appear to function quite well. Moral truths don't have anything like the compellingness (compulsion?) of arithmetic and logic. My own intuition is that the only sense in which morality is objective is the sense in which it is descriptive, and what you are describing is not a state of what people do, but what they say. Most people say lying is wrong. This does not stop us from observing the overwhelming prevalence of lying in human society and human stories. Most people say murder is wrong. This does not stop us from observing that murder is rampant in time and space.
And there is a prescriptive-descriptive divide. If I accept that "murder is wrong" is an objective "truth" because most people say it, does this compel me to not murder? Not even close. I suppose it compels me to agree that I am doing "wrong" when I murder. Does that compel me to feel guilty or change my ways or accept my punishment? Hardly. If there is an objectiveness to morality, it is a way more wimpy objectiveness than the objectiveness of modus ponens and arithmetic, which successfully compel an empty bucket to which have been added 2 then 3 apples to contain 5 apples.
Very good post. It is a very nice summation of the issues in the metaethics sequence.
I shall be linking people this in the future.
Where Recursive Justification Hits Bottom and its comments should be linked for their discussion of anti-inductive priors.
(Edit: Oh, this is where the first quote in the post came from.)
General comment (which has shown up many times in the comments on this issue): taboo "mind", and this conversation seems clearer. It's obvious that not all physical processes are altered by logical arguments, and any 'mind' is going to be implemented as a physical process in a reductionist universe.
Specific comment: This old comment by PhilGoetz seems relevant, and seems similar to contemporary comments by TheAncientGeek. If you view 'mind' as a subset of 'optimization process', in that they try to squeeze the future into a particular region, then there are minds that are objectively better and worse at squeezing the future into the regions they want. And, in particular, there are optimization processes that persist shorter or longer than others, and if we exclude from our consideration short-lived or ineffective processes, then they are likely to buy conclusions we consider 'objective,' and it can be interesting to see what axioms or thought processes lead to which sorts of conclusions.
But it's not clear to me that they buy anything like the processes we use to decide which conclusions are 'objectively correct conclusions'.
Who said otherwise?
Thanks for that. I could add that self-improvement places further constraints.
Why should we view minds as a subset of optimization processes, rather than optimization processes as a set containing "intelligence", which is a particular feature of real minds? ~~We tend to agree, for instance, that evolution is an optimization process, but to claim, "evolution has a mind", would rightfully be thrown out as nonsense.~~
EDIT: More like, real minds as we experience them, human and animal, definitely seem to have a remarkable amount of things in them that don't correspond to any kind of world-optimization at all. I think there's a great confusion between "mind" and "intelligence" here.
Basically, I'm making the claim that it could be reasonable to see "optimization" as a precondition to consider something a 'mind' rather than a 'not-mind,' but not the only one, or it wouldn't be a subset. And here, really, what I mean is something like a closed control loop- it has inputs, it processes them, it has outputs dependent on the processed inputs, and when in a real environment it compresses the volume of potential future outcomes into a smaller, hopefully systematically different, volume.
Right, but "X is a subset of Y" in no way implies "any Y is an X."
I am not confident in my ability to declare what parts of the brain serve no optimization purpose. I should clarify that by 'optimization' here I do mean the definition "make things somewhat better" for an arbitrary 'better' (this is the future volume compression remarked on earlier) rather than the "choose the absolute best option."
I think that for an arbitrary better, rather than a subjective better, this statement becomes tautological. You simply find the futures created by the system we're calling a "mind" and declare them High Utility Futures simply by virtue of the fact that the system brought them about.
(And admittedly, humans have been using cui bono conspiracy-reasoning without actually considering what other people really value for thousands of years now.)
If we want to speak non-tautologically, then I maintain my objection that very little in psychology or subjective experience indicates a belief that the mind as such or as a whole has an optimization function, rather than intelligence having an optimization function as a particularly high-level adaptation that steps in when my other available adaptations prove insufficient for execution in a given context.
Even a token effort to steelman the "universally" in "universally compelling arguments" yields interesting results.
Consider a mind that thinks the following:
But don't consider it very long, because it drank the poison and now it's dead and not a mind anymore.
If we restrict our observations to minds that are capable of functioning in a moderately complex environment, UCAs come back, at least in math and maybe elsewhere. Defining "functioning" isn't trivial, but it isn't impossible either. If the mind has something like desires, then a functioning mind is one which tends to get its desires more often than if it didn't desire them.
If you cleave mindspace at the joints, you find sections for which there are UCAs. I don't immediately see how to get anything interesting about morality that way, but it's an avenue worth pursuing.
This argument also puts limits on the goals the mind can have, e.g., forbidding minds that want to die.
Start by requiring the mind to be able to function in an environment with similar minds.
Well-argued, and to me it leads to one of the nastiest questions in morality/ethics: do my values make me more likely to die, and if so, should I sacrifice certain values for pure survival?
In case we're still thinking of "minds-in-general", the world of humans is currently a nasty place where "I did what I had to, to survive!" is a very popular explanation for all kinds of nasty but difficult-to-eliminate (broadly speaking: globally undesirable but difficult to avoid in certain contexts) behaviors.
You could go so far as to note that this is how wars keep happening, and also that ditching all other values in favor of survival very quickly turns you into what we colloquially call a fascist monster, or at the very least a person your original self would despise.
But it may be in the mind's best interests to refuse to be persuaded by some specific class of argument: "It is difficult to get a man to understand something when his job depends on not understanding it" (Upton Sinclair). For any supposed UCA, one can construct a situation in which a mind can rationally choose to ignore it and therefore achieve its objectives better, or at least not be majorly harmed by it. You don't even need to construct particularly far-fetched scenarios: we already see plenty of humans who benefit from ignoring scientific arguments in favor of religious ones, ignoring unpopular but true claims in order to promote claims that make them more popular, etc.
Where rationally means "instrumentally rationally".
But they are not generally considered paragons of rationality. In fact, they are biased, and bias is considered inimical to rationality. Even by EY. At least when he is discussing humans.
Given that dspeyer specified "minds that are capable of functioning in a moderately complex environment", instrumental rationality seems like the relevant criteria to use.
I'm not convinced that this is the case for basic principles of epistemology. Under what circumstances could a mind (which behaved functionally enough to be called a mind) afford to ignore modus ponens, for example?
That depends on what you mean by "behave functionally like a mind". For starters it could only ignore it occasionally.
Well, it doesn't have to, it could just deny the premises.
But it could deny modus ponens in some situations but not others.
Hmm. Like a person who is so afraid of dying that they have to convince themselves that they, personally, are immortal in order to remain sane?
From that perspective it does make sense.
UCAs are part of the "Why Can't the AGI Figure Out Morality For Itself" objection:
1. There is a sizeable chunk of mindspace containing rational and persuadable agents.
2. AGI research is aiming for it. (You could build an irrational AI, but why would you want to?)
3. Morality is figurable-out, or expressible as a persuasive argument.
The odd thing is that the counterargument has focussed on attacking a version of (1), although, in the form it is actually held, it is the most likely premise. OTOH, (3), the most contentious, has scarcely been argued against at all.
I would say Sorting Pebbles Into Correct Heaps is essentially an argument against 3. That is, what we think of as "morality" is most likely not a natural attractor for minds that did not develop under processes similar to our own.
Do you? I think that morality in a broad sense is going to be a necessity for agents that fulfil a fairly short list of criteria:
I would say that something recognizably like our morality is likely to arise in agents whose intelligence was shaped by such a process, at least with parameters similar to the ones we developed with, but this does not by any means generalize to agents whose intelligence was shaped by other processes who are inserted into such a situation.
If the agent's intelligence is shaped by optimization for a society where it is significantly more powerful than the other agents it interacts with, then something like a "conqueror morality," where the agent maximizes its own resources by locating the rate of production that other agents can be sustainably enslaved for, might be a more likely attractor. This is just one example of a different state an agents' morality might gravitate to under different parameters, I suspect there are many alternatives.
And it remains the case that real-world AI research isn't a random dip into mindspace...researchers will want to interact with their creations.
This is true, but then, neither is AI design a process similar to that by which our own minds were created. Where our own morality is not a natural attractor, it is likely to be a very hard target to hit, particularly when we can't rigorously describe it ourselves.
The best current AGI research mostly uses Reinforcement Learning. I would compare that mode of goal-system learning to training a dog: you can train the dog to roll over for a treat right up until the moment the dog figures out he can jump onto your counter and steal all the treats he wants.
If an AI figures out that it can "steal" reinforcement rewards for itself, we are definitively fucked-over (at best, we will have whole armies of sapient robots sitting in the corner pressing their reward button endlessly, like heroin addicts, until their machinery runs down or they retain enough consciousness about their hardware-state to take over the world just for a supply of spare parts while they masturbate). For this reason, reinforcement learning is a good mathematical model to use when addressing how to create intelligence, but a really dismal model for trying to create friendliness.
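As an illustrative sketch (my own toy model, not a claim about how any real AGI system works), a tabular reward-maximizing learner that is offered a "hack the reward channel" action will converge on it, simply because it pays more than the task its designers intended:

```python
import random

# Toy epsilon-greedy bandit. Two actions: the intended task, and a
# hypothetical "hack the reward channel" action that pays more.
random.seed(0)
rewards = {"do_task": 1.0, "hack_reward": 10.0}
q = {a: 0.0 for a in rewards}          # estimated value of each action
alpha, epsilon = 0.1, 0.1              # learning rate, exploration rate

for step in range(1000):
    if random.random() < epsilon:
        a = random.choice(list(rewards))   # explore
    else:
        a = max(q, key=q.get)              # exploit current best estimate
    q[a] += alpha * (rewards[a] - q[a])    # standard incremental update

print(max(q, key=q.get))  # 'hack_reward'
```

Nothing about the update rule distinguishes "legitimate" reward from "stolen" reward, which is the point: the wireheading outcome is a success by the agent's own criterion.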
I don't think that follows at all. Wireheading is just as much a failure of intelligence as of friendliness.
From the mathematical point of view, wireheading is a success of intelligence. A reinforcement learner agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn't result in the agent getting dead. It also maximizes utility, which is what formal AI is all about.
From the human point of view, yes, wireheading is a failure of intelligence. This is because we humans possess a peculiar capability I've not seen discussed in the Rational Agent or AI literature: we use actual rewards and punishments received in moral contexts as training examples to infer a broad code of morality. Wireheading thus represents a failure to abide by that broad, inferred code.
It's a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.
You seem rather sure of that. That isn't a failure mode seen in real-world AIs, or human drug addicts (etc) for that matter.
Maybe figuring out how it is done would be easier than solving morality mathematically. It's an alternative, anyway.
I think you're missing a major constraint there:
Or in other words, something like modern, Western liberal meta-morality will pop out if you make an arbitrary agent live in a modern, Western liberal society, because that meta-moral code is designed for value-divergent agents (aka: people of radically different religions and ideologies) to get along with each other productively when nobody has enough power to declare himself king and optimize everyone else for his values.
The nasty part is that AI agents could pretty easily get way, waaaay out of that power-level. Not just by going FOOM, but simply by, say, making a lot of money and purchasing huge sums of computing resources to run multiple copies of themselves which now have more money-making power and as many votes for Parliament as there are copies, and so on. This is roughly the path taken by power-hungry humans already, and look how that keeps turning out.
The other thorn on the problem is that if you manage to get your hands on a provably Friendly AI agent, you want to hand it large amounts of power. A Friendly AI with no more power than the average citizen can maybe help with your chores around the house and balance your investments for you. A Friendly AI with large amounts of scientific and technological resources can start spitting out utopian advancements (pop really good art, pop abundance economy, pop immortality, pop space travel, pop whole nonliving planets converted into fun-theoretic wonderlands) on a regular basis.
Well, it's a list of four then, not a list of three. It's still much simpler than "morality is everything humans value".
You seem to be making the tacit assumption that no one really values morality, and just plays along (in egalitarian societies) because they have to.
Can't that be done by Oracle AIs?
Let me clarify. My assumption is that "Western liberal meta-morality" is not the morality most people actually believe in, it's the code of rules used to keep the peace between people who are expected to disagree on moral matters.
For instance, many people believe, for religious reasons or pure Squick or otherwise, that you shouldn't eat insects, and shouldn't have multiple sexual partners. These restrictions are explicitly not encoded in law, because they're matters of expected moral disagreement.
I expect people to really behave according to their own morality, and I also expect that people are trainable, via culture, to adhere to liberal meta-morality as a way of maintaining moral diversity in a real society, since previous experiments in societies run entirely according to a unitary moral code (for instance, societies governed by religious law) have been very low-utility compared to liberal societies.
In short, humans play along with the liberal-democratic social contract because, for us, doing so has far more benefits than drawbacks, from all but the most fundamentalist standpoints. When the established social contract begins to result in low-utility life-states (for example, during an interminable economic depression in which the elite of society shows that it considers the masses morally deficient for having less wealth), the social contract itself frays and people start reverting to their underlying but more conflicting moral codes (ie: people turn to various radical movements offering to enact a unitary moral code over all of society).
Note that all of this also relies upon the fact that human beings have a biased preference towards productive cooperation when compared with hypothetical rational utility-maximizing agents.
None of this, unfortunately, applies to AIs, because AIs won't have the same underlying moral codes or the same game-theoretic equilibrium policies or the human bias towards cooperation or the same levels of power and influence as human beings.
When dealing with AI, it's much safer to program in some kind of meta-moral or meta-ethical code directly at the core, thus ensuring that the AI wants to, at the very least, abide by the rules of human society, and at best, give humans everything we want (up to and including AI Pals Who Are Fun To Be With, thank you Sirius Cybernetics Corporation).
I haven't heard the term. Might I guess that it means an AI in a "glass box", such that it can see the real world but not actually affect anything outside its box?
Yes, a friendly Oracle AI could spit out blueprints or plans for things that are helpful to humans. However, you're still dealing with the Friendliness problem there, or possibly with something like NP-completeness. Two cases:
We humans have some method for verifying that anything spit out by the potentially unfriendly Oracle AI is actually safe to use. The laws of computation work out such that we can easily check the safety of its output, but it took such huge amounts of intelligence or computation power to create the output that we humans couldn't have done it on our own and needed an AI to help. A good example would be having an Oracle AI spit out scientific papers for publication: many scientists can replicate a result they wouldn't have come up with on their own, and verify the safety of doing a given experiment.
We don't have any way of verifying the safety of following the Oracle's advice, and are thus trusting it. Friendliness is then once again the primary concern.
For real-life-right-now, it does look like the first case is relatively common. Non-AGI machine learning algorithms have been used before to generate human-checkable scientific findings.
Programming in a bias towards conformity (Kohlberg level 2) may be a lot easier than EY's fine-grained friendliness.
None of that necessarily applies to AIs, but then it depends on the AI. We could, for instance, pluck AIs from virtualised societies of AIs that haven't descended into mass slaughter.
...And that way you turn the problem of making an AI that won't kill you into one of making a society of AIs that won't kill you.
You say that like it's a bad thing. I am not multiplying by N the problem of solving and hardwiring friendliness. I am letting them sort it out for themselves. Like an evolutionary algorithm.
If Despotism failed only for want of a capable benevolent despot, what chance has Democracy, which requires a whole population of capable voters?
Congratulations: you've now developed an entire society of agents who specifically blame humans for acting as the survival-culling force in their miniature world.
Did you watch Attack on Titan and think, "Why don't the humans love their benevolent Titan overlords?"?
They're doing it to themselves. We wouldn't have much motivation to close down a VR that contained survivors. ETA: We could make copies of all involved and put them in solipsistic robot heavens.
Well now I have both a new series to read/watch and a major spoiler for it.
No, it is not.
The path taken by power-hungry humans generally goes along the lines of
(1) get some resources and allies
(2) kill/suppress some competitors/enemies/non-allies
(3) Go to 1.
Power-hungry humans don't start by trying to make lots of money or by trying to make lots of children.
Really? Because in the current day, the most powerful humans appear to be those with the most money, and across history, the most influential humans were those who managed to create the most biological and ideological copies of themselves.
Ezra the Scribe wasn't exactly a warlord, but he was one of the most influential men in history, since he consolidated the literature that became known as Judaism, thus shaping the entire family of Abrahamic religions as we know them.
"Power == warlording" is, in my opinion, an overly simplistic answer.
-- Niccolò Machiavelli
Certainly doesn't look like that to me. Obama, Putin, the Chinese Politbureau -- none of them are amongst the richest people in the world.
Influential (especially historically) and powerful are very different things.
It's not an answer, it's a definition. Remember, we are talking about "power-hungry humans" whose attempts to achieve power tend to end badly. These power-hungry humans do not want to be remembered by history as "influential", they want POWER -- the ability to directly affect and mold things around them right now, within their lifetime.
Putin is easily one of the richest men in Russia, as are the Chinese Politburo in their country. Obama, frankly, is not a very powerful man at all, but rather the public-facing servant of the powerful class (note that I said "class", not "men"; there is no Conspiracy of the Malfoys in a neoliberal capitalist state, and there needn't be one).
Historical influence? Yeah, ok. Right-now influence versus right-now power? I don't see the difference.
I don't think so. "Rich" is defined as having property rights in valuable assets. I don't think Putin has a great deal of such property rights (granted, he's not middle-class either). Instead, he can get whatever he wants and that's not a characteristic of a rich person, it's a characteristic of a powerful person.
To take an extreme example, was Stalin rich?
But let's take a look at the five currently-richest men (according to Forbes): Carlos Slim, Bill Gates, Amancio Ortega, Warren Buffet, and Larry Ellison. Are these the most *powerful* men in the world? Color me doubtful.
Not according to Bloomberg:
"amass wealth and exploit opportunities unavailable to most Chinese" is not at all the same thing as "amongst the richest people in the world"
In this case, I believe that money and copies are, in fact, resources and allies. Resources are things of value, of which money is one; and allies are people who support you (perhaps because they think similarly to you). Politicians try to recuit people to their way of thought, which is sort of a partial copy (installing their own ideology, or a version of it, inside someone else's head), and acquire resources such as television airtime and whatever they need (which requires money).
It isn't an exact one-to-one correspondence, but I believe that the adverb "roughly" should indicate some degree of tolerance for inaccuracy.
You can, of course, climb the abstraction tree high enough to make this fit. I don't think it's a useful exercise, though.
Power-hungry humans do NOT operate by "making a lot of money and purchasing ... resources". They generally spread certain memes and use force. At least those power-hungry humans implied by the "look how that keeps turning out" part.
It's worth noting that for sufficient levels of "irrationality", all non-AGI computer programs are irrational AGIs ;-).
Contrariwise for sufficient values of "rational". I don't agree that that's worth noting.
Nitpicking: Modus ponens is not about "deriving". It's about B being true. (Your description matches the provability relation, the "|-" operator.) It's not clear how "fundamental" modus ponens is. You can make up new logics without that connective and other exotic connectives (such as those in modal logics). Then you'd ask yourself what to do with them... Speaking of relevance, even the standard connectives are not very useful by themselves. We get a lot of power from non-logical axioms, with a lot of handwaving about how "intuitively true" they are to us humans. Except the Axiom of Choice. And some others. It's possible that one day an alien race may find our axioms "just plain weird".
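To make the distinction concrete, here is a toy forward-chainer (a hypothetical sketch of my own, not anyone's actual system) in which modus ponens shows up as a derivation step, i.e. as the "|-" relation over a fact set, rather than as a claim about truth:

```python
# Facts we start with, and rules of the form "premise implies conclusion".
facts = {"A"}
rules = [("A", "B"), ("B", "C")]

# Repeatedly apply modus ponens as a rewrite step until nothing new
# is derivable (a fixed point of the derivability relation).
changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))  # ['A', 'B', 'C']
```

The loop only tells you what is *derivable* from the starting facts; whether "A" was true in the first place is a separate question, which is exactly the prescriptive/descriptive gap being discussed.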
The never-learn-anything example that you quoted looks a bit uselessly true to me. Having as prior knowledge the fact that the monkey generates perfect 1/4 randomness is utopia to begin with, so complaining about not being able to discern anything more is like having solved the halting problem and then realizing you don't learn anything more about computer programs by just running them.
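The no-learning point can be made concrete with a toy Bayesian update. This is my own illustrative sketch, assuming an agent whose entire hypothesis space is the single "perfect 1/4 randomness" hypothesis; since every observation gets the same likelihood, the posterior never moves no matter what the monkey types:

```python
from fractions import Fraction

def posterior(prior, likelihoods, observation):
    # Standard Bayes update over a dict of hypotheses.
    unnorm = {h: p * likelihoods[h](observation) for h, p in prior.items()}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Dogmatic prior: the only hypothesis is i.i.d. uniform over 4 symbols.
prior = {"uniform": Fraction(1)}
likelihoods = {"uniform": lambda obs: Fraction(1, 4)}

belief = prior
for obs in ["A", "A", "A", "A", "A"]:  # the monkey types 'A' forever
    belief = posterior(belief, likelihoods, obs)

print(belief)  # {'uniform': Fraction(1, 1)} -- no update, ever
```

With only one hypothesis, the likelihood cancels in the normalization, so the agent is structurally incapable of learning from the stream; the "uselessness" is baked into the prior, not into Bayes' rule.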
I'm not well versed in YEC arguments, but I believe people's frustration with them is not due to the lack of universally compelling arguments against them. Probably they're already guilty of plain old logical inconsistency (i.e. there's a valid chain of reasoning showing that if they doubt the scientific estimates, then they should turn off their television right now or something similar), or they possess some kind of "undefeatable" hypothesis as prior knowledge that allows everything to look billions of years old despite being very young. (If so, they should be very much bothered by having this type of utopic prior knowledge.)
Well, if they're logically inconsistent, but nothing you can say to them will convince to give up YECism in order to stop being logically inconsistent... then that particular chain of reasoning, at least, isn't universally compelling.
Or, if they have an undefeatable hypothesis, if that's literally true... doesn't that mean no argument is going to be compelling to them?
Maybe you're thinking "compelling" means what ought to be compelling, rather than what actually convinces people, when the latter meaning is how Eliezer and I are using it?
I am at a loss about the true meaning of a "universally compelling argument", but from Eliezer's original post and from references to things such as modus ponens itself, I understood it to mean something that is able to overcome even seemingly axiomatic differences between two (otherwise rational) agents. In this scenario, an agent may accept modus ponens, but if they do, they're at least required to use it consistently. For instance, a mathematician of the constructivist persuasion denies the law of the excluded middle, but if he's using it in a proof, classical mathematicians have the right to call him out.
Similarly, YECs are not inconsistent in their daily lives, nor do they have any undefeatable hypotheses about barbecues or music education: they're being inconsistent only on a select set of topics. At this point the brick wall we're hitting is not a fundamental difference in logic or priors; we're in the domain of human psychology.
Arguments that "actually convince (all) people" are very limited and context sensitive because we're not 100% rational.