Morality Isn't Logical

19 Wei_Dai 26 December 2012 11:08PM

What do I mean by "morality isn't logical"? I mean in the same sense that mathematics is logical but literary criticism isn't: the "reasoning" we use to think about morality doesn't resemble logical reasoning. All systems of logic that I'm aware of have a concept of proof and a method of verifying with a high degree of certainty whether an argument constitutes a proof. As long as the logic is consistent (and we have good reason to think that many of them are), once we verify a proof we can accept its conclusion without worrying that there may be another proof establishing the opposite conclusion. With morality, though, we have no such method, and people constantly make moral arguments that can be reversed or called into question by other moral arguments. (Edit: For an example of this, see these posts.)

Without being a system of logic, moral philosophical reasoning likely (or at least plausibly) doesn't have any of the nice properties that a well-constructed system of logic would have, for example, consistency, validity, soundness, or even the more basic property that considering arguments in a different order, or in a different mood, won't cause a person to accept an entirely different set of conclusions. For all we know, somebody trying to reason about a moral concept like "fairness" may just be taking a random walk as they move from one conclusion to another based on moral arguments they encounter or think up.

In a recent post, Eliezer said "morality is logic", by which he seems to mean... well, I'm still not exactly sure what, but one interpretation is that a person's cognition about morality can be described as an algorithm, and that algorithm can be studied using logical reasoning. (Which is of course true, but in that sense math and literary criticism, along with every other subject of human study, would also be logic.) In any case, I don't think Eliezer is explicitly claiming that an algorithm-for-thinking-about-morality constitutes an algorithm-for-doing-logic, but I worry that the characterization "morality is logic" may cause some connotations of "logic" to be inappropriately sneaked into "morality". For example, Eliezer seems to (at least at one point) assume that considering moral arguments in a different order won't cause a human to accept an entirely different set of conclusions, and maybe this sneaked connotation is why. To fight this potential sneaking of connotations, I suggest that when you see the phrase "morality is logic", you remind yourself that morality isn't logical.

 

Beware Selective Nihilism

39 Wei_Dai 20 December 2012 06:53PM

In a previous post, I argued that nihilism is often shortchanged around here. However, I'm far from certain that it is correct, and in the meantime I think we should be careful not to discard our values one at a time by engaging in "selective nihilism" when faced with an ontological crisis, without even realizing that's what's happening. Karl recently reminded me of the post Timeless Identity by Eliezer Yudkowsky, which I noticed seems to be an instance of this.

As I mentioned in the previous post, our values seem to be defined in terms of a world model where people exist as ontologically primitive entities ruled heuristically by (mostly intuitive understandings of) physics and psychology. In this kind of decision system, both identity-as-physical-continuity and identity-as-psychological-continuity make perfect sense as possible values, and it seems humans do "natively" have both values. A typical human being is both reluctant to step into a teleporter that works by destructive scanning, and unwilling to let their physical structure be continuously modified into a psychologically very different being. 

If faced with the knowledge that physical continuity doesn't exist in the real world at the level of fundamental physics, one might conclude that it's crazy to continue to value it, and this is what Eliezer's post argued. But if we apply this reasoning in a non-selective fashion, wouldn't we also conclude that we should stop valuing things like "pain" and "happiness" which also do not seem to exist at the level of fundamental physics?

In our current environment, there is widespread agreement among humans as to which macroscopic objects at time t+1 are physical continuations of which macroscopic objects existing at time t. We may not fully understand what exactly we're doing when judging such physical continuity; the agreement tends to break down when we start talking about more exotic situations; and if/when we do fully understand our criteria for judging physical continuity, they are unlikely to have a simple definition in terms of fundamental physics. But all of this is true for "pain" and "happiness" as well.

I suggest we keep all of our (potential/apparent) values intact until we have a better handle on how we're supposed to deal with ontological crises in general. If we convince ourselves that we should discard some value, and that turns out to be wrong, the error may be unrecoverable once we've lived with it long enough.

Ontological Crisis in Humans

41 Wei_Dai 18 December 2012 05:32PM

Imagine a robot that was designed to find and collect spare change around its owner's house. It had a world model where macroscopic everyday objects are ontologically primitive and ruled by high-school-like physics and (for humans and their pets) rudimentary psychology and animal behavior. Its goals were expressed as a utility function over this world model, which was sufficient for its designed purpose. All went well until one day, a prankster decided to "upgrade" the robot's world model to be based on modern particle physics. This unfortunately caused the robot's utility function to instantly throw a domain error exception (since its inputs are no longer the expected list of macroscopic objects and associated properties like shape and color), thus crashing the controlling AI.
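
To make the failure concrete, here is a minimal Python sketch of the robot's predicament (the class and function names are hypothetical illustrations, not anyone's actual design): the utility function is defined only over lists of macroscopic objects, so a world state expressed as particles falls outside its domain.

```python
# Minimal sketch: a utility function whose domain is the old ontology.

class MacroObject:
    """An ontologically primitive everyday object in the old world model."""
    def __init__(self, kind, shape, color):
        self.kind = kind      # e.g. "coin", "couch", "cat"
        self.shape = shape
        self.color = color

def utility(world):
    """Count coins; defined only over lists of MacroObjects."""
    if not all(isinstance(obj, MacroObject) for obj in world):
        # No clause handles particles: the upgraded ontology is simply
        # outside the function's domain.
        raise ValueError("domain error: expected macroscopic objects")
    return sum(1 for obj in world if obj.kind == "coin")

old_world = [MacroObject("coin", "disc", "silver"),
             MacroObject("couch", "box", "red")]
print(utility(old_world))   # 1 -- works as designed

new_world = [("electron", (0.1, 0.2, 0.3)), ("photon", (0.4, 0.5, 0.6))]
print(utility(new_world))   # raises ValueError -- the controlling AI "crashes"
```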

According to Peter de Blanc, who used the phrase "ontological crisis" to describe this kind of problem,

Human beings also confront ontological crises. We should find out what cognitive algorithms humans use to solve the same problems described in this paper. If we wish to build agents that maximize human values, this may be aided by knowing how humans re-interpret their values in new ontologies.

I recently realized that a couple of problems that I've been thinking over (the nature of selfishness and the nature of pain/pleasure/suffering/happiness) can be considered instances of ontological crises in humans (although I'm not so sure we necessarily have the cognitive algorithms to solve them). I started thinking in this direction after writing this comment:

This formulation or variant of TDT requires that before a decision problem is handed to it, the world is divided into the agent itself (X), other agents (Y), and "dumb matter" (G). I think this is misguided, since the world doesn't really divide cleanly into these 3 parts.

What struck me is that even though the world doesn't divide cleanly into these 3 parts, our models of the world actually do. In the world models that we humans use on a day to day basis, and over which our utility functions seem to be defined (to the extent that we can be said to have utility functions at all), we do take the Self, Other People, and various Dumb Matter to be ontologically primitive entities. Our world models, like the coin collecting robot's, consist of these macroscopic objects ruled by a hodgepodge of heuristics and prediction algorithms, rather than microscopic particles governed by a coherent set of laws of physics.

For example, the amount of pain someone is experiencing doesn't seem to exist in the real world as an XML tag attached to some "person entity", but that's pretty much how our models of the world work, and perhaps more importantly, that's what our utility functions expect their inputs to look like (as opposed to, say, a list of particles and their positions and velocities). Similarly, a human can be selfish just by treating the object labeled "SELF" in its world model differently from other objects, whereas an AI with a world model consisting of microscopic particles would need to somehow inherit or learn a detailed description of itself in order to be selfish.
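
To illustrate with a toy sketch (the entities and attributes below are invented for this example, not a claim about actual human cognition): in a world model made of tagged macroscopic entities, pain is a ready-made attribute of each person entity, and selfishness is just special treatment of the object labeled "SELF".

```python
# Toy world model: tagged macroscopic entities, not particle configurations.

world_model = {
    "SELF":  {"type": "person", "pain": 0.5, "wealth": 500},
    "alice": {"type": "person", "pain": 0.25, "wealth": 300},
    "rock1": {"type": "dumb_matter", "position": (3, 4)},
}

def selfish_utility(model):
    # Selfishness is trivial here: read the entity labeled "SELF".
    # A model made of microscopic particles would first need to solve
    # the hard problem of locating "itself" in the particle soup.
    me = model["SELF"]
    return me["wealth"] - 100 * me["pain"]

def sympathetic_utility(model):
    # "Pain" is available directly as an attribute of each person entity;
    # nothing has to be decoded from patterns of particle movements.
    people = [e for e in model.values() if e["type"] == "person"]
    return -sum(p["pain"] for p in people)

print(selfish_utility(world_model))      # 450.0
print(sympathetic_utility(world_model))  # -0.75
```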

To fully confront the ontological crisis that we face, we would have to upgrade our world model to be based on actual physics, and simultaneously translate our utility functions so that their domain is the set of possible states of the new model. We currently have little idea how to accomplish this, and instead what we do in practice is, as far as I can tell, keep our ontologies intact and utility functions unchanged, but just add some new heuristics that in certain limited circumstances call out to new physics formulas to better update/extrapolate our models. This is actually rather clever, because it lets us make use of updated understandings of physics without ever having to, for instance, decide exactly what patterns of particle movements constitute pain or pleasure, or what patterns constitute oneself. Nevertheless, this approach hardly seems capable of being extended to work in a future where many people may have nontraditional mind architectures, or have a zillion copies of themselves running on all kinds of strange substrates, or be merged into amorphous group minds with no clear boundaries between individuals.

By the way, I think nihilism often gets shortchanged around here. Given that we do not actually have at hand a solution to ontological crises in general or to the specific crisis that we face, what's wrong with saying that the solution set may just be null? Given that evolution doesn't constitute a particularly benevolent and farsighted designer, perhaps we may not be able to do much better than that poor spare-change-collecting robot? If Eliezer is worried that actual AIs facing actual ontological crises could do worse than just crash, should we be very sanguine that for humans everything must "add up to moral normality"?

To expand a bit more on this possibility: many people have an aversion to moral arbitrariness, so at a minimum we need a utility-translation scheme that's principled enough to pass that filter. But our existing world models are a hodgepodge put together by evolution, so there may not be any sufficiently principled scheme, which (if other approaches to solving moral philosophy also don't pan out) would leave us with legitimate feelings of "existential angst" and nihilism. One could perhaps still argue that any such feelings are premature at this point, but maybe some people have stronger intuitions than others that these problems are unsolvable?

Do we have any examples of humans successfully navigating an ontological crisis? The LessWrong Wiki mentions loss of faith in God:

In the human context, a clear example of an ontological crisis is a believer's loss of faith in God. Their motivations and goals, coming from a very specific view of life, suddenly become obsolete and maybe even nonsensical in the face of this new configuration. The person will then experience a deep crisis and go through the psychological task of reconstructing their set of preferences according to the new world view.

But I don't think loss of faith in God actually constitutes an ontological crisis, or if it does, certainly not a very severe one. An ontology consisting of Gods, Self, Other People, and Dumb Matter just isn't very different from one consisting of Self, Other People, and Dumb Matter (the latter could just be considered a special case of the former with the number of Gods set to 0), especially when you compare either ontology to one made of microscopic particles or even less familiar entities.

But to end on a more positive note, realizing that seemingly unrelated problems are actually instances of a more general problem gives some hope that by "going meta" we can find a solution to all of these problems at once. Maybe we can solve many ethical problems simultaneously by discovering some generic algorithm that can be used by an agent to transition from any ontology to another? 

(Note that I'm not saying this is the right way to understand one's real preferences/morality, but just drawing attention to it as a possible alternative to other more "object level" or "purely philosophical" approaches. See also this previous discussion, which I recalled after writing most of the above.)

Reasons for someone to "ignore" you

23 Wei_Dai 08 October 2012 07:50PM

I often feel guilty for ignoring other people's comments or questions, and frustrated when other people seem to be ignoring me. If I can't indicate to someone exactly why I'm not answering, or can't receive such an indication myself, I can at least help my future selves and others obtain a better probability distribution over such reasons. To that end, I'm listing all of the reasons I can think of for someone to not respond to a comment/question, to save the effort of regenerating these hypotheses from scratch each time and prevent the possibility of failing to consider the actual reason. Note that these are not meant to be mutually exclusive.

  • They haven't checked their inbox yet.
  • They got too many responses in their inbox and didn't pay enough attention to yours.
  • They are temporarily too busy to respond.
  • They were planning to respond but then forgot to.
  • They don't understand the comment yet and are still trying.
  • They've stopped trying to understand the comment and don't expect further discussion to resolve the confusion.
  • They think it's obvious that they agree.
  • They think it's obvious that they disagree.
  • They disagree and are planning to write up the reasons later.
  • They don't know whether to agree or disagree and are still thinking about it.
  • They think all useful information has been exchanged and it's not worth another comment just to indicate final agreement/disagreement.
  • They think you just want to express your opinion and don't care what they think.
  • They are tired of the discussion and don't want to think about it any more.
  • The comment shows a level of intelligence and/or rationality and/or knowledge that makes it not worthwhile for them to engage you.
  • They already addressed your question or point before but you missed it or didn't get it.
  • They don't know how to answer your question and are too embarrassed to admit it.
  • They interpreted your question as being addressed to the public rather than to them personally.
  • They think most people already know the answer (or don't care to know) and don't want to bother answering just for you or a few other people.
  • They think you are mainly signaling/status-seeking instead of truth-seeking.
  • They are mainly signaling/status-seeking (perhaps subconsciously) and think not responding is optimal for that.
  • They can't see how to respond honestly without causing or prolonging a personal enmity.
  • They consider you a troll or potential troll and don't want to reinforce you with attention.
  • They have an emotional aversion against talking to you.
  • They have some other instrumental reason for not responding.
  • Suggested by shminux: You're on a list of LWers they never reply to, because a number of prior conversations with you were invariably futile for one or more of the reasons described above, and their estimate of any future conversation going any better is very low.
  • Suggested by wedrifid: Technical difficulties. They first read your comment via a mobile device, composed (mentally) a reply that would take too long to type on that medium and two days later they either forget to type it out via keyboard, no longer care about the subject or think that a late reply would be inappropriate given developments in the conversation.
  • Suggested by wedrifid: Previous comments by them in the thread had been downvoted or otherwise opposed and they choose to accede to the implied wishes of the community rather than try to fight it or defy it.
  • Suggested by cata: Not answering promptly caused them to feel guilty, which caused more delay and more guilt, so they never respond to hide their shame.
  • Suggested by wedrifid: They think your comment missed the point of the context and so doesn't make sense but it is not important enough to embarrass you by explaining or challenging.
  • Suggested by Morendil: Your post/comment didn't contain a single question mark, so there's no call to answer.
  • Suggested by sixes_and_sevens: They think the discussion is going off topic.
  • Suggested by Airedale: They're purposefully trying to disengage early rather than getting into a fight about who has the "last word" on the subject, e.g., on some level they may want to respond or even to "win" the exchange, but they're purposefully telling themselves to step away from the computer.

If I missed any reasons (that happen often enough to be worth including in this list), please give them in the comments. See also this related comment.

"Hide comments in downvoted threads" is now active

18 Wei_Dai 05 October 2012 07:23AM

I just found out that a new website feature was implemented 2 days ago. If a comment is voted to -4 or below, it and all replies and downstream comments from it will be hidden from Recent Comments, and further replies in that subthread will incur a 5-point karma penalty. The hiding, but not the karma penalty, applies retroactively to comments in that subthread posted before the -4 vote.

This seems to be worth a discussion post since most people are probably still voting things to below -3 without knowing the new consequences of doing so.

Under-acknowledged Value Differences

47 Wei_Dai 12 September 2012 10:02PM

I've been reading a lot of the recent LW discussions on politics and gender, and noticed that people rarely bring up or explicitly acknowledge that different people affected by some political or gender issue have different values/preferences, and therefore solving the problem involves a strong element of bargaining and is not just a matter of straightforward optimization. Instead, we tend to talk as if there is some way to solve the problem that's best for everyone, and that rational discussion will bring us closer to finding that one best solution.

For example, when discussing gender-related problems, one solution may be generally better for men, while another solution may be generally better for women. If people are selfish, then they will each prefer the solution that's individually best for them, even if they can agree on all of the facts. (It's unclear whether people should be selfish, but it seems best to assume that most are, for practical purposes.)

Unfortunately, in bargaining situations, epistemic rationality is not necessarily instrumentally rational. In general, convincing others of a falsehood can be useful for moving the negotiated outcome closer to one's own preferences and away from others', and this may be done more easily if one honestly believes the falsehood. (One such falsehood may be, for example, "My preferred solution is best for everyone.") Given these (subconsciously or evolutionarily processed) incentives, it seems reasonable to think that the more solving a problem resembles bargaining, the more likely we are to be epistemically irrational when thinking and talking about it.

If we do not acknowledge and keep in mind that we are in a bargaining situation, then we are less likely to detect such failures of epistemic rationality, especially in ourselves. We're also less likely to see that there's an element of Prisoner's Dilemma in participating in such debates: your effort to convince people to adopt your preferred solution is costly (in time and in your and LW's overall sanity level) but may achieve little because someone else is making an opposite argument. Both of you may be better off if neither engaged in the debate.

Kelly Criteria and Two Envelopes

6 Wei_Dai 16 August 2012 09:57PM

(This post is motivated by recent discussions here of the two titular topics.)

Suppose someone hands you two envelopes and gives you some information that allows you to conclude either:

  1. The expected ratio of amount of money in the red envelope to the amount in the blue is >1, or
  2. With probability close to 1 (say 0.999) the amount of money in the red envelope is greater than the amount in the blue.

In either case, is the conclusion sufficient to imply that one should choose the red envelope over the blue? Obviously not, right? (Well, at least #2 should be obvious, and #1 was recently pointed out by VincentYu.) In any case, I'll also give some simple counter-examples here:
  1. Suppose the red envelope has $5 and the blue envelope holds an even chance of $1 or $100. Then E(R/B) = 0.5(5/1) + 0.5(5/100) = 2.525, but one would want to choose the blue envelope, assuming utility linear in money.
  2. Red envelope has $100, blue envelope has $99 with probability 0.999 and $1 million with probability 0.001. 

Notice that it's not sufficient to establish both conclusions at once either (my second example above actually satisfies both).
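
For readers who want to check the arithmetic, here is a short Python sketch of both counter-examples (assuming, as above, utility linear in money):

```python
# Counter-example 1: red = $5; blue = $1 or $100 with equal probability.
E_ratio_1 = 0.5 * (5 / 1) + 0.5 * (5 / 100)   # E(R/B) = 2.525 > 1
E_blue_1 = 0.5 * 1 + 0.5 * 100                # E(B) = 50.5 > E(R) = 5

# Counter-example 2: red = $100; blue = $99 w.p. 0.999, $1M w.p. 0.001.
# This one satisfies both conclusions at once:
p_red_greater = 0.999                         # conclusion 2: P(R > B) ~ 1
E_ratio_2 = 0.999 * (100 / 99) + 0.001 * (100 / 10**6)  # conclusion 1: ~1.009 > 1
E_blue_2 = 0.999 * 99 + 0.001 * 10**6         # yet E(B) ~ 1098.9 > E(R) = 100

print(E_ratio_1, E_blue_1)  # 2.525 50.5
print(E_ratio_2, E_blue_2)  # ~1.009 ~1098.9
```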

A common argument for the Kelly criterion being "optimal" (see page 10 of this review paper recommended by Robin Hanson) is to mathematically establish conclusions 1 and 2, with the Kelly criterion in place of the red envelope and "any other strategy" in place of the blue envelope. However, it turns out that "optimal" is not supposed to be normative, as the paper later explains:

In essence the critique is that you should maximize your utility function rather than to base your investment decision on some other criterion. This is certainly correct, but fails to appreciate that Kelly's results are not necessarily normative but rather descriptive.

So the upshot here is that unless your utility function is actually logarithmic in money, rather than, say, linear (or even superlinear) in the amount of resources under your control, you may not want to adopt the Kelly criterion even when the other commonly mentioned assumptions are satisfied.
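
A short sketch makes the dependence on the utility function visible (the bet parameters are illustrative): for an even-money bet won with probability p, expected wealth per round grows by the factor 1 + f(2p - 1), which increases with the betting fraction f, while expected log-wealth growth, the thing the Kelly criterion maximizes, peaks at f* = 2p - 1.

```python
import math

p = 0.6  # win probability of an even-money bet (illustrative)

def expected_wealth_growth(f):
    # Per-round multiplier of *expected* wealth: p(1+f) + (1-p)(1-f).
    return 1 + f * (2 * p - 1)

def expected_log_growth(f):
    # Per-round growth of expected *log*-wealth (what Kelly maximizes).
    if f >= 1:
        return -math.inf  # betting everything: one loss means ruin
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

for f in (0.0, 0.2, 0.5, 0.99):
    print(f"f={f:.2f}  E[wealth] growth={expected_wealth_growth(f):.4f}  "
          f"E[log wealth] growth={expected_log_growth(f):+.4f}")
```

Expected wealth keeps growing as f approaches 1, while expected log-wealth peaks at the Kelly fraction f* = 0.2 and turns negative well before f = 1; which of the two an agent should maximize is exactly the question of what its utility function is.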

Cynical explanations of FAI critics (including myself)

21 Wei_Dai 13 August 2012 09:19PM

Related Posts: A cynical explanation for why rationalists worry about FAI, A belief propagation graph

Lately I've been pondering the fact that while there are many critics of SIAI and its plan to form a team to build FAI, few of us seem to agree on what SIAI or we should do instead. Here are some of the alternative suggestions offered so far:

  • work on computer security
  • work to improve laws and institutions
  • work on mind uploading
  • work on intelligence amplification
  • work on non-autonomous AI (e.g., Oracle AI, "Tool AI", automated formal reasoning systems, etc.)
  • work on academically "mainstream" AGI approaches or trust that those researchers know what they are doing
  • stop worrying about the Singularity and work on more mundane goals

Given that ideal reasoners are not supposed to disagree, it seems likely that most if not all of these alternative suggestions can also be explained by their proponents being less than rational. Looking at myself and my suggestion to work on IA or uploading, I've noticed that I have a tendency to be initially over-optimistic about some technology and then become gradually more pessimistic as I learn more details about it, so that I end up being more optimistic about technologies that I'm less familiar with than the ones that I've studied in detail. (Another example of this is me being initially enamoured with Cypherpunk ideas and then giving up on them after inventing some key pieces of the necessary technology and seeing in more detail how it would actually have to work.)

I'll skip giving explanations for other critics to avoid offending them, but it shouldn't be too hard for the reader to come up with their own explanations. It seems that I can't trust any of the FAI critics, including myself, nor do I think Eliezer and company are much better at reasoning or intuiting their way to a correct conclusion about how we should face the apparent threat and opportunity that is the Singularity. What useful implications can I draw from this? I don't know, but it seems like it can't hurt to pose the question to LessWrong.

 

Work on Security Instead of Friendliness?

29 Wei_Dai 21 July 2012 06:28PM

So I submit the only useful questions we can ask are not about AGI, "goals", and other such anthropomorphic, infeasible, irrelevant, and/or hopelessly vague ideas. We can only usefully ask computer security questions. For example some researchers I know believe we can achieve virus-safe computing. If we can achieve security against malware as strong as we can achieve for symmetric key cryptography, then it doesn't matter how smart the software is or what goals it has: if one-way functions exist no computational entity, classical or quantum, can crack symmetric key crypto based on said functions. And if NP-hard public key crypto exists, similarly for public key crypto. These and other security issues, and in particular the security of property rights, are the only real issues here and the rest is BS.

-- Nick Szabo

Nick Szabo and I have very similar backgrounds and interests. We both majored in computer science at the University of Washington. We're both very interested in economics and security. We came up with similar ideas about digital money. So why don't I advocate working on security problems while ignoring AGI, goals, and Friendliness?

In fact, I once did think that working on security was the best way to push the future towards a positive Singularity and away from a negative one. I started working on my Crypto++ Library shortly after reading Vernor Vinge's A Fire Upon the Deep. I believe it was the first general purpose open source cryptography library, and it's still one of the most popular. (Studying cryptography led me to become involved in the Cypherpunks community with its emphasis on privacy and freedom from government intrusion, but a major reason for me to become interested in cryptography in the first place was a desire to help increase security against future entities similar to the Blight described in Vinge's novel.)

I've since changed my mind, for two reasons.

1. The economics of security seems very unfavorable to the defense, in every field except cryptography.

Studying cryptography gave me hope that improving security could make a difference. But in every other security field, both physical and virtual, little progress is apparent, certainly not enough that humans might hope to defend their property rights against smarter intelligences. Achieving "security against malware as strong as we can achieve for symmetric key cryptography" seems quite hopeless in particular. Nick links above to a 2004 technical report titled "Polaris: Virus Safe Computing for Windows XP", which is strange considering that it's now 2012 and malware has little trouble with the latest operating systems and their defenses. Also striking to me has been the fact that even dedicated security software like OpenSSH and OpenSSL has had design and coding flaws that introduced security holes to the systems running it.

One way to think about Friendly AI is that it's an offensive approach to the problem of security (i.e., take over the world), instead of a defensive one.

2. Solving the problem of security at a sufficient level of generality requires understanding goals, and is essentially equivalent to solving Friendliness.

What does it mean to have "secure property rights", anyway? If I build an impregnable fortress around me, but an Unfriendly AI causes me to give up my goals in favor of its own by crafting a philosophical argument that is extremely convincing to me but wrong (or more generally, subverts my motivational system in some way), have I retained my "property rights"? What if it does the same to one of my robot servants, so that it subtly starts serving the UFAI's interests while thinking it's still serving mine? How does one define whether a human or an AI has been "subverted" or is "secure", without reference to its "goals"? It became apparent to me that fully solving security is not very different from solving Friendliness.

I would be very interested to know what Nick (and others taking a similar position) thinks after reading the above, or if they've already had similar thoughts but still came to their current conclusions.

Open Problems Related to Solomonoff Induction

27 Wei_Dai 06 June 2012 12:26AM

Solomonoff Induction seems clearly "on the right track", but there are a number of problems with it that I've been puzzling over for several years and have not made much progress on. I think I've talked about all of them in various comments in the past, but never collected them in one place.

Apparent Unformalizability of “Actual” Induction

Argument via Tarski’s Indefinability of Truth

Informally, the theorem states that arithmetical truth cannot be defined in arithmetic. The theorem applies more generally to any sufficiently strong formal system, showing that truth in the standard model of the system cannot be defined within the system.

Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can't be defined within the logic, and therefore a device that can decide the truth value of arbitrary statements in this logic has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign zero probability to the hypothesis that they're telling the truth, whereas we would assign it some small but positive probability.

Argument via Berry’s Paradox

Consider an arbitrary probability distribution P, and the smallest integer (or the lexicographically least object) x such that P(x) < 1/3^^^3 (in Knuth's up-arrow notation). Since x has a short description, a universal distribution shouldn't assign it such a low probability, but P does, so P can't be a universal distribution.
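
Spelling the argument out a bit (a sketch, with constants suppressed; the closing remark about m is a standard observation, not part of the argument above):

```latex
% Sketch of the Berry-style argument, constants suppressed.
% Let x_P be the least object to which P assigns probability below 1/3^^^3:
\[
x_P \;=\; \min \{\, x \,:\, P(x) < 1/(3\uparrow\uparrow\uparrow 3) \,\}.
\]
% Given a description of P, the line above is itself a short description
% of x_P, so its complexity is small:
\[
K(x_P) \;\le\; K(P) + O(1),
\]
% while a universal distribution m must assign simply-describable objects
% non-negligible probability:
\[
m(x_P) \;\ge\; 2^{-K(x_P) - O(1)} \;\gg\; 1/(3\uparrow\uparrow\uparrow 3).
\]
% So P(x_P) is tiny but any universal m(x_P) is not; hence P is not universal.
% Taking P = m would seem to be an outright contradiction; the usual escape
% is that m is only lower-semicomputable, so "the least x with m(x) < ..."
% is not a valid effective description -- the Berry phenomenon itself.
```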

Is Solomonoff Induction “good enough”?

Given the above, is Solomonoff Induction nevertheless “good enough” for practical purposes? In other words, would an AI programmed to approximate Solomonoff Induction do as well as any other possible agent we might build, even though it wouldn’t have what we’d consider correct beliefs?

Is complexity objective?

Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity. What’s the significance of the fact that we can’t seem to define a parameterless concept of complexity? That complexity is subjective?
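
For reference, the standard partial answer is the invariance theorem, sketched below: any two universal machines assign complexities that differ by at most an additive constant, which makes complexity objective only "up to a constant".

```latex
% Invariance theorem (standard). For any two universal Turing machines
% U and V there is a constant c_{UV}, independent of x, such that
\[
\bigl| K_U(x) - K_V(x) \bigr| \;\le\; c_{UV} \qquad \text{for all } x.
\]
% The machine choice shifts complexity only by a constant, but c_{UV} can
% be made arbitrarily large by choosing a contrived reference machine,
% which is why the free parameter never fully disappears.
```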

Is Solomonoff an ideal or an approximation?

Is it the case that the universal prior (or some suitable generalization of it that somehow overcomes the above "unformalizability problems") is the “true” prior and that Solomonoff Induction represents idealized reasoning, or does Solomonoff just “work well enough” (in some sense) at approximating any rational agent?

How can we apply Solomonoff when our inputs are not symbol strings?

Solomonoff Induction is defined over symbol strings (for example bit strings) but our perceptions are made of “qualia” instead of symbols. How is Solomonoff Induction supposed to work for us?

What does Solomonoff Induction actually say?

What does Solomonoff Induction actually say about, for example, whether we live in a creatorless universe that runs on physics? Or the Simulation Argument?
