## No Anthropic Evidence

9 23 September 2012 10:33AM

Closely related to: How Many LHC Failures Is Too Many?

Consider the following thought experiment. At the start, an "original" coin is tossed, but not shown. If it was "tails", a gun is loaded, otherwise it's not. After that, you are offered a big number of rounds of decision, where in each one you can either quit the game, or toss a coin of your own. If your coin falls "tails", the gun gets triggered, and depending on how the original coin fell (whether the gun was loaded), you either get shot or not (if the gun doesn't fire, i.e. if the original coin was "heads", you are free to go). If your coin is "heads", you are all right for the round. If you quit the game, you will get shot at the exit with probability 75% independently of what was happening during the game (and of the original coin). The question is, should you keep playing or quit if you observe, say, 1000 "heads" in a row?

Intuitively, it seems as if 1000 "heads" is "anthropic evidence" for the original coin being "tails", that the long sequence of "heads" can only be explained by the fact that "tails" would have killed you. If you know that the original coin was "tails", then to keep playing is to face the certainty of eventually tossing "tails" and getting shot, which is worse than quitting, with only 75% chance of death. Thus, it seems preferable to quit.

On the other hand, each "heads" you observe doesn't distinguish the hypothetical where the original coin was "heads" from one where it was "tails". The first round can be modeled by a 4-element finite probability space consisting of options {HH, HT, TH, TT}, where HH and HT correspond to the original coin being "heads" and HH and TH to the coin-for-the-round being "heads". Observing "heads" is the event {HH, TH} that has the same 50% posterior probabilities for "heads" and "tails" of the original coin. Thus, each round that ends in "heads" doesn't change the knowledge about the original coin, even if there were 1000 rounds of this type. And since you only get shot if the original coin was "tails", you only get to 50% probability of dying as the game continues, which is better than the 75% from quitting the game.

(See also the comments by simon2 and Benja Fallenstein on the LHC post, and this thought experiment by Benja Fallenstein.)

The result of this exercise could be generalized by saying that counterfactual possibility of dying doesn't in itself influence the conclusions that can be drawn from observations that happened within the hypotheticals where one didn't die. Only if the possibility of dying influences the probability of observations that did take place, would it be possible to detect that possibility. For example, if in the above exercise, a loaded gun would cause the coin to become biased in a known way, only then would it be possible to detect the state of the gun (1000 "heads" would imply either that the gun is likely loaded, or that it's likely not).

## A Mathematical Explanation of Why Charity Donations Shouldn't Be Diversified

2 20 September 2012 11:03AM

There is a standard argument against diversification of donations, popularly explained by Steven Landsburg in the essay Giving Your All. This post is an attempt to communicate a narrow special case of that argument in a form that resists misinterpretation better, for the benefit of people with a bit of mathematical training. Understanding this special case in detail might be useful as a stepping stone to the understanding of the more general argument. (If you already agree that one should donate only to the charity that provides the greatest marginal value, and that it makes sense to talk about the comparison of marginal value of different charities, there is probably no point in reading this post.)1

Suppose you are considering two charities, one that accomplishes the saving of antelopes, and the other the saving of babies. Depending on how much funding these charities secure, they are able to save respectively A antelopes and B babies, so the outcome can be described by a point (A,B) that specifies both pieces of data.

Let's say you have a complete transitive preference over possible values of (A,B), that is you can make a comparison between any two points, and if you prefer (A1,B1) over (A2,B2) and also (A2,B2) over (A3,B3), then you prefer (A1,B1) over (A3,B3). Let's further suppose that this preference can be represented by a sufficiently smooth real-valued function U(A,B), such that U(A1,B1)>U(A2,B2) precisely when you prefer (A1,B1) to (A2,B2). U doesn't need to be a utility function in the standard sense, since we won't be considering uncertainty, it only needs to represent ordering over individual points, so let's call it "preference level".

Let A(Ma) be the dependence of the number of antelopes saved by the Antelopes charity if it attains the level of funding Ma, and B(Mb) the corresponding function for the Babies charity. (For simplicity, let's work with U, A, B, Ma and Mb as variables that depend on each other in specified ways.)

You are considering a decision to donate, and at the moment the charities have already secured Ma and Mb amounts of money, sufficient to save A antelopes and B babies, which would result in your preference level U. You have a relatively small amount of money dM that you want to distribute between these charities. dM is such that it's small compared to Ma and Mb, and if donated to either charity, it will result in changes of A and B that are small compared to A and B, and in a change of U that is small compared to U.

## Consequentialist Formal Systems

12 08 May 2012 08:38PM

This post describes a different (less agent-centric) way of looking at UDT-like decision theories that resolves some aspects of the long-standing technical problem of spurious moral arguments. It's only a half-baked idea, so there are currently a lot of loose ends.

### On spurious arguments

UDT agents are usually considered as having a disinterested inference system (a "mathematical intuition module" in UDT and first order proof search in ADT) that plays a purely epistemic role, and preference-dependent decision rules that look for statements that characterize possible actions in terms of the utility value that the agent optimizes.

The statements (supplied by the inference system) used by agent's decision rules (to pick one of the many variants) have the form [(A=A1 => U=U1) and U<=U1]. Here, A is a symbol defined to be the actual action chosen by the agent, U is a similar symbol defined to be the actual value of world's utility, and A1 and U1 are some particular possible action and possible utility value. If the agent finds that this statement is provable, it performs action A1, thereby making A1 the actual action.

The use of this statement introduces the problem of spurious arguments: if A1 is a bad action, but for some reason it's still chosen, then [(A=A1 => U=U1) and U<=U1] is true, since utility value U will in that case be in fact U1, which justifies (by the decision rule) choosing the bad action A1. In usual cases, this problem results in the difficulty of proving that an agent will behave in the expected manner (i.e. won't choose a bad action), which is resolved by adding various compilicated clauses to its decision algorithm. But even worse, it turns out that if an agent is hapless enough to take seriously a (formally correct) proof of such a statement supplied by an enemy (or if its own inference system is malicious), it can be persuaded to take any action at all, irrespective of agent's own preferences.

## Predictability of Decisions and the Diagonal Method

14 09 March 2012 11:53PM

This post collects a few situations where agents might want to make their decisions either predictable or unpredictable to certain methods of prediction, and considers a method of making a decision unpredictable by "diagonalizing" a hypothetical prediction of that decision. The last section takes a stab at applying this tool to the ASP problem.

### The diagonal step

To start off, consider the halting problem, interpreted in terms of agents and predictors. Suppose that there is a Universal Predictor, an algorithm that is able to decide whether any given program halts or runs forever. Then, it's easy for a program (agent) to evade its gaze by including a diagonal step in its decision procedure: the agent checks (by simulation) if Universal Predictor comes to some decision about the agent, and if it does, the agent acts contrary to the Predictor's decision. This makes the prediction wrong, and Universal Predictors impossible.

The same trick could be performed against something that could exist, normal non-universal Predictors, which allows an agent to make itself immune to their predictions. In particular, ability of other agents to infer decisions of our agent may be thought of as prediction that an agent might want to hinder. This is possible so long as the predictors in question can be simulated in enough detail, that is it's known what they do (what they know) and our agent has enough computational resources to anticipate their hypothetical conclusions. (If an agent does perform the diagonal step with respect to other agents, the predictions of other agents don't necessarily become wrong, as they could be formally correct by construction, but they cease to be possible, which could mean that the predictions won't be made at all.)

## Shifting Load to Explicit Reasoning

13 07 May 2011 06:00PM

What's damaging about moralizing that we wish to avoid, what useful purpose does moralizing usually serve, and what allows to avoid the damage while retaining the usefulness? It engages psychological adaptations that promote conflict (by playing on social status), which are unpleasant to experience and can lead to undesirable consequences in the long run (such as feeling systematically uncomfortable interacting with a person, and so not being able to live or work or be friends with them). It serves the purpose of imprinting your values, which you feel to be right, on the people you interact with. Consequentialist elucidation of reasons for approving or disapproving of a given policy (virtue) is an effective persuasion technique if your values are actually right (for the people you try to confer them on), and it doesn't engage the same parts of your brain that make moralizing undesirable.

What happens here is transfer of responsibility for important tasks from the imperfect machinery that historically used to manage them (with systematic problems in any given context that humans but not evolution can notice), to explicit reasoning.

## Karma Bubble Fix (Greasemonkey script)

23 07 May 2011 01:14PM

I wrote a greasemonkey script that fixes the problem with Karma bubble that prevents you from seeing the last digits of Karma for big Karma values. You can install it from userscripts site. You'll need a greasemonkey extension (install for Firefox, install for Google Chrome).

## Counterfactual Calculation and Observational Knowledge

11 31 January 2011 04:28PM

Consider the following thought experiment ("Counterfactual Calculation"):

You are taking a test, which includes a question: "Is Q an even number?", where Q is a complicated formula that resolves to some natural number. There is no a priori reason for you to expect that Q is more likely even or odd, and the formula is too complicated to compute the number (or its parity) on your own. Fortunately, you have an old calculator, which you can use to type in the formula and observe the parity of the result on display. This calculator is not very reliable, and is only correct 99% of the time, furthermore its errors are stochastic (or even involve quantum randomness), so for any given problem statement, it's probably correct but has a chance of making an error. You type in the formula and observe the result (it's "even"). You're now 99% sure that the answer is "even", so naturally you write that down on the test sheet.

Then, unsurprisingly, Omega (a trustworthy all-powerful device) appears and presents you with the following decision. Consider the counterfactual where the calculator displayed "odd" instead of "even", after you've just typed in the (same) formula Q, on the same occasion (i.e. all possible worlds that fit this description). The counterfactual diverges only in the calculator showing a different result (and what follows). You are to determine what is to be written (by Omega, at your command) as the final answer to the same question on the test sheet in that counterfactual (the actions of your counterfactual self who takes the test in the counterfactual are ignored).

## Note on Terminology: "Rationality", not "Rationalism"

28 14 January 2011 09:21PM

I feel that the term "rationalism", as opposed to "rationality", or "study of rationality", has undesirable connotations. My concerns are presented well by Eric Drexler in the article For Darwin’s sake, reject "Darwin-ism" (and other pernicious terms):

To call something an “ism” suggests that it is a matter ideology or faith, like Trotskyism or creationism. In the evolution wars, the term “evolutionism” is used to insinuate that the modern understanding of the principles, mechanisms, and pervasive consequences of evolution is no more than the dogma of a sect within science. It creates a false equivalence between a mountain of knowledge and the emptiness called “creationism”.

So, my suggestion is to use "rationality" consistently and to avoid using "rationalism". Via similarity to "scientist" and "physicist", "rationalist" doesn't seem to have the same problem. Discuss.

(Typical usage on Less Wrong is this way already, 3720 Google results for "rationality" and 1210 for "rationalist", against 251 for "rationalism". I've made this post as a reference for when someone uses "rationalism".)

## Unpacking the Concept of "Blackmail"

25 10 December 2010 12:53AM

There is a reasonable game-theoretic heuristic, "don't respond to blackmail" or "don't negotiate with terrorists". But what is actually meant by the word "blackmail" here? Does it have a place as a fundamental decision-theoretic concept, or is it merely an affective category, a class of situations activating a certain psychological adaptation that expresses disapproval of certain decisions and on the net protects (benefits) you, like those adaptation that respond to "being rude" or "offense"?

We, as humans, have a concept of "default", "do nothing strategy". The other plans can be compared to the moral value of the default. Doing harm would be something worse than the default, doing good something better than the default.

Blackmail is then a situation where by decision of another agent ("blackmailer"), you are presented with two options, both of which are harmful to you (worse than the default), and one of which is better for the blackmailer. The alternative (if the blackmailer decides not to blackmail) is the default.

Compare this with the same scenario, but with the "default" action of the other agent being worse for you than the given options. This would be called normal bargaining, as in trade, where both parties benefit from exchange of goods, but to a different extent depending on which cost is set.

Why is the "default" special here?

## Agents of No Moral Value: Constrained Cognition?

6 21 November 2010 04:41PM

Thought experiments involving multiple agents usually postulate that the agents have no moral value, so that the explicitly specified payoff from the choice of actions can be considered in isolation, as both the sole reason and evaluation criterion for agents' decisions. But is that really possible to require from an opposing agent to have no moral value, without constraining what it's allowed to think about?

If agent B is not a person, how do we know it can't decide to become a person for the sole reason of gaming the problem, manipulating agent A (since B doesn't care about personhood, so it costs B nothing, but A does)? If it's stipulated as part of the problem statement, it seems that B's cognition is restricted, and the most rational course of action is prohibited from being considered for no within-thought-experiment reason accessible to B.

It's not enough to require that the other agent is inhuman in the sense of not being a person and not holding human values, as our agent must also not care about the other agent. And once both agents don't care about each other's cognition, the requirement for them not being persons or valuable becomes extraneous.

Thus, instead of requiring that the other agent is not a person, the correct way of setting up the problem is to require that our agent is indifferent to whether the other agent is a person (and conversely).

(It's not a very substantive observation I would've posted with less polish in an open thread if not for the discussion section.)

View more: Next