Pancritical Rationalism Can Apply to Preferences and Behavior
ETA: As stated below, criticizing beliefs is trivial in principle: either they were arrived at with an approximation to Bayes' rule starting with a reasonable prior and then updated with actual observations, or they weren't. Subsequent conversation made it clear that criticizing behavior is also trivial in principle, since someone is either taking the action that they believe will best suit their preferences, or not. Finally, criticizing preferences became trivial too -- the relevant question is "Does/will agent X behave as though they have preferences Y", and that's a belief, so go back to Bayes' rule and a reasonable prior. So the entire issue that this post was meant to solve has evaporated, in my opinion. Here's the original article, in case anyone is still interested:
Pancritical rationalism is a fundamental value in Extropianism that has only been mentioned in passing on LessWrong. I think it deserves more attention here. It's an approach to epistemology, that is, the question of "How do we know what we know?", that avoids the contradictions inherent in some of the alternative approaches.
The fundamental source document for it is William Bartley's Retreat to Commitment. He describes three approaches to epistemology, along with the dissatisfying aspects of the other two:
- Nihilism. Nothing matters, so it doesn't matter what you believe. This path is self-consistent, but it gives no guidance.
- Justificationalism. Your belief is justified because it is a consequence of other beliefs. This path is self-contradictory. Eventually you'll go in circles trying to justify the other beliefs, or you'll find beliefs you can't justify. Justificationalism itself cannot be justified.
- Pancritical rationalism. You have taken the available criticisms for the belief into account and still feel comfortable with the belief. This path gives guidance about what to believe, although it does not uniquely determine one's beliefs. Pancritical rationalism can be criticized, so it is self-consistent in that sense.
Read on for a discussion about emotional consequences and extending this to include preferences and behaviors as well as beliefs.
A summary of Savage's foundations for probability and utility.
Edit: I think the P2c I wrote originally may have been a bit too weak; fixed that. Nevermind, rechecking, that wasn't needed.
More edits (now consolidated): Edited nontriviality note. Edited totality note. Added in the definition of numerical probability in terms of qualitative probability (though not the proof that it works). Also slight clarifications on implications of P6' and P6''' on partitions into equivalent and almost-equivalent parts, respectively.
One very late edit, June 2: Even though we don't get countable additivity, we still want a σ-algebra rather than just an algebra (this is needed for some of the proofs in the "partition conditions" section that I don't go into here). Also noted nonemptiness of gambles.
The idea that rational agents act in a manner isomorphic to expected-utility maximizers is often used here, typically justified with the Von Neumann-Morgenstern theorem. (The last of Von Neumann and Morgenstern's axioms, the independence axiom, can be grounded in a Dutch book argument.) But the Von Neumann-Morgenstern theorem assumes that the agent already measures its beliefs with (finitely additive) probabilities. This in turn is often justified with Cox's theorem (valid so long as we assume a "large world", which is implied by e.g. the existence of a fair coin). But Cox's theorem assumes as an axiom that the plausibility of a statement is taken to be a real number, a very large assumption! I have also seen this justified here with Dutch book arguments, but these all seem to assume that we are already using some notion of expected utility maximization (which is not only somewhat circular, but also a considerably stronger assumption than that plausibilities are measured with real numbers).
There is a way of grounding both (finitely additive) probability and utility simultaneously, however, as detailed by Leonard Savage in his Foundations of Statistics (1954). In this article I will state the axioms and definitions he gives, give a summary of their logical structure, and suggest a slight modification (which is equivalent mathematically but slightly more philosophically satisfying). I would also like to ask the question: To what extent can these axioms be grounded in Dutch book arguments or other more basic principles? I warn the reader that I have not worked through all the proofs myself and I suggest simply finding a copy of the book if you want more detail.
Peter Fishburn later showed in Utility Theory for Decision Making (1970) that the axioms set forth here actually imply that utility is bounded.
(Note: The versions of the axioms and definitions in the end papers are formulated slightly differently from the ones in the text of the book, and in the 1954 version have an error. I'll be using the ones from the text, though in some cases I'll reformulate them slightly.)
Values vs. parameters
I've written before about the difficulty of distinguishing values from errors, from algorithms, and from context. Now I have to add to that list: How can we distinguish our utility function from the parameters we use to apply it?
Is Kiryas Joel an Unhappy Place?
I was browsing my RSS feed, as one does, and came across a New York Times article, "A Village With the Numbers, Not the Image, of the Poorest Place", about the Satmar Hasidic Jews of Kiryas Joel (NY).
Their interest lies in their extraordinarily high birthrate & population growth, and their poverty - which are connected. From the article:
"...officially, at least, none of the nation’s 3,700 villages, towns or cities with more than 10,000 people has a higher proportion of its population living in poverty than Kiryas Joel, N.Y., a community of mostly garden apartments and town houses 50 miles northwest of New York City in suburban Orange County.
About 70 percent of the village’s 21,000 residents live in households whose income falls below the federal poverty threshold, according to the Census Bureau. Median family income ($17,929) and per capita income ($4,494) rank lower than any other comparable place in the country. Nearly half of the village’s households reported less than $15,000 in annual income. About half of the residents receive food stamps, and one-third receive Medicaid benefits and rely on federal vouchers to help pay their housing costs.
Kiryas Joel’s unlikely ranking results largely from religious and cultural factors. Ultra-Orthodox Satmar Hasidic Jews predominate in the village; many of them moved there from Williamsburg, Brooklyn, beginning in the 1970s to accommodate a population that was growing geometrically. Women marry young, remain in the village to raise their families and, according to religious strictures, do not use birth control. As a result, the median age (under 12) is the lowest in the country and the household size (nearly six) is the highest. Mothers rarely work outside the home while their children are young. Most residents, raised as Yiddish speakers, do not speak much English. And most men devote themselves to Torah and Talmud studies rather than academic training — only 39 percent of the residents are high school graduates, and less than 5 percent have a bachelor’s degree. Several hundred adults study full time at religious institutions.
...Because the community typically votes as a bloc, it wields disproportionate political influence, which enables it to meet those challenges creatively. A luxurious 60-bed postnatal maternal care center was built with $10 million in state and federal grants. Mothers can recuperate there for two weeks away from their large families. Rates, which begin at $120 a day, are not covered by Medicaid, although, Mr. Szegedin said, poorer women are typically subsidized by wealthier ones.
...The village does aggressively pursue economic opportunities. A kosher poultry slaughterhouse, which processes 40,000 chickens a day, is community owned and considered a nonprofit organization. A bakery that produces 800 pounds of matzo daily is owned by one of the village’s synagogues.
Most children attend religious schools, but transportation and textbooks are publicly financed. Several hundred handicapped students are educated by the village’s own public school district, which, because virtually all the students are poor and disabled, is eligible for sizable state and federal government grants.
... Still, poverty is largely invisible in the village. Parking lots are full, but strollers and tricycles seem to outnumber cars. A jeweler shares a storefront with a check-cashing office. To avoid stigmatizing poorer young couples or instilling guilt in parents, the chief rabbi recently decreed that diamond rings were not acceptable as engagement gifts and that one-man bands would suffice at weddings. Many residents who were approached by a reporter said they did not want to talk about their finances.
...Are as many as 7 in 10 Kiryas Joel residents really poor? “It is, in a sense, a statistical anomaly,” Professor Helmreich said. “They are clearly not wealthy, and they do have a lot of children. They spend whatever discretionary income they have on clothing, food and baby carriages. They don’t belong to country clubs or go to movies or go on trips to Aruba.
...David Jolly, the social services commissioner for Orange County, also said that while the number of people receiving benefits seemed disproportionately high, the number of caseloads — a family considered as a unit — was much less aberrant. A family of eight who reports as much as $48,156 in income is still eligible for food stamps, although the threshold for cash assistance ($37,010), which relatively few village residents receive, is lower....“You also have no drug-treatment programs, no juvenile delinquency program, we’re not clogging the court system with criminal cases, you’re not running programs for AIDS or teen pregnancy,” he [Mr. Szegedin, the village administrator] said. “I haven’t run the numbers, but I think it’s a wash.”
From Wikipedia:
The land for Kiryas Joel was purchased in 1977, and fourteen Satmar families settled there. By 2006, there were over 3,000...In 1990, there were 7,400 people in Kiryas Joel; in 2000, 13,100, nearly doubling the population. In 2005, the population had risen to 18,300, a rate of growth suggesting it will double again in the ten years between 2000 and 2010.
Robin Hanson has argued that uploaded/emulated minds will establish a new Malthusian/Darwinian equilibrium in "IF UPLOADS COME FIRST: The crack of a future dawn" - an equilibrium in comparison to which our own economy will look like a delusive dreamtime of impossibly unfit and libertine behavior. The demographic transition will not last forever. But despite our own distaste for countless lives living at near-subsistence rather than our own extreme per-capita wealth (see the Repugnant Conclusion), those many lives will be happy ones (even amidst disaster).
So. Are the inhabitants of Kiryas Joel unhappy?
Updateless anthropics
Three weeks ago, I set out to find a new theory of anthropics, to try and set decision theory on a firm footing with respect to copying, deleting copies, merging them, correlated decisions, and the presence or absence of extra observers. I've since come full circle, and realised that UDT already has a built-in anthropic theory, that resolves a lot of the problems that had been confusing me.
The theory is simple, and is essentially a rephrasing of UDT: if you are facing a decision X, and trying to figure out the utility of X=a for some action a, then calculate the full expected utility of X being a, given the objective probabilities of each world (including those in which you don't exist).
As usual, you have to consider the consequences of X=a for all agents who will make the same decision as you, whether they be exact copies, enemies, simulations or similar-minded people. However, your utility will have to do more work than is usually realised: notions such as selfishness or altruism with respect to your copies have to be encoded in the utility function, and will result in substantially different behaviour.
The rest of the post is a series of case studies illustrating this theory. Utility is assumed to be linear in cash for convenience.
Sleeping with the Presumptuous Philosopher
The first test case is the Sleeping Beauty problem.
In its simplest form, this involves a coin toss; if it comes out heads, one copy of Sleeping Beauty is created. If it comes out tails, two copies are created. Then the copies are asked at what odds they would be prepared to bet that the coin came out tails. You can assume either that the different copies care for each other in the manner I detailed here, or more simply that all winnings will be kept by a future merged copy (or an approved charity). Then the algorithm is simple: the two worlds have equal probability. Let X be the decision where Sleeping Beauty decides between a contract that pays out $1 if the coin is heads, versus one that pays out $1 if the coin is tails. If X="heads" (to use an obvious shorthand), then Sleeping Beauty will expect to make $1*0.5, as she is offered the contract once. If X="tails", then the total return of that decision is $1*2*0.5, as copies of her will be offered the contract twice, and they will all make the same decision. So Sleeping Beauty will follow the SIA 2:1 betting odds of tails over heads.
Variants such as "extreme Sleeping Beauty" (where thousands of copies are created on tails) will behave in the same way; if it feels counter-intuitive to bet at thousands-to-one odds that a fair coin landed tails, it's the fault of expected utility itself, as the rewards of being right dwarf the costs of being wrong.
But now let's turn to the Presumptuous Philosopher, a thought experiment that is often confused with Sleeping Beauty. Here we have exactly the same setup as "extreme Sleeping Beauty", but the agents (the Presumptuous Philosophers) are mutually selfish. Here the return to X="heads" remains $1*0.5. However the return to X="tails" is also $1*0.5, since even if all the Presumptuous Philosophers in the "tails" universe bet on "tails", each one will still only get $1 in utility. So the Presumptuous Philosopher should only take even (1:1) SSA betting odds on the result of the coin flip.
So SB acts as though she follows the self-indication assumption (SIA), while the PP follows the self-sampling assumption (SSA). This remains true if we change the setup so that one agent is given a betting opportunity in the tails universe. Then the objective probability of any one agent being asked is low, so both SB and PP model the "objective probability" of the tails world, given that they have been asked to bet, as being low. However, SB gains utility if any of her copies is asked to bet and receives a profit, so the strategy "if I'm offered $1 if I guess correctly whether the coin is heads or tails, I will say tails" gets her $1*0.5 utility whether or not she is the specific one who is asked. Betting heads nets her the same result, so SB will give SIA 1:1 odds in this case.
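The Sleeping Beauty and Presumptuous Philosopher calculations can be put side by side in a minimal sketch. The function name `expected_return` and its parameters are illustrative assumptions rather than anything from the original post; the coin is fair and each winning contract pays $1, as in the text.

```python
def expected_return(bet, copies_on_tails, selfish, p_heads=0.5):
    """Expected payout (in $) of the policy "every copy bets on `bet`".

    copies_on_tails: number of copies created if the coin lands tails.
    selfish: True for Presumptuous Philosophers (each counts only its own
             dollar), False for Sleeping Beauty (winnings are pooled).
    """
    if bet == "heads":
        # Only one copy exists in the heads world, so pooling changes nothing.
        return p_heads * 1.0
    if bet == "tails":
        # Every copy in the tails world wins $1; a selfish agent values only one of them.
        dollars_counted = 1 if selfish else copies_on_tails
        return (1 - p_heads) * 1.0 * dollars_counted
    raise ValueError("bet must be 'heads' or 'tails'")


# Sleeping Beauty (pooled winnings, two copies): tails is worth twice heads.
sb_heads = expected_return("heads", copies_on_tails=2, selfish=False)
sb_tails = expected_return("tails", copies_on_tails=2, selfish=False)

# Presumptuous Philosopher (selfish, thousands of copies): both bets are worth the same.
pp_heads = expected_return("heads", copies_on_tails=1000, selfish=True)
pp_tails = expected_return("tails", copies_on_tails=1000, selfish=True)
```

Under these assumptions `sb_tails / sb_heads` is 2 (the 2:1 odds), while `pp_tails / pp_heads` is 1 (the 1:1 odds), even though the objective probabilities are identical in both cases. Only the utility function differs.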
On the other hand, the PP will only gain utility in the very specific world where he himself is asked to bet. So his gain from the updateless "if I'm offered $1 if I guess correctly whether the coin is heads or tails, I will say tails" is tiny, as he's unlikely to be asked to bet. Hence he will offer the SSA odds that make heads a much more "likely" proposition.
The Doomsday argument
Now, using SSA odds brings us back into the realm of the classical Doomsday argument. How is it that Sleeping Beauty is immune to the Doomsday argument while the Presumptuous Philosopher is not? Which one is right; is the world really about to end?
Asking about probabilities independently of decisions is meaningless here; instead, we can ask what would agents decide in particular cases. It's not surprising that agents will reach different decisions on such questions as, for instance, existential risk mitigation, if they have different preferences.
Let's do a very simplified model, where there are two agents in the world, and that one of them is approached at random to see if they would pay $Y to add a third agent. Each agent derives a (non-indexical) utility of $1 for the presence of this third agent, and nothing else happens in the world to increase or decrease anyone's utility.
First, let's assume that each agent is selfish about their indexical utility (their cash in the hand). If the decision is to not add a third agent, all will get $0 utility. If the decision is to add a third agent, then there are three agents in the world, and one of them will be approached to lose $Y. Hence the expected utility is $(1-Y/3).
Now let us assume the agents are altruistic towards each other's indexical utilities. Then the expected utility of not adding a third agent is still $0. If the decision is to add a third agent, then there are three agents in the world, and one of them will be approached to lose $Y - but all will value that loss at the same amount. Hence the expected utility is $(1-Y).
So if $Y=$2, for instance, the "selfish" agents will add the third agent, and the "altruistic" ones will not. So generalising this to more complicated models describing existential risk mitigation schemes, we would expect SB-type agents to behave differently to PP-types in most models. There is no sense in asking which one is "right" and which one gives the more accurate "probability of doom"; instead ask yourself which better corresponds to your own utility model, hence what your decision will be.
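The two expected utilities in this toy model can be checked with a short sketch. The function name `utility_of_adding` and its arguments are hypothetical labels introduced here; the numbers (a $1 non-indexical gain, a cost of $Y borne by one of three agents) come from the text.

```python
from fractions import Fraction


def utility_of_adding(cost, altruistic, agents_after=3, value_of_third=1):
    """Expected utility of the decision "pay `cost` to add the third agent".

    Selfish agents only feel the cost if they are the one approached
    (probability 1/agents_after); altruistic agents feel it whoever pays.
    """
    cost = Fraction(cost)
    expected_loss = cost if altruistic else cost / agents_after
    return value_of_third - expected_loss


selfish_value = utility_of_adding(2, altruistic=False)    # 1 - 2/3
altruistic_value = utility_of_adding(2, altruistic=True)  # 1 - 2

# Not adding the agent is worth 0 in both cases, so add only if the value is positive.
selfish_adds = selfish_value > 0
altruistic_adds = altruistic_value > 0
```

With Y = 2 the selfish value is 1/3 and the altruistic value is -1, which reproduces the split described above.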
Psy-Kosh's non-anthropic problem
Cousin_it has a rephrasing of Psy-Kosh's non-anthropic problem to which updateless anthropics can be illustratively applied:
You are one of a group of 10 people who care about saving African kids. You will all be put in separate rooms, then I will flip a coin. If the coin comes up heads, a random one of you will be designated as the "decider". If it comes up tails, nine of you will be designated as "deciders". Next, I will tell everyone their status, without telling the status of others. Each decider will be asked to say "yea" or "nay". If the coin came up tails and all nine deciders say "yea", I donate $1000 to VillageReach. If the coin came up heads and the sole decider says "yea", I donate only $100. If all deciders say "nay", I donate $700 regardless of the result of the coin toss. If the deciders disagree, I don't donate anything.
We'll set aside the "deciders disagree" and assume that you will all reach the same decision. The point of the problem was to illustrate a supposed preference inversion: if you coordinate ahead of time, you should all agree to say "nay", but after you have been told you're a decider, you should update in the direction of the coin coming up tails, and say "yea".
From the updateless perspective, however, there is no mystery here: the strategy "if I were a decider, I would say nay" maximises utility both for the deciders and the non-deciders.
But what if the problem were rephrased in a more selfish way, with the non-deciders not getting any utility from the setup (maybe they don't get to see the photos of the grateful saved African kids), while the deciders got the same utility as before? Then the strategy "if I were a decider, I would say yea" maximises your expected utility, because non-deciders get nothing, thus reducing the expected utility gains and losses in the world where the coin came out tails. This is similar to SIA odds, again.
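Both versions of the problem can be worked through numerically. The sketch below is one way to set it up, assuming all deciders give the same answer and a fair coin; the names `strategy_value`, `DONATION` and `DECIDERS` are invented for this illustration. In the selfish version each world is weighted by the chance that you personally are a decider in it (1/10 on heads, 9/10 on tails).

```python
from fractions import Fraction

GROUP_SIZE = 10
P_COIN = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
DECIDERS = {"heads": 1, "tails": 9}
# Donation when every decider gives the same answer.
DONATION = {
    ("heads", "yea"): 100,
    ("tails", "yea"): 1000,
    ("heads", "nay"): 700,
    ("tails", "nay"): 700,
}


def strategy_value(answer, selfish):
    """Expected utility of the policy "if I were a decider, I would say `answer`"."""
    total = Fraction(0)
    for coin, p in P_COIN.items():
        # Shared utility: everyone values the donation.
        # Selfish utility: you only value it in worlds where you are a decider.
        weight = Fraction(DECIDERS[coin], GROUP_SIZE) if selfish else 1
        total += p * weight * DONATION[(coin, answer)]
    return total


shared = {a: strategy_value(a, selfish=False) for a in ("yea", "nay")}
selfish = {a: strategy_value(a, selfish=True) for a in ("yea", "nay")}
```

Under these assumptions the shared-utility policy values are 550 for "yea" against 700 for "nay", while the selfish values are 455 for "yea" against 350 for "nay", so the preferred answer flips exactly as described.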
That second model is similar to the way I argued for SIA with agents getting created and destroyed. That post has been superseded by this one, which pointed out the flaw in the argument which was (roughly speaking) not considering setups like Psy-Kosh's original model. So once again, whether utility is broadly shared or not affects the outcome of the decision.
The Anthropic Trilemma
Eliezer's anthropic trilemma was an interesting puzzle involving probabilities, copying, and subjective anticipation. It inspired me to come up with a way of spreading utility across multiple copies which was essentially a Sleeping Beauty copy-altruistic model. The decision process going with it is then the same as the updateless decision process outlined here. Though initially it was phrased in terms of SIA probabilities and individual impact, the isomorphism between the two can be seen here.
Revisiting the Anthropic Trilemma II: axioms and assumptions
tl;dr: I present four axioms for anthropic reasoning under copying/deleting/merging, and show that these result in a unique way of doing it: averaging non-indexical utility across copies, adding indexical utility, and having all copies being mutually altruistic.
Some time ago, Eliezer constructed an anthropic trilemma, where standard theories of anthropic reasoning seemed to come into conflict with subjective anticipation. rwallace subsequently argued that subjective anticipation was not ontologically fundamental, so we should not expect it to work out of the narrow confines of everyday experience, and Wei illustrated some of the difficulties inherent in "copy-delete-merge" types of reasoning.
Wei also made the point that UDT shifts the difficulty in anthropic reasoning away from probability and onto the utility function, and ata argued that neither the probabilities nor the utility function are fundamental, that it was the decisions that resulted from them that were important - after all, if two theories give the same behaviour in all cases, what grounds do we have for distinguishing them? I then noted that this argument could be extended to subjective anticipation: instead of talking about feelings of subjective anticipation, we could replace it by questions such as "would I give up a chocolate bar now for one of my copies to have two in these circumstances?"
I then made a post where I applied my current intuitions to the anthropic trilemma, and showed how this results in complete nonsense, despite the fact that I used a bona fide utility function. What we need are some sensible criteria by which to divide utility and probability between copies, and this post is an attempt to figure that out. The approach is similar to expected utility, where a quadruped of natural axioms forced all decision processes to have a single format.
The assumptions are:
- No intrinsic value in the number of copies
- No preference reversals
- All copies make the same personal indexical decisions
- No special status to any copy.
In the Pareto-optimised crowd, be sure to know your place
tldr: In a population playing independent two-player games, Pareto-optimal outcomes are only possible if there is an agreed universal scale of value relating each player's utility, and the players then act to maximise the scaled sum of all utilities.
In a previous post, I showed that if you are about to play a bargaining game with someone when the game's rules are initially unknown, then the best plan is not to settle on a standard result like the Nash Bargaining Solution or the Kalai-Smorodinsky Bargaining Solution (see this post). Rather, it is to decide in advance how much your respective utilities are worth relative to each other, and then maximise their sum. Specifically, if you both have (representatives of) utility functions u1 and u2, then you must pick a θ>0 and maximise u1+θu2 (with certain extra measures to break ties). This result also applies if the players are to play a series of known independent games in sequence. But how does this extend to more than two players?
Consider the case where there are three players (named imaginatively 1, 2 and 3), and that they are going to pair off in each of the possible pairs (12, 23 and 31) and each play a game. The utility gains from each game are presumed to be independent. Then each of the pairs will choose factors θ12, θ23 and θ31, and seek to maximise u1+θ12u2, u2+θ23u3 and u3+θ31u1 respectively. Note here that I am neglecting tie-breaking and such; the formal definitions needed will be given in the proof section.
A very interesting situation comes up when θ12θ23θ31=1. In that case, there is a universal scale of "worth" for each of the utilities: it's as if the three utilities are pounds, dollars and euros. Once you know the exchange rate from pounds to dollars (θ12), and from dollars to euros (θ23), then you know the exchange rate from euros to pounds (θ31=1/(θ12θ23)). We'll call these situations transitive.
Ideally we'd want the outcomes to be Pareto-optimal for the three utilities. Then the major result is:
The outcome utilities are Pareto-optimal if and only if the θ are transitive.
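The transitivity condition θ12θ23θ31=1 is easy to test mechanically. The following is a small sketch with hypothetical helper names (`implied_theta31`, `is_transitive`) and made-up exchange rates; it checks only the condition itself, not Pareto-optimality, which depends on the games being played.

```python
from fractions import Fraction


def implied_theta31(theta12, theta23):
    """The euros-to-pounds rate forced by the other two rates."""
    return 1 / (Fraction(theta12) * Fraction(theta23))


def is_transitive(theta12, theta23, theta31):
    """True when the three pairwise scale factors multiply to exactly 1."""
    if min(theta12, theta23, theta31) <= 0:
        raise ValueError("scale factors must be positive")
    return Fraction(theta12) * Fraction(theta23) * Fraction(theta31) == 1


# Illustrative rates only: 2 and 3 are not taken from the post.
theta31 = implied_theta31(2, 3)
consistent = is_transitive(2, 3, theta31)
inconsistent = is_transitive(2, 3, 1)
```

Exact fractions are used so that the product is compared with 1 without floating-point rounding. Here `theta31` comes out as 1/6, and any other value for the third factor breaks transitivity.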
If you don't know the name of the game, just tell me what I mean to you
Following: Let's split the Cake
tl;dr: Both the Nash Bargaining solution (NBS), and the Kalai-Smorodinsky Bargaining Solution (KSBS), though acceptable for one-off games that are fully known in advance, are strictly inferior for independent repeated games, or when there exists uncertainty as to which game will be played.
Let's play a bargaining game, you and I. We can end up with you getting €1 and me getting €3, both of us getting €2, or you getting €3 and me getting €1. If we fail to agree, neither of us gets anything.
Oh, and did I forget to mention that another option was for you to get an aircraft carrier and me to get nothing?
Think of that shiny new aircraft carrier, loaded full with jets, pilots, weapons and sailors; think of all the things you could do with it, all the fun you could have. Places to bomb or city harbours to cruise majestically into, with the locals gaping in awe at the sleek powerful lines of your very own ship.
Then forget all about it, because Kalai-Smorodinsky says you can't have it. The Kalai-Smorodinsky bargaining solution to this game is 1/2 of a chance of getting that ship for you, and 1/2 of a chance of getting €3 for me (the Nash Bargaining Solution is better, but still not the best, as we'll see later). This might be fair; after all, unless you have some way of remunerating me for letting you have it, why should I take a dive for you?
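To see where the figure of 1/2 comes from, here is a rough sketch of the Kalai-Smorodinsky calculation. The excerpt does not say how much the carrier is worth to you, so `carrier_utility` is purely an assumed number, and utility is assumed linear in euros with a disagreement point of (0, 0). The Kalai-Smorodinsky point is where each player receives the same fraction of their best possible outcome (the carrier for you, €3 for me), on the frontier that mixes "carrier for you" with "€1 for you, €3 for me".

```python
from fractions import Fraction


def ks_carrier_probability(carrier_utility):
    """Probability p of the carrier outcome at the Kalai-Smorodinsky point.

    Outcomes are (your utility, my utility). The mix is
    p * (carrier_utility, 0) + (1 - p) * (1, 3), and p solves
    your_utility / carrier_utility == my_utility / 3.
    """
    c = Fraction(carrier_utility)
    if c < 4:
        # Below 4 the other outcomes poke above this segment, so the formula no longer applies.
        raise ValueError("sketch assumes the carrier is worth at least 4 to you")
    return (c - 1) / (2 * c - 1)


# Assumed value: the carrier is worth 1000 (in euro-equivalent utility) to you.
p = ks_carrier_probability(1000)
your_utility = p * 1000 + (1 - p) * 1
my_utility = (1 - p) * 3
```

With the assumed value of 1000, `p` is 999/1999, just under 1/2, and it approaches 1/2 as the carrier's value grows, which matches the "1/2 of a chance" quoted above.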
But now imagine we are about to start the game, and we don't know the full rules yet. We know about the €'s involved, that's all fine, we know there will be an offer of an aircraft carrier; but we don't know who is going to get the offer. If we wanted to decide on our bargaining theory in advance, what would we do?
Let's split the cake, lengthwise, upwise and slantwise
This post looks at some of the current models for how two agents can split the gain in non-zero sum interactions. For instance, you and a buddy might have to split £10 between the two of you, where the money is discarded if you can't reach a deal. Or there is an opportunity to trade your Elvis memorabilia for someone's collection of North Korean propaganda posters: unless you can agree to a way of splitting the gain from trade, the trade won't happen. Or there is the stereotypical battle of the sexes: either a romantic dinner (RD) or a night of Battlestar Galactica (BG) is on offer, and both members of the couple prefer doing something together to doing it separately - but, of course, each one has their preference on which activity to do together.
Unlike standard games such as Prisoner's Dilemma, this is a coordinated game: the two agents will negotiate a joint solution, which is presumed to be binding. This allows for such solutions as 50% (BG,BG) + 50% (RD,RD), which cannot happen with each agent choosing their moves independently. The two agents will be assumed to be expected utility maximisers. What would your feelings be on a good bargaining outcome in this situation?
Enough about your feelings; let's see what the experts are saying. In general, if A and C are outcomes with utilities (a,b) and (c,d), then another possible outcome is pA + (1-p)C (where you decide first, with odds p:1-p, whether to do outcome A or C), with utility p(a,b) + (1-p)(c,d). Hence if you plot all possible expected utilities in the plane for a given game, you get a convex set.
For instance, if there is an interaction with possible pure outcomes (-1,2), (2,10), (4,9), (5,7), (6,3), then the set of actual possible utilities is the pentagon presented here:
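The vertices of that pentagon can be recovered by taking the convex hull of the five pure outcomes. The sketch below uses the standard monotone-chain hull algorithm, which is a choice made for this illustration and is not mentioned in the post; `convex_hull` and `cross` are ordinary helper names.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the convex hull in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last point while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


pure_outcomes = [(-1, 2), (2, 10), (4, 9), (5, 7), (6, 3)]
hull = convex_hull(pure_outcomes)

# A 50/50 lottery between (2, 10) and (6, 3) lies inside the set, not on a vertex.
mixed = (0.5 * 2 + 0.5 * 6, 0.5 * 10 + 0.5 * 3)
hull_with_mixed = convex_hull(pure_outcomes + [mixed])
```

All five pure outcomes come out as hull vertices, so the achievable set is indeed a pentagon, and adding a lottery between two of them leaves the hull unchanged.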
AI indifference through utility manipulation
Indifference is a precious and rare commodity for complex systems. The most likely effect of making a change in an intricate apparatus is a whole slew of knock-on effects crowned with unintended consequences. It would be ideal if one could make a change and be sure that the effects would remain isolated - that the rest of the system would be indifferent to the change.
For instance, it might be a sensible early-AI precaution to have an extra observer somewhere, sitting with his hand upon a button, ready to detonate explosives should the AI make a visible power grab. Except, of course, the AI will become aware of this situation, and will factor it into any plans it makes, either by increasing its deception or by grabbing control of the detonation system as a top priority. We would be a lot safer if the AI were somehow completely indifferent to the observer and the explosives. That is a complex wish that we don't really know how to phrase; let's make it simpler, and make it happen.