
## Zombies Redacted

02 July 2016 08:16PM

I looked at my old post Zombies! Zombies? and it seemed to have some extraneous content.  This is a redacted and slightly rewritten version.


## Cooperating with agents with different ideas of fairness, while resisting exploitation

16 September 2013 08:27AM

There's an idea from the latest MIRI workshop which I haven't seen in informal theories of negotiation, and I want to know if this is a known idea.

(Old well-known ideas:)

Suppose a standard Prisoner's Dilemma matrix where (3, 3) is the payoff for mutual cooperation, (2, 2) is the payoff for mutual defection, and (0, 5) is the payoff if you cooperate and they defect.

Suppose we're going to play a PD iterated for four rounds.  We have common knowledge of each other's source code so we can apply modal cooperation or similar means of reaching a binding 'agreement' without other enforcement methods.

If we mutually defect on every round, our net mutual payoff is (8, 8).  This is a 'Nash equilibrium' because neither agent can unilaterally change its action and thereby do better, if the opponent's actions stay fixed.  If we mutually cooperate on every round, the result is (12, 12), and this result is on the 'Pareto boundary' because neither agent can do better unless the other agent does worse.  It would seem a desirable principle for rational agents (with common knowledge of each other's source code / common knowledge of rationality) to find an outcome on the Pareto boundary, since otherwise they are leaving value on the table.
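The arithmetic above is easy to check with a short sketch (illustrative only; the move strings and function name are mine, not part of any MIRI formalism):

```python
# Payoff matrix from the post, as (my payoff, their payoff):
# mutual cooperation (3, 3), mutual defection (2, 2),
# and (0, 5) if I cooperate while they defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}

def total_payoff(my_moves, their_moves):
    """Sum per-round payoffs over an iterated game; moves are strings of C/D."""
    me, them = 0, 0
    for a, b in zip(my_moves, their_moves):
        pa, pb = PAYOFFS[(a, b)]
        me += pa
        them += pb
    return (me, them)

print(total_payoff("DDDD", "DDDD"))  # (8, 8): mutual defection, the Nash outcome
print(total_payoff("CCCC", "CCCC"))  # (12, 12): mutual cooperation, on the Pareto boundary
print(total_payoff("CCCC", "CCCD"))  # (9, 14): the skewed bargain discussed next
```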

But (12, 12) isn't the only possible result on the Pareto boundary.  Suppose that running the opponent's source code, you find that they're willing to cooperate on three rounds and defect on one round, if you cooperate on every round, for a payoff of (9, 14) slanted their way.  If they use their knowledge of your code to predict you refusing to accept that bargain, they will defect on every round for the mutual payoff of (8, 8).

I would consider it obvious that a rational agent should refuse this unfair bargain.  Otherwise agents with knowledge of your source code will offer you only this bargain, instead of the (12, 12) of mutual cooperation on every round; they will exploit your willingness to accept a result on the Pareto boundary in which almost all of the gains from trade go to them.

(Newer ideas:)

Generalizing:  Once you have a notion of a 'fair' result - in this case (12, 12) - then an agent which accepts any outcome in which it does worse than the fair result, while the opponent does better, is 'exploitable' relative to this fair bargain.  Like the Nash equilibrium, the only way you should do worse than 'fair' is if the opponent also does worse.

So we wrote down on the whiteboard an attempted definition of unexploitability in cooperative games as follows:

"Suppose we have a [magical] definition N of a fair outcome.  A rational agent should only do worse than N if its opponent does worse than N, or else [if bargaining fails] should only do worse than the Nash equilibrium if its opponent does worse than the Nash equilibrium."  (Note that this definition precludes giving in to a threat of blackmail.)
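As a sketch, the whiteboard condition can be written as a predicate on outcomes.  The encoding is mine, the post's definition is informal, and I'm treating 'does worse' as strictly lower payoff:

```python
def acceptable(outcome, fair, nash):
    """Whiteboard condition for player 0 ('me'): I should only do worse than
    the fair point N if the opponent also does worse than N, and likewise
    for the Nash equilibrium if bargaining fails.  All arguments are
    (my payoff, their payoff) tuples."""
    my, their = outcome
    # Reject any outcome where I fall below fair while the opponent doesn't.
    if my < fair[0] and their >= fair[1]:
        return False
    # Reject any outcome where I fall below Nash while the opponent doesn't.
    if my < nash[0] and their >= nash[1]:
        return False
    return True

print(acceptable((9, 14), fair=(12, 12), nash=(8, 8)))   # False: exploitable skew
print(acceptable((10, 11), fair=(12, 12), nash=(8, 8)))  # True: both sides below fair
print(acceptable((12, 12), fair=(12, 12), nash=(8, 8)))  # True: the fair outcome itself
```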

(Key possible-innovation:)

It then occurred to me that this definition opened the possibility for other, intermediate bargains between the 'fair' solution on the Pareto boundary, and the Nash equilibrium.

Suppose the other agent has a slightly different definition of fairness and they think that what you consider to be a payoff of (12, 12) favors you too much; they think that you're the one making an unfair demand.  They'll refuse (12, 12) with the same feeling of indignation that you would apply to (9, 14).

Well, if you give in to an arrangement with an expected payoff of, say, (11, 13) as you evaluate payoffs, then you're giving other agents an incentive to skew their definitions of fairness.

But it does not create poor incentives (AFAICT) to accept instead a bargain with an expected payoff of, say, (10, 11), in which both of you do worse than what the other agent considers 'fair'.  Though they're sad that you refused what they see as the truly fair outcome of (11, 13) (as you count utilons), and that you couldn't reach the Pareto boundary together, still, this is better than the Nash equilibrium of (8, 8).  And though you think the bargain is unfair, you are not creating incentives to exploit you.  By insisting on this definition of fairness, the other agent has done worse for themselves than they would have done by accepting (12, 12).  The other agent probably thinks that (10, 11) is 'unfair', slanted your way, but they likewise accept that this does not create bad incentives, since you did worse than their 'fair' outcome of (11, 13).

There could be many acceptable negotiating equilibria between what you think is the 'fair' point on the Pareto boundary and the Nash equilibrium, so long as each step down in what you think is 'fairness' reduces the total payoff to the other agent, even if it reduces your own payoff even more.  This resists exploitation and avoids creating an incentive for claiming that you have a different definition of fairness, while still holding open the possibility of some degree of cooperation with agents who honestly disagree with you about what's fair and are trying to avoid exploitation themselves.

This translates into an informal principle of negotiations:  Be willing to accept unfair bargains, but only if (you make it clear) both sides are doing worse than what you consider to be a fair bargain.

I haven't seen this advocated before even as an informal principle of negotiations.  Is it in the literature anywhere?  Someone suggested Schelling might have said it, but didn't provide a chapter number.

ADDED:

Clarification 1:  Yes, utilities are invariant up to a positive affine transformation, so there's no canonical way to split utilities evenly.  Hence the part about "Suppose we have a [magical] definition N of a fair outcome."  If we worked out the exact properties that an implementation of this magical solution would need, even while treating the solution itself as magic, that might give us some idea of what N should be, too.

Clarification 2:  The way this might work is that you pick a series of increasingly unfair-to-you, increasingly worse-for-the-other-player outcomes whose first element is what you deem the fair Pareto outcome:  (100, 100), (98, 99), (96, 98).  Perhaps stop well short of Nash if the skew becomes too extreme.  Drop to Nash as the last resort.  The other agent does the same, starting with their own ideal of fairness on the Pareto boundary.  Unless one of you has a completely skewed idea of fairness, you should be able to meet somewhere in the middle.  Both of you will do worse against a fixed opponent's strategy by unilaterally adopting more self-favoring ideas of fairness.  Both of you will do worse in expectation against potentially exploitive opponents by unilaterally adopting looser ideas of fairness.  This gives everyone an incentive to obey the Galactic Schelling Point and be fair about it.  You should not be picking the descending sequence in an agent-dependent way that incentivizes, at cost to you, skewed claims about fairness.
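One way the meeting-in-the-middle might be operationalized, purely as an illustration: each agent publishes a descending ladder of acceptable outcomes, and the realized outcome is the first entry on your ladder that the other agent also listed, with Nash as the last resort.  The meeting rule and all the numbers below are my assumptions, not part of the post:

```python
def negotiate(my_ladder, their_ladder, nash):
    """Return the first outcome on my ladder that the other agent also
    listed; drop all the way to Nash if the ladders never meet."""
    their_offers = set(their_ladder)
    for outcome in my_ladder:
        if outcome in their_offers:
            return outcome
    return nash

nash = (80, 80)
# All outcomes as (my payoff, their payoff).  Each ladder starts at that
# agent's own idea of the fair Pareto point and steps down, with each step
# also costing the *other* side something.
mine   = [(100, 100), (98, 99), (96, 98)]
theirs = [(98, 102), (97, 100), (96, 98)]
print(negotiate(mine, theirs, nash))  # (96, 98): where the ladders first meet
```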

Clarification 3:  You must take into account the other agent's costs and other opportunities when ensuring that the net outcome, in terms of final utilities, is worse for them than the reward offered for 'fair' cooperation.  Offering them the chance to buy half as many paperclips at a lower, less fair price does no good if they can go next door, get the same offer again, and buy the same number of paperclips at a lower total price.

## The Ultimate Newcomb's Problem

10 September 2013 02:03AM

You see two boxes and you can either take both boxes, or take only Box B.  Box A is transparent and contains \$1000.  Box B contains a visible number, say 1033.  The Bank of Omega, which operates by very clear and transparent mechanisms, will pay you \$1M if this number is prime, and \$0 if it is composite.  Omega is known to select prime numbers for Box B whenever Omega predicts that you will take only Box B, and to select composite numbers whenever Omega predicts that you will take both boxes.  Omega has previously predicted correctly in 99.9% of cases.

Separately, the Numerical Lottery has randomly selected 1033 and is displaying this number on a screen nearby. The Lottery Bank, likewise operating by a clear known mechanism, will pay you \$2 million if it has selected a composite number, and otherwise pay you \$0.  (This event will take place regardless of whether you take only B or both boxes, and both the Bank of Omega and the Lottery Bank will carry out their payment processes - you don't have to choose one game or the other.)

You previously played the game with Omega and the Numerical Lottery a few thousand times before you ran across this case where Omega's number and the Lottery number were the same, so this event is not suspicious.

Omega also knew the Lottery number before you saw it, and while making its prediction, and Omega likewise predicts correctly in 99.9% of the cases where the Lottery number happens to match Omega's number.  (Omega's number is chosen independently of the lottery number, however.)

You have two minutes to make a decision, you don't have a calculator, and if you try to factor the number you will be run over by the trolley from the Ultimate Trolley Problem.

Do you take only box B, or both boxes?

## Q for GiveWell: What is GiveDirectly's mechanism of action?

31 July 2013 08:02PM

I first wrote up the following post, then happened to run into Holden Karnofsky in person and asked him a much-shortened form of the question verbally.  My attempt to recount Holden's verbal reply is also given further below.  I was moderately impressed by Holden's response because I had not thought of it when listing out possible replies, but I don't understand yet why Holden's response should be true.  Since GiveWell has recently posted about objections to GiveDirectly and replies, I decided to go ahead and post this now.

A question for GiveWell:

Your current #2 top-rated charity is GiveDirectly, which gives one-time gifts of \$1000 over 9 months, directly to poor recipients in Kenya via M-PESA.

GiveWell tries for high standards of evidence of efficacy and cost-effectiveness.  As I understand it, you don't just want the charity to be arguably cost-effective; you want a very high probability that the charity is cost-effective.

The main evidence I've seen cited for direct giving is that the recipients who received the \$1000 are then substantially better off 9 months later compared to people who aren't.

While I can imagine arguments that could repair the obvious objection to this reasoning, I haven't yet seen how the resulting evidence about cost-effectiveness could rise to the epistemic standards one would expect of GiveWell's #2 evidence-based charity.

The obvious objection is as follows:  Suppose the Kenyan government simply printed new shillings and handed out \$1000 of such shillings to the same recipients targeted by GiveDirectly.  Although the recipients would be better off than non-recipients, this might not reflect any improvement in net utility in Kenya because no new resources were created by printing the money.

There are of course obvious replies to this obvious objection:

(1)  Because the shillings handed out by GiveDirectly are purchased on the foreign currency exchange market using U.S. dollars, and would otherwise have been spent in Kenya in other ways, we should not expect any inflation of the shilling.  Instead we should expect an increase in Kenyan consumption of foreign goods, corresponding to the increased price of shillings implied by GiveDirectly adding its marginal demand to the auction and thereby raising the marginal price of all shillings sold.  The primary mechanism of action by which GiveDirectly benefits Kenya is by raising the price of shillings in the foreign exchange market and making more hard currency available to sellers of shillings.  So far as I can tell, this argument ought to generalize:  Any argument that the Kenyan government could not accomplish most of the same good by printing shillings will mean that the primary mechanism of GiveDirectly's effectiveness must be the U.S. dollars being exchanged for the shillings on the foreign currency market.  This in turn means that GiveDirectly could accomplish most of its good by buying the same shillings on the foreign currency market and burning them.

(Or to sharpen the total point of this article:  The sum of the good accomplished by GiveDirectly should equal:

• The good accomplished by the Kenyan government printing shillings and distributing them to the same recipients;
• plus the good accomplished by GiveDirectly then purchasing shillings on the foreign exchange market using US dollars, and burning them.

Indeed, since these mechanisms of action seem mostly independent, we ought to be able to state a percentage of good accomplished which is allegedly attributed to each, summing to 1.  E.g. maybe 80% of the good would be achieved by printing shillings and distributing them to the same recipients, and 20% would be achieved by purchasing shillings on the foreign exchange market and burning them.  But then we have mostly the same questions as before about how to generate wealth by printing shillings.)

(2)  Inequality in Kenya is such that redistributing the supply of shillings toward the very poor increases utility in Kenya. Thus the Kenyan government could accomplish as much good as GiveDirectly by printing an equivalent number of shillings and giving them to the same recipients.  This would create inflation that is a loss to other Kenyans, some of them also very poor, but so much of the shilling supply is held by the rich that the net results are favorable.  Printing shillings can create happiness because it shifts resources from making speedboats for the rich to making corrugated iron roofs for the poor.

(It would be nice if the Kenyan government just printed shillings for GiveDirectly to use, but this the Kenyan government will not realistically do.  Effective altruists must live in the real world, and in the real world GiveDirectly will only accomplish its goals with the aid of effective altruists.  One cannot live in the should-universe where Kenya's government is taking up the burden.  Effective altruists should reason as if the Kenya government consists of plastic dolls who cannot be the locus of responsibility instead of them - that's heroic epistemology 101.  Maybe there will eventually be returns on lobbying for Minimum Guaranteed Income in Kenya if the programs work, but that's for tomorrow, not right now.)

(3)  Like the European Union, Kenya is not printing enough shillings under standard economic theory.  (I have no idea if this is plausibly true for Kenya in particular.)  If the government printed shillings and gave them to the same recipients, this would create real wealth in Kenya because the economy was operating below capacity and velocity of trade would pick up.  The shillings purchased by GiveDirectly would otherwise have stayed in bank accounts rather than going to other Kenyans.  Note that this contradicts the argument step in (1) where we said that the purchased shillings would otherwise have been spent elsewhere, so you should have questioned one argument step or the other.

(4)  Village moneylenders and bosses can successfully extract most surplus generated within their villages by raising rents or demanding bribes.  The only way that individuals can escape the grasp of moneylenders and rentiers is with a one-time gift that was not expected and which the moneylenders and bosses could not arrange to capture.  The government could accomplish as much good as GiveDirectly by printing the same number of shillings and giving them to the same people in an unpredictable pattern.  This would create some inflation but village moneylenders or bosses would ease off on people from whom they couldn't extract as much value, whereas the one-time gift recipients can purchase capital goods that will make them permanently better off in ways that don't allow the new value to be extracted by moneylenders or bosses.

If I recall correctly, GiveDirectly uses the example of a family using some of the gift money to purchase a corrugated iron roof.  From my perspective the obvious objection is that they could just be purchasing a corrugated iron roof that would've gone to someone else and raising the prices of roofs.  (1) says that Kenya has more foreign exchange on hand and can import, not one more corrugated iron roof, but a variety of other foreign goods; (2) says that the resources used in the corrugated iron roof would otherwise have been used to make a speedboat; (3) says that a new trade takes place in which somebody makes a corrugated iron roof that wouldn't have been manufactured otherwise; and (4) says that the village moneylenders usually adjust their interest rates so as to prevent anyone from saving up enough money to buy a corrugated iron roof.

The trouble is that all of these mechanisms of action seem much harder to measure and be sure of, than the measurable outcomes for gift recipients vs. non-recipients.

To reiterate, the sum of the good accomplished by GiveDirectly should equal the good accomplished by the Kenyan government printing shillings and distributing them to the same recipients, plus the good accomplished by GiveDirectly purchasing shillings on the foreign exchange market using US dollars and then burning them.  It seems to me to be difficult to arrive at a state of strong evidence about either of the two terms in this sum, with respect to any mechanism of action I've thought of so far.

With respect to the second term in this sum:  GiveDirectly buying shillings on the foreign exchange market and burning them might create wealth, but it's hard to see how you would measure this over the relevant amounts, and no such evidence was cited in the recommendation of GiveDirectly as the #2 charity.

With respect to the first term in this sum:  Under the Bayesian definition of evidence, strong evidence is evidence we are unlikely to see when the theory is false.  Even in the absence of any mechanism whereby printing nominal shillings creates happiness or wealth, we would still expect to find that the wealth and happiness of gift recipients exceeded the wealth of non-recipients.  So measuring that the gift recipients are wealthier and happier is not strong or even medium evidence that printing nominal shillings creates wealth, unless I'm missing something here.  Our posterior that printing shillings and giving them to certain people would create net wealth in any given quantity, should roughly equal our prior, after updating on the stated experimental evidence.
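The Bayesian point here can be made concrete with a two-line calculation (the likelihood numbers are invented purely for illustration):

```python
def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule: P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    num = p_obs_if_true * prior
    return num / (num + p_obs_if_false * (1.0 - prior))

# H: printing shillings and handing them out creates net wealth.
# E: gift recipients measure as wealthier than non-recipients.
# We'd expect E almost regardless of H, so the likelihoods are nearly equal:
print(posterior(0.5, 0.95, 0.90))  # ~0.514: posterior barely moves from the 0.5 prior
# A strong test would be one where E is unlikely if H is false:
print(posterior(0.5, 0.95, 0.05))  # 0.95: a large update
```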

When I posed a shortened form of this question to Holden Karnofsky, he replied (roughly, I am trying to rephrase from memory):

It seems to me that this is a perverse decomposition of the benefit accomplished.  There's no inflation in the shilling because you're buying them, and since this is true, decomposing the benefit into an operation that does inflationary damage as a side effect, and then another operation that makes up for the inflation, is perverse.  It's like criticizing the Against Malaria Foundation based on a hypothetical which involves the mosquito nets being made from the flesh of babies and then adding another effect which saves the lives of other babies.  Since this is a perverse sum involving a strange extra side effect, it's okay that we can't get good estimates involving either of the terms in it.

Please keep in mind that this is Holden's off-the-cuff, non-written in-person response as rephrased by Eliezer Yudkowsky from imperfect memory.

With that said, I've thought about (what I think was) Holden's answer and I feel like I'm still missing something.  I agree that if U.S. dollars were being sent directly to Kenyan recipients and used only to purchase foreign goods, so that foreign goods were being directly sent from the U.S. to Kenyan recipients, then improvement in measured outcome for recipients compared to non-recipients would be an appropriate metric, and that the decomposition would be perverse.  But if the received money, in the form of Kenyan shillings, is being used primarily to purchase Kenyan goods, and causing those goods to be shipped to one villager rather than another while also possibly increasing velocity of trade, remedying inequality, and enabling completely different actors to buy some amount of foreign goods, then I honestly don't understand why this scenario should have the same causal mechanisms as the scenario where foreign goods are being shipped in from outside the country.  And then I honestly don't understand why measured improvements for one Kenyan over another should be a good proxy for aggregate welfare change to the country.

I may be missing something that an economist would find obvious or I may have misunderstood Holden's reply.  But to me, my sum seems like an obvious causal decomposition of the effects in Kenya, neither of whose terms can be estimated well.  I don't understand why I should expect the uncertainty in these two estimates to cancel out when they are added; I don't understand what background causal model yields this conclusion.

To be clear, I personally would guess that the U.S. would be net better off, if the Federal Reserve directly sent everyone in the U.S. with income under \$20K/year a one-time \$6,000 check with the money phasing out at a 10% rate up to \$80K/year.  This is because, in order of importance:

• I buy the analogous market monetarist argument (3) that the U.S. is printing too little money.
• I buy the analogous argument (2) about inequality.
• (However, I also somewhat suspect that some analogous form of (4) is going on with poor people somehow systematically having all but a certain amount of value extracted from them, which is in general how a modern country can have only 2% instead of 95% of the population being farmers, and yet there are still people living hand-to-mouth.  I would worry that a predictable, universal one-time gift of \$6K would not defeat this phenomenon, and that the gift money will just be extracted again somehow.  In the case of Minimum Guaranteed Income, I would worry that the labor share of income will drop proportionally to small amounts of MGI as wages are just bid down by people who can live on less.  Or something.  This would be a much longer discussion and the ideas are much less simple than the above two notions, probably also less important.  I'm just mentioning it again because of my long-term puzzlement with the question "Why are there still poor people after agricultural productivity rose by a factor of 100?")

What I wouldn't say is that my belief in the above is as strong as my belief in, say, the intelligence explosion.  I'd guess that the printing operation would do more good than harm, but it's not what I would call a strong evidence-based conclusion.  If we're going to be okay with that standard of argument generally, then the top charity under that standard of reasoning, generally and evenhandedly applied, ought to work out to some charity that does science and technology research.  (X-risk minimization might seem substantially 'weirder' than that, but the best science-funding charities should be only equally weird.)  And I wouldn't measure the excess of happiness of gift-recipients compared to non-recipients in a pilot program, and call this a good estimate of the net good if a Minimum Guaranteed Income were universally adopted.

So to reiterate, my question to GiveWell is not "Why do you think GiveDirectly might maybe end up doing some good anyway?" but "Does GiveDirectly rise to the standards required for your #2 evidence-based charity?"

## The Robots, AI, and Unemployment Anti-FAQ

25 July 2013 06:46PM

Q.  Are the current high levels of unemployment being caused by advances in Artificial Intelligence automating away human jobs?

A.  Conventional economic theory says this shouldn't happen.  Suppose it costs 2 units of labor to produce a hot dog and 1 unit of labor to produce a bun, and that 30 units of labor are producing 10 hot dogs in 10 buns.  If automation makes it possible to produce a hot dog using 1 unit of labor instead, conventional economics says that some people should shift from making hot dogs to buns, and the new equilibrium should be 15 hot dogs in 15 buns.  On standard economic theory, improved productivity - including from automating away some jobs - should produce increased standards of living, not long-term unemployment.
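The hot-dog arithmetic, as a sketch (the function name is mine):

```python
def matched_pairs(total_labor, hotdog_cost, bun_cost):
    """Hot-dogs-in-buns producible if all labor goes to matched pairs."""
    return total_labor // (hotdog_cost + bun_cost)

print(matched_pairs(30, hotdog_cost=2, bun_cost=1))  # 10 pairs before automation
print(matched_pairs(30, hotdog_cost=1, bun_cost=1))  # 15 pairs after: same labor, more goods
```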

Q.  Sounds like a lovely theory.  As the proverb goes, the tragedy of science is a beautiful theory slain by an ugly fact.  Experiment trumps theory and in reality, unemployment is rising.

A.  Sure.  Except that the happy equilibrium with 15 hot dogs in buns is exactly what happened over the last four centuries, as we went from 95% of the population being farmers to 2% of the population being farmers (in agriculturally self-sufficient developed countries).  We don't live in a world where 93% of the people are unemployed because 93% of the jobs went away.  The naive picture, in which automation removes a job and so the economy has one fewer job, has not been the way the world has worked since the Industrial Revolution.  The parable of the hot dog in the bun is how economies really, actually worked in real life for centuries.  Automation followed by re-employment went on for literally centuries in exactly the way that the standard lovely economic model said it should.  The idea that there's a limited amount of work which is destroyed by automation is known in economics as the "lump of labour fallacy".

Q.  But now people aren't being reemployed.  The jobs that went away in the Great Recession aren't coming back, even as the stock market and corporate profits rise again.

A.  Yes.  And that's a new problem.  We didn't get that when the Model T automobile mechanized the entire horse-and-buggy industry out of existence.  The difficulty with supposing that automation is producing unemployment is that automation isn't new, so how can you use it to explain this new phenomenon of increasing long-term unemployment?


## Public Service Announcement Collection

27 June 2013 05:20PM

P/S/A:  There are single sentences which can create life-changing amounts of difference.

• P/S/A:  If you're not sure whether or not you've ever had an orgasm, it means you haven't had one, a condition known as primary anorgasmia which is 90% treatable by cognitive-behavioral therapy.
• P/S/A:  The people telling you to expect above-trend inflation when the Federal Reserve started printing money a few years back, disagreed with the market forecasts, disagreed with standard economics, turned out to be actually wrong in reality, and were wrong for reasonably fundamental reasons so don't buy gold when they tell you to.
• P/S/A:  There are many many more submissive/masochistic men in the world than there are dominant/sadistic women, so if you are a woman who feels a strong temptation to command men and inflict pain on them, and you want a large harem of men serving your every need, it will suffice to state this fact anywhere on the Internet and you will have fifty applications by the next morning.
• P/S/A:  Most of the personal-finance-advice industry is parasitic and/or self-deluded, and it's generally agreed on by economic theory and experimental measurement that an index fund will deliver the best returns you can get without huge amounts of effort.
• P/S/A:  If you are smart and underemployed, you can very quickly check to see if you are a natural computer programmer by pulling up a page of Python source code and seeing whether it looks like it makes natural sense, and if this is the case you can teach yourself to program very quickly and get a much higher-paying job even without formal credentials.

## How to Write Deep Characters

16 June 2013 02:10AM

Triggered by:  Future Story Status

A helpful key to understanding the art and technique of character in storytelling, is to consider the folk-psychological notion from Internal Family Systems of people being composed of different 'parts' embodying different drives or goals. A shallow character is a character with only one 'part'.

A good rule of thumb is that to create a 3D character, that person must contain at least two different 2D characters who come into conflict. Contrary to the first thought that crosses your mind, three-dimensional good people are constructed by combining at least two different good people with two different ideals, not by combining a good person and a bad person. Deep sympathetic characters have two sympathetic parts in conflict, not a sympathetic part in conflict with an unsympathetic part. Deep smart characters are created by combining at least two different people who are geniuses.

E.g. HPMOR!Hermione contains both a sensible young girl who tries to keep herself and her friends out of trouble, and a starry-eyed heroine, neither of whom are stupid.  (Actually, since HPMOR!Hermione is also the one character who I created as close to her canon self as I could manage - she didn't *need* upgrading - I should credit this one to J. K. Rowling.)  (Admittedly, I didn't actually follow that rule deliberately to construct Methods, I figured it out afterward when everyone was praising the characterization and I was like, "Wait, people are calling me a character author now?  What the hell did I just do right?")

If instead you try to construct a genius character by having an emotionally impoverished 'genius' part in conflict with a warm nongenius part... ugh.  Cliche.  Don't write the first thing that pops into your head from watching Star Trek.  This is not how real geniuses work.  HPMOR!Harry, the primary protagonist, contains so many different people he has to give them names, and none of them are stupid, nor does any one of them contain his emotions set aside in a neat jar; they contain different mixtures of emotions and ideals.  Combining two cliche characters won't be enough to build a deep character.  Combining two different realistic people in that character's situation works much better.  Two is not a limit, it's a minimum, but everyone involved still has to be recognizably the same person when combined.

Closely related is Orson Scott Card's observation that a conflict between Good and Evil can be interesting, but it's often not half as interesting as a conflict between Good and Good. All standard rules about cliches still apply, and a conflict between good and good which you've previously read about and to which the reader can already guess your correct approved answer, cannot carry the story. A good rule of thumb is that if you have a conflict between good and good which you feel unsure about yourself, or which you can remember feeling unsure about, or you're not sure where exactly to draw the line, you can build a story around it. I consider the most successful moral conflict in HPMOR to be the argument between Harry and Dumbledore in Ch. 77 because it almost perfectly divided the readers on who was in the right *and* about whose side the author was taking.  (*This* was done by deliberately following Orson Scott Card's rule, not by accident.  Likewise _Three Worlds Collide_, though it was only afterward that I realized how much of the praise for that story, which I hadn't dreamed would be considered literarily meritful by serious SF writers, stemmed from the sheer rarity of stories built around genuinely open moral arguments.  Orson Scott Card:  "Propaganda only works when the reader feels like you've been absolutely fair to other side", and writing about a moral dilemma where *you're* still trying to figure out the answer is an excellent way to achieve this.)

Character shallowness can be a symptom of moral shallowness if it reflects a conflict between Good and Evil drawn along lines too clear to bring two good parts of a good character into conflict. This is why it would've been hard for Lord of the Rings to contain conflicted characters without becoming an entirely different story, though as Robin Hanson has just remarked, LotR is a Milieu story, not a Character story.  Conflicts between evil and evil are even shallower than conflicts between good and evil, which is why what passes for 'maturity' in some literature is so uninteresting. There's nothing to choose there, no decision to await with bated breath, just an author showing off their disillusionment as a claim of sophistication.

## After critical event W happens, they still won't believe you

38 13 June 2013 09:59PM

In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, "After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z."

Example 1:  "After a 2-year-old mouse is rejuvenated to allow 3 years of additional life, society will realize that human rejuvenation is possible, turn against deathism as the prospect of lifespan / healthspan extension starts to seem real, and demand a huge Manhattan Project to get it done."  (EDIT:  This has not happened, and the hypothetical is mouse healthspan extension, not anything cryonic.  It's being cited because this is Aubrey de Grey's reasoning behind the Methuselah Mouse Prize.)

Alternative projection:  Some media brouhaha.  Lots of bioethicists acting concerned.  Discussion dies off after a week.  Nobody thinks about it afterward.  The rest of society does not reason the same way Aubrey de Grey does.

Example 2:  "As AI gets more sophisticated, everyone will realize that real AI is on the way and then they'll start taking Friendly AI development seriously."

Alternative projection:  As AI gets more sophisticated, the rest of society can't see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them.  The same people who were talking about robot overlords earlier continue to talk about robot overlords.  The same people who were talking about human irreproducibility continue to talk about human specialness.  Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone's previous ideological commitment to a basic income guarantee, inequality reduction, or whatever.  The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before.  If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it's on the way, e.g. Hugo de Garis, Ben Goertzel, etc.

Consider the situation in macroeconomics.  When the Federal Reserve dropped interest rates to nearly zero and started printing money via quantitative easing, we had some people loudly predicting hyperinflation just because the monetary base had, you know, gone up by a factor of 10 or whatever it was.  Which is kind of understandable.  But still, a lot of mainstream economists (such as the Fed) thought we would not get hyperinflation, the implied spread on inflation-protected Treasuries and numerous other indicators showed that the free market thought we were due for below-trend inflation, and then in actual reality we got below-trend inflation.  It's one thing to disagree with economists, another thing to disagree with implied market forecasts (why aren't you betting, if you really believe?) but you can still do it sometimes; but when conventional economics, market forecasts, and reality all agree on something, it's time to shut up and ask the economists how they knew.  I had some credence in inflationary worries before that experience, but not afterward...

So what about the rest of the world?  In the heavily scientific community you live in, or if you read econblogs, you will find that a number of people actually have started to worry less about inflation and more about sub-trend nominal GDP growth.  You will also find that right now these econblogs are having worry-fits about the Fed prematurely exiting QE and choking off the recovery because the elderly senior people with power have updated more slowly than the econblogs.  And in larger society, if you look at what happens when Congresscritters question Bernanke, you will find that they are all terribly, terribly concerned about inflation.  Still.  The same as before.
Some econblogs are very harsh on Bernanke because the Fed did not print enough money, but when I look at the kind of pressure Bernanke was getting from Congress, he starts to look to me like something of a hero just for following conventional macroeconomics as much as he did.

That issue is a hell of a lot more clear-cut than the medical science for human rejuvenation, which in turn is far more clear-cut ethically and policy-wise than issues in AI.

After event W happens, a few more relatively young scientists will see the truth of proposition X, and the larger society won't be able to tell a damn difference.  This won't change the situation very much; there are probably already some scientists who endorse X, since X is probably pretty predictable even today if you're unbiased.  The scientists who see the truth of X won't all rush to endorse Y, any more than current scientists who take X seriously all rush to endorse Y.  As for people in power lining up behind your preferred policy option Z, forget it, they're old and set in their ways and Z is relatively novel without a large existing constituency favoring it.  Expect W to be used as argument fodder to support conventional policy options that already have political force behind them, and for Z to not even be on the table.

## Do Earths with slower economic growth have a better chance at FAI?

30 12 June 2013 07:54PM

I was raised as a good and proper child of the Enlightenment who grew up reading The Incredible Bread Machine and A Step Farther Out, taking for granted that economic growth was a huge in-practice component of human utility (plausibly the majority component if you asked yourself what was the major difference between the 21st century and the Middle Ages) and that the "Small is Beautiful" / "Sustainable Growth" crowds were living in impossible dreamworlds that rejected quantitative thinking in favor of protesting against nuclear power plants.

And so far as I know, such a view would still be an excellent first-order approximation if we were going to carry on into the future by steady technological progress:  Economic growth = good.

But suppose my main-line projection is correct and the "probability of an OK outcome" / "astronomical benefit" scenario essentially comes down to a race between Friendly AI and unFriendly AI.  So far as I can tell, the most likely reason we wouldn't get Friendly AI is the total serial research depth required to develop and implement a strong-enough theory of stable self-improvement with a possible side order of failing to solve the goal transfer problem.  Relative to UFAI, FAI work seems like it would be mathier and more insight-based, where UFAI can more easily cobble together lots of pieces.  This means that UFAI parallelizes better than FAI.  UFAI also probably benefits from brute-force computing power more than FAI.  Both of these imply, so far as I can tell, that slower economic growth is good news for FAI; it lengthens the deadline to UFAI and gives us more time to get the job done.  I have sometimes thought half-jokingly and half-anthropically that I ought to try to find investment scenarios based on a continued Great Stagnation and an indefinite Great Recession where the whole developed world slowly goes the way of Spain, because these scenarios would account for a majority of surviving Everett branches.

Roughly, it seems to me like higher economic growth speeds up time and this is not a good thing.  I wish I had more time, not less, in which to work on FAI; I would prefer worlds in which this research can proceed at a relatively less frenzied pace and still succeed, worlds in which the default timelines to UFAI terminate in 2055 instead of 2035.

I have various cute ideas for things which could improve a country's economic growth.  The chance of these things eventuating seems small, the chance that they eventuate because I write about them seems tiny, and they would be good mainly for entertainment, links from econblogs, and possibly marginally impressing some people.  I was thinking about collecting them into a post called "The Nice Things We Can't Have" based on my prediction that various forces will block, e.g., the all-robotic all-electric car grid which could be relatively trivial to build using present-day technology - that we are too far into the Great Stagnation and the bureaucratic maturity of developed countries to get nice things anymore.  However I have a certain inhibition against trying things that would make everyone worse off if they actually succeeded, even if the probability of success is tiny.  And it's not completely impossible that we'll see some actual experiments with small nation-states in the next few decades, that some of the people doing those experiments will have read Less Wrong, or that successful experiments will spread (if the US ever legalizes robotic cars or tries a city with an all-robotic fleet, it'll be because China or Dubai or New Zealand tried it first).  Other EAs (effective altruists) care much more strongly about economic growth directly and are trying to increase it directly.  (An extremely understandable position which would typically be taken by good and virtuous people).

Throwing out remote, contrived scenarios where something accomplishes the opposite of its intended effect is cheap and meaningless (vide "But what if MIRI accomplishes the opposite of its purpose due to blah") but in this case I feel impelled to ask because my mainline visualization has the Great Stagnation being good news.  I certainly wish that economic growth would align with FAI because then my virtues would align and my optimal policies would have fewer downsides, but I am also aware that wishing does not make something more likely (or less likely) in reality.

To head off some obvious types of bad reasoning in advance:  Yes, higher economic growth frees up resources for effective altruism and thereby increases resources going to FAI, but it also increases resources going to the AI field generally which is mostly pushing UFAI, and the problem arguendo is that UFAI parallelizes more easily.

Similarly, a planet with generally higher economic growth might develop intelligence amplification (IA) technology earlier.  But this general advancement of science will also accelerate UFAI, so you might just be decreasing the amount of FAI research that gets done before IA and decreasing the amount of time available after IA before UFAI.  Similarly to the more mundane idea that increased economic growth will produce more geniuses some of whom can work on FAI; there'd also be more geniuses working on UFAI, and UFAI probably parallelizes better and requires less serial depth of research.  If you concentrate on some single good effect on blah and neglect the corresponding speeding-up of UFAI timelines, you will obviously be able to generate spurious arguments for economic growth having a positive effect on the balance.

So I pose the question:  "Is slower economic growth good news?" or "Do you think Everett branches with 4% or 1% RGDP growth have a better chance of getting FAI before UFAI?"  So far as I can tell, my current mainline guesses imply, "Everett branches with slower economic growth contain more serial depth of cognitive causality and have more effective time left on the clock before they end due to UFAI, which favors FAI research over UFAI research".

This seems like a good parameter to have a grasp on for any number of reasons, and I can't recall it previously being debated in the x-risk / EA community.

EDIT:  To be clear, the idea is not that trying to deliberately slow world economic growth would be a maximally effective use of EA resources and better than current top targets; this seems likely to have very small marginal effects, and many such courses are risky.  The question is whether a good and virtuous person ought to avoid, or alternatively seize, any opportunities which come their way to help out on world economic growth.

EDIT 2:  Carl Shulman's opinion can be found on the Facebook discussion here.

## Tiling Agents for Self-Modifying AI (OPFAI #2)

55 06 June 2013 08:24PM

An early draft of publication #2 in the Open Problems in Friendly AI series is now available:  Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.  ~20,000 words, aimed at mathematicians or the highly mathematically literate.  The research reported on was conducted by Yudkowsky and Herreshoff, substantially refined at the November 2012 MIRI Workshop with Mihaly Barasz and Paul Christiano, and refined further at the April 2013 MIRI Workshop.

Abstract:

We model self-modification in AI by introducing 'tiling' agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring's goals).  Constructing a formalism in the most straightforward way produces a Gödelian difficulty, the Löbian obstacle.  By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed.  We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.
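For readers unfamiliar with the obstacle named in the abstract, the relevant theorem from standard provability logic (not notation taken from the paper itself) can be stated briefly:

```latex
% Löb's theorem: writing \Box\phi for "PA proves \phi",
% if PA proves (\Box\phi \rightarrow \phi), then PA proves \phi:
\[
  \vdash_{\mathrm{PA}} \bigl(\Box\phi \rightarrow \phi\bigr)
  \quad\Longrightarrow\quad
  \vdash_{\mathrm{PA}} \phi
\]
% The Löbian obstacle, informally: an agent reasoning in PA cannot
% prove "whatever my successor proves is true", i.e.
% \Box\phi \rightarrow \phi for arbitrary \phi, without thereby
% being able to prove every \phi outright.
```

This is why the naive approach, in which an agent trusts an offspring running its own proof system, runs into trouble: the required soundness schema is exactly the antecedent that Löb's theorem forbids a consistent theory from proving about itself.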

Commenting here is the preferred venue for discussion of the paper.  This is an early draft and has not been reviewed, so it may contain mathematical errors, and reporting of these will be much appreciated.

The overall agenda of the paper is to introduce the conceptual notion of a self-reproducing decision pattern which includes reproduction of the goal or utility function, by exposing a particular possible problem with a tiling logical decision pattern and coming up with some partial technical solutions.  This then makes it conceptually much clearer to point out the even deeper problems with "We can't yet describe a probabilistic way to do this because of non-monotonicity" and "We don't have a good bounded way to do this because maximization is impossible, satisficing is too weak and Schmidhuber's swapping criterion is underspecified."  The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip.  This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).  Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.
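The point about conditionally independent failure can be made concrete with a toy calculation (the numbers below are purely illustrative, not from the paper): if each self-modification carried even a tiny independent chance of catastrophe, survival probability would decay geometrically with the number of modifications, whereas a sound proof-based criterion contributes no such per-step term.

```python
# Toy illustration (hypothetical numbers): if every self-modification has
# an independent catastrophic-failure probability p, the chance of
# surviving n modifications is (1 - p)**n, which decays geometrically.
def survival_probability(p: float, n: int) -> float:
    """Probability of no catastrophic failure across n independent steps."""
    return (1.0 - p) ** n

if __name__ == "__main__":
    # Even a one-in-a-thousand per-step failure rate compounds badly.
    print(round(survival_probability(0.001, 1000), 3))    # ~0.368, i.e. e^-1
    print(round(survival_probability(0.001, 10_000), 6))  # ~4.5e-05
```

A self-improving agent expecting to make millions of self-modifications therefore cannot afford any appreciable conditionally independent failure probability per step, which is the motivation for proof-like guarantees whose only failure modes trace back to the axioms.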

Thus the content of the paper is very far off from how a realistic AI would work, but conversely, if you can't even answer the kinds of simple problems posed within the paper (both those we partially solve and those we only pose) then you must be very far off from being able to build a stable self-modifying AI.  Being able to say how to build a theoretical device that would play perfect chess given infinite computing power is very far off from the ability to build Deep Blue.  However, if you can't even say how to play perfect chess given infinite computing power, you are confused about the rules of chess or the structure of chess-playing computation in a way that would make it entirely hopeless for you to figure out how to build a bounded chess-player.  Thus "In real life we're always bounded" is no excuse for not being able to solve the much simpler unbounded form of the problem, and being able to describe the infinite chess-player would be substantial and useful conceptual progress compared to not being able to do that.  We can't be absolutely certain that an analogous situation holds between solving the challenges posed in the paper, and realistic self-modifying AIs with stable goal systems, but every line of investigation has to start somewhere.
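The unbounded-chess-player point can be illustrated with generic exhaustive minimax over any finite two-player zero-sum game (a sketch under assumed interfaces, not anything from the paper; the game functions below are hypothetical names chosen for illustration): given unlimited computation, "perfect play" is a few lines, while a bounded player like Deep Blue is an enormous engineering effort.

```python
# "Perfect play given unlimited computation": exhaustive minimax over an
# abstract game, parameterized by hypothetical interface functions
# `moves`, `result`, and `terminal_value`.
def minimax_value(state, moves, result, terminal_value, maximizing=True):
    """Return the value of `state` under perfect play by both sides."""
    legal = moves(state)
    if not legal:  # terminal position: no legal moves remain
        return terminal_value(state, maximizing)
    values = [minimax_value(result(state, m), moves, result,
                            terminal_value, not maximizing)
              for m in legal]
    return max(values) if maximizing else min(values)

# Toy game standing in for chess: single-heap Nim where a player removes
# 1 or 2 stones and whoever takes the last stone wins.
def nim_moves(n):
    return [take for take in (1, 2) if take <= n]

def nim_result(n, take):
    return n - take

def value_of_heap(n):
    # A player facing an empty heap has just lost (opponent took the
    # last stone), so the terminal value is -1 for whoever is to move.
    return minimax_value(
        n, nim_moves, nim_result,
        terminal_value=lambda s, maximizing: -1 if maximizing else 1,
        maximizing=True)
```

Here `value_of_heap(3)` comes out to -1 (heaps divisible by 3 are losses for the player to move), which exhaustive search finds without any Nim-specific insight; the analogous computation for chess is hopelessly intractable, which is exactly the gap between the unbounded solution and a bounded player.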

Parts of the paper will be easier to understand if you've read Highly Advanced Epistemology 101 For Beginners including the parts on correspondence theories of truth (relevant to section 6) and model-theoretic semantics of logic (relevant to 3, 4, and 6), and there are footnotes intended to make the paper somewhat more accessible than usual, but the paper is still essentially aimed at mathematically sophisticated readers.
