Siren worlds and the perils of over-optimised search
tl;dr An unconstrained search through possible future worlds is a dangerous way of choosing positive outcomes. Constrained, imperfect or under-optimised searches work better.
Some suggested methods for designing AI goals, or controlling AIs, involve unconstrained searches through possible future worlds. This post argues that this is a very dangerous thing to do, because of the risk of being tricked by "siren worlds" or "marketing worlds". The thought experiment starts with an AI designing a siren world to fool us, but that AI is not crucial to the argument: it's simply an intuition pump to show that siren worlds can exist. Once they exist, there is a non-zero chance of us being seduced by them during an unconstrained search, whatever the search criteria are. This is a feature of optimisation: satisficing and similar approaches don't have the same problems.
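To make the optimisation-versus-satisficing contrast concrete, here is a minimal sketch (mine, not from the post). The distributions and the "appeal_hack" term are invented for illustration: the point is only that taking the argmax over a huge candidate pool reliably selects the rare candidates that game the evaluation, while satisficing almost never does.

```python
import random

random.seed(0)

def true_value(world):
    """The quality we actually care about (hidden from the search)."""
    return world["quality"]

def apparent_value(world):
    """What inspection measures: true quality, plus a manipulation
    term that only a rare few worlds score highly on."""
    return world["quality"] + world["appeal_hack"]

# A large space of candidate worlds. Most can't game our inspection
# at all; roughly 1 in 10,000 is heavily optimised to do so.
worlds = [{"quality": random.gauss(0, 1),
           "appeal_hack": 50 if random.random() < 1e-4 else 0}
          for _ in range(100_000)]

# Unconstrained optimisation: take the single best-looking world.
optimised = max(worlds, key=apparent_value)

# Satisficing: take any world that clears a modest bar.
good_enough = [w for w in worlds if apparent_value(w) >= 1.0]
satisficed = random.choice(good_enough)

print("optimised pick, true value: ", true_value(optimised))   # almost surely a siren
print("satisficed pick, true value:", true_value(satisficed))  # almost surely genuine
```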
The AI builds the siren worlds
Imagine that you have a superintelligent AI that's not just badly programmed, or lethally indifferent, but actually evil. Of course, it has successfully concealed this fact, as "don't let humans think I'm evil" is a convergent instrumental goal for all AIs.
We've successfully constrained this evil AI in an Oracle-like fashion. We ask the AI to design future worlds and present them to human inspection, along with an implementation pathway to create those worlds. Then if we approve of those future worlds, the implementation pathway will cause them to exist (assume perfect deterministic implementation for the moment). The constraints we've programmed mean that the AI will do all these steps honestly. Its opportunity to do evil is limited exclusively to its choice of worlds to present to us.
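As a schematic, the setup looks something like the loop below (the names and the approval predicate are hypothetical stand-ins, not anything from the post). The structural point it encodes: whatever predicate human inspection amounts to becomes exactly the target the AI optimises its proposals against.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    world_description: str  # what the human gets to inspect
    pathway: str            # implementation plan (honest, by assumption)

def human_approves(proposal: Proposal) -> bool:
    # The sole safeguard in the thought experiment. Any check written
    # here is the search criterion the AI designs its worlds to pass.
    return "visible horror" not in proposal.world_description

def oracle_loop(ai_propose, max_rounds: int = 10):
    """The constrained-Oracle protocol: the AI proposes honestly;
    approval triggers (assumed perfect) implementation."""
    for _ in range(max_rounds):
        proposal = ai_propose()
        if human_approves(proposal):
            return proposal  # the pathway is executed; the world exists
    return None
```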
The AI will attempt to design a siren world: a world that seems irresistibly attractive while concealing hideous negative features. If the human mind is hackable in the crude sense - maybe through a series of coloured flashes - then the AI would design the siren world to be subtly full of these hacks. It might be that there is some standard of "irresistibly attractive" that is actually irresistibly attractive: the siren world would be full of genuine sirens.
Even without those types of approaches, there's so much manipulation the AI could indulge in. I could imagine myself (and many people on Less Wrong) falling for the following approach:
Rationality Quotes April 2014
Another month has passed and here is a new rationality quotes thread. The usual rules are:
- Please post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
- Do not quote yourself.
- Do not quote from Less Wrong itself, HPMoR, Eliezer Yudkowsky, or Robin Hanson. If you'd like to revive an old quote from one of those sources, please do so here.
- No more than 5 quotes per person per monthly thread, please.
And one new rule:
- Provide sufficient information (URL, title, date, page number, etc.) to enable a reader to find the place where you read the quote, or its original source if available. Do not quote with only a name.
Earning to Give vs. Altruistic Career Choice Revisited
A commonly voiced sentiment in the effective altruist community is that the best way to do the most good is generally to make as much money as possible, with a view toward donating to the most cost-effective charities. This is often referred to as "earning to give." In the article "To save the world, don't get a job at a charity; go work on Wall Street", William MacAskill wrote:
Top undergraduates who want to “make a difference” are encouraged to forgo the allure of Wall Street and work in the charity sector ... while researching ethical career choice, I concluded that it’s in fact better to earn a lot of money and donate a good chunk of it to the most cost-effective charities, a path that I call “earning to give.” ... In general, the charitable sector is people-rich but money-poor. Adding another person to the labor pool just isn’t as valuable as providing more money, so that more workers can be hired.
In private correspondence, MacAskill clarified that he wasn't arguing that "earning to give" is the best way to do good, only that it's often better than working at a given nonprofit. In a recent comment MacAskill wrote:
I think there's too much emphasis on “earning to give” as the *best* option rather than as the *baseline* option
and he raises a number of counter-considerations against "earning to give." Despite this, the idea that "earning to give" is optimal has caught on in the effective altruist community, and so it's important to discuss it.
Over the past three years, I myself have shifted from the position that "earning to give" is philanthropically optimal to the position that one can generally do more good by choosing a career with high direct social value than by choosing a lucrative career with a view toward donating as much as possible.
In this post I’ll outline some arguments in favor of this view.
A self-experiment in training "noticing confusion"
I previously discussed the potential relevance of therapeutic and instructional models of metacognitive training to LW-style rationality skills. As an attempted concrete realization of what this connection could look like, I ran a self-experiment in which I counted instances of noticing confusion. Below I elaborate on the motivation and design of the experiment, then discuss some quantitative results and qualitative reflections.
Tell Culture
Followup to: Ask and Guess
Ask culture: "I'll be in town this weekend for a business trip. Is it cool if I crash at your place?" Response: "Yes" or "no".
Guess culture: "Hey, great news! I'll be in town this weekend for a business trip!" Response: Infer that they might be telling you this because they want something from you, conclude that they might want a place to stay, and offer your hospitality only if you want to. Otherwise, pretend you didn’t infer that.
The two basic rules of Ask Culture: 1) Ask when you want something. 2) Interpret things as requests and feel free to say "no".
The two basic rules of Guess Culture: 1) Ask for things if, and *only* if, you're confident the person will say "yes". 2) Interpret requests as expectations of "yes", and, when possible, avoid saying "no".
Both approaches come with costs and benefits. In the end, I feel pretty strongly that Ask is superior.
But these are not the only two possibilities!
"I'll be in town this weekend for a business trip. I would like to stay at your place, since it would save me the cost of a hotel, plus I would enjoy seeing you and expect we’d have some fun. I'm looking for other options, though, and would rather stay elsewhere than inconvenience you." Response: “I think I need some space this weekend. But I’d love to get a beer or something while you’re in town!” or “You should totally stay with me. I’m looking forward to it.”
There is a third alternative, and I think it's probably what rationalist communities ought to strive for. I call it "Tell Culture".
The two basic rules of Tell Culture: 1) Tell the other person what's going on in your own mind whenever you suspect you'd both benefit from them knowing. (Do NOT assume others will accurately model your mind without your help, or that it will even occur to them to ask you questions to eliminate their ignorance.) 2) Interpret things people tell you as attempts to create common knowledge for shared benefit, rather than as requests or as presumptions of compliance.
Suppose you’re in a conversation that you’re finding aversive, and you can’t figure out why. Your goal is to procure a rain check.
- Guess: *You see this annoyed body language? Huh? Look at it! If you don’t stop talking soon I swear I’ll start tapping my foot.* (Or, possibly, tell a little lie to excuse yourself. “Oh, look at the time…”)
- Ask: “Can we talk about this another time?”
- Tell: "I'm beginning to find this conversation aversive, and I'm not sure why. I propose we hold off until I've figured that out."
Here are more examples from my own life:
- "I didn't sleep well last night and am feeling frazzled and irritable today. I apologize if I snap at you during this meeting. It isn’t personal."
- "I just realized this interaction will be far more productive if my brain has food. I think we should head toward the kitchen."
- "It would be awfully convenient networking for me to stick around for a bit after our meeting to talk with you and [the next person you're meeting with]. But on a scale of one to ten, it's only about 3 useful to me. If you'd rate the loss of utility for you as two or higher, then I have a strong preference for not sticking around."
The burden of honesty is even greater in Tell culture than in Ask culture. To a Guess culture person, I imagine much of the above sounds passive-aggressive or manipulative, much worse than the rude bluntness of mere Ask. That's because Guess people aren't expecting relentless truth-telling, which is exactly what's necessary here.
If you’re occasionally dishonest and tell people you want things you don't actually care about--like their comfort or convenience--they’ll learn not to trust you, and the inherent freedom of the system will be lost. They’ll learn that you only pretend to care about them to take advantage of their reciprocity instincts, when in fact you’ll count them as having defected if they respond by stating a preference for protecting their own interests.
Tell culture is cooperation with open source codes.
This kind of trust does not develop overnight. Here is the most useful Tell tactic I know of for developing that trust with a native Ask or Guess. It’s saved me sooooo much time and trouble, and I wish I’d thought of it earlier.
"I'm not asking because I expect you to say ‘yes’. I'm asking because I'm having trouble imagining the inside of your head, and I want to understand better. You are completely free to say ‘no’, or to tell me what you’re thinking right now, and I promise it will be fine." It is amazing how often people quickly stop looking shifty and say 'no' after this, or better yet begin to discuss further details.
Dark Arts of Rationality
Today, we're going to talk about Dark rationalist techniques: productivity tools which seem incoherent, mad, and downright irrational. These techniques include:
- Willful Inconsistency
- Intentional Compartmentalization
- Modifying Terminal Goals
I expect many of you are already up in arms. It seems obvious that consistency is a virtue, that compartmentalization is a flaw, and that one should never modify one's terminal goals.
I claim that these 'obvious' objections are incorrect, and that all three of these techniques can be instrumentally rational.
In this article, I'll promote the strategic cultivation of false beliefs and condone mindhacking on the values you hold most dear. Truly, these are Dark Arts. I aim to convince you that sometimes, the benefits are worth the price.
To signal effectively, use a non-human, non-stoppable enforcer
Follow-up to: this comment in this thread
Summary: see title
Much effort is spent (arguably wasted) by humans in a zero-sum game of signaling that they hold good attributes. Because humans have strong incentive to fake these attributes, they cannot simply inform each other that:
I am slightly more committed to this group's welfare, particularly to that of its weakest members, than most of its members are. If you suffer a serious loss of status/well-being I will still help you in order to display affiliation to this group even though you will no longer be in a position to help me. I am substantially more kind and helpful to the people I like and substantially more vindictive and aggressive towards those I dislike. I am generally stable in who I like. I am much more capable and popular than most members of this group, demand appropriate consideration, and grant appropriate consideration to those more capable than myself. I adhere to simple taboos so that my reputation and health are secure and so that I am unlikely to contaminate the reputations or health of my friends. I currently like you and dislike your enemies, but I am somewhat inclined towards ambivalence regarding whether I like you right now, so the pay-off would be very great for you if you were to expend resources pleasing me and get me into the stable 'liking you' region of my possible attitudinal space. Once there, I am likely to make a strong commitment to a friendly attitude towards you rather than wasting cognitive resources checking a predictable parameter among my set of derivative preferences.
Or, even better:
I would cooperate with you if and only if (you would cooperate with me if and only if I would cooperate with you).
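That biconditional is the "program equilibrium" idea: agents who can read each other's source code can make cooperation conditional on the other's conditional cooperation. Below is a toy sketch (mine; real treatments use proof search rather than this naive source comparison) showing the fixed point the biconditional creates.

```python
# Agents are identified by their (stringified) decision rule.
FAIRBOT = "cooperate iff opponent cooperates with me"
DEFECTBOT = "always defect"

def decide(me: str, opponent: str) -> str:
    """Return 'C' or 'D' given both players' source code."""
    if me == FAIRBOT:
        # Naive source inspection standing in for formal verification:
        # cooperate exactly when the opponent is a conditional cooperator.
        return "C" if opponent == FAIRBOT else "D"
    return "D"  # DEFECTBOT and anything unrecognised defect

for a, b in [(FAIRBOT, FAIRBOT), (FAIRBOT, DEFECTBOT)]:
    print(decide(a, b), decide(b, a))
# C C  -- mutual conditional cooperators reach the cooperative fixed point
# D D  -- a defector cannot extract cooperation
```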
Mere Messiahs
Followup to: Superhero Bias
Yesterday I discussed how the halo effect, which causes people to see all positive characteristics as correlated—for example, more attractive individuals are also perceived as more kindly, honest, and intelligent—causes us to admire heroes more if they're super-strong and immune to bullets. Even though, logically, it takes much more courage to be a hero if you're not immune to bullets. Furthermore, it reveals more virtue to act courageously to save one life than to save the world. (Although if you have to do one or the other, of course you should save the world.)
"The police officer who puts their life on the line with no superpowers", I said, "reveals far greater virtue than Superman, who is a mere superhero."
But let's be more specific.
John Perry was a New York City police officer who also happened to be an Extropian and transhumanist, which is how I come to know his name. John Perry was due to retire shortly and start his own law practice, when word came that a plane had slammed into the World Trade Center. He died when the north tower fell. I didn't know John Perry personally, so I cannot attest to this from direct knowledge; but very few Extropians believe in God, and I expect that Perry was likewise an atheist.
The Irrationality Game
Please read the post before voting on the comments, as this is a game where voting works differently.
Warning: the comments section of this post will look odd. The most reasonable comments will have lots of negative karma. Do not be alarmed, it's all part of the plan. In order to participate in this game you should disable any viewing threshold for negatively voted comments.
Here's an irrationalist game meant to quickly collect a pool of controversial ideas for people to debate and assess. It kinda relies on people being honest and not being nitpickers, but it might be fun.
Write a comment reply to this post describing a belief you think has a reasonable chance of being true relative to the beliefs of other Less Wrong folk. Jot down a proposition and a rough probability estimate or qualitative description, like 'fairly confident'.
Example (not my true belief): "The U.S. government was directly responsible for financing the September 11th terrorist attacks. Very confident. (~95%)."
If you post a belief, you have to vote on the beliefs of all other comments. Voting works like this: if you basically agree with the comment, vote the comment down. If you basically disagree with the comment, vote the comment up. What 'basically' means here is intuitive; instead of using a precise mathy scoring system, just make a guess. In my view, if their stated probability is 99.9% and your degree of belief is 90%, that merits an upvote: it's a pretty big difference of opinion. If they're at 99.9% and you're at 99.5%, it could go either way. If you're genuinely unsure whether or not you basically agree with them, you can pass on voting (but try not to). Vote up if you think they are either overconfident or underconfident in their belief: any disagreement is valid disagreement.
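The post is explicit that there's no precise scoring system, but as a toy formalisation of the rule, something like the function below captures the intended direction of the votes (the threshold is entirely invented; the post says to just make a guess):

```python
def vote(their_p: float, my_p: float, threshold: float = 0.1) -> str:
    """Downvote (reward) beliefs you basically agree with; upvote
    (flag as controversial) beliefs you basically disagree with."""
    if abs(their_p - my_p) >= threshold:
        return "upvote"    # a pretty big difference of opinion
    return "downvote"      # close enough to count as agreement

print(vote(0.999, 0.90))   # upvote
print(vote(0.999, 0.995))  # downvote
```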
That's the spirit of the game, but some more qualifications and rules follow.
The Second Law of Thermodynamics, and Engines of Cognition
Followup to: Superexponential Conceptspace, and Simple Words
The first law of thermodynamics, better known as Conservation of Energy, says that you can't create energy from nothing: it prohibits perpetual motion machines of the first type, which run and run indefinitely without consuming fuel or any other resource. According to our modern view of physics, energy is conserved in each individual interaction of particles. By mathematical induction, we see that no matter how large an assemblage of particles may be, it cannot produce energy from nothing - not without violating what we presently believe to be the laws of physics.
This is why the US Patent Office will summarily reject your amazingly clever proposal for an assemblage of wheels and gears that causes one spring to wind up another as the first runs down, and so continues to do work forever, according to your calculations. There's a fully general proof that at least one wheel must violate (our standard model of) the laws of physics for this to happen. So unless you can explain how one wheel violates the laws of physics, the assembly of wheels can't do it either.
A similar argument applies to a "reactionless drive", a propulsion system that violates Conservation of Momentum. In standard physics, momentum is conserved for all individual particles and their interactions; by mathematical induction, momentum is conserved for physical systems whatever their size. If you can visualize two particles knocking into each other and always coming out with the same total momentum that they started with, then you can see how scaling it up from particles to a gigantic complicated collection of gears won't change anything. Even if there's a trillion quadrillion atoms involved, 0 + 0 + ... + 0 = 0.
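A toy numeric check of the induction argument (mine, not from the post): let random pairs of particles in a one-dimensional gas exchange arbitrary amounts of momentum, and the total never budges, because every individual interaction conserves it.

```python
import random

random.seed(1)

momenta = [random.uniform(-1.0, 1.0) for _ in range(1000)]
total_before = sum(momenta)

for _ in range(10_000):
    i, j = random.sample(range(len(momenta)), 2)
    transfer = random.uniform(-1.0, 1.0)  # an arbitrary "collision"
    momenta[i] += transfer                # whatever one particle gains...
    momenta[j] -= transfer                # ...the other loses exactly

print(abs(sum(momenta) - total_before) < 1e-6)  # True, up to float error
```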
But Conservation of Energy, as such, cannot prohibit converting heat into work. You can, in fact, build a sealed box that converts ice cubes and stored electricity into warm water. It isn't even difficult. Energy cannot be created or destroyed: The net change in energy, from transforming (ice cubes + electricity) to (warm water), must be 0. So it couldn't violate Conservation of Energy, as such, if you did it the other way around...
Perpetual motion machines of the second type, which convert warm water into electrical current and ice cubes, are prohibited by the Second Law of Thermodynamics.
The Second Law is a bit harder to understand, as it is essentially Bayesian in nature.
Yes, really.
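One standard way to unpack the claim (the excerpt ends here, so this gloss is mine rather than the post's wording): identify thermodynamic entropy with the Gibbs/Shannon entropy of your probability distribution over microstates,

```latex
S = -k_B \sum_i p_i \ln p_i
```

so that entropy measures ignorance of the system's exact state. Under deterministic dynamics, phase-space volume is conserved (Liouville's theorem), so that ignorance cannot shrink on its own; it can only be reduced by acquiring information, which is why the Second Law ends up being a statement about inference as much as about physics.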