On 'Why Global Poverty?' and Arguments from Unobservable Impacts

8 Gram_Stone 13 February 2016 06:04AM

Related: Is Molecular Nanotechnology "Scientific"?

For context, Jeff Kaufman delivered a speech on effective altruism and cause prioritization at EA Global 2015 entitled 'Why Global Poverty?', which he has transcribed and made available here. It's certainly worth reading.

I was dissatisfied with this speech in some ways. For the sake of transparency and charity, I will say that Kaufman has written a disclaimer explaining that, because of a miscommunication, he wrote this speech in the span of two hours immediately before he delivered it (instead of eating lunch, I would like to add), and that even after writing the text version, he is not entirely satisfied with the result.

I'm not that familiar with the EA community, but I predict that debates about cause prioritization, especially when existential risk mitigation is among the causes being discussed, can become mind-killed extremely quickly. And I don't mean to convey that in the tone of a wise outsider; it makes sense, considering the stakes at hand and the eschatological undertones of existential risk. (That is to say, the phrase 'save the world' can be sobering or gross, depending on the individual.) So, as is always implicit, but is sometimes worth making explicit: I'm criticizing some arguments as I understand them, not any person. I write this precisely because rationality is a common interest of many causes. I'll be focusing on the part about existential risk, as well as the parts it depends on. Lastly, I'd be interested to know if anyone else has criticized this speech in writing or come to conclusions similar to mine. Without further ado:

Jeff Kaufman's explanation of EA and why it makes sense is boilerplate; I agree with it, naturally. I also agree that certain existential risk mitigation strategies are comparatively less neglected by national governments, and thus that risks like these are considerably less likely to be where one can make one's most valuable marginal donation. E.g., there are people who are paid to record and predict the trajectories of celestial objects, celestial mechanics is well understood, and an impact event in the next two centuries is, with high meta-confidence, far less probable than many other risks. If you have to choose a cause from the category of existential risk mitigation, you probably shouldn't choose asteroid impact mitigation. The same goes for most natural (non-anthropogenic) risks.

The next few parts are worth looking at in detail, however:

At the other end we have risks like the development of an artificial intelligence that destroys us through its indifference. Very few people are working on this, there's low funding, and we don't have much understanding of the problem. Neglectedness is a strong heuristic for finding causes where your contribution can go far, and this does seem relatively neglected. The main question for me, though, is how do you know if you're making progress?

Everything before the question seems accurate to me. Furthermore, if I interpret the question correctly, what's implied is a difference between the observable consequences of global poverty mitigation and those of existential risk mitigation. I think the implied difference is fair: you can see the malaria evaporating, but you only get one chance to build a superintelligence right. (It's worth saying that AI risk is also the example that Kaufman uses in his explanation.)

However, I don't think that this necessarily implies that we can't have some confidence that we're actually mitigating existential risks. This is clear if we dissolve the question. What are the disguised queries behind the question 'How do you know if you're making progress?'

If your disguised query is 'Can I observe the consequences of my interventions and update my beliefs and correct my actions accordingly?', then in the case of existential risks the answer is "No", at least not in the sense of a traditional experiment.

If your disguised query is 'Can I have confidence in the effects of my interventions without observing their consequences?', then that seems like a different, much more complicated question that is both interesting and worth examining further. I'll expand on this conceivably more controversial bit later, so that it doesn't seem like I'm being uncharitable or quoting out of context. Kaufman continues:

First, a brief digression into feedback loops. People succeed when they have good feedback loops. Otherwise they tend to go in random directions. This is a problem for charity in general, because we're buying things for others instead of for ourselves. If I buy something and it's no good I can complain to the shop, buy from a different shop, or give them a bad review. If I buy you something and it's no good, your options are much more limited. Perhaps it failed to arrive but you never even knew you were supposed to get it? Or it arrived and was much smaller than I intended, but how do you know. Even if you do know that what you got is wrong, chances are you're not really in a position to have your concerns taken seriously.

This is a big problem, and there are a few ways around this. We can include the people we're trying to help much more in the process instead of just showing up with things we expect them to want. We can give people money instead of stuff so they can choose the things they most need. We can run experiments to see which ways of helping people work best. Since we care about actually helping people instead of just feeling good about ourselves, we not only can do these things, we need to do them. We need to set up feedback loops where we only think we're helping if we're actually helping.

Back to AI risk. The problem is we really really don't know how to make good feedback loops here. We can theorize that an AI needs certain properties not to just kill us all, and that in order to have those properties it would be useful to have certain theorems proved, and go work on those theorems. And maybe we have some success at this, and the mathematical community thinks highly of us instead of dismissing our work. But if our reasoning about what math would be useful is off there's no way for us to find out. Everything will still seem like it's going well.

I think I get where Kaufman is coming from on this. First, I'm going to use an analogy to convey what I believe to be the commonly used definition of the phrase 'feedback loop'.

If you're an entrepreneur, you want your beliefs about which business strategies will be successful to be entangled with reality. You also have a short financial runway, so you need to decide quickly, which means that you have to obtain your evidence quickly if you want your beliefs to be entangled in time for it to matter. So immediately after you affect the world, you look at it to see what happened and update on it. And this is virtuous.

And of course, people are notoriously bad at remaining entangled with reality when they don't look at it. And this seems like an implicit deficiency in any existential risk mitigation intervention; you can't test the effectiveness of your intervention. You succeed or fail, one time.

Next, let's taboo the phrase 'feedback loop'.

So, it seems like there's a big difference between first handing out insecticidal bed nets and then looking to see whether or not the malaria incidence goes down, and paying some mathematicians to think about AI risk. When the AI researchers 'make progress', where can you look? What in the world is different because they thought instead of not, beyond the existence of an academic paper?

But a big part of this rationality thing is knowing that you can arrive at true beliefs by correct reasoning, and not just by waiting for the answer to smack you in the face.

And I would argue that any altruist does the same thing when they have to choose between causes before they can make observations. There were a million other things that the founders of the Against Malaria Foundation could have done, but they took the risk of betting on bed net distribution, even though they had yet to see it actually work.

In fact, AI risk is not that different from this; you can imagine it as a variant where you have to predict much further into the future, the stakes are higher, and you don't get a second try after you observe the effect of your intervention.

And if you imagine a world where a global authoritarian regime involuntarily reads its citizens' minds as a matter of course, where it is lawful that anyone who identifies as an EA is to be put in an underground chamber and given a minimum income that they may donate as they please, and where they are allowed to reason on their prior knowledge only, never being permitted to observe the consequences of their donations, then I bet that EAs would not say, "I have no feedback loop and I therefore cannot decide between any of these alternatives."

Rather, I bet that they would say, "I will never be able to look at the world and see the effects of my actions at a time that affects my decision-making, but this is my best educated guess of what the best thing I can do is, and it's sure as hell better than doing nothing. Yea, my decision is merely rational."

You want observational consequences because they give you confidence in your ability to make predictions. But you can make accurate predictions without being able to observe the consequences of your actions, and without just getting lucky, and sometimes you have to.

But in reality we're not deciding between donating something and donating nothing; we're choosing between charitable causes. Still, I don't think that the fact that some interventions are less predictable should make us consider the risk more negligible or its prevention less valuable. Above choosing causes where the effects of interventions are predictable, don't we want to choose the most valuable causes? A bias toward causes with consistently, predictably, immediately effective interventions shouldn't completely dominate our decision-making even when an alternative cause is less predictable to intervene on but would yield extremely high utility if the intervention succeeded.
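To put hypothetical numbers on that (these are invented purely for illustration, not estimates of any real cause):

```python
# Toy expected-value comparison.  A predictable intervention with a modest
# payoff versus an unpredictable one with an enormous payoff; all numbers
# are made up for the sake of the example.

predictable_ev   = 0.95  * 1_000        # 95% chance of a small, visible win
unpredictable_ev = 0.001 * 10_000_000   # 0.1% chance of a huge win

print(predictable_ev, unpredictable_ev)  # 950.0 vs 10000.0: the hard-to-verify
                                         # cause can dominate in expectation
```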

To illustrate, imagine that you are at some point on a long road, truly in the middle of nowhere, and you see a man whose car has a flat tire. You know that someone else may not drive by for hours, and you don't know how well-prepared the man is for that eventuality. You consider stopping your car to help; you have a spare, you know how to change tires, and you've seen it work before. And if you don't do it right the first time for some weird reason, you can always try again.

But suddenly, you notice that there is a person lying motionless on the ground, some ways down the road; far, but visible. There's no cellphone service, it would take an ambulance hours to get here unless they happened to be driving by, and you have no medical training or experience.

I don't know about you, but even if I'm having an extremely hard time thinking of things to do about a guy dying on my watch in the middle of nowhere, the last thing I do is say, "I have no idea what to do if I try to save that guy, but I know exactly how to change a tire, so why don't I just change the tire instead." Because even if I don't know what to do, saving a life is so much more important than changing a tire that I don't care about the uncertainty. And maybe if I went and actually tried saving his life, even if I wasn't sure how to go about it, it would turn out that I would find a way, or that he needed help, but he wasn't about to die immediately, or that he was perfectly fine all along. And I never would've known if I'd changed a tire and driven in the opposite direction.

And this doesn't mean that anything goes in the strategy space. I'm not going to come up with a new religion on the spot that contains a prophetic vision that this man will survive his medical emergency, nor am I going to try setting him on fire. There are things that I can know will obviously not work without trying them out. And that can be built on with other ideas that are not-obviously-wrong-but-may-turn-out-to-be-wrong-later. It's great to have an idea of what you can know is wrong even when you can't try anything, because not being able to try more than once is precisely the problem.

If we stop talking about what rational thinking feels like, and just start talking about rational thinking with the usual words, then what I'm getting at is that, in reality, there is an inside view of the AI risk arguments. You can always talk about confidence levels from outside an argument, but it helps to go into the details of the inside view, to see where our uncertainty about various assertions is greatest. Otherwise, where is your outside estimate even coming from, besides impression?

We can't run an experiment to see if the mathematics of self-reference, for example, is a useful thing to flesh out before trying to solve the larger problem of AI risk, but there are convincing reasons that it is. And sometimes that's all you have at the time.

And if you ever ask me, "Why does your uncertainty bottom out here?", then I'll ask you "Why does your uncertainty bottom out there?" Because it bottoms out somewhere, even if it's at the level of "I know that I know nothing," or some other similarly useless sentiment. And it's okay.

But I will say that this state of affairs is not optimal. It would be nice if we could be more confident about our reasoning in situations where we aren't able to make predictions, and then perform interventions, and then make observations that we can update on, and then try again. It's great to have medical training in the middle of nowhere.

And I will also say that I imagine Kaufman is not claiming that donating to existential risk mitigation is a fundamentally bad idea forever, but that it just doesn't seem like a good idea right now, because we don't know enough about when we should be confident in predictions that we can't test before we have to take action.

But if you know you're confused about how to determine the impact of interventions intended to mitigate existential risks, it's almost as if you should consider trying to figure out that problem itself. If you could crack the problem of mitigating existential risks, it would blow global poverty out of the water. And the problem doesn't immediately seem intractable.

In fact, it's almost as if the cause you should choose is research on existential risk strategy (a subset of cause prioritization). And if you were to write a speech about it, it seems like it would be a good idea to make it really clear that that's probably very impactful, because value of information counts.

And so, where you read a speech entitled 'Why Global Poverty?', I read a speech entitled 'Why Existential Risk Strategy Research?'

Is Molecular Nanotechnology "Scientific"?

22 Eliezer_Yudkowsky 20 August 2007 04:11AM

Prerequisite / Read this first:  Scientific Evidence, Legal Evidence, Rational Evidence

Consider the statement "It is physically possible to construct diamondoid nanomachines which repair biological cells."  Some people will tell you that molecular nanotechnology is "pseudoscience" because it has not been verified by experiment - no one has ever seen a nanofactory, so how can believing in their possibility be scientific?

Drexler, I think, would reply that his extrapolations of diamondoid nanomachines are based on standard physics, which is to say, scientific generalizations. Therefore, if you say that nanomachines cannot work, you must be inventing new physics.  Or to put it more sharply:  If you say that a simulation of a molecular gear is inaccurate, if you claim that atoms thus configured would behave differently from depicted, then either you know a flaw in the simulation algorithm or you're inventing your own laws of physics.

continue reading »

Scientific Evidence, Legal Evidence, Rational Evidence

37 Eliezer_Yudkowsky 19 August 2007 05:36AM

Suppose that your good friend, the police commissioner, tells you in strictest confidence that the crime kingpin of your city is Wulky Wilkinsen.  As a rationalist, are you licensed to believe this statement?  Put it this way: if you go ahead and mess around with Wulky's teenage daughter, I'd call you foolhardy.  Since it is prudent to act as if Wulky has a substantially higher-than-default probability of being a crime boss, the police commissioner's statement must have been strong Bayesian evidence.

Our legal system will not imprison Wulky on the basis of the police commissioner's statement.  It is not admissible as legal evidence.  Maybe if you locked up every person accused of being a crime boss by a police commissioner, you'd initially catch a lot of crime bosses, plus some people that a police commissioner didn't like.  Power tends to corrupt: over time, you'd catch fewer and fewer real crime bosses (who would go to greater lengths to ensure anonymity) and more and more innocent victims (unrestrained power attracts corruption like honey attracts flies).

This does not mean that the police commissioner's statement is not rational evidence.  It still has a lopsided likelihood ratio, and you'd still be a fool to mess with Wulky's teenage daughter.  But on a social level, in pursuit of a social goal, we deliberately define "legal evidence" to include only particular kinds of evidence, such as the police commissioner's own observations on the night of April 4th.  All legal evidence should ideally be rational evidence, but not the other way around.  We impose special, strong, additional standards before we anoint rational evidence as "legal evidence".
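To put toy numbers on that "lopsided likelihood ratio" (the figures are invented for illustration, not drawn from the post):

```python
# Bayesian update on the commissioner's statement, with made-up numbers:
# a 1-in-1000 prior that a given named person is the kingpin, and a
# commissioner 50x more likely to name a true kingpin than an innocent.

prior_odds = 1 / 999       # P(kingpin) : P(not kingpin)
likelihood_ratio = 50      # P(statement | kingpin) / P(statement | innocent)

posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)

print(f"{posterior:.1%}")  # ~4.8%: nowhere near proof beyond reasonable doubt,
                           # but plenty to make messing with Wulky's daughter
                           # foolhardy
```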

As I write this sentence at 8:33pm, Pacific time, on August 18th 2007, I am wearing white socks.  As a rationalist, are you licensed to believe the previous statement?  Yes.  Could I testify to it in court?  Yes.  Is it a scientific statement?  No, because there is no experiment you can perform yourself to verify it.  Science is made up of generalizations which apply to many particular instances, so that you can run new real-world experiments which test the generalization, and thereby verify for yourself that the generalization is true, without having to trust anyone's authority.  Science is the publicly reproducible knowledge of humankind.

Like a court system, science as a social process is made up of fallible humans.  We want a protected pool of beliefs that are especially reliable.  And we want social rules that encourage the generation of such knowledge.  So we impose special, strong, additional standards before we canonize rational knowledge as "scientific knowledge", adding it to the protected belief pool.

continue reading »

Outreach Thread

6 Gleb_Tsipursky 06 March 2016 10:18PM

Based on an earlier suggestion, here's an outreach thread where you can leave comments about any recent outreach that you have done to convey rationality-style ideas broadly. The goal of having this thread is to organize information about outreach and provide community support and recognition for raising the sanity waterline. Likewise, doing so can help inspire others to emulate some aspects of these good deeds through social proof and network effects.

Cryptographic Boxes for Unfriendly AI

24 paulfchristiano 18 December 2010 08:28AM

Related to: Shut up and do the impossible!; Everything about an AI in a box.

One solution to the problem of friendliness is to develop a self-improving, unfriendly AI, put it in a box, and ask it to make a friendly AI for us.  This gets around the incredible difficulty of developing a friendly AI, but it creates a new, apparently equally impossible problem. How do you design a box strong enough to hold a superintelligence?  Let's suppose, optimistically, that researchers on friendly AI have developed some notion of a certifiably friendly AI: a class of optimization processes whose behavior we can automatically verify will be friendly. Now the problem is designing a box strong enough to hold an unfriendly AI until it modifies itself to be certifiably friendly (of course, it may have to make itself smarter first, and it may need to learn a lot about the world to succeed).

Edit: Many people have correctly pointed out that certifying friendliness is probably incredibly difficult. I personally believe it is likely to be significantly easier than actually finding an FAI, even if current approaches are more likely to find FAI first. But this isn't really the core of the article. I am describing a general technique for quarantining potentially dangerous and extraordinarily sophisticated code, at great expense. In particular, if we developed uFAI before having any notion of certifiable friendliness, then we could still use this technique to try to use the uFAI in a very limited way. It allows us to quarantine an AI and force everything it tries to say to us through an arbitrarily complicated filter. So, for example, if we are interested in solving some problem with a unique and verifiably correct answer, then we can ask the AI for this answer and only see the output if it passes the verification. We can also use a quarantined AI in more useful ways, at the expense of introducing some limited channels of influence (for example, if we ask for the solution to a problem which may have several solutions, we allow the AI to influence us by choosing which solution to provide).

Minimally, I hope this scheme at least suggests that putting an AI in a box may not be catastrophic. What to do with a boxed AI is another question. /Edit.

Concretely, suppose someone comes to you with the design of a recursively self-improving AI---call it Hal.  Unfortunately, Hal is not certifiably friendly.  Maybe it is friendly but not provably so, but more likely it is ambivalent to the human condition and is likely to destroy the universe if you just tell it to make a friendly AI.  Nevertheless, you think Hal is smart enough to solve the friendly AI problem for you.  All you need to do is give it access to all of the information about the world it wants and a couple of hours to run; after that, it should output a modified version of itself which is certifiably friendly.

You almost run Hal and ask it to design a friendly AI for you. After all, if you just run Hal and check to see if it outputs a certifiably friendly AI, what can it do other than refuse?  But after some more thought you realize this is overly optimistic.  Maybe you can avoid giving it an output channel that a human will have to read, thereby eliminating the most obvious failure mode of a boxed AI, but you still have to give it access to a computer to run on.  Computation produces electromagnetic radiation, heat, etc., and it is honestly hard to say what a superintelligence could do with such a channel.

We seem to be faced with an impossible problem.  We need to ask Hal to design us a certifiably friendly AI, but we also need to give him absolutely no ability to interact with the world destructively.

There is one field in particular which has a good history of solving obviously impossible problems: computational cryptography.  We want to implement the following obviously impossible device.  A cage into which we can place Hal so that he can think and examine all of the data we give him, but from which he cannot interact with his environment.  In the cage with him is a tamper-proof arbiter implementing our friendliness test.  Hal may give the arbiter the code to a single AI; if the arbiter confirms that this AI is certifiably friendly it will kill Hal, replace him with the friendly AI, and then allow the friendly AI to send a single message out of the cage.

To implement this impossible device we co-opt an important cryptographic primitive called public key homomorphic encryption, typically used for the impossible problem of private delegated computation.  To get a basic idea you can see the Wikipedia article on homomorphic encryption. I'll give the brief description here (you can skip the next two paragraphs if you are familiar with homomorphic encryption).

A public key encryption scheme has the following properties.  Anyone can encrypt a message using a publicly published key, but you cannot efficiently distinguish the encryptions of two different messages unless you know something about the secret key.  A fully homomorphic encryption scheme has the additional counterintuitive property that, although you cannot learn anything about the encrypted data without the secret key, you can still perform arbitrary operations on it.  In particular, if you have a homomorphic encryption of the source code of a program you can run that program without learning anything about it (except how long the source is), but the output of the program will be homomorphically encrypted.  All of this can be done using only the public key.
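To make the interface concrete, here is a toy mock in Python. To be clear, this is not cryptography at all; it is a hypothetical stand-in that only shows the shape of the API (keygen / encrypt / evaluate / decrypt) and the key property that evaluation computes on ciphertexts using only the public key:

```python
import secrets

def keygen():
    sk = secrets.token_hex(16)   # secret key
    pk = "pk-for-" + sk          # public key (toy derivation, insecure)
    return pk, sk

def encrypt(pk, m):
    # Real FHE would hide m from anyone without sk; this mock does not.
    return {"ciphertext": m}

def evaluate(pk, f, c):
    # The homomorphic property: compute f on the encrypted data using only
    # the public key, producing an encryption of f(m).
    return {"ciphertext": f(c["ciphertext"])}

def decrypt(sk, c):
    # Only the secret key holder learns anything about the result.
    return c["ciphertext"]

pk, sk = keygen()
c = encrypt(pk, 6)
c2 = evaluate(pk, lambda x: x * 7, c)   # multiplied without "seeing" the 6
assert decrypt(sk, c2) == 42
```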

The existence of a fully homomorphic encryption scheme was completely open until recently, when it was shown by Craig Gentry that such a scheme exists under the assumption that a certain problem on lattices is computationally intractable.  Known homomorphic encryption schemes are very inefficient and based on new assumptions, but we can realistically expect improvements in the coming years and decades both in speed and the strength of the necessary assumptions.

If you run a homomorphically encrypted AI, sufficiently isolated from the private key, you get the very strong guarantee that the AI's execution has no computationally detectable effect on the rest of the world.  If the AI were able to affect the world in a computationally detectable way, then we would have successfully broken the homomorphic encryption scheme by distinguishing the encrypted execution of the AI from the encrypted execution of some benign program.

Now we need to give the arbiter some means to allow a genuinely friendly AI access to the rest of the world, without giving Hal any illegitimate access.  We allow Hal access to an output channel whose outputs are homomorphically encrypted.  After some time, we destroy Hal and then homomorphically check to see if he has output the code of a certifiably friendly AI over the channel (we have to decode a single homomorphically encrypted bit to perform this check).  If he hasn't, then we do nothing---apparently Hal would rather not give us a friendly AI.  If he has output the code of a certifiably friendly AI, we homomorphically run it, obtain a homomorphic encryption of its output, and then decrypt only this output.
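Putting the pieces together, here is a sketch of the whole protocol in terms of the mock interface above. Everything specific in it (how Hal is stepped, what the arbiter checks, the friendliness certifier) is an invented stand-in, since the post leaves that machinery abstract:

```python
def quarantine(pk, sk, hal_source, budget, is_certifiably_friendly):
    # 1. Encrypt Hal along with his (empty) output channel.
    state = encrypt(pk, {"src": hal_source, "output": None})

    # 2. Run Hal entirely under encryption for a fixed budget; nothing he
    #    computes can have a computationally detectable effect outside.
    def step(s):
        # Stand-in for one step of interpreting hal_source.  Here "Hal"
        # simply writes a successor AI to his output channel.
        s["output"] = "certifiably friendly successor of " + s["src"]
        return s
    for _ in range(budget):
        state = evaluate(pk, step, state)

    # 3. Destroy Hal; homomorphically apply the arbiter's check, then
    #    decrypt exactly ONE bit: pass or fail.
    verdict = evaluate(pk, lambda s: is_certifiably_friendly(s["output"]), state)
    if not decrypt(sk, verdict):
        return None   # Hal declined; we learn nothing else about the run.

    # 4. Only a certified-friendly output ever gets decrypted and read.
    return decrypt(sk, evaluate(pk, lambda s: s["output"], state))

# Toy certifier (a stand-in for the genuinely hard part of the problem).
friendly = lambda ai: ai is not None and "friendly" in ai
print(quarantine(pk, sk, "hal-9000", budget=3, is_certifiably_friendly=friendly))
```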

That's the entire scheme.  Whenever you have some code whose execution itself might do incredible damage, this precaution allows you to execute it in perfect safety (the main caveat is that you need an automatic way to screen the output which doesn't give the algorithm enough room to possibly do harm).

Note: If you are really exceptionally paranoid you may be able to obtain even better guarantees by developing more powerful cryptographic techniques tailored for this application.  In particular, the secret key used by the encryption procedure may be leaked to a sufficiently powerful AI through a side channel (the computations involving the key typically leak some data in the form of radiation, which might be recovered from sufficiently exhaustive sensor data by a sufficiently powerful AI).  By doing these computations on a quantum computer you can probably guarantee that no data was leaked. If you want to be even more fantastically paranoid (i.e., if you want a formal proof of security, which is a rare treat in real problems) you will actually have to shred the key, guaranteeing that an AI will have to interact with the shredded key to recover it. If you do this, you have to use another new primitive to implement the final revelation of the friendly AI's message.

If you have any concerns about the security of this protocol, I would be happy to try and defend it and would be surprised and interested if I failed. Whether or not such a cryptographic box is really an interesting or important object is another question. (It is interesting to me as an unexpected application of cryptography).

Reply to Holden on 'Tool AI'

94 Eliezer_Yudkowsky 12 June 2012 06:00PM

I begin by thanking Holden Karnofsky of GiveWell for his rare gift of his detailed, engaged, and helpfully-meant critical article Thoughts on the Singularity Institute (SI). In this reply I will engage with only one of the many subjects raised therein, the topic of, as I would term them, non-self-modifying planning Oracles, a.k.a. 'Google Maps AGI' a.k.a. 'tool AI', this being the topic that requires me personally to answer.  I hope that my reply will be accepted as addressing the most important central points, though I did not have time to explore every avenue.  I certainly do not wish to be logically rude, and if I have failed, please remember with compassion that it's not always obvious to one person what another person will think was the central point.

Luke Muehlhauser and Carl Shulman contributed to this article, but the final edit was my own, likewise any flaws.

Summary:

Holden's concern is that "SI appears to neglect the potentially important distinction between 'tool' and 'agent' AI." His archetypal example is Google Maps:

Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.

The reply breaks down into four heavily interrelated points:

First, Holden seems to think (and Jaan Tallinn apparently doesn't object, in their exchange) that if a non-self-modifying planning Oracle is indeed the best strategy, then all of SIAI's past and intended future work is wasted.  To me it looks like there's a huge amount of overlap in underlying processes in the AI that would have to be built and the insights required to build it, and I would be trying to assemble mostly - though not quite exactly - the same kind of team if I were trying to build a non-self-modifying planning Oracle, with the same initial mix of talents and skills.

Second, a non-self-modifying planning Oracle doesn't sound nearly as safe once you stop saying human-English phrases like "describe the consequences of an action to the user" and start trying to come up with math that says scary dangerous things like (translated into English) "increase the correspondence between the user's belief about relevant consequences and reality".  Hence the people on the team would have to solve the same sorts of problems.

Appreciating the force of the third point is a lot easier if one appreciates the difficulties discussed in points 1 and 2, but it is actually empirically verifiable independently:  Whether or not a non-self-modifying planning Oracle is the best solution in the end, it's not such an obvious privileged-point-in-solution-space that someone should be alarmed at SIAI not discussing it.  This is empirically verifiable in the sense that 'tool AI' wasn't the obvious solution to e.g. John McCarthy, Marvin Minsky, I. J. Good, Peter Norvig, Vernor Vinge, or for that matter Isaac Asimov.  At one point, Holden says:

One of the things that bothers me most about SI is that there is practically no public content, as far as I can tell, explicitly addressing the idea of a "tool" and giving arguments for why AGI is likely to work only as an "agent."

If I take literally that this is one of the things that bothers Holden most... I think I'd start stacking up some of the literature on the number of different things that just respectable academics have suggested as the obvious solution to what-to-do-about-AI - none of which would be about non-self-modifying smarter-than-human planning Oracles - and beg him to have some compassion on us for what we haven't addressed yet.  It might be the right suggestion, but it's not so obviously right that our failure to prioritize discussing it reflects negligence.

The final point at the end is looking over all the preceding discussion and realizing that, yes, you want to have people specializing in Friendly AI who know this stuff, but as all that preceding discussion is actually the following discussion at this point, I shall reserve it for later.

continue reading »

Doublethink (Choosing to be Biased)

33 Eliezer_Yudkowsky 14 September 2007 08:05PM

An oblong slip of newspaper had appeared between O'Brien's fingers. For perhaps five seconds it was within the angle of Winston's vision. It was a photograph, and there was no question of its identity. It was the photograph. It was another copy of the photograph of Jones, Aaronson, and Rutherford at the party function in New York, which he had chanced upon eleven years ago and promptly destroyed. For only an instant it was before his eyes, then it was out of sight again. But he had seen it, unquestionably he had seen it! He made a desperate, agonizing effort to wrench the top half of his body free. It was impossible to move so much as a centimetre in any direction. For the moment he had even forgotten the dial. All he wanted was to hold the photograph in his fingers again, or at least to see it.

'It exists!' he cried.

'No,' said O'Brien.

He stepped across the room.

There was a memory hole in the opposite wall. O'Brien lifted the grating. Unseen, the frail slip of paper was whirling away on the current of warm air; it was vanishing in a flash of flame. O'Brien turned away from the wall.

'Ashes,' he said. 'Not even identifiable ashes. Dust. It does not exist. It never existed.'

'But it did exist! It does exist! It exists in memory. I remember it. You remember it.'

'I do not remember it,' said O'Brien.

Winston's heart sank. That was doublethink. He had a feeling of deadly helplessness. If he could have been certain that O'Brien was lying, it would not have seemed to matter. But it was perfectly possible that O'Brien had really forgotten the photograph. And if so, then already he would have forgotten his denial of remembering it, and forgotten the act of forgetting. How could one be sure that it was simple trickery? Perhaps that lunatic dislocation in the mind could really happen: that was the thought that defeated him.

   —George Orwell, 1984

What if self-deception helps us be happy?  What if just running out and overcoming bias will make us—gasp!—unhappy?  Surely, true wisdom would be second-order rationality, choosing when to be rational.  That way you can decide which cognitive biases should govern you, to maximize your happiness.

Leaving the morality aside, I doubt such a lunatic dislocation in the mind could really happen.

continue reading »

Reasoning isn't about logic (it's about arguing)

49 Morendil 14 March 2010 04:42AM

"Why do humans reason" (PDF), a paper by Hugo Mercier and Dan Sperber, reviewing an impressive amount of research with a lot of overlap with themes previously explored on Less Wrong, suggests that our collective efforts in "refining the art of human rationality" may ultimately be more successful than most individual efforts to become stronger. The paper sort of turns the "fifth virtue" on its head; rather than argue in order to reason (as perhaps we should), in practice, we reason in order to argue, and that should change our views quite a bit.

I summarize Mercier and Sperber's "argumentative theory of reasoning" below and point out what I believe its implications are for the mission of a site such as Less Wrong.

Human reasoning is one mechanism of inference among others (for instance, the unconscious inference involved in perception). It is distinct in being a) conscious, b) cross-domain, c) used prominently in human communication. Mercier and Sperber make much of this last aspect, taking it as a huge hint to seek an adaptive explanation in the fashion of evolutionary psychology, which may provide better answers than previous attempts at explanations of the evolution of reasoning.

continue reading »

A toy model of the control problem

19 Stuart_Armstrong 16 September 2015 02:59PM

EDITED based on suggestions for improving the model

Jaan Tallinn has suggested creating a toy model of the control problem, so that it can be analysed without loaded concepts like "autonomy", "consciousness", or "intentionality". Here is a simple (too simple?) attempt:

A controls B. B manipulates A.

Let B be a robot agent that moves in a two-dimensional world, as follows:

continue reading »

Clearing An Overgrown Garden

9 Anders_H 29 January 2016 10:16PM

(tl;dr: In this post, I make some concrete suggestions for LessWrong 2.0.)

Less Wrong 2.0

A few months ago, Vaniver posted some ideas about how to reinvigorate Less Wrong. Based on comments in that thread and based on personal discussions I have had with other members of the community, I believe there are several different views on why Less Wrong is dying. The following are among the most popular hypotheses:

(1) Pacifism has caused our previously well-kept garden to become overgrown.

(2) The aversion to politics has caused a lot of interesting political discussions to move away from the website.

(3) People prefer posting to their personal blogs.

With this background, I suggest the following policies for Less Wrong 2.0.  This should be seen only as a starting point for discussion about the ideal way to implement a rationality forum. Most likely, some of my ideas are counterproductive. If anyone has better suggestions, please post them to the comments.

Moderation Policy:

There are four levels of users:  

  1. Users
  2. Trusted Users 
  3. Moderators
  4. Administrator
Users may post comments and top-level posts, but their contributions must be approved by a moderator.

Trusted users may post comments and top-level posts which appear immediately. Trusted user status is awarded by a 2/3 vote among the moderators.

Moderators may approve comments made by non-trusted users. There should be at least 10 moderators to ensure that comments are approved within an hour of being posted, preferably sooner. If there is disagreement among moderators, the matter can be discussed on a private forum. Decisions may be altered by a simple majority vote.

The administrator (preferably Eliezer or Nate) chooses the moderators.
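As a minimal sketch of how these permission levels might translate into code (the role names follow the list above; everything else is invented for illustration):

```python
from enum import IntEnum

class Role(IntEnum):
    USER = 1
    TRUSTED_USER = 2
    MODERATOR = 3
    ADMINISTRATOR = 4

class Post:
    def __init__(self, author: str, role: Role, text: str):
        self.author, self.role, self.text = author, role, text
        # Trusted users and above appear immediately; plain users
        # wait in the moderation queue.
        self.visible = role >= Role.TRUSTED_USER

def approve(post: Post, approver_role: Role) -> None:
    # Only moderators (or the administrator) may approve queued posts.
    if approver_role >= Role.MODERATOR:
        post.visible = True
```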

Personal Blogs:


All users are assigned a personal subdomain, such as Anders_H.lesswrong.com. When publishing a top-level post, users may click a checkbox to indicate whether the post should appear only on their personal subdomain, or also in the Less Wrong discussion feed. The commenting system is shared between the two access pathways. Users may choose a design template for their subdomain. However, when the post is accessed from the discussion feed, the default template overrides the user-specific template. The personal subdomain may include a blogroll, an about page, and other information. Users may purchase a top-level domain as an alias for their subdomain.

Standards of Discourse and Policy on Mindkillers:

All discussion in Less Wrong 2.0 is seen explicitly as an attempt to exchange information for the purpose of reaching Aumann agreement. In order to facilitate this goal, communication must be precise. Therefore, all users agree to abide by Crocker's Rules for all communication that takes place on the website.  

However, this is not a license for arbitrary rudeness.  Offensive language is permitted only if it is necessary in order to point to a real disagreement about the territory. Moreover, users may not repeatedly bring up the same controversial discussion outside of their original context.

Discussion of politics is explicitly permitted as long as it adheres to the rules outlined above. All political opinions are permitted (including opinions which are seen as taboo by society at large), as long as the discussion is conducted with civility and in a manner that is suited for dispassionate exchange of information, and suited for accurate reasoning about the consequences of policy choice. By taking part in any given discussion, all users are expected to pre-commit to updating in response to new information.

Upvotes:

Only trusted users may vote. There are two separate voting systems.  Users may vote on whether the post raises a relevant point that will result in interesting discussion (quality of contribution) and also on whether they agree with the comment (correctness of comment). The first is a property both of the comment and of the user, and is shown in their user profile.  The second scale is a property only of the comment. 

All votes are shown publicly (for an example of a website where this is implemented, see for instance dailykos.com).  Abuse of the voting system will result in loss of Trusted User Status. 
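As a sketch of how the two voting axes might be stored (field and function names are mine, purely illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Vote:
    voter: str        # trusted users only; all votes are public
    quality: int      # +1/-1: raises a relevant point worth discussing
    agreement: int    # +1/-1: I agree with this comment

@dataclass
class Comment:
    author: str
    votes: list = field(default_factory=list)

    def quality_score(self):     # feeds both the comment and the author's profile
        return sum(v.quality for v in self.votes)

    def agreement_score(self):   # a property of the comment alone
        return sum(v.agreement for v in self.votes)

def profile_quality(comments, author):
    # The quality axis aggregates into the user's profile.
    return sum(c.quality_score() for c in comments if c.author == author)
```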

How to Implement This

After the community comes to a consensus on the basic ideas behind LessWrong 2.0, my preference is for MIRI to implement it as a replacement for Less Wrong. However, if for some reason MIRI is unwilling to do this, and if there is sufficient interest in going in this direction, I offer to pay server costs. If necessary, I also offer to pay some limited amount for someone to develop the codebase (based on Open Source solutions). 

Other Ideas:


MIRI should start a professionally edited rationality journal (for instance called "Rationality") published bi-monthly. Users may submit articles for publication in the journal. Each week, one article is chosen for publication and posted to a special area of Less Wrong. This replaces "main". Every two months, these articles are published in print in the journal.

The idea behind this is as follows:
(1) It will incentivize users to compete for the status of being published in the journal.
(2) It will allow contributors to put the article on their CV.
(3) It may bring in high-quality readers who are unlikely to read blogs.
(4) Every week, the published article may be a natural choice for a discussion topic at Less Wrong meetups.
