One ideal I have never abandoned and never considered abandoning is that if you disagree with a final conclusion, you ought to be able to exhibit a particular premise or reasoning step that you disagree with. Michael Vassar views this as a fundamental divide that separates sanitykind from Muggles; with Tyler Cowen, for example, rejecting cryonics but not feeling obligated to reject any particular premise of Hanson's. Perhaps we should call ourselves the Modusponenstsukai.
It's usually much harder to find a specific flaw in an argument than it is to see that there is probably something wrong with the conclusion. For example, I probably won't be able to spot the specific flaw in most proposed designs for a perpetual motion machine, but I can still conclude that it won't work as advertised!
I read "ought to be able to" not as "you're not allowed to reject the conclusion without rejecting a premise" so much as "you ought to be able to, so when you find you're not able to, it should bother you; you have learned that there's a key failing in your understanding of that area."
I agree, and while reading Eliezer's comment I mentally added something like "or, if you can't, then you explicitly model your confusion as a limitation in your current understanding and lower your confidence in the related suspect reasoning appropriately - ideally until your confusion can be resolved and your curiosity satisfied" as a footnote.
As an example take the idea of quantum suicide. I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.
Red herring. The 'logical implications' of quantum suicide are that it's a terrible idea because you'll mostly die. Using quantum suicide as a reason to ignore logical implications is blatantly fallacious.
Many of these issues arise from some combination of allowing unbounded utilities, and assuming utility is linear. Where problems seem to arise, this is the place to fix them. It is much easier to incorporate fixes into our utility function (which is already extremely complicated, and poorly understood) than it is to incorporate them into the rules of reasoning or the rules of evidence, which are comparatively simple, and built upon math rather than on psychology.
Bounded utility solves Pascal's Mugging and Torture vs. Dust Specks straightforwardly. You choose some numbers to represent "max goodness" and "max badness"; really good and really bad things approach these bounds asymptotically; and when you meet Pascal's mugger, you take "max badness", multiply it by an extremely tiny probability, and get a very tiny value.
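As a rough numeric sketch of how that cashes out (the bound of 1000 and the tanh squashing function are arbitrary illustrative choices, not anything canonical):

```python
import math

BOUND = 1000.0  # illustrative "max goodness" / "max badness" bound

def bounded_utility(raw):
    """Squash an unbounded raw utility into (-BOUND, BOUND), approaching the bounds asymptotically."""
    return BOUND * math.tanh(raw / BOUND)

# Pascal's mugger threatens an astronomically bad outcome, but bounded utility
# caps it at -BOUND, so a tiny probability yields a tiny expected loss.
p_mugger_truthful = 1e-20
expected_loss = p_mugger_truthful * -BOUND

print(bounded_utility(-1e9))  # -1000.0: even an astronomical disutility saturates at the bound
print(expected_loss)          # -1e-17: negligible next to everyday stakes
```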
Quantum suicide is also a utility function issue, but not the same one. If your utility function only cares about average utility over the worlds in which you're still alive, then you should commit quantum suicide. But revealed preferences indicate that people care about all worlds, and philosophy seems to indicate that they should care about all worlds, so quantum suicide is wrong.
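A toy comparison of the two utility functions being contrasted here (the branch probabilities and utilities are invented purely for illustration):

```python
# Two MWI branches of a quantum-suicide gamble: (probability, survived?, utility)
branches = [
    (0.5, True,  100.0),    # you survive and collect the winnings
    (0.5, False, -1000.0),  # you don't
]

def eu_all_worlds(branches):
    """Expected utility counting every branch."""
    return sum(p * u for p, _, u in branches)

def eu_surviving_worlds(branches):
    """Expected utility averaged only over branches where you survive."""
    alive = [(p, u) for p, survived, u in branches if survived]
    return sum(p * u for p, u in alive) / sum(p for p, _ in alive)

print(eu_all_worlds(branches))        # -450.0: don't pull the trigger
print(eu_surviving_worlds(branches))  #  100.0: the "only surviving worlds count" view
```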
Sure, agreed.
But the question arises, compared to what?
If I develop an algorithm A1 for solving certain problems that is more reliable than my own intuition, the fact that A1 is not perfectly reliable is a great reason to try and develop a superior algorithm A2. It's a poor reason to discard A1 and rely on my own intuition instead.
...Assume that humanity managed to create a friendly AI (FAI). Given the enormous amount of resources that each human is poised to consume until the dark era of the universe, wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI? Our resources could enable it to find a way to either travel back in time, leave the universe or hack the matrix. Anything that could avert the end of the universe and allow the FAI to support many more agents has effectively infinite expected utility.
I like this point from Terry Tao:
...I think an epsilon of paranoia is useful to regularise these sorts of analyses. Namely, one supposes that there is an adversary out there who is actively trying to lower your expected utility through disinformation (in order to goad you into making poor decisions), but is only able to affect all your available information by an epsilon. One should then adjust one’s computations of expected utility accordingly. In particular, the contribution of any event that you expect to occur with probability less than epsilon should ...
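One crude way to operationalize that regularizer, assuming the rule is simply to drop any outcome whose probability falls below the paranoia threshold (the threshold value and the example gamble are made up for illustration):

```python
EPSILON = 1e-6  # the adversary can distort your information by at most this much

def regularized_expected_utility(outcomes, epsilon=EPSILON):
    """Expected utility over (probability, utility) pairs, ignoring sub-epsilon events."""
    return sum(p * u for p, u in outcomes if p >= epsilon)

# A Pascal's-mugging-style gamble: near-certain small gain, vanishingly unlikely vast loss.
gamble = [(1 - 1e-12, 1.0), (1e-12, -1e30)]

print(sum(p * u for p, u in gamble))         # ~-1e18: naive expected utility is dominated by the tail
print(regularized_expected_utility(gamble))  # ~1.0: the sub-epsilon event is discarded
```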
As an example take the idea of quantum suicide. I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.
You shouldn't commit quantum suicide because it decreases your measure, which by observation we know is important in ways we don't theoretically understand, and, unless you are very careful, the worlds where you escape death are not likely to be pleasant. You don't need skepticism of rationality itself to reach this conclusion.
Our current methods might turn out to be biased in new and unexpected ways. Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem are just a few examples of how an agent built according to our current understanding of rationality could fail.
I don't really get it. For example, building a machine that is sceptical of Pascal's wager doesn't seem harder than building a machine that is sceptical of other verbal offers unsupported by evidence. I don't see what's wrong with the idea that "extraordinary claims require extraordinary evidence".
Our current methods might turn out to be biased in new and unexpected ways. Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem are just a few examples of how an agent built according to our current understanding of rationality could fail.
What are you trying to do here? Are you trying to give specific examples of cases in which doing the rational thing could be the wrong thing to do? Surely not, that would be oxymoronic - if you already know that the 'rational thing' is a mistake then it isn't the rational thing. Failing...
I think that we should sometimes demand particular proof P; and if proof P is not available, then we should discount seemingly absurd or undesirable consequences even if our theories disagree.
Possibly, if the theory predicts that proof P would be available, then the lack of such proof is evidence against the theory. Otherwise, alternate proof should be acceptable.
Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem
Sorry, what's wrong with Lob's theorem?
But my best guess right now is that we simply have to draw a lot of arbitrary lines and arbitrarily refuse some steps.
Can you name three alternatives, and why you reject them? How hard did you think to come up with alternatives?
wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI?
My answer is "No." How hard did you try to discover if the best answer to that question is actually "No"?
For very large or very small probabilities, I agree it's important to start taking into account the "model uncertainty." And if some argument leads to the conclusion 2=1 (or that you should never act as if you'll die, which is of similar levels of wrong), of course you discount it, not in defiance of probability, but with probability, since we have so much evidence against that claim.
However, in the "donating to SIAI" case, I don't think we're actually talking about particularly large or small probabilities, or fallacious arguments. Implications can be labeled "extraordinary" for being socially unusual. This sort of extraordinary doesn't seem like it should be discounted.
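A minimal sketch of the model-uncertainty point, with invented numbers: once you admit some chance that the model or argument is itself broken, extreme probability estimates get pulled back toward whatever you would believe in that case.

```python
def effective_probability(p_model, p_model_wrong, p_if_model_wrong):
    """Blend the model's estimate with a fallback estimate, weighted by the chance the model is wrong."""
    return (1 - p_model_wrong) * p_model + p_model_wrong * p_if_model_wrong

# A model puts an event at 1e-20, but you give the model a 1-in-1000 chance of being
# wrong in a way that would make the event roughly 1% likely.
print(effective_probability(1e-20, 1e-3, 1e-2))
# ~1e-5: the extreme estimate is swamped by uncertainty about the model itself
```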
It seems to me that what you're saying is that our theories of rationality are ultimately based on a process of reflection that starts with pre-theoretical judgments about what we think is rational, about what dispositions help agents achieve their goals, etc., and that this means that we should take quite seriously our pre-theoretical judgments when they strongly conflict with our theories of rationality.
This is right in principle (as well as in the Pascal's Mugging case), but I think you're too conservative. For example, should I heavily discount arguments...
Relativity is less wrong than Newtonian mechanics but it still breaks down in describing singularities
What does it mean for reality to break down? What does it mean for reality to "describe" something?
The quoted sentence doesn't make any sense to me, and doesn't seem to follow from the article text.
We are not going to stop loving our girlfriend
Interesting use of "we" :)
(Spawned by an exchange between Louie Helm and Holden Karnofsky.)
tl;dr:
The field of formal rationality is relatively new and I believe that we would be well-advised to discount some of its logical implications that advocate extraordinary actions.
Our current methods might turn out to be biased in new and unexpected ways. Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem are just a few examples of how an agent built according to our current understanding of rationality could fail.
Bayes’ Theorem, the expected utility formula, and Solomonoff induction are all reasonable heuristics. Yet those theories are not enough to build an agent that will be reliable in helping us to achieve our values, even if those values were thoroughly defined.
If we wouldn't trust a superhuman agent equipped with our current grasp of rationality to be reliable in extrapolating our volition, how can we trust ourselves to arrive at correct answers given what we know?
We should of course continue to use our best methods to decide what to do. But I believe that we should also draw a line somewhere when it comes to extraordinary implications.
Intuition, Rationality and Extraordinary Implications
Holden Karnofsky is suggesting that in some cases we should follow the simple rule that "extraordinary claims require extraordinary evidence".
I think that we should sometimes demand particular proof P; and if proof P is not available, then we should discount seemingly absurd or undesirable consequences even if our theories disagree.
I am not referring to the weirdness of the conclusions but to the foreseeable scope of the consequences of being wrong about them. We should be careful about letting the implied scope of certain conclusions outweigh their low probability. I feel we should put more weight on the consequences of our conclusions being wrong than on the consequences of their being right.
As an example take the idea of quantum suicide and assume it would make sense under certain circumstances. I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.
To be clear, extrapolations work and are often the best we can do. But since there are problems such as the above - problems that we perceive to be undesirable and that lead to absurd actions and consequences - I think it is reasonable to ask for some upper and lower bounds regarding the use and scope of certain heuristics.
We are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that person wants. We are not going to stop loving our girlfriend just because there are other people who do not approve of our relationship and who together would experience more happiness if we divorced than the combined happiness of us and our girlfriend being in love. Therefore we have already informally established some upper and lower bounds.
I have read about people who became very disturbed and depressed from taking ideas too seriously. That way madness lies, and I am not willing to choose that path yet.
Maybe I am simply biased and have been unable to overcome it yet. But my best guess right now is that we simply have to draw a lot of arbitrary lines and arbitrarily refuse some steps.
Taking into account considerations of vast utility or low probability quickly leads to chaos-theoretic considerations like the butterfly effect. As a computationally bounded and psychologically unstable agent I am unable to cope with that. Consequently I see no other way than to neglect the moral impossibility of extreme uncertainty.
Until the problems are resolved, or rationality is sufficiently established, I will continue to put vastly more weight on empirical evidence and my intuition than on logical implications, if only because I still lack the necessary educational background to trust my comprehension and judgement of the various underlying concepts and methods used to arrive at those implications.
Expected Utility Maximization and Complex Values
One of the problems with my current grasp of rationality that I perceive to be unacknowledged concerns the consequences of expected utility maximization with respect to human nature and our complex values.
I am still genuinely confused about what a person should do. I don't even know how much sense that concept makes. Does expected utility maximization have anything to do with being human?
Those people who take existential risks seriously and who are currently involved in their mitigation seem to be disregarding many other activities that humans usually deem valuable, because the expected utility of saving the world outweighs the pursuit of other goals. I do not disagree with that assessment, but I find it troubling.
The problem is, will there ever be anything but a single goal, a goal that can either be more effectively realized and optimized to yield the most utility or whose associated expected utility simply outweighs all other values?
Assume that humanity managed to create a friendly AI (FAI). Given the enormous amount of resources that each human is poised to consume until the dark era of the universe, wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI? Our resources could enable it to find a way to either travel back in time, leave the universe or hack the matrix. Anything that could avert the end of the universe and allow the FAI to support many more agents has effectively infinite expected utility.
The sensible decision would be to concentrate on those scenarios with the highest expected utility now, e.g. solving friendly AI, and worry about those problems later. But not only does the same argument always work; the question is also relevant to the nature of friendly AI and our ultimate goals. Is expected utility maximization even compatible with our nature? Does expected utility maximization lead to world states in which wireheading is favored, either directly or indirectly, by focusing solely on a single high-utility goal that outweighs all other goals?
Conclusion
It seems to me that our notion of rationality is not the last word on the topic and that we shouldn't act as if it was.