Two-Tier Rationalism

Alicorn

Related to: Bayesians vs. Barbarians

Consequentialism¹ is a catchall term for a vast number of specific ethical theories, the common thread of which is that they take goodness (usually of a state of affairs) to be the determining factor of rightness (usually of an action). One family of consequentialisms that came to mind when it was suggested that I post about my Weird Forms of Utilitarianism class is called "Two-Tier Consequentialism", which I think can be made to connect interestingly to our rationalism goals on Less Wrong. Here's a summary of two-tier consequentialism².

(Some form of) consequentialism is correct and yields the right answer about what people ought to do. But (this form of) consequentialism has many bad features:

It is unimplementable (because to use it correctly requires more calculation than anyone has time to do based on more information than anyone has time to gather and use).
It is "alienating" (because people trying to obey consequentialistic dictates find them very unlike the sorts of moral motivations they usually have, like "I want to be a nice person" or "so-and-so is my friend")³.
It is "integrity-busting" (because it can force you to consider alternatives that are unthinkably horrifying, if there is the possibility that they might lead to the "best" consequences).
It is "virtue-busting" (because it too often requires a deviation from a pattern of behavior that we consider to be an expression of good personal qualities that we would naturally hope and expect from good people).
It is prone to self-serving abuse (because it's easy, when calculating utilities, to "cook the books" and wind up with the outcome you already wanted being the "best" outcome).
It is "cooperation-busting" (because individuals don't tend to have an incentive to avoid free-riding when their own participation in a cooperative activity will neither make nor break the collective good).

To solve these problems, some consequentialist ethicists (my class focused on Railton and Hare) invented "two-tier consequentialism". The basic idea is that because of all these bad features of (pick your favorite kind of) consequentialism, being a consequentialist has bad consequences, and therefore you shouldn't do it. Instead, you should layer on top of your consequentialist thinking a second tier of moral principles called your "Practically Ideal Moral Code", which ought to have the following more convenient properties:

Must be moral principles that identify a situation or class of situations and call for an action in that/those situation(s).
Must be believable. You should not put in your second tier any principles that you can't buy on a deep level.
Must be potentially sturdy. Your second tier of principles should be ones that you could stick to, in the face of both your own fallibility and the possibility that they will sometimes lead you to perform acts that are not, strictly speaking by your favorite consequentialism, right.
Must be useful. They cannot be principles that are as unimplementable as the original favorite consequentialism - you have to be able to bring them to bear quickly and easily.
Must guide you in actions that are consistent with the expressions of virtue and integrity.
Must satisfy a publicity condition. That is, widespread acceptance of this set of principles should be conducive to cooperation and not lead to the same self-serving abuse problem that consequentialism has.
And most importantly, the principles must, collectively, lead you to usually perform actions that your favorite consequentialism would endorse. In fact, part of the point of the second tier is that, in the long run, it should do a better job of making you do things that are right-according-to-consequentialism than actually trying to implement your favorite consequentialism would.

That last part is key, because the two-tier consequentialist is not abandoning consequentialism. Unlike a rule consequentialist, he still thinks that any given action (if his favorite consequentialism is act-based) is right according to the goodness of something (probably a resulting state of affairs), not according to whether it is permitted by his Practically Ideal Moral Code. He simply brainwashes himself into using his Practically Ideal Moral Code because over the long run, this will be for the best according to his initial, consequentialist values.

And here is the reason I linked to "Bayesians vs. Barbarians", above: what Eliezer is proposing as the best course of action for a rationalist society that is attacked from without sounds like a second-tier rationalism. If it is rational - for the society as a whole and in the long run - that there be soldiers chosen by a particular mechanism who go on to promptly obey the orders of their commanding officers? Well, then, the rational society will just have to arrange for that - even if this causes some individual actions to be non-rational - because the general strategy is the one that generates the results they are interested in (winning the war), and the most rational general strategy isn't one that consists of all the most individually rational parts.

In ethics, the three main families are consequentialism (rightness via goodness), deontic ethics (rightness as adherence to duty), and virtue ethics (rightness as the implementation of virtues and/or faithfulness to an archetype of a good person). Inasmuch as rationality and morality are isomorphic, it seems like you could just as easily have a duty-based rationalism or a virtue-based rationalism. I have strong sympathy for two-tier consequentialism as consequentialist ethics go. But it seems like Eliezer is presupposing a kind of consequentialism of rationality, both in that article and in general with the maxim "rationalists should win!" It sounds rather like we are supposed to be rationalists because rationalists win, and winning is good.

I don't know how widely that maps onto other people's motivations, but for my part, I think my intuitions around why I wish to be rational have more to do with considering it a virtue. I like winning just fine, but even if it turns out that devoting myself to ever finer-grained rationalism confers no significant winnings, I will still consider it a valuable thing in and of itself to be rational. It's not difficult to imagine someone who thinks that it is the duty of intelligent beings in general to hone their intellects in the form of rationalist training; such a person would be more of a deontic rationalist.

¹I'm not a consequentialist of any stripe myself. However, my views are almost extensionally equivalent with an extremely overcomplicated interpretation of Nozickian side-constraint rights-based utilitarianism.

²Paraphrased liberally from classroom handouts by Fred Feldman.

³The example we were given in class was of a man who buys flowers for his wife and, when asked why, says, "As her husband, I'm in a special position to efficiently generate utility in her, and considered buying her flowers to be for the best overall." This in contrast to, "Well, because she's my wife and I love her and she deserves a bouquet of carnations every now and then."

An AI consists of a pragmatically computable approximation of a decision theory, not an implementation of a decision theory. This would seem to be the human reflection of that.

In general, I approve; I've never told anyone to run calculations they can't do, including, in some cases, quantifying probabilities.

But the process of going from a simple, pure decision theory, to a collection of approximate rules, is itself subject to distortion in the mind of a less than perfect rationalist - i.e., people will be tempted to endorse just about any rule that has an intuitive appeal, even if it leads to e.g. circular altruism.

What the rules are an approximation to is some purer decision theory - one is still seeking the same pure thing at the center, taking some particular path closer to it. If you're trying to approximate some incomputable formula, you will nonetheless quite often think of that formula, even if abstractly.

When you paint a sunset you don't end up with a sunset, you end up with a painting; but you should still be looking at the sunset itself, not just your brushes.

And here is the reason I linked to "Bayesians vs. Barbarians", above: what Eliezer is proposing as the best course of action for a rationalist society that is attacked from without sounds like a second-tier rationalism.

Not exactly. Since I intend to work with self-modifying AIs, any decision theory I care to spend much time thinking about should be reflectively consistent and immediately so. This excludes e.g. both causal decision theory and evidential decision theory as usually formulated.

The idea of sacrificing your life after being selected in a draft lottery that maximized your expectation of survival if all other agents behaved the same way you did, is not meant to be second-tier.

But if humans cannot live up to such stern rationality in the face of Newcomblike decision problems, then after taking their own weakness into account, they may have cause to resort to enforcement mechanisms. This is second-tier-ish in a way, but still pretty strongly interpretable as maximizing, to the extent that you vote on the decision before the lottery.

IMO people grossly underrate the degree to which constantly doing explicit expected utility calculations can help achieve most goals. As a very primitive example, no good chess player would do without assigning point values to pieces. There is room here for a sort of "Benthamite conspiracy" which, like the anti-akrasiac conspiracy but in a different way, would be aimed at instrumental rationality. Things like alienation are an issue, but everything is alienating if nobody else does it.

ETA: I may try writing a "techniques of the Benthamite Conspiracy" post. Though actually "Neumann/Morgenstern Conspiracy" would be a better name.
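
As a rough illustration of what an explicit expected utility comparison might look like, here is a minimal sketch built on the chess example. The piece values are the conventional teaching ones; the two candidate moves, their names, and their probabilities are invented purely for illustration and are not taken from any real position.

```python
# Conventional chess piece point values (a common teaching heuristic).
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def expected_utility(outcomes):
    """Return the sum of probability * utility over (probability, utility) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Hypothetical choice between a risky sacrifice and a quiet move.
# All probabilities below are made-up assumptions.
actions = {
    "sacrifice_knight": [
        # The sacrifice works: win the queen, lose the knight (net +6).
        (0.4, PIECE_VALUES["queen"] - PIECE_VALUES["knight"]),
        # The sacrifice fails: the knight is simply lost (net -3).
        (0.6, -PIECE_VALUES["knight"]),
    ],
    # The quiet move changes nothing materially.
    "quiet_move": [(1.0, 0)],
}

# Pick the action with the highest expected material gain.
best = max(actions, key=lambda name: expected_utility(actions[name]))
print(best, expected_utility(actions[best]))
```

The troubling part in practice is where the probabilities and the common unit come from, which this sketch simply assumes.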

I have attempted explicit EU calculations in the past, and have had to make very troubling assumptions and unit approximations, which has limited my further experimentation.

I would be very interested in seeing concrete examples and calculation rules in plausible situations.

The code here is obviously not working. Does anyone know how to fix that so it's parsed properly?

Edit: Okay, I fixed everything in the WYSIWYG. How very irritating.

Edit 2: What conceivable reason could there be for downvoting this comment? Should I have deleted it on account of clutter or something?

What conceivable reason could there be for downvoting this comment?

A possible explanation: this comment is amongst the least useful of all comments on this article. If we ask the question, "Which comments should a random passerby see at the top of the list", this comment belongs all the way at the bottom, possibly entirely hidden from sight.

That's one reason the multipurpose nature of karma is so irritating. Voting doesn't just move comments around, it affects a user's overall karma. Besides, it was already at the bottom.

Don't worry about downvotes here and there - it all comes out in the wash.

Edit 2: What conceivable reason could there be for downvoting this comment? Should I have deleted it on account of clutter or something?

Voter stupidity.

Watch me as I wish fervently for identification of votes.

You need to use HTML, or the default WYSIWYG editor. Also, you can preview an article if you save it as "Draft" first.

Even my HTML tags (for superscript) are not working.

Even my HTML tags (for superscript) are not working.

That's because it's a WYSIWYG editor; you have to use the "source" button to be able to edit the HTML directly.

The code for writing posts is completely different from that for writing comments. Edit your post and look for the GUI at the top.

The code here is obviously not working. Does anyone know how to fix that so it's parsed properly?

The problem is that the post editor uses plain old HTML, and doesn't support any of the markup used by the comment editor. (You could stick your code into a comment, then copy the HTML source, I suppose.)

Actually, with the WYSIWYG editor, she could stick her code into a comment, and then just copy the resulting comment.

That sounds to me like running away from reality to a shelter of lies that has been trickily arranged to have output resembling the truth.

We certainly aren't doing that. We're trying to fit human-computable functions over ones we can't compute. We aren't running from the raw math, we're frustrated by our inability to use it. We aren't maintaining a distance from its conclusions, we're trying to approximate arbitrarily close.

I like winning just fine, but even if it turns out that devoting myself to ever finer-grained rationalism confers no significant winnings, I will still consider it a valuable thing in and of itself to be rational.

What Eliezer means by "winning" is "achieving your goals, whatever they may be". If one of your most important goals is to be an (epistemic) rationalist, then devoting yourself to ever finer-grained (epistemic) rationalism will confer significant winnings.

That seems circular to the point of absurdity. If rationalism is whatever makes you win and winning is achieving your goals and my goal is to be rational...

That's why I cut at "and my goal is to be rational". Instrumental rationality is systematized winning; winning is steering the future into regions that are higher in your preference ordering. If the possession of truth is also fun, that's just secondary fuel for our epistemic rationality.

But it seems like Eliezer is presupposing a kind of consequentialism of rationality, both in that article and in general with the maxim "rationalists should win!"

Seems that way. Disclaimer: IHAPMOE (I have a poor model of Eliezer).

He [no longer speaking of Eliezer] simply brainwashes himself into using his Practically Ideal Moral Code because over the long run, this will be for the best according to his initial, consequentialist values.

See for example my comment on why trying to maximize happiness should increase your utility more than trying to maximize your utility would. If happiness is the derivative of utility, then maximizing happiness over a finite time period maximizes the increase in utility over that time-period. If you repeatedly engage in maximizing your happiness over a timespan that's small relative to your lifespan, at the end of your life you'll have attained a higher utility than someone who tried to maximize utility over those time-periods.
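
The premise can be written out, assuming (as this comment does) that happiness $h(t)$ is the time derivative of utility $U(t)$; the symbols $h$, $U$, and $t_i$ are notation introduced here for illustration:

```latex
h(t) = \frac{dU}{dt}
\quad\Longrightarrow\quad
\int_{t_0}^{t_1} h(t)\,dt = U(t_1) - U(t_0),
\qquad
\sum_{i=0}^{n-1} \int_{t_i}^{t_{i+1}} h(t)\,dt = U(t_n) - U(t_0).
```

Under that assumption, accumulated happiness over an interval is exactly the gain in utility over it, and the gains over consecutive short intervals add up to the lifetime gain. The further claim, that maximizing over each short interval beats aiming at utility directly, is an argument about the agent's ability to do the maximizing rather than something the identity itself shows.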

Must satisfy a publicity condition. That is, widespread acceptance of this set of principles should be conducive to cooperation and not lead to the same self-serving abuse problem that consequentialism has.

This variant on Kant's maxim seems still to be universally adhered to by moralists; yet it's wrong. I know that's a strong claim.

The problem is that everybody has different reasoning abilities. A universal moral code, of the kind one could demand satisfy the publicity condition, must be one that is optimal for EY and for chimpanzees.

If you admit that it may be better for EY to adopt a slightly more sophisticated moral code than the chimpanzees do, then satisfaction of the publicity condition implies suboptimality.

Some of the conditions for one's Practically Ideal Moral Code mean that it's actually not the case that everyone should use the same one. The publicity condition is a sort of a "ceteris paribus, if everyone was just as well-suited to the use of this code as you and used it, would that be okay?" You could replace this formulation of the condition with something like "if everyone did things mostly like the ones I would do under this code, would that be okay?"

That's a more reasonable position, but I think it may be better to view public morality as an ecosystem. It provides more utility to have different agents occupy different niches, even if they have equal abilities. It may have high utility for most people to eschew a particular behavior, yet society may require some people to engage in that behavior. Having multiple moral codes allows this.

that identify a situation or class of situations and call for an action in that/those situation(s).

You don't need multiple moral codes, you just need to identify in a single moral code the situations under which it's appropriate to perform that generally-eschewed action.

Doesn't the publicity condition allow you to make statements like "If you have the skills to do A then do A, otherwise do B"? Similarly, to solve the case where everyone was just like you, a code can alter itself in the case that publicity cares about: "If X percent of agents are using this code, do Y, otherwise do Z." It seems sensible to alter your behavior in both cases, even if it feels like dodging the condition.
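
The two conditional clauses described above could be sketched as a single rule set along these lines. The function name, the 50% default threshold, and the action labels A, B, Y, Z are hypothetical placeholders, not part of any stated moral code.

```python
def choose_actions(has_skill: bool, adoption_fraction: float, threshold: float = 0.5):
    """Apply one shared code whose clauses depend on the agent and on adoption.

    adoption_fraction is the share of agents assumed to follow this same code,
    expressed as a number between 0 and 1.
    """
    if not 0.0 <= adoption_fraction <= 1.0:
        raise ValueError("adoption_fraction must be between 0 and 1")

    # Clause 1: "If you have the skills to do A then do A, otherwise do B."
    skill_action = "A" if has_skill else "B"

    # Clause 2: "If X percent of agents are using this code, do Y, otherwise do Z."
    publicity_action = "Y" if adoption_fraction >= threshold else "Z"

    return skill_action, publicity_action


# Everyone holds the same code, yet different agents end up acting differently.
print(choose_actions(has_skill=True, adoption_fraction=0.8))
print(choose_actions(has_skill=False, adoption_fraction=0.1))
```

The code is the same for every agent; only the inputs differ, which is what lets it pass a publicity test while still prescribing different behavior.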
