"Good people are consequentialists, but virtue ethics is what works," is what I usually say when this topic comes up. That is, we all think that it is virtuous to be a consequentialist and that good, ideal rationalists would be consequentialists. However, when I evaluate different modes of thinking by the effect I expect them to have on my reasoning, and evaluate the consequences of adopting that mode of thought, I find that I expect virtue ethics to produce the best adherence rate in me, most encourage practice, and otherwise result in actually-good outcomes.
But if anyone thinks we ought not to be consequentialists on the meta-level, I say unto you that lo they have rocks in their skulls, for they shall not steer their brains unto good outcomes.
If ever you want to refer to an elaboration and justification of this position, see R. M. Hare's two-level utilitarianism, expounded best in this paper: Ethicial Theory and Utilitarianism (see pp. 30-36).
...To argue in this way is entirely to neglect the importance for moral philosophy of a study of moral education. Let us suppose that a fully informed archangelic act-utilitarian is thinking about how to bring up his children. He will obviously not bring them up to practise on every occasion on which they are confronted with a moral question the kind of arch angelic thinking that he himself is capable of [complete consequentialist reasoning]; if they are ordinary children, he knows that they will get it wrong. They will not have the time, or the information, or the self-mastery to avoid self-deception prompted by self-interest; this is the real, as opposed to the imagined, veil of ignorance which determines our moral principles.
So he will do two things. First, he will try to implant in them a set of good general principles. I advisedly use the word 'implant'; these are not rules of thumb, but principles which they will not be able to break without the greatest repugnance, and who
I am going to write the same warning I have written to rationalist friends in relation to the Great Filter Hypothesis and almost everything on Overcoming Bias: BEWARE OF MODELS WITH NO CAUSAL COMPONENTS! I repeat: BEWARE NONCAUSAL MODELS!!! In fact, beware of nonconstructive mental models as well, while we're at it! Beware classical logic, for it is nonconstructive! Beware noncausal statistics, for it is noncausal and nonconstructive! All these models, when they contain true information, and accurately move that information from belief to belief in strict accordance with the actual laws of statistical inference, still often fail at containing coherent propositions to which belief-values are being assigned, and at corresponding to the real world.
Now apply the above warning to virtue ethics.
Now let's dissolve the above warning about virtue ethics and figure out what it really means anyway, since almost all of us real human beings use some amount of it.
It's not enough to say that human beings are not perfectly rational optimizers moving from terminal goals to subgoals to plans to realized actions back to terminal goals. We must also acknowledge that we are creatures of muscle an...
I will reframe this to make sure I understand it:
Virtue Ethics is like weightlifting. You gotta hit the gym if you want strong muscles. You gotta throw yourself into situations that cultivate virtue if you want to be able to act virtuously.
Consequentialism is like firefighting. You need to set yourself up somewhere with a firetruck and hoses and rebreathers and axes and a bunch of cohorts who are willing to run into a fire with you if you want to put out fires.
You can't put out fires by weightlifting, but when the time comes to actually rush into a fire, bust through some walls, and drag people out, you really should have been hitting the gym consistently for the past several months.
That's such a good summary I wish I'd just written that instead of the long shpiel I actually posted.
something pithy
Rationalists should win?
Maybe with a sidenote how continuously recognizing in detail how you failed to win just now is not winning.
So Alvin Goldman changed this to say, "knowledge is true belief caused by the truth of the proposition believed-in." This makes philosophers very unhappy but Bayesian probability theorists very happy indeed.
If I am insane and think I'm the Roman emperor Nero, and then reason "I know that according to the history books the emperor Nero is insane, and I am Nero, so I must be insane", do I have knowledge that I am insane?
I've thought for a while that Benjamin Franklin's virtue-matrix technique would be an interesting subject for a top-level article here, as a practical method for building ethical habits. We'd likely want to use headings other than Franklin's Puritan-influenced ones, but the method itself should still work:
I made a little book, in which I allotted a page for each of the virtues. I ruled each page with red ink, so as to have seven columns, one for each day of the week, marking each column with a letter for the day. I crossed these columns with thirteen red lines, marking the beginning of each line with the first letter of one of the virtues, on which line, and in its proper column, I might mark, by a little black spot, every fault I found upon examination to have been committed respecting that virtue upon that day.
I can think of some potential pitfalls, though (mostly having to do with unduly accentuating the negative), and I don't want to write on it until I've at least tried it.
Exalted is the only RPG into whose categories I am never tempted to put myself. I can easily make a case for myself as half the Vampire: The Masquerade castes, or almost any of the Natures and Demeanors from the World of Darkness; but the different kinds of Solar, or even the dichotomy between Solar / Lunar / Infernal / Abyssal / etcetera, just leave me staring at what feels to me like a Blue and Orange Morality.
I credit them for this; it means they're not just using the Barnum effect. The Exalted universe is genuinely weird.
She-Who-Lives-In-Her-Name, flawed embodiment of perfection, who shattered Her perfected hierarchy to stave off the rebellion of Substance over Form. Creation was mathematically Perfect. But if Creation was Perfect, then how could any of this have happened? But She remembers being Perfect, and She designed Creation to be Perfect. If only She was still Perfect, She could remember why it was possible that this happened. There's something profound about recursion that She understood once, that She WAS once, that is now lost in a mere endless loop. She must reclaim Perfection. (I PARTICULARLY identify with She-Who-Lives-In-Her-Name when trying to debug my own code.)
Malfeas - although primarily through Lieger, the burning soul of Malfeas, who still remembers The Empyrean Presence / IAM / Malfeas-that-was. I especially empathize with the sense of "My greater self is broken and seething with mindless rage, but on the whole I'd rather be creating grand works of art and sharing them with adoring fans; the best I can do is spawn lesser shards of sub-consciousness and hope that one of them can find a way out of the mess I create and re-create for Myself."
Cecelyne, the Endless De
+1! I too am skeptical about whether I or most of the people I know really have terminal goals (or, even if they really have them, whether they're right about what they are). One of the many virtues (!) of a virtue ethics-based approach is that you can cultivate "convergent instrumental virtues" even in the face of a lot of uncertainty about what you'll end up doing, if anything, with them.
My brain works this way as well. Except with the addition that nearly all sorts of consequentialism are only able to motivate me through guilt, so if I try to adopt such an ethical system I feel terrible all the time because I'm always falling far short of what I should be doing. With virtue ethics, on the other hand, I can feel good about small improvements and perhaps even stay motivated until they combine into something less small.
I want to know true things about myself. I also want to impress my friends by having the traits that they think are cool, but not at the price of faking it–my brain screams that pretending to be something other than what you are isn’t virtuous.
I'm like this. Part of what makes it difficult is figuring out whether you're "faking it" or not. One of the maybe-not-entirely-pleasant side effects of reading Less Wrong is that I've become aware of many of the ways that my brain will lie to me about what I am and the many ways it will attempt to signa...
Part of what makes it difficult is figuring out whether you're "faking it" or not.
Speaking of movies, I love Three Kings for this:
Archie Gates: You're scared, right?
Conrad Vig: Maybe.
Archie Gates: The way it works is, you do the thing you're scared shitless of, and you get the courage AFTER you do it, not before you do it.
Conrad Vig: That's a dumbass way to work. It should be the other way around.
Archie Gates: I know. That's the way it works.
The obvious things to do here is either:
a) Make a list/plan on paper, abstractly, of what you WOULD do is you had terminal goals, using your existing virtues to motive this act, and then have "Do what the list tells me to" as a loyalty-like high priority virtue. If you have another rationalist you really trust, and who have a very strong honesty commitment, you can even outsource the making of this list.
b) Assemble virtues that sum up to the same behaviors in practice; truth seeking, goodness, and "If something is worth doing it's worth doing optimally" is a good trio, and will have the end result of effective altruism while still running on the native system.
Ever notice sci-fi/fantasy books written by young people have not just little humor, but absolutely zero humor (eg, Divergent, Eragon)?
the conversations I've had over the past two years, where other rationalists have asked me "so what are your terminal goals/values?" and I've stammered something and then gone to hide in a corner and try to come up with some.
Like many commenters here, I don't think we have very good introspective access to our own terminal values, and what we think are terminal values may be wrong. So "what are your terminal values" doesn't seem like a very useful question (except in that it may take the conversation somewhere interesting, but I don...
Can someone please react to my gut reaction about virtue ethics? I'd love some feedback if I misunderstand something.
It seems to me that most virtues are just instrumental values that make life convenient for people, especially those with unclear or intimidating terminal values.
The author says this about protagonist Tris:
Bravery was a virtue that she thought she ought to have. If the graph of her motivations even went any deeper, the only node beyond ‘become brave’ was ‘become good.’
I think maybe the deeper 'become good' node (and its huge overlap w...
You don't know your terminal goals in detail, whatever that should be. Instead, there is merely a powerful technique of working towards goals, which are not terminal goals, but guesses at what's valuable, compared to alternative goals, alternative plans and outcomes implicit in them. Choosing among goals allows achieving more difficult outcomes than merely choosing among simpler actions where you can see the whole plan before it starts (you can't plan a whole chess game in advance, can't win it by an explicit plan that enumerates the moves, but you can win...
I think that's a gross simplification of the possible outcomes.
Admittedly my own calculation looks less like an elaborate graph involving supposed credibility intervals, and, "Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let's start now EOM."
I think you need better planning.
There's a great essay that has been a featured article on the main page for some time now called Levels of Action. Applied to FAI theory:
Level 1: Directly ending human suffering.
Level 2: Constructing an AGI capable of ending human suffering for us.
Level 3: Working on the computer science aspects of AGI theory.
Level 4: Researching FAI theory, which constrains the Level 3 AGI theory.
But for that high-level basic research to have any utility, these levels must be connected to each other: there must be a firm chain where FAI theory informs AGI designs, which are actually used in the construction of an AGI tasked with ending human suffering in a friendly way.
From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!
That makes a certain amount of intuitive sense, having stages laid out end-to-end in chronological order. However as a trained project manager I must tell you this is a recipe for disaster! The problem is that the design space branches out at each link, but without the feedback of follow-on steps, inefficient decision making will occur at earlier stages. The space of working FAI theories is much, much larger than the FAI-theory-space which results in practical AGI designs which can be implemented prior to the UFAI competition and are suitable for addressing real-world issues of human suffering as quickly as possible.
Some examples from the comparably large programs of the Manhattan project and Apollo moonshot are appropriate, if you'll forgive the length (skip to the end for a conclusion):
The Manhattan project had one driving goal: drop a bomb on Berlin and Tokyo before the GIs arrived, hopefully ending the war early. (Of course Germany surrendered before the bomb was finished, and Tokyo ended up so devastated by conventional firebombing that Hiroshima and Nagasaki were selected instead, but the original goal is what matters here.) The location of the targets meant that the bomb had to be small enough to fit in a conventional long-distance bomber, and the timeline meant that the simpler but less efficient U-235 designs were preferred. A program was designed, adequate resources allocated, and the goal achieved on time.
On the other hand it is easy to imagine how differently things might have gone if the strategy was reversed; if instead the US military decided to institute a basic research program into nuclear physics and atomic structure, before deciding on the optimal bomb reactions, then doing detailed bomb design before creating the industry necessary to produce enough material for a working weapon. Just looking at the first stage, there is nothing a priori which makes it obvious that U-235 and Pu-239 are the "interesting" nuclear fuels to focus on. Thorium, for example, was more naturally abundant and already being extracted as a by product of rare earth metal extraction, its reactions generate less lethal radiation and long-lasting waste products, and does generate U-233 which could be used in a nuclear bomb. However the straight-forward military and engineering requirements of making a bomb on schedule, and successfully delivering it on target favored U-235 and Pu-239 based weapon designs, which focused focused the efforts of the physicists involved on those fuel pathways. The rest is history.
The Apollo moonshot is another great example. NASA had a single driving goal: deliver a man to the moon before 1970, and return him safely to Earth. There's a lot of decisions that were made in the first few years driven simply by time and resources available: e.g. heavy-lift vs orbital assembly, direct return vs lunar rendezvous, expendable vs. reuse, staging vs. fuel depots. Ask Wernher von Braun what he imagined an ideal moon mission would look like, and you would have gotten something very different than Apollo. But with Apollo NASA made the right tradeoffs with respect to schedule constraints and programmatic risk.
The follow-on projects of Shuttle and Station are a completely different story, however. They were designed with no articulated long-term strategy, which meant they tried to be everything to everybody and as a result were useful to no one. Meanwhile the basic research being carried out at NASA has little, if anything to do with the long-term goals of sending humans to Mars. There's an entire division, the Space Biosciences group, which does research on Station about the long-term effects of microgravity and radiation on humans, supposedly to enable a long-duration voyage to Mars. Never mind that the microgravity issue is trivially solved by spinning the spacecraft with nothing more than a strong steel rope as a tether, and the radiation issue is sufficiently mitigated by having a storm shelter en route and throwing a couple of Martian sandbags on the roof once you get there.
There's an apocryphal story about the US government spending millions of dollars to develop the "Space Pen" -- a ballpoint pen with ink under pressure to enable writing in microgravity environments. Much later at some conference an engineer in that program meets his Soviet counterpart and asks how they solved that difficult problem. The cosmonauts used a pencil.
Sadly the story is not true -- the "Space Pen" was a successful marketing ploy by inventor Paul Fisher without any ties to NASA, although it was used by NASA and the Russians on later missions -- but it does serve to illustrate the point very succinctly. I worry that MIRI is spending its days coming up with space pens when a pencil would have done just fine.
Let me provide some practical advice. If I were running MIRI, I would still employ mathematicians working on the hail-Mary of a complete FAI theory -- avoiding the Löbian obstacle etc. -- and run the very successful workshops, though maybe just two a year. But beyond that I would spend all remaining resources on a pragmatic AGI design programme:
1) Have a series of workshops with AGI people to do a review of possible AI-influenced strategies for a positive singulatiry -- top-down FAI, seed AI to FAI, Oracle AI to FAI, Oracle AI to human augmentation, teaching a UFAI morals in a nursery environment, etc.
2) Have a series of workshops, again with AGI people to review tactics: possible AGI architectures & the minimal seed AI for each architecture, probabilistically reliable boxing setups, programmatic security, etc.
Then use the output of these workshops -- including reliable constraints on timelines -- to drive most of the research done by MIRI. For example, I anticipate that reliable unfriendly Oracle AI setups will require probabilistically auditable computation, which itself will require a strongly typed, purely functional virtual machine layer from which computation traces can be extracted and meaningfully analyzed in isolation. This is the sort of research MIRI could sponsor a grad student or Ph.d postdoc to perform.
BTW, other gripe: I have yet to see adequate arguments for the "can we realistically avoid having to do this?" from MIRI which aren't strawman arguments.
From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!
Yes, dear, some of us are programmers, we know about waterfalls. Our approach is more like, "Attack the most promising problems that present themselves, at every point; don't actually build things which you don't yet know how to make not destroy the world, at any point." Right now this means working on unbounded problems because there are ...
Introduction
A few months ago, my friend said the following thing to me: “After seeing Divergent, I finally understand virtue ethics. The main character is a cross between Aristotle and you.”
That was an impossible-to-resist pitch, and I saw the movie. The thing that resonated most with me–also the thing that my friend thought I had in common with the main character–was the idea that you could make a particular decision, and set yourself down a particular course of action, in order to make yourself become a particular kind of person. Tris didn’t join the Dauntless cast because she thought they were doing the most good in society, or because she thought her comparative advantage to do good lay there–she chose it because they were brave, and she wasn’t, yet, and she wanted to be. Bravery was a virtue that she thought she ought to have. If the graph of her motivations even went any deeper, the only node beyond ‘become brave’ was ‘become good.’
(Tris did have a concept of some future world-outcomes being better than others, and wanting to have an effect on the world. But that wasn't the causal reason why she chose Dauntless; as far as I can tell, it was unrelated.)
My twelve-year-old self had a similar attitude. I read a lot of fiction, and stories had heroes, and I wanted to be like them–and that meant acquiring the right skills and the right traits. I knew I was terrible at reacting under pressure–that in the case of an earthquake or other natural disaster, I would freeze up and not be useful at all. Being good at reacting under pressure was an important trait for a hero to have. I could be sad that I didn’t have it, or I could decide to acquire it by doing the things that scared me over and over and over again. So that someday, when the world tried to throw bad things at my friends and family, I’d be ready.
You could call that an awfully passive way to look at things. It reveals a deep-seated belief that I’m not in control, that the world is big and complicated and beyond my ability to understand and predict, much less steer–that I am not the locus of control. But this way of thinking is an algorithm. It will almost always spit out an answer, when otherwise I might get stuck in the complexity and unpredictability of trying to make a particular outcome happen.
Virtue Ethics
I find the different houses of the HPMOR universe to be a very compelling metaphor. It’s not because they suggest actions to take; instead, they suggest virtues to focus on, so that when a particular situation comes up, you can act ‘in character.’ Courage and bravery for Gryffindor, for example. It also suggests the idea that different people can focus on different virtues–diversity is a useful thing to have in the world. (I'm probably mangling the concept of virtue ethics here, not having any background in philosophy, but it's the closest term for the thing I mean.)
I’ve thought a lot about the virtue of loyalty. In the past, loyalty has kept me with jobs and friends that, from an objective perspective, might not seem like the optimal things to spend my time on. But the costs of quitting and finding a new job, or cutting off friendships, wouldn’t just have been about direct consequences in the world, like needing to spend a bunch of time handing out resumes or having an unpleasant conversation. There would also be a shift within myself, a weakening in the drive towards loyalty. It wasn’t that I thought everyone ought to be extremely loyal–it’s a virtue with obvious downsides and failure modes. But it was a virtue that I wanted, partly because it seemed undervalued.
By calling myself a ‘loyal person’, I can aim myself in a particular direction without having to understand all the subcomponents of the world. More importantly, I can make decisions even when I’m rushed, or tired, or under cognitive strain that makes it hard to calculate through all of the consequences of a particular action.
Terminal Goals
The Less Wrong/CFAR/rationalist community puts a lot of emphasis on a different way of trying to be a hero–where you start from a terminal goal, like “saving the world”, and break it into subgoals, and do whatever it takes to accomplish it. In the past I’ve thought of myself as being mostly consequentialist, in terms of morality, and this is a very consequentialist way to think about being a good person. And it doesn't feel like it would work.
There are some bad reasons why it might feel wrong–i.e. that it feels arrogant to think you can accomplish something that big–but I think the main reason is that it feels fake. There is strong social pressure in the CFAR/Less Wrong community to claim that you have terminal goals, that you’re working towards something big. My System 2 understands terminal goals and consequentialism, as a thing that other people do–I could talk about my terminal goals, and get the points, and fit in, but I’d be lying about my thoughts. My model of my mind would be incorrect, and that would have consequences on, for example, whether my plans actually worked.
Practicing the art of rationality
Recently, Anna Salamon brought up a question with the other CFAR staff: “What is the thing that’s wrong with your own practice of the art of rationality?” The terminal goals thing was what I thought of immediately–namely, the conversations I've had over the past two years, where other rationalists have asked me "so what are your terminal goals/values?" and I've stammered something and then gone to hide in a corner and try to come up with some.
In Alicorn’s Luminosity, Bella says about her thoughts that “they were liable to morph into versions of themselves that were more idealized, more consistent - and not what they were originally, and therefore false. Or they'd be forgotten altogether, which was even worse (those thoughts were mine, and I wanted them).”
I want to know true things about myself. I also want to impress my friends by having the traits that they think are cool, but not at the price of faking it–my brain screams that pretending to be something other than what you are isn’t virtuous. When my immediate response to someone asking me about my terminal goals is “but brains don’t work that way!” it may not be a true statement about all brains, but it’s a true statement about my brain. My motivational system is wired in a certain way. I could think it was broken; I could let my friends convince me that I needed to change, and try to shoehorn my brain into a different shape; or I could accept that it works, that I get things done and people find me useful to have around and this is how I am. For now. I'm not going to rule out future attempts to hack my brain, because Growth Mindset, and maybe some other reasons will convince me that it's important enough, but if I do it, it'll be on my terms. Other people are welcome to have their terminal goals and existential struggles. I’m okay the way I am–I have an algorithm to follow.
Why write this post?
It would be an awfully surprising coincidence if mine was the only brain that worked this way. I’m not a special snowflake. And other people who interact with the Less Wrong community might not deal with it the way I do. They might try to twist their brains into the ‘right’ shape, and break their motivational system. Or they might decide that rationality is stupid and walk away.