Jocko Podcast
I've recently been extracting extraordinary value from the Jocko Podcast.
Jocko Willink is a retired Navy SEAL commander, jiu-jitsu black belt, management consultant and, in my opinion, master rationalist. His podcast typically consists of detailed analysis of some book on military history or strategy followed by a hands-on Q&A session. Last week's episode (#38) was particularly good and if you want to just dive in, I would start there.
As a sales pitch, I'll briefly describe some of his recurring talking points:
- Extreme ownership. Take ownership of all outcomes. If your superior gave you "bad orders", you should have challenged the orders or adapted them better to the situation; if your subordinates failed to carry out a task, then it is your own instructions to them that were insufficient. If the failure is entirely your own, admit your mistake and humbly open yourself to feedback. By taking on this attitude you become a better leader and through modeling you promote greater ownership throughout your organization. I don't think I have to point out the similarities between this and "Heroic Morality" we talk about around here.
- Mental toughness and discipline. Jocko's language around this topic is particularly refreshing, speaking as someone who has spent too much time around "self help" literature, in which I would partly include Less Wrong. His ideas are not particularly new, but it is valuable to have an example of somebody who reliably executes on his the philosophy of "Decide to do it, then do it." If you find that you didn't do it, then you didn't truly decide to do it. In any case, your own choice or lack thereof is the only factor. "Discipline is freedom." If you adopt this habit as your reality, it become true.
- Decentralized command. This refers specifically to his leadership philosophy. Every subordinate needs to truly understand the leader's intent in order to execute instructions in a creative and adaptable way. Individuals within a structure need to understand the high-level goals well enough to be able to act in a almost all situations without consulting their superiors. This tightens the OODA loop on an organizational level.
- Leadership as manipulation. Perhaps the greatest surprise to me was the subtlety of Jocko's thinking about leadership, probably because I brought in many erroneous assumptions about the nature of a SEAL commander. Jocko talks constantly about using self-awareness, detachment from one's ideas, control of one's own emotions, awareness of how one is perceived, and perspective-taking of one's subordinates and superiors. He comes off more as HPMOR!Quirrell than as a "drill sergeant".
The Q&A sessions, in which he answers questions asked by his fans on Twitter, tend to be very valuable. It's one thing to read the bullet points above, nod your head and say, "That sounds good." It's another to have Jocko walk through the tactical implementation of this ideas in a wide variety of daily situations, ranging from parenting difficulties to office misunderstandings.
For a taste of Jocko, maybe start with his appearance on the Tim Ferriss podcast or the Sam Harris podcast.
Playing offense
There is a pattern I noticed that appears whenever some new interesting idea or technology gains some media traction above certain threshold. Those who are considered opinion-makers (journalists, intellectuals) more often than not write about this new movement/idea/technology in a way that it somewhere between cautions and negative. As this happens those who adopted this new ideas/technologies became somewhat weary of promoting it and in fear of a low status decide to retreat from their public advocacy of the mentioned ideas.
I was wondering that maybe in some circumstances, the right move for those that are getting the negative attention is not to defend themselves but instead to go on a offense. And one the most interesting offensive tactic might be is to try to reverse the framing of the subject matter and put the burden of the argument on the critics in a way that requires them to seriously reconsider their position:
- Those who are critical of this idea are actually doing something wrong and they are unable to see the magnitude of their mistake
- The fact that they are not adopting this idea/product has a big cost that they are not aware of, and in reality they are the ones making a weird choice
- They already adopted that idea/position but they don't notice it as it not framed in context they understand or find comfortable
The site has crossed a threshold—it is now so widely trafficked that it's fast becoming a routine aid to social interaction, like e-mail and antiperspirant.
As far as a transhumanist is concerned, if you see someone in danger of dying, you should save them; if you can improve someone’s health, you should. There, you’re done. No special cases.
You also don’t ask whether the remedy will involve only “primitive” technologies (like a stretcher to lift the six-year-old off the railroad tracks); or technologies invented less than a hundred years ago (like penicillin) which nonetheless seem ordinary because they were around when you were a kid; or technologies that seem scary and sexy and futuristic (like gene therapy) because they were invented after you turned 18; or technologies that seem absurd and implausible and sacrilegious (like nanotech) because they haven’t been invented yet.
Forecasting health gaps
You're an average person.
You don't know what diseases you'll get in the future.
You know people get diseases and certain populations get diseases more than others, enough to say certain things cause diseases.
You're not quite the average person.
You have a strong preference against sickness and a strong belief in your ability to mitigate deleterious circumstances.
You have access to preventative research. You know if you don't work in a coal mine, overtrain when running, and eat healthy, you can stay healthier than those who take those risks.
You know that some disease outcomes are less than predictable, so you want to work towards the available of treatments that fill gaps in the availability of therapeutics. For instance, you might want a treatment for HIV to be developed, in case you become HIV infected, since there is a risk of HIV exposure for almost anyone exposed to unprotected sex, since they won't necessarily know their sexual partners entire serohistory (noologism?)
However, you don't know which diseases you will get. So how do you prioritise?
Perhaps, medical device and pharmaceutical company strategies could be ported to your situation.
Most people, including non-epidemiologist researchers, don't have access to epidemiology data sets.
Most people, don't have the patience to read a book on medical market research
You don't have the funds or connections to employ the world's only specialist in the area of medical market forecasting.
At least he's broken down the field into best practice questions:
- Where can we find epidemiological information/data?
- How do we judge/evaluate it?
- What is the correct methodology for using it?
- What's useful and what's not useful for pharma market researchers?
- How do we combine/apply it with MR data?
The only firm, other than Bill's, that appears to specialist in the area fortunately breaks down the techniques in the field for us:
- Integrated forecasts based on choice modeling or univariate demand research to ensure that the primary marketing research is aligned with the needs of forecast
- Volumetric new product forecasting to provide the accuracy required for pre-launch planning
- Combination epidemiology-/sales-volume-based forecast models that provide robust market sizing and trend information
- Custom patient flow models that represent the dynamics of complex markets not possible with cross-sectional methods
- Oncology-specific forecast models to accept the data and assumptions unique to cancer therapeutics and accurately forecast patients on therapy
- Subscription forecasting software for clients who would like to build their own forecasts using user-friendly functionality to save time and prevent calculation and logic errors
The generalisations in the industry, things that are applicable across particular populations, therapeutics or firms appears to be summarised here:
It's 36 pages long, but well worth it if area is interesting to you.
So now you know how this market operates, what are the outputs:
Mega trends are available here
A detailed review is available here
Do they answer the questions, use the techniques proposed, and answer the ultimate question of what gaps exist in the provision of medical therapeutics?
I don't know how to apply the techniques to tell. What do you think?
I know there are other ways to think about these problems.
For instance, if I put myself in a pharmaceutical company's position, I could use strategic tools like Porter's 4 forces and see whether a particular decision looks compelling.
The 2018 paper suggests that pain killers in developed countries are going to get lots of government investment.
So, does it makes sense to supply that demand?
There are a number of highly risky threats that might suggest say a potential poppy producer shouldn't proceed:
**technological**
Disruptive biotechnology, such as genetically modified yeast which can convert glucose to morphine. There have been suggestions that this invention is overhyped
**political**
Licensing poppy producers who currently supply illicit drug producers
This said, the whole thing is very underdetermined so I suspect actual organisations are far more procedural in their approaches. What do you think?
Learning to get things right first time
These are quick notes on an idea for an indirect strategy to increase the likelihood of society acquiring robustly safe and beneficial AI.
Motivation:
-
Most challenges we can approach with trial-and-error, so many of our habits and social structures are set up to encourage this. There are some challenges where we may not get this opportunity, and it could be very helpful to know what methods help you to tackle a complex challenge that you need to get right first time.
-
Giving an artificial intelligence good values may be a particularly important challenge, and one where we need to be correct first time. (Distinct from creating systems that act intelligently at all, which can be done by trial and error.)
-
Building stronger societal knowledge about how to approach such problems may make us more robustly prepared for such challenges. Having more programmers in the AI field familiar with the techniques is likely to be particularly important.
Idea: Develop methods for training people to write code without bugs.
-
Trying to teach the skill of getting things right first time.
-
Writing or editing code that has to be bug-free without any testing is a fairly easy challenge to set up, and has several of the right kind of properties. There are some parallels between value specification and programming.
-
Set-up puts people in scenarios where they only get one chance -- no opportunity to test part/all of the code, just analyse closely before submitting.
-
Interested in personal habits as well as social norms or procedures that help this.
-
Daniel Dewey points to standards for code on the space shuttle as a good example of getting high reliability code edits.
-
-
How to implement:
-
Ideal: Offer this training to staff at software companies, for profit.
-
Although it’s teaching a skill under artificial hardship, it seems plausible that it could teach enough good habits and lines of thinking to noticeably increase productivity, so people would be willing to pay for this.
-
Because such training could create social value in the short run, this might give a good opportunity to launch as a business that is simultaneously doing valuable direct work.
-
Similarly, there might be a market for a consultancy that helped organisations to get general tasks right the first time, if we knew how to teach that skill.
-
-
More funding-intensive, less labour intensive: run competitions with cash prizes
-
Try to establish it as something like a competitive sport for teams.
-
Outsource the work of determining good methods to the contestants.
-
This is all quite preliminary and I’d love to get more thoughts on it. I offer up this idea because I think it would be valuable but not my comparative advantage. If anyone is interested in a project in this direction, I’m very happy to talk about it.
Democracy and rationality
Note: This is a draft; so far, about the first half is complete. I'm posting it to Discussion for now; when it's finished, I'll move it to Main. In the mean time, I'd appreciate comments, including suggestions on style and/or format. In particular, if you think I should(n't) try to post this as a sequence of separate sections, let me know.
Summary: You want to find the truth? You want to win? You're gonna have to learn the right way to vote. Plurality voting sucks; better voting systems are built from the blocks of approval, medians (Bucklin cutoffs), delegation, and pairwise opposition. I'm working to promote these systems and I want your help.
Contents: 1. Overblown¹ rhetorical setup ... 2. Condorcet's ideals and Arrow's problem ... 3. Further issues for politics ... 4. Rating versus ranking; a solution? ... 5. Delegation and SODA ... 6. Criteria and pathologies ... 7. Representation, Proportional representation, and Sortition ... 8. What I'm doing about it and what you can ... 9. Conclusions and future directions ... 10. Appendix: voting systems table ... 11. Footnotes
1.
This is a website focused on becoming more rational. But that can't just mean getting a black belt in individual epistemic rationality. In a situation where you're not the one making the decision, that black belt is just a recipe for frustration.
Of course, there's also plenty of content here about how to interact rationally; how to argue for truth, including both hacking yourself to give in when you're wrong and hacking others to give in when they are. You can learn plenty here about Aumann's Agreement Theorem on how two rational Bayesians should never knowingly disagree.
But "two rational Bayesians" isn't a whole lot better as a model for society than "one rational Bayesian". Aspiring to be rational is well and good, but the Socratic ideal of a world tied together by two-person dialogue alone is as unrealistic as the sociopath's ideal of a world where their own voice rules alone. Society needs structures for more than two people to interact. And just as we need techniques for checking irrationality in one- and two-person contexts, we need them, perhaps all the more, in multi-person contexts.
Most of the basic individual and dialogical rationality techniques carry over. Things like noticing when you are confused, or making your opponent's arguments into a steel man, are still perfectly applicable. But there's also a new set of issues when n>2: the issues of democracy and voting. For a group of aspiring rationalists to come to a working consensus, of course they need to begin by evaluating and discussing the evidence, but eventually it will be time to cut off the discussion and just vote. When they do so, they should understand the strengths and pitfalls of voting in general and of their chosen voting method in particular.
And voting's not just useful for an aspiring rationalist community. As it happens, it's an important part of how governments are run. Discussing politics may be a mind-killer in many contexts, but there are an awful lot of domains where politics is a part of the road to winning.² Understanding voting processes a little bit can help you navigate that road; understanding them deeply opens the possibility of improving that road and thus winning more often.
2. Collective rationality: Condorcet's ideals and Arrow's problem
Imagine it's 1785, and you're a member of the French Academy of Sciences. You're rubbing elbows with most of the giants of science and mathematics of your day: Coulomb, Fourier, Lalande, Lagrange, Laplace, Lavoisier, Monge; even the odd foreign notable like Franklin with his ideas to unify electrostatics and electric flow.

One day, they'll put your names in front of lots of cameras (even though that foreign yokel Franklin will be in more pictures)
And this academy, with many of the smartest people in the world, has votes on stuff. Who will be our next president; who should edit and schedule our publications; etc. You're sure that if you all could just find the right way to do the voting, you'd get the right answer. In fact, you can easily prove that, or something like it: if a group is deciding between one right and one wrong option, and each member is independently more than 50% likely to get it right, then as the group size grows the chance of a majority vote choosing the right option goes to 1.
But somehow, there's still annoying politics getting in the way. Some people seem to win the elections simply because everyone expects them to win. So last year, the academy decided on a new election system to use, proposed by your rival, Charles de Borda, in which candidates get different points for being a voters first, second, or third choice, and the one with the most points wins. But you're convinced that this new system will lead to the opposite problem: people who win the election precisely because nobody expected them to win, by getting the points that voters strategically don't want to give to a strong rival. But when people point that possibility out to Borda, he only huffs that "my system is meant for honest men!"
So with your proof of the above intuitive, useful result about two-way elections, you try to figure out how to reduce an n-way election to the two-candidate case. Clearly, you can show that Borda's system will frequently give the wrong results from that perspective. But frustratingly, you find that there could sometimes be no right answer; that there will be no candidate who would beat all the others in one-on-one races. A crack has opened up; could it be that the collective decisions of intelligent individual rational agents could be irrational?
Of course, the "you" in this story is the Marquis de Condorcet, and the year 1785 is when he published his Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix, a work devoted to the question of how to acheive collective rationality. The theorem referenced above is Condorcet's Jury Theorem, which seems to offer hope that democracy can point the way from individually-imperfect rationality towards an ever-more-perfect collective rationality. Just as Aumann's Agreement Theorem shows that two rational agents should always move towards consensus, the Condorcet Jury Theorem apparently shows that if you have enough rational agents, the resulting consensus will be correct.
But as I said, Condorcet also opened a crack in that hope: the possibility that collective preferences will be cyclical. If the assumptions of the jury theorem don't hold — if each voter doesn't have a >50% chance of being right on a randomly-selected question, OR if the correctness of two randomly-selected voters is not independent and uncorrelated — then individually-sensible choices can lead to collectively-ridiculous ones.
What do I mean by "collectively-ridiculous"? Let's imagine that the Rationalist Marching Band is choosing the colors for their summer, winter, and spring uniforms, and that they all agree that the only goal is to have as much as possible of the best possible colors. The summer-style uniforms come in red or blue, and they vote and pick blue; the winter-style ones come in blue or green, and they pick green; and the spring ones come in green or red, and they pick red.
Obviously, this makes us doubt their collective rationality. If, as they all agree they should, they had a consistent favorite color, they should have chosen that color both times that it was available, rather than choosing three different colors in the three cases. Theoretically, the salesperson could use such a fact to pump money out of them; for instance, offering to let them "trade up" their spring uniform from red to blue, then to green, then back to red, charging them a small fee each time; if they voted consistently as above, they would agree to each trade (though of course in reality human voters would probably catch on to the trick pretty soon, so the abstract ideal of an unending circular money pump wouldn't work).
This is the kind of irrationality that Condorcet showed was possible in collective decisionmaking. He also realized that there was a related issue with logical inconsistencies. If you were take a vote on 3 logically related propositions — say, "Should we have a Minister of Silly Walks, to be appointed by the Chancellor of the Excalibur", "Should we have a Minister of Silly Walks, but not appointed by the Chancellor of the Excalibur", and "Should we in fact have a Minister of Silly Walks at all", where the third cannot be true unless one of the first is — then you could easily get majority votes for inconsistent results — in this case, no, no, and yes, respectively. Obviously, there are many ways to fix the problem in this simple case — probably many less-wrong'ers would suggest some Bayesian tricks related to logical networks and treating votes as evidence⁸ — but it's a tough problem in general even today, especially when the logical relationships can be complex, and Condorcet was quite right to be worried about its implications for collective rationality.³
And that's not the only tough problem he correctly foresaw. Nearly 200 years later and an ocean away, in the 1960s, Kenneth Arrow showed that it was impossible for a preferential voting system to avoid the problem of a "Condorcet cycle" of preferences. Arrows theorem shows that any voting system which can consistently give the same winner (or, in ties, winners) for the same voter preferences; which does not make one voter the effective dictator; which is sure to elect a candidate if all voters prefer them; and which will switch the results for two candidates if you switch their names on all the votes... must exhibit, in at least some situation, the pathology that befell the Rationalist Marching Band above, or in other words, must fail "independence of irrelevant alternatives".
Arrow's theorem is far from obvious a priori, but proof is not hard to understand intuitively using Condorcet's insight. Say that there are three candidates, X, Y, and Z, with roughly equal bases of support; and that they form a Condorcet cycle, because in two-way races, X would beat Y with help from Z supporters, Y would beat Z with help from X supporters, and Z would beat X with help from Y supporters. So whoever wins in the three-way race — say, X — just remove the one who would have lost to them — Y in this case — and that "irrelevant" change will change the winner to be the third — Z in this case.
Summary of above: Collective rationality is harder than individual or two-way rationality. Condorcet saw the problem and tried to solve it, but Arrow saw that Condorcet had been doomed to fail.
3. Further issues for politics
So Condorcet's ideals of better rationality through voting appear to be in ruins. But at least we can hope that voting is a good way to do politics, right?
Not so fast. Arrow's theorem quickly led to further disturbing results. Alan Gibbard (and also Mark Satterthwaite) extended that there is no voting system which doesn't encourage voting strategy. That is, if you view an voting system as a class of games where the finite players and finite available strategies are fixed, no player is effectively a dictator, and the only thing that varies are the payoffs for each player from each outcome, there is no voting system where you can derive your best strategic vote purely by looking "honestly" at your own preferences; there is always the possibility of situations where you have to second-guess what others will do.
Amartya Sen piled on with another depressing extension of Arrows' logic. He showed that there is no possible way of aggregating individual choices into collective choice that satisfies two simple criteria. First, it shouldn't choose pareto-dominated outcomes; if everyone prefers situation XYZ to ABC, that they don't do XYZ. Second, it is "minimally liberal"; that is, there are at least two people who each get to freely make their own decision on at least one specific issue each, no matter what, so for instance I always get to decide between X and A (in Gibbard's⁴ example, colors for my house), and you always get to decide between Y and B (colors for your own house). The problem is that if you nosily care more about my house's color, the decision that should have been mine, and I nosily care about yours, more than we each care about our own, then the pareto-dominant situation is the one where we don't decide our own houses; and that nosiness could, in theory, be the case for any specific choice that, a priori, someone might have labelled as our Inalienable Right. It's not such a surprising result when you think about it that way, but it does clearly show that unswerving ideals of Democracy and Liberty will never truly be compatible.
Meanwhile, "public choice" theorists⁵ like Duncan Black, James Buchanan, etc. were busy undermining the idea of democratic government from another direction: the motivations of the politicians and bureaucrats who are supposed to keep it running. They showed that various incentives, including the strange voting scenarios explored by Condorcet and Arrow, would tend open a gap between the motives of the people and those of the government, and that strategic voting and agenda-setting within a legislature would tend to extend the impact of that gap. Where Gibbard and Sen had proved general results, these theorists worked from specific examples. And in one aspect, at least, their analysis is devastatingly unanswerable: the near-ubiquitous "democratic" system of plurality voting, also known as first-past-the-post or vote-for-one or biggest-minority-wins, is terrible in both theory and practice.
So, by the 1980s, things looked pretty depressing for the theory of democracy. Politics, the theory went, was doomed forever to be a worse than sausage factory; disgusting on the inside and distasteful even from outside.
Should an ethical rationalist just give up on politics, then? Of course not. As long as the results it produces are important, it's worth trying to optimize. And as soon as you take the engineer's attitude of optimizing, instead of dogmatically searching for perfection or uselessly whining about the problems, the results above don't seem nearly as bad.
From this engineer's perspective, public choice theory serves as an unsurprising warning that tradeoffs are necessary, but more usefully, as a map of where those tradeoffs can go particularly wrong. In particular, its clearest lesson, in all-caps bold with a blink tag, that PLURALITY IS BAD, can be seen as a hopeful suggestion that other voting systems may be better. Meanwhile, the logic of both Sen's and Gibbard's theorems are built on Arrow's earlier result. So if we could find a way around Arrow, it might help resolve the whole issue.
Summary of above: Democracy is the worst political system... (...except for all the others?) But perhaps it doesn't have to be quite so bad as it is today.
4. Rating versus ranking
So finding a way around Arrow's theorem could be key to this whole matter. As a mathematical theorem, of course, the logic is bulletproof. But it does make one crucial assumption: that the only inputs to a voting system are rankings, that is, voters' ordinal preference orders for the candidates. No distinctions can be made using ratings or grades; that is, as long as you prefer X to Y to Z, the strength of those preferences can't matter. Whether you put Y almost up near X or way down next to Z, the result must be the same.
Relax that assumption, and it's easy to create a voting system which meets Arrow's criteria. It's called Score voting⁶, and it just means rating each candidate with a number from some fixed interval (abstractly speaking, a real number; but in practice, usually an integer); the scores are added up and the highest total or average wins. (Unless there are missing values, of course, total or average amount to the same thing.) You've probably used it yourself on Yelp, IMDB, or similar sites. And it clearly passes all of Arrow's criteria. Non-dictatorship? Check. Unanimity? Check. Symmetry over switching candidate names? Check. Independence of irrelevant alternatives? In the mathematical sense — that is, as long as the scores for other candidates are unchanged — check.
So score voting is an ideal system? Well, it's certainly a far sight better than plurality. But let's check it against Sen and against Gibbard.
Sen's theorem was based on a logic similar to Arrow. However, while Arrow's theorem deals with broad outcomes like which candidate wins, Sen's deals with finely-grained outcomes like (in the example we discussed) how each separate house should be painted. Extending the cardinal numerical logic of score voting to such finely-grained outcomes, we find we've simply reinvented markets. While markets can be great things and often work well in practice, Sen's result still holds in this case; if everything is on the market, then there is no decision which is always yours to make. But since, in practice, as long as you aren't destitute, you tend to be able to make the decisions you care the most about, Sen's theorem seems to have lost its bite in this context.
What about Gibbard's theorem on strategy? Here, things are not so easy. Yes, Gibbard, like Sen, parallels Arrow. But while Arrow deals with what's written on the ballot, Gibbard deals with what's in the voters head. In particular, if a voter prefers X to Y by even the tiniest margin, Gibbard assumes (not unreasonably) that they may be willing to vote however they need to, if by doing so they can ensure X wins instead of Y. Thus, the internal preferences Gibbard treats are, effectively, just ordinal rankings; and the cardinal trick by which score voting avoided Arrovian problems no longer works.
How does score voting deal with strategic issues in practice? The answer to that has two sides. On the one hand, score never requires voters to be actually dishonest. Unlike the situation in a ranked system such as plurality, where we all know that the strategic vote may be to dishonestly ignore your true favorite and vote for a "lesser evil" among the two frontrunners, in score voting you never need to vote a less-preferred option above a more-preferred option. At worst, all you have to do is exaggerate some distinctions and minimize others, so that you might end giving equal votes to less- and more-preferred options.
Did I say "at worst"? I meant, "almost always". Voting strategy only matters to the result when, aside from your vote, two or more candidates are within one vote of being tied for first. Except in unrealistic, perfectly-balanced conditions, as the number of voters rises, the probability that anyone but the two a priori frontrunner candidates is in on this tie falls to zero.⁷ Thus, in score voting, the optimal strategy is nearly always to vote your preferred frontrunner and all candidate above at the maximum, and your less-preferred frontrunner and all candidates below at the minimum. In other words, strategic score voting is basically equivalent to approval voting, where you give each candidate a 1 or 0 and the highest total wins.
In one sense, score voting reducing to approval OK. Approval voting is not a bad system at all. For instance, if there's a known majority Condorcet winner — a candidate who could beat any other by a majority in a one-on-one race — and voters are strategic — they anticipate the unique strong Nash equilibrium, the situation where no group of voters could improve the outcome for all its members by changing their votes, whenever such a unique equilibrium exists — then the Condorcet winner will win under approval. That's a lot of words to say that approval will get the "democratic" results you'd expect in most cases.
But in another sense, it's a problem. If one side of an issue is more inclined to be strategic than the other side, the more-strategic faction could win even if it's a minority. That clashes with many people's ideals of democracy; and worse, it encourages mind-killing political attitudes, where arguments are used as soldiers rather than as ways to seek the truth.
But score and approval voting are not the only systems which escape Arrow's theorem through the trapdoor of ratings. If score voting, using the average of voter ratings, too-strongly encourages voters to strategically seek extreme ratings, then why not use the median rating instead? We know that medians are less sensitive to outliers than averages. And indeed, median-based systems are more resistant to one-sided strategy than average-based ones, giving better hope for reasonable discussion to prosper. That is to say, in a simple model, a minority would need twice as much strategic coordination under median as under average, in order to overcome a majority; and there's good reason to believe that, because of natural factional separation, reality is even more favorable to median systems than that model.
There are several different median systems available. In the US during the 1910-1925 Progressive Era, early versions collectively called "Bucklin voting" were used briefly in over a dozen cities. These reforms, based on counting all top preferences, then adding lower preferences one level at a time until some candidate(s) reach a majority, were all rolled back soon after, principally by party machines upset at upstart challenges or victories. The possibility of multiple, simultaneous majorities is a principal reason for the variety of Bucklin/Median systems. Modern proposals of median systems include Majority Approval Voting, Majority Judgment, and Graduated Majority Judgment, which would probably give the same winners almost all of the time. An important detail is that most median system ballots use verbal or letter grades rather than numeric scores. This is justifiable because the median is preserved under any monotonic transformation, and studies suggest that it would help discourage strategic voting.
Serious attention to rated systems like approval, score, and median systems barely began in the 1980s, and didn't really pick up until 2000. Meanwhile, the increased amateur interest in voting systems in this period — perhaps partially attributable to the anomalous 2000 US presidential election, or to more-recent anomalies in the UK, Canada, and Australia — has led to new discoveries in ranked systems as well. Though such systems are still clearly subject to Arrow's theorem, new "improved Condorcet" methods which use certain tricks to count a voter's equal preferences between to candidates on either side of the ledger depending on the strategic needs, seem to offer promise that Arrovian pathologies can be kept to a minimum.
With this embarrassment of riches of systems to choose from, how should we evaluate which is best? Well, at least one thing is a clear consensus: plurality is a horrible system. Beyond that, things are more controversial; there are dozens of possible objective criteria one could formulate, and any system's inventor and/or supporters can usually formulate some criterion by which it shines.
Ideally, we'd like to measure the utility of each voting system in the real world. Since that's impossible — it would take not just a statistically-significant sample of large-scale real-world elections for each system, but also some way to measure the true internal utility of a result in situations where voters are inevitably strategically motivated to lie about that utility — we must do the next best thing, and measure it in a computer, with simulated voters whose utilities are assigned measurable values. Unfortunately, that requires assumptions about how those utilities are distributed, how voter turnout is decided, and how and whether voters strategize. At best, those assumptions can be varied, to see if findings are robust.
In 2000, Warren Smith performed such simulations for a number of voting systems. He found that score voting had, very robustly, one of the top expected social utilities (or, as he termed it, lowest Bayesian regret). Close on its heels were a median system and approval voting. Unfortunately, though he explored a wide parameter space in terms of voter utility models and inherent strategic inclination of the voters, his simulations did not include voters who were more inclined to be strategic when strategy was more effective. His strategic assumptions were also unfavorable to ranked systems, and slightly unrealistic in other ways. Still, though certain of his numbers must be taken with a grain of salt, some of his results were large and robust enough to be trusted. For instance, he found that plurality voting and instant runoff voting were clearly inferior to rated systems; and that approval voting, even at its worst, offered over half the benefits compared to plurality of any other system.
Summary of above: Rated systems, such as approval voting, score voting, and Majority Approval Voting, can avoid the problems of Arrow's theorem. Though they are certainly not immune to issues of strategic voting, they are a clear step up from plurality. Starting with this section, the opinions are my own; the two prior sections were based on general expert views on the topic.
5. Delegation and SODA
Rated systems are not the only way to try to beat the problems of Arrow and Gibbard (/Satterthwaite).
Summary of above:
6. Criteria and pathologies
do.
Summary of above:
7. Representation, proportionality, and sortition
do.
Summary of above:
8. What I'm doing about it and what you can
do.
Summary of above:
9. Conclusions and future directions
do.
Summary of above:
10. Appendix: voting systems table
Compliance of selected systems (table)
The following table shows which of the above criteria are met by several single-winner systems. Note: contains some errors; I'll carefully vet this when I'm finished with the writing. Still generally reliable though.
| Majority/ MMC | Condorcet/ Majority Condorcet | Cond. loser | Monotone | Consistency/ Participation | Reversal symmetry | IIA | Cloneproof | Polytime/ Resolvable | Summable | Equal rankings allowed | Later prefs allowed | Later-no-harm/ Later-no-help | FBC:No favorite betrayal |
||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Approval[nb 1] | Ambiguous | No/Strategic yes[nb 2] | No | Yes | Yes[nb 2] | Yes | Ambiguous | Ambig.[nb 3] | Yes | O(N) | Yes | No | [nb 4] | Yes | |
| Borda count | No | No | Yes | Yes | Yes | Yes | No | No (teaming) | Yes | O(N) | No | Yes | No | No | |
| Copeland | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | No (crowding) | Yes/No | O(N2) | Yes | Yes | No | No | |
| IRV (AV) | Yes | No | Yes | No | No | No | No | Yes | Yes | O(N!)[nb 5] | No | Yes | Yes | No | |
| Kemeny-Young | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | No (teaming) | No/Yes | O(N2)[nb 6] | Yes | Yes | No | No | |
| Majority Judgment[nb 7] | Yes[nb 8] | No/Strategic yes[nb 2] | No[nb 9] | Yes | No[nb 10] | No[nb 11] | Yes | Yes | Yes | O(N)[nb 12] | Yes | Yes | No[nb 13] | Yes | Yes |
| Minimax | Yes/No | Yes[nb 14] | No | Yes | No | No | No | No (spoilers) | Yes | O(N2) | Some variants | Yes | No[nb 14] | No | |
| Plurality | Yes/No | No | No | Yes | Yes | No | No | No (spoilers) | Yes | O(N) | No | No | [nb 4] | No | |
| Range voting[nb 1] | No | No/Strategic yes[nb 2] | No | Yes | Yes[nb 2] | Yes | Yes[nb 15] | Ambig.[nb 3] | Yes | O(N) | Yes | Yes | No | Yes | |
| Ranked pairs | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | Yes | Yes | O(N2) | Yes | Yes | No | No | |
| Runoff voting | Yes/No | No | Yes | No | No | No | No | No (spoilers) | Yes | O(N)[nb 16] | No | No[nb 17] | Yes[nb 18] | No | |
| Schulze | Yes | Yes | Yes | Yes | No | Yes | No (but ISDA) | Yes | Yes | O(N2) | Yes | Yes | No | No | |
| SODA voting[nb 19] | Yes | Strategic yes/yes | Yes | Ambiguous[nb 20] | Yes/Up to 4 cand. [nb 21] | Yes[nb 22] | Up to 4 candidates[nb 21] | Up to 4 cand. (then crowds) [nb 21] | Yes[nb 23] | O(N) | Yes | Limited[nb 24] | Yes | Yes | |
| Random winner/ arbitrary winner[nb 25] |
No | No | No | NA | No | Yes | Yes | NA | Yes/No | O(1) | No | No | Yes | ||
| Random ballot[nb 26] | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes/No | O(N) | No | No | Yes | ||
"Yes/No", in a column which covers two related criteria, signifies that the given system passes the first criterion and not the second one.
- ^ a b These criteria assume that all voters vote their true preference order. This is problematic for Approval and Range, where various votes are consistent with the same order. See approval voting for compliance under various voter models.
- ^ a b c d e In Approval, Range, and Majority Judgment, if all voters have perfect information about each other's true preferences and use rational strategy, any Majority Condorcet or Majority winner will be strategically forced – that is, win in the unique Strong Nash equilibrium. In particular if every voter knows that "A or B are the two most-likely to win" and places their "approval threshold" between the two, then the Condorcet winner, if one exists and is in the set {A,B}, will always win. These systems also satisfy the majority criterion in the weaker sense that any majority can force their candidate to win, if it so desires. (However, as the Condorcet criterion is incompatible with the participation criterion and the consistency criterion, these systems cannot satisfy these criteria in this Nash-equilibrium sense. Laslier, J.-F. (2006) "Strategic approval voting in a large electorate,"IDEP Working Papers No. 405 (Marseille, France: Institut D'Economie Publique).)
- ^ a b The original independence of clones criterion applied only to ranked voting methods. (T. Nicolaus Tideman, "Independence of clones as a criterion for voting rules", Social Choice and Welfare Vol. 4, No. 3 (1987), pp. 185–206.) There is some disagreement about how to extend it to unranked methods, and this disagreement affects whether approval and range voting are considered independent of clones. If the definition of "clones" is that "every voter scores them within ±ε in the limit ε→0+", then range voting is immune to clones.
- ^ a b Approval and Plurality do not allow later preferences. Technically speaking, this means that they pass the technical definition of the LNH criteria - if later preferences or ratings are impossible, then such preferences can not help or harm. However, from the perspective of a voter, these systems do not pass these criteria. Approval, in particular, encourages the voter to give the same ballot rating to a candidate who, in another voting system, would get a later rating or ranking. Thus, for approval, the practically meaningful criterion would be not "later-no-harm" but "same-no-harm" - something neither approval nor any other system satisfies.
- ^ The number of piles that can be summed from various precincts is floor((e-1) N!) - 1.
- ^ Each prospective Kemeny-Young ordering has score equal to the sum of the pairwise entries that agree with it, and so the best ordering can be found using the pairwise matrix.
- ^ Bucklin voting, with skipped and equal-rankings allowed, meets the same criteria as Majority Judgment; in fact, Majority Judgment may be considered a form of Bucklin voting. Without allowing equal rankings, Bucklin's criteria compliance is worse; in particular, it fails Independence of Irrelevant Alternatives, which for a ranked method like this variant is incompatible with the Majority Criterion.
- ^ Majority judgment passes the rated majority criterion (a candidate rated solo-top by a majority must win). It does not pass the ranked majority criterion, which is incompatible with Independence of Irrelevant Alternatives.
- ^ Majority judgment passes the "majority condorcet loser" criterion; that is, a candidate who loses to all others by a majority cannot win. However, if some of the losses are not by a majority (including equal-rankings), the Condorcet loser can, theoretically, win in MJ, although such scenarios are rare.
- ^ Balinski and Laraki, Majority Judgment's inventors, point out that it meets a weaker criterion they call "grade consistency": if two electorates give the same rating for a candidate, then so will the combined electorate. Majority Judgment explicitly requires that ratings be expressed in a "common language", that is, that each rating have an absolute meaning. They claim that this is what makes "grade consistency" significant. MJ. Balinski M. and R. Laraki (2007) «A theory of measuring, electing and ranking». Proceedings of the National Academy of Sciences USA, vol. 104, no. 21, 8720-8725.
- ^ Majority judgment can actually pass or fail reversal symmetry depending on the rounding method used to find the median when there are even numbers of voters. For instance, in a two-candidate, two-voter race, if the ratings are converted to numbers and the two central ratings are averaged, then MJ meets reversal symmetry; but if the lower one is taken, it does not, because a candidate with ["fair","fair"] would beat a candidate with ["good","poor"] with or without reversal. However, for rounding methods which do not meet reversal symmetry, the chances of breaking it are on the order of the inverse of the number of voters; this is comparable with the probability of an exact tie in a two-candidate race, and when there's a tie, any method can break reversal symmetry.
- ^ Majority Judgment is summable at order KN, where K, the number of ranking categories, is set beforehand.
- ^ Majority judgment meets a related, weaker criterion: ranking an additional candidate below the median grade (rather than your own grade) of your favorite candidate, cannot harm your favorite.
- ^ a b A variant of Minimax that counts only pairwise opposition, not opposition minus support, fails the Condorcet criterion and meets later-no-harm.
- ^ Range satisfies the mathematical definition of IIA, that is, if each voter scores each candidate independently of which other candidates are in the race. However, since a given range score has no agreed-upon meaning, it is thought that most voters would either "normalize" or exaggerate their vote such that it votes at least one candidate each at the top and bottom possible ratings. In this case, Range would not be independent of irrelevant alternatives. Balinski M. and R. Laraki (2007) «A theory of measuring, electing and ranking». Proceedings of the National Academy of Sciences USA, vol. 104, no. 21, 8720-8725.
- ^ Once for each round.
- ^ Later preferences are only possible between the two candidates who make it to the second round.
- ^ That is, second-round votes cannot harm candidates already eliminated.
- ^ Unless otherwise noted, for SODA's compliances:
- Delegated votes are considered to be equivalent to voting the candidate's predeclared preferences.
- Ballots only are considered (In other words, voters are assumed not to have preferences that cannot be expressed by a delegated or approval vote.)
- Since at the time of assigning approvals on delegated votes there is always enough information to find an optimum strategy, candidates are assumed to use such a strategy.
- ^ For up to 4 candidates, SODA is monotonic. For more than 4 candidates, it is monotonic for adding an approval, for changing from an approval to a delegation ballot, and for changes in a candidate's preferences. However, if changes in a voter's preferences are executed as changes from a delegation to an approval ballot, such changes are not necessarily monotonic with more than 4 candidates.
- ^ a b c For up to 4 candidates, SODA meets the Participation, IIA, and Cloneproof criteria. It can fail these criteria in certain rare cases with more than 4 candidates. This is considered here as a qualified success for the Consistency and Participation criteria, which do not intrinsically have to do with numerous candidates, and as a qualified failure for the IIA and Cloneproof criteria, which do.
- ^ SODA voting passes reversal symmetry for all scenarios that are reversible under SODA; that is, if each delegated ballot has a unique last choice. In other situations, it is not clear what it would mean to reverse the ballots, but there is always some possible interpretation under which SODA would pass the criterion.
- ^ SODA voting is always polytime computable. There are some cases where the optimal strategy for a candidate assigning delegated votes may not be polytime computable; however, such cases are entirely implausible for a real-world election.
- ^ Later preferences are only possible through delegation, that is, if they agree with the predeclared preferences of the favorite.
- ^ Random winner: Uniformly randomly chosen candidate is winner. Arbitrary winner: some external entity, not a voter, chooses the winner. These systems are not, properly speaking, voting systems at all, but are included to show that even a horrible system can still pass some of the criteria.
- ^ Random ballot: Uniformly random-chosen ballot determines winner. This and closely related systems are of mathematical interest because they are the only possible systems which are truly strategy-free, that is, your best vote will never depend on anything about the other voters. They also satisfy both consistency and IIA, which is impossible for a deterministic ranked system. However, this system is not generally considered as a serious proposal for a practical method.
11. Footnotes
¹ When I call my introduction "overblown", I mean that I reserve the right to make broad generalizations there, without getting distracted by caveats. If you don't like this style, feel free to skip to section 2.
² Of course, the original "politics is a mind killer" sequence was perfectly clear about this: "Politics is an important domain to which we should individually apply our rationality—but it's a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational." The focus here is on the first part of that quote, because I think Less Wrong as a whole has moved too far in the direction of avoiding politics as not a domain for rationalists.
³ Bayes developed his theorem decades before Condorcet's Essai, but Condorcet probably didn't know of it, as it wasn't popularized by Laplace until about 30 years later, after Condorcet was dead.
⁴ Yes, this happens to be the same Alan Gibbard from the previous paragraph.
⁵ Confusingly, "public choice" refers to a school of thought, while "social choice" is the name for the broader domain of study. Stop reading this footnote now if you don't want to hear mind-killing partisan identification. "Public choice" theorists are generally seen as politically conservative in the solutions they suggest. It seems to me that the broader "social choice" has avoided taking on a partisan connotation in this sense.
⁶ Score voting is also called "range voting" by some. It is not a particularly new idea — for instance, the "loudest cheer wins" rule of ancient Sparta, and even aspects of honeybees' process for choosing new hives, can be seen as score voting — but it was first analyzed theoretically around 2000. Approval voting, which can be seen as a form of score voting where the scores are restricted to 0 and 1, had entered theory only about two decades earlier, though it too has a history of practical use back to antiquity.
⁷ OK, fine, this is a simplification. As a voter, you have imperfect information about the true level of support and propensity to vote in the superpopulation of eligible voters, so in reality the chances of a decisive tie between other than your two expected frontrunners is non-zero. Still, in most cases, it's utterly negligible.
⁸ This article will focus more on the literature on multi-player strategic voting (competing boundedly-instrumentally-rational agents) than on multi-player Aumann (cooperating boundedly-epistemically-rational agents). If you're interested in the latter, here are some starting points: Scott Aaronson's work is, as far as I know, the state of the art on 2-player Aumann, but its framework assumes that the players have a sophisticated ability to empathize and reason about each others' internal knowledge, and the problems with this that Aaronson plausibly handwaves away in the 2-player case are probably less tractable in the multi-player one. Dalkiran et al deal with an Aumann-like problem over a social network; they find that attempts to "jump ahead" to a final consensus value instead of simply dumbly approaching it asymptotically can lead to failure to converge. And Kanoria et al have perhaps the most interesting result from the perspective of this article; they use the convergence of agents using a naive voting-based algorithm to give a nice upper bound on the difficulty of full Bayesian reasoning itself. None of these papers explicitly considers the problem of coming to consensus on more than one logically-related question at once, though Aaronson's work at least would clearly be easy to extend in that direction, and I think such extensions would be unsurprisingly Bayesian.
Open Thread: How much strategic thinking have you done recently?
I'm tired of people never, ever, ever, EVER stopping 2 hours to 1) Think of what their goals are 2)Checking if their current path leads to desired goals 3)Correcting course and 4)Creating a system to verify, in the future, whether goals are being achieved. I'm really tired of that. Really.
... so we may want to remind and encourage each other to do so, and exchange tips!
- Have you thought about your life goals recently?
- Do you know what your long-term and medium-term goals are?
- If you're facing big problems or annoyances, have you thought of ways of solving them?
- Do you have a system you use regularly that pushes you in the right direction?
See also: Humans are not automatically strategic, levels of action.
Three Approaches to "Friendliness"
I put "Friendliness" in quotes in the title, because I think what we really want, and what MIRI seems to be working towards, is closer to "optimality": create an AI that minimizes the expected amount of astronomical waste. In what follows I will continue to use "Friendly AI" to denote such an AI since that's the established convention.
I've often stated my objections MIRI's plan to build an FAI directly (instead of after human intelligence has been substantially enhanced). But it's not because, as some have suggested while criticizing MIRI's FAI work, that we can't foresee what problems need to be solved. I think it's because we can largely foresee what kinds of problems need to be solved to build an FAI, but they all look superhumanly difficult, either due to their inherent difficulty, or the lack of opportunity for "trial and error", or both.
When people say they don't know what problems need to be solved, they may be mostly talking about "AI safety" rather than "Friendly AI". If you think in terms of "AI safety" (i.e., making sure some particular AI doesn't cause a disaster) then that does looks like a problem that depends on what kind of AI people will build. "Friendly AI" on the other hand is really a very different problem, where we're trying to figure out what kind of AI to build in order to minimize astronomical waste. I suspect this may explain the apparent disagreement, but I'm not sure. I'm hoping that explaining my own position more clearly will help figure out whether there is a real disagreement, and what's causing it.
The basic issue I see is that there is a large number of serious philosophical problems facing an AI that is meant to take over the universe in order to minimize astronomical waste. The AI needs a full solution to moral philosophy to know which configurations of particles/fields (or perhaps which dynamical processes) are most valuable and which are not. Moral philosophy in turn seems to have dependencies on the philosophy of mind, consciousness, metaphysics, aesthetics, and other areas. The FAI also needs solutions to many problems in decision theory, epistemology, and the philosophy of mathematics, in order to not be stuck with making wrong or suboptimal decisions for eternity. These essentially cover all the major areas of philosophy.
For an FAI builder, there are three ways to deal with the presence of these open philosophical problems, as far as I can see. (There may be other ways for the future to turns out well without the AI builders making any special effort, for example if being philosophical is just a natural attractor for any superintelligence, but I don't see any way to be confident of this ahead of time.) I'll name them for convenient reference, but keep in mind that an actual design may use a mixture of approaches.
- Normative AI - Solve all of the philosophical problems ahead of time, and code the solutions into the AI.
- Black-Box Metaphilosophical AI - Program the AI to use the minds of one or more human philosophers as a black box to help it solve philosophical problems, without the AI builders understanding what "doing philosophy" actually is.
- White-Box Metaphilosophical AI - Understand the nature of philosophy well enough to specify "doing philosophy" as an algorithm and code it into the AI.
The problem with Normative AI, besides the obvious inherent difficulty (as evidenced by the slow progress of human philosophers after decades, sometimes centuries of work), is that it requires us to anticipate all of the philosophical problems the AI might encounter in the future, from now until the end of the universe. We can certainly foresee some of these, like the problems associated with agents being copyable, or the AI radically changing its ontology of the world, but what might we be missing?
Black-Box Metaphilosophical AI is also risky, because it's hard to test/debug something that you don't understand. Besides that general concern, designs in this category (such as Paul Christiano's take on indirect normativity) seem to require that the AI achieve superhuman levels of optimizing power before being able to solve its philosophical problems, which seems to mean that a) there's no way to test them in a safe manner, and b) it's unclear why such an AI won't cause disaster in the time period before it achieves philosophical competence.
White-Box Metaphilosophical AI may be the most promising approach. There is no strong empirical evidence that solving metaphilosophy is superhumanly difficult, simply because not many people have attempted to solve it. But I don't think that a reasonable prior combined with what evidence we do have (i.e., absence of visible progress or clear hints as to how to proceed) gives much hope for optimism either.
To recap, I think we can largely already see what kinds of problems must be solved in order to build a superintelligent AI that will minimize astronomical waste while colonizing the universe, and it looks like they probably can't be solved correctly with high confidence until humans become significantly smarter than we are now. I think I understand why some people disagree with me (e.g., Eliezer thinks these problems just aren't that hard, relative to his abilities), but I'm not sure why some others say that we don't yet know what the problems will be.
Evaluating the feasibility of SI's plan
(With Kaj Sotala)
SI's current R&D plan seems to go as follows:
1. Develop the perfect theory.
2. Implement this as a safe, working, Artificial General Intelligence -- and do so before anyone else builds an AGI.
The Singularity Institute is almost the only group working on friendliness theory (although with very few researchers). So, they have the lead on Friendliness. But there is no reason to think that they will be ahead of anyone else on the implementation.
The few AGI designs we can look at today, like OpenCog, are big, messy systems which intentionally attempt to exploit various cognitive dynamics that might combine in unexpected and unanticipated ways, and which have various human-like drives rather than the sort of supergoal-driven, utility-maximizing goal hierarchies that Eliezer talks about, or which a mathematical abstraction like AIXI employs.
A team which is ready to adopt a variety of imperfect heuristic techniques will have a decisive lead on approaches based on pure theory. Without the constraint of safety, one of them will beat SI in the race to AGI. SI cannot ignore this. Real-world, imperfect, safety measures for real-world, imperfect AGIs are needed. These may involve mechanisms for ensuring that we can avoid undesirable dynamics in heuristic systems, or AI-boxing toolkits usable in the pre-explosion stage, or something else entirely.
SI’s hoped-for theory will include a reflexively consistent decision theory, something like a greatly refined Timeless Decision Theory. It will also describe human value as formally as possible, or at least describe a way to pin it down precisely, something like an improved Coherent Extrapolated Volition.
The hoped-for theory is intended to provide not only safety features, but also a description of the implementation, as some sort of ideal Bayesian mechanism, a theoretically perfect intelligence.
SIers have said to me that SI's design will have a decisive implementation advantage. The idea is that because strap-on safety can’t work, Friendliness research necessarily involves more fundamental architectural design decisions, which also happen to be general AGI design decisions that some other AGI builder could grab and save themselves a lot of effort. The assumption seems to be that all other designs are based on hopelessly misguided design principles. SI-ers, the idea seems to go, are so smart that they'll build AGI far before anyone else. Others will succeed only when hardware capabilities allow crude near-brute-force methods to work.
Yet even if the Friendliness theory provides the basis for intelligence, the nitty-gritty of SI’s implementation will still be far away, and will involve real-world heuristics and other compromises.
We can compare SI’s future AI design to AIXI, another mathematically perfect AI formalism (though it has some critical reflexivity issues). Schmidhuber, Hutter, and colleagues think that their AXI can be scaled down into a feasible implementation, and have implemented some toy systems. Similarly, any actual AGI based on SI's future theories will have to stray far from its mathematically perfected origins.
Moreover, SI's future friendliness proof may simply be wrong. Eliezer writes a lot about logical uncertainty, the idea that you must treat even purely mathematical ideas with same probabilistic techniques as any ordinary uncertain belief. He pursues this mostly so that his AI can reason about itself, but the same principle applies to Friendliness proofs as well.
Perhaps Eliezer thinks that a heuristic AGI is absolutely doomed to failure; that a hard takeoff immediately soon after the creation of the first AGI is so overwhelmingly likely that a mathematically designed AGI is the only one that could stay Friendly. In that case, we have to work on a pure-theory approach, even if it has a low chance of being finished first. Otherwise we'll be dead anyway. If an embryonic AGI will necessarily undergo an intelligence explosion, we have no choice but to "shut up and do the impossible."
I am all in favor of gung-ho knife-between-the teeth projects. But when you think that your strategy is impossible, then you should also look for a strategy which is possible, if only as a fallback. Thinking about safety theory until drops of blood appear on your forehead (as Eliezer puts it, quoting Gene Fowler), is all well and good. But if there is only a 10% chance of achieving 100% safety (not that there really is any such thing), then I'd rather go for a strategy that provides only a 40% promise of safety, but with a 40% chance of achieving it. OpenCog and the like are going to be developed regardless, and probably before SI's own provably friendly AGI. So, even an imperfect safety measure is better than nothing.
If heuristic approaches have a 99% chance of an immediate unfriendly explosion, then that might be wrong. But SI, better than anyone, should know that any intuition-based probability estimate of “99%” really means “70%”. Even if other approaches are long-shots, we should not put all our eggs in one basket. Theoretical perfection and stopgap safety measures can be developed in parallel.
Given what we know about human overconfidence and the general reliability of predictions, the actual outcome will to a large extent be something that none of us ever expected or could have predicted. No matter what happens, progress on safety mechanisms for heuristic AGI will improve our chances if something entirely unexpected happens.
What impossible thing should SI be shutting up and doing? For Eliezer, it’s Friendliness theory. To him, safety for heuristic AGI is impossible, and we shouldn't direct our efforts in that direction. But why shouldn't safety for heuristic AGI be another impossible thing to do?
(Two impossible things before breakfast … and maybe a few more? Eliezer seems to be rebuilding logic, set theory, ontology, epistemology, axiology, decision theory, and more, mostly from scratch. That's a lot of impossibles.)
And even if safety for heuristic AGIs is really impossible for us to figure out now, there is some chance of an extended soft takeoff that will allow for the possibility of us developing heuristic AGIs which will help in figuring out AGI safety, whether because we can use them for our tests, or because they can by applying their embryonic general intelligence to the problem. Goertzel and Pitt have urged this approach.
Yet resources are limited. Perhaps the folks who are actually building their own heuristic AGIs are in a better position than SI to develop safety mechanisms for them, while SI is the only organization which is really working on a formal theory on Friendliness, and so should concentrate on that. It could be better to focus SI's resources on areas in which it has a relative advantage, or which have a greater expected impact.
Even if so, SI should evangelize AGI safety to other researchers, not only as a general principle, but also by offering theoretical insights that may help them as they work on their own safety mechanisms.
In summary:
1. AGI development which is unconstrained by a friendliness requirement is likely to beat a provably-friendly design in a race to implementation, and some effort should be expended on dealing with this scenario.
2. Pursuing a provably-friendly AGI, even if very unlikely to succeed, could still be the right thing to do if it was certain that we’ll have a hard takeoff very soon after the creation of the first AGIs. However, we do not know whether or not this is true.
3. Even the provably friendly design will face real-world compromises and errors in its implementation, so the implementation will not itself be provably friendly. Thus, safety protections of the sort needed for heuristic design are needed even for a theoretically Friendly design.
What does the world look like, the day before FAI efforts succeed?
TL;DR: let's visualize what the world looks like if we successfully prepare for the Singularity.
I remember reading once, though I can't remember where, about a technique called 'contrasting'. The idea is to visualize a world where you've accomplished your goals, and visualize the current world, and hold the two worlds in contrast to each other. Apparently there was a study about this; the experimental 'contrasting' group was more successful than the control in accomplishing its goals.
It occurred to me that we need some of this. Strategic insights about the path to FAI are not robust or likely to be highly reliable. And in order to find a path forward, you need to know where you're trying to go. Thus, some contrasting:
It's the year 20XX. The time is 10 AM, on the day that will thereafter be remembered as the beginning of the post-Singularity world. Since the dawn of the century, a movement rose in defense of humanity's future. What began with mailing lists and blog posts became a slew of businesses, political interventions, infrastructure improvements, social influences, and technological innovations designed to ensure the safety of the world.
Despite all odds, we exerted a truly extraordinary effort, and we did it. The AI research is done; we've laboriously tested and re-tested our code, and everyone agrees that the AI is safe. It's time to hit 'Run'.
And so I ask you, before we hit the button: what does this world look like? In the scenario where we nail it, which achievements enabled our success? Socially? Politically? Technologically? What resources did we acquire? Did we have superior technology, or a high degree of secrecy? Was FAI research highly prestigious, attractive, and well-funded? Did we acquire the ability to move quickly, or did we slow unFriendly AI research efforts? What else?
I had a few ideas, which I divided between scenarios where we did a 'fantastic', 'good', or 'sufficient' job at preparing for the Singularity. But I need more ideas! I'd like to fill this out in detail, with the help of Less Wrong. So if you have ideas, write them in the comments, and I'll update the list.
Some meta points:
- This speculation is going to be, well, pretty speculative. That's fine - I'm just trying to put some points on the map.
- However, I'd like to get a list of reasonable possibilities, not detailed sci-fi stories. Do your best.
- In most cases, I'd like to consolidate categories of possibilities. For example, we could consolidate "the FAI team has exclusive access to smart drugs" and "the FAI team has exclusive access to brain-computer interfaces" into "the FAI team has exclusive access to intelligence-amplification technology."
- However, I don't want too much consolidation. For example, I wouldn't want to consolidate "the FAI team gets an incredible amount of government funding" and "the FAI team has exclusive access to intelligence-amplification technology" into "the FAI team has a lot of power".
- Lots of these possibilities are going to be mutually exclusive; don't see them as aspects of the same scenario, but rather different scenarios.
Anyway - I'll start.
Visualizing the pre-FAI world
- Fantastic scenarios
- The FAI team has exclusive access to intelligence amplification technology, and use it to ensure Friendliness & strategically reduce X-risk.
- The government supports Friendliness research, and contributes significant resources to the problem.
- The government actively implements legislation which FAI experts and strategists believe has a high probability of making AI research safer.
- FAI research becomes a highly prestigious and well-funded field, relative to AGI research.
- Powerful social memes exist regarding AI safety; any new proposal for AI research is met with a strong reaction (among the populace and among academics alike) asking about safety precautions. It is low status to research AI without concern for Friendliness.
- The FAI team discovers important strategic insights through a growing ecosystem of prediction technology; using stables of experts, prediction markets, and opinion aggregation.
- The FAI team implements deliberate X-risk reduction efforts to stave off non-AI X-risks. Those might include a global nanotech immune system, cheap and rigorous biotech tests and safeguards, nuclear safeguards, etc.
- The FAI team implements the infrastructure for a high-security research effort, perhaps offshore, implementing the best available security measures designed to reduce harmful information leaks.
- Giles writes: Large amounts of funding are available, via government or through business. The FAI team and its support network may have used superior rationality to acquire very large amounts of money.
- Giles writes: The technical problem of establishing Friendliness is easier than expected; we are able t construct a 'utility function' (or a procedure for determining such a function) in order to implement human values that people (including people with a broad range of expertise) are happy with.
- Crude_Dolorium writes: FAI research proceeds much faster than AI research, so by the time we can make a superhuman AI, we already know how to make it Friendly (and we know what we really want that to mean).
- Pretty good scenarios
- Intelligence amplification technology access isn't exclusive to the FAI team, but it is differentially adopted by the FAI team and their supporting network, resulting in a net increase in FAI team intelligence relative to baseline. The FAI team uses it to ensure Friendliness and implement strategy surrounding FAI research.
- The government has extended some kind of support for Friendliness research, such as limited funding. No protective legislation is forthcoming.
- FAI research becomes slightly more high status than today, and additional researchers are attracted to answer important open questions about FAI.
- Friendliness and rationality memes grow at a reasonable rate, and by the time the Friendliness program occurs, society is more sane.
- We get slightly better at making predictions, mostly by refining our current research and discussion strategies. This allows us a few key insights that are instrumental in reducing X-risk.
- Some X-risk reduction efforts have been implemented, but with varying levels of success. Insights about which X-risk efforts matter are of dubious quality, and the success of each effort doesn't correlate well to the seriousness of the X-risk. Nevertheless, some X-risk reduction is achieved, and humanity survives long enough to implement FAI.
- Some security efforts are implemented, making it difficult but not impossible for pre-Friendly AI tech to be leaked. Nevertheless, no leaks happen.
- Giles writes: Funding is harder to come by, but small donations, limited government funding, or moderately successful business efforts suffice to fund the FAI team.
- Giles writes: The technical problem of aggregating values through a Friendliness function is difficult; people have contradictory and differing values. However, there is broad agreement as to how to aggregate preferences. Most people accept that FAI needs to respect values of humanity as a whole, not just their own.
- Crude_Dolorium writes: Superhuman AI arrives before we learn how to make it Friendly, but we do learn how to make an 'Anchorite' AI that definitely won't take over the world. The first superhuman AIs use this architecture, and we use them to solve the harder problems of FAI before anyone sets off an exploding UFAI.
- Sufficiently good scenarios
- Intelligence amplification technology is widespread, preventing any differential adoption by the FAI team. However, FAI researchers are able to keep up with competing efforts to use that technology for AI research.
- The government doesn't support Friendliness research, but the research group stays out of trouble and avoids government interference.
- FAI research never becomes prestigious or high-status, but the FAI team is able to answer the important questions anyway.
- Memes regarding Friendliness aren't significantly more widespread than today, but the movement has grown enough to attract the talent necessary to implement a Friendliness program.
- Predictive ability is no better than it is today, but the few insights we've gathered suffice to build the FAI team and make the project happen.
- There are no significant and successful X-risk reduction efforts, but humanity survives long enough to implement FAI anyway.
- No significant security measures are implemented for the FAI project. Still, via cooperation and because the team is relatively unknown, no dangerous leaks occur.
- Giles writes: The team is forced to operate on a shoestring budget, but succeeds anyway because the problem turns out to not be incredibly sensitive to funding constraints.
- Giles writes: The technical problem of aggregating values is incredibly difficult. Many important human values contradict each other, and we have discovered no "best" solution to those conflicts. Most people agree on the need for a compromise but quibble over how that compromise should be reached. Nevertheless, we come up with a satisfactory compromise.
- Crude_Dolorium writes: The problems of Friendliness aren't solved in time, or the solutions don't apply to practical architectures, or the creators of the first superhuman AIs don't use them, so the AIs have only unreliable safeguards. They're given cheap, attainable goals; the creators have tools to read the AIs' minds to ensure they're not trying anything naughty, and killswitches to stop them; they have an aversion to increasing their intelligence beyond a certain point, and to whatever other failure modes the creators anticipate; they're given little or no network connectivity; they're kept ignorant of facts more relevant to exploding than to their assigned tasks; they require special hardware, so it's harder for them to explode; and they're otherwise designed to be safer if not actually safe. Fortunately they don't encounter any really dangerous failure modes before they're replaced with descendants that really are safe.
AI Risk & Opportunity: Strategic Analysis Via Probability Tree
Part of the series AI Risk and Opportunity: A Strategic Analysis.
(You can leave anonymous feedback on posts in this series here. I alone will read the comments, and may use them to improve past and forthcoming posts in this series.)
There are many approaches to strategic analysis (Bishop et al. 2007). Though a morphological analysis (Ritchey 2006) could model our situation in more detail, the present analysis uses a simple probability tree (Harshbarger & Reynolds 2008, sec. 7.4) to model potential events and interventions.
A very simple tree
In our initial attempt, the first disjunction concerns which of several (mutually exclusive and exhaustive) transformative events comes first:
- "FAI" = Friendly AI.
- "uFAI" = UnFriendly AI, not including uFAI developed with insights from WBE.
- "WBE" = Whole brain emulation.
- "Doom" = Human extinction, including simulation shutdown and extinction due to uFAI striking us from beyond our solar system.
- "Other" = None of the above four events occur in our solar system, perhaps due to stable global totalitarianism or for unforeseen reasons.
Our probability tree begins simply:

Each circle is a chance node, which represents a random variable. The leftmost chance node above represents the variable of whether FAI, uFAI, WBE, Doom, or Other will come first. The rightmost chance nodes are open to further disjunctions: the random variables they represent will be revealed as we continue to develop the probability tree.
Each left-facing triangle is a terminal node, which for us serves the same function as a utility node in a Bayesian decision network. The only utility node in the tree above assigns a utility of 0 (bad!) to the Doom outcome.
Each branch in the tree is assigned a probability. For the purposes of illustration, the above tree assigns .01 probability to FAI coming first, .52 probability to uFAI coming first, .07 probability to WBE coming first, .35 to Doom coming first, and .05 to Other coming first.
How the tree could be expanded
The simple tree above could be expanded "downstream" by adding additional branches:

We could also make the probability tree more actionable by trying to estimate the probability of desirable and undesirable outcomes given certain that certain shorter-term goals are met. In the example below, "private push" means that a non-state actor passionate about safety invests $30 billion or more into developing WBE technology within 30 years from today. Perhaps there's a small chance this safety-conscious actor could get to WBE before state actors, upload FAI researchers, and have them figure out FAI before uFAI is created.

We could also expand the tree "upstream" by making the first disjunction be not concerned with our five options for what comes first but instead with a series of disjunctions that feed into which option will come first.
We could add hundreds or thousands of nodes to our probability tree, and then use the software to test for how much the outcomes change when particular inputs are changed, and learn what things we can do now to most increase our chances of a desirable outcome, given our current model.
We would also need to decide which "endgame scenarios" we want to include as possible terminals, and the utility of each. These choices may be complicated by our beliefs about multiverses and simulations.
However, decision trees become enormously large and complex very quickly as you add more variables. If we had the resources for a more complicated model, we'd probably want to use influence diagrams instead (Howard & Matheson 2005), e.g. one built in Analytica, like the ICAM climate change model. Of course, one must always worry that one's model is internally consistent but disconnected from the real world (Kay 2012).
References
- Bishop et al. (2007). The current state of scenario development: an overview of techniques.
- Harshbarger & Reynolds (2008). Mathematical Applications for the Management, Life, and Social Sciences.
- Howard & Matheson (2005). Influence Diagrams.
- Kay (2012). The map is not the territory: Models, scientists, and the state of modern macroeconomics.
- Ritchey (2006). Problem structuring using computer-aided morphological analysis.
AI Risk & Opportunity: Questions We Want Answered
Part of the series AI Risk and Opportunity: A Strategic Analysis.
(You can leave anonymous feedback on posts in this series here. I alone will read the comments, and may use them to improve past and forthcoming posts in this series.)
This post provides a list of questions about AI risk strategy — questions we want answered. Please suggest additional questions (a paragraph of explanation is preferred but not necessary); I may add them to the list. You can submit questions anonymously here.
Also, please identify which 3-5 of these questions you think are low-hanging fruit for productive strategic analysis on Less Wrong.
The list is in no particular order, but question numbers will remain unchanged (so that you can reliably refer to questions by their number):
-
What methods can we use to predict technological development? We don't yet have reliable methods for long-term technological forecasting. But not all methods have been examined yet. Perhaps technology futures have a good track record. Perhaps we could look at historical technological predictions and see if there is any pattern in the data suggesting that certain character traits and contexts lend themselves to accurate technological predictions. Perhaps there are creative solutions we haven't thought of yet.
-
Which kinds of differential technological development should we encourage, and how? Should we "push" on WBE, or not? Are some kinds of AI research risk-reducing, and other kinds risk-increasing? How can we achieve such effects, if they are desired?
-
Which open problems are safe to discuss, and which are potentially dangerous? AI risk research may itself produce risk in some cases, in the form of information hazards (Bostrom 2011). Is it safe to discuss decision theories? Acausal trade? Certain kinds of strategic questions, for example involving government intervention?
-
What can we do to reduce the risk of an AI arms race?
-
What can we do to raise the "sanity waterline," and how much will this help?
-
What can we do to attract more funding, support, and research to x-risk reduction and to the specific sub-problems of successful Singularity navigation?
-
Which interventions should we prioritize?
-
How should x-risk reducers and AI safety researchers interact with governments and corporations? Does Drexler's interaction with the U.S. government regarding molecular nanotechnology provide any lessons for how AI risk researchers should act?
-
How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
-
How does AI risk compare to other existential risks?
-
Which problems do we need to solve, and which ones can we have an AI solve?
-
How can we develop microeconomic models of WBEs and self-improving systems?
-
How can we be sure a Friendly AI development team will be altruistic?
-
How hard is it to create Friendly AI?
-
What is the strength of feedback from neuroscience to AI rather than brain emulation?
-
Is there a safe way to do uploads, where they don't turn into neuromorphic AI?
-
How much must we spend on security when developing a Friendly AI team?
-
What's the best way to recruit talent toward working on AI risks?
-
How difficult is stabilizing the world so we can work on Friendly AI slowly?
-
How hard will a takeoff be? To what degree is "intelligence" (as efficient cross-domain optimization) a matter of content vs. algorithms? How much does takeoff depend on slow, real-world experiments?
-
What is the value of strategy vs. object-level progress toward a positive Singularity?
-
What different kinds of Oracle AI are there, and are any of them both safe and feasible?
-
How much should we be worried about "metacomputational hazards"? E.g. should we worry about nonperson predicates? Oracle AIs engaging in self-fulfilling prophecies? Acausal hijacking?
-
What improvements can we make to the way we go about answering strategy questions? Wei Dai's notes on this question: "For example, should we differentiate between "strategic insights" (such as Carl Shulman's insight that WBE-based Singletons may be feasible) and "keeping track of the big picture" (forming the overall strategy and updating it based on new insights and evidence), and aim to have people specialize in each, so that people deciding strategy won't be tempted to overweigh their own insights? Another example: is there a better way to combine probability estimates from multiple people?"
-
How do people in other fields answer strategy questions? Wei Dai's notes on this question: "Is there such a thing as a science or art of strategy that we can copy from (and perhaps improve upon with ideas from x-rationality)?"
[more questions to come, as they are posted to the comments section]
Some Thoughts on Singularity Strategies
Followup to: Outline of possible Singularity scenarios (that are not completely disastrous)
Given that the Singularity and being strategic are popular topics around here, it's surprising there hasn't been more discussion on how to answer the question "In what direction should we nudge the future, to maximize the chances and impact of a positive Singularity?" ("We" meaning the SIAI/FHI/LW/Singularitarian community.)
(Is this an appropriate way to frame the question? It's how I would instinctively frame the question, but perhaps we ought to discussed alternatives first. For example, one might be "What quest should we embark upon to save the world?", which seems to be the frame that Eliezer instinctively prefers. But I worry that thinking in terms of "quest" favors the part of the brain that is built mainly for signaling instead of planning. Another alternative would be "What strategy maximizes expect utility?" but that seems too technical for human minds to grasp on an intuitive level, and we don't have the tools to answer the question formally.)
Let's start by assuming that humanity will want to build at least one Friendly superintelligence sooner or later, either from scratch, or by improving human minds, because without such an entity, it's likely that eventually either a superintelligent, non-Friendly entity will arise, or civilization will collapse. The current state of affairs, in which there is no intelligence greater than baseline-human level, seems unlikely to be stable over the billions of years of the universe's remaining life. (Nor does that seem particularly desirable even if it is possible.)
Whether to push for (or personally head towards) de novo AI directly, or IA/uploading first, depends heavily on the expected (or more generally, subjective probability distribution of) difficulty of building a Friendly AI from scratch, which in turn involves a great deal of logical and philosophical uncertainty. (For example, if it's known that it actually takes a minimum of 10 people with IQ 200 to build a Friendly AI, then there is clearly little point in pushing for de novo AI first.)
Besides the expected difficulty of building FAI from scratch, another factor that weighs heavily in the decision is the risk of accidentally building an unFriendly AI (or contributing to others building UFAIs) while trying to build FAI. Taking this into account also involves lots of logical and philosophical uncertainty. (But it seems safe to assume that this risk, if plotted against the intelligence of the AI builders, forms an inverted U shape.)
Since we don't have good formal tools for dealing with logical and philosophical uncertainty, it seems hard to do better than to make some incremental improvements over gut instinct. One idea is to train our intuitions to be more accurate, for example by learning about the history of AI and philosophy, or learning known cognitive biases and doing debiasing exercises. But this seems insufficient to gap the widely differing intuitions people have on these questions.
My own feeling is that the chance of success of of building FAI, assuming current human intelligence distribution, is low (even if given unlimited financial resources), while the risk of unintentionally building or contributing to UFAI is high. I think I can explicate a part of my intuition this way: There must be a minimum level of intelligence below which the chances of successfully building an FAI is negligible. We humans seem at best just barely smart enough to build a superintelligent UFAI. Wouldn't it be surprising that the intelligence threshold for building UFAI and FAI turn out to be the same?
Given that there are known ways to significantly increase the number of geniuses (i.e., von Neumann level, or IQ 180 and greater), by cloning or embryo selection, an obvious alternative Singularity strategy is to invest directly or indirectly in these technologies, and to try to mitigate existential risks (for example by attempting to delay all significant AI efforts) until they mature and bear fruit (in the form of adult genius-level FAI researchers). Other strategies in the same vein are to pursue cognitive/pharmaceutical/neurosurgical approaches to increasing the intelligence of existing humans, or to push for brain emulation first followed by intelligence enhancement of human minds in software form.
Social/PR issues aside, these alternatives make more intuitive sense to me. The chances of success seem higher, and if disaster does occur as a result of the intelligence amplification effort, we're more likely to be left with a future that is at least partly influenced by human values. (Of course, in the final analysis, we also have to consider social/PR problems, but all Singularity approaches seem to have similar problems, which can be partly ameliorated by the common sub-strategy of "raising the general sanity level".)
I'm curious in what others think. What does your intuition say about these issues? Are there good arguments in favor of any particular strategy that I've missed? Is there another strategy that might be better than the ones mentioned above?
= 783df68a0f980790206b9ea87794c5b6)
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)