Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
[Epistemic status: exploratory exercise in naming and concept-formation.]
Among the kinds of people, are the Actors, and the Scribes. Actors mainly relate to speech as action that has effects. Scribes mainly relate to speech as a structured arrangement of pointers that have meanings.
I previously described this as a distinction between promise-keeping "Quakers" and impulsive "Actors," but I think this missed a key distinction. There's "telling the truth," and then there's a more specific thing that's more obviously distinct from even Actors who are trying to make honest reports: keeping precisely accurate formal accounts. This leaves out some other types – I'm not exactly sure how it relates to engineers and diplomats, for instance – but I think I have the right names for these two things now.
Everyone agrees that words have meaning; they convey information from the speaker to the listener or reader. That's all they do. So when I used the phrase “words have meanings” to describe one side of a divide between people who use language to report facts, and people who use language to enact roles, was I strawmanning the other side?
I say no. Many common uses of language, including some perfectly legitimate ones, are not well-described by "words have meanings." For instance, people who try to use promises like magic spells to bind their future behavior don't seem to consider the possibility that others might treat their promises as a factual representation of what the future will be like.
Some uses of language do not simply describe objects or events in the world, but are enactive, designed to evoke particular feelings or cause particular actions. Even when speech can only be understood as a description of part of a model of the world, the context in which a sentence is uttered often implies an active intent, so if we only consider the direct meaning of the text, we will miss the most important thing about the sentence.
Some apparent uses of language’s denotative features may in fact be purely enactive. This is possible because humans initially learn language mimetically, and try to copy usage before understanding what it’s for. Primarily denotative language users are likely to assume that structural inconsistencies in speech are errors, when they’re often simply signs that the speech is primarily intended to be enactive.
Some uses of words are enactive: ways to build or reveal momentum. Others denote the position of things on your world-map.
In the denotative framing, words largely denote concepts that refer to specific classes of objects, events, or attributes in the world, and should be parsed as such. The meaning of a sentence is mainly decomposable into the meanings of its parts and their relations to each other. Words have distinct meanings that can be composed together in structures to communicate complex and nonobvious messages, not just uses and connotations.
In the enactive mode, the function of speech is to produce some action or disposition in your listener, who may be yourself. Ideas are primarily associative, reminding you of the perceptions with which the speech-act is associated. Other uses of language are structural. When you speak in this mode, it’s to describe models - relationships between concepts, which refer to classes of objects in the world.
When I wrote about admonitions as performance-enhancing speech, I gave the example of someone being encouraged by their workout buddies:
Recently, at the gym, I overheard some group of exercise buddies admonishing their buddy on some machine to keep going with each rep. My first thought was, “why are they tormenting their friend? Why can’t they just leave him alone? Exercise is hard enough without trying to parse social interactions at the same time.”
And then I realized - they’re doing it because, for them, it works. It's easier for them to do the workout if someone is telling them, “Keep going! Push it! One more!”
In the same post, I quoted Wittgenstein’s thought experiment of a language where words are only ever used as commands, with a corresponding action, never to refer to an object. Wittgenstein gives the example of a language used for nothing but military orders, and then elaborates on a hypothetical language used strictly for work orders. For instance, a foreman might use the utterance “Slab!” to direct a worker to fetch a slab of rock. I summarized the situation thus:
When I hear “slab”, my mind interprets this by imagining the object. A native speaker of Wittgenstein’s command language, when hearing the utterance “Slab!”, might - merely as the act of interpreting the word - feel a sense of readiness to go fetch a stone slab.
Wittgenstein’s listener might think of the slab itself, but only as a secondary operation in the process of executing the command. Likewise, I might, after thinking of the object, then infer that someone wants me to do something with the slab. But that requires an additional operation: modeling the speaker as an agent and using Gricean implicature to infer their intentions. The word has different cognitive content or implications for me, than for the speaker of Wittgenstein’s command language.
Military drills are also often about disintermediating between a command and action. Soldiers learn that when you receive an order, you just do the thing. This can lead to much more decisive and coordinated action in otherwise confusing situations – a familiar stimulus can lead to a regular response.
When someone gives you driving directions by telling you what you'll observe, and what to do once you make that observation, they're trying to encode a series of observation-action linkages in you.
This sort of linkage can happen to nonverbal animals too. Operant conditioning of animals gets around most animals' difficulty understanding spoken instructions, by associating a standardized reward indicator with the desired action. Often, if you want to train a comparatively complex action like pigeons playing pong, you'll need to train them one step at a time, gradually chaining the steps together, initially rewarding much simpler behaviors that will eventually compose into the desired complex behavior.
Crucially, the communication is never about the composition itself, just the components to be composed. Indeed, it’s not about anything, from the perspective of the animal being trained. This is similar to an old-fashioned army reliant on drill, in which, during battle, soldiers are told the next action they are to take, not told about the overall structure of their strategy. They are told to, not told about.
Indeterminacy of translation
It’s conceivable that having what appears to be a language in common does not protect against such differences in interpretation. Quine also points to indeterminacy of translation and thus of explicable meaning with his "gavagai" example. As Wikipedia summarizes it:
Indeterminacy of reference refers to the interpretation of words or phrases in isolation, and Quine's thesis is that no unique interpretation is possible, because a 'radical interpreter' has no way of telling which of many possible meanings the speaker has in mind. Quine uses the example of the word "gavagai" uttered by a native speaker of the unknown language Arunta upon seeing a rabbit. A speaker of English could do what seems natural and translate this as "Lo, a rabbit." But other translations would be compatible with all the evidence he has: "Lo, food"; "Let's go hunting"; "There will be a storm tonight" (these natives may be superstitious); "Lo, a momentary rabbit-stage"; "Lo, an undetached rabbit-part." Some of these might become less likely – that is, become more unwieldy hypotheses – in the light of subsequent observation. Other translations can be ruled out only by querying the natives: An affirmative answer to "Is this the same gavagai as that earlier one?" rules out some possible translations. But these questions can only be asked once the linguist has mastered much of the natives' grammar and abstract vocabulary; that in turn can only be done on the basis of hypotheses derived from simpler, observation-connected bits of language; and those sentences, on their own, admit of multiple interpretations.
Everyone begins life as a tiny immigrant who does not know the local language, and has to make such inferences, or something like them. Thus, many of the difficulties in nailing down exactly what a word is doing in a foreign language have analogues in nailing down exactly what a word is doing for another speaker of one’s own language.
Mimesis, association, and structure
Not only do we all begin life as immigrants, but as immigrants with no native language to which we can analogize our adopted tongue. We learn language through mimesis. For small children, language is perhaps more like Wittgenstein's command language than my reference-language. It's a commonplace observation that children learn the utterance "No!" as an expression of will. In The Ways of Naysaying: No, Not, Nothing, and Nonbeing, Eva Brann provides a charming example:
Children acquire some words, some two-word phrases, and then no. […] They say excited no to everything and guilelessly contradict their naysaying in the action: "Do you want some of my jelly sandwich?" "No." Gets on my lap and takes it away from me. […] It is a documented observation that the particle no occurs very early in children's speech, sometimes in the second year, quite a while before sentences are negated by not.
First we learn language as an assertion of will, a way to command. Then, later, we learn how to use it to describe structural features of world-models. I strongly suspect that this involves some new, not entirely mimetic cognitive machinery kicking in, something qualitatively different: we start to think in terms of pointer-referent and concept-referent relations. In terms of logical structures, where "no" is not simply an assertion of negative affect, but inverts the meaning of whatever follows. Only after this do recursive clauses, conditionals, and negation of negation make any sense at all.
As long as we agree on something like rules of assembly for sentences, mimesis might mask a huge difference in how we think about things. It's instructive to look at how the current President of the United States uses language. He's talking to people who aren't bothering to track the structure of sentences. This makes him sound more "conversational" and, crucially, allows him to emphasize whichever words or phrases he wants, without burying them in a potentially hard-to-parse structure. As Katy Waldman of Slate says:
For some of us, Trump’s language is incendiary garbage. It’s not just that the ideas he wants to communicate are awful but that they come out as Saturnine gibberish or lewd smearing or racist gobbledygook. The man has never met a clause he couldn’t embellish forever and then promptly forget about. He uses adjectives as cudgels. You and I view his word casserole as not just incoherent but representative of the evil at his heart.
But it works. […]
Why? What’s the secret to Trump’s accidental brilliance? A few theories: simple component parts, weaponized unintelligibility, dark innuendo, and power signifiers.
[…] Trump tends to place the most viscerally resonant words at the end of his statements, allowing them to vibrate in our ears. For instance, unfurling his national security vision like a nativist pennant, Trump said:
But, Jimmy, the problem –
I mean, look, I’m for it.
But look, we have people coming into the country
that are looking to do tremendous harm….
Look what happened in Paris.
Look what happened in California,
with, you know, 14 people dead.
Other people are going to die,
they’re badly injured, we have a real problem.
Ironically, because Trump relies so heavily on footnotes, false starts, and flights of association, and because his digressions rarely hook back up with the main thought, the emotional terms take on added power. They become rays of clarity in an incoherent verbal miasma. Think about that: If Trump were a more traditionally talented orator, if he just made more sense, the surface meaning of his phrases would likely overshadow the buried connotations of each individual word. As is, to listen to Trump fit language together is to swim in an eddy of confusion punctuated by sharp stabs of dread. Which happens to be exactly the sensation he wants to evoke in order to make us nervous enough to vote for him.
Of course, Waldman is being condescending and wrong here. This is not word salad, it's high context communication. But high context communication isn't what you use when you are thinking you might persuade someone who doesn't already agree with you, it's just a more efficient exercise in flag-waving. We don't see a complex structure here because Trump is not trying to communicate the sort of novel content that structural language is required for. He's just saying "what everyone was already thinking."
But while Waldman picked a poor example, she's not wholly wrong. In some cases, the President of the United States seems to be impressionistically alluding to arguments or events his audience has already heard of – but his effective rhetorical use of insulting epithets like “Little Marco,” “Lying Ted Cruz,” and “Crooked Hillary” fits very clearly into this schema. Instead of asking us to absorb facts about his opponents, incorporate them into coherent world-models, and then follow his argument for how we should judge them for their conduct, he used the simple expedient of putting a name next to a descriptor, repeatedly, to cause us to associate the connotations of those words. We weren't asked to think about anything. These were simply command words, designed to act directly on our feelings about the people he insulted.
We weren't asked to take his statements as factually accurate. It's enough that they're authentic.
This was persuasive to enough voters to make him President of the United States. This is not a straw man. This is real life. This is the world we live in.
You might object that the President of the United States is an unfair example, and that most people of any importance should be expected to be better and clearer thinkers than the leader of the free world. So, let's consider the case of some middling undergraduates taking an economics course.
Robin Hanson reports that he can get students to mimic an economic way of talking, but not to think like an economist:
After eighteen years of being a professor, I’ve graded many student essays. And while I usually try to teach a deep structure of concepts, what the median student actually learns seems to mostly be a set of low order correlations. They know what words to use, which words tend to go together, which combinations tend to have positive associations, and so on. But if you ask an exam question where the deep structure answer differs from answer you’d guess looking at low order correlations, most students usually give the wrong answer.
Let me call styles of talking (or music, etc.) that rely mostly on low order correlations “babbling”. Babbling isn’t meaningless, but to ignorant audiences it often appears to be based on a deeper understanding than is actually the case. When done well, babbling can be entertaining, comforting, titillating, or exciting. It just isn’t usually a good place to learn deep insight.
This is a straightforward description of thinking that is formal but nonconceptual. Hanson's students have learnt some words, and rules for moving the words around and putting them together, but at no point did they connect the rules for moving around words with regular properties of things that the words point to. The words are the things. When Hanson stops feeding them the right keywords, and asks them questions that require them to understand the underlying structural features of reality that economics is supposed to describe, they come up empty.
Of course, it seems unlikely that many people can't think structurally at all. It seems to me like nearly everyone can think structurally about physical objects in their immediate environment. But it seems like when talking about abstractions, or the future, some people shift to a mental mode where words don't carry the same weight of reference.
Even for those of us who habitually think structurally, it would be surprising if the mimetic component to language ever totally went away. Plenty of times, I've started saying something, only to stop midway through realizing that I'm just repeating something I heard, not reporting on a feature of my model of the world.
Tendencies towards mimesis are hard to resist, and part of why I think it's so important to push back against falsehoods in any spaces that are meant to be accreting truth. Why even casual, accidental errors should be promptly corrected. Why I need an epistemic environment that's not constantly being polluted by adversarial processes.
And we can’t begin to figure out how to do this until it becomes common knowledge that not everyone is doing the same thing with words, that modeling the world is a legitimate and useful thing to do with them, and that not all communication is designed to be friendly to the people who assume it’s composed of words with meanings.
(Cross-posted on my personal blog.)
A parent I know reports (some details anonymized):
Recently we bought my 3-year-old daughter a "behavior chart," in which she can earn stickers for achievements like not throwing tantrums, eating fruits and vegetables, and going to sleep on time. We successfully impressed on her that a major goal each day was to earn as many stickers as possible.
This morning, though, I found her just plastering her entire behavior chart with stickers. She genuinely seemed to think I'd be proud of how many stickers she now had.
The Effective Altruism movement has now entered this extremely cute stage of cognitive development. EA is more than three years old, but institutions age differently than individuals.
What is a confidence game?
In 2009, investment manager and con artist Bernie Madoff pled guilty to running a massive fraud, with $50 billion in fake return on investment, having outright embezzled around $18 billion out of the $36 billion investors put into the fund. Only a couple of years earlier, when my grandfather was still alive, I remember him telling me about how Madoff was a genius, getting his investors a consistent high return, and about how he wished he could be in on it, but Madoff wasn't accepting additional investors.
What Madoff was running was a classic Ponzi scheme. Investors gave him money, and he told them that he'd gotten them an exceptionally high return on investment, when in fact he had not. But because he promised to be able to do it again, his investors mostly reinvested their money, and more people were excited about getting in on the deal. There was more than enough money to cover the few people who wanted to take money out of this amazing opportunity.
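The mechanics described above can be sketched as a toy ledger. This is only an illustration under simplifying assumptions that are not in the original account: the operator earns no real return and takes nothing out, the same amount of new money arrives every year, and a fixed fraction of the reported balance is withdrawn each year. The function name and all parameters are hypothetical.

```python
def simulate_ponzi(years, promised_rate, new_deposits, withdrawal_fraction):
    """Track what investors are told they own versus the cash actually on hand.

    Toy model: no real investment returns, constant yearly deposits,
    and a fixed fraction of the reported balance withdrawn each year.
    """
    reported = 0.0  # balance shown on investor statements
    cash = 0.0      # money the operator really holds
    history = []
    for year in range(1, years + 1):
        # New money comes in; both the statements and the cash go up.
        reported += new_deposits
        cash += new_deposits
        # Fictional returns are credited on paper only.
        reported *= 1 + promised_rate
        # Some investors cash out, and they can only be paid from real cash.
        withdrawals = withdrawal_fraction * reported
        if withdrawals > cash:
            history.append((year, reported, cash, False))  # scheme collapses
            break
        reported -= withdrawals
        cash -= withdrawals
        history.append((year, reported, cash, True))
    return history


if __name__ == "__main__":
    for year, reported, cash, solvent in simulate_ponzi(10, 0.10, 100.0, 0.05):
        print(f"year {year}: reported {reported:.1f}, cash {cash:.1f}, solvent {solvent}")
```

In this toy model the gap between the reported balance and the cash on hand grows every year by exactly the fictional return credited that year. The scheme stays solvent only while withdrawals remain small relative to incoming money.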
Ponzi schemes, pyramid schemes, and speculative bubbles are all situations in which investors' expected profits are paid out from the money paid in by new investors, instead of any independently profitable venture. Ponzi schemes are centrally managed – the person running the scheme represents it to investors as legitimate, and takes responsibility for finding new investors and paying off old ones. In pyramid schemes such as multi-level-marketing and chain letters, each generation of investor recruits new investors and profits from them. In speculative bubbles, there is no formal structure propping up the scheme, only a common, mutually reinforcing set of expectations among speculators driving up the price of something that was already for sale.
The general situation in which someone sets themself up as the repository of others' confidence, and uses this as leverage to acquire increasing investment, can be called a confidence game.
Some of the most iconic Ponzi schemes blew up quickly because they promised wildly unrealistic growth rates. This had three undesirable effects for the people running the schemes. First, it attracted too much attention – too many people wanted into the scheme too quickly, so they rapidly exhausted sources of new capital. Second, because their rates of return were implausibly high, they made themselves targets for scrutiny. Third, the extremely high rates of return themselves caused their promises to quickly outpace what they could plausibly return to even a small share of their investor victims.
Madoff was careful to avoid all these problems, which is why his scheme lasted for nearly half a century. He promised only returns (around 10% annually) that were plausibly high for a successful hedge fund, especially one illegally engaged in insider trading, rather than the sort of implausibly high returns typical of more blatant Ponzi schemes. (Charles Ponzi promised to double investors' money in 90 days.) Madoff showed reluctance to accept new clients, like any other fund manager who doesn't want to get too big for their trading strategy.
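To make the difference between the two promises concrete, here is a rough compounding comparison. It assumes, purely for illustration, that investors roll their reported balance over at the promised rate every period. The helper name `growth_multiple` and the 365-day year are assumptions of this sketch, not anything from the schemes themselves.

```python
def growth_multiple(rate_per_period, periods):
    """Factor by which a promised balance grows after compounding."""
    return (1 + rate_per_period) ** periods


DAYS_PER_YEAR = 365  # simplifying assumption

# Madoff-style promise from the text: about 10% per year.
steady_one_year = growth_multiple(0.10, 1)

# Ponzi-style promise from the text: double every 90 days.
periods_per_year = DAYS_PER_YEAR / 90
blatant_one_year = growth_multiple(1.0, periods_per_year)

if __name__ == "__main__":
    for years in (1, 2, 5):
        steady = growth_multiple(0.10, years)
        blatant = growth_multiple(1.0, periods_per_year * years)
        print(f"after {years} year(s): steady x{steady:.2f}, blatant x{blatant:,.0f}")
```

Doubling every 90 days compounds to more than a sixteenfold increase per year, so the promised obligations outrun any plausible inflow of new money almost immediately. A 10% annual promise grows slowly enough to be covered by new deposits for a long time.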
He didn't plaster stickers all over his behavior chart – he put a reasonable number of stickers on it. He played a long game.
Not all confidence games are inherently bad. For instance, the US national pension system, Social Security, operates as a kind of Ponzi scheme, yet it is not obviously unsustainable, and many people continue to be glad that it exists. Nominally, when people pay Social Security taxes, the money is invested in the Social Security trust fund, which holds interest-bearing financial assets that will be used to pay out benefits in their old age. In this respect it looks like an ordinary pension fund.
However, the financial assets are US Treasury bonds. There is no independently profitable venture. The Federal Government of the United States of America is quite literally writing an IOU to itself, and then spending the money on current expenditures, including paying out current Social Security benefits.
The Federal Government, of course, can write as large an IOU to itself as it wants. It could make all tax revenues part of the Social Security program. It could issue new Treasury bonds and gift them to Social Security. None of this would increase its ability to pay out Social Security benefits. It would be an empty exercise in putting stickers on its own chart.
If the Federal Government loses the ability to collect enough taxes to pay out Social Security benefits, there is no additional capacity to pay represented by US Treasury bonds. What we have is an implied promise to pay out future benefits, backed by the expectation that the government will be able to collect taxes in the future, including Social Security taxes.
There's nothing necessarily wrong with this, except that the mechanism by which Social Security is funded is obscured by financial engineering. However, this misdirection should raise at least some doubts as to the underlying sustainability or desirability of the commitment. In fact, this scheme was adopted specifically to give people the impression that they had some sort of property rights over their Social Security pension, in order to make the program politically difficult to eliminate. Once people have "bought in" to a program, they will be reluctant to treat their prior contributions as sunk costs, and willing to invest additional resources to salvage their investment, in ways that may make them increasingly reliant on it.
Not all confidence games are intrinsically bad, but dubious programs benefit the most from being set up as confidence games. More generally, bad programs are the ones that benefit the most from being allowed to fiddle with their own accounting. As Daniel Davies writes, in The D-Squared Digest One Minute MBA - Avoiding Projects Pursued By Morons 101:
Good ideas do not need lots of lies told about them in order to gain public acceptance. I was first made aware of this during an accounting class. We were discussing the subject of accounting for stock options at technology companies. […] One side (mainly technology companies and their lobbyists) held that stock option grants should not be treated as an expense on public policy grounds; treating them as an expense would discourage companies from granting them, and stock options were a vital compensation tool that incentivised performance, rewarded dynamism and innovation and created vast amounts of value for America and the world. The other side (mainly people like Warren Buffett) held that stock options looked awfully like a massive blag carried out by management at the expense of shareholders, and that the proper place to record such blags was the P&L account.
Our lecturer, in summing up the debate, made the not unreasonable point that if stock options really were a fantastic tool which unleashed the creative power in every employee, everyone would want to expense as many of them as possible, the better to boast about how innovative, empowered and fantastic they were. Since the tech companies' point of view appeared to be that if they were ever forced to account honestly for their option grants, they would quickly stop making them, this offered decent prima facie evidence that they weren't, really, all that fantastic.
However, I want to generalize the concept of confidence games from the domain of financial currency, to the domain of social credit more generally (of which money is a particular form that our society commonly uses), and in particular I want to talk about confidence games in the currency of credit for achievement.
If I were applying for a very important job with great responsibilities, such as President of the United States, CEO of a top corporation, or head or board member of a major AI research institution, I could be expected to have some relevant prior experience. For instance, I might have had some success managing a similar, smaller institution, or serving the same institution in a lesser capacity. More generally, when I make a bid for control over something, I am implicitly claiming that I have enough social credit – enough of a track record – that I can be expected to do good things with that control.
In general, if someone has done a lot, we should expect to see an iceberg pattern where a small easily-visible part suggests a lot of solid but harder-to-verify substance under the surface. One might be tempted to make a habit of imputing a much larger iceberg from the combination of a small floaty bit, and promises. But, a small easily-visible part with claims of a lot of harder-to-see substance is easy to mimic without actually doing the work. As Davies continues:
The Vital Importance of Audit. Emphasised over and over again. Brealey and Myers has a section on this, in which they remind callow students that like backing-up one's computer files, this is a lesson that everyone seems to have to learn the hard way. Basically, it's been shown time and again and again; companies which do not audit completed projects in order to see how accurate the original projections were, tend to get exactly the forecasts and projects that they deserve. Companies which have a culture where there are no consequences for making dishonest forecasts, get the projects they deserve. Companies which allocate blank cheques to management teams with a proven record of failure and mendacity, get what they deserve.
If you can independently put stickers on your own chart, then your chart is no longer reliably tracking something externally verified. If forecasts are not checked and tracked, or forecasters are not consequently held accountable for their forecasts, then there is no reason to believe that assessments of future, ongoing, or past programs are accurate. Adopting a wait-and-see attitude, insisting on audits for actual results (not just predictions) before investing more, will definitely slow down funding for good programs. But without it, most of your funding will go to worthless ones.
Open Philanthropy, OpenAI, and closed validation loops
The Open Philanthropy Project recently announced a $30 million grant to the $1 billion nonprofit AI research organization OpenAI. This is the largest single grant it has ever made. The main point of the grant is to buy influence over OpenAI’s future priorities; Holden Karnofsky, Executive Director of the Open Philanthropy Project, is getting a seat on OpenAI’s board as part of the deal. This marks the second major shift in focus for the Open Philanthropy Project.
The first shift (back when it was just called GiveWell) was from trying to find the best already-existing programs to fund (“passive funding”) to envisioning new programs and working with grantees to make them reality (“active funding”). The new shift is from funding specific programs at all, to trying to take control of programs without any specific plan.
To justify the passive funding stage, all you have to believe is that you can judge among existing charities better than other donors can. For active funding, you have to believe that you’re smart enough to evaluate potential programs, just like a charity founder might, and pick ones that will outperform. But buying control implies that you think you’re so much better, that even before you’ve evaluated any programs, if someone’s doing something big, you ought to have a say.
When GiveWell moved from a passive to an active funding strategy, it was relying on the moral credit it had earned for its extensive and well-regarded charity evaluations. The thing that was particularly exciting about GiveWell was that they focused on outcomes and efficiency. They didn't just focus on the size or intensity of the problem a charity was addressing. They didn't just look at financial details like overhead ratios. They asked the question a consequentialist cares about: for a given expenditure of money, how much will this charity be able to improve outcomes?
However, when GiveWell tracks its impact, it does not track objective outcomes at all. It tracks inputs: attention received (in the form of visits to its website) and money moved on the basis of its recommendations. In other words, its estimate of its own impact is based on the level of trust people have placed in it.
So, as GiveWell built out the Open Philanthropy Project, its story was: We promised to do something great. As a result, we were entrusted with a fair amount of attention and money. Therefore, we should be given more responsibility. We represented our behavior as praiseworthy, and as a result people put stickers on our chart. For this reason, we should be advanced stickers against future days of praiseworthy behavior.
Then, as the Open Philanthropy Project explored active funding in more areas, its estimate of its own effectiveness grew. After all, it was funding more speculative, hard-to-measure programs, but a multi-billion-dollar donor, which was largely relying on the Open Philanthropy Project's opinions to assess efficacy (including its own efficacy), continued to trust it.
What is missing here is any objective track record of benefits. What this looks like to me, is a long sort of confidence game – or, using less morally loaded language, a venture with structural reliance on increasing amounts of leverage – in the currency of moral credit.
Version 0: GiveWell and passive funding
First, there was GiveWell. GiveWell’s purpose was to find and vet evidence-backed charities. However, it recognized that charities know their own business best. It wasn’t trying to do better than the charities; it was trying to do better than the typical charity donor, by being more discerning.
GiveWell’s thinking from this phase is exemplified by co-founder Elie Hassenfeld’s Six tips for giving like a pro:
When you give, give cash – no strings attached. You’re just a part-time donor, but the charity you’re supporting does this full-time and staff there probably know a lot more about how to do their job than you do. If you’ve found a charity that you feel is excellent – not just acceptable – then it makes sense to trust the charity to make good decisions about how to spend your money.
GiveWell similarly tried to avoid distorting charities’ behavior. Its job was only to evaluate, not to interfere. To perceive, not to act. To find the best, and buy more of the same.
How did GiveWell assess its effectiveness in this stage? When GiveWell evaluates charities, it estimates their cost-effectiveness in advance. It assesses the program the charity is running, through experimental evidence in the form of randomized controlled trials. GiveWell also audits the charity to make sure it is actually running the program, and to figure out how much the program costs as implemented. This is an excellent, evidence-based way to generate a prediction of how much good will be done by moving money to the charity.
As far as I can tell, these predictions are untested.
One of GiveWell’s early top charities was VillageReach, which helped Mozambique with TB immunization logistics. GiveWell estimated that VillageReach could save a life for $1,000. But this charity is no longer recommended. The public page says:
VillageReach (www.villagereach.org) was our top-rated organization for 2009, 2010 and much of 2011 and it has received over $2 million due to GiveWell's recommendation. In late 2011, we removed VillageReach from our top-rated list because we felt its project had limited room for more funding. As of November 2012, we believe that that this project may have room for more funding, but we still prefer our current highest-rated charities above it.
GiveWell reanalyzed the data it based its recommendations on, but hasn’t published an after-the-fact retrospective of long-run results. I asked GiveWell about this by email. The response was that such an assessment was not prioritized because GiveWell had found implementation problems in VillageReach's scale-up work as well as reasons to doubt its original conclusion about the impact of the pilot program. It's unclear to me whether this has caused GiveWell to evaluate charities differently in the future.
I don't think someone looking at GiveWell's page on VillageReach would be likely to reach the conclusion that GiveWell now believes its original recommendation was likely erroneous. GiveWell's impact page continues to count money moved to VillageReach without any mention of the retracted recommendation. If we assume that the point of tracking money moved is to track the benefit of moving money from worse to better uses, then repudiated programs ought to be counted against the total, as costs, rather than towards it.
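To make the accounting point concrete, here is a minimal sketch of the difference between counting a repudiated recommendation towards the total and counting it against the total. The charity names and dollar figures are hypothetical placeholders, not GiveWell's actual numbers, and the `Recommendation` record and both functions are my own illustration rather than anything GiveWell publishes.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    charity: str
    money_moved: int   # dollars moved on the basis of the recommendation
    repudiated: bool   # True if the evaluator later doubted its original case


def gross_money_moved(recs):
    """Headline figure: every dollar moved counts towards the total."""
    return sum(r.money_moved for r in recs)


def net_money_moved(recs):
    """Alternative figure: dollars moved to repudiated programs count as costs."""
    return sum(-r.money_moved if r.repudiated else r.money_moved for r in recs)


# Hypothetical figures, for illustration only.
recs = [
    Recommendation("Charity A (still recommended)", 5_000_000, False),
    Recommendation("Charity B (later repudiated)", 2_000_000, True),
]

print(gross_money_moved(recs))  # 7000000
print(net_money_moved(recs))    # 3000000
```

Under these made-up numbers, the two conventions differ by twice the amount moved to the repudiated program, which is why the choice of convention matters for any claim about impact.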
GiveWell has recommended the Against Malaria Foundation for the last several years as a top charity. AMF distributes long-lasting insecticide-treated bed nets to prevent mosquitos from transmitting malaria to humans. GiveWell's evaluation of AMF does not mention any direct evidence, positive or negative, about what happened to malaria rates in the areas where AMF operated. (There is a discussion of the evidence that the bed nets were in fact delivered and used.) In the supplementary information page, however, we are told:
Previously, AMF expected to collect data on malaria case rates from the regions in which it funded LLIN distributions: […] In 2016, AMF shared malaria case rate data […] but we have not prioritized analyzing it closely. AMF believes that this data is not high quality enough to reliably indicate actual trends in malaria case rates, so we do not believe that the fact that AMF collects malaria case rate data is a consideration in AMF’s favor, and do not plan to continue to track AMF's progress in collecting malaria case rate data.
The data was noisy, so they simply stopped checking whether AMF’s bed net distributions do anything about malaria.
If we want to know the size of the improvement made by GiveWell in the developing world, we have their predictions about cost-effectiveness, an audit trail verifying that work was performed, and their direct measurement of how much money people gave because they trusted GiveWell. The predictions on the final target – improved outcomes – have not been tested.
GiveWell is actually doing unusually well as far as major funders go. It sticks to describing things it's actually responsible for. By contrast, the Gates Foundation, in a report to Warren Buffett claiming to describe its impact, simply described overall improvement in the developing world, a very small rhetorical step from claiming credit for 100% of the improvement. GiveWell at least sticks to facts about GiveWell's own effects, and this is to its credit. But it focuses on costs it has been able to impose, not benefits it has been able to create.
The Centre for Effective Altruism's William MacAskill made a related point back in 2012, though he talked about the lack of any sort of formal outside validation or audit, rather than focusing on empirical validation of outcomes:
As far as I know, GiveWell haven't commissioned a thorough external evaluation of their recommendations. […] This surprises me. Whereas businesses have a natural feedback mechanism, namely profit or loss, research often doesn't, hence the need for peer-review within academia. This concern, when it comes to charity-evaluation, is even greater. If GiveWell's analysis and recommendations had major flaws, or were systematically biased in some way, it would be challenging for outsiders to work this out without a thorough independent evaluation. Fortunately, GiveWell has the resources to, for example, employ two top development economists to each do an independent review of their recommendations and the supporting research. This would make their recommendations more robust at a reasonable cost.
GiveWell, for its part, explained its decision to postpone formal external evaluation as follows:
We continue to believe that it is important to ensure that our work is subjected to in-depth scrutiny. However, at this time, the scrutiny we’re naturally receiving – combined with the high costs and limited capacity for formal external evaluation – make us inclined to postpone major effort on external evaluation for the time being.
- If someone volunteered to do (or facilitate) formal external evaluation, we’d welcome this and would be happy to prominently post or link to criticism.
- We do intend eventually to re-institute formal external evaluation.
Four years later, assessing the credibility of this assurance is left as an exercise for the reader.
Version 1: GiveWell Labs and active funding
Then there was GiveWell Labs, later called the Open Philanthropy Project. It looked into more potential philanthropic causes, where the evidence base might not be as cut-and-dried as that for the GiveWell top charities. One thing they learned was that in many areas, there simply weren’t shovel-ready programs ready for funding – a funder has to play a more active role. This shift was described by GiveWell co-founder Holden Karnofsky in his 2013 blog post, Challenges of passive funding:
By “passive funding,” I mean a dynamic in which the funder’s role is to review others’ proposals/ideas/arguments and pick which to fund, and by “active funding,” I mean a dynamic in which the funder’s role is to participate in – or lead – the development of a strategy, and find partners to “implement” it. Active funders, in other words, are participating at some level in “management” of partner organizations, whereas passive funders are merely choosing between plans that other nonprofits have already come up with.
My instinct is generally to try the most “passive” approach that’s feasible. Broadly speaking, it seems that a good partner organization will generally know their field and environment better than we do and therefore be best positioned to design strategy; in addition, I’d expect a project to go better when its implementer has fully bought into the plan as opposed to carrying out what the funder wants. However, (a) this philosophy seems to contrast heavily with how most existing major funders operate; (b) I’ve seen multiple reasons to believe the “active” approach may have more relative merits than we had originally anticipated. […]
- In the nonprofit world of today, it seems to us that funder interests are major drivers of which ideas that get proposed and fleshed out, and therefore, as a funder, it’s important to express interests rather than trying to be fully “passive.”
- While we still wish to err on the side of being as “passive” as possible, we are recognizing the importance of clearly articulating our values/strategy, and also recognizing that an area can be underfunded even if we can’t easily find shovel-ready funding opportunities in it.
GiveWell earned some credibility from its novel, evidence-based outcome-oriented approach to charity evaluation. But this credibility was already – and still is – a sort of loan. We have GiveWell's predictions or promises of cost effectiveness in terms of outcomes, and we have figures for money moved, from which we can infer how much we were promised in improved outcomes. As far as I know, no one's gone back and checked whether those promises turned out to be true.
In the meantime, GiveWell leveraged this credibility by extending its methods into more speculative domains, where less was checkable, and donors had to put more trust in the subjective judgment of GiveWell analysts. This was called GiveWell Labs. At the time, this sort of compounded leverage may have been sensible, but it's important to track whether a debt has been paid off or merely rolled over.
Version 2: The Open Philanthropy Project and control-seeking
Finally, the Open Philanthropy Project made its largest-ever single grant to purchase its founder a seat on a major organization’s board. This represents a transition from mere active funding to overtly purchasing influence:
The Open Philanthropy Project awarded a grant of $30 million ($10 million per year for 3 years) in general support to OpenAI. This grant initiates a partnership between the Open Philanthropy Project and OpenAI, in which Holden Karnofsky (Open Philanthropy’s Executive Director, “Holden” throughout this page) will join OpenAI’s Board of Directors and, jointly with one other Board member, oversee OpenAI’s safety and governance work.
We expect the primary benefits of this grant to stem from our partnership with OpenAI, rather than simply from contributing funding toward OpenAI’s work. While we would also expect general support for OpenAI to be likely beneficial on its own, the case for this grant hinges on the benefits we anticipate from our partnership, particularly the opportunity to help play a role in OpenAI’s approach to safety and governance issues.
Clearly the value proposition is not increasing available funds for OpenAI, if OpenAI’s founders’ billion-dollar commitment to it is real:
Sam, Greg, Elon, Reid Hoffman, Jessica Livingston, Peter Thiel, Amazon Web Services (AWS), Infosys, and YC Research are donating to support OpenAI. In total, these funders have committed $1 billion, although we expect to only spend a tiny fraction of this in the next few years.
The Open Philanthropy Project is neither using this money to fund programs that have a track record of working, nor to fund a specific program that it has prior reason to expect will do good. Rather, it is buying control, in the hope that Holden will be able to persuade OpenAI not to destroy the world, because he knows better than OpenAI’s founders.
How does the Open Philanthropy Project know that Holden knows better? Well, it’s done some active funding of programs it expects to work out. It expects those programs to work out because they were approved by a process similar to the one used by GiveWell to find charities that it expects to save lives.
If you want to acquire control over something, that implies that you think you can manage it more sensibly than whoever is in control already. Thus, buying control is a claim to have superior judgment - not just over others funding things (the original GiveWell pitch), but over those being funded.
In a footnote to the very post announcing the grant, the Open Philanthropy Project notes that it has historically tried to avoid acquiring leverage over organizations it supports, precisely because it’s not sure it knows better:
For now, we note that providing a high proportion of an organization’s funding may cause it to be dependent on us and accountable primarily to us. This may mean that we come to be seen as more responsible for its actions than we want to be; it can also mean we have to choose between providing bad and possibly distortive guidance/feedback (unbalanced by other stakeholders’ guidance/feedback) and leaving the organization with essentially no accountability.
This seems to describe two main problems introduced by becoming a dominant funder:
- People might accurately attribute causal responsibility for some of the organization's conduct to the Open Philanthropy Project.
- The Open Philanthropy Project might influence the organization to behave differently than it otherwise would.
The first seems obviously silly. I've been trying to correct the imbalance where Open Phil is criticized mainly when it makes grants, by criticizing it for holding onto too much money.
The second really is a cost as well as a benefit, and the Open Philanthropy Project has been absolutely correct to recognize this. This is the sort of thing GiveWell has consistently gotten right since the beginning and it deserves credit for making this principle clear and – until now – living up to it.
But discomfort with being dominant funders seems inconsistent with buying a board seat to influence OpenAI. If the Open Philanthropy Project thinks that Holden’s judgment is good enough that he should be in control, why only here? If he thinks that other Open Philanthropy Project AI safety grantees have good judgment but OpenAI doesn’t, why not give them similar amounts of money free of strings to spend at their discretion and see what happens? Why not buy people like Eliezer Yudkowsky, Nick Bostrom, or Stuart Russell a seat on OpenAI’s board?
On the other hand, the Open Philanthropy Project is right on the merits here with respect to safe superintelligence development. Openness makes sense for weak AI, but if you’re building true strong AI you want to make sure you’re cooperating with all the other teams in a single closed effort. I agree with the Open Philanthropy Project’s assessment of the relevant risks. But it's not clear to me how often joining the bad guys to prevent their worst excesses is a good strategy, and it seems like it must often be a mistake. Still, I’m mindful of heroes like John Rabe, Chiune Sugihara, and Oskar Schindler. And if I think someone has a good idea for improving things, it makes sense to reallocate control from people who have worse ideas, even if there's some potential better allocation.
Then again, is Holden Karnofsky the right person to do this? The case is mixed.
He listens to and engages with the arguments from principled advocates for AI safety research, such as Nick Bostrom, Eliezer Yudkowsky, and Stuart Russell. This is a point in his favor. But, I can think of other people who engage with such arguments. For instance, OpenAI founder Elon Musk has publicly praised Bostrom’s book Superintelligence, and founder Sam Altman has written two blog posts summarizing concerns about AI safety reasonably cogently. Altman even asked Luke Muehlhauser, former executive director of MIRI, for feedback pre-publication. He's met with Nick Bostrom. That suggests a substantial level of direct engagement with the field, although Holden has engaged for a longer time, more extensively, and more directly.
Another point in Holden’s favor, from my perspective, is that under his leadership, the Open Philanthropy Project has funded the most serious-seeming programs for both weak and strong AI safety research. But Musk also managed to (indirectly) fund AI safety research at MIRI and by Nick Bostrom personally, via his $10 million FLI grant.
The Open Philanthropy Project also says that it expects to learn a lot about AI research from this, which will help it make better decisions on AI risk in the future and influence the field in the right way. This is reasonable as far as it goes. But remember that the case for positioning the Open Philanthropy Project to do this relies on the assumption that the Open Philanthropy Project will improve matters by becoming a central influencer in this field. This move is consistent with reaching that goal, but it is not independent evidence that the goal is the right one.
Overall, there are good narrow reasons to think that this is a potential improvement over the prior situation around OpenAI – but only a small and ill-defined improvement, at considerable attentional cost, and with the offsetting potential harm of increasing OpenAI's perceived legitimacy as a long-run AI safety organization.
And it’s worrying that Open Philanthropy Project’s largest grant – not just for AI risk, but ever (aside from GiveWell Top Charity funding) – is being made to an organization at which Holden’s housemate and future brother-in-law is a leading researcher. The nepotism argument is not my central objection. If I otherwise thought the grant were obviously a good idea, it wouldn’t worry me, because it’s natural for people with shared values and outlooks to become close nonprofessionally as well. But in the absence of a clear compelling specific case for the grant, it’s worrying.
Altogether, I'm not saying this is an unreasonable shift, considered in isolation. I’m not even sure this is a bad thing for the Open Philanthropy Project to be doing – insiders may have information that I don’t, and that is difficult to communicate to outsiders. But as outsiders, there comes a point when someone’s maxed out their moral credit, and we should wait for results before actively trying to entrust the Open Philanthropy Project and its staff with more responsibility.
EA Funds and self-recommendation
The Centre for Effective Altruism is actively trying to entrust the Open Philanthropy Project and its staff with more responsibility.
The concerns of CEA’s CEO William MacAskill about GiveWell have, as far as I can tell, never been addressed, and the underlying issues have only become more acute. But CEA is now working to put more money under the control of Open Philanthropy Project staff, through its new EA Funds product – a way for supporters to delegate giving decisions to expert EA “fund managers” by giving to one of four funds: Global Health and Development, Animal Welfare, Long-Term Future, and Effective Altruism Community.
The Effective Altruism movement began by saying that because very poor people exist, we should reallocate money from ordinary people in the developed world to the global poor. Now the pitch is in effect that because very poor people exist, we should reallocate money from ordinary people in the developed world to the extremely wealthy. This is a strange and surprising place to end up, and it’s worth retracing our steps. Again, I find it easiest to think of three stages:
- Money can go much farther in the developing world. Here, we’ve found some examples for you. As a result, you can do a huge amount of good by giving away a large share of your income, so you ought to.
- We’ve found ways for you to do a huge amount of good by giving away a large share of your income for developing-world interventions, so you ought to trust our recommendations. You ought to give a large share of your income to these weird things our friends are doing that are even better, or join our friends.
- We’ve found ways for you to do a huge amount of good by funding weird things our friends are doing, so you ought to trust the people we trust. You ought to give a large share of your income to a multi-billion-dollar foundation that funds such things.
Stage 1: The direct pitch
At first, Giving What We Can (the organization that eventually became CEA) had a simple, easy to understand pitch:
Giving What We Can is the brainchild of Toby Ord, a philosopher at Balliol College, Oxford. Inspired by the ideas of ethicists Peter Singer and Thomas Pogge, Toby decided in 2009 to commit a large proportion of his income to charities that effectively alleviate poverty in the developing world.
Discovering that many of his friends and colleagues were interested in making a similar pledge, Toby worked with fellow Oxford philosopher Will MacAskill to create an international organization of people who would donate a significant proportion of their income to cost-effective charities.
Giving What We Can launched in November 2009, attracting significant media attention. Within a year, 64 people had joined the society, their pledged donations amounting to $21 million. Initially run on a volunteer basis, Giving What We Can took on full-time staff in the summer of 2012.
In effect, its argument was: "Look, you can do huge amounts of good by giving to people in the developing world. Here are some examples of charities that do that. It seems like a great idea to give 10% of our income to those charities."
GWWC was a simple product, with a clear, limited scope. Its founders believed that people, including them, ought to do a thing – so they argued directly for that thing, using the arguments that had persuaded them. If it wasn't for you, it was easy to figure that out; but a surprisingly large number of people were persuaded by a simple, direct statement of the argument, took the pledge, and gave a lot of money to charities helping the world's poorest.
Stage 2: Rhetoric and belief diverge
Then, GWWC staff were persuaded you could do even more good with your money in areas other than developing-world charity, such as existential risk mitigation. Encouraging donations and work in these areas became part of the broader Effective Altruism movement, and GWWC's umbrella organization was named the Centre for Effective Altruism. So far, so good.
But this left Effective Altruism in an awkward position; while leadership often personally believe the most effective way to do good is far-future stuff or similarly weird-sounding things, many people who can see the merits of the developing-world charity argument reject the argument that because the vast majority of people live in the far future, even a very small improvement in humanity’s long-run prospects outweighs huge improvements on the global poverty front. They also often reject similar scope-sensitive arguments for things like animal charities.
Giving What We Can's page on what we can achieve still focuses on global poverty, because developing-world charity is easier to explain persuasively. However, EA leadership tends to privately focus on things like AI risk. Two years ago many attendees at the EA Global conference in the San Francisco Bay Area were surprised that the conference focused so heavily on AI risk, rather than the global poverty interventions they’d expected.
Stage 3: Effective altruism is self-recommending
Shortly before the launch of the EA Funds I was told in informal conversations that they were a response to demand. Giving What We Can pledge-takers and other EA donors had told CEA that they trusted it to make giving decisions on their behalf. CEA was responding by creating a product for the people who wanted it.
This seemed pretty reasonable to me, and on the whole good. If someone wants to trust you with their money, and you think you can do something good with it, you might as well take it, because they’re estimating your skill above theirs. But not everyone agrees, and as the Madoff case demonstrates, "people are begging me to take their money" is not a definitive argument that you are doing anything real.
In practice, the funds are managed by Open Philanthropy Project staff:
We want to keep this idea as simple as possible to begin with, so we’ll have just four funds, with the following managers:
- Global Health and Development - Elie Hassenfeld
- Animal Welfare – Lewis Bollard
- Long-run future – Nick Beckstead
- Movement-building – Nick Beckstead
(Note that the meta-charity fund will be able to fund CEA; and note that Nick Beckstead is a Trustee of CEA. The long-run future fund and the meta-charity fund continue the work that Nick has been doing running the EA Giving Fund.)
It’s not a coincidence that all the fund managers work for GiveWell or Open Philanthropy. First, these are the organisations whose charity evaluation we respect the most. The worst-case scenario, where your donation just adds to the Open Philanthropy funding within a particular area, is therefore still a great outcome. Second, they have the best information available about what grants Open Philanthropy are planning to make, so have a good understanding of where the remaining funding gaps are, in case they feel they can use the money in the EA Fund to fill a gap that they feel is important, but isn’t currently addressed by Open Philanthropy.
In past years, Giving What We Can recommendations have largely overlapped with GiveWell’s top charities.
In the comments on the launch announcement on the EA Forum, several people (including me) pointed out that the Open Philanthropy Project seems to be having trouble giving away even the money it already has, so it seems odd to direct more money to Open Philanthropy Project decisionmakers. CEA’s senior marketing manager replied that the Funds were a minimum viable product to test the concept:
I don't think the long-term goal is that OpenPhil program officers are the only fund managers. Working with them was the best way to get an MVP version in place.
This also seemed okay to me, and I said so at the time.
[NOTE: I've edited the next paragraph to excise some unreliable information. Sorry for the error, and thanks to Rob Wiblin for pointing it out.]
After they were launched, though, I saw phrasings that were not so cautious at all, instead making claims that this was generally a better way to give. As of writing this, if someone on the effectivealtruism.org website clicks on "Donate Effectively" they will be led directly to a page promoting EA Funds. When I looked at Giving What We Can’s top charities page in early April, it recommended the EA Funds "as the highest impact option for donors."
This is not a response to demand, it is an attempt to create demand by using CEA's authority, telling people that the funds are better than what they're doing already. By contrast, GiveWell's Top Charities page simply says:
Our top charities are evidence-backed, thoroughly vetted, underfunded organizations.
This carefully avoids any overt claim that they're the highest-impact option available to donors. GiveWell avoids saying that because there's no way they could know it, so saying it wouldn't be truthful.
A marketing email might have just been dashed off quickly, and an exaggerated wording might just have been an oversight. But, as noted above, the "highest impact option for donors" wording appeared on Giving What We Can’s top charities page itself when I looked in early April.
The wording has since been qualified with “for most donors”, which is a good change. But the thing I’m worried about isn’t just the explicit exaggerated claims – it’s the underlying marketing mindset that made them seem like a good idea in the first place. EA seems to have switched from an endorsement of the best things outside itself, to an endorsement of itself. And it's concentrating decisionmaking power in the Open Philanthropy Project.
Effective altruism is overextended, but it doesn't have to be
There is a saying in finance that was old even back when Keynes said it. If you owe the bank a million dollars, then you have a problem. If you owe the bank a billion dollars, then the bank has a problem.
In other words, if someone extends you a level of trust they could survive writing off, then they might call in that loan. As a result, they have leverage over you. But if they overextend, putting all their eggs in one basket, and you are that basket, then you have leverage over them; you're too big to fail. Letting you fail would be so disastrous for their interests that you can extract nearly arbitrary concessions from them, including further investment. For this reason, successful institutions often try to diversify their investments, and avoid overextending themselves. Regulators, for the same reason, try to prevent banks from becoming "too big to fail."
The Effective Altruism movement is concentrating decisionmaking power and trust as much as possible, in a way that's setting itself up to invest ever increasing amounts of confidence to keep the game going.
The alternative is to keep the scope of each organization narrow, overtly ask for trust for each venture separately, and make it clear what sorts of programs are being funded. For instance, Giving What We Can should go back to its initial focus of global poverty relief.
Like many EA leaders, I happen to believe that anything you can do to steer the far future in a better direction is much, much more consequential for the well-being of sentient creatures than any purely short-run improvement you can create now. So it might seem odd that I think Giving What We Can should stay focused on global poverty. But, I believe that the single most important thing we can do to improve the far future is hold onto our ability to accurately build shared models. If we use bait-and-switch tactics, we are actively eroding the most important type of capital we have – coordination capacity.
If you do not think giving 10% of one's income to global poverty charities is the right thing to do, then you can't in full integrity urge others to do it – so you should stop. You might still believe that GWWC ought to exist. You might still believe that it is a positive good to encourage people to give much of their income to help the global poor, if they wouldn't have been doing anything else especially effective with the money. If so, and you happen to find yourself in charge of an organization like Giving What We Can, the thing to do is write a letter to GWWC members telling them that you've changed your mind, and why, and offering to give away the brand to whoever seems best able to honestly maintain it.
If someone at the Centre for Effective Altruism fully believes in GWWC's original mission, then that might make the transition easier. If not, then one still has to tell the truth and do what's right.
And what of the EA Funds? The Long-Term Future Fund is run by Open Philanthropy Project Program Officer Nick Beckstead. If you think that it's a good thing to delegate giving decisions to Nick, then I would agree with you. Nick's a great guy! I'm always happy to see him when he shows up at house parties. He's smart, and he actively seeks out arguments against his current point of view. But the right thing to do, if you want to persuade people to delegate their giving decisions to Nick Beckstead, is to make a principled case for delegating giving decisions to Nick Beckstead. If the Centre for Effective Altruism did that, then Nick would almost certainly feel more free to allocate funds to the best things he knows about, not just the best things he suspects EA Funds donors would be able to understand and agree with.
If you can't directly persuade people, then maybe you're wrong. If the problem is inferential distance, then you've got some work to do bridging that gap.
There's nothing wrong with setting up a fund to make it easy. It's actually a really good idea. But there is something wrong with the multiple layers of vague indirection involved in the current marketing of the Long-Term Future Fund – using global poverty to sell the generic idea of doing the most good, then using CEA's identity as the organization in charge of doing the most good to persuade people to delegate their giving decisions to it, and then sending their money to some dude at the multi-billion-dollar foundation to give away at his personal discretion. The same argument applies to all four Funds.
Likewise, if you think that working directly on AI risk is the most important thing, then you should make arguments directly for working on AI risk. If you can't directly persuade people, then maybe you're wrong. If the problem is inferential distance, it might make sense to imitate the example of someone like Eliezer Yudkowsky, who used indirect methods to bridge the inferential gap by writing extensively on individual human rationality, and did not try to control others' actions in the meantime.
If Holden thinks he should be in charge of some AI safety research, then he should ask Good Ventures for funds to actually start an AI safety research organization. I'd be excited to see what he'd come up with if he had full control of and responsibility for such an organization. But I don't think anyone has a good plan to work directly on AI risk, and I don't have one either, which is why I'm not directly working on it or funding it. My plan for improving the far future is to build human coordination capacity.
(If, by contrast, Holden just thinks there needs to be coordination between different AI safety organizations, the obvious thing to do would be to work with FLI on that, e.g. by giving them enough money to throw their weight around as a funder. They organized the successful Puerto Rico conference, after all.)
Another thing that would be encouraging would be if at least one of the Funds were not administered entirely by an Open Philanthropy Project staffer, and ideally an expert who doesn't benefit from the halo of "being an EA." For instance, Chris Blattman is a development economist with experience designing programs that don't just use but generate evidence on what works. When people were arguing about whether sweatshops are good or bad for the global poor, he actually went and looked by performing a randomized controlled trial. He's leading two new initiatives with J-PAL and IPA, and expects that directors designing studies will also have to spend time fundraising. Having funding lined up seems like the sort of thing that would let them spend more time actually running programs. And more generally, he seems likely to know about funding opportunities the Open Philanthropy Project doesn't, simply because he's embedded in a slightly different part of the global health and development network.
Narrower projects that rely less on the EA brand and more on what they're actually doing, and more cooperation on equal terms with outsiders who seem to be doing something good already, would do a lot to help EA grow beyond putting stickers on its own behavior chart. I'd like to see EA grow up. I'd be excited to see what it might do.
1. Good programs don't need to distort the story people tell about them, while bad programs do.
2. Moral confidence games – treating past promises and trust as a track record to justify more trust – are an example of the kind of distortion mentioned in (1), that benefits bad programs more than good ones.
3. The Open Philanthropy Project's OpenAI grant represents a shift from evaluating other programs' effectiveness, to assuming its own effectiveness.
4. EA Funds represents a shift from EA evaluating programs' effectiveness, to assuming EA's effectiveness.
5. A shift from evaluating other programs' effectiveness, to assuming one's own effectiveness, is an example of the kind of "moral confidence game" mentioned in (2).
6. EA ought to focus on scope-limited projects, so that it can directly make the case for those particular projects instead of relying on EA identity as a reason to support an EA organization.
7. EA organizations ought to entrust more responsibility to outsiders who seem to be doing good things but don't overtly identify as EA, instead of trying to keep it all in the family.
The Open Philanthropy Project recently bought a seat on the board of the billion-dollar nonprofit AI research organization OpenAI for $30 million. Some people have said that this was surprisingly cheap, because the price in dollars was such a low share of OpenAI's eventual endowment: 3%.
To the contrary, this seat on OpenAI's board is very expensive, not because the nominal price is high, but precisely because it is so low.
If OpenAI hasn’t extracted a meaningful-to-it amount of money, then it follows that it is getting something other than money out of the deal. The obvious thing it is getting is buy-in for OpenAI as an AI safety and capacity venture. In exchange for a board seat, the Open Philanthropy Project is aligning itself socially with OpenAI, by taking the position of a material supporter of the project. The important thing is mutual validation, and a nominal donation just large enough to neg the other AI safety organizations supported by the Open Philanthropy Project is simply a customary part of the ritual.
By my count, the grant is larger than all the Open Philanthropy Project's other AI safety grants combined.
(Cross-posted at my personal blog.)
If there's anything we can do now about the risks of superintelligent AI, then OpenAI makes humanity less safe.
Once upon a time, some good people were worried about the possibility that humanity would figure out how to create a superintelligent AI before they figured out how to tell it what we wanted it to do. If this happened, it could lead to literally destroying humanity and nearly everything we care about. This would be very bad. So they tried to warn people about the problem, and to organize efforts to solve it.
Specifically, they called for work on aligning an AI’s goals with ours - sometimes called the value alignment problem, AI control, friendly AI, or simply AI safety - before rushing ahead to increase the power of AI.
Some other good people listened. They knew they had no relevant technical expertise, but what they did have was a lot of money. So they did the one thing they could do - throw money at the problem, giving it to trusted parties to try to solve the problem. Unfortunately, the money was used to make the problem worse. This is the story of OpenAI.
Before I go on, two qualifiers:
- This post will be much easier to follow if you have some familiarity with the AI safety problem. For a quick summary you can read Scott Alexander’s Superintelligence FAQ. For a more comprehensive account see Nick Bostrom’s book Superintelligence.
- AI is an area in which even most highly informed people should have lots of uncertainty. I wouldn't be surprised if my opinion changes a lot after publishing this post, as I learn relevant information. I'm publishing this because I think this process should go on in public.
The story of OpenAI
Before OpenAI, there was DeepMind, a for-profit venture working on “deep learning” techniques. It was widely regarded as the advanced AI research organization. If any current effort was going to produce superhuman intelligence, it was DeepMind.
Elsewhere, industrialist Elon Musk was working on more concrete (and largely successful) projects to benefit humanity, like commercially viable electric cars, solar panels cheaper than ordinary roofing, cheap spaceflight with reusable rockets, and a long-run plan for a Mars colony. When he heard the arguments people like Eliezer Yudkowsky and Nick Bostrom were making about AI risk, he was persuaded that there was something to worry about - but he initially thought a Mars colony might save us. But when DeepMind’s head, Demis Hassabis, pointed out that this wasn't far enough to escape the reach of a true superintelligence, he decided he had to do something about it:
Hassabis, a co-founder of the mysterious London laboratory DeepMind, had come to Musk’s SpaceX rocket factory, outside Los Angeles, a few years ago. […] Musk explained that his ultimate goal at SpaceX was the most important project in the world: interplanetary colonization.
Hassabis replied that, in fact, he was working on the most important project in the world: developing artificial super-intelligence. Musk countered that this was one reason we needed to colonize Mars—so that we’ll have a bolt-hole if A.I. goes rogue and turns on humanity. Amused, Hassabis said that A.I. would simply follow humans to Mars.
Musk is not going gently. He plans on fighting this with every fiber of his carbon-based being. Musk and Altman have founded OpenAI, a billion-dollar nonprofit company, to work for safer artificial intelligence.
OpenAI’s primary strategy is to hire top AI researchers to do cutting-edge AI capacity research and publish the results, in order to ensure widespread access. Some of this involves making sure AI does what you meant it to do, which is a form of the value alignment problem mentioned above.
Intelligence and superintelligence
No one knows exactly what research will result in the creation of a general intelligence that can do anything a human can, much less a superintelligence - otherwise we’d already know how to build one. Some AI research is clearly not on the path towards superintelligence - for instance, applying known techniques to new fields. Other AI research is more general, and might plausibly be making progress towards a superintelligence. It could be that the sort of research DeepMind and OpenAI are working on is directly relevant to building a superintelligence, or it could be that their methods will tap out long before then. These are different scenarios, and need to be evaluated separately.
What if OpenAI and DeepMind are working on problems relevant to superintelligence?
If OpenAI is working on things that are directly relevant to the creation of a superintelligence, then its very existence makes an arms race with DeepMind more likely. This is really bad! Moreover, sharing results openly makes it easier for other institutions or individuals, who may care less about safety, to make progress on building a superintelligence.
Arms races are dangerous
One thing nearly everyone thinking seriously about the AI problem agrees on, is that an arms race towards superintelligence would be very bad news. The main problem occurs in what is called a “fast takeoff” scenario. If AI progress is smooth and gradual even past the point of human-level AI, then we may have plenty of time to correct any mistakes we make. But if there’s some threshold beyond which an AI would be able to improve itself faster than we could possibly keep up with, then we only get one chance to do it right.
AI value alignment is hard, and AI capacity is likely to be easier, so anything that causes an AI team to rush makes our chances substantially worse; if they get safety even slightly wrong but get capacity right enough, we may all end up dead. But if you’re worried that the other team will unleash a potentially dangerous superintelligence first, then you might be willing to skip some steps on safety to preempt them. But they, having more reason to trust themselves than you, might notice that you’re rushing ahead, get worried that your team will destroy the world, and rush their (probably safe but they’re not sure) AI into existence.
OpenAI promotes competition
DeepMind used to be the standout AI research organization. With a comfortable lead on everyone else, they would be able to afford to take their time to check their work if they thought they were on the verge of doing something really dangerous. But OpenAI is now widely regarded as a credible close competitor. However dangerous you think DeepMind might have been in the absence of an arms race dynamic, this makes them more dangerous, not less. Moreover, by sharing their results, they are making it easier to create other close competitors to DeepMind, some of whom may not be so committed to AI safety.
We at least know that DeepMind, like OpenAI, has put some resources into safety research. What about the unknown people or organizations who might leverage AI capacity research published by OpenAI?
For more on how openly sharing technology with extreme destructive potential might be extremely harmful, see Scott Alexander’s Should AI be Open?, and Nick Bostrom’s Strategic Implications of Openness in AI Development.
What if OpenAI and DeepMind are not working on problems relevant to superintelligence?
Suppose OpenAI and DeepMind are largely not working on problems highly relevant to superintelligence. (Personally I consider this the more likely scenario.) By portraying short-run AI capacity work as a way to get to safe superintelligence, OpenAI’s existence diverts attention and resources from things actually focused on the problem of superintelligence value alignment, such as MIRI or FHI.
I suspect that in the long-run this will make it harder to get funding for long-run AI safety organizations. The Open Philanthropy Project just made its largest grant ever, to OpenAI, to buy a seat on OpenAI’s board for Open Philanthropy Project executive director Holden Karnofsky. This is larger than their recent grants to MIRI, FHI, FLI, and the Center for Human-Compatible AI all together.
But the problem is not just money - it’s time and attention. The Open Philanthropy Project doesn’t think that OpenAI is underfunded or that it could do more good with the extra money. Instead, it seems to think that Holden can be a good influence on OpenAI. This means that of the time he's allocating to AI safety, a fair amount has been diverted to OpenAI.
This may also make it harder for organizations specializing in the sort of long-run AI alignment problems that don't have immediate applications to attract top talent. People who hear about AI safety research and are persuaded to look into it will have a harder time finding direct efforts to solve key long-run problems, since an organization focused on increasing short-run AI capacity will dominate AI safety's public image.
Why do good inputs turn bad?
OpenAI was founded by people trying to do good, and has hired some very good and highly talented people. It seems to be doing genuinely good capacity research. To the extent to which this is not dangerously close to superintelligence, it’s better to share this sort of thing than not – they could create a huge positive externality. They could construct a fantastic public good. Making the world richer in a way that widely distributes the gains is very, very good.
Separately, many people at OpenAI seem genuinely concerned about AI safety, want to prevent disaster, and have done real work to promote long-run AI safety research. For instance, my former housemate Paul Christiano, who is one of the most careful and insightful AI safety thinkers I know of, is currently employed at OpenAI. He is still doing AI safety work – for instance, he coauthored Concrete Problems in AI Safety with, among others, Dario Amodei, another OpenAI researcher.
Unfortunately, I don’t see how those two things make sense jointly in the same organization. I’ve talked with a lot of people about this in the AI risk community, and they’ve often attempted to steelman the case for OpenAI, but I haven’t found anyone willing to claim, as their own opinion, that OpenAI as conceived was a good idea. It doesn’t make sense to anyone, if you’re worried at all about the long-run AI alignment problem.
Something very puzzling is going on here. Good people tried to spend money on addressing an important problem, but somehow the money got spent on the thing most likely to make that exact problem worse. Whatever is going on here, it seems important to understand if you want to use your money to better the world.
I am surrounded by well-meaning people trying to take responsibility for the future of the universe. I think that this attitude – prominent among Effective Altruists – is causing great harm. I noticed this as part of a broader change in outlook, which I've been trying to describe on this blog in manageable pieces (and sometimes failing at the "manageable" part).
I'm going to try to contextualize this by outlining the structure of my overall argument.
Why I am worried
Effective Altruists often say they're motivated by utilitarianism. At its best, this leads to things like Katja Grace's excellent analysis of when to be a vegetarian. We need more of this kind of principled reasoning about tradeoffs.
At its worst, this leads to some people angsting over whether it's ethical to spend money on a cup of coffee when they might have saved a life, and others using the greater good as license to say things that are not quite true, socially pressure others into bearing inappropriate burdens, and make ever-increasing claims on resources without a correspondingly strong verified track record of improving people's lives. I claim that these actions are not in fact morally correct, and that people keep winding up endorsing those conclusions because they are using the wrong cognitive approximations to reason about morality.
Summary of the argument
- When people take responsibility for something, they try to control it. So, universal responsibility implies an attempt at universal control.
- Maximizing control has destructive effects:
- An adversarial stance towards other agents.
- Decision paralysis.
- These failures are not accidental, but baked into the structure of control-seeking. We need a practical moral philosophy to describe strategies that generalize better, and benefit from the existence of other benevolent agents, rather than treating them primarily as threats.
Responsibility implies control
In practice, the way I see the people around me applying utilitarianism, it seems to make two important moral claims:
- You - you, personally - are responsible for everything that happens.
- No one is allowed their own private perspective - everyone must take the public, common perspective.
The first principle is almost but not quite simple consequentialism. But it's important to note that it actually doesn't generalize; it's massive double-counting if each individual person is responsible for everything that happens. I worked through an example of the double-counting problem in my post on matching donations.
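To make the double-counting concrete, here is a minimal sketch with made-up numbers. The contributor names, the single-life outcome, and both helper functions are hypothetical illustrations, not taken from the matching donations post. If every person who contributed to an outcome counts the whole outcome as their own, the credit claimed adds up to a multiple of the good actually done.

```python
from fractions import Fraction

def naive_credit(contributors, outcome):
    """Each contributor claims the entire outcome as their own doing."""
    return {name: outcome for name in contributors}

def shared_credit(contributors, outcome):
    """One possible alternative (an assumption, not the only option):
    split credit evenly so the claims sum to the real outcome."""
    share = Fraction(outcome, len(contributors))
    return {name: share for name in contributors}

contributors = ["Alice", "Bob", "Carol"]  # hypothetical people
lives_saved = 1                           # hypothetical outcome

naive_total = sum(naive_credit(contributors, lives_saved).values())
shared_total = sum(shared_credit(contributors, lives_saved).values())

print(naive_total)   # 3: three lives' worth of credit claimed for one life saved
print(shared_total)  # 1: claims add up to what actually happened
```

An even split is only the simplest accounting that avoids the overcount; the point is just that the individual claims have to sum to the actual outcome.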
The second principle follows from the first one. If you think you're personally responsible for everything that happens, and obliged to do something about that rather than weigh your taste accordingly – and you also believe that there are ways to have an outsized impact (e.g. that you can reliably save a life for a few thousand dollars) – then in some sense nothing is yours. The money you spent on that cup of coffee could have fed a poor family for a day in the developing world. It's only justified if the few minutes you save somehow produce more value.
One way of resolving this is simply to decide that you're entitled to only as much as the global poor, and try to do without the rest to improve their lot. This is the reasoning behind the notorious demandingness of utilitarianism.
But of course, other people are also making suboptimal uses of resources. So if you can change that, then it becomes your responsibility to do so.
In general, if Alice and Bob both have some money, and Alice is making poor use of money by giving to the Society to Cure Rare Diseases in Cute Puppies, and Bob is giving money to comparatively effective charities like the Against Malaria Foundation, then if you can cause one of them to have access to more money, you'd rather help Bob than Alice.
There's no reason for this to be different if you are one of Bob and Alice. And since you've already rejected your own private right to hold onto things when there are stronger global claims to do otherwise, there's no principled reason not to try to reallocate resources from the other person to you.
What you're willing to do to yourself, you'll be willing to do to others. Respecting their autonomy becomes a mere matter of either selfishly indulging your personal taste for "deontological principles," or a concession made because they won't accept your leadership if you're too demanding - not a principled way to cooperate with them. You end up trying to force yourself and others to obey your judgment about what actions are best.
If you think of yourself as a benevolent agent, and think of the rest of the world and all the people in it as objects with regular, predictable behaviors you can use to improve outcomes, then you'll feel morally obliged - and therefore morally sanctioned - to shift as much of the locus of control as possible to yourself, for the greater good.
If someone else seems like a better candidate, then the right thing to do seems like throwing your lot in with them, and transferring as much as you can to them rather than to yourself. So this attitude towards doing good leads either to personal control-seeking, or support of someone else's bid for the same.
I think that this reasoning is tacitly accepted by many Effective Altruists, and explains two seemingly opposite things:
- Some EAs get their act together and make power plays, implicitly claiming the right to deceive and manipulate to implement their plan.
- Some EAs are paralyzed by the impossibility of weighing the consequences for the universe of every act, and collapse into perpetual scrupulosity and anxiety, mitigated only by someone else claiming legitimacy, telling them what to do, and telling them how much is enough.
Interestingly, people in the second category are somewhat useful for people following the strategy of the first category, as they demonstrate demand for the service of telling other people what to do. (I think the right thing to do is largely to decline to meet this demand.)
Objectivists sometimes criticize "altruistic" ventures by insisting on Ayn Rand's definition of altruism as the drive to self-abnegation, rather than benevolence. I used to think that this was obnoxiously missing the point, but now I think this might be a fair description of a large part of what I actually see. (I'm very much not sure I'm right. I am sure I'm not describing all of Effective Altruism – many people are doing good work for good reasons.)
Control-seeking is harmful
You have to interact with other people somehow, since they're where most of the value is in our world, and they have a lot of causal influence on the things you care about. If you don't treat them as independent agents, and you don't already rule over them, you will default to going to war against them (and more generally trying to attain control and then make all the decisions) rather than trading with them (or letting them take care of a lot of the decisionmaking). This is bad because it destroys potential gains from trade and division of labor, because you win conflicts by destroying things of value, and because even when you win you unnecessarily become a bottleneck.
People who think that control-seeking is the best strategy for benevolence tend to adopt plans like this:
Step 1 – acquire control over everything.
Step 2 – optimize it for the good of all sentient beings.
The problem with this is that step 1 does not generalize well. There are lots of different goals for which step 1 might seem like an appealing first step, so you should expect lots of other people to be trying, and their interests will all be directly opposed to yours. Your methods will be nearly the same as the methods for someone with a different step 2. You'll never get to step 2 of this plan; it's been tried many times before, and failed every time.
Lots of different types of people want more resources. Many of them are very talented. You should be skeptical about your ability to win without some massive advantage. So, what you're left with are your proximate goals. Your impact on the world will be determined by your means, not your ends.
What are your means?
Even though you value others' well-being intrinsically, when pursuing your proximate goals, their agency mostly threatens to muck up your plans. Consequently, it will seem like a bad idea to give them info or leave them resources that they might misuse.
You will want to make their behavior more predictable to you, so you can influence it better. That means telling simplified stories designed to cause good actions, rather than to directly transmit relevant information. Withholding, rather than sharing, information. Message discipline. I wrote about this problem in my post on the humility argument for honesty.
And if the words you say are tools for causing others to take specific actions, then you're corroding their usefulness for literally true descriptions of things far away or too large or small to see. Peter Singer's claim that you can save a life for hundreds of dollars by giving to developing-world charities no longer means that you can save a life for hundreds of dollars by giving to developing-world charities. It simply means that Peter Singer wants to motivate you to give to developing-world charities. I wrote about this problem in my post on bindings and assurances.
More generally, you will try to minimize others' agency. If you believe that other people are moral agents with common values, then e.g. withholding information means that the friendly agents around you are more poorly informed, which is obviously bad, even before taking into account trust considerations! This plan only makes sense if you basically believe that other people are moral patients, but independent, friendly agents do not exist; that you are the only person in the world who can be responsible for anything.
Another specific behavioral consequence is that you'll try to acquire resources even when you have no specific plan for them. For instance, GiveWell's impact page tracks costs they've imposed on others – money moved, and attention in the form of visits to their website – but not independent measures of outcomes improved, or the opportunity cost of people who made a GiveWell-influenced donation. The implication is that people weren't doing much good with their money or time anyway, so it's a "free lunch" to gain control over these.<fn>Their annual metrics report goes into more detail and does track this, and finds that about a quarter of GiveWell-influenced donations were reallocated from other developing-world charities (and another quarter from developed-world charities).</fn> By contrast, the Gates Foundation's Valentine's Day report to Warren Buffett tracks nothing but developing-world outcomes (but then absurdly takes credit for 100% of the improvement).
As usual, I'm not picking on GiveWell because they're unusually bad – I'm picking on GiveWell because they're unusually open. You should assume that similar but more secretive organizations are worse by default, not better.
This kind of divergent strategy doesn't just directly inflict harms on other agents. It takes resources away from other agents that aren't defending themselves, which forces them into a more adversarial stance. It also earns justified mistrust, which means that if you follow this strategy, you burn cooperative bridges, forcing yourself farther down the adversarial path.
I've written more about the choice between convergent and divergent strategies in my post about the neglectedness consideration.
Simple patches don't undo the harms from adversarial strategies
Since you're benevolent, you have the advantage of a goal in common with many other people. Without abandoning your basic acquisitive strategy, you could try to have a secret handshake among people trying to take over the world for good reasons rather than bad. Ideally, this would let the benevolent people take over the world, cooperating among themselves. But, in practice, any simple shibboleth can be faked; anyone can say they're acquiring power for the greater good.
It's a commonplace in various discussions among Effective Altruists, when someone identifies an individual or organization doing important work, to suggest that we "persuade them to become an EA" or "get an EA in the organization", rather than talking directly about ways to open up a dialogue and cooperate. This is straightforwardly an attempt to get them to agree to the same shibboleths in order to coordinate on a power-grabbing strategy. And yet, the standard of evidence we're using is mostly "identifies as an EA".
When Gleb Tsipursky tried to extract resources from the Effective Altruism movement with straightforward low-quality mimesis, mouthing the words but not really adding value, and grossly misrepresenting what he was doing and his level of success, it took EAs a long time to notice the pattern of misbehavior. I don't think this is because Gleb is especially clever, or because EAs are especially bad at noticing things. I think this is because EAs identify each other by easy-to-mimic shibboleths rather than meaningful standards of behavior.
Nor is Effective Altruism unique in suffering from this problem. When the Roman empire became too big to govern, gradually emperors hit upon the solution of dividing the empire in two and picking someone to govern the other half. This occasionally worked very well, when the two emperors had a strong preexisting bond, but generally they distrusted each other enough that the two empires behaved like rival states as often as they behaved like allies. Even though both emperors were Romans, and often close relatives!
Using "believe me" as our standard of evidence will not work out well for us. The President of the United States seems to have followed the strategy of saying the thing that's most convenient, whether or not it happens to be true, and won an election based on this. Others can and will use this strategy against us.
We can do better
The above is all a symptom of not including other moral agents in your model of the world. We need a moral theory that takes this into account in its descriptions (rather than having to do a detailed calculation each time), and yet is scope-sensitive and consequentialist the way EAs want to be.
There are two important desiderata for such a theory:
- It needs to take into account the fact that there are other agents who also have moral reasoning. We shouldn't be sad to learn that others reason the way we do.
- Graceful degradation. We can't be so trusting that we can be defrauded by anyone willing to say they're one of us. Our moral theory has to work even if not everyone follows it. It should also degrade gracefully within an individual – you shouldn't have to be perfect to see benefits.
One thing we can do now is stop using wrong moral reasoning to excuse destructive behavior. Until we have a good theory, the answer is we don't know if your clever argument is valid.
On the explicit and systematic level, the divergent force is so dominant in our world that sincere benevolent people simply assume, when they see someone overtly optimizing for an outcome, that this person is optimizing for evil. This leads perceptive people who don’t like doing harm, like Venkatesh Rao, to explicitly advise others to minimize their measurable impact on the world.
I don't think this impact-minimization is right, but on current margins it's probably a good corrective.
One encouraging thing is that many people using common-sense moral reasoning already behave according to norms that respect and try to cooperate with the moral agency of others. I wrote about this in Humble Charlie.
I've also begun to try to live up to cooperative heuristics even if I don't have all the details worked out, and help my friends do the same. For instance, I'm happy to talk to people making giving decisions, but usually I don't go any farther than connecting them with people they might be interested in, or coaching them through heuristics, because doing more would be harmful: it would destroy information, and I'm not omniscient (otherwise I'd be richer).
A movement like Effective Altruism, explicitly built around overt optimization, can only succeed in the long run at actually doing good with (a) a clear understanding of this problem, (b) a social environment engineered to robustly reject cost-maximization, and (c) an intellectual tradition of optimizing only for actually good things that people can anchor on and learn from.
This was only a summary. I don't expect many people to be persuaded by this alone. I'm going to fill in the details in future posts. If you want to help me write things that are relevant, you can respond to this (preferably publicly), letting me know:
- What seems clearly true?
- Which parts seem most surprising and in need of justification or explanation?
Some theater people at NYU wanted to demonstrate how gender stereotypes affected the 2016 US presidential election. So they decided to put on a theatrical performance of the presidential debates – but with the genders of the principals swapped. They assumed that this would show how much of a disadvantage Hillary Clinton was working under because of her gender. They were shocked to discover the opposite – audiences full of Clinton supporters, watching the gender-swapped debates, came away thinking that Trump was a better communicator than they'd thought.
The principals don't seem to have come into this with a fair-minded attitude. Instead, it seems to have been a case of "I'll show them!":
Salvatore says he and Guadalupe began the project assuming that the gender inversion would confirm what they’d each suspected watching the real-life debates: that Trump’s aggression—his tendency to interrupt and attack—would never be tolerated in a woman, and that Clinton’s competence and preparedness would seem even more convincing coming from a man.
Let's be clear about this. This was not epistemic even-handedness. This was a sincere attempt at confirmation bias. They believed one thing, and looked only for confirming evidence to prove their point. It was only when they started actually putting together the experiment that they realized they might learn the opposite lesson:
But the lessons about gender that emerged in rehearsal turned out to be much less tidy. What was Jonathan Gordon smiling about all the time? And didn’t he seem a little stiff, tethered to rehearsed statements at the podium, while Brenda King, plainspoken and confident, freely roamed the stage? Which one would audiences find more likeable?
What made this work? I think what happened is that they took their own beliefs literally. They actually believed that people hated Hillary because she was a woman, and so their idea of something that they were confident would show this clearly was a fair test. Because of this, when things came out the opposite of the way they'd predicted, they noticed and were surprised, because they actually expected the demonstration to work.
But they went further. Even though they knew in advance of the public performances that the experiment got the wrong answer, they neither falsified nor file-drawered the evidence. They tried to show, they got a different answer, they showed it anyway.
This is much, much better science than contemporary medical or psychology research was before the replication crisis.
Sometimes, when I think about how epistemically corrupt our culture is, I'm tempted to adopt a permanent defensive crouch: to disbelieve anything I can't fact-check and to explicitly adjust for all the relevant biases. This prospect sounds exhausting. It's not actually necessary. You don't have to worry too much about your biases. Just take your own beliefs literally, as though they mean what they say they mean, and try to believe all their consequences as well. And, when you hit a contradiction – well, now you have an opportunity to learn where you're wrong.
(Cross-posted at my personal blog.)