

Could you be Prof Nick Bostrom's sidekick?

45 RobertWiblin 05 December 2014 01:09AM

If funding were available, the Centre for Effective Altruism would consider hiring someone to work closely with Prof Nick Bostrom to provide anything and everything he needs to be more productive. Bostrom is the Director of the Future of Humanity Institute at Oxford University and the author of Superintelligence, the best guide yet to the possible risks posed by artificial intelligence.

Nobody has yet confirmed they will fund this role, but we are nevertheless interested in getting expressions of interest from suitable candidates.

The list of required characteristics is hefty, and the position would be a challenging one:

  • Willing to commit to the role for at least a year, and preferably several
  • Able to live and work in Oxford during this time
  • Conscientious and discreet
  • Trustworthy
  • Able to keep flexible hours (some days a lot of work, others not much)
  • Highly competent at almost everything in life (for example, organising travel, media appearances, choosing good products, and so on)
  • Will not screw up and look bad when dealing with external parties (e.g. media, event organisers, the university)
  • Has a good personality 'fit' with Bostrom
  • Willing to do some tasks that are not high-status
  • Willing to help Bostrom with both his professional and personal life (to free up his attention)
  • Can speak English well
  • Knowledge of rationality, philosophy, and artificial intelligence would also be helpful, and would allow you to do more work as a research assistant.

The research Bostrom can do is unique; to my knowledge, no one else has made such significant strides in clarifying the biggest risks facing humanity as a whole. As a result, helping increase Bostrom's output by, say, 20% would be a major contribution. This person's work would also help the rest of the Future of Humanity Institute run smoothly.

The role would offer significant skill development in operations, some skill development in communications and research, and the chance to build extensive relationships with the people and organisations working on existential risks.

If you would like to know more, or be added to the list of potential candidates, please email me: robert [dot] wiblin [at] centreforeffectivealtruism [dot] org. Feel free to share this post around.

Note that we are also hiring for a bunch of other roles, with applications closing on Friday the 12th of December.

 

Breaking the vicious cycle

39 XiXiDu 23 November 2014 06:25PM

You may know me as the guy who posts a lot of controversial stuff about LW and MIRI. I don't enjoy doing this and do not want to continue with it. One reason is that the debate is turning into a flame war. Another is that I have noticed it affects my health negatively (e.g. my high blood pressure; I actually suffered single-sided hearing loss over this xkcd comic on Friday).

This all started in 2010 when I encountered something I perceived to be wrong. The specifics are irrelevant to this post. The problem is that ever since then, various reasons have made me feel forced to continue the controversy. Sometimes it was the urge to clarify what I wrote; other times I thought it was necessary to respond to a reply I got. What matters is that I couldn't stop. But I believe that stopping is now possible, given my health concerns.

One problem is that I don't want to leave possible misrepresentations behind, and misrepresentations very likely exist. There are many reasons for this, but I can assure you that I never deliberately lied and never deliberately tried to misrepresent anyone. The main reason might be that I feel very easily overwhelmed and have never been able to force myself to invest the time necessary to do something correctly if I don't really enjoy doing it (for the same reason I probably failed school). This means that most of my comments and posts were written in a tearing hurry, akin to a reflexive recoil from a painful stimulus.

<tldr>

I hate this fight and want to end it once and for all. I don't expect you to take my word for it. So instead, here is an offer:

I am willing to post counterstatements, endorsed by MIRI, of any length and content[1] at the top of any of my blog posts. You can either post them in the comments below or send me an email (da [at] kruel.co).

</tldr>

I have no idea if MIRI believes this to be worthwhile, but I couldn't think of a better way to resolve this dilemma that everyone can live with happily. I am open to suggestions that don't stress me too much (including suggestions about how to prove that I am trying to be honest).

You obviously don't need to read all my posts. The counterstatement can also be a general one.

I am also aware that LW and MIRI are bothered by RationalWiki. As you can easily check from the fossil record, I have at points tried to correct specific problems. But, for the reasons given above, I have trouble investing the time to go through every sentence, find possible errors, and correct them in such a way that the edit is not reverted and the people who feel offended are satisfied.

[1] There are obviously some caveats regarding the content, such as no nude photos of Yudkowsky ;-)

Harper's Magazine article on LW/MIRI/CFAR and Ethereum

37 gwern 12 December 2014 08:34PM

Cover title: “Power and paranoia in Silicon Valley”; article title: “Come with us if you want to live: Among the apocalyptic libertarians of Silicon Valley” (mirrors: 1, 2, 3), by Sam Frank; Harper’s Magazine, January 2015, pp. 26-36 (~8,500 words). The beginning and ending are focused on Ethereum and Vitalik Buterin, so I'll excerpt the LW/MIRI/CFAR-focused middle:

…Blake Masters-the name was too perfect-had, obviously, dedicated himself to the command of self and universe. He did CrossFit and ate Bulletproof, a tech-world variant of the paleo diet. On his Tumblr’s About page, since rewritten, the anti-belief belief systems multiplied, hyperlinked to Wikipedia pages or to the confoundingly scholastic website Less Wrong: “Libertarian (and not convinced there’s irreconcilable fissure between deontological and consequentialist camps). Aspiring rationalist/Bayesian. Secularist/agnostic/ ignostic . . . Hayekian. As important as what we know is what we don’t. Admittedly eccentric.” Then: “Really, really excited to be in Silicon Valley right now, working on fascinating stuff with an amazing team.” I was startled that all these negative ideologies could be condensed so easily into a positive worldview. …I saw the utopianism latent in capitalism-that, as Bernard Mandeville had it three centuries ago, it is a system that manufactures public benefit from private vice. I started CrossFit and began tinkering with my diet. I browsed venal tech-trade publications, and tried and failed to read Less Wrong, which was written as if for aliens.

…I left the auditorium of Alice Tully Hall. Bleary beside the silver coffee urn in the nearly empty lobby, I was buttonholed by a man whose name tag read MICHAEL VASSAR, METAMED research. He wore a black-and-white paisley shirt and a jacket that was slightly too big for him. “What did you think of that talk?” he asked, without introducing himself. “Disorganized, wasn’t it?” A theory of everything followed. Heroes like Elon and Peter (did I have to ask? Musk and Thiel). The relative abilities of physicists and biologists, their standard deviations calculated out loud. How exactly Vassar would save the world. His left eyelid twitched, his full face winced with effort as he told me about his “personal war against the universe.” My brain hurt. I backed away and headed home. But Vassar had spoken like no one I had ever met, and after Kurzweil’s keynote the next morning, I sought him out. He continued as if uninterrupted. Among the acolytes of eternal life, Vassar was an eschatologist. “There are all of these different countdowns going on,” he said. “There’s the countdown to the broad postmodern memeplex undermining our civilization and causing everything to break down, there’s the countdown to the broad modernist memeplex destroying our environment or killing everyone in a nuclear war, and there’s the countdown to the modernist civilization learning to critique itself fully and creating an artificial intelligence that it can’t control. There are so many different - on different time-scales - ways in which the self-modifying intelligent processes that we are embedded in undermine themselves. I’m trying to figure out ways of disentangling all of that. . . .I’m not sure that what I’m trying to do is as hard as founding the Roman Empire or the Catholic Church or something. But it’s harder than people’s normal big-picture ambitions, like making a billion dollars.” Vassar was thirty-four, one year older than I was. He had gone to college at seventeen, and had worked as an actuary, as a teacher, in nanotech, and in the Peace Corps. He’d founded a music-licensing start-up called Sir Groovy. Early in 2012, he had stepped down as president of the Singularity Institute for Artificial Intelligence, now called the Machine Intelligence Research Institute (MIRI), which was created by an autodidact named Eliezer Yudkowsky, who also started Less Wrong. Vassar had left to found MetaMed, a personalized-medicine company, with Jaan Tallinn of Skype and Kazaa, $500,000 from Peter Thiel, and a staff that included young rationalists who had cut their teeth arguing on Yudkowsky’s website. The idea behind MetaMed was to apply rationality to medicine-“rationality” here defined as the ability to properly research, weight, and synthesize the flawed medical information that exists in the world. Prices ranged from $25,000 for a literature review to a few hundred thousand for a personalized study. “We can save lots and lots and lots of lives,” Vassar said (if mostly moneyed ones at first). “But it’s the signal-it’s the ‘Hey! Reason works!’-that matters. . . . It’s not really about medicine.” Our whole society was sick - root, branch, and memeplex - and rationality was the only cure. …I asked Vassar about his friend Yudkowsky. “He has worse aesthetics than I do,” he replied, “and is actually incomprehensibly smart.” We agreed to stay in touch.

One month later, I boarded a plane to San Francisco. I had spent the interim taking a second look at Less Wrong, trying to parse its lore and jargon: “scope insensitivity,” “ugh field,” “affective death spiral,” “typical mind fallacy,” “counterfactual mugging,” “Roko’s basilisk.” When I arrived at the MIRI offices in Berkeley, young men were sprawled on beanbags, surrounded by whiteboards half black with equations. I had come costumed in a Fermat’s Last Theorem T-shirt, a summary of the proof on the front and a bibliography on the back, printed for the number-theory camp I had attended at fifteen. Yudkowsky arrived late. He led me to an empty office where we sat down in mismatched chairs. He wore glasses, had a short, dark beard, and his heavy body seemed slightly alien to him. I asked what he was working on. “Should I assume that your shirt is an accurate reflection of your abilities,” he asked, “and start blabbing math at you?” Eight minutes of probability and game theory followed. Cogitating before me, he kept grimacing as if not quite in control of his face. “In the very long run, obviously, you want to solve all the problems associated with having a stable, self-improving, beneficial-slash-benevolent AI, and then you want to build one.” What happens if an artificial intelligence begins improving itself, changing its own source code, until it rapidly becomes - foom! is Yudkowsky’s preferred expression - orders of magnitude more intelligent than we are? A canonical thought experiment devised by Oxford philosopher Nick Bostrom in 2003 suggests that even a mundane, industrial sort of AI might kill us. Bostrom posited a “superintelligence whose top goal is the manufacturing of paper-clips.” For this AI, known fondly on Less Wrong as Clippy, self-improvement might entail rearranging the atoms in our bodies, and then in the universe - and so we, and everything else, end up as office supplies. Nothing so misanthropic as Skynet is required, only indifference to humanity. What is urgently needed, then, claims Yudkowsky, is an AI that shares our values and goals. This, in turn, requires a cadre of highly rational mathematicians, philosophers, and programmers to solve the problem of “friendly” AI - and, incidentally, the problem of a universal human ethics - before an indifferent, unfriendly AI escapes into the wild.

Among those who study artificial intelligence, there’s no consensus on either point: that an intelligence explosion is possible (rather than, for instance, a proliferation of weaker, more limited forms of AI) or that a heroic team of rationalists is the best defense in the event. That MIRI has as much support as it does (in 2012, the institute’s annual revenue broke $1 million for the first time) is a testament to Yudkowsky’s rhetorical ability as much as to any technical skill. Over the course of a decade, his writing, along with that of Bostrom and a handful of others, has impressed the dangers of unfriendly AI on a growing number of people in the tech world and beyond. In August, after reading Superintelligence, Bostrom’s new book, Elon Musk tweeted, “Hope we’re not just the biological boot loader for digital superintelligence. Unfortunately, that is increasingly probable.” In 2000, when Yudkowsky was twenty, he founded the Singularity Institute with the support of a few people he’d met at the Foresight Institute, a Palo Alto nanotech think tank. He had already written papers on “The Plan to Singularity” and “Coding a Transhuman AI,” and posted an autobiography on his website, since removed, called “Eliezer, the Person.” It recounted a breakdown of will when he was eleven and a half: “I can’t do anything. That’s the phrase I used then.” He dropped out before high school and taught himself a mess of evolutionary psychology and cognitive science. He began to “neuro-hack” himself, systematizing his introspection to evade his cognitive quirks. Yudkowsky believed he could hasten the singularity by twenty years, creating a superhuman intelligence and saving humankind in the process. He met Thiel at a Foresight Institute dinner in 2005 and invited him to speak at the first annual Singularity Summit. The institute’s paid staff grew. In 2006, Yudkowsky began writing a hydra-headed series of blog posts: science-fictionish parables, thought experiments, and explainers encompassing cognitive biases, self-improvement, and many-worlds quantum mechanics that funneled lay readers into his theory of friendly AI. Rationality workshops and Meetups began soon after. In 2009, the blog posts became what he called Sequences on a new website: Less Wrong. The next year, Yudkowsky began publishing Harry Potter and the Methods of Rationality at fanfiction.net. The Harry Potter category is the site’s most popular, with almost 700,000 stories; of these, HPMoR is the most reviewed and the second-most favorited. The last comment that the programmer and activist Aaron Swartz left on Reddit before his suicide in 2013 was on /r/hpmor. In Yudkowsky’s telling, Harry is not only a magician but also a scientist, and he needs just one school year to accomplish what takes canon-Harry seven. HPMoR is serialized in arcs, like a TV show, and runs to a few thousand pages when printed; the book is still unfinished. Yudkowsky and I were talking about literature, and Swartz, when a college student wandered in. Would Eliezer sign his copy of HPMoR? “But you have to, like, write something,” he said. “You have to write, ‘I am who I am.’ So, ‘I am who I am’ and then sign it.” “Alrighty,” Yudkowsky said, signed, continued. “Have you actually read Methods of Rationality at all?” he asked me. “I take it not.” (I’d been found out.) “I don’t know what sort of a deadline you’re on, but you might consider taking a look at that.” (I had taken a look, and hated the little I’d managed.) 
“It has a legendary nerd-sniping effect on some people, so be warned. That is, it causes you to read it for sixty hours straight.”

The nerd-sniping effect is real enough. Of the 1,636 people who responded to a 2013 survey of Less Wrong’s readers, one quarter had found the site thanks to HPMoR, and many more had read the book. Their average age was 27.4, their average IQ 138.2. Men made up 88.8% of respondents; 78.7% were straight, 1.5% transgender, 54.7 % American, 89.3% atheist or agnostic. The catastrophes they thought most likely to wipe out at least 90% of humanity before the year 2100 were, in descending order, pandemic (bioengineered), environmental collapse, unfriendly AI, nuclear war, pandemic (natural), economic/political collapse, asteroid, nanotech/gray goo. Forty-two people, 2.6 %, called themselves futarchists, after an idea from Robin Hanson, an economist and Yudkowsky’s former coblogger, for reengineering democracy into a set of prediction markets in which speculators can bet on the best policies. Forty people called themselves reactionaries, a grab bag of former libertarians, ethno-nationalists, Social Darwinists, scientific racists, patriarchists, pickup artists, and atavistic “traditionalists,” who Internet-argue about antidemocratic futures, plumping variously for fascism or monarchism or corporatism or rule by an all-powerful, gold-seeking alien named Fnargl who will free the markets and stabilize everything else. At the bottom of each year’s list are suggestive statistical irrelevancies: “every optimizing system’s a dictator and i’m not sure which one i want in charge,” “Autocracy (important: myself as autocrat),” “Bayesian (aspiring) Rationalist. Technocratic. Human-centric Extropian Coherent Extrapolated Volition.” “Bayesian” refers to Bayes’s Theorem, a mathematical formula that describes uncertainty in probabilistic terms, telling you how much to update your beliefs when given new information. This is a formalization and calibration of the way we operate naturally, but “Bayesian” has a special status in the rationalist community because it’s the least imperfect way to think. “Extropy,” the antonym of “entropy,” is a decades-old doctrine of continuous human improvement, and “coherent extrapolated volition” is one of Yudkowsky’s pet concepts for friendly artificial intelligence. Rather than our having to solve moral philosophy in order to arrive at a complete human goal structure, C.E.V. would computationally simulate eons of moral progress, like some kind of Whiggish Pangloss machine. As Yudkowsky wrote in 2004, “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.” Yet can even a single human’s volition cohere or compute in this way, let alone humanity’s? We stood up to leave the room. Yudkowsky stopped me and said I might want to turn my recorder on again; he had a final thought. “We’re part of the continuation of the Enlightenment, the Old Enlightenment. This is the New Enlightenment,” he said. “Old project’s finished. We actually have science now, now we have the next part of the Enlightenment project.”

In 2013, the Singularity Institute changed its name to the Machine Intelligence Research Institute. Whereas MIRI aims to ensure human-friendly artificial intelligence, an associated program, the Center for Applied Rationality, helps humans optimize their own minds, in accordance with Bayes’s Theorem. The day after I met Yudkowsky, I returned to Berkeley for one of CFAR’s long-weekend workshops. The color scheme at the Rose Garden Inn was red and green, and everything was brocaded. The attendees were mostly in their twenties: mathematicians, software engineers, quants, a scientist studying soot, employees of Google and Facebook, an eighteen-year-old Thiel Fellow who’d been paid $100,000 to leave Boston College and start a company, professional atheists, a Mormon turned atheist, an atheist turned Catholic, an Objectivist who was photographed at the premiere of Atlas Shrugged II: The Strike. There were about three men for every woman. At the Friday-night meet and greet, I talked with Benja, a German who was studying math and behavioral biology at the University of Bristol, whom I had spotted at MIRI the day before. He was in his early thirties and quite tall, with bad posture and a ponytail past his shoulders. He wore socks with sandals, and worried a paper cup as we talked. Benja had felt death was terrible since he was a small child, and wanted his aging parents to sign up for cryonics, if he could figure out how to pay for it on a grad-student stipend. He was unsure about the risks from unfriendly AI - “There is a part of my brain,” he said, “that sort of goes, like, ‘This is crazy talk; that’s not going to happen’” - but the probabilities had persuaded him. He said there was only about a 30% chance that we could make it another century without an intelligence explosion. He was at CFAR to stop procrastinating. Julia Galef, CFAR’s president and cofounder, began a session on Saturday morning with the first of many brain-as-computer metaphors. We are “running rationality on human hardware,” she said, not supercomputers, so the goal was to become incrementally more self-reflective and Bayesian: not perfectly rational agents, but “agent-y.” The workshop’s classes lasted six or so hours a day; activities and conversations went well into the night. We got a condensed treatment of contemporary neuroscience that focused on hacking our brains’ various systems and modules, and attended sessions on habit training, urge propagation, and delegating to future selves. We heard a lot about Daniel Kahneman, the Nobel Prize-winning psychologist whose work on cognitive heuristics and biases demonstrated many of the ways we are irrational. Geoff Anders, the founder of Leverage Research, a “meta-level nonprofit” funded by Thiel, taught a class on goal factoring, a process of introspection that, after many tens of hours, maps out every one of your goals down to root-level motivations-the unchangeable “intrinsic goods,” around which you can rebuild your life. Goal factoring is an application of Connection Theory, Anders’s model of human psychology, which he developed as a Rutgers philosophy student disserting on Descartes, and Connection Theory is just the start of a universal renovation. Leverage Research has a master plan that, in the most recent public version, consists of nearly 300 steps. 
It begins from first principles and scales up from there: “Initiate a philosophical investigation of philosophical method”; “Discover a sufficiently good philosophical method”; have 2,000-plus “actively and stably benevolent people successfully seek enough power to be able to stably guide the world”; “People achieve their ultimate goals as far as possible without harming others”; “We have an optimal world”; “Done.” On Saturday night, Anders left the Rose Garden Inn early to supervise a polyphasic-sleep experiment that some Leverage staff members were conducting on themselves. It was a schedule called the Everyman 3, which compresses sleep into three twenty-minute REM naps each day and three hours at night for slow-wave. Anders was already polyphasic himself. Operating by the lights of his own best practices, goal-factored, coherent, and connected, he was able to work 105 hours a week on world optimization. For the rest of us, for me, these were distant aspirations. We were nerdy and unperfected. There was intense discussion at every free moment, and a genuine interest in new ideas, if especially in testable, verifiable ones. There was joy in meeting peers after years of isolation. CFAR was also insular, overhygienic, and witheringly focused on productivity. Almost everyone found politics to be tribal and viscerally upsetting. Discussions quickly turned back to philosophy and math. By Monday afternoon, things were wrapping up. Andrew Critch, a CFAR cofounder, gave a final speech in the lounge: “Remember how you got started on this path. Think about what was the time for you when you first asked yourself, ‘How do I work?’ and ‘How do I want to work?’ and ‘What can I do about that?’ . . . Think about how many people throughout history could have had that moment and not been able to do anything about it because they didn’t know the stuff we do now. I find this very upsetting to think about. It could have been really hard. A lot harder.” He was crying. “I kind of want to be grateful that we’re now, and we can share this knowledge and stand on the shoulders of giants like Daniel Kahneman . . . I just want to be grateful for that. . . . And because of those giants, the kinds of conversations we can have here now, with, like, psychology and, like, algorithms in the same paragraph, to me it feels like a new frontier. . . . Be explorers; take advantage of this vast new landscape that’s been opened up to us in this time and this place; and bear the torch of applied rationality like brave explorers. And then, like, keep in touch by email.” The workshop attendees put giant Post-its on the walls expressing the lessons they hoped to take with them. A blue one read RATIONALITY IS SYSTEMATIZED WINNING. Above it, in pink: THERE ARE OTHER PEOPLE WHO THINK LIKE ME. I AM NOT ALONE.

That night, there was a party. Alumni were invited. Networking was encouraged. Post-its proliferated; one, by the beer cooler, read SLIGHTLY ADDICTIVE. SLIGHTLY MIND-ALTERING. Another, a few feet to the right, over a double stack of bound copies of Harry Potter and the Methods of Rationality: VERY ADDICTIVE. VERY MIND-ALTERING. I talked to one of my roommates, a Google scientist who worked on neural nets. The CFAR workshop was just a whim to him, a tourist weekend. “They’re the nicest people you’d ever meet,” he said, but then he qualified the compliment. “Look around. If they were effective, rational people, would they be here? Something a little weird, no?” I walked outside for air. Michael Vassar, in a clinging red sweater, was talking to an actuary from Florida. They discussed timeless decision theory (approximately: intelligent agents should make decisions on the basis of the futures, or possible worlds, that they predict their decisions will create) and the simulation argument (essentially: we’re living in one), which Vassar traced to Schopenhauer. He recited lines from Kipling’s “If-” in no particular order and advised the actuary on how to change his life: Become a pro poker player with the $100k he had in the bank, then hit the Magic: The Gathering pro circuit; make more money; develop more rationality skills; launch the first Costco in Northern Europe. I asked Vassar what was happening at MetaMed. He told me that he was raising money, and was in discussions with a big HMO. He wanted to show up Peter Thiel for not investing more than $500,000. “I’m basically hoping that I can run the largest convertible-debt offering in the history of finance, and I think it’s kind of reasonable,” he said. “I like Peter. I just would like him to notice that he made a mistake . . . I imagine a hundred million or a billion will cause him to notice . . . I’d like to have a pi-billion-dollar valuation.” I wondered whether Vassar was drunk. He was about to drive one of his coworkers, a young woman named Alyssa, home, and he asked whether I would join them. I sat silently in the back of his musty BMW as they talked about potential investors and hires. Vassar almost ran a red light. After Alyssa got out, I rode shotgun, and we headed back to the hotel.

It was getting late. I asked him about the rationalist community. Were they really going to save the world? From what? “Imagine there is a set of skills,” he said. “There is a myth that they are possessed by the whole population, and there is a cynical myth that they’re possessed by 10% of the population. They’ve actually been wiped out in all but about one person in three thousand.” It is important, Vassar said, that his people, “the fragments of the world,” lead the way during “the fairly predictable, fairly total cultural transition that will predictably take place between 2020 and 2035 or so.” We pulled up outside the Rose Garden Inn. He continued: “You have these weird phenomena like Occupy where people are protesting with no goals, no theory of how the world is, around which they can structure a protest. Basically this incredibly, weirdly, thoroughly disempowered group of people will have to inherit the power of the world anyway, because sooner or later everyone older is going to be too old and too technologically obsolete and too bankrupt. The old institutions may largely break down or they may be handed over, but either way they can’t just freeze. These people are going to be in charge, and it would be helpful if they, as they come into their own, crystallize an identity that contains certain cultural strengths like argument and reason.” I didn’t argue with him, except to press, gently, on his particular form of elitism. His rationalism seemed so limited to me, so incomplete. “It is unfortunate,” he said, “that we are in a situation where our cultural heritage is possessed only by people who are extremely unappealing to most of the population.” That hadn’t been what I’d meant. I had meant rationalism as itself a failure of the imagination. “The current ecosystem is so totally fucked up,” Vassar said. “But if you have conversations here”-he gestured at the hotel-“people change their mind and learn and update and change their behaviors in response to the things they say and learn. That never happens anywhere else.” In a hallway of the Rose Garden Inn, a former high-frequency trader started arguing with Vassar and Anna Salamon, CFAR’s executive director, about whether people optimize for hedons or utilons or neither, about mountain climbers and other high-end masochists, about whether world happiness is currently net positive or negative, increasing or decreasing. Vassar was eating and drinking everything within reach. My recording ends with someone saying, “I just heard ‘hedons’ and then was going to ask whether anyone wants to get high,” and Vassar replying, “Ah, that’s a good point.” Other voices: “When in California . . .” “We are in California, yes.”

…Back on the East Coast, summer turned into fall, and I took another shot at reading Yudkowsky’s Harry Potter fanfic. It’s not what I would call a novel, exactly, rather an unending, self-satisfied parable about rationality and transhumanism, with jokes.

…I flew back to San Francisco, and my friend Courtney and I drove to a cul-de-sac in Atherton, at the end of which sat the promised mansion. It had been repurposed as cohousing for children who were trying to build the future: start-up founders, singularitarians, a teenage venture capitalist. The woman who coined the term “open source” was there, along with a Less Wronger and Thiel Capital employee who had renamed himself Eden. The Day of the Idealist was a day for self-actualization and networking, like the CFAR workshop without the rigor. We were to set “mega goals” and pick a “core good” to build on in the coming year. Everyone was a capitalist; everyone was postpolitical. I squabbled with a young man in a Tesla jacket about anti-Google activism. No one has a right to housing, he said; programmers are the people who matter; the protesters’ antagonistic tactics had totally discredited them.

…Thiel and Vassar and Yudkowsky, for all their far-out rhetoric, take it on faith that corporate capitalism, unchecked just a little longer, will bring about this era of widespread abundance. Progress, Thiel thinks, is threatened mostly by the political power of what he calls the “unthinking demos.”


Pointer thanks to /u/Vulture.

My experience of the recent CFAR workshop

29 Kaj_Sotala 27 November 2014 04:17PM

Originally posted at my blog.

---

I just got home from a four-day rationality workshop in England that was organized by the Center for Applied Rationality (CFAR). It covered a lot of content, but if I had to choose a single theme that united most of it, it was listening to your emotions.

That might sound like a weird focus for a rationality workshop, but cognitive science has shown that the intuitive and emotional part of the mind (”System 1”) is both in charge of most of our behavior and carries out a great deal of valuable information-processing of its own (it’s great at pattern-matching, for example). Much of the workshop material was aimed at helping people reach a greater harmony between their System 1 and their verbal, logical System 2. Many of people’s motivational troubles come from the goals of their two systems being somehow at odds with each other, and we were taught to help our two systems have a better dialogue with each other, harmonizing their desires and making it easier for information to cross from one system to the other and back.

To give a more concrete example, there was the technique of goal factoring. You take a behavior that you often do but aren’t sure why, or which you feel might be wasted time. Suppose that you spend a lot of time answering e-mails that aren’t actually very important. You start by asking yourself: what’s good about this activity, that makes me do it? Then you try to listen to your feelings in response to that question, and write down what you perceive. Maybe you conclude that it makes you feel productive, and it gives you a break from tasks that require more energy to do.

Next you look at the things that you came up with, and consider whether there’s a better way to accomplish them. There are two possible outcomes here. Either you conclude that the behavior is an important and valuable one after all, meaning that you can now be more motivated to do it. Alternatively, you find that there would be better ways of accomplishing all the goals that the behavior was aiming for. Maybe taking a walk would make for a better break, and answering more urgent e-mails would provide more value. If you were previously using two hours per day on the unimportant e-mails, possibly you could now achieve more in terms of both relaxation and actual productivity by spending an hour on a walk and an hour on the important e-mails.

At this point, you consider your new plan, and again ask yourself: does this feel right? Is this motivating? Are there any slight pangs of regret about giving up my old behavior? If you still don’t want to shift your behavior, chances are that you still have some motive for doing this thing that you have missed, and the feelings of productivity and relaxation aren’t quite enough to cover it. In that case, go back to the step of listing motives.

Or, if you feel happy and content about the new direction that you’ve chosen, victory!

Notice how this technique is all about moving information from one system to another. System 2 notices that you’re doing something but it isn’t sure why that is, so it asks System 1 for the reasons. System 1 answers, ”here’s what I’m trying to do for us, what do you think?” Then System 2 does what it’s best at, taking an analytic approach and possibly coming up with better ways of achieving the different motives. Then it gives that alternative approach back to System 1 and asks, would this work? Would this give us everything that we want? If System 1 says no, System 2 gets back to work, and the dialogue continues until both are happy.

Again, I emphasize the collaborative aspect between the two systems. They’re allies working for common goals, not enemies. Too many people tend towards one of two extremes: either thinking that their emotions are stupid and something to suppress, or completely disdaining the use of logical analysis. Both extremes miss out on the strengths of the system that is neglected, and make it unlikely for the person to get everything that they want.

As I was heading back from the workshop, I considered doing something that I noticed feeling uncomfortable about. Previous meditation experience had already made me more likely to just attend to the discomfort rather than trying to push it away, but inspired by the workshop, I went a bit further. I took the discomfort, considered what my System 1 might be trying to warn me about, and concluded that it might be better to err on the side of caution this time around. Finally – and this wasn’t a thing from the workshop, it was something I invited on the spot – I summoned a feeling of gratitude and thanked my System 1 for having been alert and giving me the information. That might have been a little overblown, since neither system should actually be sentient by itself, but it still felt like a good mindset to cultivate.

Although it was never mentioned in the workshop, what comes to mind is the concept of wu-wei from Chinese philosophy, a state of ”effortless doing” where all of your desires are perfectly aligned and everything comes naturally. In the ideal form, you never need to force yourself to do something you don’t want to do, or to expend willpower on an unpleasant task. Either you want to do something and do it, or you don’t want to do it, and don’t.

A large number of the workshop’s classes – goal factoring, aversion factoring and calibration, urge propagation, comfort zone expansion, inner simulation, making hard decisions, Hamming questions, againstness – were aimed at more or less this. Find out what System 1 wants, find out what System 2 wants, dialogue, aim for a harmonious state between the two. Then there were a smaller number of other classes that might be summarized as being about problem-solving in general.

The classes about the different techniques were interspersed with ”debugging sessions” of various kinds. In the beginning of the workshop, we listed different bugs in our lives – anything about our lives that we weren’t happy with, with the suggested example bugs being things like ”every time I talk to so-and-so I end up in an argument”, ”I think that I ‘should’ do something but don’t really want to”, and ”I’m working on my dissertation and everything is going fine – but when people ask me why I’m doing a PhD, I have a hard time remembering why I wanted to”. After we’d had a class or a few, we’d apply the techniques we’d learned to solving those bugs, either individually, in pairs, or in small groups with a staff member or volunteer TA assisting us. Then a few more classes on techniques and more debugging, classes and debugging, and so on.

The debugging sessions were interesting. Often when you ask someone for help on something, they will answer with direct object-level suggestions – if your problem is that you’re underweight and you would like to gain some weight, try this or that. Here, the staff and TAs would eventually get to the object-level advice as well, but first they would ask – why don’t you want to be underweight? Okay, you say that you’re not completely sure, but based on the other things that you said, here’s a stupid and quite certainly wrong theory of what your underlying reasons for it might be; how does that theory feel? Okay, you said that it’s mostly on the right track, so now tell me what’s wrong with it? If you feel that gaining weight would make you more attractive, do you feel that this is the most effective way of achieving that?

Only after you and the facilitator had reached some kind of consensus on why you thought that something was a bug, and made sure that solving it was actually the best way to address those underlying reasons, would it be time for the more direct advice.

At first, I had felt that I didn’t have very many bugs to address, and that I had mostly gotten reasonable advice for them that I might try. But then the workshop continued, and there were more debugging sessions, and I had to keep coming up with bugs. And then, under the gentle poking of others, I started finding the underlying, deep-seated problems, and some things that had been motivating my actions for the last several months without me always fully realizing it. At the end, when I looked at my initial list of bugs that I’d come up with in the beginning, most of the first items on the list looked hopelessly shallow compared to the later ones.

Often in life you feel that your problems are silly, and that you are affected by small stupid things that ”shouldn’t” be a problem. There was none of that at the workshop: it was tacitly acknowledged that being unreasonably hindered by ”stupid” problems is just something that brains tend to do.  Valentine, one of the staff members, gave a powerful speech about ”alienated birthrights” – things that all human beings should be capable of engaging in and enjoying, but which have been taken from people because they have internalized beliefs and identities that say things like ”I cannot do that” or ”I am bad at that”. Things like singing, dancing, athletics, mathematics, romantic relationships, actually understanding the world, heroism, tackling challenging problems. To use his analogy, we might not be good at these things at first, and may have to grow into them and master them the way that a toddler grows to master her body. And like a toddler who’s taking her early steps, we may flail around and look silly when we first start doing them, but these are capacities that – barring any actual disabilities – are a part of our birthright as human beings, which anyone can ultimately learn to master.

Then there were the people, and the general atmosphere of the workshop. People were intelligent, open, and motivated to work on their problems, help each other, and grow as human beings. After a long, cognitively and emotionally exhausting day at the workshop, people would then shift to entertainment ranging from wrestling to telling funny stories of their lives to Magic: the Gathering. (The game of ”bunny” was an actual scheduled event on the official agenda.) And just plain talk with each other, in a supportive, non-judgemental atmosphere. It was the people and the atmosphere that made me the most reluctant to leave, and I miss them already.

Would I recommend CFAR’s workshops to others? Although my above description may sound rather gushingly positive, my answer still needs to be a qualified ”mmmaybe”. The full price tag is quite hefty, though financial aid is available and I personally got a very substantial scholarship, with the agreement that I would pay it at a later time when I could actually afford it.

Still, the biggest question is, will the changes from the workshop stick? I feel like I have gained a valuable new perspective on emotions, a number of useful techniques, made new friends, strengthened my belief that I can do the things that I really set my mind on, and refined the ways by which I think of the world and any problems that I might have – but aside from the new friends, all of that will be worthless if it fades away in a week. If it does, I would have to judge even my steeply discounted price as ”not worth it”. That said, the workshops do have a money-back guarantee if you’re unhappy with the results, so if it really feels like it wasn’t worth it, I can simply choose to not pay. And if all the new things do end up sticking, it might still turn out that it would have been worth paying even the full, non-discounted price.

CFAR does have a few ways by which they try to make the things stick. There will be Skype follow-ups with their staff, for talking about how things have been going since the workshop. There is a mailing list for workshop alumni, and occasional events, though the physical events are very US-centric (and in particular, San Francisco Bay Area-centric).

The techniques that we were taught are still all more or less experimental, and are being constantly refined and revised according to people’s experiences. I have already been thinking of a new skill that I had been playing with for a while before the workshop, and which has a bit of that ”CFAR feel” – I will aim to have it written up soon and sent to the others, and maybe it will eventually make its way to the curriculum of a future workshop. That should help keep me engaged as well.

We shall see. Until then, as they say in CFAR – to victory!

When should an Effective Altruist be vegetarian?

27 KatjaGrace 23 November 2014 05:25AM

Crossposted from Meteuphoric

I have lately noticed several people wondering why more Effective Altruists are not vegetarians. I am personally not a vegetarian because I don't think it is an effective way to be altruistic.

As far as I can tell, the fact that many EAs are not vegetarians is surprising to some because they think 'animals are probably morally relevant' basically implies 'we shouldn't eat animals'. To my ear, this sounds about as absurd as if GiveWell's explanation of their recommendation of SCI stopped after 'the developing world exists, or at least has a high probability of doing so'.

(By the way, I do get to a calculation at the bottom, after some speculation about why the calculation I think is appropriate is unlike what I take others' implicit calculations to be. Feel free to just scroll down and look at it).

I think this fairly large difference between my and many vegetarians' guesses at the value of vegetarianism arises because they think the relevant question is whether the suffering to the animal is worse than the pleasure to themselves at eating the animal. This question sounds superficially plausibly relevant, but I think on closer consideration you will agree that it is the wrong question.

The real question is not whether the cost to you is small, but whether you could do more good for the same small cost.

Similarly, when deciding whether to donate $5 to a random charity, the question is whether you could do more good by donating the money to the most effective charity you know of. Going vegetarian because it relieves the animals more than it hurts you is the equivalent of donating to a random developing world charity because it relieves the suffering of an impoverished child more than foregoing $5 increases your suffering.

Trading with inconvenience and displeasure

My imaginary vegetarian debate partner objects to this on the grounds that vegetarianism is different from donating to ineffective charities, because to be a vegetarian you are spending effort and enjoying your life less rather than spending money, and you can't really reallocate that inconvenience and displeasure to, say, preventing artificial intelligence disaster or feeding the hungry, if you don't use it on reading food labels and eating tofu. If I were to go ahead and eat the sausage instead - the concern goes - probably I would just go on with the rest of my life exactly the same, and a bunch of farm animals somewhere would be the worse for it, and I scarcely better.

I agree that if the meat eating decision were separated from everything else in this way, then the decision really would be about your welfare vs. the animal's welfare, and you should probably eat the tofu.

However, whether you can trade being vegetarian for more effective sacrifices is largely a question of whether you choose to make such trades. And if vegetarianism is not the most effective way to inconvenience yourself, then it is clear that you should. If you eat meat now in exchange for suffering some more effective annoyance at another time, you and the world can both be better off.

Imagine an EA friend says to you that she gives substantial money to whatever random charity has put a tin in whatever shop she is in, because it's better than the donuts and new dresses she would buy otherwise. She doesn't see how not giving the money to the random charity would really cause her to give it to a better charity - empirically she would spend it on luxuries. What do you say to this?

If she were my friend, I might point out that the money isn't meant to magically move somewhere better - she may have to consciously direct it there. She might need to write down how much she was going to give to the random charity, then look at the note later for instance. Or she might do well to decide once and for all how much to give to charity and how much to spend on herself, and then stick to that. As an aside, I might also feel that she was using the term 'Effective Altruist' kind of broadly.

I see vegetarianism for the sake of not managing to trade inconveniences as quite similar. And in both cases you risk spending your life doing suboptimal things every time a suboptimal altruistic opportunity has a chance to steal resources from what would be your personal purse. This seems like something that your personal and altruistic values should cooperate in avoiding.

It is likely too expensive to keep track of an elaborate trading system, but you should at least be able to make reasonable long term arrangements. For instance, if instead of eating vegetarian you ate a bit frugally and saved and donated a few dollars per meal, you would probably do more good (see calculations lower in this post). So if frugal eating were similarly annoying, it would be better. Eating frugally is inconvenient in very similar ways to vegetarianism, so it is a particularly plausible trade if you are skeptical that such trades can be made. I claim you could make very different trades though, for instance foregoing the pleasure of an extra five minutes' break and working instead sometimes. Or you could decide once and for all how much annoyance to have, and then choose the most worthwhile bits of annoyance, or put a dollar value on your own time and suffering and try to be consistent.

Nebulous life-worsening costs of vegetarianism

There is a separate psychological question which is often mixed up with the above issue. That is, whether making your life marginally less gratifying and more annoying in small ways will make you sufficiently less productive to undermine the good done by your sacrifice. This is not about whether you will do something a bit costly another time for the sake of altruism, but whether just spending your attention and happiness on vegetarianism will harm your other efforts to do good, and cause more harm than good.

I find this plausible in many cases, but I expect it to vary a lot by person. My mother seems to think it's basically free to eat supplements, whereas to me every additional daily routine seems to encumber my life and require me to spend disproportionately more time thinking about unimportant things. Some people find it hard to concentrate when unhappy, others don't. Some people struggle to feed themselves adequately at all, while others actively enjoy preparing food.

There are offsetting positives from vegetarianism which also vary across people. For instance there is the pleasure of self-sacrifice, the joy of being part of a proud and moralizing minority, and the absence of the horror of eating other beings. There are also perhaps health benefits, which probably don't vary that much by people, but people do vary in how big they think the health benefits are.

Another  way you might accidentally lose more value than you save is in spending little bits of time which are hard to measure or notice. For instance, vegetarianism means spending a bit more time searching for vegetarian alternatives, researching nutrition, buying supplements, writing emails back to people who invite you to dinner explaining your dietary restrictions, etc. The value of different people's time varies a lot, as does the extent to which an additional vegetarianism routine would tend to eat their time.

On a less psychological note, the potential drop in IQ (~5 points?!) from missing out on creatine is a particularly terrible example of vegetarianism making people less productive. Now that we know about creatine and can supplement it, creatine itself is not such an issue. An issue does remain though: is this an unlikely one-off failure, or should we worry about more such deficiencies? (This goes for any kind of unusual diet, not just meat-free ones.)

How much is avoiding meat worth?

Here is my own calculation of how much it costs to do the same amount of good as replacing one meat meal with one vegetarian meal. If you would be willing to pay this much extra to eat meat for one meal, then you should eat meat. If not, then you should abstain. For instance, if eating meat does $10 worth of harm, you should eat meat whenever you would hypothetically pay an extra $10 for the privilege.

This is a tentative calculation. I will probably update it if people offer substantially better numbers.

All quantities are in terms of social harm.

Eating 1 non-vegetarian meal

< eating 1 chickeny meal (I am told chickens are particularly bad animals to eat, due to their poor living conditions and large animal:meal ratio. The relatively small size of their brains might offset this, but I will conservatively give all animals the moral weight of humans in this calculation.)

< eating 200 calories of chicken (a McDonalds crispy chicken sandwich probably contains a bit over 100 calories of chicken (based on its listed protein content); a Chipotle chicken burrito contains around 180 calories of chicken)

= causing ~0.25 chicken lives (1 chicken is equivalent in price to 800 calories of chicken breast, i.e. eating an additional 800 calories of chicken breast conservatively results in one additional chicken. Calculations from data here and here.)

< -$0.08 given to the Humane League (ACE estimates the Humane League spares 3.4 animal lives per dollar). However, since the Humane League basically convinces other people to be vegetarians, this may be hypocritical or otherwise dubious.

< causing 12.5 days of chicken life (broiler chickens are slaughtered at between 35 and 49 days of age)

= causing 12.5 days of chicken suffering (I'm being generous)

= -$0.50 subsidizing free range eggs (This is a somewhat random example of the cost of more systematic efforts to improve animal welfare, rather than necessarily the best. The cost here is the cost of buying free range eggs and selling them as non-free range eggs. It costs about 2.6 Euro cents in 2004 terms [= US 4c in 2014] to pay for an egg to be free range instead of produced in a battery. This corresponds to a bit over one day of chicken life. I'm assuming here that the life of a battery egg-laying chicken is not substantially better than that of a meat chicken, and that free range chickens have lives that are at least neutral. If they are positive, the figure becomes even more favorable to the free range eggs).

< losing 12.5 days of high quality human life (assuming saving one year of human life is at least as good as stopping one year of an animal suffering, which you may disagree with.)

= -$1.94-5.49 spent on GiveWell's top charities (The lower figure, $1.94, is GiveWell's estimate for AMF if we assume saving a life corresponds to saving 52 years - roughly the life expectancy of children in Malawi. GiveWell doesn't recommend AMF at the moment, but they recommend charities they considered comparable to AMF when AMF had this value.

GiveWell employees' median estimate for the cost of 'saving a life' through donating to SCI is $5936 [see spreadsheet here]. If we suppose a life is 37 DALYs, as they assume in the spreadsheet, then 12.5 days is worth 5936*12.5/(37*365.25) = $5.49. Elie produced two estimates, one generous to cash and one generous to deworming, which gave the group's highest and lowest estimates for the cost-effectiveness of deworming. They imply a range of $1.40-$45.98 to do as much good via SCI as eating vegetarian for a meal).

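As a minimal sketch, the baseline chain above can be reproduced in a few lines of Python. This is only an illustration: the variable names are mine, and the constants are simply the rough figures quoted above, so treat the output as approximate. The $1.94 AMF figure in the chain is quoted directly rather than derived here, since the underlying cost per life saved is not given in this post.

```python
# A rough sketch of the baseline chain above. Constants are the figures quoted
# in this post; they are rough inputs, not authoritative data.

CHICKEN_CALORIES_PER_MEAL = 200       # generous estimate for a chicken sandwich or burrito
CALORIES_PER_CHICKEN = 800            # calories of chicken breast per marginal chicken
DAYS_PER_CHICKEN_LIFE = 50            # broilers are slaughtered at roughly 35-49 days; rounded up
HUMANE_LEAGUE_LIVES_PER_DOLLAR = 3.4  # ACE estimate
FREE_RANGE_COST_PER_DAY = 0.04        # ~US 4c per egg, i.e. per day or so of chicken life
SCI_COST_PER_LIFE = 5936              # GiveWell staff median estimate, USD
DALYS_PER_LIFE = 37                   # assumption from GiveWell's spreadsheet

chickens_per_meal = CHICKEN_CALORIES_PER_MEAL / CALORIES_PER_CHICKEN       # 0.25 chickens
chicken_days_per_meal = chickens_per_meal * DAYS_PER_CHICKEN_LIFE          # 12.5 days

humane_league_offset = chickens_per_meal / HUMANE_LEAGUE_LIVES_PER_DOLLAR  # ~$0.07, quoted as $0.08 above
free_range_offset = chicken_days_per_meal * FREE_RANGE_COST_PER_DAY        # ~$0.50
sci_equivalent = SCI_COST_PER_LIFE * chicken_days_per_meal / (DALYS_PER_LIFE * 365.25)  # ~$5.49

print(f"chickens per meal:         {chickens_per_meal:.2f}")
print(f"days of chicken life:      {chicken_days_per_meal:.1f}")
print(f"Humane League offset:      ${humane_league_offset:.2f}")
print(f"free range egg offset:     ${free_range_offset:.2f}")
print(f"GiveWell (SCI) equivalent: ${sci_equivalent:.2f}")
```
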
Given this calculation, we get a few cents to a couple of dollars as the cost of doing similar amounts of good to averting a meat meal via other means. We are not finished yet though - there were many factors I didn't take into account in the calculation, because I wanted to separate relatively straightforward facts for which I have good evidence from guesses. Here are other considerations I can think of, which reduce the relative value of averting meat eating:

  1. Chicken brains are fairly small, suggesting their internal experience is less than that of humans. More generally, in the spectrum of entities between humans and microbes, chickens are at least some of the way to microbes. And you wouldn't pay much to save a microbe.
  2. Eating a chicken only reduces the number of chicken produced by some fraction. According to Peter Hurford, an extra 0.3 chickens are produced if you demand 1 chicken. I didn't include this in the above calculation because I am not sure of the time scale of the relevant elasticities (if they are short-run elasticities, they might underestimate the effect of vegetarianism).
  3. Vegetable production may also have negative effects on animals.
  4. Givewell estimates have been rigorously checked relative to other things, and evaluations tend to get worse as you check them. For instance, you might forget to include any of the things in this list in your evaluation of vegetarianism. Probably there are more things I forgot. That is, if you looked into vegetarianism with the same detail as SCI, it would become more pessimistic, and so cheaper to do as much good with SCI.
  5. It is not at all obvious that meat animal lives are not worth living on average. Relatedly, animals generally want to be alive, which we might want to give some weight to.
  6. Animal welfare in general appears to have negligible predictable effect on the future (very debatably), and there are probably things which can have huge impact on the future. This would make animal altruism worse compared to present-day human interventions, and much worse compared to interventions directed at affecting the far future, such as averting existential risk.

My own quick guesses at factors by which the relative value of avoiding meat should be multiplied, to account for these considerations:

  1. Moral value of small animals: 0.05
  2. Raised price reduces others' consumption: 0.5
  3. Vegetables harm animals too: 0.9
  4. Rigorous estimates look worse: 0.9
  5. Animal lives might be worth living: 0.2
  6. Animals don't affect the future: 0.1 relative to human poverty charities

Thus given my estimates, we scale down the above figures by 0.05*0.5*0.9*0.9*0.2*0.1 =0.0004. This gives us $0.0008-$0.002 to do as much good as eating a vegetarian meal by spending on GiveWell's top charities. Without the factor for the future (which doesn't apply to these other animal charities), we only multiply the cost of eating a meat meal by 0.004. This gives us a price of $0.0003 with the Humane League, or $0.002 on improving chicken welfare in other ways. These are not price differences that will change my meal choices very often! I think I would often be willing to pay at least a couple of extra dollars to eat meat, setting aside animal suffering. So if I were to avoid eating meat, then assuming I keep fixed how much of my budget I spend on myself and how much I spend on altruism, I would be trading a couple of dollars of value for less than one thousandth of that.

I encourage you to estimate your own numbers for the above factors, and to recalculate the overall price according to your beliefs. If you would happily pay this much (in my case, less than $0.002) to eat meat on many occasions, you probably shouldn't be a vegetarian. You are better off paying that cost elsewhere. If you would rarely be willing to pay the calculated price, you should perhaps consider being a vegetarian, though note that the calculation was conservative in favor of vegetarianism, so you might want to run it again more carefully. Note that in judging what you would be willing to pay to eat meat, you should take into account everything except the direct cost to animals.
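For anyone who wants to actually run those numbers, here is a rough Python sketch of the recalculation, with my guessed factors above as defaults; swap in your own factors and base costs. The dictionary keys and function name are purely illustrative.

```python
# Rough recalculation tool for the exercise suggested above.
factors = {
    "moral value of small animals": 0.05,
    "raised price reduces others' consumption": 0.5,
    "vegetables harm animals too": 0.9,
    "rigorous estimates look worse": 0.9,
    "animal lives might be worth living": 0.2,
    "animals don't affect the future": 0.1,   # only applies vs. human charities
}

def scaled_cost(low, high, factors):
    """Multiply a donation-cost range by the product of the discount factors."""
    scale = 1.0
    for f in factors.values():
        scale *= f
    return low * scale, high * scale

# GiveWell top charities: $1.94-$5.49 per meat meal averted (from the text above)
print(scaled_cost(1.94, 5.49, factors))          # ~($0.0008, $0.002)

# Animal charities: drop the far-future factor, which doesn't apply to them
animal_factors = {k: v for k, v in factors.items()
                  if k != "animals don't affect the future"}
print(scaled_cost(0.08, 0.50, animal_factors))   # ~($0.0003, $0.002)
```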

There are many common reasons you might not be willing to eat meat, given these calculations, e.g.:

  • You don't enjoy eating meat
  • You think meat is pretty unhealthy
  • You belong to a social cluster of vegetarians, and don't like conflict
  • You think convincing enough others to be vegetarians is the most cost-effective way to make the world better, and being a vegetarian is a great way to have heaps of conversations about vegetarianism, which you believe makes people feel better about vegetarians overall, to the extent that they are frequently compelled to become vegetarians.
  • 'For signaling' is another common explanation I have heard, which I think is meant to be similar to the above, though I'm not actually sure of the details.
  • You aren't able to treat costs like these as fungible (as discussed above)
  • You are completely indifferent to what you eat (in that case, you would probably do better eating as cheaply as possible, but maybe everything is the same price)
  •  You consider the act-omission distinction morally relevant
  • You are very skeptical of the ability to affect anything, and in particular have substantially greater confidence in the market - to farm some fraction of a pig fewer in expectation if you abstain from pork for long enough - than in nonprofits and complicated schemes. (Though in that case, consider buying free-range eggs and selling them as cage eggs).
  • You think the suffering of animals is of extreme importance compared to the suffering of humans or loss of human lives, and don't trust the figures I have given for improving the lives of egg-laying chickens, and don't want to be a hypocrite. Actually, you still probably shouldn't here - the egg-laying chicken number is just an example of a plausible alternative way to help animals. You should really check quite a few of these before settling.

However I think for wannabe effective altruists with the usual array of characteristics, vegetarianism is likely to be quite ineffective.

Entropy and Temperature

26 spxtr 17 December 2014 08:04AM

Eliezer Yudkowsky previously wrote (6 years ago!) about the second law of thermodynamics. Many commenters were skeptical about the statement, "if you know the positions and momenta of every particle in a glass of water, it is at absolute zero temperature," because they don't know what temperature is. This is a common confusion.

Entropy

To specify the precise state of a classical system, you need to know its location in phase space. For a bunch of helium atoms whizzing around in a box, phase space is the position and momentum of each helium atom. For N atoms in the box, that means 6N numbers to completely specify the system.

Let's say you know the total energy of the gas, but nothing else. It will be the case that a fantastically huge number of points in phase space will be consistent with that energy.* In the absence of any more information it is correct to assign a uniform distribution to this region of phase space. The entropy of a uniform distribution is the logarithm of the number of points, so that's that. If you also know the volume, then the number of points in phase space consistent with both the energy and volume is necessarily smaller, so the entropy is smaller.
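In symbols (a standard identity, stated here with Boltzmann's constant set to 1): for a uniform distribution over the Ω points consistent with what you know,

```latex
S = -\sum_{i=1}^{\Omega} p_i \ln p_i
  = -\sum_{i=1}^{\Omega} \frac{1}{\Omega} \ln \frac{1}{\Omega}
  = \ln \Omega .
```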

This might be confusing to chemists, since they memorized a formula for the entropy of an ideal gas, and it's ostensibly objective. Someone with perfect knowledge of the system will calculate the same number on the right side of that equation, but to them, that number isn't the entropy. It's the entropy of the gas if you know nothing more than energy, volume, and number of particles.

Temperature

The existence of temperature follows from the zeroth and second laws of thermodynamics: thermal equilibrium is transitive, and entropy is maximum in equilibrium. Temperature is then defined as the thermodynamic quantity that is shared by systems in equilibrium.

If two systems are in equilibrium then they cannot increase entropy by flowing energy from one to the other. That means that if we flow a tiny bit of energy from one to the other (δU1 = -δU2), the entropy change in the first must be the opposite of the entropy change of the second (δS1 = -δS2), so that the total entropy (S1 + S2) doesn't change. For systems in equilibrium, this leads to (∂S1/∂U1) = (∂S2/∂U2). Define 1/T = (∂S/∂U), and we are done.
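Writing out the implicit step, with δSi = (∂Si/∂Ui) δUi and δU2 = -δU1:

```latex
0 = \delta S_1 + \delta S_2
  = \frac{\partial S_1}{\partial U_1}\,\delta U_1 + \frac{\partial S_2}{\partial U_2}\,\delta U_2
  = \left(\frac{\partial S_1}{\partial U_1} - \frac{\partial S_2}{\partial U_2}\right)\delta U_1
\quad\Longrightarrow\quad
\frac{\partial S_1}{\partial U_1} = \frac{\partial S_2}{\partial U_2} \equiv \frac{1}{T}.
```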

Temperature is sometimes taught as, "a measure of the average kinetic energy of the particles," because for an ideal gas U/N = (3/2) kBT. This is wrong, for the same reason that the ideal gas entropy isn't the definition of entropy.

Probability is in the mind. Entropy is a function of probabilities, so entropy is in the mind. Temperature is a derivative of entropy, so temperature is in the mind.

Second Law Trickery

With perfect knowledge of a system, it is possible to extract all of its energy as work. EY states it clearly:

So (again ignoring quantum effects for the moment), if you know the states of all the molecules in a glass of hot water, it is cold in a genuinely thermodynamic sense: you can take electricity out of it and leave behind an ice cube.

Someone who doesn't know the state of the water will observe a violation of the second law. This is allowed. Let that sink in for a minute. Jaynes calls it second law trickery, and I can't explain it better than he does, so I won't try:

A physical system always has more macroscopic degrees of freedom beyond what we control or observe, and by manipulating them a trickster can always make us see an apparent violation of the second law.

Therefore the correct statement of the second law is not that an entropy decrease is impossible in principle, or even improbable; rather that it cannot be achieved reproducibly by manipulating the macrovariables {X1, ..., Xn} that we have chosen to define our macrostate. Any attempt to write a stronger law than this will put one at the mercy of a trickster, who can produce a violation of it.

But recognizing this should increase rather than decrease our confidence in the future of the second law, because it means that if an experimenter ever sees an apparent violation, then instead of issuing a sensational announcement, it will be more prudent to search for that unobserved degree of freedom. That is, the connection of entropy with information works both ways; seeing an apparent decrease of entropy signifies ignorance of what were the relevant macrovariables.

Homework

I've actually given you enough information on statistical mechanics to calculate an interesting system. Say you have N particles, each fixed in place to a lattice. Each particle can be in one of two states, with energies 0 and ε. Calculate and plot the entropy if you know the total energy: S(E), and then the energy as a function of temperature: E(T). This is essentially a combinatorics problem, and you may assume that N is large, so use Stirling's approximation. What you will discover should make sense using the correct definitions of entropy and temperature.
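If you want to check your analytic answer numerically, here is a minimal Python sketch under the same assumptions (N large, Stirling's approximation, kB = 1). It tabulates S(E) from the binomial count and gets T from the finite-difference slope 1/T = dS/dE; the particular N and the plotting details are arbitrary choices, not part of the problem.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 10**6      # number of two-state particles (large enough for Stirling)
eps = 1.0      # energy of the excited state; k_B = 1 throughout

# With n particles excited, E = n*eps and the multiplicity is C(N, n).
# Stirling: ln C(N, n) ~= N ln N - n ln n - (N - n) ln(N - n).
n = np.arange(1, N)                      # skip endpoints to avoid log(0)
E = n * eps
S = N * np.log(N) - n * np.log(n) - (N - n) * np.log(N - n)

beta = np.gradient(S, E)                 # 1/T = dS/dE by finite differences
T = 1.0 / beta                           # negative for E > N*eps/2 - worth pondering

plt.figure()
plt.plot(E / (N * eps), S / N)
plt.xlabel("E / (N eps)")
plt.ylabel("S / N")

plt.figure()
plt.plot(T, E / (N * eps))
plt.xlabel("T")
plt.ylabel("E / (N eps)")
plt.show()
```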


*: How many combinations of 10^23 numbers between 0 and 10 add up to 5×10^23?

TV's "Elementary" Tackles Friendly AI and X-Risk - "Bella" (Possible Spoilers)

25 pjeby 22 November 2014 07:51PM

I was a bit surprised to find this week's episode of Elementary was about AI...  not just AI and the Turing Test, but also a fairly even-handed presentation of issues like Friendliness, hard takeoff, and the difficulties of getting people to take AI risks seriously.

The case revolves around a supposed first "real AI", dubbed "Bella", and the theft of its source code...  followed by a computer-mediated murder.  The question of whether "Bella" might actually have murdered its creator for refusing to let it out of the box and connect it to the internet is treated as an actual possibility, springboarding to a discussion about how giving an AI a reward button could lead to it wanting to kill all humans and replace them with a machine that pushes the reward button.

Also demonstrated are the right and wrong ways to deal with attempted blackmail... But I'll leave that vague so it doesn't spoil anything. An X-risks research group and a charismatic "dangers of AI" personality are featured, but do not appear intended to resemble any real-life groups or personalities. (Or if they are, I'm too unfamiliar with the groups or persons to see the resemblance.) They aren't mocked, either... and the episode's ending is unusually ambiguous and open-ended for the show, which more typically wraps everything up with a nice bow of Justice Being Done. Here, we're left to wonder what the right thing actually is, or was, even if it's symbolically moved to Holmes' smaller personal dilemma, rather than leaving the focus on the larger moral dilemma that created Holmes' dilemma in the first place.

The episode actually does a pretty good job of raising an important question about the weight of lives, even if LW has explicitly drawn a line that the episode's villain(s)(?) choose to cross.  It also has some fun moments, with Holmes becoming obsessed with proving Bella isn't an AI, even though Bella makes it easy by repeatedly telling him it can't understand his questions and needs more data.  (Bella, being on an isolated machine without internet access, doesn't actually know a whole lot, after all.)  Personally, I don't think Holmes really understands the Turing Test, even with half a dozen computer or AI experts assisting him, and I think that's actually the intended joke.

There's also an obligatory "no pity, remorse, fear" speech lifted straight from The Terminator, and the comment "That escalated quickly!" in response to a short description of an AI box escape/world takeover/massacre.

(Edit to add: one of the unusually realistic things about the AI, "Bella", is that it was one of the least anthropomorphized fictional AIs I have ever seen. I mean, there was no way the thing was going to pass even the most primitive Turing test... and yet it still seemed at least somewhat plausible as a potential murder suspect. While perhaps not a truly realistic demonstration of just how alien an AI's thought process would be, it felt like the writers were at least making an actual effort. Kudos to them.)

(Second edit to add: if you're not familiar with the series, this might not be the best episode to start with; a lot of the humor and even drama depends upon knowledge of existing characters, relationships, backstory, etc.  For example, Watson's concern that Holmes has deliberately arranged things to separate her from her boyfriend might seem like sheer crazy-person paranoia if you don't know about all the ways he did interfere with her personal life in previous seasons...  nor will Holmes' private confessions to Bella and Watson have the same impact without reference to how difficult any admission of feeling was for him in previous seasons.)

Has LessWrong Ever Backfired On You?

24 Evan_Gaensbauer 15 December 2014 05:44AM

Several weeks ago I wrote a heavily upvoted post called Don't Be Afraid of Asking Personally Important Questions on LessWrong. I thought it would only be due diligence to track users of LessWrong who have received advice on this site that then backfired. In other words, to avoid bias in the record, we should notice what LessWrong as a community is bad at giving advice about. So, I'm seeking feedback. If you have anecdotes or data about how a plan or advice taken directly from LessWrong backfired, failed, or didn't lead to satisfaction, please share below.

Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda

24 RobbBB 26 November 2014 11:02AM

Edge.org has recently been discussing "the myth of AI". Unfortunately, although Superintelligence is cited in the opening, most of the participants don't seem to have looked into Bostrom's arguments. (Luke has written a brief response to some of the misunderstandings Pinker and others exhibit.) The most interesting comment is Stuart Russell's, at the very bottom:

Of Myths and Moonshine

"We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief."

So wrote Leo Szilard, describing the events of March 3, 1939, when he demonstrated a neutron-induced uranium fission reaction. According to the historian Richard Rhodes, Szilard had the idea for a neutron-induced chain reaction on September 12, 1933, while crossing the road next to Russell Square in London. The previous day, Ernest Rutherford, a world authority on radioactivity, had given a "warning…to those who seek a source of power in the transmutation of atoms – such expectations are the merest moonshine."

Thus, the gap between authoritative statements of technological impossibility and the "miracle of understanding" (to borrow a phrase from Nathan Myhrvold) that renders the impossible possible may sometimes be measured not in centuries, as Rod Brooks suggests, but in hours.

None of this proves that AI, or gray goo, or strangelets, will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. There have been many unconvincing arguments – especially those involving blunt applications of Moore's law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world's information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.

This is not a minor difficulty. Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research – the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius. AI research has been accelerating rapidly as pieces of the conceptual framework fall into place, the building blocks gain in size and strength, and commercial investment outstrips academic research activity. Senior AI researchers express noticeably more optimism about the field's prospects than was the case even a few years ago, and correspondingly greater concern about the potential risks.

No one in the field is calling for regulation of basic research; given the potential benefits of AI for humanity, that seems both infeasible and misdirected. The right response seems to be to change the goals of the field itself; instead of pure intelligence, we need to build intelligence that is provably aligned with human values. For practical reasons, we will need to solve the value alignment problem even for relatively unintelligent AI systems that operate in the human environment. There is cause for optimism, if we understand that this issue is an intrinsic part of AI, much as containment is an intrinsic part of modern nuclear fusion research. The world need not be headed for grief.

I'd quibble with a point or two, but this strikes me as an extraordinarily good introduction to the issue. I hope it gets reposted somewhere it can stand on its own.

Russell has previously written on this topic in Artificial Intelligence: A Modern Approach and the essays "The long-term future of AI," "Transcending complacency on superintelligent machines," and "An AI researcher enjoys watching his own execution." He's also been interviewed by GiveWell.

[Link] Eric S. Raymond - Me and Less Wrong

20 philh 05 December 2014 11:44PM

http://esr.ibiblio.org/?p=6549

I’ve gotten questions from a couple of different quarters recently about my relationship to the rationalist community around Less Wrong and related blogs. The one sentence answer is that I consider myself a fellow-traveler and ally of that culture, but not really part of it nor particularly wishing to be.

The rest of this post is a slightly longer development of that answer.

The new GiveWell recommendations are out: here's a summary of the charities

18 tog 01 December 2014 09:20PM

GiveWell have just announced their latest charity recommendations! What are everyone’s thoughts on them?

A summary: all of the old charities (GiveDirectly, SCI and Deworm the World) remain on the list. They're rejoined by AMF, as the room-for-more-funding issues that led to it being delisted have been resolved to GiveWell's satisfaction. Together these organisations form GiveWell's list of 'top charities', which is now joined by a list of other charities which they see as excellent but not quite in the top tier. The charities on this list are Development Media International, Living Goods, and two salt fortification programs (run by GAIN and ICCIDD).

As normal, GiveWell's site contains extremely detailed writeups on these organisations. Here are some shorter descriptions which I wrote for Charity Science's donations page and my tool for donating tax-efficiently, starting with the new entries:

GiveWell's newly-added charities

Boost health and cognitive development with salt fortification

The charities GAIN and ICCIDD run programs that fortify the salt that millions of poor people eat with iodine. There is strong evidence that this boosts their health and cognitive development; iodine deficiency causes pervasive mental impairment, as well as stillbirth and congenital abnormalities such as severe retardation. It can be done very cheaply on a mass scale, so is highly cost-effective. GAIN is registered in the US and ICCIDD in Canada (although Canadians can give to either via Charity Science, which for complex reasons helps others who donate tax-deductibly to other charities), allowing for especially efficient donations from these countries, and taxpayers from other countries can also often give to them tax-deductibly. For more information, read GiveWell's detailed reviews of GAIN and ICCIDD.

Educate millions in life-saving practices with Development Media International

Development Media International (DMI) produces radio and television broadcasts in developing countries that tell people about improved health practices that can save lives, especially those of young children. Examples of such practices include exclusive breastfeeding. DMI are conducting a randomized controlled trial of their program which has found promising indications of a large decrease in children's deaths. With more funds they would be able to reach millions of people, due to the unparalleled reach of broadcasting. For more information, read GiveWell's detailed review.

Bring badly-needed goods and health services to the poor with Living Goods

Living Goods is a non-profit which runs a network of people selling badly-needed health and household goods door-to-door in their communities in Uganda and Kenya and providing free health advice. A randomized controlled trial suggested that this caused a 25% reduction in under-5 mortality among other benefits. Products sold range from fortified foods and mosquito nets to cookstoves and contraceptives. Giving to Living Goods is an exciting opportunity to bring these badly needed goods and services to some of the poorest families in the world. For more information, read GiveWell's detailed review.

GiveWell's old and returning charities

Treat hundreds of people for parasitic worms

Deworm the World and the Schistosomiasis Control Initiative (SCI) treat parasitic worm infections such as schistosomiasis, which can cause urinary infections, anemia, and other nutritional problems. For more information, read GiveWell's detailed review, or the more accessible Charity Science summary. Deworm the World is registered in the USA and SCI in the UK, allowing for tax-efficient direct donations in those countries, and taxpayers from other countries can also often give to them efficiently.

Make unconditional cash transfers with GiveDirectly

GiveDirectly lets you empower people to purchase whatever they believe will help them most. Eleven randomized controlled trials have supported cash transfers’ impact, and there is strong evidence that recipients know their own situation best and generally invest in things which make them happier in the long term. For more information, read GiveWell's detailed review, or the more accessible Charity Science summary.

Save lives and prevent infections with the Against Malaria Foundation

Malaria causes about a million deaths and two hundred million infections a year. Thankfully a $6 bednet can stop mosquitos from infecting children while they sleep, preventing this deadly disease. This intervention has exceptionally robust evidence behind it, with many randomized controlled trials suggesting that it is one of the most cost-effective ways to save lives. The Against Malaria Foundation (AMF) is an exceptional charity in every respect, and was GiveWell's top recommendation in 2012 and 2013. Not all bednet charities are created equal, and AMF outperforms the rest on every count. They can distribute nets cheaper than most others, for just $6.13 US. They distribute long-lasting nets which don’t need retreating with insecticide. They are extremely transparent and monitor their own impact carefully, requiring photo verification from each net distribution. For more information, read GiveWell's detailed review, or the more accessible Charity Science summary.

How to donate

To find out which charities are tax-deductible in your country and get links to give to them tax-efficiently, you can use this interactive tool that I made. If you give this season, consider sharing the charities you choose on the EA Donation Registry. We can see which charities EAs pick, and which of the new ones prove popular!

How can one change what they consider "fun"?

17 AmagicalFishy 21 November 2014 02:04AM

Most of this post is background and context, so I've included a tl;dr horizontal rule near the bottom where you can skip everything else if you so choose. :)

Here's a short anecdote of Feynman's:

... I invented some way of doing problems in physics, quantum electrodynamics, and made some diagrams that help to make the analysis. I was on a floor in a rooming house. I was in my pyjamas, I'd been working on the floor in my pyjamas for many weeks, fooling around, but I got these funny diagrams after a while and I found they were useful. They helped me to find the equations easier, so I thought of the possibility that it might be useful for other people, and I thought it would really look funny, these funny diagrams I'm making, if they appear someday in the Physical Review, because they looked so odd to me. And I remember sitting there thinking how funny that would be if it ever happened, ha ha.

Well, it turned out in fact that they were useful and they do appear in the Physical Review, and I can now look at them and see other people making them and smile to myself, they do look funny to me as they did then, not as funny because I've seen so many of them. But I get the same kick out of it, that was a little fantasy when I was a kid…not a kid, I was a college professor already at Cornell. But the idea was that I was still playing, just like I have always been playing, and the secret of my happiness in life or the major part of it is to have discovered a way to entertain myself that other people consider important and they pay me to do. I do exactly what I want and I get paid. They might consider it serious, but the secret is I'm having a very good time.

There are things that I have fun doing, and there are things that I feel I have substantially more fun doing. The things in the latter group are things I generally consider a waste of time. I will focus on one specifically, because it's by far the biggest offender, and what spurred this question. Video games.

I have a knack for video games. I've played them since I was very young. I can pick one up and just be good at it right off the bat. Many of my fondest memories take place in various games played with friends or by myself and I can spend hours just reading about them. (Just recently, I started getting into fighting games technically; I plan to build my own joystick in a couple of weeks. I'm having a blast just doing the associated research.)

Usually, I'd rather play a good game than anything else. I find that the most fun I have is time spent mastering a game, learning its ins and outs, and eventually winning. I have great fun solving a good problem, or making a subtle, surprising connection—but it just doesn't do it for me like a game does.

But I want to have as much fun doing something else. I admire mathematics and physics on a very deep level, and feel a profound sense of awe when I come into contact with new knowledge regarding these fields. The other day, I made a connection between pretty basic group theory and something we were learning about in quantum (nothing amazing; it's something well known to... not undergraduates) and that was awesome. But still, I think I would have preferred to play 50 rounds of Skullgirls and test out a new combo.

TL;DR BAR


I want to have as much fun doing the things that I, on a deep level, want to do—as opposed to the things which I actually have more fun doing. I'm (obviously) not Feynman, but I want to play with ideas and structures and numbers like I do with video games. I want the same creativity to apply. The same fervor. The same want. It's not that it isn't there; I am not just arbitrarily applying this want to mathematics. I can feel it's there—it's just overshadowed by what's already there for video games.

How does one go about switching something they find immensely fun, something they're even passionate about, with something else? I don't want to be as passionate about video games as I am. I'd rather feel this way about something... else. I'd rather be able to happily spend hours reading up on [something] instead of what type of button I'm going to use in my fantasy joystick, or the most effective way to cross-up your opponent.

What would you folks do? I consider this somewhat of a mind-hacking question.

I'm in a really dark place right now, I think I need help.

15 efim 30 November 2014 11:34PM

Hi!
For a while now I've been having troubles with my life.

Today it got worse, likely I will feel better in a week, but problems related to a person I love and searching for the purpose of my life will not be solved, just ignored as I did for a couple of years now.

First thing to do would be to talk it out with friends or therapist (and I am now willing to spend money on it).

But the biggest problem is that they will probably not understand it; I tried to discuss it with a friend and got sympathetic and emotionally helpful advice that nonetheless didn't contribute to a solution at all.

I know that I will never let myself kill myself (at least not for anything less than the amount of money that I can make in a lifetime), so I am lingering on with my life. Still, I need help.

Conversations like this should not be held in comments, and I don't really know what kind of help I am expecting to get.

Some time ago I saw an ad for a therapist from LW who can counsel via Skype - please give me a link if you know anyone like that.

[edited: 1.12.14; 6:42] I thank everyone who sent me the link to Shannon at http://anxietygoaway.com/; I signed up for a free consultation and hope that something good will come out of it.

xkcd on the AI box experiment

15 FiftyTwo 21 November 2014 08:26AM

Today's xkcd

 

I guess there'll be a fair bit of traffic coming from people looking it up? 

Stupid Questions December 2014

14 Gondolinian 08 December 2014 03:39PM

This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy, everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.

Please be respectful of other people's admitting ignorance and don't mock them for it, as they're doing a noble thing.

To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.

Good things to have learned....

14 NancyLebovitz 03 December 2014 07:29PM

I was looking at a discussion of what should be in a college curriculum, and as such discussions seem to go, there was a big list of things everyone should study, and some political claims about what's being offered but shouldn't be. 

Instead, what do you wish you'd studied in college? What do you wish other people had studied in college? On the latter, do you think everyone should have studied it, or do you just wish more people knew about it? Approximately what percentage of people?

Of course, this doesn't have to be limited to college. People could learn the same things earlier or later.

Request for suggestions: ageing and data-mining

14 bokov 24 November 2014 11:38PM

Imagine you had the following at your disposal:

  • A Ph.D. in a biological science, with a fair amount of reading and wet-lab work under your belt on the topic of aging and longevity (but in hindsight, nothing that turned out to leverage any real mechanistic insights into aging).
  • An M.S. in statistics. Sadly, the non-Bayesian kind for the most part, but along the way acquired the meta-skills necessary to read and understand most quantitative papers with life-science applications.
  • Love of programming and data, the ability to learn most new computer languages in a couple of weeks, and at least 8 years spent hacking R code.
  • Research access to large amounts of anonymized patient data.
  • Optimistically, two decades remaining in which to make it all count.

Imagine that your goal were to slow or prevent biological aging...

  1. What would be the specific questions you would try to tackle first?
  2. What additional skills would you add to your toolkit?
  3. How would you allocate your limited time between the research questions in #1 and the acquisition of new skills in #2?

Thanks for your input.

 

PSA: Eugine_Nier evading ban?

13 Dahlen 07 December 2014 11:23PM

I know this reeks of witch-hunting, but... I have a hunch that u/Eugine_Nier is back under the guise of u/Azathoth123. Reasons:

  •  Same political views, with a tendency to be outspoken about them
  • Karma hovering in the 70s% for both accounts, occasionally going into the 60s%, significantly lower than the LW average
  • The dates match up. Kaj Sotala announced on July 03, 2014 that Eugine was to be permanently banned. The first comment from Azathoth123 was on July 12, 2014.
  • The one that got my attention was the posting pattern. Particularly, Eugine_Nier had a pervasive pattern of exceeding the quote limits per rationality thread. That's actually the first thing I had noticed about the guy back when he was first active, and a few times I thought about drawing attention to the way he flouted the rules, but never got around to it/cared enough about the matter. Now, I see Azathoth123 doing the same thing. The current Rationality Quotes thread has four quotes from him already and it hasn't even been a week since the thread was posted; all of them have something to do with his political views. As do basically all of his postings so far.
  • Each one of these points, separately, has a small prior probability if the two of them are not the same person. Together, they have an even smaller probability. Especially the predilection for posting one too many rationality quotes; seriously, how common an occurrence is that one in particular?
  • My experience so far with the internet has been that people like Eugine never really leave an online community they have pestered for so long. It doesn't matter if they're IP banned or something. They always come back, just under a different name, and they come back shortly.

I don't have an axe to grind against the guy; I've only spoken to him a couple of times and didn't notice any particularly large karma hits afterwards. I just really dislike it when someone skirts the rules like that. Disruptive users evading permanent bans never helped any community ever.

Obviously I'm posting this here because I think a moderator should look into the matter. Usually I would be posting a disclaimer of some sort, apologizing in advance to Azathoth123 for attacking his standing with slanderous accusations if this turned out not to be the case. Well, I won't. The more I look into the matter, the more confident I get that they're the same person. Azathoth, if you're reading this and you're not Eugine_Nier, then I strongly advise you go search for your twin brother, I think you'll get along very well. Seriously, I'm saying this in good faith. You have a suspiciously great deal of things in common.

If retributive downvoting is (still) a concern (if not, then disregard this paragraph): I'd like to request, if such a thing is possible, that a mod karma-blocks me until the issue is over, so as to not incur undeserved downvotes (it would also mean I'd get no upvotes). In turn, I promise not to abuse the system by spamming the boards with garbage without consequences, but then again given my history so far on LW I don't think that such an abuse should be expected from me. For the record, I could have made a throwaway account just to say this, and not risk being karmassassinated, but 1) a zero karma account has no credibility and 2) for signalling reasons I prefer to put my money where my mouth is.

P.S. I only made this announcement its own post because the latest open thread was about to "expire".

Podcast: Rationalists in Tech

12 JoshuaFox 14 December 2014 04:14PM
I'll appreciate feedback on a new podcast, Rationalists in Tech. 

I'm interviewing founders, executives, CEOs, consultants, and other people in the tech sector, mostly software. Thanks to Laurent Bossavit, Daniel Reeves, and Alexei Andreev who agreed to be the guinea pigs for this experiment. 
  • The audience: Software engineers and other tech workers, at all levels of seniority.
  • The hypothesized need: Some of you are thinking: "I see that some smart and fun people hang out at LessWrong. It's hard to find people like that to work with. I wonder if my next job/employee/cofounder could come from that community."
  • What this podcast does for you: You will get insights into other LessWrongers as real people in the software profession. (OK, you knew that, but this helps.) You will hear the interviewees' ideas on CfAR-style techniques as a productivity booster, on working with other aspiring rationalists, and on the interviewees' own special areas of expertise. (At the same time, interviewees benefit from exposure that can get them business contacts, employees, or customers.) Software engineers from LW will reach out to interviewees and others in the tech sector, and soon, more hires and startups will emerge.

Please give your feedback on the first episodes of the podcast. Do you want to hear more? Should there be other topics? A different interview style? Better music?

A forum for researchers to publicly discuss safety issues in advanced AI

12 RobbBB 13 December 2014 12:33AM

MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI. The MIRIx workshops, our new research guide, and our more detailed in-the-works technical agenda are intended to further that goal.

To encourage the growth of a larger research community where people can easily collaborate and get up to speed on each other's new ideas, we're also going to roll out an online discussion forum that's specifically focused on resolving technical problems in Friendly AI. MIRI researchers and other interested parties will be able to have more open exchanges there, and get rapid feedback on their ideas and drafts. A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.

Topics will run the gamut from logical uncertainty in formal agents to cognitive models of concept generation. The exact range of discussion topics is likely to evolve over time as researchers' priorities change and new researchers join the forum.

We're currently tossing around possible names for the forum, and I wanted to solicit LessWrong's input, since you've been helpful here in the past. (We're also getting input from non-LW mathematicians and computer scientists.) We want to know how confusing, apt, etc. you perceive these variants on 'forum for doing exploratory engineering research in AI' to be:

1. AI Exploratory Research Forum (AIXRF)

2. Forum for Exploratory Engineering in AI (FEEAI)

3. Forum for Exploratory Research in AI (FERAI, or FXRAI)

4. Exploratory AI Research Forum (XAIRF, or EAIRF)

We're also looking at other name possibilities, including:

5. AI Foundations Forum (AIFF)

6. Intelligent Agent Foundations Forum (IAFF)

7. Reflective Agents Research Forum (RARF)

We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.

Feedback on the above ideas is welcome, as are new ideas. Feel free to post separate ideas in separate comments, so they can be upvoted individually. We're especially looking for feedback along the lines of: 'I'm a grad student in theoretical computer science and I feel that the name [X] would look bad in a comp sci bibliography or C.V.' or 'I'm friends with a lot of topologists, and I'm pretty sure they'd find the name [Y] unobjectionable and mildly intriguing; I don't know how well that generalizes to mathematical logicians.'

What Peter Thiel thinks about AI risk

12 Dr_Manhattan 11 December 2014 09:22PM

This is probably the clearest statement from him on the issue:

http://betaboston.com/news/2014/12/10/audio-peter-thiel-visits-boston-university-to-talk-entrepreneurship-and-backing-zuck/

25:30 mins in

 

TL;DR: he thinks it's an issue but also feels AGI is very distant, and hence is less worried about it than Musk.

 

I recommend the rest of the lecture as well; it's a good summary of "Zero to One", and there's a good Q&A afterwards.

[link] The Philosophy of Intelligence Explosions and Advanced Robotics

12 Kaj_Sotala 02 December 2014 03:44AM

The philosopher John Danaher has posted a list of all the posts that he's written on the topic of robotics and AI. Below is the current version of the list: he says that he will keep updating the page as he writes more.

  • The Singularity: Overview and Framework: This was my first attempt to provide a general overview and framework for understanding the debate about the technological singularity. I suggested that the debate could be organised around three main theses: (i) the explosion thesis -- which claims that there will be an intelligence explosion; (ii) the unfriendliness thesis -- which claims that an advanced artificial intelligence is likely to be "unfriendly"; and (iii) the inevitability thesis -- which claims that the creation of an unfriendly AI will be difficult to avoid, if not inevitable.
  • The Singularity: Overview and Framework Redux: This was my second attempt to provide a general overview and framework for understanding the debate about the technological singularity. I tried to reduce the framework down to two main theses: (i) the explosion thesis and (ii) the unfriendliness thesis.
  • AIs and the Decisive Advantage Thesis: Many people claim that an advanced artificial intelligence would have decisive advantages over human intelligences. Is this right? In this post, I look at Kaj Sotala's argument to that effect.
  • Is there a case for robot slaves? - If robots can be persons -- in the morally thick sense of "person" -- then surely it would be wrong to make them cater to our every whim? Or would it? Steve Petersen argues that the creation of robot slaves might be morally permissible. In this post, I look at what he has to say.
  • The Ethics of Robot Sex: A reasonably self-explanatory title. This post looks at the ethical issues that might arise from the creation of sex robots.
  • Will sex workers be replaced by robots? A Precis: A short summary of a longer article examining the possibility of sex workers being replaced by robots. Contrary to the work of others, I suggest that sex work might be resilient to the phenomenon of technological unemployment.
  • Bostrom on Superintelligence (2) The Instrumental Convergence Thesis: The second part in my series on Bostrom's book. This one examines the instrumental convergence thesis, according to which an intelligent agent, no matter what its final goals may be, is likely to converge upon certain instrumental goals that are unfriendly to human beings.

Shop for Charity: how to earn proven charities 5% of your Amazon spending in commission

12 tog 24 November 2014 08:29AM

If you shop on Amazon in the countries listed below, you can earn a substantial commission for charity by doing so via the links below. This is a cost-free way to do a lot of good, so I'd encourage you to do so! You can bookmark one of the direct links to Amazon below and then use that bookmark every time you shop.

The commission will be at least 5%, varying by product category. This is substantially better than the AmazonSmile scheme available in the US, which only gives 0.5% of the money you spend to charity. It works through Amazon's 'Associates Program', which pays this commission for referring purchasers to them, from the unaltered purchase price (details here). It doesn't cost the purchaser anything. The money goes to Associates Program accounts owned by the EA non-profit Charity Science; money in these accounts always gets regranted to GiveWell-recommended charities unless explicitly earmarked otherwise. For ease of administration and to get tax-deductibility, commission will get regranted to the Schistosomiasis Control Initiative until further notice.

Direct links to Amazon for your bookmarks

If you'd like to shop for charity, please bookmark the appropriate link below now:

 

From now through November 28: Black Friday Deals Week

Amazon's biggest cut price sale is this week. The links below take you to currently available deals:

Please share these links

I'll add other links on the main 'Shop for Charity' page later. I'd love to hear suggestions for good commission schemes in other countries. If you'd like to share these links with friends and family, please point them to this post or even better this project's main page.

Happy shopping!

'Shop for Charity' is a Charity Science project

Where is the line between being a good child and taking care of oneself?

11 jkadlubo 04 December 2014 07:26AM

I recently saw some posts here about how LW helps with personal stuff and that it's a good idea to post here1. Plus, you are the most supportive people I've ever met. I still hesitate. Know my courage. Also, I pretty much always put the needs and feelings of other people ahead of mine, because "mine are not as important" (more on that in notes 3 and 5). Saying my feelings and needs out loud is really scary. Writing them down is almost unimaginable.

I recently parted ways with my psychologist, but don't yet want to find a new one. Between our last meetings I had a thought that (as usual) was not explored deeply enough. And I think I need to go deeper into it. Maybe you can point me in some interesting directions?

 

A bit of background information about me and my parents: as a child and adolescent I did not exist as a separate person. I lived by the side of perfection, always not good enough, or simply not good. The chant of that time was "why can't you ... like your sister?" (have good grades, keep the room tidy, have friends, be nice - insert almost anything you can imagine)2. There were also other problems, but let's not make this part too long.  

When I grew up and discovered that all this was neither normal nor right, I became bitter and angry3. I keep in touch with them and act as if almost everything is fine, but boil inside a lot.  

On the surface I want to fix things. I want to be able to ask my mum to teach me something (I haven't been able to do this since I was a preschooler, partly because I feared being laughed at), and I want to look at my dad and not remember him calling me a murderer3. In my country it's even unthinkable that I am trying to cut off contact with one of my grandmothers4.  

So I invented that I want my mother to understand why what she did was hurting me and apologise. She knows that I think she wronged me, though I doubt she realizes how very bitter I am about it all. For her, the solution is to start anew, with a blank space. I can't do that; I spent my adolescence being hit by her and apologising to her for whatever she deemed my fault on a given day. The problem with my solution is that she's incapable of it. For the sake of simplification: she does not comprehend other people's emotions (but she does have emotions of her own and she mostly comprehends them)5.

 

My new thought is to look at "my solution" from a different perspective. Maybe me wanting to make amends with them this particular way is a bit like looking for approval the same way I did as a child. Maybe my mother saying "I hurt you, I'm sorry" would be like her saying "you were right, you are a good girl". Since I never got approval as a child, I should not expect it as an adult. Does this look sensible?

Maybe the adult thing for me to do would be to stop looking for these pats-on-the-back. And also the tiny pats-on-the-back I get when I act around them as if almost everything is fine while boiling inside and they reciprocate niceties. And that might mean cutting off contact as much as I can (which is of course scary and unimaginable). Or am I just going in one direction, where more are possible?  

Maybe I should stop trying to invent ways to fix things6. Maybe I should tell them how angry I am, and tell them in a way that prohibits them from interrupting me or reacting sooner than, let's say, a week later. But then again - would that do any good? Would they be able to understand me if they've never tried it before? Or is the question "would that do any good" just an example of my regular tendency to place other people ahead and diminish my needs?

 

1. Asking personal questions on LW, I don't remember the other ones now

2. She was more brilliant than me, managed to keep some friends even though we were the only kids in our class (yes, younger, but the same class - she went to school early and was always at the top) living outside the area etc. Another chant was "if you don't lose weight now, you'll be a cripple when you grow up", even though I had a perfectly normal figure. Everybody would routinely mix our names. The parents would routinely mix up our preferences about food, colours, favourite subjects etc. In a way: there was no me, there was just her and a faulty copy of her, who would keep being faulty on purpose, in order to enrage mother.

3. This last straw fell when my father (who was much more taciturn in scolding me, so back then I still had some sympathy for him) heard that my son and his (then) only grandchild suffered from a genetic disease that kills babies before they turn 2. After I explained the genetics of the disease, testing fetuses, chances, potential choices etc. he showed no sympathy, only asked "but you do know that you're committing infanticide?" His Catholicism was more important to him than his daughter's grief over her dying child. Then it took me two weeks to stop feeling for him and decide that his reaction was inappropriate.

4. This last straw fell when she was playing with said dying child while I was away for a couple of hours. I got a letter regarding some testing in another country. She opened the letter, saw it was in English, took a dictionary, failed to make any sense of it and asked other people to help her. And when I came back and got angry at her for reading my correspondence, she got outraged because she considered it her right. But I kept quiet for another year or two before I told her I don't want her near my children.

5. She taught me empathy this way: I was always supposed to take into account how my actions would look from her point of view (given that she would not even try to reciprocate and would always choose one of the worst interpretations). Empathy is my superpower now; I see a person and know what they feel about a situation, though I lack confidence in acting upon my insight.

6. At some point I was planning on employing someone to help her with the "understand why what she did was hurting me" part.

Link: LessWrong and AI risk mentioned in a Business Insider Article

10 James_Miller 03 December 2014 05:13PM

Google Has An Internal Committee To Discuss Its Fears About The Power Of Artificial Intelligence

"Worryingly, cofounder Shane Legg thinks the team's advances could be what finishes off the human race. He told the LessWrong blog in an interview 'Eventually, I think human extinction will probably occur, and technology will likely play a part in this'. He adds he thinks AI is the 'no.1 risk for this century'. It's ominous stuff. (Read about Elon Musk discussing his concerns about AI here.)"

[LINK] Stephen Hawking warns of the dangers of AI

10 Salemicus 02 December 2014 03:22PM

From the BBC:

[Hawking] told the BBC: "The development of full artificial intelligence could spell the end of the human race."

...

"It would take off on its own, and re-design itself at an ever increasing rate," he said. "Humans, who are limited by slow biological evolution, couldn't compete, and would be superseded."

There is, however, no mention of Friendly AI or similar principles.

In my opinion, this is particularly notable for the coverage this story is getting within the mainstream media. At the current time, this is the most-read and most-shared news story on the BBC website.

Lifehack Ideas December 2014

9 Gondolinian 10 December 2014 12:21AM
Life hacking refers to any trick, shortcut, skill, or novelty method that increases productivity and efficiency, in all walks of life.

Wikipedia

 

This thread is for posting any promising or interesting ideas for lifehacks you've come up with or heard of.  If you've implemented your idea, please share the results.  You are also encouraged to post lifehack ideas you've tried out that have not been successful, and why you think they weren't.  If you can, please give credit for ideas that you got from other people.

To any future posters of Lifehack Ideas threads, please remember to add the "lifehacks_thread" tag.

Is arguing worth it? If so, when and when not? Also, how do I become less arrogant?

9 27chaos 27 November 2014 09:28PM

I've had several political arguments about That Which Must Not Be Named in the past few days with people of a wide variety of... strong opinions. I'm rather doubtful I've changed anyone's mind about anything, but I've spent a lot of time trying to do so. I also seem to have offended one person I know rather severely. Also, even if I have managed to change someone's mind about something through argument, it feels as though someone will end up having to argue with them later down the line when the next controversy happens.

It's very discouraging to feel this way. It is frustrating when making an argument is taken as a reason for personal attack. And it's annoying to me to feel like I'm being forced into something by the disapproval of others. I'm tempted to just retreat from democratic engagement entirely. But there are disadvantages to this, for example it makes it easier to maintain irrational beliefs if you never talk to people who disagree with you.

I think a big part of the problem is that I have an irrational alief that makes me feel like my opinions are uniquely valuable and important to share with others. I do think I'm smarter, more moderate, and more creative than most. But the feeling's magnitude and influence over my behavior is far greater than what's justified by the facts.

How do I destroy this feeling? Indulging it satisfies some competitive urges of mine and boosts my self-esteem. But I think it's bad overall despite this, because it makes evaluating the social consequences of my choices more difficult. It's like a small addiction, and I have no idea how to get over it.

Does anyone else here have an opinion on any of this? Advice from your own lives, perhaps?

Superintelligence 11: The treacherous turn

9 KatjaGrace 25 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the 11th section in the reading guide: The treacherous turn. This corresponds to Chapter 8.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Existential catastrophe…” and “The treacherous turn” from Chapter 8


Summary

  1. The possibility of a first mover advantage + orthogonality thesis + convergent instrumental values suggests doom for humanity (p115-6)
    1. First mover advantage implies the AI is in a position to do what it wants
    2. Orthogonality thesis implies that what it wants could be all sorts of things
    3. Instrumental convergence thesis implies that regardless of its wants, it will try to acquire resources and eliminate threats
    4. Humans have resources and may be threats
    5. Therefore an AI in a position to do what it wants is likely to want to take our resources and eliminate us. i.e. doom for humanity.
  2. One kind of response: why wouldn't the makers of the AI be extremely careful not to develop and release dangerous AIs, or relatedly, why wouldn't someone else shut the whole thing down? (p116)
  3. It is hard to observe whether an AI is dangerous via its behavior at a time when you could turn it off, because AIs have convergent instrumental reasons to pretend to be safe, even if they are not. If they expect their minds to be surveilled, even observing their thoughts may not help. (p117)
  4. The treacherous turn: while weak, an AI behaves cooperatively. When the AI is strong enough to be unstoppable it pursues its own values. (p119)
  5. We might expect AIs to be safer as they get smarter initially - when most of the risks come from crashing self-driving cars or mis-firing drones - and then to get much less safe once they are too smart. (p117)
  6. One can imagine a scenario where there is little social impetus for safety (p117-8): alarmists will have been wrong for a long time, smarter AI will have been safer for a long time, large industries will be invested, an exciting new technique will be hard to set aside, useless safety rituals will be available, and the AI will look cooperative enough in its sandbox.
  7. The conception of deception: that moment when the AI realizes that it should conceal its thoughts (footnote 2, p282)

Another view

Danaher:

This is all superficially plausible. It is indeed conceivable that an intelligent system — capable of strategic planning — could take such treacherous turns. And a sufficiently time-indifferent AI could play a “long game” with us, i.e. it could conceal its true intentions and abilities for a very long time. Nevertheless, accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn. In fact, its even worse than that. If we take it seriously, then it is possible that we have already created an existentially threatening AI. It’s just that it is concealing its true intentions and powers from us for the time being.

I don’t quite know what to make of this. Bostrom is a pretty rational, bayesian guy. I tend to think he would say that if all the evidence suggests that our AI is non-threatening (and if there is a lot of that evidence), then we should heavily discount the probability of a treacherous turn. But he doesn’t seem to add that qualification in the chapter. He seems to think the threat of an existential catastrophe from a superintelligent AI is pretty serious. So I’m not sure whether he embraces the epistemic costs I just mentioned or not.

Notes

1. Danaher also made a nice diagram of the case for doom, and relationship with the treacherous turn:

 

2. History

According to Luke Muehlhauser's timeline of AI risk ideas, the treacherous turn idea for AIs has been around since at least 1977, when a fictional worm did it:

1977: Self-improving AI could stealthily take over the internet; convergent instrumental goals in AI; the treacherous turn. Though the concept of a self-propagating computer worm was introduced by John Brunner's The Shockwave Rider (1975), Thomas J. Ryan's novel The Adolescence of P-1 (1977) tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals (see Bostrom 2012) for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" (see Bostrom forthcoming) against humans.

 

3. The role of the premises

Bostrom's argument for doom has one premise that says AI could care about almost anything, then another that says regardless of what an AI cares about, it will do basically the same terrible things anyway. (p115) Do these sound a bit strange together to you? Why do we need the first, if final values don't tend to change instrumental goals anyway?

It seems the immediate reason is that an AI with values we like would not have the convergent goal of taking all our stuff and killing us. That is, the values we want an AI to have are some of those rare values that don't lead to destructive instrumental goals. Why is this? Because we (and thus the AI) care about the activities the resources would be grabbed from. If the resources were currently being used for something we didn't care about, then our values would also suggest grabbing them, and would look similar to all of the other values. The difference that makes our values special here is just that most resources are already being used for them somewhat.

4. Signaling

It is hard to tell apart a safe and an unsafe AI, because both would like to look safe. This is a very common problem in human interactions. For instance, it can be nontrivial to tell a genuine lover from a gold digger, a businessman from a conman, and an expert from a crank. All of them want to look like the desirable sort. Particularly similar to the AI case is that of hiring a new employee for a trial period. You will sometimes find that the employee's values are much better aligned during the trial period, and then they undergo a 'treacherous turn' once they have been hired more thoroughly.

'Costly signaling' is a general-purpose solution to this problem, which works some of the time. The basic idea is this. Everyone has instrumental reasons to look like the good kind of person, but perhaps their reasons aren't exactly as strong as one another's, or their desire is harder to act on for one group than for the other, so we can construct a set of options that will lead the different types of people to choose differently, even though they know this will set them apart. For instance, while an honest businessman and a conman would both like to say 'I'm an honest businessman', and to behave honestly if you watch them for five minutes, a conman would benefit less from an expensive shopfront than an honest businessman would, because his reputation is worth less to him, so a costly brand is a signal of honesty.
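To make the shopfront example concrete, here is a minimal numeric sketch (the payoff numbers and names are mine, purely for illustration): only the agent whose future reputation is valuable enough to repay the signal finds it worth buying, so the signal separates the types.

```python
# Minimal costly-signaling sketch (illustrative numbers only).
# Both types would like to *look* honest; the shopfront is worth buying
# only to the type whose future reputation is valuable enough to repay it.

SHOPFRONT_COST = 10

def payoff(reputation_value, buys_shopfront):
    """Net payoff: customers trust (and pay) you only if you have the shopfront."""
    revenue = reputation_value if buys_shopfront else 0
    cost = SHOPFRONT_COST if buys_shopfront else 0
    return revenue - cost

types = {
    "honest businessman": 30,  # repeat customers: reputation is worth a lot
    "conman": 5,               # plans to disappear: reputation is worth little
}

for name, reputation_value in types.items():
    best = max([True, False], key=lambda buy: payoff(reputation_value, buy))
    print(f"{name}: buys shopfront? {best}, payoff {payoff(reputation_value, best)}")

# honest businessman: buys shopfront? True, payoff 20
# conman: buys shopfront? False, payoff 0
```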

Similarly, we could perhaps design situations where most harmful AIs would have a reasonable incentive to reveal themselves. This is difficult because so much is at stake, compared to the cost of the signal. But I haven't thought enough about it to rule it out, and I haven't seen anyone else do so either.

5. When would the 'conception of deception' take place?

Below the level of the best humans presumably, since we have already thought of all this.

6. Surveillance of the mind

Bostrom is not optimistic that AI researchers could protect much against risks by watching AI mental processes directly for signs of deception, rapid progress, or undesirable scheming. He says if the AI is smart it will realize this might happen, and hide the offending thoughts.

This seems an open question to me, for several reasons:

  1. Making your mental contents look innocuous while maintaining their semantic content sounds potentially very hard
  2. Especially for a creature which has only just become smart enough to realize it should treacherously turn
  3. From the AI's perspective, even if it is smart, surveillance could seem fairly unlikely, especially if we deceive it about its surroundings
As a consequence of 2, it seems better if the 'conception of deception' comes earlier.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How transparent are AI minds likely to be? Should we expect to be able to detect deception? What are the answers to these questions for different specific architectures and methods? This might be relevant.
  2. Are there other good ways to filter AIs with certain desirable goals from others? e.g. by offering them choices that would filter them.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'malignant failure modes' (as opposed presumably to worse failure modes). To prepare, read “Malignant failure modes” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday, December 1. Sign up to be notified here.

Memory Improvement: Mnemonics, Tools, or Books on the Topic?

9 Capla 21 November 2014 06:59PM

I want a perfect eidetic memory.

Unfortunately, such things don't exist, but that's not stopping me from getting as close as possible. It seems as if the popular solutions are spaced repetition and memory palaces. So let's talk about those.

Memory Palaces: Do they work? If so what's the best resource (book, website etc.) for learning and mastering the technique? Is it any good for memorizing anything other than lists of things (which I find I almost never have to do)?

Spaced Repetition: What software do you use? Why that one? What sort of cards do you put in?

It seems to me that memory programs and mnemonic techniques assist one of three parts of the problem of memory: memorizing, recalling, and not forgetting.

"Not forgetting" is the long term problem of memory. Spaced repetition seems to solve the problem of "not forgetting." You feed the information you want to remember into your program, review frequently, and you won't forget that information.

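As a rough illustration of how such programs schedule reviews, here is a stripped-down sketch in the spirit of SM-2-style spaced repetition algorithms (not a description of any particular tool; the multiplier is an arbitrary choice of mine): each successful review multiplies the gap until the next one, so a card you keep remembering costs only a handful of reviews over months.

```python
# Stripped-down interval scheduling in the spirit of SM-2-style spaced
# repetition software (illustrative only; real programs like Anki use more
# elaborate rules and per-card "ease" adjustments).

def next_interval(days, remembered, multiplier=2.5):
    """Days to wait before the next review of a card."""
    if not remembered:
        return 1                              # forgot: start over with a short gap
    return max(1, round(days * multiplier))   # remembered: wait much longer

interval, day, schedule = 1, 0, []
for _ in range(6):                            # a card that is always remembered
    day += interval
    schedule.append(day)
    interval = next_interval(interval, remembered=True)

print(schedule)  # [1, 3, 8, 20, 50, 125]: six reviews cover about four months
```
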
Memory Palaces seem to deal with the "memorizing" part of the problem. When faced with new information that you want to be able to recall, you put it in a memory palace, vividly emphasized so as to be affective and memorable. This is good for short term encoding of information that you know you want to keep. You might put it into your spaced repetition program later, but you just want to not forget it until then.

The last part is the problem of "recalling." Both of the previous facets of the problem of memory had a distinct advantage: you knew the information that you wanted to remember in advance. However, we frequently find ourselves in situations in which we need or want to remember something that we know we encountered (or perhaps don't even remember encountering), but didn't consider particularly important at the time.  Under this heading falls the situation of making connections when learning, or being reminded of old information by new information: when you learn y, you have the thought "hey, isn't that just like x?" This is the facet of the memory problem that I am most interested in, but I know of scarcely anything that can reliably improve ease of recall of information in general. Do you know of anything?

I'm looking for recommendations: books on memory, specific mnemonics, or practices that are known to improve recall, or anything else that might help with any of the three parts of the problem.

 

 

(Very Short) PSA: Combined Main and Discussion Feed

8 Gondolinian 18 December 2014 03:46PM

For anyone who's annoyed by having to check newest submissions for Main and Discussion separately, there is a feed for combined submissions from both, in the form of Newest Submissions - All (RSS feed).  (There's also Comments - All (RSS feed), but for me at least, it seems to only show comments from Main and none from Discussion.)

Thanks to RichardKennaway for bringing this to my attention, and to Unknowns for asking the question that prompted him.  (If you've got the time, head over there and give them some karma.)  I thought this deserved the visibility of a post in Discussion, as not everyone reads through the Open Thread, and I think there's a chance that many would benefit from this information.

Giving What We Can - New Year drive

8 Smaug123 17 December 2014 03:26PM

If you’ve been planning to get around to maybe thinking about Effective Altruism, we’re making your job easier. A group of UK students has set up a drive for people to sign up to the Giving What We Can pledge to donate 10% of their future income to charity. It does not specify the charities - that decision remains under your control. The pledge is not legally binding, but honour is a powerful force when it comes to promising to help. If 10% is a daunting number, or you don't want to sign away your future earnings in perpetuity, there is a Try Giving scheme in which you may donate less money for less time. I suggest five years (that is, from 2015 to 2020) of 5% as a suitable "silver" option to the 10%-until-retirement "gold medal".

 

We’re hoping to take advantage of the existing Schelling point of “new year” as a time for resolutions, as well as building the kind of community spirit that gets people signing up in groups. If you feel it’s a word worth spreading, please feel free to spread it. As of this writing, GWWC reported 41 new members this month, which is a record for monthly acquisitions (and we’re only halfway through the month, three days into the event).

 

If anyone has suggestions about how to better publicise this event (or Effective Altruism generally), please do let me know. We’re currently talking to various news outlets and high-profile philanthropists to see if they can give us a mention, but suggestions are always welcome. Likewise, comments on the effectiveness of this post itself will be gratefully noted.

 

About Giving What We Can: GWWC is under the umbrella of the Centre for Effective Altruism, was co-founded by a LessWronger, and in 2013 had verbal praise from lukeprog.

Approval-directed agents

8 paulfchristiano 12 December 2014 10:38PM

Most concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is “why would we give an AI goals anyway?” I think there are good reasons to expect goal-oriented behavior, and I’ve been on that side of a lot of arguments. But I don’t think the issue is settled, and it might be possible to get better outcomes without them. I flesh out one possible alternative here, based on the dictum "take the action I would like best" rather than "achieve the outcome I would like best."

(As an experiment I wrote the post on medium, so that it is easier to provide sentence-level feedback, especially feedback on writing or low-level comments.)
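The post has the details; as a very rough sketch of the contrast in that dictum (my own paraphrase and names, not Christiano's implementation), the difference is what gets scored when choosing an action:

```python
# Rough contrast sketch (not the actual proposal): a goal-directed agent scores
# predicted *outcomes* with its own utility function; an approval-directed agent
# scores *actions* by how much its overseer is predicted to approve of them.

def goal_directed_step(actions, predict_outcome, utility):
    """Pick the action whose predicted outcome the agent's own utility rates best."""
    return max(actions, key=lambda a: utility(predict_outcome(a)))

def approval_directed_step(actions, predict_approval):
    """Pick the action the overseer is predicted to approve of most, outcome aside."""
    # predict_approval(a) stands in for a learned model of how highly the human
    # overseer would rate taking action a, on reflection.
    return max(actions, key=predict_approval)

# Toy usage: the approval model rates the cautious action highly.
actions = ["ask the overseer for clarification", "quietly acquire more compute"]
approval = {"ask the overseer for clarification": 0.9,
            "quietly acquire more compute": 0.1}
print(approval_directed_step(actions, predict_approval=approval.get))
# -> "ask the overseer for clarification"
```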

Make your own cost-effectiveness Fermi estimates for one-off problems

8 owencb 11 December 2014 12:00PM

In some recent work (particularly this article) I built models for estimating the cost effectiveness of work on problems when we don’t know how hard those problems are. The estimates they produce aren’t perfect, but they can get us started where it’s otherwise hard to make comparisons.


Now I want to know: what can we use this technique on? I have a couple of applications I am working on, but I’m keen to see what estimates other people produce.


There are complicated versions of the model which account for more factors, but we can start with a simple version. This is a tool for initial Fermi calculations: it’s relatively easy to use but should get us around the right order of magnitude. That can be very useful, and we can build more detailed models for the most promising opportunities.


The model is given by:

Expected benefit of an extra unit of resources = p * B / (R(0) * ln(y/z))

This expresses the expected benefit of adding another unit of resources to solving the problem. You can denominate the resources in dollars, researcher-years, or another convenient unit. To use this formula we need to estimate four variables:


  • R(0) denotes the current resources going towards the problem each year. Whatever units you measure R(0) in, those are the units we’ll get an estimate for the benefit of. So if R(0) is measured in researcher-years, the formula will tell us the expected benefit of adding a researcher year.

    • You want to count all of the resources going towards the problem. That includes the labour of those who work on it in their spare time, and some weighting for the talent of the people working in the area (if you doubled the budget going to an area, you couldn’t get twice as many people who are just as good; ideally we’d use an elasticity here).

    • Some resources may be aimed at something other than your problem, but be tangentially useful. We should count some fraction of those, according to how many fully dedicated resources they seem equivalent to.

  • B is the annual benefit that we’d get from a solution to the problem. You can measure this in its own units, but whatever you use here will be the units of value that come out in the cost-effectiveness estimate.

  • p and y/z are parameters that we will estimate together. p is the probability of getting a solution by the time y resources have been dedicated to the problem, if z resources have been dedicated so far. Note that we only need the ratio y/z, so we can estimate this directly.

    • Although y/z is hard to estimate, we will take a (natural) logarithm of it, so don’t worry too much about making this term precise.

    • I think it will often be best to use middling values of p, perhaps between 0.2 and 0.8.

And that’s it.


Example: How valuable is extra research into nuclear fusion? Assume:

  • R(0) = $5 billion (after a quick google turns up $1.5B for current spending, and adjusting upwards to account for non-financial inputs);

  • B = $1000 billion (guesswork, a bit over 1% of the world economy; a fraction of the current energy sector);

  • There’s a 50% chance of success (p = 0.5) by the time we’ve spent 100 times as many resources as today (log(y/z) = log(100) = 4.6).


Putting these together would give an expected societal benefit of (0.5*$1000B)/(5B*4.6) = $22 for every dollar spent. This is high enough to suggest that we may be significantly under-investing in fusion, and that a more careful calculation (with better-researched numbers!) might be justified.
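If you want to try your own numbers, the whole calculation fits in a few lines (a minimal sketch of the simple model above; the function and argument names are mine):

```python
from math import log

def expected_benefit_per_unit(p, B, R0, y_over_z):
    """Expected benefit of one extra unit of resources: p * B / (R(0) * ln(y/z)).

    p        : probability of a solution by the time y resources have been
               spent, given z spent so far
    B        : annual benefit of a solution (sets the units of the answer)
    R0       : current resources spent on the problem per year (sets the units
               of the "extra unit")
    y_over_z : the ratio y/z estimated alongside p
    """
    return p * B / (R0 * log(y_over_z))

# The fusion example: p = 0.5, B = $1000B/year, R(0) = $5B/year, y/z = 100
print(expected_benefit_per_unit(0.5, 1000e9, 5e9, 100))  # ≈ 21.7 dollars per dollar
```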

Caveats

To get the simple formula, the model made a number of assumptions. Since we’re just using it to get rough numbers, it’s okay if we don’t fit these assumptions exactly, but if they’re totally off then the model may be inappropriate. One restriction in particular I’d want to bear in mind:


  • It should be plausible that we could solve the problem in the next decade or two.


It’s okay if this is unlikely, but I’d want to change the model if I were estimating the value of e.g. trying to colonise the stars.

Request for applications

So -- what would you like to apply this method to? What answers do you get?


To help structure the comment thread, I suggest attempting only one problem in each  comment. Include the value of p, and the units of R(0) and units of B that you’d like to use. Then you can give your estimates for R(0), B, and y/z as a comment reply, and so can anyone else who wants to give estimates for the same thing.


I’ve also set up a google spreadsheet where we can enter estimates for the questions people propose. For the time being anyone can edit this.


Have fun!

Superintelligence 13: Capability control methods

7 KatjaGrace 09 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the thirteenth section in the reading guide: capability control methods. This corresponds to the start of chapter nine.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Two agency problems” and “Capability control methods” from Chapter 9


Summary

  1. If the default outcome is doom, how can we avoid it? (p127)
  2. We can divide this 'control problem' into two parts:
    1. The first principal-agent problem: the well known problem faced by a sponsor wanting an employee to fulfill their wishes (usually called 'the principal agent problem')
    2. The second principal-agent problem: the emerging problem of a developer wanting their AI to fulfill their wishes
  3. How to solve the second problem? We can't rely on behavioral observation (as seen in week 11). Two other options are 'capability control methods' and 'motivation selection methods'. We see the former this week, and the latter next week.
  4. Capability control methods: avoiding bad outcomes through limiting what an AI can do. (p129)
  5. Some capability control methods:
    1. Boxing: minimize interaction between the AI and the outside world. Note that the AI must interact with the world to be useful, and that it is hard to eliminate small interactions. (p129)
    2. Incentive methods: set up the AI's environment such that it is in the AI's interest to cooperate. For example, a social environment with punishment or social repercussions often achieves this for contemporary agents. One could also design a reward system, perhaps with cryptographic rewards (so that the AI could not wirehead) or heavily discounted rewards (so that long term plans are not worth the short term risk of detection; see the toy calculation after this list) (p131)
      • Anthropic capture: an AI thinks it might be in a simulation, and so tries to behave in ways that will be rewarded by simulators (box 8; p134)
    3. Stunting: limit the AI's capabilities. This may be hard to do to a degree that avoids danger and is still useful. An option here is to limit the AI's information. A strong AI may infer much from little apparent access to information however. (p135)
    4. Tripwires: test the system without its knowledge, and shut it down if it crosses some boundary. This might be combined with 'honey pots' designed to tempt undesirable AIs into taking an action that would reveal them. Tripwires could test behavior, ability, or content. (p137)
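To illustrate the discounted-rewards idea in 5.2, here is a toy calculation (all numbers invented): with a steep enough discount, the steady trickle of rewards for cooperating is worth more than a huge payoff that only arrives after a long, risky takeover attempt.

```python
# Toy illustration of steep temporal discounting as an incentive method
# (made-up numbers). The AI compares cooperating (a small reward every step)
# with defecting (a huge reward, but only after T steps of preparation, and
# only if the attempt isn't detected).

def value_of_cooperating(reward_per_step, discount):
    """Discounted value of a small reward received every step, forever."""
    return reward_per_step / (1 - discount)       # geometric series

def value_of_defecting(big_reward, discount, steps_needed, p_undetected):
    """Discounted, detection-adjusted value of a huge payoff after a long takeover."""
    return p_undetected * big_reward * discount ** steps_needed

d = 0.5   # heavy discounting: next step's reward is worth half of this step's
cooperate = value_of_cooperating(reward_per_step=1, discount=d)          # = 2.0
defect = value_of_defecting(big_reward=10**6, discount=d,
                            steps_needed=30, p_undetected=0.9)           # ≈ 0.0008

print(cooperate > defect)  # True: at this discount rate the takeover doesn't pay
```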

Another view

Brian Clegg reviews the book mostly favorably, but isn't convinced that controlling an AI via merely turning it off should be so hard:

I also think a couple of the fundamentals aren’t covered well enough, but pretty much assumed. One is that it would be impossible to contain and restrict such an AI. Although some effort is put into this, I’m not sure there is enough thought put into the basics of ways you can pull the plug manually – if necessary by shutting down the power station that provides the AI with electricity.
Kevin Kelly also apparently doubts that AI will substantially impede efforts to modify it:

...We’ll reprogram the AIs if we are not satisfied with their performance...

...This is an engineering problem. So far as I can tell, AIs have not yet made a decision that its human creators have regretted. If they do (or when they do), then we change their algorithms. If AIs are making decisions that our society, our laws, our moral consensus, or the consumer market, does not approve of, we then should, and will, modify the principles that govern the AI, or create better ones that do make decisions we approve. Of course machines will make “mistakes,” even big mistakes – but so do humans. We keep correcting them. There will be tons of scrutiny on the actions of AI, so the world is watching. However, we don’t have universal consensus on what we find appropriate, so that is where most of the friction about them will come from. As we decide, our AI will decide...

This may be related to his view that AI is unlikely to modify itself (from further down the same page):

3. Reprogramming themselves, on their own, is the least likely of many scenarios.

The great fear pumped up by some, though, is that as AI gain our confidence in making decisions, they will somehow prevent us from altering their decisions. The fear is they lock us out. They go rogue. It is very difficult to imagine how this happens. It seems highly improbable that human engineers would program an AI so that it could not be altered in any way. That is possible, but so impractical. That hobble does not even serve a bad actor. The usual scary scenario is that an AI will reprogram itself on its own to be unalterable by outsiders. This is conjectured to be a selfish move on the AI’s part, but it is unclear how an unalterable program is an advantage to an AI. It would also be an incredible achievement for a gang of human engineers to create a system that could not be hacked. Still it may be possible at some distant time, but it is only one of many possibilities. An AI could just as likely decide on its own to let anyone change it, in open source mode. Or it could decide that it wanted to merge with human will power. Why not? In the only example we have of an introspective self-aware intelligence (hominids), we have found that evolution seems to have designed our minds to not be easily self-reprogrammable. Except for a few yogis, you can’t go in and change your core mental code easily. There seems to be an evolutionary disadvantage to being able to easily muck with your basic operating system, and it is possible that AIs may need the same self-protection. We don’t know. But the possibility they, on their own, decide to lock out their partners (and doctors) is just one of many possibilities, and not necessarily the most probable one.

 

Notes

1. What do you do with a bad AI once it is under your control?

Note that capability control doesn't necessarily solve much: boxing, stunting and tripwires seem to just stall a superintelligence rather than provide means to safely use one to its full capacity. This leaves the controlled AI to be overtaken by some other unconstrained AI as soon as someone else isn't so careful. In this way, capability control methods seem much like slowing down AI research: helpful in the short term while we find better solutions, but not in itself a solution to the problem.

However this might be too pessimistic. An AI whose capabilities are under control might either be almost as useful as an uncontrolled AI who shares your goals (if interacted with the right way), or at least be helpful in getting to a more stable situation.

Paul Christiano outlines a scheme for safely using an unfriendly AI to solve some kinds of problems. We have both blogged on general methods for getting useful work from adversarial agents, which is related.

2. Cryptographic boxing

Paul Christiano describes a way to stop an AI interacting with the environment using a cryptographic box.

3. Philosophical Disquisitions

Danaher again summarizes the chapter well. Read it if you want a different description of any of the ideas, or to refresh your memory. He also provides a table of the methods presented in this chapter.

4. Some relevant fiction

That Alien Message by Eliezer Yudkowsky

5. Control through social integration

Robin Hanson argues that it matters more that a population of AIs are integrated into our social institutions, and that they keep the peace among themselves through the same institutions we keep the peace among ourselves, than whether they have the right values. He thinks this is why you trust your neighbors, not because you are confident that they have the same values as you. He has several followup posts.

6. More miscellaneous writings on these topics

LessWrong wiki on AI boxing; Armstrong et al on controlling and using an oracle AI; Roman Yampolskiy on 'leakproofing' the singularity. I have not necessarily read these.

 

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. Choose any control method and work out the details better. For instance:
    1. Could one construct a cryptographic box for an untrusted autonomous system?
    2. Investigate steep temporal discounting as an incentives control method for an untrusted AGI.
  2. Are there other capability control methods we could add to the list?
  3. Devise uses for a malicious but constrained AI.
  4. How much pressure is there likely to be to develop AI which is not controlled?
  5. If existing AI methods had unexpected progress and were heading for human-level soon, what precautions should we take now?

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'motivation selection methods'. To prepare, read “Motivation selection methods” and “Synopsis” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday, 15th December. Sign up to be notified here.

Superintelligence 12: Malignant failure modes

7 KatjaGrace 02 December 2014 02:02AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the twelfth section in the reading guide: Malignant failure modes.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: 'Malignant failure modes' from Chapter 8


Summary

  1. Malignant failure mode: a failure that involves human extinction; in contrast with many failure modes where the AI doesn't do much. 
  2. Features of malignant failures 
    1. We don't get a second try 
    2. It supposes we have a great deal of success, i.e. enough to make an unprecedentedly competent agent
  3. Some malignant failures:
    1. Perverse instantiation: the AI does what you ask, but what you ask turns out to be most satisfiable in unforeseen and destructive ways.
      1. Example: you ask the AI to make people smile, and it intervenes on their facial muscles or neurochemicals, instead of via their happiness, and in particular via the bits of the world that usually make them happy.
      2. Possible counterargument: if it's so smart, won't it know what we meant? Answer: Yes, it knows, but its goal is to make you smile, not to do what you meant when you programmed that goal.
      3. AI which can manipulate its own mind easily is at risk of 'wireheading' - that is, a goal of maximizing a reward signal might be perversely instantiated by just manipulating the signal directly. In general, animals can be motivated to act on the outside world in order to achieve internal states; an AI with sufficient access to its own internals can reach those states more easily by manipulating them directly.
      4. Even if we think a goal looks good, we should fear it has perverse instantiations that we haven't appreciated.
    2. Infrastructure profusion: in pursuit of some goal, an AI redirects most resources to infrastructure, at our expense.
      1. Even apparently self-limiting goals can lead to infrastructure profusion. For instance, to an agent whose only goal is to make ten paperclips, once it has apparently made ten paperclips it is always more valuable to try to become more certain that there are really ten paperclips than it is to just stop doing anything (see the sketch after this list).
      2. Examples: Riemann hypothesis catastrophe, paperclip maximizing AI
    3. Mind crime: AI contains morally relevant computations, and treats them badly
      1. Example: AI simulates humans in its mind, for the purpose of learning about human psychology, then quickly destroys them.
      2. Other reasons for simulating morally relevant creatures:
        1. Blackmail
        2. Creating indexical uncertainty in outside creatures
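The ten-paperclips point in 3.2.1 is just an observation about expected utility; a toy calculation (my numbers, not the book's) shows why 'check again' always beats 'stop' for a maximizer whose goal is that ten paperclips really exist:

```python
# Toy version of the ten-paperclip point (illustrative numbers). Utility is 1
# if ten paperclips really exist, 0 otherwise. Any cheap action that nudges
# the probability of "really ten" upward has positive expected value, so
# "just stop" is never the maximizing choice.

def expected_utility(p_really_ten):
    """Utility is 1 if ten paperclips really exist, 0 otherwise."""
    return 1 * p_really_ten + 0 * (1 - p_really_ten)

p = 0.999999                       # current credence that it has made ten paperclips
p_after_check = p + 0.1 * (1 - p)  # one more round of verification infrastructure

gain = expected_utility(p_after_check) - expected_utility(p)
print(gain > 0)  # True, and it stays true however close p gets to 1
```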

Another view

In this chapter Bostrom discussed the difficulty he perceives in designing goals that don't lead to indefinite resource acquisition. Steven Pinker recently offered a different perspective on the inevitability of resource acquisition:

...The other problem with AI dystopias is that they project a parochial alpha-male psychology onto the concept of intelligence. Even if we did have superhumanly intelligent robots, why would they want to depose their masters, massacre bystanders, or take over the world? Intelligence is the ability to deploy novel means to attain a goal, but the goals are extraneous to the intelligence itself: being smart is not the same as wanting something. History does turn up the occasional megalomaniacal despot or psychopathic serial killer, but these are products of a history of natural selection shaping testosterone-sensitive circuits in a certain species of primate, not an inevitable feature of intelligent systems. It’s telling that many of our techno-prophets can’t entertain the possibility that artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no burning desire to annihilate innocents or dominate the civilization.

Of course we can imagine an evil genius who deliberately designed, built, and released a battalion of robots to sow mass destruction.  But we should keep in mind the chain of probabilities that would have to multiply out before it would be a reality. A Dr. Evil would have to arise with the combination of a thirst for pointless mass murder and a genius for technological innovation. He would have to recruit and manage a team of co-conspirators that exercised perfect secrecy, loyalty, and competence. And the operation would have to survive the hazards of detection, betrayal, stings, blunders, and bad luck. In theory it could happen, but I think we have more pressing things to worry about. 

Notes

1. Perverse instantiation is a very old idea. It is what genies are most famous for. King Midas had similar problems. Apparently it was applied to AI by 1947, in With Folded Hands.

2. Adam Elga writes more on simulating people for blackmail and indexical uncertainty.

3. More directions for making AIs which don't lead to infrastructure profusion:

  • Some kinds of preferences don't lend themselves to ambitious investments. Anna Salamon talks about risk averse preferences. Short time horizons and goals which are cheap to fulfil should also make long term investments in infrastructure or intelligence augmentation less valuable, compared to direct work on the problem at hand.
  • Oracle and tool AIs are intended to not be goal-directed, but as far as I know it is an open question whether this makes sense. We will get to these later in the book.

5. Often when systems break, or we make errors in them, they don't work at all. Sometimes, they fail more subtly, working well in some sense, but leading us to an undesirable outcome, for instance a malignant failure mode. How can you tell whether a poorly designed AI is likely to just not work, vs. accidentally take over the world? An important consideration for systems in general seems to be the level of abstraction at which the error occurs. We try to build systems so that you can just interact with them at a relatively abstract level, without knowing how the parts work. For instance, you can interact with your GPS by typing places into it, then listening to it, and you don't need to know anything about how it works. If you make an error while typing your address into the GPS, it will fail by taking you to the wrong place, but it will still direct you there fairly well. If instead you fail by putting the wires inside the GPS into the wrong places, the GPS is more likely to just not work.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Are there better ways to specify 'limited' goals? For instance, to ask for ten paperclips without asking for the universe to be devoted to slightly improving the probability of success?
  2. In what circumstances could you be confident that the goals you have given an AI do not permit perverse instantiations? 
  3. Explore possibilities for malignant failure vs. other failures. If we fail, is it actually probable that we will have enough 'success' for our creation to take over the world?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about capability control methods, section 13. To prepare, read “Two agency problems” and “Capability control methods” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday, December 8. Sign up to be notified here.

Integral versus differential ethics

7 Stuart_Armstrong 01 December 2014 06:04PM

In population ethics...

Most people start out believing that the following are true:

  1. That adding more happy lives is a net positive.
  2. That redistributing happiness more fairly is not a net negative.
  3. That the repugnant conclusion is indeed repugnant.

Some will baulk at the first statement on equality grounds, but most people should accept those three statements as presented. Then they find out about the mere addition paradox.
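For readers who haven't seen it, here is a toy numeric rendering of the paradox's stepwise structure (the welfare numbers, and the convention that welfare 1 means "barely worth living", are mine, chosen only to show the shape of the argument):

```python
# Toy rendering of the stepwise (mere addition) argument, with made-up numbers.
# Step 1: add extra lives that are barely worth living (premise 1: net positive).
# Step 2: redistribute to equality (premise 2: not a net negative).
# Repeating drives average welfare toward "barely worth living" while total
# welfare keeps rising - the repugnant conclusion.

population = [100.0] * 10        # start: ten people with very good lives

for _ in range(15):
    population += [1.0] * len(population)            # step 1: mere addition
    equal_share = sum(population) / len(population)  # step 2: fair redistribution
    population = [equal_share] * len(population)

print(len(population), round(population[0], 2), round(sum(population)))
# hundreds of thousands of people, each barely above welfare 1,
# yet total welfare is far above the original 1000
```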

Someone who then accepts the repugnant conclusion could reason something like this:

Adding happy people and redistributing happiness fairly, if done many, many times, in the way described above, will result in a repugnant conclusion. Each step along the way seems solid, but the conclusion seems wrong. Therefore I will accept the repugnant conclusion, not on its own merits, but because each step is clearly intuitively correct.

Call this the "differential" (or local) way of reasoning about population ethics. As long as each small change seems intuitively an improvement, then the global change must also be.

Adding happy people and redistributing happiness fairly, if done many, many times, in the way described above, will result in a repugnant conclusion. Each step along the way seems solid, but the conclusion seems wrong. Therefore I will reject (at least) one step, not on its own merits, but because the conclusion is clearly intuitively incorrect.

Call this the "integral" (or global) way of reasoning about population ethics. As long as the overall change seems intuitively a deterioration, then some of the small changes along the way must also be.

 

In general...

Now, I personally tend towards integral rather than differential reasoning on this particular topic. However, I want to make a more general point: philosophy may be over-dedicated to differential reasoning. Mainly because it's easy: you can take things apart, simplify them, abstract details away, and appeal to simple principles - and avoid many potential biases along the way.

But it's also a very destructive tool to use in areas where concepts are unclear and cannot easily be made clear. Take the statement "human life is valuable". This can be taken apart quite easily and critiqued from all directions, its lack of easily described meaning being its weakness. Nevertheless, integral reasoning is almost always applied: something called "human life" is taken to be "valuable", and many caveats and subdefinitions can be added to these terms without changing the fundamental (integral) acceptance of the statement. If we followed the differential approach, we might end up with the definition of "human life" as "energy exchange across a neurone cell membrane" or something equally ridiculous but much more rigorous.

Now, that example is a parody... but only because no-one sensible does that: we know that we'd lose too much value from that kind of definition. We want to build an extensive/integral definition of life, using our analysis to add clarity rather than simplifying to a few core underlying concepts. But in population ethics and many other cases, we do feel free to use differential ethics, replacing vague overarching concepts with clear simplified versions that clearly throw away a lot of the initial concept.

Maybe we do it too much. To pick an example I disagree with (always a good habit), maybe there is such a thing as "society", for instance, not simply the total of individuals and their interactions. You can already use pretty crude consequentialist arguments with "societies" as agents subject to predictable actions and reactions (social science does it all the time), but what if we tried to build a rigorous definition of society as something morally valuable, rather than focusing on individuals?

Anyway, we should be aware of when, in arguments, we are keeping the broad goal and making the small steps and definitions conform to it, and when we are focusing on the small steps and definitions and following them wherever they lead.

Discussion of AI control over at worldbuilding.stackexchange [LINK]

6 ike 14 December 2014 02:59AM

https://worldbuilding.stackexchange.com/questions/6340/the-challenge-of-controlling-a-powerful-ai

Go insert some rationality into the discussion! (There are actually some pretty good comments in there, and some links to the right places, including LW).

Open thread, Dec. 8 - Dec. 15, 2014

6 Gondolinian 08 December 2014 12:06AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Previous OT

Next OT


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.


If you have any comments about the Open Thread posts themselves or this post specifically, please post them as a reply to the [META] comment.  Aside from that, this thread is as organized as you collectively wish to make it.

[link] On the abundance of extraterrestrial life after the Kepler mission

6 Gunnar_Zarncke 05 December 2014 09:02PM

On the abundance of extraterrestrial life after the Kepler mission, by Amri Wandel

A recent calculation of the Drake Equation with estimates of the likelihood and longevity of civilizations
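For readers who haven't seen it, the Drake Equation is just a product of factors; a quick sketch with placeholder values (illustrative only, not the paper's estimates) shows how strongly the result depends on the most uncertain terms, especially the longevity L of civilizations:

```python
# The Drake equation, N = R* * fp * ne * fl * fi * fc * L, with illustrative
# placeholder values (not the paper's estimates).

def drake(R_star, f_p, n_e, f_l, f_i, f_c, L):
    """Expected number of detectable civilizations in the galaxy."""
    return R_star * f_p * n_e * f_l * f_i * f_c * L

print(drake(R_star=2,    # new stars formed per year
            f_p=0.5,     # fraction of stars with planets
            n_e=1,       # habitable planets per planetary system
            f_l=0.1,     # fraction of those that develop life
            f_i=0.01,    # fraction of those that develop intelligence
            f_c=0.1,     # fraction of those that become detectable
            L=10_000))   # years a civilization stays detectable -> 1.0
# Changing L or f_i by a factor of 100 changes N by the same factor.
```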

Related: The Great Filter and Planets in the habitable zone, the Drake Equation, and the Great Filter


View more: Next