Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
Today, the Machine Intelligence Research Institute is launching a new forum for research discussion: the Intelligent Agent Foundations Forum! It's already been seeded with a bunch of new work on MIRI topics from the last few months.
We've covered most of the (what, why, how) subjects on the forum's new welcome post and the How to Contribute page, but this post is an easy place to comment if you have further questions (or if, maths forbid, there are technical issues with the forum instead of on it).
But before that, go ahead and check it out!
(Major thanks to Benja Fallenstein, Alice Monday, and Elliott Jin for their work on the forum code, and to all the contributors so far!)
EDIT 3/22: Jessica Taylor, Benja Fallenstein, and I wrote forum digest posts summarizing and linking to recent work (on the IAFF and elsewhere) on reflective oracle machines, on corrigibility, utility indifference, and related control ideas, and on updateless decision theory and the logic of provability, respectively! These are pretty excellent resources for reading up on those topics, in my biased opinion.
At a societal level, how much money should we put into medical research, or into fusion research? For individual donors seeking out the best opportunities, how can we compare the expected cost-effectiveness of research projects with more direct interventions?
Over the past few months I've been researching this area for the Global Priorities Project. We've written a variety of articles which focus on different parts of the question. Estimating the cost-effectiveness of research is the central example here, but a lot of the methodology is also applicable to other one-off projects with unknown difficulty (perhaps including political lobbying). I don't think it's all solved, but I do think we've made substantial progress.
I think people here might be interested, so I wanted to share our work. To help you navigate and find the most appropriate pieces, here I collect them, summarise what's contained in each, and explain how they fit together.
- I gave an overview of my thinking at the Good Done Right conference, held in Oxford in July 2014. The slides and audio of my talk are available; I have developed more sophisticated models for some parts of the area since then.
- How to treat problems of unknown difficulty introduces the problem: we need to make decisions about when to work more on problems such as research into fusion where we don't know how difficult it will be. It builds some models which allow principled reasoning about how we should act. These models are quite crude but easy to work with: they are intended to lower the bar for Fermi estimates and similar, and provide a starting point for building more sophisticated models.
- Estimating cost-effectiveness for problems of unknown difficulty picks up from the models in the above post, and asks what they mean for the expected cost-effectiveness of work on the problems. This involves building a model of the counterfactual impact, as solvable research problems are likely to be solved eventually, so the main effect is to move the solution forwards. This post includes several explicit formulae that you can use to produce estimates; it also explains analogies between the explicit model we derive and the qualitative 'three factor' model that GiveWell and 80,000 Hours have used for cause selection.
- Estimating the cost-effectiveness of research into neglected diseases is an investigation by Max Dalton, which uses the techniques for estimating cost-effectiveness to provide ballpark figures for how valuable we should expect research into vaccines or treatments for neglected diseases to be. The estimates suggest that, if carefully targeted, such research could be more cost-effective than the best direct health interventions currently available for funding.
- The law of logarithmic returns discusses the question of returns to resources invested in a field, rather than in a single question. With some examples, it suggests that as a first approximation it is often reasonable to assume that diminishing marginal returns take a logarithmic form.
- Theory behind logarithmic returns explains how some simple generating mechanisms can produce roughly logarithmic returns. This is a complement to the above article: we think having both empirical and theoretical justification for the rule helps us to have higher confidence in it, and to better understand when it's appropriate to generalise to new contexts. In this piece I also highlight areas for further research on the theoretical side, into when the approximation will break down, and what we might want to use instead in these cases.
- How valuable is medical research? written with Giving What We Can, applies the logarithmic returns model together with counterfactual reasoning to produce an estimate for the cost-effectiveness of medical research as a whole.
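As a toy illustration of the kind of "crude but easy to work with" model these posts build: if we put a log-uniform prior on how many resources a problem requires before it is solved (one simple assumption of this sort; the posts' own models may differ in detail), the probability of success grows logarithmically in resources spent, which matches the law of logarithmic returns discussed above. A minimal Python sketch, with illustrative function names and bounds:

```python
import math

def p_success(resources, r_min, r_max):
    """Probability the problem is solved with `resources` spent,
    under a log-uniform prior on the resources required,
    supported on [r_min, r_max]."""
    if resources <= r_min:
        return 0.0
    if resources >= r_max:
        return 1.0
    return math.log(resources / r_min) / math.log(r_max / r_min)

# Marginal returns are logarithmic: each doubling of resources
# adds the same increment of success probability.
p1 = p_success(2e6, 1e6, 1e12)  # after $2M
p2 = p_success(4e6, 1e6, 1e12)  # after $4M
p3 = p_success(8e6, 1e6, 1e12)  # after $8M
```

Under this prior the increments p2 - p1 and p3 - p2 are equal, which is the logarithmic-returns behaviour; a Fermi estimate can then multiply the marginal success probability by the value of moving the solution forward.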
Separate silos of expertise
I've been doing a lot of work on expertise recently, on the issue of measuring it and assessing it. The academic research out there is fascinating, though rather messy. Like many areas in the social sciences, it often suffers from small samples and overgeneralising from narrow examples. More disturbingly, the research projects seem to be grouped into various "silos" that don't communicate much with each other, each silo continuing with its own pet projects.
The main four silos I've identified are:
There may be more silos than this - many people working in expertise studies haven't heard of all of these (for instance, I was ignorant of Cooke's research until it was pointed out to me by someone who hadn't heard of Shanteau or Klein). The division into silos isn't perfect; Shanteau, for instance, has addressed the biases literature at least once (Shanteau, James. "Decision making by experts: The GNAHM effect." Decision Science and Technology. Springer US, 1999. 105-130), and Kahneman and Klein have authored a paper together (Kahneman, Daniel, and Gary Klein. "Conditions for intuitive expertise: a failure to disagree." American Psychologist 64.6 (2009): 515). But in general the mutual ignoring (or mutual ignorance) seems pretty strong between the silos.
There is a lot of bad science and controversy in the realm of how to have a healthy lifestyle. Every week we are bombarded with new studies conflicting older studies telling us X is good or Y is bad. Eventually we reach our psychological limit, throw up our hands, and give up. I used to do this a lot. I knew exercise was good, I knew flossing was good, and I wanted to eat better. But I never acted on any of that knowledge. I would feel guilty when I thought about this stuff and go back to what I was doing. Unsurprisingly, this didn't really cause me to make any positive lifestyle changes.
Instead of vaguely guilt-tripping you with potentially unreliable science news, this post aims to provide an overview of lifestyle interventions that have very strong evidence behind them and concrete ways to implement them.
I argued in this post that the differences in capability between different researchers are vast (Kaj Sotala provided me with some interesting empirical evidence that backs up this claim). Einstein's contributions to physics or John von Neumann's contributions to mathematics (and a number of other disciplines) are arguably at least hundreds of times greater than those of an average physicist or mathematician.
At the same time, Yudkowsky argues that "in the space of brain designs" the difference between the village idiot and Einstein is tiny. Their brains are extremely similar, with the exception of some "minor genetic tweaks". Hence we get the following picture:
Many thanks to all those whose conversations have contributed to forming these ideas.
Will the singleton save us?
For most of the large existential risks that we deal with here, the situation would be improved with a single world government (a singleton), or at least greater global coordination. The risk of nuclear war would fade, and pandemics would be met with a comprehensive global strategy rather than a mess of national priorities. Workable regulations for the technology risks, such as synthetic biology or AI, become at least conceivable. All in all, a great improvement in safety...
...with one important exception. A stable tyrannical one-world government, empowered by future mass surveillance, is itself an existential risk (it might not destroy humanity, but it would “permanently and drastically curtail its potential”). So to decide whether to oppose or advocate for more global coordination, we need to assess how likely such a despotic government would be.
This is the kind of research I would love to do if I had the time to develop the relevant domain skills. In the meantime, I’ll just take all my thoughts on the subject and form them into a “proto-research project plan”, in the hopes that someone could make use of them in a real research project. Please contact me if you would want to do research on this, and would fancy a chat.
Before we can talk about the likelihood of a good outcome, we need to define what a good outcome actually is. For this analysis, I will take the definition that:
- A singleton regime is acceptable if it is at least as good as any developed democratic government of today.
Hundreds of Less Wrong posts summarize or repackage work previously published in professional books and journals, but Less Wrong also hosts lots of original research in philosophy, decision theory, mathematical logic, and other fields. This post serves as a curated index of Less Wrong posts containing significant original research.
Obviously, there is much fuzziness about what counts as "significant" or "original." I'll be making lots of subjective judgment calls about which suggestions to add to this post. One clear rule is: I won't be linking anything that merely summarizes previous work (e.g. Stuart's summary of his earlier work on utility indifference).
Update 09/20/2013: Added Notes on logical priors from the MIRI workshop, Cooperating with agents with different ideas of fairness, while resisting exploitation, Do Earths with slower economic growth have a better chance at FAI?
Update 11/03/2013: Added Bayesian probability as an approximate theory of uncertainty?, On the importance of taking limits: Infinite Spheres of Utility, Of all the SIA-doomsdays in the all the worlds...
Update 01/22/2014: Added Change the labels, undo infinitely good, Reduced impact AI: no back channels, International cooperation vs. AI arms race, Naturalistic trust among AIs: The parable of the thesis advisor’s theorem
Highly Advanced Epistemology 101 for Beginners. Eliezer's bottom-up guide to truth, reference, meaningfulness, and epistemology. Includes practical applications and puzzling meditations.
Counterfactual resiliency test for non-causal models. Stuart Armstrong suggests testing non-causal models for "counterfactual resiliency."
Thoughts and problems with Eliezer's measure of optimization power. Stuart Armstrong examines some potential problems with Eliezer's concept of optimization power.
Free will. Eliezer's particular compatibilist-style solution to the free will problem from a reductionist viewpoint.
The absolute Self-Selection Assumption. A clarification of anthropic reasoning, focused on Wei Dai's UDASSA framework.
SIA, conditional probability, and Jaan Tallinn's simulation tree. Stuart Armstrong builds a bridge between Nick Bostrom's Self-Indication Assumption (SIA) and Jaan Tallinn's simulation tree of superintelligence reproduction.
Mathematical Measures of Optimization Power. Alex Altair tackles one approach to mathematically formalizing Yudkowsky's Optimization Power concept.
Caught in the glare of two anthropic shadows. Stuart_Armstrong provides a detailed analysis of the "anthropic shadow" concept and its implications.
Bayesian probability as an approximate theory of uncertainty?. Vladimir Slepnev argues that Bayesian probability is an imperfect approximation of what we want from a theory of uncertainty.
Of all the SIA-doomsdays in the all the worlds.... Stuart_Armstrong on the doomsday argument, the self-sampling assumption and the self-indication assumption.
Several weeks ago, the NYC Rationality Meetup Group began discussing outreach, both for rationality in general and the group in particular. A lot of interesting problems were brought up. Should we be targeting the average person, or sticking to the cluster of personality-types that Less Wrong already attracts? How quickly should we introduce people to our community? What are the most effective ways to spread the idea of rationality, and what are the most effective ways of actually encouraging people to undertake rational actions?
Those are all complex questions with complex answers, which are beyond the scope of this post. I ended up focusing on the question: "Is 'Rationality' the word we want to use when we're pitching ourselves?" I do not think it's worthwhile to try and change the central meme of the Less Wrong community, but it's not obvious that the new, realspace communities forming need to use the same central meme.
This begat a simpler question: "What does the average person think of when they hear the word 'Rationality'? What positive or negative connotations does it have?" Do they think of straw vulcans and robots? Do they think of effective programmers or businessmen? Armed with this knowledge, we can craft a rationalist pitch that is likely to be effective on the average person, either by challenging their conception of rationality or by bypassing keywords that might set off memetic immune systems.
I've hated group projects since about Grade 3. I grew up assuming that at some point, working in groups would stop being a series of trials and tribulations, and turn into normal, sane people working just like they would normally, except on a shared problem. Either they would change, or I would change, because I am incredibly bad at teamwork, at least the kind that gets doled out in the classroom. I don’t have the requisite people skills to lead a group, but I’m too much of a control freak to meekly follow along when the group wants to do a B+ project and I would like an A+. Drama inevitably ensues.
I would like to not have this problem. An inability to work in teams seems like a serious handicap. There are very few jobs that don’t involve teamwork, and my choice of future career, nursing, involves a lot.
My first experiences in the workplace, as a lifeguard, made me feel a little better about this. There was a lot less drama and a lot more just getting the work done. I think it has a lot to do with a) the fact that we’re paid to do a job that’s generally pretty easy, and b) the requirements of that job are pretty simple, if not easy. There is drama, but it rarely involves guard rotations or who’s going to hose the deck, and I can safely ignore it. Rescues do involve teamwork, but it’s a specific sort of teamwork where the roles are all laid out in advance, and that’s what we spent most of our training learning. Example: in a three-guard scenario, the first guard to notice an unconscious swimmer in the water becomes guard #1: they make a whistle signal to the others and jump in, while guard #2 calls 911 and guard #3 clears the pool and does crowd control. There isn’t a lot of room for drama, and there isn’t much point, because there is one right way to do things, everyone knows the right way to do things, and there isn’t time to fight about it anyway.
I’m hoping that working as a nurse in a hospital will be more this and less like the school-project variety of group work. The roles are defined and laid out; they’re what we’re learning right now in our theory classes. There’s less of a time crunch, but there’s still, usually, an obviously right way to do things. Maybe it gets more complicated when you have to approach a colleague for, say, not following the hand-hygiene rules, or when the rules the hospital management enforces are obviously not the best way to do things, but those are add-ons to the job, not its backbone.
But that’s for bedside nursing. Research is a different matter, and unfortunately, it’s a lot more like school. I’m taking a class about research right now, and something like 30% or 40% of our mark is on a group project. We have to design a study from beginning to end: problem, hypothesis, type of research, research proposal, population and sample, methods of measurement, methods of analysis, etc. My excuse that “I dislike this because it has absolutely no real-world relevance” is downright wrong, because we’re doing exactly what real researchers would do, only with far fewer resources and less time, and I do like research and would like to work in that milieu someday.
Conflict with my group-members usually comes because I’m more invested in the outcome than the others. I have more motivation to spend time on it, and a higher standard for "good enough". Even if I think the assignment is stupid, I want to do it properly, partly for grades and partly because I hate not doing things properly. I don’t want to lead the group, because I know I’m terrible at it, but no one else wants to either because they don’t care either way. I end up feeling like a slave driver who isn’t very good at her job.
This time I had a new sort of problem. A group asked me to join them because they thought I was smart and would be a good worker. They set a personal deadline to have the project finished nearly a month before it was due. They had a group meeting, which I couldn’t go to because I was at work, and assigned sections, and sent out an email with an outline. I skimmed the email and put it aside for later, since it seemed less than urgent to me. ...And all of a sudden, at our next meeting, the project was nearly finished. No one had hounded me; they had just gone ahead and done it. Maybe they had a schema in their heads that hounding the non-productive members of the team would lead to drama, but I was offended, because I felt that in my case it wouldn’t have. I would have overridden my policy of doing my work at the last minute, and just gotten it done. It’s not like I didn’t care about our final grade.
My pride was hurt (the way my classmate told me was by looking at my computer screen in the library, where I’d started to do the part assigned to me in the outline, and saying “you might as well not save that, I already did it.”) I didn’t feel like fighting about it, so I emailed the prof and asked if I could do the project on my own instead of with a team. She seemed confused that I wanted to do extra work, but assented.
I didn’t want to do extra work. I wanted to avoid the work of team meetings, team discussions, team drama... But that’s not how real-world research works. Refusing to play their game means I lose an opportunity to improve my teamwork skills, and I’m going to need those someday, and not just the skills acquired through lifeguarding. Either I need to turn off my control-freak need to have things my way, or I need to become charismatic and good at leading groups, and to do either of those things, I need a venue to practice.
Does anyone else here have the same problem I do? Has anyone solved it? Does anyone have tips for ways to improve?
Edit: reply to comment by jwendy, concerning my 'other' kind of problem.
"I probably didn't say enough about it in the article, if you thought it seemed glossed over, but I thought a lot about why this happened at the time, and I was pretty upset (more than I should have been, really, over a school project) and that's why I left the group...because unlike type #2 team members, I actually cared a lot about making a fair contribution and felt like shit when I hadn't. I never consciously decided to procrastinate, either...I just had a lot of other things on my plate, which is pretty much inevitable during the school year, and all of a sudden, foom!, my part of the project is done because one of the girls was bored on the weekend and had nothing better to do. (Huh? When does this ever happen?)
So I guess I'm like a type #2 member in that I procrastinate when I can get away with it, but like a type #1 member in that I do want to turn in quality work and get an A+. And I want it to be my quality work, not someone else's with my name on it."
I think it was justified to be surprised when the new kind of problem happened to me. If I'm more involved/engaged than all the students I've worked with in the past, that doesn't mean I'm the most engaged, but it does mean I have a schema in my brain for 'no one has their work finished until a week after they say they will'.