Benja, Eliezer, and I have published a new technical report, in collaboration with Stuart Armstrong of the Future of Humanity Institute. This paper introduces Corrigibility, a subfield of Friendly AI research. The abstract is reproduced below:
As artificially intelligent systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system "corrigible" if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences. We introduce the notion of corrigibility and analyze utility functions that attempt to make an agent shut down safely if a shutdown button is pressed, while avoiding incentives to prevent the button from being pressed or cause the button to be pressed, and while ensuring propagation of the shutdown behavior as it creates new subsystems or self-modifies. While some proposals are interesting, none have yet been demonstrated to satisfy all of our intuitive desiderata, leaving this simple problem in corrigibility wide-open.
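As a rough schematic of the kind of utility function under analysis (my own notation, not the paper's exact formulation), the naive "shutdown button" proposal combines two utility functions:

```latex
U(a) \;=\;
\begin{cases}
  U_N(a) & \text{if the shutdown button is never pressed,} \\
  U_S(a) & \text{if the button is pressed,}
\end{cases}
```

where \(U_N\) scores normal operation and \(U_S\) rewards shutting down safely. The hard part is combining them (e.g. via correction terms) so that the agent is indifferent to whether the button is pressed, rather than incentivized to prevent or cause the press.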
I've got into audiobooks lately and have been enjoying David Fitzgerald's Nailed! and his Heretic's Guide to Mormonism, along with Greta Christina's "Why Are You Atheists So Angry?" and Laura Bates's "Everyday Sexism", all of which were very good. I was wondering what other illuminating and engaging books might be recommended, ideally ones available as audiobooks on Audible.
I've already read The Selfish Gene, The God Delusion and God Is Not Great in book form as well, so it might be time for something not specifically religion-related, unless it has some interesting new angle.
Since Nailed! and Everyday Sexism were so illuminating, I'm now thinking there must be lots of other must-read books out there, and I wondered what people here might recommend. Any suggestions would be appreciated.
Thanks for your time.
Many designs for creating AGIs (such as OpenCog) rely on the AGI deducing moral values as it develops. This is a form of value loading (or value learning), in which the AGI updates its values through various methods, generally including feedback from trusted human sources. This is roughly analogous to how human infants integrate the values of their society.
The great challenge of this approach is that it relies on an AGI that already has an interim system of values being able and willing to correctly update that system. Generally speaking, humans are unwilling to update their values easily, and we would want our AGIs to be similar: values that are too unstable aren't values at all.
So the aim is to clearly separate the conditions under which the AGI should keep its values stable from the conditions under which it should allow them to vary. This will generally be done by specifying criteria for the variation ("only when talking with Mr and Mrs Programmer"). But, as always with AGIs, unless we program those criteria perfectly (hint: we won't), the AGI will be motivated to interpret them differently from how we would expect. It will, as a natural consequence of its programming, attempt to manipulate the value-updating rules according to its current values.
How could it do that? A very powerful AGI could do the time-honoured "take control of your reward channel", either by threatening humans to give it the moral answers it wants, or by replacing humans with "humans" (constructs that pass the programmed requirements of being human, according to the AGI's programming, but aren't actually human in practice) willing to give it those answers. A weaker AGI could instead use social manipulation and leading questions to achieve the morality it desires. Even more subtly, it could tweak its internal architecture and updating process so that it updates values in its preferred direction (even something as simple as choosing the order in which to process evidence). This will be hard to detect, as a smart AGI might have a much clearer impression of how its updating process will play out in practice than its programmers would.
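As a toy illustration of the evidence-ordering point (my own sketch, not from the post): any non-Bayesian update rule, such as an exponential moving average, can give different final values depending on the order in which evidence is processed, so an agent free to choose that order can bias the result without ever discarding evidence.

```python
def ema_update(value, evidence, alpha=0.5):
    """Naive value-update rule: exponential moving average.

    Unlike an ideal Bayesian update, the final result depends on the
    order in which the pieces of evidence are processed.
    """
    for x in evidence:
        value = (1 - alpha) * value + alpha * x
    return value

# Same evidence, two orders, different final "values":
print(ema_update(0.0, [1.0, -1.0]))  # -0.25
print(ema_update(0.0, [-1.0, 1.0]))  # 0.25
```

An agent that prefers ending up with a higher value need only queue the positive evidence last.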
The problems with value loading have been cast into the various "Cake or Death" problems. We have some idea what criteria we need for safe value loading, but as yet we have no candidates for such a system. This post will attempt to construct one.
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the third section in the reading guide, AI & Whole Brain Emulation. This is about two possible routes to the development of superintelligence: the route of developing intelligent algorithms by hand, and the route of replicating a human brain in great detail.
This post summarizes the section, offers a few relevant notes, and suggests ideas for further investigation. My own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: “Artificial intelligence” and “Whole brain emulation” from Chapter 2 (p22-36)
- Superintelligence is defined as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'
- There are several plausible routes to the arrival of a superintelligence: artificial intelligence, whole brain emulation, biological cognition, brain-computer interfaces, and networks and organizations.
- The existence of multiple possible paths to superintelligence makes it more likely that we will get there somehow.
- A human-level artificial intelligence would probably have learning, uncertainty, and concept formation as central features.
- Evolution produced human-level intelligence. This means it is possible, but it is unclear how much it says about the effort required.
- Humans could perhaps develop human-level artificial intelligence by replicating a similar evolutionary process virtually. A quick calculation suggests this would be too expensive to be feasible within a century, though the process might be made more efficient.
- Human-level AI might be developed by copying the human brain to various degrees. If the copying is very close, the resulting agent would be a 'whole brain emulation', which we'll discuss shortly. If the copying is only of a few key insights about brains, the resulting AI might be very unlike humans.
- AI might iteratively improve itself from a meagre beginning. We'll examine this idea later. Some definitions for discussing this:
- 'Seed AI': a modest AI which can bootstrap into an impressive AI by improving its own architecture.
- 'Recursive self-improvement': the envisaged process of AI (perhaps a seed AI) iteratively improving itself.
- 'Intelligence explosion': a hypothesized event in which an AI rapidly improves from 'relatively modest' to superhuman level (usually imagined to be as a result of recursive self-improvement).
- The possibility of an intelligence explosion suggests we might have modest AI, then suddenly and surprisingly have super-human AI.
- An AI mind might generally be very different from a human mind.
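The "quick calculation" behind the evolutionary-recapitulation point above can be sketched with illustrative round numbers (mine, not Bostrom's exact figures):

```python
# Rough, illustrative figures -- not Bostrom's exact numbers (see p25-6).
organisms = 1e18          # nervous-system-bearing organisms alive at once
years = 1e9               # years of nervous-system evolution
seconds_per_year = 3.15e7
flops_per_organism = 1e5  # average neural compute per organism (FLOPS)

total_flop = organisms * years * seconds_per_year * flops_per_organism

# Compare: a 100-petaFLOPS supercomputer running flat out for a century.
supercomputer_century = 1e17 * seconds_per_year * 100

print(f"evolution rerun: ~{total_flop:.0e} FLOP")
print(f"supercomputer-century: ~{supercomputer_century:.0e} FLOP")
print(f"shortfall: ~{total_flop / supercomputer_century:.0e}x")
```

Even with these charitable assumptions the rerun is some thirteen orders of magnitude beyond a century of modern supercomputing, which is why the approach only looks plausible if the search can be made vastly more efficient than blind evolution.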
Whole brain emulation
- Whole brain emulation (WBE or 'uploading') involves scanning a human brain in a lot of detail, then making a computer model of the relevant structures in the brain.
- Three steps are needed for uploading: sufficiently detailed scanning, ability to process the scans into a model of the brain, and enough hardware to run the model. These correspond to three required technologies: scanning, translation (or interpreting images into models), and simulation (or hardware). These technologies appear attainable through incremental progress, by very roughly mid-century.
- This process might produce something much like the original person, in terms of mental characteristics. However the copies could also have lower fidelity. For instance, they might be humanlike instead of copies of specific humans, or they may only be humanlike in being able to do some tasks humans do, while being alien in other regards.
- What routes to human-level AI do people think are most likely?
Bostrom and Müller's survey asked participants to compare various methods for producing synthetic and biologically inspired AI. They asked, "In your opinion, what are the research approaches that might contribute the most to the development of such HLMI?" Selection was from a list, with more than one selection possible. They report that the responses were very similar for the different groups surveyed, except that whole brain emulation got 0% in the TOP100 group (100 most cited authors in AI) but 46% in the AGI group (participants at Artificial General Intelligence conferences). Note that they are only asking about synthetic AI and brain emulations, not the other paths to superintelligence we will discuss next week.
- How different might AI minds be?
Omohundro suggests advanced AIs will tend to have important instrumental goals in common, such as the desire to accumulate resources and the desire to not be killed.
‘We must avoid the error of inferring, from the fact that intelligent life evolved on Earth, that the evolutionary processes involved had a reasonably high prior probability of producing intelligence’ (p27)
Whether such inferences are valid is a topic of contention. For a book-length overview of the question, see Bostrom's Anthropic Bias. I've written shorter (Ch 2) and even shorter summaries, which link to other relevant material. The Doomsday Argument and Sleeping Beauty Problem are closely related.
- More detail on the brain emulation scheme
Whole Brain Emulation: A Roadmap is an extensive source on this, written in 2008. If that's a bit too much detail, Anders Sandberg (an author of the Roadmap) summarises in an entertaining (and much shorter) talk. More recently, Anders tried to predict when whole brain emulation would be feasible with a statistical model. Randal Koene and Ken Hayworth both recently spoke to Luke Muehlhauser about the Roadmap and what research projects would help with brain emulation now.
Levels of detail
As you might predict, the feasibility of brain emulation is not universally agreed upon. One contentious point is the degree of detail needed to emulate a human brain. For instance, you might just need the connections between neurons and some basic neuron models, or you might need to model the states of different membranes, or the concentrations of neurotransmitters. The Whole Brain Emulation Roadmap lists some possible levels of detail in figure 2 (the yellow ones were considered most plausible). Physicist Richard Jones argues that simulation of the molecular level would be needed, and that the project is infeasible.
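To see why the level of detail matters so much, here is a back-of-envelope sketch (my own round numbers, not the Roadmap's estimates) of how required compute scales with modeling level:

```python
# Illustrative round numbers -- not the Roadmap's figures.
neurons = 1e11
synapses_per_neuron = 1e4
timestep_hz = 1e3  # state updates per simulated second

# Rough FLOP cost per synapse-update at different modeling levels.
levels = {
    "point-neuron spiking network": 1e1,
    "compartmental electrophysiology": 1e4,
    "molecular-level simulation": 1e10,
}

for name, flop_per_update in levels.items():
    flops = neurons * synapses_per_neuron * timestep_hz * flop_per_update
    print(f"{name}: ~{flops:.0e} FLOPS to run in real time")
```

Each step down in abstraction multiplies the cost by several orders of magnitude, which is why the dispute over the required level of detail is effectively a dispute over whether emulation arrives in decades or never.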
Other problems with whole brain emulation
Sandberg considers many potential impediments here.
Order matters for brain emulation technologies (scanning, hardware, and modeling)
Bostrom points out that this order matters for how much warning we receive that brain emulations are about to arrive (p35). Order might also matter a lot to the social implications of brain emulations. Robin Hanson discusses this briefly here and in this talk (starting at 30:50), and this paper also discusses the issue.
What would happen after brain emulations were developed?
We will look more at this in Chapter 11 (weeks 17-19) as well as perhaps earlier, including what a brain emulation society might look like, how brain emulations might lead to superintelligence, and whether any of this is good.
‘With a scanning tunneling microscope it is possible to ‘see’ individual atoms, which is a far higher resolution than needed...microscopy technology would need not just sufficient resolution but also sufficient throughput.’
Here are some atoms, neurons, and neuronal activity in a living larval zebrafish, and videos of various neural events.
Array tomography of mouse somatosensory cortex from Smithlab.
A molecule made from eight cesium and eight iodine atoms (from here).
Efforts to map connections between neurons
Here is a five-minute video about recent efforts, with many nice pictures. If you enjoy coloring in, you can take part in a gamified project to help map the brain's neural connections! Or you can just look at the pictures they made.
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:
- Produce a better - or merely somewhat independent - estimate of how much computing power it would take to rerun evolution artificially. (p25-6)
- Conduct a more thorough investigation into the approaches to AI that are likely to lead to human-level intelligence, for instance by interviewing AI researchers in more depth about their opinions on the question.
- Measure relevant progress in neuroscience, so that trends can be extrapolated to neuroscience-inspired AI. Finding good metrics seems to be hard here.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about other paths to the development of superintelligence: biological cognition, brain-computer interfaces, and organizations. To prepare, read Biological Cognition and the rest of Chapter 2. The discussion will go live at 6pm Pacific time next Monday 6 October. Sign up to be notified here.
Over the past year, I've noticed a topic where Less Wrong might have a blind spot: public opinion. Since last September I've had (or butted into) five conversations here where someone's written something which made me think, "you wouldn't be saying that if you'd looked up surveys where people were actually asked about this". The following list includes six findings I've brought up in those LW threads. All of the findings come from surveys of public opinion in the United States, though some of the results are so obvious that polls scarcely seem necessary to establish their truth.
- The public's view of the harms and benefits from scientific research has consistently become more pessimistic since the National Science Foundation began its surveys in 1979. (In the wake of repeated misconduct scandals, and controversies like those over vaccination, global warming, fluoridation, animal research, stem cells, and genetic modification, people consider scientists less objective and less trustworthy.)
- Most adults identify as neither Republican nor Democrat. (Although the public is far from apolitical, lots of people are unhappy with how politics currently works, and also recognize that their beliefs align imperfectly with the simplistic left-right axis. This dissuades them from identifying with mainstream parties.)
- Adults under 30 are less likely to believe that abortion should be illegal than the middle-aged. (Younger adults tend to be more socially liberal in general than their parents' generation.)
- In the 1960s, those under 30 were less likely than the middle-aged to think the US made a mistake in sending troops to fight in Vietnam. (The under-30s were more likely to be students and/or highly educated, and more educated people were less likely to think sending troops to Vietnam was a mistake.)
- The Harris Survey asked, in November 1969, "as far as their objectives are concerned, do you sympathize with the goals of the people who are demonstrating, marching, and protesting against the war in Vietnam, or do you disagree with their goals?" Most respondents aged 50+ sympathized with the protesters' goals, whereas only 28% of under-35s did. (Despite the specific wording of the question, the younger respondents worried that the protests reflected badly on their demographic, whereas older respondents were more often glad to see their own dissent voiced.)
- A 2002 survey found that about 90% of adult smokers agreed with the statement, "If you had to do it over again, you would not have started smoking." (While most smokers derive enjoyment from smoking, many weight smoking's negative consequences strongly enough that they'd rather not smoke; they continue smoking because of habit or addiction.)
I remember seeing talk of the concept of privilege show up in the discussion thread on contrarian views.
Some discussion got started from "Feminism is a good thing. Privilege is real."
This is an article that presents some of those ideas in a way that might be approachable for LW.
One of the ideas I take out of this is that these issues can be examined as the result of unconscious cognitive bias. I.e., sexism needn't be the result of any conscious thought, but can arise as a failure mode where we don't reason correctly in these social situations.
Of course a broad view of these issues exists, and many people have different ways of looking at them, but I think it would be good to focus on the case presented in this article rather than your other associations.
There is a lot of mainstream interest in machine ethics now. Here are some links to some popular articles on this topic.
By danah boyd, claiming that 'tech folks' are designing systems that implement an idea of fairness that comes from neoliberal ideology.
danah boyd (who spells her name without capitalization) runs Data & Society, a "think/do tank" that aims to study this stuff. They've recently received MacArthur Foundation funding to study the ethical and political impact of intelligent systems.
A few observations:
First, there is no mention of superintelligence or recursive self-modification. These scholars are interested in how already comparatively powerful machines will have moral and political impact on the world in the near future.
Second, these groups are quite bad at thinking in a formal or mechanically implementable way about ethics. They mainly seem to recapitulate the same tired tropes that have been resonating through academia for literally decades. By contrast, mathematical formulation of ethical positions appears to be y'all's specialty.
Third, however much the one-true-morality may be indeterminate or presently unknowable, progress towards implementable descriptions of various plausible moral positions could at least be incremental steps forward towards an understanding of how to achieve something better. Considering a slow take-off possible future, iterative testing and design of ethical machines with high computational power seems like low-hanging fruit that could only better inform longer-term futurist thought.
Personally, I try to do work in this area and find the lack of serious formal treatment deeply disappointing. This post is a combination heads-up and request to step up your game. It's go time.
UC Berkeley School of Information
I have returned from a particularly fruitful Google search, with unexpected results.
My question was simple. I was pretty sure that talking to myself aloud makes me temporarily better at solving problems that need a lot of working memory. It is a thinking tool that I find to be of great value, and that I imagine would be of interest to anyone who'd like to optimize their problem solving. I just wanted to collect some evidence on that, make sure I'm not deluding myself, and possibly learn how to enhance the effect.
This might just be lousy Googling on my part, but the evidence is surprisingly unclear and disorganized. There are at least three separate Wiki pages for it, and they don't link to each other. Instead they present the distinct models of three separate fields: autocommunication in communication studies, semiotics, and other cultural studies; intrapersonal communication ("self-talk" redirects here) in anthropology and (older) psychology; and private speech in developmental psychology. The first is useless for my purpose, the second mentions "may increase concentration and retention" with no source, and the third confirms my suspicion that this behavior boosts memory, motivation, and creativity, but it only talks about children.
Google Scholar yields lots of sports-related results for "self-talk" because it can apparently improve the performance of athletes, and if there's something that obviously needs the optimization power of psychology departments, it is competitive sports. For "intrapersonal communication" it has papers indicating it helps in language acquisition and in dealing with social anxiety. Both are dwarfed by the results for "private speech", which again focus on children. There's very little on "autocommunication", and what is there has nothing to do with the functioning of individual minds.
So there's a bunch of converging pieces of evidence supporting the usefulness of this behavior, but they're from several separate fields that don't seem to have noticed each other much. How often do you find that?
Let me quickly list a few ways that I find it plausible to imagine talking to yourself could enhance rational thought.
- It taps the phonological loop, a distinct part of working memory that might otherwise sit idle in non-auditory tasks. More memory is always better, right?
- Auditory information is retained more easily, so making thoughts auditory helps remember them later.
- It lets you commit to thoughts, and build upon them, in a way that is more powerful (and slower) than unspoken thought while less powerful (but quicker) than action. (I don't have a good online source for this one, but Inside Jokes should convince you, and has lots of new cognitive science to boot.)
- System 1 does seem to understand language, especially if it does not use complex grammar - so this might be a useful way for results of System 2 reasoning to be propagated. Compare affirmations. Anecdotally, whenever I'm starting a complex task, I find stating my intent out loud makes a huge difference in how well the various submodules of my mind cooperate.
- It lets separate parts of your mind communicate in a fairly natural fashion, slows each of them down to the speed of your tongue and makes them not interrupt each other so much. (This is being used as a psychotherapy method.) In effect, your mouth becomes a kind of talking stick in their discussion.
All told, if you're talking to yourself you should be more able to solve complex problems than somebody of your IQ who doesn't, although somebody of your IQ with a pen and a piece of paper should still outthink both of you.
Given all that, I'm surprised this doesn't appear to have been discussed on LessWrong. Honesty: Beyond Internal Truth comes close but goes past it. Again, this might be me failing to use a search engine, but I think this is worth more of our attention than it has gotten so far.
I'm now almost certain talking to myself is useful, and I already find hindsight bias trying to convince me I've always been so sure. But I wasn't - I was suspicious because talking to yourself is an early warning sign of schizophrenia, and is frequent in dementia. But in those cases, it might simply be an autoregulatory response to failing working memory, not a pathogenetic element. After all, its memory enhancing effect is what the developmental psychologists say the kids use it for. I do expect social stigma, which is why I avoid talking to myself when around uninvolved or unsympathetic people, but my solving of complex problems tends to happen away from those anyway so that hasn't been an issue really.
So, what do you think? Useful?
It's unlikely that by pure chance we are currently writing the correct number of LW posts. So it might be useful to try to figure out if we're currently writing too few or too many LW posts. If commenters are evenly divided on this question then we're probably close to the optimal number; otherwise we have an opportunity to improve. Here's my case for why we should be writing more posts.
Let's say you came up with a new and useful life hack, you have a novel line of argument on an important topic, or you stumbled across some academic research that seems valuable and isn't frequently discussed on Less Wrong. How valuable would it be for you to share your findings by writing up a post for Less Wrong?
Recently I visited a friend of mine and commented on the extremely bright lights he had in his room. He referenced this LW post written over a year ago. That got me thinking. The bright lights in my friend's room make his life better every day, for a small upfront cost. And my friend is probably just one of tens or hundreds of people to use bright lights this way as a result of that post. Given that the technique seems to be effective, that number will probably continue going up, and will grow exponentially via word of mouth (useful memes tend to spread). So by my reckoning, chaosmage has created and will create a lot of utility. If they had kept that idea to themselves, I suspect they would have captured less than 1% of the total value to be had from the idea.
You can reach orders of magnitude more people writing an obscure Less Wrong comment than you can talking to a few people at a party in person. For example, at least 100 logged in users read this fairly obscure comment of mine. So if you're going to discuss an important topic, it's often best to do it online. Given enough eyeballs, all bugs in human reasoning are shallow.
Yes, people's time does have opportunity costs. But people are on Less Wrong because they need a break anyway. (If you're a LW addict, you might try the technique I describe in this post for dealing with your addiction. If you're dealing with serious cravings, for LW or video games or drugs or anything else, perhaps look at N-acetylcysteine... a variety of studies suggest it helps reduce cravings (behavioral addictions are pretty similar to drug addictions neurologically, btw), it has a good safety profile, and you can buy it on Amazon. It's not prescribed by doctors because it's not approved by the FDA. Yes, you could use willpower (it's worked so well in the past...) or you could hit the "stop craving things as much" button, and then try using willpower. Amazing what you can learn on Less Wrong, isn't it?)
And LW does a good job of indexing content by how much utility people are going to get out of it. It's easy to look at a post's keywords and score and guess if it's worth reading. If your post is bad it will vanish into obscurity and few will be significantly harmed. (Unless it's bad and inflammatory, or bad with a linkbait title... please don't write posts like that.) If your post is good, it will spread virally on its own and you'll generate untold utility.
Given that above-average posts get read much more than below-average posts, if your post's expected quality is average, sharing it on Less Wrong has a high positive expected utility. Like Paul Graham, I think we should be spreading our net wide and trying to capture all of the winners we can.
I'm going to call out a particular subset of LW commenters. If you're a commenter and you (a) have at least 100 karma, (b) it's over 80% positive, and (c) you have a draft post with valuable new ideas you've been sitting on for a while, you should totally polish it off and share it with us! In general, the better your track record, the more you should be inclined to share ideas that seem valuable. Worst case, you can delete your post and cut your losses.