Blind artifacts
This is the second of four short essays that say explicitly some things that I would tell an intrigued proto-rationalist before pointing them towards Rationality: AI to Zombies (and, by extension, most of LessWrong). For most people here, these essays will be very old news, as they talk about the insights that come even before the sequences. However, I've noticed recently that a number of fledgling rationalists haven't actually been exposed to all of these ideas, and there is power in saying the obvious.
This essay is cross-posted on MindingOurWay.
A note on what sort of artifact a brain is:
A brain is a specialty device that, when slammed against its surroundings in a particular way, changes so that its insides reflect its outsides. A brain is a precise, complex machine that continually hits nearby things just so, so that some of its inner bits start to correlate with the outside world.
Consider the photons bouncing off the chair in the room where I write this. In coarse summary, those photons slam into specialized proteins in the membrane of my photoreceptor cells, changing their shape and setting off a chain reaction that activates an enzyme that breaks down certain nucleotides, thereby changing the electrochemical gradient between the inside and the outside of the cell, preventing the release of certain neurotransmitters through its membrane. This lack of neurotransmitters causes nearby cells to undergo similar ionization events, and those cells transmit the signal from a number of nearby photoreceptor cells into the first layer of my retinal cells (again, by the mechanism of proteins changing shape and altering the electrochemical gradient). And that's just the very beginning of a looooong Rube Goldberg machine: the signal then makes its way down the retina (interacting, at each level, with signals from higher levels) until it's passed to the optic nerve, where it's passed to the visual cortex, where the specific pattern of nerve cell ionization events causes a specific pattern of neurons to fire, setting off a cascade of neurons-affecting-other-neurons in a domino effect that results in the inside of my brain containing a tiny summarized representation of a chair.
A brain is a complex piece of machinery that, when immersed in a big soup of photons while connected to light-sensors, undergoes a massive chain reaction that causes the inner parts of the brain to correlate with the things the photons bounced off of.
A brain is a machine that builds up mutual information between its internals and its externals.
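To make "mutual information" concrete, here is a minimal sketch that is not part of the original essay. It assumes a toy model: the world is a fair coin, and the "brain" is a single bit that copies the coin with some probability. The names `mutual_information`, `noisy_observer`, and `accuracy` are hypothetical, chosen only for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits.

    `joint` maps (x, y) pairs to probabilities that sum to 1.
    """
    # Marginal distributions of x and y.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Sum p(x,y) * log2(p(x,y) / (p(x) p(y))), skipping zero-probability cells.
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def noisy_observer(accuracy):
    """Toy assumption: a fair binary world state and a one-bit brain state
    that matches the world with probability `accuracy`."""
    return {
        (world, brain): 0.5 * (accuracy if world == brain else 1.0 - accuracy)
        for world in (0, 1)
        for brain in (0, 1)
    }


for accuracy in (0.5, 0.9, 1.0):
    bits = mutual_information(noisy_observer(accuracy))
    print(f"accuracy={accuracy}: {bits:.3f} bits")
```

In this toy model, a brain state that matches the world only half the time carries zero bits about it, and a perfect copy carries one full bit. Everything in between is a partial correlation between inside and outside.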
Desire is the direction, rationality is the magnitude
What follows is a series of four short essays that say explicitly some things that I would tell an intrigued proto-rationalist before pointing them towards Rationality: AI to Zombies (and, by extension, most of LessWrong). For most people here, these essays will be very old news, as they talk about the insights that come even before the sequences. However, I've noticed recently that a number of fledgling rationalists haven't actually been exposed to all of these ideas, and there is power in saying the obvious.
This essay is cross-posted on MindingOurWay.
A brief note on "rationality":
It's a common trope that thinking can be divided up into "hot, emotional thinking" and "cold, rational thinking" (with Kirk and Spock being the stereotypical offenders, respectively). The tropes say that the hot decisions are often stupid (and inconsiderate of consequences), while the cold decisions are often smart (but made by the sort of disconnected nerd that wears a lab coat and makes wacky technology). Of course (the trope goes) there are Deep Human Truths available to the hot reasoners that the cold reasoners know not.
Many people, upon encountering one who says they study the art of human rationality, jump to the conclusion that these "rationalists" are people who reject the hot reasoning entirely, attempting to disconnect themselves from their emotions once and for all, in order to avoid the rash mistakes of "hot reasoning." Many think that these aspiring rationalists are attempting some sort of dark ritual to sacrifice emotion once and for all, while failing to notice that the emotions they wish to sacrifice are the very things which give them their humanity. "Love is hot and rash and irrational," they say, "but you sure wouldn't want to sacrifice it." Understandably, many people find the prospect of "becoming more rational" rather uncomfortable.
So heads up: this sort of emotional sacrifice has little to do with the word "rationality" as it is used in Rationality: AI to Zombies.
When Rationality: AI to Zombies talks about "rationality," it's not talking about the "cold" part of hot vs cold reasoning, it's talking about the reasoning part.
One way or another, we humans are reasoning creatures. Sometimes, when time pressure is bearing down on us, we make quick decisions and follow our split-second intuitions. Sometimes, when the stakes are incredibly high and we have time available, we deploy the machinery of logic, in places where we trust it more than our impulses. But in both cases, we are reasoning. Whether our reasoning be hot or cold or otherwise, there are better and worse ways to reason.
(And, trust me, brains have found a whole lot of the bad ones. What do you expect, when you run programs that screwed themselves into existence on computers made of meat?)
The rationality of Rationality: AI to Zombies isn't about using cold logic to choose what to care about. Reasoning well has little to do with what you're reasoning towards. If your goal is to enjoy life to the fullest and love without restraint, then better reasoning (while hot or cold, while rushed or relaxed) will help you do so. But if your goal is to annihilate as many puppies as possible, then this-kind-of-rationality will also help you annihilate more puppies.
(Unfortunately, this usage of the word "rationality" does not match the colloquial usage. I wish we had a better word for the study of how to improve one's reasoning in all its forms that didn't also evoke images of people sacrificing their emotions on the altar of cold logic. But alas, that ship has sailed.)
If you are considering walking the path towards rationality-as-better-reasoning, then please, do not sacrifice your warmth. Your deepest desires are not a burden, but a compass. Rationality of this kind is not about changing where you're going, it's about changing how far you can go.
People often label their deepest desires "irrational." They say things like "I know it's irrational, but I love my partner, and if they were taken from me, I'd move heaven and earth to get them back." To which I say: when I point towards "rationality," I point not towards that which would rob you of your desires, but rather towards that which would make you better able to achieve them.
That is the sort of rationality that I suggest studying, when I recommend reading Rationality: AI to Zombies.
MIRI's technical agenda: an annotated bibliography, and other updates
This week, I'm pleased to present Aligning Superintelligence with Human Interests: An Annotated Bibliography. This annotated bibliography complements our technical agenda and the six supporting papers that I've released over the past few months. Once you've read the supporting papers, this annotated bibliography will help you figure out what to read next on any given topic, in order to get to the cutting edge.
This annotated bibliography concludes my series of updates on MIRI's technical agenda. To review, I've presented a series of eight papers which sketch out MIRI's research strategy and overview a number of active research areas. Those papers are:
- Aligning Superintelligence with Human Interests: A Technical Research Agenda
- Formalizing Two Problems of Realistic World Models
- Toward Idealized Decision Theory
- Questions of Reasoning Under Logical Uncertainty
- Vingean Reflection: Reliable Reasoning for Self-Improving Agents
- Corrigibility
- The Value Learning Problem
- Aligning Superintelligence with Human Interests: An Annotated Bibliography
The Value Learning Problem
I'm pleased to announce a new paper from MIRI about The Value Learning Problem.
Abstract:
A superintelligent machine would not automatically act as intended: it will act as programmed, but the fit between human intentions and formal specification could be poor. We discuss methods by which a system could be constructed to learn what to value. We highlight open problems specific to inductive value learning (from labeled training data), and raise a number of questions about the construction of systems which model the preferences of their operators and act accordingly.
This is the sixth of six papers supporting the MIRI technical agenda. It motivates the need for value learning, a bit, and gives some early thoughts on how the problem could be approached (while pointing to some early open problems in the field).
I'm pretty excited to have the technical agenda and all its supporting papers published. Next week I'll be posting an annotated bibliography that gives more reading for each subject. The introduction to the value learning paper has been reproduced below.
Formalizing Two Problems of Realistic World Models
I'm pleased to announce a new paper from MIRI: Formalizing Two Problems of Realistic World Models.
Abstract:
An intelligent agent embedded within the real world must reason about an environment which is larger than the agent, and learn how to achieve goals in that environment. We discuss attempts to formalize two problems: one of induction, where an agent must use sensory data to infer a universe which embeds (and computes) the agent, and one of interaction, where an agent must learn to achieve complex goals in the universe. We review related problems formalized by Solomonoff and Hutter, and explore challenges that arise when attempting to formalize analogous problems in a setting where the agent is embedded within the environment.
This is the fifth of six papers discussing active research topics that we've been looking into at MIRI. It discusses a few difficulties that arise when attempting to formalize problems of induction and evaluation in settings where an agent is attempting to learn about (and act upon) a universe from within. These problems have been much discussed on LessWrong; for further reading, see the links below. This paper is intended to better introduce the topic, and motivate it as relevant to FAI research.
- Intelligence Metrics with Naturalized Induction using UDT
- Building Phenomenological Bridges
- Failures of an Embodied AIXI
- The Naturalized Induction wiki page
The (rather short) introduction to the paper is reproduced below.
Vingean Reflection: Reliable Reasoning for Self-Improving Agents
I'm pleased to announce a new paper from MIRI: Vingean Reflection: Reliable Reasoning for Self-Improving Agents.
Abstract:
Today, human-level machine intelligence is in the domain of futurism, but there is every reason to expect that it will be developed eventually. Once artificial agents become able to improve themselves further, they may far surpass human intelligence, making it vitally important to ensure that the result of an "intelligence explosion" is aligned with human interests. In this paper, we discuss one aspect of this challenge: ensuring that the initial agent's reasoning about its future versions is reliable, even if these future versions are far more intelligent than the current reasoner. We refer to reasoning of this sort as Vingean Reflection.
A self-improving agent must reason about the behavior of its smarter successors in abstract terms, since if it could predict their actions in detail, it would already be as smart as them. This is called the Vingean principle, and we argue that theoretical work on Vingean reflection should focus on formal models that reflect this principle. However, the framework of expected utility maximization, commonly used to model rational agents, fails to do so. We review a body of work which instead investigates agents that use formal proofs to reason about their successors. While it is unlikely that real-world agents would base their behavior entirely on formal proofs, this appears to be the best currently available formal model of abstract reasoning, and work in this setting may lead to insights applicable to more realistic approaches to Vingean reflection.
This is the fourth in a series of six papers discussing various components of MIRI's technical research agenda. It motivates the field of Vingean reflection, which studies methods by which agents can reason reliably about agents that are more intelligent than themselves. Toy models previously used to study this problem include the "tiling agent" models that have been discussed on LessWrong. The introduction to the paper runs as follows:
Questions of Reasoning Under Logical Uncertainty
I'm pleased to announce a new paper from MIRI: Questions of Reasoning Under Logical Uncertainty.
Abstract:
A logically uncertain reasoner would be able to reason as if they know both a programming language and a program, without knowing what the program outputs. Most practical reasoning involves some logical uncertainty, but no satisfactory theory of reasoning under logical uncertainty yet exists. A better theory of reasoning under logical uncertainty is needed in order to develop the tools necessary to construct highly reliable artificial reasoners. This paper introduces the topic, discusses a number of historical results, and describes a number of open problems.
Following Corrigibility and Toward Idealized Decision Theory, this is the third in a series of six papers motivating MIRI's technical research agenda. This paper mostly motivates and summarizes the state of the field, and contains one very minor new technical result. Readers looking for more technical meat can find it in Paul Christiano's paper Non-Omniscience, Probabilistic Inference, and Metamathematics, published mid-2014. This paper is instead intended to motivate the study of logical uncertainty as relevant to the design of highly reliable smarter-than-human systems. The introduction runs as follows:
MIRI's technical research agenda
I'm pleased to announce the release of Aligning Superintelligence with Human Interests: A Technical Research Agenda, written by Benja and me (with help and input from many, many others). This document summarizes and motivates MIRI's current technical research agenda.
I'm happy to answer questions about this document, but expect slow response times, as I'm travelling for the holidays. The introduction of the paper is included below. (See the paper for references.)
New paper from MIRI: "Toward idealized decision theory"
I'm pleased to announce a new paper from MIRI: Toward Idealized Decision Theory.
Abstract:
This paper motivates the study of decision theory as necessary for aligning smarter-than-human artificial systems with human interests. We discuss the shortcomings of two standard formulations of decision theory, and demonstrate that they cannot be used to describe an idealized decision procedure suitable for approximation by artificial systems. We then explore the notions of strategy selection and logical counterfactuals, two recent insights into decision theory that point the way toward promising paths for future research.
Following the Corrigibility paper, this is the second in a series of six papers motivating MIRI's active research areas. Also included in the series will be a technical agenda, which motivates all six research areas and describes the reasons why we have selected these topics in particular, and an annotated bibliography, which compiles a fair bit of related work. I plan to post one paper every week or two for the next few months.
I've decided to start with the decision theory paper, as it's one of the meatiest. This paper compiles and summarizes quite a bit of work on decision theory that was done right here on LessWrong. There is a lot more to be said on the subject of decision theory than can fit into a single paper, but I think this one does a fairly good job of describing why we're interested in the field and summarizing some recent work in the area. The introduction is copied below. Enjoy!
MIRI Research Guide
We've recently published a guide to MIRI's research on MIRI's website. It overviews some of the major open problems in FAI research, and provides reading lists for those who want to get familiar with MIRI's technical agenda.
This guide updates and replaces the MIRI course list that started me on the path of becoming a MIRI researcher over a year ago. Many thanks to Louie Helm, who wrote the previous version.
This guide is a bit more focused than the old course list, and points you not only towards prerequisite textbooks but also towards a number of relevant papers and technical reports in something approximating the "appropriate order." By following this guide, you can get yourself pretty close to the cutting edge of our technical research (barring some results that we haven't written up yet). If you intend to embark on that quest, you are invited to let me know; I can provide both guidance and encouragement along the way.
I've reproduced the guide below. The canonical version is at intelligence.org/research-guide, and I intend to keep that version up to date. This post will not be kept current.
Finally, a note on content: the guide below discusses a number of FAI research subfields. The goal is to overview, rather than motivate, those subfields. These sketches are not intended to carry any arguments. Rather, they attempt to convey our current conclusions to readers who are already extending us significant charity. We're hard at work producing a number of documents describing why we think these particular subfields are important. (The first was released a few weeks ago; the rest should be published over the next two months.) In the meantime, please understand that the research guide is neither able nor intended to provide strong motivation for these particular problems.
Friendly AI theory currently isn't about implementation, it's about figuring out how to ask the right questions. Even if we had unlimited finite computing resources and a solid understanding of general intelligence, we still wouldn't know how to specify a system that would reliably have a positive impact during and after an intelligence explosion. Such is the state of our ignorance.
For now, MIRI's research program aims to develop solutions that assume access to unbounded finite computing power, not because unbounded solutions are feasible, but in the hope that these solutions will help us understand which questions need to be answered in order to lay the groundwork for the eventual specification of a Friendly AI. Hence, our current research is primarily in mathematics (as opposed to software engineering or machine learning, as many expect).
This guide outlines the topics that one can study to become able to contribute to one or more of MIRI’s active research areas.