Comment author: siIver 09 April 2017 06:02:46PM *  0 points

Can someone briefly explain to me the difference between functional and updateless decision theory / where FDT performs better? That would be much appreciated. I have not yet read FDT because it does not mention UDT (I checked) and I want to understand why UDT needs "fixing" before I invest the time.

Comment author: RobbBB 10 April 2017 06:44:58PM *  3 points

"UDT" is ambiguous and has been used to refer to a number of different approaches; "FDT" is a new name for the central cluster of UDT-ish theories (excluding some similar theories like TDT), intended to be less ambiguous and easier to explain (especially to working decision theorists).

In part it's easier to explain because it's formulated in a more CDT-like fashion (whereas Wei Dai's formulations are more EDT-like), and in part it's easier to explain because it builds in less content: accepting FDT doesn't necessarily require a commitment to some of the philosophical ideas associated with updatelessness and logical prior probability that MIRI, Wei Dai, or other FDT proponents happen to accept. In particular, some of Nate's claims in the linked post are stronger than is strictly required for FDT.

[Link] MIRI: Decisions are for making bad outcomes inconsistent

7 RobbBB 09 April 2017 03:42AM

CHCAI/MIRI research internship in AI safety

5 RobbBB 13 February 2017 06:34PM

The Center for Human-Compatible AI (CHCAI) and the Machine Intelligence Research Institute (MIRI) are looking for talented, driven, and ambitious technical researchers for a summer research internship.



CHCAI is a research center based at UC Berkeley with PIs including Stuart Russell, Pieter Abbeel and Anca Dragan. CHCAI describes its goal as "to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems".

MIRI is an independent research nonprofit located near the UC Berkeley campus with a mission of helping ensure that smarter-than-human AI has a positive impact on the world.

CHCAI's research focus includes work on inverse reinforcement learning and human-robot cooperation (link), while MIRI's focus areas include task AI and computational reflection (link). Both groups are also interested in theories of (bounded) rationality that may help us develop a deeper understanding of general-purpose AI agents.


To apply:

1. Fill in the form here: https://goo.gl/forms/bDe6xbbKwj1tgDbo1

2. Send an email to beth.m.barnes@gmail.com with the subject line "AI safety internship application", attaching your CV, a piece of technical writing on which you were the primary author, and your research proposal.

The research proposal should be one to two pages in length. It should outline a problem you think you can make progress on over the summer, and some approaches to tackling it that you consider promising. We recommend reading over CHCAI's annotated bibliography and the concrete problems agenda as good sources for open problems in AI safety, if you haven't previously done so. You should target your proposal at a specific research agenda or a specific adviser’s interests. Advisers' interests include:

• Andrew Critch (CHCAI, MIRI): anything listed in CHCAI's open technical problems; negotiable reinforcement learning; game theory for agents with transparent source code (e.g., "Program Equilibrium" and "Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents").

• Daniel Filan (CHCAI): the contents of "Foundational Problems," "Corrigibility," "Preference Inference," and "Reward Engineering" in CHCAI's open technical problems list.

• Dylan Hadfield-Menell (CHCAI): application of game-theoretic analysis to models of AI safety problems (specifically by people who come from a theoretical economics background); formulating and analyzing AI safety problems as CIRL games; the relationships between AI safety and principal-agent models / theories of incomplete contracting; reliability engineering in machine learning; questions about fairness.

• Jessica Taylor, Scott Garrabrant, and Patrick LaVictoire (MIRI): open problems described in MIRI's agent foundations and alignment for advanced ML systems research agendas.

This application does not bind you to work on your submitted proposal. Its purpose is to demonstrate your ability to make concrete suggestions for how to make progress on a given research problem.


Who we're looking for:

This is a new and somewhat experimental program. You’ll need to be self-directed, and you'll need to have enough knowledge to get started tackling the problems. The supervisors can give you guidance on research, but they aren’t going to be teaching you the material. However, if you’re deeply motivated by research, this should be a fantastic experience. Successful applicants will provide examples of technical writing, demonstrate motivation and aptitude for research, and produce a concrete research proposal. We expect most successful applicants will either:

• have or be pursuing a PhD closely related to AI safety;

• have or be pursuing a PhD in an unrelated field, but currently pivoting to AI safety, with evidence of sufficient knowledge and motivation for AI safety research; or

• be an exceptional undergraduate or masters-level student with concrete evidence of research ability (e.g., publications or projects) in an area closely related to AI safety.



Program dates are flexible, and may vary from individual to individual. However, our assumption is that most people will come for twelve weeks, starting in early June. The program will take place in the San Francisco Bay Area. Basic living expenses will be covered. We can’t guarantee that housing will be arranged for you, but we can provide assistance in finding housing if needed. Interns who are not US citizens will most likely need to apply for J-1 intern visas. Once you have been accepted to the program, we can help you with the required documentation.



The deadline for applications is March 1. Applicants should hear back about decisions by March 20.

Comment author: RobbBB 04 January 2017 04:58:54PM *  3 points

The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell.

Stuart Russell was the primary author of the FLI research priorities document, so I'd expect CHCAI's work to focus in on some of the problems sketched there. Based on CHCAI's publication page, their focus areas will probably include value learning, human-robot cooperation, and theories of bounded rationality. Right now, Russell's group is spending a lot of time on cooperative inverse reinforcement learning and corrigibility.

This slide from a recent talk by Critch seems roughly right to me: https://intelligence.org/wp-content/uploads/2017/01/hotspot-slide.png

So, why isn't there an XPrize for AI safety?

A prize fund is one of the main side-projects MIRI has talked about wanting to do for the last few years, if we could run a sufficiently large one -- more or less for the reasons you mention. Ideally the AI safety community would offer a diversity of prizes representing different views about what kinds of progress we'd be most excited by.

If funds for this materialize at some point, the main challenge will be that the most important conceptual breakthroughs right now involve going from mostly informal ideas to crude initial formalisms. This introduces some subjectivity in deciding whether the formalism really captures the key original idea, and also makes it harder for outside researchers to understand what kinds of work we're looking for. (MIRI's research team's focus is exactly on the parts of the problem that are hardest to design a prize for.) It's easier to come up with benchmarks in areas where there's already been a decent amount of technical progress, which would be quite valuable on its own, though it means potentially neglecting the most important things to work on.

Comment author: Gunnar_Zarncke 14 October 2016 10:56:15PM 2 points

I'm not sure this has the best visibility here in Main. I only noticed it just now because I haven't looked in Main for ages. And it wasn't featured in discussions, or was it?

Comment author: RobbBB 24 October 2016 02:36:31AM 2 points

There's a discussion post that mentions the fundraiser here, along with other news: http://lesswrong.com/r/discussion/lw/o0d/miri_ama_plus_updates/

MIRI AMA plus updates

11 RobbBB 11 October 2016 11:52PM

MIRI is running an AMA on the Effective Altruism Forum tomorrow (Wednesday, Oct. 12): Ask MIRI Anything. Questions are welcome in the interim!

Nate also recently posted a more detailed version of our 2016 fundraising pitch to the EA Forum. One of the additions is about our first funding target:

We feel reasonably good about our chance of hitting target 1, but it isn't a sure thing; we'll probably need to see support from new donors in order to hit our target, to offset the fact that a few of our regular donors are giving less than usual this year.

The Why MIRI's Approach? section also touches on new topics that we haven't talked about in much detail in the past, but plan to write up some blog posts about in the future. In particular:

Loosely speaking, we can imagine the space of all smarter-than-human AI systems as an extremely wide and heterogeneous space, in which "alignable AI designs" is a small and narrow target (and "aligned AI designs" smaller and narrower still). I think that the most important thing a marginal alignment researcher can do today is help ensure that the first generally intelligent systems humans design are in the “alignable” region. I think that this is unlikely to happen unless researchers have a fairly principled understanding of how the systems they're developing reason, and how that reasoning connects to the intended objectives.

Most of our work is therefore aimed at seeding the field with ideas that may inspire more AI research in the vicinity of (what we expect to be) alignable AI designs. When the first general reasoning machines are developed, we want the developers to be sampling from a space of designs and techniques that are more understandable and reliable than what’s possible in AI today.

In other news, we've uploaded a new intro talk on our most recent result, "Logical Induction," that goes into more of the technical details than our previous talk.

See also Shtetl-Optimized and n-Category Café for recent discussions of the paper.

In response to comment by timujin on Zombies Redacted
Comment author: UmamiSalami 06 July 2016 08:29:09PM *  -1 points

This argument is not going to win over their heads and hearts. It's clearly written for a reductionist reader, who accepts concepts such as Occam's Razor and knowing-what-a-correct-theory-looks-like.

I would suggest that people who have already studied this issue in depth would have other reasons for rejecting the above blog post. However, you are right that philosophers in general don't use Occam's Razor as a common tool and they don't seem to make assumptions about what a correct theory "looks like."

If conceivability does not imply logical possibility, then even if you can imagine a Zombie world, it does not mean that the Zombie world is logically possible.

Chalmers does not claim that p-zombies are logically possible, he claims that they are metaphysically possible. Chalmers already believes that certain atomic configurations necessarily imply consciousness, by dint of psychophysical laws.

The claim that certain atomic configurations just are consciousness is what the physicalist claims, but that is what is contested by knowledge arguments: we can't really conceive of a way for consciousness to be identical with physical states.

Comment author: RobbBB 07 July 2016 04:13:57AM 0 points

Chalmers doesn't think 'metaphysical possibility' is a well-specified idea. He thinks p-zombies are logically possible, and that the purely physical facts in our world do not logically entail the phenomenal facts; the phenomenal facts are 'further facts.'

In response to Zombies Redacted
Comment author: RobbBB 03 July 2016 08:30:32PM *  16 points

The "conceivability" of zombies is accepted by a substantial fraction, possibly a majority, of academic philosophers of consciousness.

This can be made precise. According to the 2009 PhilPapers Survey (sent to all faculty at the top 89 Ph.D-granting philosophy departments in the English-speaking world as ranked by the Philosophical Gourmet Report, plus 10 high-prestige non-Anglophone departments), about 2/3 of professional philosophers of mind think zombies are conceivable, though most of these think physicalism is true anyway. Specifically, 91 of the 191 respondents (47.6%) said zombies are conceivable but not metaphysically possible; 47 (24.6%) said they were inconceivable; 35 (18.3%) said they're (conceivable and) metaphysically possible; and the other 9.4% were agnostic/undecided or rejected all three options.
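For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the percentages from the raw counts quoted above. The variable and function names are illustrative only, and the size of the "other" group is derived by subtraction on the assumption that the four groups together account for all 191 respondents; it is not a figure taken from the survey itself.

```python
# Counts quoted above for the 191 philosophy-of-mind respondents.
TOTAL = 191
counts = {
    "conceivable but not metaphysically possible": 91,
    "inconceivable": 47,
    "metaphysically possible": 35,
}

# Assumption: everyone else is agnostic/undecided or rejected all three
# options, so this count is derived by subtraction, not quoted.
counts["other"] = TOTAL - sum(counts.values())


def share(n, total=TOTAL):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {answer: share(n) for answer, n in counts.items()}

# "Conceivable" in the broad sense covers both groups that grant
# conceivability, whatever they say about metaphysical possibility.
conceivable_total = share(
    counts["conceivable but not metaphysically possible"]
    + counts["metaphysically possible"]
)

for answer, pct in shares.items():
    print(f"{answer}: {pct}%")
print(f"conceivable (either way): {conceivable_total}%")
```

The combined "conceivable" share is where the "about 2/3" figure comes from: it adds the respondents who accept conceivability but deny metaphysical possibility to those who accept both.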

Looking at professional philosophers as a whole in the relevant departments, including non-philosophers-of-mind, 35.6% say zombies are conceivable but not metaphysically possible, 16% say they're inconceivable, 23.3% say they're metaphysically possible, 17% say they're undecided or insufficiently familiar with the issue (or they skipped the question), and 8.2% rejected all three options. So the average top-tier Anglophone philosopher of mind is more likely to reject zombies than is the average top-tier Anglophone philosopher. (Relatedly, 22% of philosophers of mind accept or lean toward 'non-physicalism', vs. 27% of philosophers in general.)

There is a stuff of consciousness which is not yet understood, an extraordinary super-physical stuff that visibly affects our world; and this stuff is what makes us talk about consciousness.

Chalmers' core objection to interactionism, I think, is that any particular third-person story you can tell about the causal effects of consciousness could also be told without appealing to consciousness. E.g., if you think consciousness intervenes on the physical world by sometimes spontaneously causing wavefunctions to collapse (setting aside that Chalmers and most LWers reject collapse...), you could just as easily tell a story in which wavefunctions just spontaneously collapse without any mysterious redness getting involved; or a story in which they mysteriously collapse when mysterious greenness occurs rather than redness, or when an alien color occurs.

Chalmers thinks any argument for thinking that the mysterious redness of red is causally indispensable for dualist interactionism should also allow that the mysterious redness of red is an ordinary physical property that's indispensable for physical interactions. Quoting "Moving Forward on the Problem of Consciousness":

The real "epiphenomenalism" problem, I think, does not arise from the causal closure of the physical world. Rather, it arises from the causal closure of the world! Even on an interactionist picture, there will be some broader causally closed story that explains behavior, and such a story can always be told in a way that neither includes nor implies experience. Even on the interactionist picture, we can view minds as just further nodes in the causal network, like the physical nodes, and the fact that these nodes are experiential is inessential to the causal dynamics. The basic worry arises not because experience is logically independent of physics, but because it is logically independent of causal dynamics more generally.

The interactionist has a reasonable solution to this problem, I think. Presumably, the interactionist will respond that some nodes in the causal network are experiential through and through. Even though one can tell the causal story about psychons without mentioning experience, for example, psychons are intrinsically experiential all the same. Subtract experience, and there is nothing left of the psychon but an empty place-marker in a causal network, which is arguably to say there is nothing left at all. To have real causation, one needs something to do the causing; and here, what is doing the causing is experience.

I think this solution is perfectly reasonable; but once the problem is pointed out this way, it becomes clear that the same solution will work in a causally closed physical world. Just as the interactionist postulates that some nodes in the causal network are intrinsically experiential, the "epiphenomenalist" can do the same.

This brings up a terminology-ish point:

The technical term for the belief that consciousness is there, but has no effect on the physical world, is epiphenomenalism.

Chalmers denies that he's an epiphenomenalist. Rather he says (in "Panpsychism and Panprotopsychism"):

I think that substance dualism (in its epiphenomenalist and interactionist forms) and Russellian monism (in its panpsychist and panprotopsychist forms) are the two serious contenders in the metaphysics of consciousness, at least once one has given up on standard physicalism. (I divide my own credence fairly equally between them.)

Quoting "Moving Forward" again:

Here we can exploit an idea that was set out by Bertrand Russell (1926), and which has been developed in recent years by Grover Maxwell (1978) and Michael Lockwood (1989). This is the idea that physics characterizes its basic entities only extrinsically, in terms of their causes and effects, and leaves their intrinsic nature unspecified. For everything that physics tells us about a particle, for example, it might as well just be a bundle of causal dispositions; we know nothing of the entity that carries those dispositions. The same goes for fundamental properties, such as mass and charge: ultimately, these are complex dispositional properties (to have mass is to resist acceleration in a certain way, and so on). But whenever one has a causal disposition, one can ask about the categorical basis of that disposition: that is, what is the entity that is doing the causing?

One might try to resist this question by saying that the world contains only dispositions. But this leads to a very odd view of the world indeed, with a vast amount of causation and no entities for all this causation to relate! It seems to make the fundamental properties and particles into empty placeholders, in the same way as the psychon above, and thus seems to free the world of any substance at all. It is easy to overlook this problem in the way we think about physics from day to day, given all the rich details of the mathematical structure that physical theory provides; but as Stephen Hawking (1988) has noted, physical theory says nothing about what puts the "fire" into the equations and grounds the reality that these structures describe. The idea of a world of "pure structure" or of "pure causation" has a certain attraction, but it is not at all clear that it is coherent.

So we have two questions: (1) what are the intrinsic properties underlying physical reality?; and (2) where do the intrinsic properties of experience fit into the natural order? Russell's insight, developed by Maxwell and Lockwood, is that these two questions fit with each other remarkably well. Perhaps the intrinsic properties underlying physical dispositions are themselves experiential properties, or perhaps they are some sort of proto-experiential properties that together constitute conscious experience. This way, we locate experience inside the causal network that physics describes, rather than outside it as a dangler; and we locate it in a role that one might argue urgently needed to be filled. And importantly, we do this without violating the causal closure of the physical. The causal network itself has the same shape as ever; we have just colored in its nodes.

This idea smacks of the grandest metaphysics, of course, and I do not know that it has to be true. But if the idea is true, it lets us hold on to irreducibility and causal closure and nevertheless deny epiphenomenalism. By placing experience inside the causal network, it now carries a causal role. Indeed, fundamental experiences or proto-experiences will be the basis of causation at the lowest levels, and high-level experiences such as ours will presumably inherit causal relevance from the (proto)-experiences from which they are constituted. So we will have a much more integrated picture of the place of consciousness in the natural order.

This is also (a more honest name for) the non-physicalist view that sometimes gets called "Strawsonian physicalism." But this view seems to be exactly as vulnerable to your criticisms as traditional epiphenomenalism, because the "causal role" in question doesn't seem to be a difference-making role -- it's maybe "causal" in some metaphysical sense, but it's not causal in a Bayesian or information-theoretic sense, a sense that would allow a brain to nonrandomly update in the direction of Strawsonian physicalism / Russellian monism by computing evidence.

I'm not sure what Chalmers would say to your argument in detail, though he's responded to the terminological point about epiphenomenalism. If he thinks Russellian monism is a good response, then either I'm misunderstanding how weird Russellian monism is (in particular, how well it can do interactionism-like things), or Chalmers is misunderstanding how general your argument is. The latter is suggested by the fact that Chalmers thinks your argument weighs against epiphenomenalism but not against Russellian monism in this old LessWrong comment.

It might be worth e-mailing him this updated "Zombies" post, with this comment highlighted so that we don't get into the weeds of debating whose definition of "epiphenomenalism" is better.

Comment author: RobbBB 17 April 2016 05:34:26AM 2 points

I removed the second post (What's in a Name?) from the list because it's been... well, debunked. From a recent SSC link post:

A long time ago I blogged about the name preference effect – ie that people are more positively disposed towards things that sound like their name – so I might like science more because Scott and science start with the same two letters. A bunch of very careful studies confirmed this effect even after apparently controlling for everything. Now Uri Simonsohn says – too bad, it’s all spurious. This really bothers me because I remember specifically combing over these studies and finding them believable at the time. Yet another reminder that things are worse than I thought.

Comment author: [deleted] 01 January 2016 01:29:31AM *  0 points

One common question we hear about alignment research runs analogously to: "If you don't develop calculus, what bad thing happens to your rocket? Do you think the pilot will be struggling to make a course correction, and find that they simply can't add up the tiny vectors fast enough? That scenario just doesn't sound plausible."

Actually, that sounds entirely plausible.

The case is similar with, e.g., attempts to develop theories of logical uncertainty. The problem is not that we visualize a specific AI system encountering a catastrophic failure because it mishandles logical uncertainty; the problem is that all our existing tools for describing the behavior of rational agents assume that those agents are logically omniscient, making our best theories incommensurate with our best practical AI designs.

Well, of course, part of the problem is that the best theories of "rational agents" try to assume Homo Economicus into being, and insist on cutting off all the ways in which physically-realizable minds cannot fit. So we need a definition of rationality that makes sense in a world where agents don't have completed infinities of computational power and can be modified by the environment and don't come with built-in utility functions that necessarily map physically realizable situations to the real numbers.

If we could program that computer to reliably achieve some simple goal (such as producing as much diamond as possible), then a large share of the AI alignment research would be completed.

Wait wait wait. You're saying that the path between Clippy and a prospective completed FAI is shorter than the path between today's AI state-of-the-art and Clippy? Because it sounds like you're saying that, even though I really don't expect you to say that.

On the upside, I do think we can spell out a research program to get us there, which will be grounded in current computational cog-sci and ML literature, which will also help with Friendliness/alignment engineering, which will not engender arguments with Jessica over math this time.

But now for the mandatory remark: you are insane and will kill us all ;-), rabble rabble rabble.

Comment author: RobbBB 02 January 2016 08:34:55AM *  3 points

Clippy is a thought experiment used to illustrate two ideas: terminal goals are orthogonal to capabilities ("the AI does not love you"), and capable goal-directed systems tend to acquire instrumental goals like resource acquisition and self-preservation ("the AI does not hate you, but..."). This highlights the fact that highly capable AI can be dangerous even if it's reliably pursuing some known goal and the goal isn't ambitious or malicious. For that reason, Clippy comes up a lot as an intuition pump for why we need to get started early on safety research.

But 'a system causes harm in the course of reliably pursuing some known, stable, obviously-non-humane goal' is a very small minority of the actual disaster scenarios MIRI researchers are worried about. Not because it looks easy to go from a highly reliable diamond maximizer to an aligned superintelligence, but because there appear to be a larger number of ways things can go wrong before we get to that point.

  1. We can fail to understand an advanced AI system well enough to know how 'goals' are encoded in it, forcing us to infer and alter goals indirectly.

  2. We can understand the system's 'goals,' but have them be in the wrong idiom for a safe superintelligence (e.g., rewards for a reinforcement learner).

  3. We can understand the system well enough to specify its goals, but not understand our own goals fully or precisely enough to specify them correctly. We come up with an intuitively 'friendly' goal (something more promising-sounding than 'maximize the number of paperclips'), but it's still the wrong goal.

  4. Similarly: We can understand the system well enough to specify safe behavior in its initial context, but the system stops being safe after it or its environment undergoes a change. An example of this is instability under self-modification.

  5. We can design advanced AI systems we don't realize (or don't care) have consequentialist goals. This includes systems we don't realize are powerful optimizers, e.g., ones whose goal-oriented behavior may depend in complicated ways on the interaction of multiple AI systems, or ones that function as unnoticed subsystems of non-consequentialists.
