In response to comment by timujin on Zombies Redacted
Comment author: UmamiSalami 06 July 2016 08:29:09PM *  -1 points [-]

This argument is not going to win over their heads and hearts. It's clearly written for a reductionist reader, who accepts concepts such as Occam's Razor and knowing-what-a-correct-theory-looks-like.

I would suggest that people who have already studied this issue in depth would have other reasons for rejecting the above blog post. However, you are right that philosophers in general don't use Occam's Razor as a common tool and they don't seem to make assumptions about what a correct theory "looks like."

If conceivability does not imply logical possibility, then even if you can imagine a Zombie world, it does not mean that the Zombie world is logically possible.

Chalmers does not claim that p-zombies are logically possible, he claims that they are metaphysically possible. Chalmers already believes that certain atomic configurations necessarily imply consciousness, by dint of psychophysical laws.

The claim that certain atomic configurations just are consciousness is what the physicalist claims, but that is what is contested by knowledge arguments: we can't really conceive of a way for consciousness to be identical with physical states.

Comment author: RobbBB 07 July 2016 04:13:57AM 0 points [-]

Chalmers doesn't think 'metaphysical possibility' is a well-specified idea. He thinks p-zombies are logically possible, but that the purely physical facts in our world do not logically entail the phenomenal facts; the phenomenal facts are 'further facts.'

In response to Zombies Redacted
Comment author: RobbBB 03 July 2016 08:30:32PM *  16 points [-]

The "conceivability" of zombies is accepted by a substantial fraction, possibly a majority, of academic philosophers of consciousness.

This can be made precise. According to the 2009 PhilPapers Survey (sent to all faculty at the top 89 Ph.D.-granting philosophy departments in the English-speaking world as ranked by the Philosophical Gourmet Report, plus 10 high-prestige non-Anglophone departments), about 2/3 of professional philosophers of mind think zombies are conceivable, though most of these think physicalism is true anyway. Specifically, 91 of the 191 respondents (47.6%) said zombies are conceivable but not metaphysically possible; 47 (24.6%) said they were inconceivable; 35 (18.3%) said they're (conceivable and) metaphysically possible; and the other 9.4% were agnostic/undecided or rejected all three options.

Looking at professional philosophers as a whole in the relevant departments, including non-philosophers-of-mind, 35.6% say zombies are conceivable, 16% say they're inconceivable, 23.3% say they're metaphysically possible, 17% say they're undecided or insufficiently familiar with the issue (or they skipped the question), and 8.2% rejected all three options. So the average top-tier Anglophone philosopher of mind is more likely to reject zombies than is the average top-tier Anglophone philosopher. (Relatedly, 22% of philosophers of mind accept or lean toward 'non-physicalism', vs. 27% of philosophers in general.)
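As a quick arithmetic sanity check, the percentages above follow from the raw counts (a throwaway Python sketch; the counts are just the ones quoted in this comment, not re-derived from the PhilPapers data):

```python
# Sanity-check the philosopher-of-mind breakdown quoted above.
respondents = 191
counts = {
    "conceivable but not metaphysically possible": 91,
    "inconceivable": 47,
    "conceivable and metaphysically possible": 35,
}
other = respondents - sum(counts.values())  # agnostic/undecided/rejected all three

for label, n in counts.items():
    print(f"{label}: {n / respondents:.1%}")
print(f"other: {other / respondents:.1%}")

# 'Conceivable' in some sense = first group + third group,
# which is where the 'about 2/3' figure comes from.
conceivable = (counts["conceivable but not metaphysically possible"]
               + counts["conceivable and metaphysically possible"])
print(f"conceivable (either way): {conceivable / respondents:.1%}")
```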

There is a stuff of consciousness which is not yet understood, an extraordinary super-physical stuff that visibly affects our world; and this stuff is what makes us talk about consciousness.

Chalmers' core objection to interactionism, I think, is that any particular third-person story you can tell about the causal effects of consciousness could also be told without appealing to consciousness. E.g., if you think consciousness intervenes on the physical world by sometimes spontaneously causing wavefunctions to collapse (setting aside that Chalmers and most LWers reject collapse...), you could just as easily tell a story in which wavefunctions just spontaneously collapse without any mysterious redness getting involved; or a story in which they mysteriously collapse when mysterious greenness occurs rather than redness, or when an alien color occurs.

Chalmers thinks any argument for thinking that the mysterious redness of red is causally indispensable for dualist interactionism should also allow that the mysterious redness of red is an ordinary physical property that's indispensable for physical interactions. Quoting "Moving Forward on the Problem of Consciousness":

The real "epiphenomenalism" problem, I think, does not arise from the causal closure of the physical world. Rather, it arises from the causal closure of the world! Even on an interactionist picture, there will be some broader causally closed story that explains behavior, and such a story can always be told in a way that neither includes nor implies experience. Even on the interactionist picture, we can view minds as just further nodes in the causal network, like the physical nodes, and the fact that these nodes are experiential is inessential to the causal dynamics. The basic worry arises not because experience is logically independent of physics, but because it is logically independent of causal dynamics more generally.

The interactionist has a reasonable solution to this problem, I think. Presumably, the interactionist will respond that some nodes in the causal network are experiential through and through. Even though one can tell the causal story about psychons without mentioning experience, for example, psychons are intrinsically experiential all the same. Subtract experience, and there is nothing left of the psychon but an empty place-marker in a causal network, which is arguably to say there is nothing left at all. To have real causation, one needs something to do the causing; and here, what is doing the causing is experience.

I think this solution is perfectly reasonable; but once the problem is pointed out this way, it becomes clear that the same solution will work in a causally closed physical world. Just as the interactionist postulates that some nodes in the causal network are intrinsically experiential, the "epiphenomenalist" can do the same.

This brings up a terminology-ish point:

The technical term for the belief that consciousness is there, but has no effect on the physical world, is epiphenomenalism.

Chalmers denies that he's an epiphenomenalist. Rather he says (in "Panpsychism and Panprotopsychism"):

I think that substance dualism (in its epiphenomenalist and interactionist forms) and Russellian monism (in its panpsychist and panprotopsychist forms) are the two serious contenders in the metaphysics of consciousness, at least once one has given up on standard physicalism. (I divide my own credence fairly equally between them.)

Quoting "Moving Forward" again:

Here we can exploit an idea that was set out by Bertrand Russell (1926), and which has been developed in recent years by Grover Maxwell (1978) and Michael Lockwood (1989). This is the idea that physics characterizes its basic entities only extrinsically, in terms of their causes and effects, and leaves their intrinsic nature unspecified. For everything that physics tells us about a particle, for example, it might as well just be a bundle of causal dispositions; we know nothing of the entity that carries those dispositions. The same goes for fundamental properties, such as mass and charge: ultimately, these are complex dispositional properties (to have mass is to resist acceleration in a certain way, and so on). But whenever one has a causal disposition, one can ask about the categorical basis of that disposition: that is, what is the entity that is doing the causing?

One might try to resist this question by saying that the world contains only dispositions. But this leads to a very odd view of the world indeed, with a vast amount of causation and no entities for all this causation to relate! It seems to make the fundamental properties and particles into empty placeholders, in the same way as the psychon above, and thus seems to free the world of any substance at all. It is easy to overlook this problem in the way we think about physics from day to day, given all the rich details of the mathematical structure that physical theory provides; but as Stephen Hawking (1988) has noted, physical theory says nothing about what puts the "fire" into the equations and grounds the reality that these structures describe. The idea of a world of "pure structure" or of "pure causation" has a certain attraction, but it is not at all clear that it is coherent.

So we have two questions: (1) what are the intrinsic properties underlying physical reality?; and (2) where do the intrinsic properties of experience fit into the natural order? Russell's insight, developed by Maxwell and Lockwood, is that these two questions fit with each other remarkably well. Perhaps the intrinsic properties underlying physical dispositions are themselves experiential properties, or perhaps they are some sort of proto-experiential properties that together constitute conscious experience. This way, we locate experience inside the causal network that physics describes, rather than outside it as a dangler; and we locate it in a role that one might argue urgently needed to be filled. And importantly, we do this without violating the causal closure of the physical. The causal network itself has the same shape as ever; we have just colored in its nodes.

This idea smacks of the grandest metaphysics, of course, and I do not know that it has to be true. But if the idea is true, it lets us hold on to irreducibility and causal closure and nevertheless deny epiphenomenalism. By placing experience inside the causal network, it now carries a causal role. Indeed, fundamental experiences or proto-experiences will be the basis of causation at the lowest levels, and high-level experiences such as ours will presumably inherit causal relevance from the (proto)-experiences from which they are constituted. So we will have a much more integrated picture of the place of consciousness in the natural order.

This is also (a more honest name for) the non-physicalist view that sometimes gets called "Strawsonian physicalism." But this view seems to be exactly as vulnerable to your criticisms as traditional epiphenomenalism, because the "causal role" in question doesn't seem to be a difference-making role -- it's maybe "causal" in some metaphysical sense, but it's not causal in a Bayesian or information-theoretic sense, a sense that would allow a brain to nonrandomly update in the direction of Strawsonian physicalism / Russellian monism by computing evidence.

I'm not sure what Chalmers would say to your argument in detail, though he's responded to the terminological point about epiphenomenalism. If he thinks Russellian monism is a good response, then either I'm misunderstanding how weird Russellian monism is (in particular, how well it can do interactionism-like things), or Chalmers is misunderstanding how general your argument is. The latter is suggested by the fact that Chalmers thinks your argument weighs against epiphenomenalism but not against Russellian monism in this old LessWrong comment.

It might be worth e-mailing him this updated "Zombies" post, with this comment highlighted so that we don't get into the weeds of debating whose definition of "epiphenomenalism" is better.

Comment author: RobbBB 17 April 2016 05:34:26AM 2 points [-]

I removed the second post (What's in a Name?) from the list because it's been... well, debunked. From a recent SSC link post:

A long time ago I blogged about the name preference effect – ie that people are more positively disposed towards things that sound like their name – so I might like science more because Scott and science start with the same two letters. A bunch of very careful studies confirmed this effect even after apparently controlling for everything. Now Uri Simonsohn says – too bad, it’s all spurious. This really bothers me because I remember specifically combing over these studies and finding them believable at the time. Yet another reminder that things are worse than I thought.

Comment author: [deleted] 01 January 2016 01:29:31AM *  0 points [-]

One common question we hear about alignment research runs analogously to: "If you don't develop calculus, what bad thing happens to your rocket? Do you think the pilot will be struggling to make a course correction, and find that they simply can't add up the tiny vectors fast enough? That scenario just doesn't sound plausible."

Actually, that sounds entirely plausible.

The case is similar with, e.g., attempts to develop theories of logical uncertainty. The problem is not that we visualize a specific AI system encountering a catastrophic failure because it mishandles logical uncertainty; the problem is that all our existing tools for describing the behavior of rational agents assume that those agents are logically omniscient, making our best theories incommensurate with our best practical AI designs.

Well, of course, part of the problem is that the best theories of "rational agents" try to assume Homo Economicus into being, and insist on cutting off all the ways in which physically-realizable minds cannot fit. So we need a definition of rationality that makes sense in a world where agents don't have completed infinities of computational power and can be modified by the environment and don't come with built-in utility functions that necessarily map physically realizable situations to the real numbers.

If we could program that computer to reliably achieve some simple goal (such as producing as much diamond as possible), then a large share of the AI alignment research would be completed.

Wait wait wait. You're saying that the path between Clippy and a prospective completed FAI is shorter than the path between today's AI state-of-the-art and Clippy? Because it sounds like you're saying that, even though I really don't expect you to say that.

On the upside, I do think we can spell out a research program to get us there, which will be grounded in current computational cog-sci and ML literature, which will also help with Friendliness/alignment engineering, which will not engender arguments with Jessica over math this time.

But now for the mandatory remark: you are insane and will kill us all ;-), rabble rabble rabble.

Comment author: RobbBB 02 January 2016 08:34:55AM *  3 points [-]

Clippy is a thought experiment used to illustrate two ideas: terminal goals are orthogonal to capabilities ("the AI does not love you"), and they tend to have instrumental goals like resource acquisition and self-preservation ("the AI does not hate you, but..."). This highlights the fact that highly capable AI can be dangerous even if it's reliably pursuing some known goal and the goal isn't ambitious or malicious. For that reason, Clippy comes up a lot as an intuition pump for why we need to get started early on safety research.

But 'a system causes harm in the course of reliably pursuing some known, stable, obviously-non-humane goal' is a very small minority of the actual disaster scenarios MIRI researchers are worried about. Not because it looks easy to go from a highly reliable diamond maximizer to an aligned superintelligence, but because there appear to be a larger number of ways things can go wrong before we get to that point.

  1. We can fail to understand an advanced AI system well enough to know how 'goals' are encoded in it, forcing us to infer and alter goals indirectly.

  2. We can understand the system's 'goals,' but have them be in the wrong idiom for a safe superintelligence (e.g., rewards for a reinforcement learner).

  3. We can understand the system well enough to specify its goals, but not understand our own goals fully or precisely enough to specify them correctly. We come up with an intuitively 'friendly' goal (something more promising-sounding than 'maximize the number of paperclips'), but it's still the wrong goal.

  4. Similarly: We can understand the system well enough to specify safe behavior in its initial context, but the system stops being safe after it or its environment undergoes a change. An example of this is instability under self-modification.

  5. We can design advanced AI systems without realizing (or without caring) that they have consequentialist goals. This includes systems we don't realize are powerful optimizers, e.g., ones whose goal-oriented behavior may depend in complicated ways on the interaction of multiple AI systems, or ones that function as unnoticed subsystems of non-consequentialists.

Comment author: RobbBB 18 December 2015 06:12:43AM *  9 points [-]

Update: We've hit our first fundraising target! We're now nearing the $200k mark. Concretely, additional donations at this point will have several effects on MIRI's operations over the coming year:

  • There are a half-dozen promising people we'd be hiring on a trial basis if we had the funds. Not everyone in this reference class can be sent to e.g. the Oxford and Cambridge groups working on AI risk (CSER, FHI, Leverhulme CFI), because those organizations have different hiring criteria and aren't primarily focused on our kind of research. This is one of the main ways a more successful fundraiser this month translates into additional AI alignment research on the margin.

This includes several research fellows we're considering hiring (with varying levels of confidence) in the next year. Being less funding-constrained would make us more confident that our growth is sustainable, causing us to move significantly faster on these hires.

  • Secondarily, additional funds would allow us to run additional workshops and run longer and meatier fellows/scholars programs.

Larger donations would allow us to expand the research team size we're shooting for, and also spend a lot more time on academic outreach.

Comment author: RobbBB 10 December 2015 01:06:05AM 5 points [-]

Update from Ruairi Donnelly, the Executive Director at Raising for Effective Giving: "Ferruell's matching drive has been going on for a while behind the scenes, as he had a preference for doing it in the Russian poker community first. No limit was declared (but the implicit understanding was 'in the tens of thousands'). As donations were slowing down, our ambassador Liv Boeree shared it on Twitter. Ferruell has now confirmed that he'll match donations up to $50,000. [Roughly] $30,000 have already come in."

Comment author: RobbBB 10 December 2015 06:46:38AM 7 points [-]

Update: The remaining $20,000 has been matched!!! Wow!

Comment author: RobbBB 09 December 2015 05:38:23PM 11 points [-]

Announcement: Raising for Effective Giving just let us know that for the next two days, donations to MIRI through REG's donation page (under "Pick a specific charity: MIRI") will be matched dollar-for-dollar by poker player Ferruell! Fantastic.

Comment author: V_V 14 October 2015 08:40:04AM *  -2 points [-]
  • 1 - Humans can't reliably precommit. Even if they could, precommitment is different from using an "acausal" decision theory. You don't need precommitment to one-box in Newcomb's problem, and the ability to precommit doesn't by itself guarantee that you will one-box. In an adversarial game where the players can precommit and use a causal version of game theory, the one that can precommit first generally wins. E.g., Alice can precommit to ignore Bob's threats, but she has no incentive to do so if Bob has already precommitted to ignore Alice's precommitments, and so on. If you allow for "acausal" reasoning, then even having a time advantage doesn't work: if Bob isn't born yet, but Alice predicts that she will be in an adversarial game with Bob, and that Bob will reason acausally and therefore have an incentive to threaten her and ignore her precommitments, then she has an incentive not to make such a precommitment.
  • 2 - This implies that the future AI uses a decision theory that two-boxes in Newcomb's problem, contradicting the premise that it one-boxes.
  • 3 - This implies that the future AI will have a deontological rule that says "Don't blackmail" somehow hard-coded in it, contradicting the premise that it will be a utilitarian. Indeed, humans may want to build an AI with such constraints, but in order to do so they will have to consider the possibility of blackmail and likely reject utilitarianism, which was the point of Roko's argument.
  • 4 - Shut up and multiply.
Comment author: RobbBB 14 October 2015 11:00:05PM *  0 points [-]

Humans can't reliably precommit.

Humans don't follow any decision theory consistently. They sometimes give in to blackmail, and at other times resist blackmail. If you convinced a bunch of people to take acausal blackmail seriously, presumably some subset would give in and some subset would resist, since that's what we see in ordinary blackmail situations. What would be interesting is if (a) there were some applicable reasoning norm that forced us to give in to acausal blackmail on pain of irrationality, or (b) there were some known human irrationality that made us inevitably susceptible to acausal blackmail. But I don't think Roko gave a good argument for either of those claims.

From my last comment: "there are probably some decision theories that let agents acausally blackmail each other". But if humans frequently make use of heuristics like 'punish blackmailers' and 'never give in to blackmailers', and if normative decision theory says they're right to do so, there's less practical import to 'blackmailable agents are possible'.

This implies that the future AI uses a decision theory that two-boxes in Newcomb's problem, contradicting the premise that it one-boxes.

No it doesn't. If you model Newcomb's problem as a Prisoner's Dilemma, then one-boxing maps on to cooperating and two-boxing maps on to defecting. For Omega, cooperating means 'I put money in both boxes' and defecting means 'I put money in just one box'. TDT recognizes that the only two options are mutual cooperation or mutual defection, so TDT cooperates.

Blackmail works analogously. Perhaps the blackmailer has five demands. For the blackmailee, full cooperation means 'giving in to all five demands'; full defection means 'rejecting all five demands'; and there are also intermediary levels (e.g., giving in to two demands while rejecting the other three), with the blackmailee preferring to do as little as possible.

For the blackmailer, full cooperation means 'expending resources to punish the blackmailee in proportion to how many of my demands went unmet'. Full defection means 'expending no resources to punish the blackmailee even if some demands aren't met'. In other words, since harming past agents is costly, a blackmailer's favorite scenario is always 'the blackmailee, fearing punishment, gives in to most or all of my demands; but I don't bother punishing them regardless of how many of my demands they ignored'. We could say that full defection doesn't even bother to check how many of the demands were met, except insofar as this is useful for other goals.

The blackmailer wants to look as scary as possible (to get the blackmailee to cooperate) and then defect at the last moment anyway (by not following through on the threat), if at all possible. In terms of Newcomb's problem, this is the same as preferring to trick Omega into thinking you'll one-box, and then two-boxing anyway. We usually construct Newcomb's problem in such a way that this is impossible; therefore TDT cooperates. But in the real world mutual cooperation of this sort is difficult to engineer, which makes fully credible acausal blackmail at least as difficult.
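To make the cooperate/defect mapping concrete, here's a toy payoff table for the (simplified, two-option) blackmail game. The numbers are illustrative choices of mine, not anything from the argument above; all that matters is that punishing is costly to the blackmailer and compliance is valuable:

```python
# Hypothetical payoffs for a one-shot blackmail game. 'Cooperating' for
# the blackmailee = giving in to the demands; 'cooperating' for the
# blackmailer = actually following through on the threatened punishment.
COST_OF_PUNISHING = 2    # carrying out the threat is costly for the blackmailer
VALUE_OF_DEMANDS = 5     # what the blackmailer gains if the victim gives in
HARM_OF_PUNISHMENT = 10  # what the victim loses if punished

def payoffs(victim_gives_in: bool, blackmailer_punishes: bool):
    """Return (victim payoff, blackmailer payoff)."""
    victim = ((-VALUE_OF_DEMANDS if victim_gives_in else 0)
              - (HARM_OF_PUNISHMENT if blackmailer_punishes else 0))
    blackmailer = ((VALUE_OF_DEMANDS if victim_gives_in else 0)
                   - (COST_OF_PUNISHING if blackmailer_punishes else 0))
    return victim, blackmailer

# Whatever the victim does, the blackmailer does strictly better by NOT
# punishing -- not-punishing dominates, the way defection dominates in a
# one-shot Prisoner's Dilemma.
for gives_in in (True, False):
    assert payoffs(gives_in, False)[1] > payoffs(gives_in, True)[1]

# So the blackmailer's favorite outcome is (victim gives in, no punishment),
# i.e., (D, C) from the blackmailer's side:
best = max(((v, p) for v in (True, False) for p in (True, False)),
           key=lambda vp: payoffs(*vp)[1])
print(best)  # (True, False)
```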

This implies that the future AI will have a deontological rule that says "Don't blackmail" somehow hard-coded in it, contradicting the premise that it will be a utilitarian.

I think you misunderstood point 3. 3 is a follow-up to 2: humans and AI systems alike have incentives to discourage blackmail, which increases the likelihood that blackmail is a self-defeating strategy.

Shut up and multiply.

Eliezer has endorsed the claim "two independent occurrences of a harm (not to the same person, not interacting with each other) are exactly twice as bad as one". This doesn't tell us how bad the act of blackmail itself is, it doesn't tell us how faithfully we should implement that idea in autonomous AI systems, and it doesn't tell us how likely it is that a superintelligent AI would find itself forced into this particular moral dilemma.

Since Eliezer asserts a CEV-based agent wouldn't blackmail humans, the next step in shoring up Roko's argument would be to do more to connect the dots from "two independent occurrences of a harm (not to the same person, not interacting with each other) are exactly twice as bad as one" to a real-world worry about AI systems actually blackmailing people conditional on claims (a) and (c). 'I find it scary to think a superintelligent AI might follow the kind of reasoning that can ever privilege torture over dust specks' is not the same thing as 'I'm scared a superintelligent AI will actually torture people because this will in fact be the best way to prevent a superastronomically large number of dust specks from ending up in people's eyes', so Roko's particular argument has a high evidential burden.

Comment author: VoiceOfRa 14 October 2015 09:00:18PM 0 points [-]

Assuming you for some reason are following a decision theory that does put you at risk of acausal blackmail: Since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture people without actually torturing them. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty.

Um, your conclusion "since we're aware of this, we know any threat of blackmail would be empty" contradicts your premise that the AI by virtue of being super-intelligent is capable of fooling people into thinking it'll torture them.

Comment author: RobbBB 14 October 2015 10:05:44PM *  0 points [-]

One way of putting this is that the AI, once it exists, can convincingly trick people into thinking it will cooperate in Prisoner's Dilemmas; but since we know it has this property and we know it prefers (D,C) over (C,C), we know it will defect. This is consistent because we're assuming the actual AI is powerful enough to trick people once it exists; this doesn't require the assumption that my low-fidelity mental model of the AI is powerful enough to trick me in the real world.

For acausal blackmail to work, the blackmailer needs a mechanism for convincing the blackmailee that it will follow through on its threat. 'I'm a TDT agent' isn't a sufficient mechanism, because a TDT agent's favorite option is still to trick other agents into cooperating in Prisoner's Dilemmas while they defect.
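The 'empty threat' reasoning above can be sketched as a two-step backward induction. (Illustrative numbers again, of my own choosing; the only assumptions are that punishing is pure cost once the victim's choice is made, and that the victim can anticipate this.)

```python
# Why a threat from a blackmailer who prefers not to follow through
# is not credible, in two steps of backward induction.
HARM = 10        # loss to the victim if actually punished
DEMAND = 5       # loss to the victim from giving in
PUNISH_COST = 2  # cost to the blackmailer of carrying out the threat

# Step 1: once the victim has already resisted, the blackmailer compares
# following through vs. not, and punishing is pure cost:
def blackmailer_payoff(victim_gave_in: bool, punish: bool) -> int:
    return (DEMAND if victim_gave_in else 0) - (PUNISH_COST if punish else 0)

will_punish = blackmailer_payoff(False, True) > blackmailer_payoff(False, False)
print(will_punish)  # False: the threat won't actually be carried out

# Step 2: a victim who predicts this compares giving in vs. resisting
# under 'no punishment either way', and resisting wins:
def victim_payoff(gave_in: bool, punished: bool) -> int:
    return (-DEMAND if gave_in else 0) - (HARM if punished else 0)

best_response = max((True, False), key=lambda g: victim_payoff(g, will_punish))
print(best_response)  # False: resist, because the threat is empty
```

This is just the standard non-credible-threat analysis; the interesting question in the acausal setting is whether some commitment mechanism can make step 1 come out differently, which is what the comment above denies for 'I'm a TDT agent' alone.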
