gwern comments on Open Thread, August 2010-- part 2 - Less Wrong
I haven't seen much response to it. There's a reply in Analysis by Baumann, who takes the cheap way out by simply asserting that one cannot provide the probability in advance, that it's 'extremely implausible'.
I have an unfinished essay where I argue that, as presented, the problem is asking for a uniform distribution over an infinite set, so you cannot give the probability in advance; but I haven't yet come up with a convincing argument for why you would want your probability to scale down in proportion as the mugger's offer scales up.
That is: it's easy to show that scaling disproportionately leads to another mugging. If you scale superlinearly, the mugging can be broken up into an ensemble of offers that add back up to a mugging; if you scale sublinearly, you will refuse sensible offers that have been broken up.
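To make that concrete, here's a toy model (my own formalization, with made-up constants): suppose the probability you assign to a threat of harm H falls off as p(H) = c * H^(-a), so a single threat carries expected disutility p(H) * H = c * H^(1-a).

```python
# Toy model of threat-scaling (my own sketch; the constant c and the harms
# below are made up). Probability assigned to a threat of harm H: p(H) = c * H**(-a).

def expected_disutility(H, a, c=1e-10):
    """Expected disutility p(H) * H of a single threat of harm H."""
    return c * H ** (1 - a)

for a in (0.5, 1.0, 2.0):
    print(a, [expected_disutility(H, a) for H in (1e0, 1e15, 1e30)])

# a = 2.0 (superlinear): small threats carry far MORE expected disutility than
#   huge ones, so a refused huge threat can be re-issued as an ensemble of
#   small threats that add back up to a mugging.
# a = 0.5 (sublinear): huge threats dominate, and each piece of a sensible
#   offer looks negligible once broken up, so the pieces get refused.
# a = 1.0 (linear): every threat, whatever its size, carries the same expected
#   disutility c -- the only exponent indifferent to how the offer is packaged.
```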
But I haven't come up with any deeper justification for linearly scaling other than 'this apparently arbitrary numeric procedure avoids 3 problems'. I've sort of given up on it, as you can see from the parlous state of my essay.
Thanks. Here's my fresh and uneducated opinion.
I see three kinds of answers to the mugging:
Here's my analysis in the sense of 4., tell me if I'm making a common mistake. We are worried that P(agent can do H amount of harm | agent threatens to do H amount of harm) times H can be arbitrarily large. As Tarleton pointed out in the 2007 post, any details beyond H about the scenario we're being threatened with are a distraction (right? That doesn't actually seem to be the implicit assumption of your draft, or of Hanson's comment, etc.)
By Bayes the quantity in question is the same as
P(threat | ability)/P(threat) x P(ability) x H
Our hope is that we can prove this quantity is actually bounded independent of H (but of course not independent of the agent making the threat). I'll leave aside the fact that the probability that such a proof contains a mistake is certainly bounded below.
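For anyone following along, the identity being used here is just Bayes' theorem:

```latex
\[
P(\text{ability} \mid \text{threat}) \cdot H
  = \frac{P(\text{threat} \mid \text{ability}) \, P(\text{ability})}{P(\text{threat})} \cdot H
  = \frac{P(\text{threat} \mid \text{ability})}{P(\text{threat})} \cdot P(\text{ability}) \cdot H
\]
```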
P(threaten H) is the probability that a certain computer program (the agent making the threat) will give a certain output (the threat). My feeling about this number is that it is medium-sized if H has low complexity (such as 3^^^3) and tiny if H has high complexity (such as some of the numbers within 10% of 3^^^3). That is, complex threats have more credibility. I'm comforted by the fact that, by the definition of complexity, it would take a long time for an agent to articulate a complex threat. So let's assume P(threaten H) is medium-sized, as in the original version where H = 3^^^3 x the value of a human not being tortured.
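To illustrate "low complexity" here: 3^^^3 is pinned down by a few lines of code, whereas a typical number within 10% of it would take on the order of log2(3^^^3) bits to specify. A minimal sketch of Knuth's up-arrow notation (evaluating up(3, 3, 3) itself is of course hopeless; the point is only that the program is short):

```python
def up(a, n, b):
    """Knuth's up-arrow: a with n up-arrows applied to b. up(3, 3, 3) is 3^^^3."""
    if n == 1:
        return a ** b
    if b == 1:
        return a
    return up(a, n - 1, up(a, n, b - 1))

print(up(3, 2, 3))  # 3^^3 = 3^(3^3) = 7625597484987; up(3, 3, 3) is unprintably vast
```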
It seems like wishful thinking that P(threat | ability) should shrink with H. Let's assume this is also medium-sized and does not depend on H.
So I think the question boils down to how fast P(agent can do H amount of harm) shrinks with H. If it's O(1/H) we're OK, and if it's larger we're boned.
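One way to spell that out: suppose the likelihood ratio P(threat | ability)/P(threat) is bounded by some constant K (the "medium-sized" assumption above), and that P(agent can do H amount of harm) <= C/H for some constant C. Then:

```latex
\[
\underbrace{\frac{P(\text{threat}\mid\text{ability})}{P(\text{threat})}}_{\le\, K}
\cdot P(\text{ability}) \cdot H
\;\le\; K \cdot \frac{C}{H} \cdot H \;=\; KC,
\]
```

a bound independent of H. Whereas if P(ability) only falls off like H^(-s) for some s < 1, the product grows like H^(1-s), and we're in the "boned" case.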
As long as we're all chipping in, here's my take:
(1) Even if the correct answer is to hand over the money, we should expect to feel an intuitive sense that doing so is the wrong answer. A credible threat to inflict that much disutility would never have happened in the ancestral environment, but false threats to do so have happened rather often. That being the case, the following is probably rationalization rather than rationality:
(2) Consider the proposition that, at some point in my life, someone will try to Pascal's-mug me and actually back their threats up. In this case, I would still expect to receive a much larger number of false threats over the course of my lifetime. If I hand over all my money to the first mugger without proper verification, I won't be able to pay up when the real threat comes around.
I think that your (2) is a proof that handing over the money is the wrong answer. My understanding is that the problem is whether this means that any AI running on the basic package we sometimes hazily envision -- prior, (unbounded) utility function, an algorithm for choosing based somehow on multiplying the former by the latter -- is boned.
I thought that my (2) was a proof that a prior-and-utility system will correctly decide to investigate the claim to see whether it's credible.
But what a prior-and-utility system means by "credible" is that the expected disutility is large. If a blackmailer can, at finite cost to itself, put our AI in a situation with arbitrarily high expected disutility, then our AI is boned.
Ah, you're worried about a blackmailer that can actually follow up on that threat. I would point out that humans usually pay ransoms, so the AI isn't exactly making a different decision than we would in the same situation.
Or, the AI might anticipate the problem and self-modify in advance to never submit to threats.
I'm worried about a blackmailer that can with positive probability follow up on that threat.
Yes, humans behave in the same way, at least according to economists. We pay ransoms when the probability of the threat being carried out, times the disutility that would result from the threat being carried out, exceeds the ransom. (E.g., a 10% chance of a $10,000 loss makes a $500 ransom worth paying, but not a $2,000 one.) The difference is that for human-scale threats, this expected disutility does seem to be bounded.
That could mean one of at least two things: either the AI starts working by the rules of some (hitherto unconceived?) non-prior-and-utility system, or the AI calibrates its prior and its utility function so that it doesn't submit to (some) threats. I think the question is whether something like the second idea can work.
No, see, that's different.
If you're dealing with a blackmailer that might be able to carry out their threats, then you investigate whether they can or not. The blackmailer themselves might assist you with this, since it's in their interest to show that their threat is credible.
Allow me to demonstrate: Give $100 to the EFF or I'll blow up the sun. Do you now assign a higher expected utility to giving $100 to the EFF, or to giving the same $100 instead to SIAI? If I blew up the moon as a warning shot, would that change your mind?
The result of such an investigation might raise or lower P(threat can be carried out). This doesn't change the shape of the question: can a blackmailer issue a threat with P(threat can be carried out) x U(threat is carried out) > H, for all H? Can it do so at cost to itself that is bounded independent of H?
I refuse. According to economists, I have just revealed a preference:
P(Pavitra can blow up the sun) x U(Sun) < U($100)
Yes. Now I have revealed
P(Pavitra can blow up the sun | Pavitra has blown up the moon) x U(Sun) > U($100)
I don't quite follow this. Assuming we're using one of the universal priors based on Turing machine enumerations, then an agent which consists of [3^^^3 threat + no ability] is much shorter and much more likely than an agent which consists of [~.10*3^^^3 threat + ability]. The more complex the threat, the less space there is for executing it.
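A cartoon of that counting argument (toy bit-counts of my own invention, not a real Solomonoff computation): a universal prior weights each program by 2^(-length), so every extra bit of "ability" machinery halves the prior weight.

```python
# Cartoon of the universal-prior argument (toy numbers, not a real Solomonoff
# computation). A universal prior weights each program by 2**(-length in bits).

for extra_ability_bits in (10, 100, 1000):
    # Prior odds of [threat + ability] relative to [same threat + no ability]:
    odds = 2.0 ** (-extra_ability_bits)
    print(f"{extra_ability_bits:>5} extra bits to implement the ability "
          f"-> prior odds ratio {odds:.3g}")

# Whatever the threat costs to state, the agent that can also *execute* it
# needs extra program, and is penalized exponentially in that extra length.
```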
If I disagree, it's for a very minor reason, and with only a little confidence. (P(threat) is short for P(threat|no information about ability).) But you're saying the case for P(threaten H) being bounded below (and its reciprocal being bounded above) is even stronger than I thought, right?
Another way to argue that P(threaten H) should be medium-sized: at least in real life, muggings have a time-limit. There are finitely many threats of a hundred words or less, and so our prior probability that we will one day receive such a threat is bounded below.
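A back-of-the-envelope count (assuming a vocabulary of 10^5 words, a figure I'm making up): there are at most the sum over k <= 100 of V^k such threats -- astronomically many, but finite, so a prior can spread nonzero mass over every one of them.

```python
# Back-of-the-envelope count of distinct threats of at most 100 words,
# assuming (my made-up figure) a vocabulary of 10**5 words.

VOCAB = 10 ** 5
n_threats = sum(VOCAB ** k for k in range(1, 101))
print(len(str(n_threats)))  # 501 digits: astronomically many, but finite,
                            # so each threat can get a nonzero prior.
```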
Another way to argue that the real issue is P(ability H): our AI might single you out and compute P(gwern will do H harm) = P(gwern will do H harm | gwern can do H harm) x P(gwern can do H harm). It seems like you have an interest in convincing the AI that P(gwern can do H harm) x H is bounded above.