All of BurntVictory's Comments + Replies

The Kitty Genovese Equation

Someone's in trouble. You can hear them from your apartment, but you can't tell if any of your neighbors are already rushing down, or already calling the police. It's time sensitive, and you've got to decide now: is it worth spending those precious minutes, or not?

Let's define our variables:

Cost to victim of nobody helping: $C$

Cost to each bystander of intervening: $k < C$

Number of bystanders: $N \geq 2$. (Since $k < C$, for $N = 1$ it's always right to intervene.)

Analysis:

Suppose the bystanders all sim... (read more)

Buying Value, not Price

BurntVictory6y40

The LessWrongy framework I'm familiar with would say that value = expected utility, so it takes potential downsides into account. You're not risk-averse wrt your VNM utility function, but computing that utility function is hard in practice, and EV calculations can benefit from some consideration of the tail-risks.

The Game Theory of Blackmail

BurntVictory6y40

Schelling's The Strategy of Conflict seems very relevant here; a major focus is precommitment as a bargaining tool. See here for an old review by cousin_it.

Iterated chicken seems fine to test, just as a spinoff of the IPD that maps to slightly different situations. (I believe that the iterated game of mutually modeling each other's single-shot strategy is different from iterating the game itself, so I don't think Abram's post necessarily implies that iterated chicken is relevant to ASI blackmail solutions.)

Speaking of iterated games, on... (read more)

Privacy

BurntVictory6y*30

It's true the net effect is low to first order, but you're neglecting second-order effects. If premia are important enough, people will feel compelled to Goodhart proxies used for them until those proxies have less meaning.

Given the linked siderea post, maybe this is not very true for insurance in particular. I agree that wasn't a great example.

Slack-wise, uh, choices are bad. really bad. Keep the sabbath. These are some intuitions I suspect are at play here. I'm not interested in a detailed argument hashing out whether we should beli... (read more)

Privacy

BurntVictory6y80

The post implies it is bad to be judged. I could have misinterpreted why, but that implication is there. If judge just meant "make inferences about" why would it be bad?

As Raemon says, knowing that others are making correct inferences about your behavior means you can't relax. No, idk, watching soap operas, because that's an indicator of being less likely to repay your loans, and your premia go up. There's an ethos of slack, decisionmaking-has-costs, strategizing-has-costs that Zvi's explored in his previous posts, and that... (read more)

9jessicata6y

This is really, really clearly false! 1. This assumes that, upon more facts being revealed, insurance companies will think I am less (not more) likely to repay my loans, by default (e.g. if I don't change my TV viewing behavior). 2. More egregiously, this assumes that I have to keep putting in effort into reducing my insurance premiums until I have no slack left, because these premiums really, really, really matter. (I don't even spend that much on insurance premiums!) If you meant this more generally, and insurance was just a bad example, why is the situation worse in terms of slack than it was before? (I already have the ability to spend leisure time on gaining more money, signalling, etc.)

Privacy

BurntVictory6y30

I found this pretty useful--Zvi's definitely reflecting a particular, pretty negative view of society and strategy here. But I disagree with some of your inferences, and I think you're somewhat exaggerating the level of gloom-and-doom implicit in the post.

>Implication: "judge" means to use information against someone. Linguistic norms related to the word "judgment" are thoroughly corrupt enough that it's worth ceding to these, linguistically, and using "judge" to mean (usually unjustly!) using information aga... (read more)

1jessicata6y

The post implies it is bad to be judged. I could have misinterpreted why, but that implication is there. If judge just meant "make inferences about" why would it be bad? But it also helps in knowing who's exploiting them! Why does it give more advantages to the "bad" side? Why would you expect the terrorists to be miscalibrated about this before the reduction in privacy, to the point where they think people won't negotiate with them when they actually will, and less privacy predictably changes this opinion? Perhaps the optimal set of norms for these people is "there are no rules, do what you want". If you can improve on that, then that would constitute a norm-set that is more just than normlessness. Capturing true ethical law in the norms most people follow isn't necessary. Sure, but doesn't it help me against them too?

Question: MIRI Corrigbility Agenda

BurntVictory6y30

The CHAI reading list is also fairly out of date (last updated april 2017) but has a few more papers, especially if you go to the top and select [3] or [4] so it shows lower-priority ones.

(And in case others haven't seen it, here's the MIRI reading guide for learning agent foundations.)

AI development incentive gradients are not uniformly terrible

BurntVictory6y10

Oh wait, yeah, this is just an example of the general principle "when you're optimizing for xy, and you have a limited budget with linear costs on x and y, the optimal allocation is to spend equal amounts on both."

Formally, you can show this via Lagrange-multiplier optimization, using the Lagrangian $L(x, y) = xy - \lambda (ax + by - M)$. Setting the partials equal to zero gets you $\lambda = y / a = x / b$, and you recover the linear constraint function $ax + by = M$. So $ax = by = M / 2$. (Alternatively, just optimizing $x \frac{M - ax}{b}$ works, but I like Lagrange multipliers.)
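As a quick numerical sanity check of the equal-spending result, here is a minimal sketch that grid-searches the budget line instead of using calculus. The function name `best_allocation` and the sample values of `a`, `b`, and `M` are illustrative assumptions, not anything from the original post.

```python
def best_allocation(a, b, M, steps=100_000):
    """Grid-search the x that maximizes x * y subject to a*x + b*y = M."""
    best_x, best_val = 0.0, -1.0
    for i in range(steps + 1):
        x = (M / a) * i / steps   # x ranges over [0, M / a]
        y = (M - a * x) / b       # y is fixed by the budget constraint
        if x * y > best_val:
            best_x, best_val = x, x * y
    return best_x, (M - a * best_x) / b


# Hypothetical prices and budget, chosen only for illustration.
a, b, M = 2.0, 5.0, 10.0
x, y = best_allocation(a, b, M)
print(a * x, b * y)  # both should be close to M / 2
```

The search substitutes the constraint into the objective, which is the "just optimizing $x \frac{M - ax}{b}$" route mentioned above, so it should agree with the Lagrange-multiplier answer up to grid resolution.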

In this case, we wa... (read more)

AI development incentive gradients are not uniformly terrible

BurntVictory6y*40

I think your solution to "reckless rivals" might be wrong? I think you mistakenly put a multiplier of q instead of a p on the left-hand side of the inequality. (The derivation of the general inequality checks out, though, and I like your point about discontinuous effects of capacity investment when you assume that the opponent plays a known pure strategy.)

I'll use slightly different notation from yours, to avoid overloading p and q. (This ends up not mattering because of linearity, but eh.) Let $p_{0}, q_{0}$ be the initial probabilities for winning... (read more)

2rk6y

Yes, you're quite right! The intuition becomes a little clearer when I take the following alternative derivation: Let us look at the change in expected value when I increase my capabilities. From the expected value stemming from worlds where I win, we have $(p q)' = p' q + p q'$. For the other actor, their probability of winning decreases at a rate that matches my increase in probability of winning. Also, their probability of deploying a safe AI doesn't change. So the change in expected value stemming from worlds where they win is $-p' r q$. We should be indifferent to increasing capabilities when these sum to 0, so $p' q + p q' = p' r q$. Let's choose our units so $km = 1$. Then, using the expressions for $q'$ from your comment, we have $r q_0 p'_0 = p'_0 q_0 + p_0 q_0 (r - 1)$. Dividing through by $q_0$ we get $r p'_0 = p'_0 + p_0 (r - 1)$. Collecting like terms we have $(r - 1) p'_0 = p_0 (r - 1)$ and thus $p'_0 = p_0$. Substituting for $p'_0$ we have $\frac{1}{2} - p_0 = p_0$ and thus $p_0 = \frac{1}{4}$.
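The last few algebra steps in this reply can be checked mechanically. The sketch below uses exact fractions and assumes the substitution $p'_0 = \frac{1}{2} - p_0$ that the reply uses in its final step (its origin is in the truncated parent comment, so it is taken on trust here). The function name `indifference_gap` is hypothetical.

```python
from fractions import Fraction


def indifference_gap(p0, r):
    """Left side minus right side of r*p0' = p0' + p0*(r - 1).

    Assumes p0' = 1/2 - p0, as in the reply's final substitution.
    """
    dp0 = Fraction(1, 2) - p0
    return r * dp0 - (dp0 + p0 * (r - 1))


# The gap vanishes at p0 = 1/4 for several sample values of r.
for r in (Fraction(2), Fraction(5), Fraction(1, 3)):
    assert indifference_gap(Fraction(1, 4), r) == 0
```

Since the gap factors as $(r - 1)(\frac{1}{2} - 2 p_0)$, the indifference point $p_0 = \frac{1}{4}$ does not depend on $r$ as long as $r \neq 1$.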

Drexler on AI Risk

BurntVictory6y10

Yeah, I worry that competitive pressure could convince people to push for unsafe systems. Military AI seems like an especially risky case. Military goals are harder to specify than "maximize portfolio value", but there are probably reasonable proxies, and as AI gets more capable and more widely used there's a strong incentive to get ahead of the competition.

The Pavlov Strategy

BurntVictory6y20

Yeah, I think you’re right.* So it actually looks the same as the “TFTWF accidentally defects” case.

*assuming we specify TFTWF as “defect against DD, cooperate otherwise”. I don’t see a reasonable alternate definition. I think you’re right that defecting against DC is bad, and if we go to 3-memory, defecting against DDC while cooperating with DCD seems bad too.** Sarah can’t be assuming the latter, anyway, because the “TFTWF accidentally defects” case would look different.

**there might be some fairly reasonably-behaved variant that’s like “defect if >=2 of 3 past moves were D”, but that seems like a) probably bad since I just made it up and b) not what’s being discussed here.
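To make the two specifications concrete, here is a minimal sketch of both as functions of the opponent's move history. It assumes that "DD" and "past moves" refer to the opponent's most recent moves, written oldest to newest as a string of "C" and "D"; the function names are made up for illustration.

```python
def tftwf(opp_history: str) -> str:
    """Defect only if the opponent's last two moves were both D."""
    return "D" if opp_history[-2:] == "DD" else "C"


def majority_variant(opp_history: str) -> str:
    """Made-up variant: defect if at least 2 of the last 3 opponent moves were D."""
    return "D" if opp_history[-3:].count("D") >= 2 else "C"


# The two rules agree after "DD" but differ after "DCD".
print(tftwf("CDD"), majority_variant("CDD"))
print(tftwf("DCD"), majority_variant("DCD"))
```

Under these assumptions the variant punishes DCD, which plain TFTWF forgives, so it is strictly less forgiving on 3-move windows.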

Do the best ideas float to the top?

BurntVictory6y10

I liked the playful writing here.

Maybe I'm being dumb, but I feel like spelling out some of your ideas would have been useful. (Or maybe you're just playing with ~pre-rigor intuitions, and I'm overthinking this.)

I think "float to the top" could plausibly mean:

A. In practice, human nature biases us towards treating these ideas as if they were true.

B. Ideal reasoning implies that these ideas should be treated as if they were true.

C. By postulate, these ideas end up reaching fixation in society. [Which then implies things about what ... (read more)

1Quinn6y

thanks for your comment.

* personal behavior: probably not viable without a dystopian regime of microchips embedded into brains.
* structure of group blog sites: maybe-- these things have been suggested and tried, i.e. I can't tell you how many times I've seen a reddit comment lamenting the incentives of their upvote system.
* weirdly, I found out about the Brave browser last week (weird because it's apparently been around for a while): attempting to overthrow advertising with an attention-measuring coin. This is great news!
* I was thinking a lot about NAT reading this paper. In the context of debate judges, NAT is a bit of a "last minute jerry-rig / frantically shore up the levee" solution, something engineers would stumble upon in an elaborate and convoluted debugging process, the exact opposite of the kind of solutions alignment researchers are interested in.
* Tim Wu's "Is the first amendment obsolete?" is important and I think everybody should read it.

Less Competition, More Meritocracy?

BurntVictory6y40

I'll echo the other commenters in saying this was interesting and valuable, but also (perhaps necessarily) left me to cross some significant inferential gaps. The biggest for me were in going from game-descriptions to equilibria. Maybe this is just a thing that can't be made intuitive to people who haven't solved it out? But I think that, e.g., graphs of the kinds of distributions you get in different cases would have helped me, at least.

I also had to think for a bit about what assumptions you were making here:

A more rigorous or multi-step

BurntVictory6y40

A similar concept is the idea of offense-defense balance in international relations. eg, large stockpiles of nuclear weapons strongly favor “defense” (well, deterrence) because it’s prohibitively costly to develop the capacity to reliably destroy the enemy’s second-strike forces. Note the caveats there: at sufficient resource levels, and given constraints imposed by other technologies (eg inability to detect nuclear subs).

Allan Dafoe and Ben Garfinkel have a paper out on how techs tend to favor offense at low investment and defense at high investment. (Tha

BurntVictory6y50

Well, it’s nonequilibrium, so pressure isn’t even at each layer of water any more...

When I picture this happening, there’s a pulse of high-pressure water below the rock. If you froze the rock’s motion while keeping its force on the water below it, I think the pulse would eventually equilibrate out of existence as water flowed to the side? Or if I imagine a fluid with strong drag forces on the rock, but which flows smoothly itself, it again seems plausible that the pressure equilibrates at the bottom.

(More confident in the first para than the second one.)

4philh6y

Thanks! "It's nonequilibrium" feels like it points at my specific mistake. Apparently my intuitions don't currently always remember to consider that question.

Decisions are not about changing the world, they are about learning what world you live in

BurntVictory7y40

Hey, noticed what might be errors in your lesion chart: No lesion, no cancer should give +1m utils in both cases. And your probabilities don't add to 1. Including p(lesion) explicitly doesn't meaningfully change the EV difference, so eh. However, my understanding is that the core of the lesion problem is recognizing that p(lesion) is independent of smoking; EYNS seems to say the same. Might be worth including it to make that clearer?

(I don't know much about decision theory, so maybe I'm just confused.)

Anthropics: A Short Note on the Fission Riddle

BurntVictory7y10

I think what avturchin is getting at is that when you say “there is a 1/3 chance your memory is false and a 1/3 chance you are the original”, you’re implicitly conditioning only on “being one of the N total clones”, ignoring the extra information “do you remember the last split” which provides a lot of useful information. That is, if each clone fully conditioned on the information available to them, you’d get 0-.5-.5 as subjective probabilities due to your step 2.

If that’s not what you’re going for, it seems like maybe the probability you’re calculating is... (read more)

2Chris_Leong7y

Firstly, what's 0-.5-.5 mean? Secondly, you're right about conditioning on the last split. The original and last clone each think that they have a 50% chance of being the original and everyone knows that they aren't. Given this, it's tough making sense of the problem posed in the original post. Maybe the question isn't asking about the probability of the original knowing that they are the original at the end, but the chance of someone who thinks they might be the original (including those with false memories) turning out to be the original. Of course it is hard to define exactly what time we are asking about since some of these memories are false. It seems like we need to define some kind of virtual time for it to even make sense. But once this is surmounted, it should be 1/n. Again, I should be clear, this is one part of anthropics where my ideas are less developed. I think this post will have to be edited once I have a more comprehensive theory.

An Intuitive Explanation of Solomonoff Induction

BurntVictory13y30

The idea of reducing hypotheses to bitstrings (ie, programs to be run on a universal Turing machine) actually helped me a lot in understanding something about science that hindsight had previously cheapened for me. Looking back on the founding of quantum mechanics, it's easy to say "right, they should have abandoned their idea of particles existing as point objects with definite position and adopted the concept and language of probability distributions, rather than assuming a particle really exists and is just 'hidden' by the wavefunction." But t... (read more)

0Mitchell_Porter13y

When people describe something with a probability distribution, they normally continue to think that it does have a definite property and they just don't know exactly what it is. To abandon the idea of a particle having a definite position is logically distinct from adopting the use of probability distributions. Perhaps you mean that they should have adopted the view that the wavefunction is a physical object? That was what Schrodinger and de Broglie wanted. But particles show up at points, not smeared out. It took many decades for someone to think of many worlds coexisting inside a wavefunction.