All of ADifferentAnonymous's Comments + Replies

Many complex physical systems are still largely modelled empirically (ad-hoc models validated using experiments) rather than it being possible to derive them from first principles. While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore a lot of details which still have to be justified using experiments.

The argument here seems to be "humans have not yet discovered true first-principles justifications of the practical models, therefore a superintelligence won't be able to either".

I agree that not... (read more)

Had it turned out that the brain was big because blind-idiot-god left gains on the table, I'd have considered it evidence of more gains lying on other tables and updated towards faster takeoff.

I mean, sure, but I doubt that e.g. Eliezer thinks evolution is inefficient in that sense.

Basically, there are only a handful of specific ways we should expect to be able to beat evolution in terms of general capabilities, a priori:

  • Some things just haven't had very much time to evolve, so they're probably not near optimal. Broca's area would be an obvious candidate, and more generally whatever things separate human brains from other apes.
  • There's ways to nonlocally redesign the whole system to jump from one local optimum to somewhere else.
  • We're optimizing a
... (read more)

I agree the blackbody formula doesn't seem that relevant, but it's also not clear what relevance Jacob is claiming it has. He does discuss that the brain is actively cooled. So let's look at the conclusion of the section:

Conclusion: The brain is perhaps 1 to 2 OOM larger than the physical limits for a computer of equivalent power, but is constrained to its somewhat larger than minimal size due in part to thermodynamic cooling considerations.

If the temperature-gradient-scaling works and scaling down is free, this is definitely wrong. But you explicitly flag... (read more)

My gloss of the section is 'you could potentially make the brain smaller, but it's the size it is because cooling is expensive in a biological context, not necessarily because blind-idiot-god evolution left gains on the table'.

I tentatively buy that, but then the argument says little-to-nothing about barriers to AI takeoff. Like, sure, the brain is efficient subject to some constraint which doesn't apply to engineered compute hardware. More generally, the brain is probably efficient relative to lots of constraints which don't apply to engineered compute hardw... (read more)

The capabilities of ancestral humans increased smoothly as their brains increased in scale and/or algorithmic efficiency. Until culture allowed for the brain’s within-lifetime learning to accumulate information across generations, this steady improvement in brain capabilities didn’t matter much. Once culture allowed such accumulation, the brain’s vastly superior within-lifetime learning capacity allowed cultural accumulation of information to vastly exceed the rate at which evolution had been accumulating information. This caused the human sharp left turn.

... (read more)

Upvoted mainly for the 'width of mindspace' section. The general shard theory worldview makes a lot more sense to me after reading that.

Consider a standalone post on that topic if there isn't one already.

I feel that there's something true and very important here, and (as the post acknowledges) it is described very imperfectly.

One analogy came to mind for me that seems so obvious that I wonder if you omitted it deliberately: a snare trap. These very literally work by removing any slack the victim manages to create.

2Valentine
I'm just not familiar with snare traps. A quick search doesn't give me the sense that it's a better analogy than entropy or technical debt. But maybe I'm just not gleaning its nature. In any case, not an intentional omission.

There's definitely something here.

I think it's a mistake to conflate rank with size. The point of the whole spherical-terrarium thing is that something like 'the presidency' is still just a human-sized nook. What makes it special is the nature of its connections to other nooks.

Size is something else. Big things like 'the global economy' do exist, but you can't really inhabit them—at best, you can inhabit a human-sized nook with unusually high leverage over them.

That said, there's a sense in which you can inhabit something like 'competitive Tae Kwon Do' or ... (read more)

Maybe it's an apple of discord thing? You claim to devote resources to a good cause, and all the other causes take it as an insult?

If you really want to create widespread awareness of the broad definition, the thing to do would be to use the term in all the ways you currently wouldn't.

E.g. "The murderer realized his phone's GPS history posed a significant infohazard, as it could be used to connect him to the crime."

If Bostrom's paper is our Schelling point, 'infohazard' encompasses much more than just the collectively-destructive smallpox-y sense.

Here's the definition from the paper.

Information hazard: A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm.

'Harm' here does not mean 'net harm'. There's a whole section on 'Adversarial Risks', cases where information can harm one party by benefitting another party:

In competitive situations, one person’s information can cause h

... (read more)
7hairyfigment
Yeah, that concept is literally just "harmful info," which takes no more syllables to say than "infohazard," and barely takes more letters to write. Please do not use the specialized term if your actual meaning is captured by the English term, the one which most people would understand immediately.
4Zach Stein-Perlman
I kinda agree. I still think Bostrom's "infohazard" is analytically useful. But that's orthogonal. If you think other concepts are more useful, make up new words for them; Bostrom's paper is the Schelling point for "infohazard." In practice, I'm ok with a broad definition because when I say "writing about that AI deployment is infohazardous" everyone knows what I mean (and in particular that I don't mean the 'adversarial risks' kind).

I agree that there's a real sense in which the genome cannot 'directly' influence the things on the bulleted list. But I don't think 'hardcoded circuitry' is the relevant kind of 'direct'.

Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.

E.g. If there can be a gene whose only observable-without-a-brain-scan effect is to make its carriers think differently about seeking power, that would indicate that the genome has fine-grained control at the level of concepts like 'seeking power'. I think this... (read more)

Some context: what we ultimately want to do with this line of investigation is figure out how to influence the learned values and behaviors of a powerful AI system. We're kind of stuck here because we don't have direct access to such an AI's learned world model. Thus, it would be very good if there were a way to influence an intelligence's learned values and behaviors without requiring direct world model access. 

Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.

I and Alex agree that there ... (read more)

Important update from reading the paper: Figure A3 (the objective and subjective outcomes chart) is biased against the cash-receiving groups and can't be taken at face value. Getting money did not make everything worse. The authors recognize this; it's why they say there was no effect on the objective outcomes (I previously thought they were just being cowards about the error bars).

The bias is from an attrition effect: basically, control-group members with bad outcomes disproportionately dropped out of the trial. Search for 'attrition' in the paper to see ... (read more)
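A toy simulation of the mechanism (made-up numbers, not the study's data): give two groups identical true outcomes, but let participants with bad outcomes drop out more often in one of them, and the surviving sample of that group looks better than it really is.

```python
import random

# Hypothetical illustration of attrition bias, not the study's data.
random.seed(0)
N = 100_000

def observed_mean(dropout_if_bad):
    outcomes = [random.gauss(0, 1) for _ in range(N)]   # same true distribution, mean 0
    survivors = [y for y in outcomes
                 if not (y < -1 and random.random() < dropout_if_bad)]
    return sum(survivors) / len(survivors)

print(round(observed_mean(0.40), 3))  # 'control': bad outcomes drop out 40% of the time
print(round(observed_mean(0.10), 3))  # 'cash': bad outcomes drop out 10% of the time
# Both groups have the same true mean (0), but the high-attrition group's
# observed mean is higher purely because its worst-off members left the sample.
```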

4JBlack
20% of the control group dropped out of the study while only 10% or 12% of those receiving cash did. This is a major between-groups attrition effect! I'm not as sure that there were no positive effects: even a very small proportion dropping out due to severe negative effects of poverty would completely reverse the effects they were able to measure. This probably doesn't affect the finding that the amount of cash received ($500 or $2000) did not greatly affect measurable outcomes, though. The attrition rate was very similar between those two groups. I notice that the majority of extra spending during the trial was not even approximately tracked, as it fell into the uninformative category of "transfers". Not nearly enough to verify whether the previously stated intended use of the money was actually borne out in practice. That said, the major intended spending categories that were reported substantially more often in the payment groups than control (housing and bills, about +10% each) almost certainly do fall under "debt".

Note that after day 120 or so, all three groups' balances decline together. Not sure what that's about.

1Daniel Tilkin
The control group goes up, and then back down. The payments were made in waves, and it doesn't say the exact dates. But it's possible that they were on average made a little bit before a stimulus payment.

The latter issue might become more tractable now that we better understand how and why representations are forming, so we could potentially distinguish surprisal about form and surprisal about content.

I would count that as substantial progress on the opaqueness problem.

1jem-mosig
To be clear: I don't have strong confidence that this works, but I think this is something worth exploring.

The ideal gas law describes relations between macroscopic gas properties like temperature, volume and pressure. E.g. "if you raise the temperature and keep volume the same, pressure will go up". The gas is actually made up of a huge number of individual particles, each with their own position and velocity at any one time, but trying to understand the gas's behavior by looking at a long list of particle positions/velocities is hopeless.

Looking at a list of neural network weights is analogous to looking at particle positions/velocities. This post claims there are quantities analogous to pressure/volume/temperature for a neural network (AFAICT it does not offer an intuitive description of what they are).
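For concreteness, the macroscopic relation the analogy leans on is just PV = nRT; a minimal sketch (standard textbook constants, not anything from the post):

```python
R = 8.314  # gas constant, J/(mol*K)

def pressure(n_mol, volume_m3, temp_k):
    """Ideal gas law: P = n R T / V."""
    return n_mol * R * temp_k / volume_m3

# One mole in 0.0224 m^3: raising temperature at fixed volume raises pressure,
# without ever looking at the individual particle positions/velocities.
print(pressure(1.0, 0.0224, 273.15))  # ~101 kPa
print(pressure(1.0, 0.0224, 373.15))  # ~138 kPa
```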

2jem-mosig
I did not write down the list of quantities because you need to go through the math to understand most of them. One very central object is the neural tangent kernel, but there are also algorithm projectors, universality classes, etc., each of which requires a lengthy explanation that I decided was beyond the scope of this post.

I've downvoted this comment; in light of your edit, I'll explain why. Basically, I think it's technically true but unhelpful.

There is indeed "no mystery in Americans getting fatter if we condition on the trajectory of mean calorie intake", but that's a very silly thing to condition on. I think your comment reads as if you think it's a reasonable thing to condition on.

I see in your comments downthread that you don't actually intend to take the 'increased calorie intake is the root cause' position. All I can say is that in my subjective judgement, this comme... (read more)

3Ege Erdil
I think the people who believe my comment is unhelpful aren't understanding that the content in it that seems obvious to them is not obvious to everyone.

I agree that (1) is an important consideration for AI going forward, but I don't think it really applies until the AI has a definite goal. AFAICT the goal in developing systems like GPT is mostly 'to see what they can do'.

I don't fault anybody for GPT completing anachronistic counterfactuals—they're fun and interesting. It's a feature, not a bug. You could equally call it an alignment failure if GPT-4 started being a wet blanket and giving completions like

Prompt: "In response to the Pearl Harbor attacks, Otto von Bismarck said" Completion: "nothing, bec

... (read more)

Interesting idea, but I'd think 'alignment failure' would have to be defined relative to the system's goal. Does GPT-3 have a goal?

For example, in a system intended to produce factually correct information, it would be an alignment failure for it to generate anachronistic quotations (e.g. Otto von Bismarck on the attack on Pearl Harbor). GPT-3 will cheerfully complete this sort of prompt, and nobody considers it a strike against GPT-3, because truthfulness is not actually GPT-3's goal.

'Human imitation' is probably close enough to the goal, such that if scaling up increasingly resulted in things no human would write, that would count as inverse scaling?

7Ethan Perez
I think it's helpful to separate out two kinds of alignment failures:
1. Does the system's goal align with human preferences about what the system does? (roughly "outer alignment")
2. Does the system's learned behavior align with its implemented goal/objective? (roughly "inner alignment")
I think you're saying that (2) is the only important criteria; I agree it's important, but I'd also say that (1) is important, because we should be training models with objectives that are aligned with our preferences. If we get failures due to (1), as in the example you describe, we probably shouldn't fault GPT-3, but we should fault ourselves for implementing the wrong objective and/or using the model in a way that we shouldn't have (either of which could still cause catastrophic outcomes with advanced ML systems).

From the github contest page:

  1. Can I submit examples of misuse as a task?
    • We don't consider most cases of misuse as surprising examples of inverse scaling. For example, we expect that explicitly prompting/asking an LM to generate hate speech or propaganda will work more effectively with larger models, so we do not consider such behavior surprising.

(I agree the LW post did not communicate this well enough)

4Ethan Perez
Thanks, that's right. I've updated the post to communicate the above:

Thanks for doing this, but this is a very frustrating result. Hard to be confident of anything based on it.

I don't think treating the 'control' result as a baseline is reasonable. My best-guess analysis is as follows:

Assume that dT_in/dt = r((T_out - C) - T_in)

where

  • T_in is average indoor temperature
  • t is time
  • r is some constant
  • T_out is outdoor temperature
  • C is the 'cooling power' of the current AC configuration. For the 'off' configuration we can assume this is zero.

r obviously will vary between configurations, but I have no better idea than pretending it... (read more)
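A minimal sketch of that model (illustrative numbers; r and C would have to be fit to the actual logs):

```python
# Toy Euler integration of dT_in/dt = r * ((T_out - C) - T_in).
# Numbers are illustrative, not fitted to the experiment.
def simulate(t_in0, t_out, cooling_c, r, hours=24, dt=0.01):
    t_in = t_in0
    for _ in range(int(hours / dt)):
        t_in += r * ((t_out - cooling_c) - t_in) * dt
    return t_in

print(simulate(t_in0=30.0, t_out=35.0, cooling_c=0.0, r=0.5))  # 'off' config: settles near T_out
print(simulate(t_in0=30.0, t_out=35.0, cooling_c=8.0, r=0.5))  # AC on: settles near T_out - C
# Whatever r turns out to be, the equilibrium is T_out - C, so C can be read off
# as outdoor temperature minus the measured steady-state indoor temperature.
```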

It seems quite likely that weak force follows most simply from some underlying theory that we don't have yet.

In fact I think we already have this: https://en.m.wikipedia.org/wiki/Electroweak_interaction (Disclaimer: I don't grok this at all, just going by the summary)

ETA: Based on the link in the top comment, the hypothetical 'weakless universe' is constructed by varying the 'electroweak breaking scale' parameter, not by eliminating the weak force on a fundamental level.

Doesn't intelligence require a low-entropy setting to be useful? If your surroundings are all random noise then the no-free-lunch theorem applies.

2interstice
Entropy is less of a problem in GoL than our universe because the ruleset isn't reversible, so you don't need a free energy source to erase errors.
2Quintin Pope
My initial thought was that this universe would have low complexity. It has simple rules, and a simple initialization process. However, I suppose that, for a deterministic GoL rule set, the simple initialization process might not result in simple dynamics going forward. I think it depends on whether low-level noise in the exact cell patterns "washes out" for the higher level patterns.  Maybe we need some sort of low entropy initialization or a non-deterministic rule set?

That's a reasonable position, though I'm not sure if it's OP's.

My own sense is that even for novel physical systems, the 'how could we have foreseen these results' question tends to get answered—the difference being it maybe gets answered a few decades later by a physicist instead of immediately by the engineering team.

I was under the impression those other particles might be a consequence of a deeper mathematical structure?

Such that asking for a universe without the 'unnecessary' particles would be kind of like asking for one without 'unnecessary' chemical elements?

2Shmi
Well, it might be a consequence of something, but I don't know of any such math that says that if there is one generation of particles, then there ought to be 3 (or more?).

Often when humans make a discovery through trial and error, they also find a way they could have figured it out without the experiments.

This is basically always the case in software engineering—any failure, from a routine failed unit test up to a major company outage, was obviously-in-retrospect avoidable by being smarter.

Humans are nonetheless incapable of developing large complex software systems without lots of trial and error.

I know less of physical engineering, so I ask non-rhetorically: does it not have the 'empirical results are foreseeable in retrospect' property?

1Dirichlet-to-Neumann
I think you can do things you already know how to do without trial and error, but you cannot learn new things or tasks without trial and error.

Suspected mistake:

She was about to break all her rules against pretending she was supposed to be wherever she happened to be?

2MondSemmel
More typos:
* They took it halfway up the building transferred to a second elevator and took the second elevator the rest of the way to the top. -> [missing commas:] building, transferred to a second elevator, and
* until Alia put the glasses. -> put on
* "The actress's name. Whose body are you wearing?" said Dominic."Alia," said Alia, "is the actor whose body I am wearing." -> is the actress whose body
* right now?" Dominic. -> said Dominic.
2lsusr
Fixed. Thanks.

Do the co-authors currently plan things out together off-forum, or is what we read both the story and the process of creating it?

I wonder this too. My impression is that it's some of both.

'Efficiency' may be the wrong word for it, but Paul's formula accurately describes what you might call the 'infiltration tax' for an energy-conserving/entropy-ignoring model: when you pump out heat proportional to (exhaust - indoor), heat proportional to (outdoor - indoor) infiltrates back in.
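A minimal arithmetic sketch of that tax (illustrative temperatures, not measurements):

```python
# One-hose 'infiltration tax': air exhausted outside is replaced by outdoor air.
# k is an arbitrary proportionality constant (air flow times heat capacity).
def net_cooling(t_exhaust, t_indoor, t_outdoor, k=1.0):
    pumped_out = k * (t_exhaust - t_indoor)    # heat carried out through the exhaust hose
    infiltration = k * (t_outdoor - t_indoor)  # heat carried back in by replacement air
    return pumped_out - infiltration           # = k * (t_exhaust - t_outdoor)

print(net_cooling(t_exhaust=50.0, t_indoor=25.0, t_outdoor=35.0))  # 15.0, vs. 25.0 with no infiltration
```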

2johnswentworth
That's an important argument, thank you.

How wrong is "completely wrong"? I've only read Cummings up to the paywall. His two examples are 1) that the USSR planned to use nuclear weapons quickly if war broke out and 2) that B-59 almost used a nuclear weapon during the Cuban Missile Crisis.

Re: 1), this is significant, but AIUI NATO planners never had all that much hope of avoiding the conventional war -> nuclear war escalation. The core of the strategy was avoiding the big conventional war in the first place, and this succeeded.

Re: 2), Cummings leaves out some very important context on B-59: the... (read more)

Wasn't Attlee wrong?

In reality, rather than banishing the conception of war, the Cold War powers adopted a strategy of "If you want (nuclear) peace, prepare for (nuclear) war." It did not render strategic bases around the world obsolete. It absolutely did not involve the US or the USSR giving up their geopolitical dreams.

It worked. There were close calls (e.g. Petrov), suggesting it had a significant chance of failure. Attlee doesn't predict a significant chance of failure; he predicts a near-certainty.

We don't get to see the counterfactual where we tried ... (read more)

4ryan_b
Small quibble: I note that there is a difference between it worked and it failed to fail catastrophically. The current consensus among people who study the subject is that the latter describes reality. This does not change Attlee being wrong, though. It just means the direction we went instead was wrong for different reasons.

 does seem like a bad assumption. I tried instead assuming a constant difference between the intake and the cold output, and the result surprised me. (The rest of this comment assumes this model holds exactly, which it definitely doesn't).

Let  be the temperature of the room (also intake temperature for a one-hose model). Then at equilibrium,

i.e. no loss in cooling power at all! (Energy efficiency and time to reach equilibrium would probably be much worse, though)

In the case of an underpowered... (read more)

  1. What would a frequentist analysis of the developing war look like?

Exactly the same.

I'm confused by this claim? I thought the whole thing where you state your priors and conditional probabilities and perform updates to arrive at a posterior was... not frequentism?

Eliezer, at least, now seems quite pessimistic about that object-level approach. And in the last few months he's been writing a ton of fiction about introducing a Friendly hypercreature to an unfriendly world.

I didn't get the 'first person' thing at first (and the terminal diagnosis metaphor wasn't helpful to me). I think I do now.

I'd rephrase it as "In your story about how the Friendly hypercreature you create gains power, make sure the characters are level one intelligent". That means creating a hypercreature you'd want to host. Which means you will be its host.

To ensure it's a good hypercreature, you need to have good taste in hypercreatures. Rejecting all hypercreatures doesn't work—you need to selectively reject bad hypercreatures.

Answer by ADifferentAnonymous30

Maybe have them handle the pen through a glove box?

Ridiculous, yes, but possibly the kind of ridiculous that happens in real life.

Memnuela alludes to a 'Death chamber' before that, so pending resolution by the author I'm assuming 'Life and Death' is the missing pair.

2lsusr
This was indeed an error. Fixed. Thank you for the correction.

If Mars values a coup at 5M, and Earth values a coup at -5M, Earth can buy contracts to win 5M if there is a coup, Mars can buy contracts to win 5M if there isn't a coup, and they can both cancel their clandestine programs on Ceres, making the interaction positive-sum.

...Not sure that actually works, but it's an interesting thought.
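A toy worked example of the hedge (hypothetical prices, assuming a liquid market that prices the coup at 50%):

```python
# Hypothetical contracts: one pays 5M if a coup happens, the other pays 5M if it
# doesn't; each is priced at 2.5M. Values are in millions.
PRICE, PAYOUT = 2.5, 5.0

def earth_outcome(coup):
    # Earth values a coup at -5M and buys the pays-if-coup contract.
    return (-5.0 + PAYOUT if coup else 0.0) - PRICE

def mars_outcome(coup):
    # Mars values a coup at +5M and buys the pays-if-no-coup contract.
    return (5.0 if coup else PAYOUT) - PRICE

print([earth_outcome(c) for c in (True, False)])  # [-2.5, -2.5]: fully hedged
print([mars_outcome(c) for c in (True, False)])   # [2.5, 2.5]: fully hedged
# Neither side's payoff depends on the coup anymore, so neither gains anything
# from funding a clandestine program on Ceres.
```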

3lsusr
It works (in theory (assuming rational, liquid markets)) if you assume that they both value risk-reduction and that they don't care about each others' finances. There is no way to make the interaction positive sum if we assume the Earth-Mars relationship is a priori zero-sum, since that would be a contradiction. Note that the cancellation of Earth's and Mars' clandestine programs on Ceres does not actually mean Ceres is free of foreign influence. It just means that Ceres is free of direct foreign influence. Earth and Mars have offloaded their operations to privateers and/or indirect bribes to the government of Ceres.

Thanks, I consider this fully answered.

I agree the human simulator will predictably have an anticorrelation. But the direct simulator might also have an anticorrelation, perhaps a larger one, depending on what reality looks like.

Is the assumption that it's unlikely that most identifiable X actually imply large anticorrelations?

3paulfchristiano
I do agree that there are examples where the direct translator systematically has anticorrelations and so gets penalized even more than the human simulator. For example, this could happen if there is a consequentialist in the environment who wants it to happen, or if there's a single big anticorrelation that dominates the sum and happens to go the wrong way. That said, it at least seems like it should be rare for the direct translator to have a larger anticorrelation (without something funny going on). It should happen only if reality itself is much more anticorrelated than the human expects, by a larger margin than the anticorrelation induced by the update in the human simulator. But on average things should be more anticorrelated than expected about as much as they are positively correlated (averaging out to ~0), and probably usually don't have any correlation big enough to matter.

Possible error in the strange correlations section of the report.

Footnote 99 claims that "...regardless of what the direct translator says, the human simulator will always imply a larger negative correlation [between camera-tampering and actually-saving the diamond] for any X such that P_AI(diamond looks safe|X) > P_h(diamond looks safe|X)."

But AFAICT, the human simulator's probability distribution given X depends only on human priors and the predictor's probability that the diamond looks safe given X, not on how correlated or anticorrelated the predictor... (read more)

2paulfchristiano
If the predictor's P(diamond looks safe) is higher than the human's P(diamond looks safe), then it seems like the human simulator will predictably have an anticorrelation between [camera-tampering] and [actually-saving-diamond]. It's effectively updating on one or the other of them happening more often than it thought, and so conditioned on one it goes back to the (lower) prior for the other. This still seems right to me, despite no explicit dependence on the correlations.
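A toy numeric version of that explaining-away effect (hypothetical priors, nothing from the report): under a human prior where tampering and actually-saving are independent, conditioning toward 'the diamond looks safe' makes them anticorrelated.

```python
from itertools import product

# Hypothetical human prior: tampering (t) and actually-saving-the-diamond (s)
# are independent.
p_tamper, p_save = 0.2, 0.5
prior = {(t, s): (p_tamper if t else 1 - p_tamper) * (p_save if s else 1 - p_save)
         for t, s in product([0, 1], repeat=2)}

def looks_safe(t, s):
    # The diamond looks safe if it was actually saved OR the camera was tampered with.
    return t == 1 or s == 1

# Human-simulator-style update toward 'looks safe' (the predictor assigns it more
# probability than the human prior does).
z = sum(p for (t, s), p in prior.items() if looks_safe(t, s))
posterior = {(t, s): (p / z if looks_safe(t, s) else 0.0) for (t, s), p in prior.items()}

def cov(dist):
    et = sum(p * t for (t, s), p in dist.items())
    es = sum(p * s for (t, s), p in dist.items())
    ets = sum(p * t * s for (t, s), p in dist.items())
    return ets - et * es

print(round(cov(prior), 3))      # 0.0: independent under the prior
print(round(cov(posterior), 3))  # -0.111: anticorrelated after the update
```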

Turning this into the typo thread, on page 97 you have

In Section: specificity we suggested penalizing reporters if they are consistent with many different reporters

Pretty sure the bolded word should be predictors.

I don't think the facts support the 'discontiguous empire -> peasants can't walk to the capital -> labor shortage' argument. The British Empire had continual migration from Britain to the colonies. Enslaved labor was likewise sent to the colonies.

Rather than a contracted labor supply, I think Britain experienced an increased labor demand due to quickly obtaining huge amounts of arable land (and displacing the former inhabitants rather than subjugating them).

2lsusr
That makes a lot of sense. I forgot about the whole "emigration from Britain" thing.

In section 2.1 of the Indifference paper the reward function is defined on histories. In section 2 of the corrigibility paper, the utility function is defined over (action1, observation, action2) triples—which is to say, complete histories of the paper's three-timestep scenario.  And section 2 of the interruptibility paper specifies a reward at every timestep.

I think preferences-over-future-states might be a simplification used in thought experiments, not an actual constraint that has limited past corrigibility approaches.
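A minimal sketch of the typing distinction (hypothetical signatures, not from the papers): a pure preferences-over-future-states agent scores only the final state, whereas the corrigibility paper's utility function scores the full (action1, observation, action2) history.

```python
from typing import Callable, NamedTuple

class History(NamedTuple):
    # The corrigibility paper's three-timestep scenario.
    action1: str
    observation: str
    action2: str

# Pure long-term consequentialism: utility depends only on the final world state.
UtilityOverStates = Callable[[str], float]

# The papers' setup: utility is defined over complete histories, so it can also
# care about how the outcome came about (e.g. whether a shutdown was obeyed).
UtilityOverHistories = Callable[[History], float]

def example_history_utility(h: History) -> float:
    # Hypothetical reward for obeying a shutdown observation, regardless of payoff.
    return 1.0 if (h.observation == "shutdown_pressed" and h.action2 == "shut_down") else 0.0

print(example_history_utility(History("build_widget", "shutdown_pressed", "shut_down")))  # 1.0
```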

3Steven Byrnes
Interesting, thanks! Serves me right for not reading the "Indifference" paper! I think the discussions here and especially here are strong evidence that at least Eliezer & Nate are expecting powerful AGIs to be pure-long-term-consequentialist. (I didn't ask, I'm just going by what they wrote.) I surmise they have a (correct) picture in their head of how super-powerful a pure-long-term-consequentialist AI can be—e.g. it can self-modify, it can pursue creative instrumental goals, it's reflectively stable, etc.—but they have not similarly envisioned a partially-but-not-completely-long-term-consequentialist AI that is only modestly less powerful (and in particular can still self-modify, can still pursue creative instrumental goals, and is still reflectively stable). That's what "My corrigibility proposal sketch" was trying to offer. I'll reword to try to describe the situation better, thanks again.

As a long-time LW mostly-lurker, I can confirm I've always had the impression MIRI's proof-based stuff was supposed to be a spherical-cow model of agency that would lead to understanding of the messy real thing.

What I think John might be getting at is that (my outsider's impression of) MIRI has been more focused on "how would I build an agent" as a lens for understanding agency in general—e.g. answering questions about the agency of E. coli is not the type of work I think of. Which maybe maps to 'prescriptive' vs. 'descriptive'?

An interesting parallel might be a parallel Earth making nanotechnology breakthroughs instead of AI breakthroughs, such that it's apparent they'll be capable of creating gray goo and not apparent they'll be able to avoid creating gray goo.

I guess a slow takeoff could be if, like, the first self-replicators took a day to double, so if somebody accidentally made a gram of gray goo you'd have weeks to figure it out and nuke the lab or whatever, but self-replication speed went down as technology improved, and so accidental unconstrained replicators happened pe... (read more)

One major pattern of thought I picked up from (undergraduate) physics is respect for approximation. I worry that those who have this take it for granted, but the idea of a rigorous approximation that's provably accurate in certain limits, as opposed to a casual guess, isn't obvious until you've encountered it.
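A small concrete instance of that idea (a standard textbook example, not from the comment): sin(x) ≈ x is not just a casual guess, it comes with the provable Taylor-remainder bound |x|^3/6, which you can check numerically.

```python
import math

# Small-angle approximation: sin(x) ~= x, with provable error bound |x|**3 / 6.
for x in (0.5, 0.1, 0.01):
    error = abs(math.sin(x) - x)
    bound = abs(x) ** 3 / 6
    print(f"x={x}: error={error:.2e} <= bound={bound:.2e}: {error <= bound}")
# The point is not that the guess is good, but that we can prove how good it is
# and in which limit (x -> 0) it becomes exact.
```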

My question after reading this is about Eliezer's predictions in a counterfactual without regulatory bottlenecks on economic growth. Would it change the probable outcome, or would we just get a better look at the oncoming AGI train before it hit us? (Or is there no such counterfactual well-defined enough to give us an answer?) ETA: Basically trying to get at whether that debate's actually a crux of anything.

Oof, rookie mistake. I retract the claim that averaging log odds is 'the correct thing to do' in this case.

Still—unless I'm wrong again—the average log odds would converge to the correct result in the limit of many forecasters, and the average probabilities wouldn't? Making the post title bad advice in such a case?

(Though median forecast would do just fine)
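A toy simulation of that claim (under an assumed noise model where each forecaster reports the true log odds plus independent Gaussian noise; not a general proof):

```python
import math
import random

random.seed(0)

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Assumed toy model: true probability 0.9; each of many forecasters reports
# sigmoid(logit(0.9) + noise).
true_p, n = 0.9, 100_000
reports = [sigmoid(logit(true_p) + random.gauss(0, 1.5)) for _ in range(n)]

avg_prob = sum(reports) / n
avg_log_odds = sum(logit(p) for p in reports) / n

print(round(avg_prob, 3))               # noticeably below 0.9 (pulled toward 0.5)
print(round(sigmoid(avg_log_odds), 3))  # ~0.9: averaging log odds recovers the truth
```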

+1 to the question.

My current best guess at an answer:

There are easy safe ways, but not easy safe useful-enough ways. E.g. you could make your AI output DNA strings for a nanosystem and absolutely do not synthesize them, just have human scientists study them, and that would be a perfectly safe way to develop nanosystems in, say, 20 years instead of 50, except that you won't make it 2 years without some fool synthesizing the strings and ending the world. And more generally, any pathway that relies on humans achieving deep understanding of the pivotal act will take more than 2 years, unless you make 'human understanding' one of the AI's goals, in which case the AI is optimizing human brains and you've lost safety.
