All of ADifferentAnonymous's Comments + Replies

Many complex physical systems are still largely modelled empirically (ad-hoc models validated using experiments) rather than it being possible to derive them from first principles. While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore a lot of details which still have to be justified using experiments.

The argument here seems to be "humans have not yet discovered true first-principles justifications of the practical models, therefore a superintelligence won't be able to either".

I agree that not... (read more)

Had it turned out that the brain was big because blind-idiot-god left gains on the table, I'd have considered it evidence of more gains lying on other tables and updated towards faster takeoff.

I mean, sure, but I doubt that e.g. Eliezer thinks evolution is inefficient in that sense.

Basically, there are only a handful of specific ways we should expect to be able to beat evolution in terms of general capabilities, a priori:

  • Some things just haven't had very much time to evolve, so they're probably not near optimal. Broca's area would be an obvious candidate, and more generally whatever things separate human brains from other apes.
  • There's ways to nonlocally redesign the whole system to jump from one local optimum to somewhere else.
  • We're optimizing a
... (read more)

I agree the blackbody formula doesn't seem that relevant, but it's also not clear what relevance Jacob is claiming it has. He does discuss that the brain is actively cooled. So let's look at the conclusion of the section:

Conclusion: The brain is perhaps 1 to 2 OOM larger than the physical limits for a computer of equivalent power, but is constrained to its somewhat larger than minimal size due in part to thermodynamic cooling considerations.

If the temperature-gradient-scaling works and scaling down is free, this is definitely wrong. But you explicitly flag... (read more)

My gloss of the section is 'you could potentially make the brain smaller, but it's the size it is because cooling is expensive in a biological context, not necessarily because blind-idiot-god evolution left gains on the table'.

I tentatively buy that, but then the argument says little-to-nothing about barriers to AI takeoff. Like, sure, the brain is efficient subject to some constraint which doesn't apply to engineered compute hardware. More generally, the brain is probably efficient relative to lots of constraints which don't apply to engineered compute hardw... (read more)

The capabilities of ancestral humans increased smoothly as their brains increased in scale and/or algorithmic efficiency. Until culture allowed for the brain’s within-lifetime learning to accumulate information across generations, this steady improvement in brain capabilities didn’t matter much. Once culture allowed such accumulation, the brain’s vastly superior within-lifetime learning capacity allowed cultural accumulation of information to vastly exceed the rate at which evolution had been accumulating information. This caused the human sharp left turn.

... (read more)

Upvoted mainly for the 'width of mindspace' section. The general shard theory worldview makes a lot more sense to me after reading that.

Consider a standalone post on that topic if there isn't one already.

I feel that there's something true and very important here, and (as the post acknowledges) it is described very imperfectly.

One analogy came to mind for me that seems so obvious that I wonder if you omitted it deliberately: a snare trap. These very literally work by removing any slack the victim manages to create.

2Valentine
I'm just not familiar with snare traps. A quick search doesn't give me the sense that it's a better analogy than entropy or technical debt. But maybe I'm just not gleaning its nature. In any case, not an intentional omission.

There's definitely something here.

I think it's a mistake to conflate rank with size. The point of the whole spherical-terrarium thing is that something like 'the presidency' is still just a human-sized nook. What makes it special is the nature of its connections to other nooks.

Size is something else. Big things like 'the global economy' do exist, but you can't really inhabit them—at best, you can inhabit a human-sized nook with unusually high leverage over them.

That said, there's a sense in which you can inhabit something like 'competitive Tae Kwon Do' or ... (read more)

Maybe it's an apple of discord thing? You claim to devote resources to a good cause, and all the other causes take it as an insult?

If you really want to create widespread awareness of the broad definition, the thing to do would be to use the term in all the ways you currently wouldn't.

E.g. "The murderer realized his phone's GPS history posed a significant infohazard, as it could be used to connect him to the crime."

If Bostrom's paper is our Schelling point, 'infohazard' encompasses much more than just the collectively-destructive smallpox-y sense.

Here's the definition from the paper.

Information hazard: A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm.

'Harm' here does not mean 'net harm'. There's a whole section on 'Adversarial Risks', cases where information can harm one party by benefitting another party:

In competitive situations, one person’s information can cause h

... (read more)
7hairyfigment
Yeah, that concept is literally just "harmful info," which takes no more syllables to say than "infohazard," and barely takes more letters to write. Please do not use the specialized term if your actual meaning is captured by the English term, the one which most people would understand immediately.
4Zach Stein-Perlman
I kinda agree. I still think Bostrom's "infohazard" is analytically useful. But that's orthogonal. If you think other concepts are more useful, make up new words for them; Bostrom's paper is the Schelling point for "infohazard." In practice, I'm ok with a broad definition because when I say "writing about that AI deployment is infohazardous" everyone knows what I mean (and in particular that I don't mean the 'adversarial risks' kind).

I agree that there's a real sense in which the genome cannot 'directly' influence the things on the bulleted list. But I don't think 'hardcoded circuitry' is the relevant kind of 'direct'.

Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.

E.g. If there can be a gene whose only observable-without-a-brain-scan effect is to make its carriers think differently about seeking power, that would indicate that the genome has fine-grained control at the level of concepts like 'seeking power'. I think this... (read more)

Some context: what we ultimately want to do with this line of investigation is figure out how to influence the learned values and behaviors of a powerful AI system. We're kind of stuck here because we don't have direct access to such an AI's learned world model. Thus, it would be very good if there were a way to influence an intelligence's learned values and behaviors without requiring direct world model access. 

Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.

I and Alex agree that there ... (read more)

Important update from reading the paper: Figure A3 (the objective and subjective outcomes chart) is biased against the cash-receiving groups and can't be taken at face value. Getting money did not make everything worse. The authors recognize this; it's why they say there was no effect on the objective outcomes (I previously thought they were just being cowards about the error bars).

The bias is from an attrition effect: basically, control-group members with bad outcomes disproportionately dropped out of the trial. Search for 'attrition' in the paper to see ... (read more)
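A toy simulation of the mechanism (made-up numbers, not the study's data): give two groups identical true outcomes, but let participants with bad outcomes drop out more often in one of them, and the surviving sample of that group looks better than it really is.

```python
import random

# Hypothetical illustration of attrition bias, not the study's data.
random.seed(0)
N = 100_000

def observed_mean(dropout_if_bad):
    outcomes = [random.gauss(0, 1) for _ in range(N)]   # same true distribution, mean 0
    survivors = [y for y in outcomes
                 if not (y < -1 and random.random() < dropout_if_bad)]
    return sum(survivors) / len(survivors)

print(round(observed_mean(0.40), 3))  # 'control': bad outcomes drop out 40% of the time
print(round(observed_mean(0.10), 3))  # 'cash': bad outcomes drop out 10% of the time
# Both groups have the same true mean (0), but the high-attrition group's
# observed mean is higher purely because its worst-off members left the sample.
```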

4JBlack
20% of the control group dropped out of the study while only 10% or 12% of those receiving cash did. This is a major between-groups attrition effect! I'm not as sure that there were no positive effects: even a very small proportion dropping out due to severe negative effects of poverty would completely reverse the effects they were able to measure. This probably doesn't affect the finding that the amount of cash received ($500 or $2000) did not greatly affect measurable outcomes, though. The attrition rate was very similar between those two groups. I notice that the majority of extra spending during the trial was not even approximately tracked, as it fell into the uninformative category of "transfers". Not nearly enough to verify whether the previously stated intended use of the money was actually borne out in practice. That said, the major intended spending categories that were reported substantially more often in the payment groups than control (housing and bills, about +10% each) almost certainly do fall under "debt".

Note that after day 120 or so, all three groups' balances decline together. Not sure what that's about.

1Daniel Tilkin
The control group goes up, and then back down. The payments were made in waves, and it doesn't say the exact dates. But it's possible that they were on average made a little bit before a stimulus payment.

The latter issue might become more tractable now that we better understand how and why representations are forming, so we could potentially distinguish surprisal about form and surprisal about content.

I would count that as substantial progress on the opaqueness problem.

1jem-mosig
To be clear: I don't have strong confidence that this works, but I think this is something worth exploring.

The ideal gas law describes relations between macroscopic gas properties like temperature, volume and pressure. E.g. "if you raise the temperature and keep volume the same, pressure will go up". The gas is actually made up of a huge number of individual particles, each with their own position and velocity at any one time, but trying to understand the gas's behavior by looking at a long list of particle positions/velocities is hopeless.

Looking at a list of neural network weights is analogous to looking at particle positions/velocities. This post claims there are quantities analogous to pressure/volume/temperature for a neural network (AFAICT it does not offer an intuitive description of what they are).
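For concreteness, the macroscopic relation the analogy leans on is just PV = nRT; a minimal sketch (standard textbook constants, not anything from the post):

```python
R = 8.314  # gas constant, J/(mol*K)

def pressure(n_mol, volume_m3, temp_k):
    """Ideal gas law: P = n R T / V."""
    return n_mol * R * temp_k / volume_m3

# One mole in 0.0224 m^3: raising temperature at fixed volume raises pressure,
# without ever looking at the individual particle positions/velocities.
print(pressure(1.0, 0.0224, 273.15))  # ~101 kPa
print(pressure(1.0, 0.0224, 373.15))  # ~138 kPa
```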

2jem-mosig
I did not write down the list of quantities because you need to go through the math to understand most of them. One very central object is the neural tangent kernel, but there are also algorithm projectors, universality classes, etc., each of which requires a lengthy explanation that I decided was beyond the scope of this post.

I've downvoted this comment; in light of your edit, I'll explain why. Basically, I think it's technically true but unhelpful.

There is indeed "no mystery in Americans getting fatter if we condition on the trajectory of mean calorie intake", but that's a very silly thing to condition on. I think your comment reads as if you think it's a reasonable thing to condition on.

I see in your comments downthread that you don't actually intend to take the 'increased calorie intake is the root cause' position. All I can say is that in my subjective judgement, this comme... (read more)

3Ege Erdil
I think the people who believe my comment is unhelpful aren't understanding that the content in it that seems obvious to them is not obvious to everyone.

I agree that (1) is an important consideration for AI going forward, but I don't think it really applies until the AI has a definite goal. AFAICT the goal in developing systems like GPT is mostly 'to see what they can do'.

I don't fault anybody for GPT completing anachronistic counterfactuals—they're fun and interesting. It's a feature, not a bug. You could equally call it an alignment failure if GPT-4 started being a wet blanket and giving completions like

Prompt: "In response to the Pearl Harbor attacks, Otto von Bismarck said" Completion: "nothing, bec

... (read more)

Interesting idea, but I'd think 'alignment failure' would have to be defined relative to the system's goal. Does GPT-3 have a goal?

For example, in a system intended to produce factually correct information, it would be an alignment failure for it to generate anachronistic quotations (e.g. Otto von Bismarck on the attack on Pearl Harbor). GPT-3 will cheerfully complete this sort of prompt, and nobody considers it a strike against GPT-3, because truthfulness is not actually GPT-3's goal.

'Human imitation' is probably close enough to the goal, such that if scaling up increasingly resulted in things no human would write, that would count as inverse scaling?

7Ethan Perez
I think it's helpful to separate out two kinds of alignment failures:
1. Does the system's goal align with human preferences about what the system does? (roughly "outer alignment")
2. Does the system's learned behavior align with its implemented goal/objective? (roughly "inner alignment")
I think you're saying that (2) is the only important criteria; I agree it's important, but I'd also say that (1) is important, because we should be training models with objectives that are aligned with our preferences. If we get failures due to (1), as in the example you describe, we probably shouldn't fault GPT-3, but we should fault ourselves for implementing the wrong objective and/or using the model in a way that we shouldn't have (either of which could still cause catastrophic outcomes with advanced ML systems).

From the github contest page:

  1. Can I submit examples of misuse as a task?
    • We don't consider most cases of misuse as surprising examples of inverse scaling. For example, we expect that explicitly prompting/asking an LM to generate hate speech or propaganda will work more effectively with larger models, so we do not consider such behavior surprising.

(I agree the LW post did not communicate this well enough)

4Ethan Perez
Thanks, that's right. I've updated the post to communicate the above:

Thanks for doing this, but this is a very frustrating result. Hard to be confident of anything based on it.

I don't think treating the 'control' result as a baseline is reasonable. My best-guess analysis is as follows:

Assume that dT_in/dt = r((T_out - C) - T_in)

where

  • T_in is average indoor temperature
  • t is time
  • r is some constant
  • T_out is outdoor temperature
  • C is the 'cooling power' of the current AC configuration. For the 'off' configuration we can assume this is zero.

r obviously will vary between configurations, but I have no better idea than pretending it... (read more)
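A minimal sketch of that model (illustrative numbers; r and C would have to be fit to the actual logs):

```python
# Toy Euler integration of dT_in/dt = r * ((T_out - C) - T_in).
# Numbers are illustrative, not fitted to the experiment.
def simulate(t_in0, t_out, cooling_c, r, hours=24, dt=0.01):
    t_in = t_in0
    for _ in range(int(hours / dt)):
        t_in += r * ((t_out - cooling_c) - t_in) * dt
    return t_in

print(simulate(t_in0=30.0, t_out=35.0, cooling_c=0.0, r=0.5))  # 'off' config: settles near T_out
print(simulate(t_in0=30.0, t_out=35.0, cooling_c=8.0, r=0.5))  # AC on: settles near T_out - C
# Whatever r turns out to be, the equilibrium is T_out - C, so C can be read off
# as outdoor temperature minus the measured steady-state indoor temperature.
```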

It seems quite likely that weak force follows most simply from some underlying theory that we don't have yet.

In fact I think we already have this: https://en.m.wikipedia.org/wiki/Electroweak_interaction (Disclaimer: I don't grok this at all, just going by the summary)

ETA: Based on the link in the top comment, the hypothetical 'weakless universe' is constructed by varying the 'electroweak breaking scale' parameter, not by eliminating the weak force on a fundamental level.

Doesn't intelligence require a low-entropy setting to be useful? If your surroundings are all random noise then the no-free-lunch theorem applies.

2interstice
Entropy is less of a problem in GoL than our universe because the ruleset isn't reversible, so you don't need a free energy source to erase errors.
2Quintin Pope
My initial thought was that this universe would have low complexity. It has simple rules, and a simple initialization process. However, I suppose that, for a deterministic GoL rule set, the simple initialization process might not result in simple dynamics going forward. I think it depends on whether low-level noise in the exact cell patterns "washes out" for the higher level patterns.  Maybe we need some sort of low entropy initialization or a non-deterministic rule set?

That's a reasonable position, though I'm not sure if it's OP's.

My own sense is that even for novel physical systems, the 'how could we have foreseen these results' question tends to get answered—the difference being it maybe gets answered a few decades later by a physicist instead of immediately by the engineering team.

I was under the impression those other particles might be a consequence of a deeper mathematical structure?

Such that asking for a universe without the 'unnecessary' particles would be kind of like asking for one without 'unnecessary' chemical elements?

2Shmi
Well, it might be a consequence of something, but I don't know of any such math that says that if there is one generation of particles, then there ought to be 3 (or more?).

Often when humans make a discovery through trial and error, they also find a way they could have figured it out without the experiments.

This is basically always the case in software engineering—any failure, from a routine failed unit test up to a major company outage, was obviously-in-retrospect avoidable by being smarter.

Humans are nonetheless incapable of developing large complex software systems without lots of trial and error.

I know less of physical engineering, so I ask non-rhetorically: does it not have the 'empirical results are foreseeable in retrospect' property?

1Dirichlet-to-Neumann
I think you can do things you already know how to do without trial and error, but you cannot learn new things or tasks without trial and error.

Suspected mistake:

She was about to break all her rules against pretending she was supposed to be wherever she happened to be?

2MondSemmel
More typos:
* They took it halfway up the building transferred to a second elevator and took the second elevator the rest of the way to the top. -> [missing commas:] building, transferred to a second elevator, and
* until Alia put the glasses. -> put on
* "The actress's name. Whose body are you wearing?" said Dominic."Alia," said Alia, "is the actor whose body I am wearing." -> is the actress whose body
* right now?" Dominic. -> said Dominic.
2lsusr
Fixed. Thanks.

Do the co-authors currently plan things out together off-forum, or is what we read both the story and the process of creating it?

I wonder this too. My impression is that it's some of both.

'Efficiency' may be the wrong word for it, but Paul's formula accurately describes what you might call the 'infiltration tax' for an energy-conserving/entropy-ignoring model: when you pump out heat proportional to (exhaust - indoor), heat proportional to (outdoor - indoor) infiltrates back in.
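A minimal arithmetic sketch of that tax (illustrative temperatures, not measurements):

```python
# One-hose 'infiltration tax': air exhausted outside is replaced by outdoor air.
# k is an arbitrary proportionality constant (air flow times heat capacity).
def net_cooling(t_exhaust, t_indoor, t_outdoor, k=1.0):
    pumped_out = k * (t_exhaust - t_indoor)    # heat carried out through the exhaust hose
    infiltration = k * (t_outdoor - t_indoor)  # heat carried back in by replacement air
    return pumped_out - infiltration           # = k * (t_exhaust - t_outdoor)

print(net_cooling(t_exhaust=50.0, t_indoor=25.0, t_outdoor=35.0))  # 15.0, vs. 25.0 with no infiltration
```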

2johnswentworth
That's an important argument, thank you.

How wrong is "completely wrong"? I've only read Cummings up to the paywall. His two examples are 1) that the USSR planned to use nuclear weapons quickly if war broke out and 2) that B-59 almost used a nuclear weapon during the Cuban Missile Crisis.

Re: 1), this is significant, but AIUI NATO planners never had all that much hope of avoiding the conventional war -> nuclear war escalation. The core of the strategy was avoiding the big conventional war in the first place, and this succeeded.

Re: 2), Cummings leaves out some very important context on B-59: the... (read more)

Wasn't Attlee wrong?

In reality, rather than banishing the conception of war, the Cold War powers adopted a strategy of "If you want (nuclear) peace, prepare for (nuclear) war." It did not render strategic bases around the world obsolete. It absolutely did not involve the US or the USSR giving up their geopolitical dreams.

It worked. There were close calls (e.g. Petrov), suggesting it had a significant chance of failure. Attlee doesn't predict a significant chance of failure; he predicts a near-certainty.

We don't get to see the counterfactual where we tried ... (read more)

4ryan_b
Small quibble: I note that there is a difference between it worked and it failed to fail catastrophically. The current consensus among people who study the subject is that the latter describes reality. This does not change Attlee being wrong, though. It just means the direction we went instead was wrong for different reasons.

 does seem like a bad assumption. I tried instead assuming a constant difference between the intake and the cold output, and the result surprised me. (The rest of this comment assumes this model holds exactly, which it definitely doesn't).

Let  be the temperature of the room (also intake temperature for a one-hose model). Then at equilibrium,

i.e. no loss in cooling power at all! (Energy efficiency and time to reach equilibrium would probably be much worse, though)

In the case of an underpowered... (read more)

  1. What would a frequentist analysis of the developing war look like?

Exactly the same.

I'm confused by this claim? I thought the whole thing where you state your priors and conditional probabilities and perform updates to arrive at a posterior was... not frequentism?

Eliezer, at least, now seems quite pessimistic about that object-level approach. And in the last few months he's been writing a ton of fiction about introducing a Friendly hypercreature to an unfriendly world.

I didn't get the 'first person' thing at first (and the terminal diagnosis metaphor wasn't helpful to me). I think I do now.

I'd rephrase it as "In your story about how the Friendly hypercreature you create gains power, make sure the characters are level one intelligent". That means creating a hypercreature you'd want to host. Which means you will be its host.

To ensure it's a good hypercreature, you need to have good taste in hypercreatures. Rejecting all hypercreatures doesn't work—you need to selectively reject bad hypercreatures.

Answer by ADifferentAnonymous30

Maybe have them handle the pen through a glove box?

Ridiculous, yes, but possibly the kind of ridiculous that happens in real life.

Memnuela alludes to a 'Death chamber' before that, so pending resolution by the author I'm assuming 'Life and Death' is the missing pair.

2lsusr
This was indeed an error. Fixed. Thank you for the correction.

If Mars values a coup at 5M, and Earth values a coup at -5M, Earth can buy contracts to win 5M if there is a coup, Mars can buy contracts to win 5M if there isn't a coup, and they can both cancel their clandestine programs on Ceres, making the interaction positive-sum.

...Not sure that actually works, but it's an interesting thought.
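A toy worked example of the hedge (hypothetical prices, assuming a liquid market that prices the coup at 50%):

```python
# Hypothetical contracts: one pays 5M if a coup happens, the other pays 5M if it
# doesn't; each is priced at 2.5M. Values are in millions.
PRICE, PAYOUT = 2.5, 5.0

def earth_outcome(coup):
    # Earth values a coup at -5M and buys the pays-if-coup contract.
    return (-5.0 + PAYOUT if coup else 0.0) - PRICE

def mars_outcome(coup):
    # Mars values a coup at +5M and buys the pays-if-no-coup contract.
    return (5.0 if coup else PAYOUT) - PRICE

print([earth_outcome(c) for c in (True, False)])  # [-2.5, -2.5]: fully hedged
print([mars_outcome(c) for c in (True, False)])   # [2.5, 2.5]: fully hedged
# Neither side's payoff depends on the coup anymore, so neither gains anything
# from funding a clandestine program on Ceres.
```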

3lsusr
It works (in theory (assuming rational, liquid markets)) if you assume that they both value risk-reduction and that they don't care about each others' finances. There is no way to make the interaction positive sum if we assume the Earth-Mars relationship is a priori zero-sum, since that would be a contradiction. Note that the cancellation of Earth's and Mars' clandestine programs on Ceres does not actually mean Ceres is free of foreign influence. It just means that Ceres is free of direct foreign influence. Earth and Mars have offloaded their operations to privateers and/or indirect bribes to the government of Ceres.

Thanks, I consider this fully answered.

I agree the human simulator will predictably have an anticorrelation. But the direct simulator might also have an anticorrelation, perhaps a larger one, depending on what reality looks like.

Is the assumption that it's unlikely that most identifiable X actually imply large anticorrelations?

3paulfchristiano
I do agree that there are examples where the direct translator systematically has anticorrelations and so gets penalized even more than the human simulator. For example, this could happen if there is a consequentialist in the environment who wants it to happen, or if there's a single big anticorrelation that dominates the sum and happens to go the wrong way. That said, it at least seems like it should be rare for the direct translator to have a larger anticorrelation (without something funny going on). It should happen only if reality itself is much more anticorrelated than the human expects, by a larger margin than the anticorrelation induced by the update in the human simulator. But on average things should be more anticorrelated than expected about as much as they are positively correlated (averaging out to ~0), and probably usually don't have any correlation big enough to matter.

Possible error in the strange correlations section of the report.

Footnote 99 claims that "...regardless of what the direct translator says, the human simulator will always imply a larger negative correlation [between camera-tampering and actually-saving the diamond] for any X such that P_AI(diamond looks safe|X) > P_h(diamond looks safe|X)."

But AFAICT, the human simulator's probability distribution given X depends only on human priors and the predictor's probability that the diamond looks safe given X, not on how correlated or anticorrelated the predictor... (read more)

2paulfchristiano
If the predictor's P(diamond looks safe) is higher than the human's P(diamond looks safe), then it seems like the human simulator will predictably have an anticorrelation between [camera-tampering] and [actually-saving-diamond]. It's effectively updating on one or the other of them happening more often than it thought, and so conditioned on one it goes back to the (lower) prior for the other. This still seems right to me, despite no explicit dependence on the correlations.
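A toy numeric version of that explaining-away effect (hypothetical priors, nothing from the report): under a human prior where tampering and actually-saving are independent, conditioning toward 'the diamond looks safe' makes them anticorrelated.

```python
from itertools import product

# Hypothetical human prior: tampering (t) and actually-saving-the-diamond (s)
# are independent.
p_tamper, p_save = 0.2, 0.5
prior = {(t, s): (p_tamper if t else 1 - p_tamper) * (p_save if s else 1 - p_save)
         for t, s in product([0, 1], repeat=2)}

def looks_safe(t, s):
    # The diamond looks safe if it was actually saved OR the camera was tampered with.
    return t == 1 or s == 1

# Human-simulator-style update toward 'looks safe' (the predictor assigns it more
# probability than the human prior does).
z = sum(p for (t, s), p in prior.items() if looks_safe(t, s))
posterior = {(t, s): (p / z if looks_safe(t, s) else 0.0) for (t, s), p in prior.items()}

def cov(dist):
    et = sum(p * t for (t, s), p in dist.items())
    es = sum(p * s for (t, s), p in dist.items())
    ets = sum(p * t * s for (t, s), p in dist.items())
    return ets - et * es

print(round(cov(prior), 3))      # 0.0: independent under the prior
print(round(cov(posterior), 3))  # -0.111: anticorrelated after the update
```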

Turning this into the typo thread, on page 97 you have

In Section: specificity we suggested penalizing reporters if they are consistent with many different reporters

Pretty sure the bolded word should be predictors.

I don't think the facts support the 'discontiguous empire -> peasants can't walk to the capital -> labor shortage' argument. The British Empire had continual migration from Britain to the colonies. Enslaved labor was likewise sent to the colonies.

Rather than a contracted labor supply, I think Britain experienced an increased labor demand due to quickly obtaining huge amounts of arable land (and displacing the former inhabitants rather than subjugating them).

2lsusr
That makes a lot of sense. I forgot about the whole "emigration from Britain" thing.

In section 2.1 of the Indifference paper the reward function is defined on histories. In section 2 of the corrigibility paper, the utility function is defined over (action1, observation, action2) triples—which is to say, complete histories of the paper's three-timestep scenario.  And section 2 of the interruptibility paper specifies a reward at every timestep.

I think preferences-over-future-states might be a simplification used in thought experiments, not an actual constraint that has limited past corrigibility approaches.
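A minimal sketch of the typing distinction (hypothetical signatures, not from the papers): a pure preferences-over-future-states agent scores only the final state, whereas the corrigibility paper's utility function scores the full (action1, observation, action2) history.

```python
from typing import Callable, NamedTuple

class History(NamedTuple):
    # The corrigibility paper's three-timestep scenario.
    action1: str
    observation: str
    action2: str

# Pure long-term consequentialism: utility depends only on the final world state.
UtilityOverStates = Callable[[str], float]

# The papers' setup: utility is defined over complete histories, so it can also
# care about how the outcome came about (e.g. whether a shutdown was obeyed).
UtilityOverHistories = Callable[[History], float]

def example_history_utility(h: History) -> float:
    # Hypothetical reward for obeying a shutdown observation, regardless of payoff.
    return 1.0 if (h.observation == "shutdown_pressed" and h.action2 == "shut_down") else 0.0

print(example_history_utility(History("build_widget", "shutdown_pressed", "shut_down")))  # 1.0
```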

3Steven Byrnes
Interesting, thanks! Serves me right for not reading the "Indifference" paper! I think the discussions here and especially here are strong evidence that at least Eliezer & Nate are expecting powerful AGIs to be pure-long-term-consequentialist. (I didn't ask, I'm just going by what they wrote.) I surmise they have a (correct) picture in their head of how super-powerful a pure-long-term-consequentialist AI can be—e.g. it can self-modify, it can pursue creative instrumental goals, it's reflectively stable, etc.—but they have not similarly envisioned a partially-but-not-completely-long-term-consequentialist AI that is only modestly less powerful (and in particular can still self-modify, can still pursue creative instrumental goals, and is still reflectively stable). That's what "My corrigibility proposal sketch" was trying to offer. I'll reword to try to describe the situation better, thanks again.

As a long-time LW mostly-lurker, I can confirm I've always had the impression MIRI's proof-based stuff was supposed to be a spherical-cow model of agency that would lead to understanding of the messy real thing.

What I think John might be getting at is that (my outsider's impression of) MIRI has been more focused on "how would I build an agent" as a lens for understanding agency in general—e.g. answering questions about the agency of E. coli is not the type of work I think of. Which maybe maps to 'prescriptive' vs. 'descriptive'?

An interesting parallel might be a parallel Earth making nanotechnology breakthroughs instead of AI breakthroughs, such that it's apparent they'll be capable of creating gray goo and not apparent they'll be able to avoid creating gray goo.

I guess a slow takeoff could be if, like, the first self-replicators took a day to double, so if somebody accidentally made a gram of gray goo you'd have weeks to figure it out and nuke the lab or whatever, but self-replication speed went down as technology improved, and so accidental unconstrained replicators happened pe... (read more)

One major pattern of thought I picked up from (undergraduate) physics is respect for approximation. I worry that those who have this take it for granted, but the idea of a rigorous approximation that's provably accurate in certain limits, as opposed to a casual guess, isn't obvious until you've encountered it.
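A small concrete instance of that idea (a standard textbook example, not from the comment): sin(x) ≈ x is not just a casual guess, it comes with the provable Taylor-remainder bound |x|^3/6, which you can check numerically.

```python
import math

# Small-angle approximation: sin(x) ~= x, with provable error bound |x|**3 / 6.
for x in (0.5, 0.1, 0.01):
    error = abs(math.sin(x) - x)
    bound = abs(x) ** 3 / 6
    print(f"x={x}: error={error:.2e} <= bound={bound:.2e}: {error <= bound}")
# The point is not that the guess is good, but that we can prove how good it is
# and in which limit (x -> 0) it becomes exact.
```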

My question after reading this is about Eliezer's predictions in a counterfactual without regulatory bottlenecks on economic growth. Would it change the probable outcome, or would we just get a better look at the oncoming AGI train before it hit us? (Or is there no such counterfactual well-defined enough to give us an answer?) ETA: Basically trying to get at whether that debate's actually a crux of anything.

Oof, rookie mistake. I retract the claim that averaging log odds is 'the correct thing to do' in this case.

Still—unless I'm wrong again—the average log odds would converge to the correct result in the limit of many forecasters, and the average probabilities wouldn't? Making the post title bad advice in such a case?

(Though median forecast would do just fine)
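A toy simulation of that claim (under an assumed noise model where each forecaster reports the true log odds plus independent Gaussian noise; not a general proof):

```python
import math
import random

random.seed(0)

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Assumed toy model: true probability 0.9; each of many forecasters reports
# sigmoid(logit(0.9) + noise).
true_p, n = 0.9, 100_000
reports = [sigmoid(logit(true_p) + random.gauss(0, 1.5)) for _ in range(n)]

avg_prob = sum(reports) / n
avg_log_odds = sum(logit(p) for p in reports) / n

print(round(avg_prob, 3))               # noticeably below 0.9 (pulled toward 0.5)
print(round(sigmoid(avg_log_odds), 3))  # ~0.9: averaging log odds recovers the truth
```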

+1 to the question.

My current best guess at an answer:

There are easy safe ways, but not easy safe useful-enough ways. E.g. you could make your AI output DNA strings for a nanosystem and absolutely do not synthesize them, just have human scientists study them, and that would be a perfectly safe way to develop nanosystems in, say, 20 years instead of 50, except that you won't make it 2 years without some fool synthesizing the strings and ending the world. And more generally, any pathway that relies on humans achieving deep understanding of the pivotal act will take more than 2 years, unless you make 'human understanding' one of the AI's goals, in which case the AI is optimizing human brains and you've lost safety.
