Erich_Grunewald

Erich_Grunewald4dQuick Take

At my job on the compute policy team at IAPS, we recently started a Substack that we call The Substrate. I think this could be of interest to some here, since I quite often see discussions on LessWrong around export controls, hardware-enabled mechanisms, security, and other compute-governance-related topics.

Here are the posts we've published so far:

For chip exports, quantity is at least as important as quality, about how to best set AI chip export policy
The case for paying whistleblowers to report on export violations, about the Stop Stealing Our Chips Act
BIS is getting more funding—here's how to spend it, about the upcoming Bureau of Industry and Security budget increase, and what BIS plans

... (read more)

If all humans were turned into high-fidelity mind uploads tomorrow, would we be self-sustaining?

Erich_Grunewald

That is, would we in some sense manage to survive, in the longer term? Presumably we would have to maintain the physical substrate we are running on, by providing power and cooling, and by eventually replacing our hardware.

I think this question could help to answer whether AGI as common defined ("all cognitive labour") would be the same as or different from what would be required by Vitalik Buterin's definition:

"AGI" is AI powerful enough that, if one day all humans suddenly disappeared, and the AI was uploaded into robot bodies, it would be able to independently continue civilization.

Erich_Grunewald19dQuick Take

Some argue that even without misaligned AI, humanity could lose control of societal systems simply by delegating more and more to AI. They would delegate because these future AIs are more capable and faster than humans, and because competitive dynamics pushing everyone to delegate further, until eventually humans have no control over these societal systems.

Delegation ≠ loss of control, though. A principal can delegate to an agent while maintaining control and seeing what the agent does. CEOs and managers do this all the time obviously. So to go from "strong incentive to delegate" to "loss of control", you may need to also argue that humans will be unable to meaningfully oversee what... (read more)

Erich_Grunewald1moQuick Take

Has anyone else noticed a thing recently (the past couple of days) where Claude is extremely reluctant to search the web, and instead is extremely keen to search past conversations or Google Drive and other nonsense like that? Even after updating my system prompt to encourage the former and discourage the latter, it still defaults to the latter. Also, instead of using the web search tool it will sometimes try and fail to search using curl and code execution. Is this just me or is anyone else experiencing similar issues?

Erich_Grunewald1mo

In that case, I think your original statement is very misleading (suggesting as it does that OP/CG funded, and actively chose to fund, Mechanize) and you should probably edit it. It doesn't seem material to the point you were trying to make anyway -- it seems enough to argue that Mechanize had used bad arguments in the past, regardless of the purpose for (allegedly) doing so.

Erich_Grunewald1mo

My guess is the reason this hasn't been discussed is that Mechanize and the founders have been using pretty bad arguments to defend them basically taking openphil money to develop a startup idea.

Do you have a source for your claim that Open Philanthropy (aka Coefficient Giving) funded Mechanize? Or, what work is "basically" doing here?

Replying toDefending Against Model Weight Exfiltration Through Inference Verification

Erich_Grunewald2mo

Defending Against Model Weight Exfiltration Through Inference Verification

Great work!

In February 2024, Sam Altman tweeted that ChatGPT generates 100B words per day, which is about 200 GB of text per day (though this has likely grown significantly). At scale, exfiltration over the output token channel becomes viable, and it's the one channel you can't just turn off. If model weights are 1000Gb, and egress limits are 800GB/day, it will take just 1.25 days to exfiltrate the weights across any channel.

I don't quite follow this -- why would the egress limits be at 800GB/day? Presumably the 200GB of text is spread across multiple data centers, each of which could have its own (lower) egress limit. (I assume you're adding more traffic for non-text data -- is that the other 600GB?) I imagine this could make a difference of an OOM or so, making egress limits quite a bit more appealing (e.g., slowing exfiltration to 10 days rather than 1 day) if true?

The most popular ideas in society all have outgroups that are the enemy.

This is just straightforwardly false, isn't it? The beliefs that suffering is bad, that continents move over geological time, and that Venice is a beautiful city -- none of these have outgroups as the enemy, or in any case, that is not how they came to be universally popular.

Erich_Grunewald2mo

Nice post!

I think this is probably mostly because there's an important sense in which world has been changing more slowly (at least from the perspective of Americans), and the ways in which it's changing feel somehow less real.

Maybe another factor is that a lot of the unbounded, grand, and imaginative thinking of the early 20th and the 19th century ended up either being either unfounded or quite harmful. So maybe the narrower margins of today are in part a reaction to that in addition to being a reaction to fewer wild things happening.

For example, many of the catastrophes of the 20th century (Nazism, Maoism, Stalinism) were founded in a kind of utopian... (read more)

Erich_Grunewald2mo

It wasn't clear to me from the Inkhaven website that you, Ben Pace, and John Wentworth were participating to that degree (though I did mention you three), and I missed aggliu and RobertM. So fair enough, I'll retract my comment. (ETA: I missed aggliu since I didn't know their name and they had only that one LW post in November, and I thought RobertM might be Rob Miles, but none of RobertM's November LW posts seem to be listed among Rob Miles's posts on the Inkhaven website. But obviously you were there and I was not so I defer to you.)

Erich_Grunewald2mo

Of the 16 November posts you cite as evidence of Inkhaven's positive impact, I count three (Mikhail Samin, mingyuan, Ben Goldhaber) that were actually authored by an Inkhaven resident, and one of those three was a post that received a lot of criticism for its perceived low quality. (Another two were authored by habryka and Ben Pace, and another one was authored by johnswentworth who I think tried to post daily in the spirit of the thing, while not actually participating.) I think this is pretty weak evidence that Inkhaven has made LessWrong much more vibrant.

-2

There's a tweet (1,564 likes as I write this) making the rounds that I think is at least half false. Since I don't have a Twitter/X account, I will reply here. The tweet says

Every day I get reminded of the story of how KPD and SPD members would clap when a member of the other party would come into the concentration camps

quote tweeting this tweet:

Not a fan of Tr*mp's to say the least but so far it is unclear there is anyone in his administration as monstrous as Biden's Middle East team.

The source for the KPD-SPD claim seems to be this earlier tweet from February (2,051 likes):

When the first concentration camps for

... (read 397 more words →)

On the advice of @adamShimi, I recently read Hasok Chang's Inventing Temperature. The book is terrific and full of deep ideas, many of which relate in interesting ways to AI safety. What follows are some thoughts on that relationship, from someone who is not an AI safety researcher and only somewhat follows developments there, and who probably got one or two things wrong.

(Definitions: By "operationalizing", I mean "giving a concept meaning by describing it in terms of measurable or closer-to-measurable operations", whereas "abstracting" means "removing properties in the description of an object".)

There has been discussion on LessWrong about the relative value of abstract work on AI safety (e.g., agent foundations) versus concrete... (read 1151 more words →)

Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI

Erich_Grunewald

Summary

In the past year, public fora have seen growing concern about existential risk (henceforth, x-risk) from AI. The thought is that we could see transformative AI in the coming years or decades, that it may be hard to ensure that such systems act with humanity's best interests in mind and that those highly advanced AIs may be able to overpower us if they aimed to do so, or otherwise that such systems may be catastrophically misused. Some have reacted by arguing that concerns about x-risk distract from current harms from AI, like algorithmic bias, job displacement and labour issues, environmental impact and so on. And in opposition to those voices, others have... (read 4929 more words →)

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Erich_Grunewald

A new preprint, Lu et al., published last month, has important implications for AI risk if true. It essentially suggests that emergent capabilities of current LLMs are (with some exceptions) mediated by a model's ability to do in-context learning. If so, emergent capabilities may be less unexpected, given that improvements to in-context learning (e.g., via instruction tuning) are relatively predictable.

Wei et al. defines "emergent ability" like so: "An ability is emergent if it is not present in smaller models but is present in larger models." So you can have sudden "jumps" in performance on various tasks as you scale models up, e.g. (from Wei et al.):

Lu et al. is saying, yes, but... (read 469 more words →)

Alpha

Erich_Grunewald

Summary

In the first half of this essay, I recount two anecdotes. (The impatient reader can skip these.) First, Gerald Murnane tries, around Melbourne in the 1950s, to find a system that'll make him money betting on horse races. Then, Bill Benter, in Hong Kong in the 1990s, comes up with a system for picking horses that makes him nearly $1B.

In the second half of this essay, I discuss alpha:

In finance, alpha is (a) excess returns earned on an investment above the market when adjusted for risk, (b) an investor's ability to beat the market, or (sometimes) (c) a strategy or resource that consistently generates excess returns. Outside finance, alpha refers simply to

... (read 4167 more words →)

I'm really confused by this passage from The Six Mistakes Executives Make in Risk Management (Taleb, Goldstein, Spitznagel):

We asked participants in an experiment: “You are on vacation in a foreign country and are considering flying a local airline to see a special island. Safety statistics show that, on average, there has been one crash every 1,000 years on this airline. It is unlikely you’ll visit this part of the world again. Would you take the flight?” All the respondents said they would.
We then changed the second sentence so it read: “Safety statistics show that, on average, one in 1,000 flights on this airline has crashed.” Only 70% of the sample said they

... (read more)

The Prospect of an AI Winter

Erich_Grunewald

Summary

William Eden forecasts an AI winter. He argues that AI systems (1) are too unreliable and too inscrutable, (2) won't get that much better (mostly due to hardware limitations) and/or (3) won't be that profitable. He says, "I'm seeing some things that make me think we are in a classic bubble scenario, and lots of trends that can't clearly continue."
I put 5% on an AI winter happening by 2030, with all the robustness that having written a blog post inspires, and where AI winter is operationalised as a drawdown in annual global AI investment of ≥50%.^[1] (I reckon a winter must feature not only decreased interest or excitement, but always also decreased

... (read 4354 more words →)

Against LLM Reductionism

Erich_Grunewald

Summary

Large language models (henceforth, LLMs) are sometimes said to be "just" shallow pattern matchers, "just" massive look-up tables or "just" autocomplete engines. These comparisons amount to a form of (methodological) reductionism. While there's some truth to them, I think they smuggle in corollaries that are either false or at least not obviously true.
For example, they seem to imply that what LLMs are doing amounts merely to rote memorisation and/or clever parlour tricks, and that they cannot generalise to out-of-distribution data. In fact, there's empirical evidence that suggests that LLMs can learn general algorithms and can contain and use representations of the world similar to those we use.
They also seem to suggest that

... (read 5145 more words →)

140

Notes on Meta's Diplomacy-Playing AI

Erich_Grunewald

Disclaimer: I'm not an expert at machine learning, AI safety or Diplomacy, so there may be errors here, though hopefully no major ones. For previous discussion of CICERO on here, see the comments in this post, this rundown and this commentary.

Summary

CICERO is a new AI developed by Meta AI that achieves good performance at the board game Diplomacy. Diplomacy involves tactical and strategic reasoning as well as natural language communication: players must negotiate, cooperate and occasionally deceive in order to win.
- CICERO comprises (1) a strategic model deciding which moves to make on the board and (2) a dialogue model communicating with the other players.
- CICERO is honest in the sense that the dialogue

... (read 3907 more words →)

Here's what I usually try when I want to get the full text of an academic paper:

Search Sci-Hub. Give it the DOI (e.g. https://doi.org/...) and then, if that doesn't work, give it a link to the paper's page at an academic journal (e.g. https://www.sciencedirect.com/science...).
Search Google Scholar. I can often just search the paper's name, and if I find it, there may be a link to the full paper (HTML or PDF) on the right of the search result. The linked paper is sometimes not the exact version of the paper I am after -- for example, it may be a manuscript version instead of the accepted journal version -- but in my

... (read more)

A few months ago I wrote a post about Game B. The summary:

I describe Game B, a worldview and community that aims to forge a new and better kind of society. It calls the status quo Game A and what comes after Game B. Game A is the activity we’ve been engaged in at least since the dawn of civilisation, a Molochian competition over resources. Game B is a new equilibrium, a new kind of society that’s not plagued by collective action problems.
While I agree that collective action problems (broadly construed) are crucial in any model of catastrophic risk, I think that

civilisations like our current one are not inherently self-terminating (75% confidence);

there

... (read more)

Erich_Grunewald's Shortform

Erich_Grunewald

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Supposing Europe is headed for a serious energy crisis this winter, what can/should one do as an individual to prepare?

Erich_Grunewald

All suggestions welcome, including "nothing" if that's the right answer.

Quick Summaries of Two Papers on Kant and Game Theory

Erich_Grunewald

It's not obvious from a Darwinian point of view why people (even people who aren't related) often cooperate with one another. Researchers like Axelrod and Hamilton (1981) have used game theory to show that some strategies are both adaptive and favour cooperation. Though these projects may not say anything about how we should act towards one another^[1], they do say something about why we do act the way we act. That's important because we'd like moral strategies (under whatever view we think is more plausible) to outcompete immoral strategies; if we notice that various game-theoretic situations tend to steer people away from sound ethics, that's both (1) an indication that these situations... (read 1042 more words →)

LESSWRONG
LW

LESSWRONG
LW

Against LLM Reductionism

Alpha

The Prospect of an AI Winter

How Bad Is QWERTY, Really? A Review of the Literature, such as It Is

Erich_Grunewald

Erich_Grunewald

If all humans were turned into high-fidelity mind uploads tomorrow, would we be self-sustaining?

Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Alpha

The Prospect of an AI Winter

Against LLM Reductionism

Notes on Meta's Diplomacy-Playing AI

Erich_Grunewald

Against LLM Reductionism

Alpha

The Prospect of an AI Winter

How Bad Is QWERTY, Really? A Review of the Literature, such as It Is

Erich_Grunewald

Erich_Grunewald

If all humans were turned into high-fidelity mind uploads tomorrow, would we be self-sustaining?

Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI

Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Alpha

The Prospect of an AI Winter

Against LLM Reductionism

Notes on Meta's Diplomacy-Playing AI

Summary

Summary

Summary

Summary

Summary