There's a tweet (1,564 likes as I write this) making the rounds that I think is at least half false. Since I don't have a Twitter/X account, I will reply here. The tweet says
Every day I get reminded of the story of how KPD and SPD members would clap when a member of the other party would come into the concentration camps
quote tweeting this tweet:
Not a fan of Tr*mp's to say the least but so far it is unclear there is anyone in his administration as monstrous as Biden's Middle East team.
The source for the KPD-SPD claim seems to be this earlier tweet from February (2,051 likes):
When the first concentration camps for political prisoners were created in Nazi Germany between 33-34, SPD deputy Gerhard Seger reported that KPD prisoners would cheer when the prison guards announced new prisoners of the SPD had arrived and vice-versa
When asked about the source, the author of that February tweet claimed that
You can find it in Seger's A Nation Terrorized, which is his personal report on one of the first concentration camps
A Nation Terrorized is the English title of Oranienburg. Erster authentischer Bericht eines aus dem Konzentrationslager Geflüchteten (my translation: Oranienburg: First Authentic Report from a Concentration Camp Escapee). The relevant passage reads, in full:
One evening at roll call, Sturmbannführer Krüger stepped before the ranks of prisoners and announced that the next day the "complete social democratic bigwig Fritz Ebert" would be delivered, this Marxist swine who belonged to the November criminals who had plunged Germany into disaster, and well, the SA would take care of this pig.
What happened after this speech with its ominous announcement at the end?
Loud cheers of "Bravo!" rang out from the ranks of the communist prisoners!
The communists in question, themselves victims of the SA charlatan standing before them, noisily took the side of their own party enemies, applauding when this National Socialist promised to take action against a Social Democrat!
I can find no other evidence of cheering or celebration of political opponents entering a concentration camp, and no other reports in A Nation Terrorized to that effect. So the original account seems wrong in a few ways: Seger describes a single incident, not a recurring practice; only the communist prisoners cheered, so there was no "vice-versa"; and they were applauding the announcement that the SA would "take care of" a specific Social Democrat, not the arrival of new SPD prisoners as such.
Also, not to excuse the communists, but Fritz Ebert was not just any member of the SPD, he was the son of Friedrich Ebert who in the late 1910s had allied with conservative military forces and right-wing Freikorps units to violently suppress the communists, and who was plausibly most responsible for the murders of Rosa Luxemburg and Karl Liebknecht. The median SPD member would not have gotten the same reception. And finally, the events Seger reports happened in 1933 when the SPD-KPD rivalry was still fresh, and most of the Nazis' worst misdeeds had yet to happen -- for example, it was five years before Kristallnacht.
An enormous, unconscionable amount of information shared on Twitter/"TPOT" is like this: plausible-sounding anecdotes that get stretched and pixelated through legions of cross-platform and intra-platform quote-tweeting.
At my job on the compute policy team at IAPS, we recently started a Substack that we call The Substrate. I think this could be of interest to some here, since I quite often see discussions on LessWrong around export controls, hardware-enabled mechanisms, security, and other compute-governance-related topics.
Here are the posts we've published so far:
To make this quick take not merely an advertisement, I would also be happy to discuss anything about any of these posts, and/or to hear suggestions for things that we should write about.
I wrote a post forecasting Chinese compute acquisition in 2026. The very short summary is that I expect about 60% of that compute (in B300-equivalents) to come from legally imported NVIDIA H200s, with domestically produced Huawei Ascends accounting for about 25%, and the remainder being smuggled AI chips and Ascends illegally fabricated outside China via proxies.
While China likely produces GPU dies in quite large quantities, it is probably bottlenecked by an HBM (high-bandwidth memory) shortage, which limits the total number of Ascend 910Cs and other AI chips that can actually be assembled. I do expect domestic production to grow substantially in 2027 and 2028, as CXMT ramps up HBM production.
In total, I expect China to acquire about 320,000 B300-equivalents (90% CI: 150,000 to 600,000) in 2026, enough to train about six Grok-4-scale models simultaneously. By comparison, the Stargate campus that Oracle has been building for OpenAI in Abilene, Texas will alone house over 300,000 B300-equivalents.
Some Chinese companies are also renting AI chips from non-Chinese cloud providers. For example, according to SemiAnalysis, ByteDance is Oracle’s largest customer; their largest joint cluster, located in Southeast Asia, will perhaps reach about 250,000 B300-equivalents this summer. (I don’t count remote access as “acquisition” since there is no ownership.)
NB. These estimates are quite rough, so take them with a grain of salt. But I think they give a good sense of the general size of these different pathways.
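To make the pathway split concrete, here is a minimal sketch of how the shares translate into chip counts. The 320,000 B300-equivalent total and the percentage shares are from the summary above; the per-chip B300-equivalence factors are hypothetical placeholders (not the post's actual conversion rates), so the resulting chip counts are purely illustrative.

```python
# Back-of-envelope split of the 2026 estimate into pathways.
# Total and shares are from the post; the conversion factors are
# HYPOTHETICAL placeholders, not a real B300-equivalence table.
TOTAL_B300_EQ = 320_000

shares = {
    "H200 (legal imports)": 0.60,
    "Huawei Ascend (domestic)": 0.25,
    "smuggled / proxy-fabbed": 0.15,
}

# Hypothetical: how many B300-equivalents one chip of each kind counts as.
b300_eq_per_chip = {
    "H200 (legal imports)": 0.25,
    "Huawei Ascend (domestic)": 0.20,
    "smuggled / proxy-fabbed": 0.25,
}

for pathway, share in shares.items():
    eq = TOTAL_B300_EQ * share
    chips = eq / b300_eq_per_chip[pathway]
    print(f"{pathway}: {eq:,.0f} B300-eq (~{chips:,.0f} chips at the assumed ratio)")
```

With the post's actual conversion factors substituted in, this is the table the commenter below asks for.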
Bit of feedback: would be helpful if you explicitly stated your estimated number of H200 and Huawei chips and/or provide a B300-eq conversion table, so they are more comparable to other reports that are quoted just in number of chips. I understand how you do the conversion but it is not super apparent in the post.
Here's what I usually try when I want to get the full text of an academic paper:
1. Give it the paper's DOI link (e.g., https://doi.org/...) and then, if that doesn't work, give it a link to the paper's page at an academic journal (e.g., https://www.sciencedirect.com/science...).
2. Google "name of paper in quotes" filetype:pdf.
3. If that fails, search for "name of paper in quotes" and look at a few of the results if they seem promising. (Again, I may find a different version of the paper than the one I was looking for, which is usually but not always fine.)

I would add Semantic Scholar to the list. It gives consistently better search results than Google Scholar and has a better interface. I've also found a really difficult-to-find paper on pre-print websites once or twice.
Thanks for the suggestion! I'll be trying it out and adding it to the list if I find it useful.
Some argue that even without misaligned AI, humanity could lose control of societal systems simply by delegating more and more to AI. Humans would delegate because these future AIs are more capable and faster than humans, and because competitive dynamics push everyone to delegate further, until eventually humans have no control over these societal systems.
Delegation ≠ loss of control, though. A principal can delegate to an agent while maintaining control and seeing what the agent does. CEOs and managers obviously do this all the time. So to go from "strong incentive to delegate" to "loss of control", you may need to also argue that humans will be unable to meaningfully oversee what the AIs do, e.g., because those AIs are too fast and their actions are too complicated for humans to understand. (Again, we're assuming these AIs are intent-aligned, so modulo information, humans can retain control over the AIs.)
I guess to me it isn't at all obvious that all humans would in fact delegate everything to AIs when that means giving up meaningful control. First, there may well exist methods to better aggregate and abstract information for humans so that they can understand enough of what the AIs are doing. Second, most humans would probably be reluctant to give up meaningful control when delegating -- e.g., a CEO would likely be more reluctant to delegate a task or role if they have reason to think they will have no insight into how it's done, or no ability to meaningfully control the employee -- and this seems like it should move the equilibrium away a bit from "delegate everything", even with competitive pressure. But unless all humans do so delegate, some humans will retain meaningful control over the AIs, and arguments about gradual disempowerment look more like arguments about concentration of power.
If CEOs (and boards) are also AIs, the analogy breaks. Humans are currently necessary in such positions; their necessity is sufficient to explain the fact that they are there at all, even if there are other reasons their presence might be a good thing. The situation changes when a system won't break down without humans in positions of power: it's not clear that these other reasons have any teeth in practice.
This doesn't need to be the case, but only in a sense similar to how humanity doesn't need to build AGIs before it's ready. It's a new affordance, and there is a danger that it gets used irresponsibly and leads to bad outcomes. There should be some understanding of how specifically this won't be happening.
For a legally constituted corporation, the role of CEO is not only one of decision-maker, but also blame-taker: if the company goes into decline, the CEO can be fired; if the company commits a sufficiently serious crime, the CEO can be prosecuted and punished (think Jeffrey Skilling of Enron). The presence of a human whose reputation (and possibly freedom) depends on the business's conduct conveys some trustworthiness to other humans (investors, trading partners, creditors).
If a company has an AI agent for its top-level decision-maker, then those decisions are made without this kind of responsibility for the outcome. An AI agent cannot meaningfully be fired or punished; it can be turned off, and some chatbot characters sometimes act to avoid such a fate; but I don't think investors would be wise to count on that.
Now, what about a non-legally-constituted entity, or even a criminal one? Criminal gangs do rely on a big boss to adjudicate disputes, set strategy, and to risk taking a fall if things go sour. But online criminal groups like ransomware gangs or darknet marketplaces might be able to rest their reputation solely on performance rather than on the ability for a human big boss to fall or be punished. I don't know enough about the sociology of these groups to say.
The question with intent alignment is: intent aligned with whom? If the AI executive is intent aligned with (follows orders from) the government, and the human government is voluntarily replaced with an AI government, we are left with an AI that is intent aligned with another AI.
Has anyone else noticed a thing recently (the past couple of days) where Claude is extremely reluctant to search the web, and instead is extremely keen to search past conversations or Google Drive and other nonsense like that? Even after updating my system prompt to encourage the former and discourage the latter, it still defaults to the latter. Also, instead of using the web search tool it will sometimes try and fail to search using curl and code execution. Is this just me or is anyone else experiencing similar issues?
I'm really confused by this passage from The Six Mistakes Executives Make in Risk Management (Taleb, Goldstein, Spitznagel):
We asked participants in an experiment: “You are on vacation in a foreign country and are considering flying a local airline to see a special island. Safety statistics show that, on average, there has been one crash every 1,000 years on this airline. It is unlikely you’ll visit this part of the world again. Would you take the flight?” All the respondents said they would.
We then changed the second sentence so it read: “Safety statistics show that, on average, one in 1,000 flights on this airline has crashed.” Only 70% of the sample said they would take the flight. In both cases, the chance of a crash is 1 in 1,000; the latter formulation simply sounds more risky.
One crash every 1,000 years is only the same as one crash in 1,000 flights if the airline operates exactly one flight per year on average. I guess they must have stipulated that in the experiment (for which no citation is given), because otherwise it's perfectly rational to suppose the first option is safer (since an airline generally operates more than one flight per year).
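A quick arithmetic sketch of why the two framings need not be equivalent, assuming a steady flight rate and treating each flight as an independent unit of exposure:

```python
# Per-flight crash probability implied by "one crash every 1,000 years",
# as a function of how many flights the airline operates per year.
def per_flight_prob(crashes_per_year=1 / 1000, flights_per_year=1):
    return crashes_per_year / flights_per_year

# The framings only coincide at exactly one flight per year:
print(per_flight_prob(flights_per_year=1))    # 0.001, same as "1 in 1,000 flights"

# With one flight a day, "one crash every 1,000 years" is far safer per flight:
print(per_flight_prob(flights_per_year=365))  # ~2.7e-6
```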
On the advice of @adamShimi, I recently read Hasok Chang's Inventing Temperature. The book is terrific and full of deep ideas, many of which relate in interesting ways to AI safety. What follows are some thoughts on that relationship, from someone who is not an AI safety researcher and only somewhat follows developments there, and who probably got one or two things wrong.
(Definitions: By "operationalizing", I mean "giving a concept meaning by describing it in terms of measurable or closer-to-measurable operations", whereas "abstracting" means "removing properties in the description of an object".)
There has been discussion on LessWrong about the relative value of abstract work on AI safety (e.g., agent foundations) versus concrete work on AI safety (e.g., mechanistic interpretability, prosaic alignment). Proponents of abstract work argue roughly that general mathematical models of AI systems are useful or essential for understanding risks, especially coming from not-yet-existing systems like superintelligences. Proponents of concrete work argue roughly that safety work is more relevant when empirically grounded and subjected to rapid feedback loops. (Note: The abstract-concrete distinction is similar to, but different from, the distinction between applied and basic safety research.)
As someone who has done neither, I think we need both. We need abstract work because we need to build safety mechanisms using generalizable concepts, so that we can be confident that the mechanisms apply to new AI systems and new situations. We need concrete work because we must operationalize the abstract concepts in order to measure them and apply them to actually existing systems. And finally we need work that connects the abstract concepts to the concrete concepts, to see that they are coherent and for each to justify the other.
Chang writes:
The dichotomy between the abstract and the concrete has been enormously helpful in clarifying my thinking at the earlier stages, but I can now afford to be more sophisticated. What we really have is a continuum, or at least a stepwise sequence, between the most abstract and the most concrete. This means that the operationalization of a very abstract concept can proceed step by step, and so can the building-up of a concept from concrete operations. And it may be beneficial to move only a little bit at a time up and down the ladder of abstraction.
Take for example the concept of (capacity for) corrigibility, i.e., the degree to which an AI system can be corrected or shut down. The recent alignment faking paper showed that, in experiments, Claude would sometimes "pretend" to change its behavior when it was ostensibly being trained with new alignment criteria, while not actually changing its behavior. That's an interesting and important result. But (channeling Bridgman) we can only be confident that it applies to the concrete concept of corrigibility measured by the operations used in the experiments -- we have no guarantees that it holds for some abstract corrigibility, or when corrigibility is measured using another set of operations or under other circumstances.
An interesting case study discussed in the book is the development of the abstract concept of temperature by Lord Kelvin (the artist formerly known as William Thomson) in collaboration with James Prescott Joule (of conservation of energy fame). Thomson defined his abstract temperature in terms of work and pressure (which were themselves abstract and needed to be operationalized). He based his definition on the Carnot cycle, an idealized process performed by the theoretical Carnot heat engine. The Carnot heat engine was inspired by actual heat engines, but was fully theoretical -- there was no physical Carnot heat engine that could be used in experiments. In other words, the operationalization of temperature that Thomson invented using the Carnot cycle was an intermediate step that required further operationalization before Thomson's abstract temperature could be connected with experimental data. Chang suggests that, while the Carnot engine was never necessary for developing an abstract concept of temperature, it did help Thomson achieve that feat.
Ok, back to AI safety. So above I said that, for the whole AI thing to go well, we probably need progress on both abstract and concrete AI safety concepts, as well as work to bridge the two. But where should research effort be spent on the margin?
You may think abstract work is useless because it has no error-correcting mechanism when it is not trying to, or is not close to being able to, operationalize its abstract concepts. If it is not grounded in any measurable quantities, it can't be empirically validated. On the other hand, many abstract concepts (such as corrigibility) still make sense today and are currently being studied in the concrete (though they have not yet been connected to fully abstract concepts) despite being formulated before AI systems looked much like they do today.
You may think concrete work is useless because AI changes so quickly that the operations used to measure things today will soon be irrelevant, or more pertinently perhaps, because the superintelligent systems we truly need to align are presumably vastly different from today's AI systems, in their behavior if not in their architecture. In that way, AI is quite different from temperature. The physical nature of temperature is constant in space and time -- if you measure temperature with a specific set of operations (measurement tools and procedures), you would expect the same outcomes regardless of which century or country you do it in -- whereas the properties of AI change rapidly over time and across architectures. On the other hand, timelines seem short, such that AGI may share many similarities with today's AI systems, and it is possible to build abstractions gradually on top of concrete operations.
There is in fact an example from the history of thermometry of extending concrete concepts to new environments without recourse to abstract concepts. In the 18th century, scientists realized that the mercury and air thermometers used then behaved very differently, or could not be used at all due to freezing and melting, for very low and very high temperatures. While they had an intuitive notion that some abstract temperature ought to apply across all degrees of heat or cold, their operationalized temperatures clearly only applied to a limited range of heat and cold. To solve this, they eventually developed different sets of operations for the measurement of temperatures in extreme ranges. For example, Josiah Wedgwood measured very high temperatures in ovens by baking standardized clay cylinders and measuring how much they'd shrunk. These different operations, which yielded measurements of temperature on different scales, were then connected by measuring temperature for both scales (using different operations) in an overlapping range and lining those up. All this was done without an abstract theory of temperature, and while the resulting scale was not on very solid theoretical ground, it was good enough to provide practical value.
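As a toy illustration of that scale-linking procedure, the sketch below fits a relation between two instruments' readings in their overlapping range and then extrapolates one scale beyond the other's reach. The readings, scale names, and the assumed linear relation are all invented for illustration; they are not Wedgwood's actual data or method.

```python
# Toy version of linking two temperature scales via an overlapping range.
# All numbers are made up; a linear relation is assumed for simplicity.
import numpy as np

# Hypothetical readings of the same heat sources on two instruments,
# in the range where both can be used:
mercury_readings  = np.array([200.0, 250.0, 300.0, 350.0])  # mercury scale
wedgwood_readings = np.array([1.1, 1.9, 2.7, 3.5])          # clay-shrinkage scale

# Fit the assumed linear relation over the overlap:
slope, intercept = np.polyfit(wedgwood_readings, mercury_readings, 1)

def wedgwood_to_mercury(w):
    """Extend the mercury scale to temperatures only the pyrometer reaches."""
    return slope * w + intercept

# An extrapolation with no theoretical guarantee -- exactly the sense in
# which the historical composite scale was practical but not well-grounded:
print(wedgwood_to_mercury(10.0))
```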
Of course, the issue with superintelligence is that, because of e.g., deceptive alignment and gradient hacking, we want trustworthy safety mechanisms and alignment techniques in place well before the system has finished training. That's why we want to tie those techniques to abstract concepts which we are confident will generalize well. But I have no idea what the appropriate resource allocation is across these different levels of abstraction.[1] Maybe what I want to suggest is that abstract and concrete work are complementary and should strive towards one another. But maybe that's what people have been doing all along?
A few months ago I wrote a post about Game B. The summary:
I describe Game B, a worldview and community that aims to forge a new and better kind of society. It calls the status quo Game A and what comes after Game B. Game A is the activity we’ve been engaged in at least since the dawn of civilisation, a Molochian competition over resources. Game B is a new equilibrium, a new kind of society that’s not plagued by collective action problems.
While I agree that collective action problems (broadly construed) are crucial in any model of catastrophic risk, I think that
- civilisations like our current one are not inherently self-terminating (75% confidence);
- there are already many resources allocated to solving collective action problems (85% confidence); and
- Game B is unnecessarily vague (90% confidence) and suffers from a lack of tangible feedback loops (85% confidence).
I think it may be of interest to some LW users, though it didn't feel on-topic enough to post in full here.