> Prediction and planning remain incredibly distinct as structures of cognitive work,
I disagree. (Partially.) For a unitary agent who is working with a small number of possible hypotheses (e.g., 3), and a small number of possible actions, I agree with your quoted sentence.
But let’s say you’re dealing with a space of possible actions that’s much too large to let you consider each exhaustively, e.g. what blog post to write (considered concretely, as a long string of characters).
It’d be nice to have some way to consider recombinable pieces, e.g. “my blog post could include idea X”, “my blog post could open with joke J”, “my blog post could be aimed at a reader similar to Alice”.
Now consider the situation as seen by the line of thinking that is determining: “should my blog post be aimed mostly at readers similar to Alice, or at readers similar to Bob?”. For this line of thinking to make a good estimate of ExpectedUtility(post is aimed at Alice), it needs predictions about whether the post will contain idea X. However, the line of thinking that is determining whether to include idea X (or the unified agent, at those moments when it is actively considering this) will of course need good plans (not predictions) about whether to include X, and how exactly to include X.
I don’t fully know what a good structure is for navigating this sort of recombinable plan space, but it might involve a lot of toggling between “this is a planning question, from the inside: shall I include X?” and “this is a prediction question, from the outside: is it likely that I’m going to end up including X, such that I should plan other things around that assumption?”.
My own cognition seems to me to toggle many combinatorial pieces back and forth between planning-from-the-inside and predicting-from-the-outside, like this. I agree with your point that human brains and bodies have all kinds of silly entanglements. But this part seems to me like a plausible way for other intelligences to evolve/grow too, not a purely one-off human idiosyncrasy like having childbirth through the hips.
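If it helps, here's a toy sketch in Python of the structure I have in mind. Everything in it is invented for illustration (the pieces, the values, the scoring function); the loop is basically coordinate ascent over recombinable plan pieces, where each piece is treated in turn as a decision "from the inside" while the still-open pieces are filled in by "predictions from the outside". It's a sketch of the toggling, not a claim about how minds actually implement it:

```python
# Toy model: each recombinable piece of the plan (who's the audience?
# include idea X? open with joke J?) is a variable we toggle between
# "decide from the inside" and "predict from the outside".

PIECES = {          # possible values for each piece (invented for illustration)
    "audience": ["alice", "bob"],
    "idea_x":   [True, False],
    "joke_j":   [True, False],
}

def utility(assignment):
    """Toy score for a fully specified plan (a stand-in for real judgment)."""
    score = 0.0
    if assignment["idea_x"]:
        score += 2.0 if assignment["audience"] == "alice" else 0.5
    if assignment["joke_j"]:
        score += 1.0 if assignment["audience"] == "bob" else -0.2
    return score

def complete(partial):
    """Fill each still-open piece with an outside-view 'prediction' of how
    it will likely end up, given what's already settled."""
    full = dict(partial)
    for piece in PIECES:
        if piece not in full:
            full[piece] = max(PIECES[piece],
                              key=lambda v: utility(complete({**full, piece: v})))
    return full

# Planning loop: treat one piece at a time as a live decision ("from the
# inside"), predicting the rest "from the outside"; sweep until stable.
plan = {}
for _ in range(3):  # a few sweeps suffice for this toy problem
    for piece in PIECES:
        others = {k: v for k, v in plan.items() if k != piece}
        plan[piece] = max(PIECES[piece],
                          key=lambda v: utility(complete({**others, piece: v})))

print(plan)  # -> {'audience': 'alice', 'idea_x': True, 'joke_j': False}
```

Note that while the "audience" question is being decided, "idea_x" and "joke_j" are treated as predictions, and vice versa; no piece is ever both at once, but each piece gets to be both at different moments.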
Also we understand basic arithmetic around here, which goes a long way sometimes.
It's a good point, re: some of the gap being that it's hard to concretely visualize the world in which AGI isn't built. And also about the "we" being part of the lack of concreteness.
I suspect there're lots of kinds of ethical heuristics that're supposed to interweave, and that some are supposed to be more like "checksums" (indicators everyone can use in an embodied way to see whether there's a problem, even though they don't say how to address it if there is a problem), and others are supposed to be more concrete.
For some more traditional examples:
It would be too hard to equip humans and human groups for changing circumstances via only a "here's what you do in situation X". It's somewhat easier to do it (and traditional ethical heuristics did do it) by a combination of "you can probably do well by [various what-to-do heuristics]" and "you can tell if you're doing well by [various other checksum-type heuristics]". Ethics is there to help us design our way to better plans, not only to always hand us those plans.
Another place where I'll think and act somewhat differently as a result of this conversation:
Okay, but: it's also hard to find individuals who are willing to speak for heuristic C, in a way I suspect differs from what it was like for leaded gasoline and from what I remember as a kid in the late '80s about the ozone layer.
It's a fair point that I shouldn't expect "consensus", and should've written and conceptualized that part differently, but I think heuristic C is also colliding with competing ethical heuristics in ways the ozone situation didn't.
I listed the cases I could easily recall of full-blown manic/psychotic episodes in the extended Bay Area rationalist community (episodes strong enough that the person in most cases ended up hospitalized, and in all cases ended up having extremely false beliefs about their immediate surroundings for days or longer, e.g. “that’s the room of death, if I walk in there I’ll die”; “this is my car” (said of the neighbor’s car)).
I counted 11 cases. (I expect I’m forgetting some, and that there are others I plain never knew about; count this as a convenience sample, not an exhaustive inventory.)
Of these, 5 are known to me to have involved a psychedelic or pot in the precipitating event.
3 are known to me to have *not* involved that.
In the other 3 cases I’m unsure.
In 1 of the cases where I’m unsure about whether there were drugs involved, the person had taken part in a several-weeks experiment in polyphasic sleep as part of a Leverage internship, which seemed to be part of the precipitating event from my POV.
So I’m counting [between 6 and 8] out of 11 for “precipitated by drugs or an imprudent extended sleep-deprivation experiment” and [between 3 and 5] out of 11 for “not precipitated by doing anything unusually physiologically risky.”
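(Spelling out the interval arithmetic, since the unsure cases can swing either way; a quick check, with variable names that are just mine:)

```python
# 11 cases total: 5 known drug-involved, 3 known not, 3 unsure,
# and 1 of the 3 unsure cases involved the sleep-deprivation experiment.
total, known_drugs, known_not, unsure = 11, 5, 3, 3
sleep_dep_among_unsure = 1
assert known_drugs + known_not + unsure == total

# "drugs or sleep deprivation": at least the known-drug cases plus the
# sleep-deprivation case; at most, all three unsure cases count too.
risky_low = known_drugs + sleep_dep_among_unsure   # 6
risky_high = known_drugs + unsure                  # 8

# "nothing unusually physiologically risky" is the complement.
safe_low, safe_high = total - risky_high, total - risky_low  # 3, 5
print((risky_low, risky_high), (safe_low, safe_high))  # (6, 8) (3, 5)
```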
(I’m not here counting other serious mental health events, but there were also many of those in the several-thousand-person community across the last ten years, including several suicides; I’m not trying here to be exhaustive.)
(Things can have multiple causes, and having an obvious precipitating physiological cause doesn’t mean there weren’t other changeable risk factors also at play.)
Your guess above, plus: the person's "main/egoic part", who has mastered far-mode reasoning and the rationalist/Bayesian toolkit, and who is out to "listen patiently to the dumb near-mode parts that foolishly want to do things other than save the world," can in some people, with social "support" from outside them, overpower other bits of the psyche in ways that're more like tricking and less like "tugs of war", without realizing it's doing this.
My own guesses are that CFAR mostly paid an [amount of attention that made sense] to reducing psychosis/mania risks in the workshop context, after our initial bad experience with the mania/psychosis episode at an early workshop when we did not yet realize this could be a thing.
The things we did:
I separately think I put a reasonable amount of effort into organizing basic community support and first aid for those who were socially contiguous with me/CFAR who were having acutely bad mental health times, although my own capacities weren’t enough for a growing community and I mostly gave up on the less near-me parts around 2018.
It mostly did not occur to me to contemplate our cultural impact on the community’s overall psychosis rate (except for trying for a while to discourage tulpas and other risky practices, and to discourage associating with people who did such things, and then giving up on this around 2018 when it seemed to me there was no real remaining chance of quarantining these practices).
I like the line of inquiry about “what art of rationality might be both good in itself, and increase people’s robustness / decrease their vulnerability to mania/psychosis-type failure modes, including much milder versions that may be fairly common in these parts and that are still bad”. I’ll be pursuing it. I take your point that I could in principle have pursued it earlier.
If we are going to be doing a fault analysis in which we give me and CFAR responsibility for some of our downstream memetic effects, I’d like CFAR to also get some credit for any good downstream memetic effects we had. My own guess is that CFAR workshops:
I acknowledge that these alleged benefits are my personal guesses and may be wrong. But these guesses seem to me on a par with my personal guess that patterns of messing with one’s own functioning (as from “CFAR techniques”) can erode psychological wholeness, and I’m afraid it’ll be confusing if I voice only the negative parts of my personal guesses.
Yes; this (or something similar) is why I suspect that "'believing in' atoms" may involve the same cognitive structure as "'believing in' this bakery I am helping to create" or "'believing in' honesty" (and a different cognitive structure, at least for ideal minds, from predictions about outside events). The question of whether to "believe in" atoms can be a question of whether to invest in building out and maintaining/tuning an ontology that includes atoms.