re: the request for examples:
This is not an example about "groups" (though my claim was about groups), but: young human kids can't seem to do "nots". E.g., a friend of mine told her toddler "don't touch your eyes" after she saw that the kid had soap on her hands, and the kid immediately touched her eyes; parents generally seem to learn to say things like "keep your hands clasped behind your back" when visiting art museums, rather than "don't touch the paintings"; etc. Early-stage LLMs were like this too: e.g., asking for an image "without X" would often yield images with X. And so am I, if I try to "not think of a pink elephant."
(If toddlers and early LLMs and the less conscious bits of my thinking process are in some ways hive minds, perhaps these constitute examples of "groups"? But it's a stretch.)
Re: groups of human adults: I'm less sure of these examples, but e.g. the "Black Lives Matter" efforts seem to have in some ways inflamed racial tensions; "gain of function" research in biology seems to gain its memetic fitness and funding-acquisition fitness from our desire not to get ill, yet probably causes illness in expectation given the risk of lab leaks; environmentalist efforts to ban nuclear power seem bad for the environment; and outrage about Trump among media-reading mainstream people in ~2016 seemed to me to amplify his voice and help get him elected.
My belief that groups mostly can't pursue sensible "not-X"-formatted goals stems more from trying to think about mechanisms than from these examples, though. I... can see how a being with a single train of planned strategic actions could in principle optimize for "not X." I can't see how a group can. I can see how a group can backchain its way toward some positively-formatted "do Y", via members upvoting and taking an interest in proposals that show parts of how to obtain Y, or how to obtain "stepping stones" that look like they might help with obtaining Y.
My guess about what's useful to add to the meme-space is the opposite. Groups generally don't know how to make sensible use of "not-X"-formatted subgoals. Instead, groups slowly converge toward having more traction on the nouns their members are interested in, such that amplifying "not-X" also amplifies "X" -- on my best guess.
I suspect it would be good for me to ask these questions of myself more, but I don't. I'm not sure exactly what the barrier is -- maybe a clearer sense of how it would help, or of what some good triggers are for asking the question (though the examples in the OP help), or of what identity/dashboard view I might sustain while regularly asking it. I, like the author, would be curious to hear from others about how often you ask this question, whether the post helped, and what barriers there are / what mileage you've gotten.
Only 14 months later, but: did it provide lasting value?
I appreciate this post (still, two years later). It draws into plain view the argument: "If extreme optimization for anything except one's own exact values causes a very bad world, then humans other than oneself getting power should be scary in roughly the same way that a paperclipper getting power is scary." I find it helpful to have this argument in plainer view, and to contemplate together whether the reply is something like:
re: "the bite of the worry is that I worried this concept was more memetically fit than it was useful."
Hmm. There are two choices that IMO made it memetically fit; I'm curious whether those choices of mine were bad manners. The two choices:
1) I linked my concept to a common English phrase ("believing in"), which made it more referenceable.
2) The specific phrase "believing in" that I picked naturally gets into a bit of a fight with "belief", and "belief" is one of LW's most foundational concepts; this also made it more referenceable, and more natural (for me at least) to geek out about. (Whereas if I'd given roughly the same model but called it "targets" or "aims", my post would've been less naturally in a fight with "beliefs", and so less salient/referenceable, and less natural to compare-and-contrast with the many claims/questions/etc. I have stored up around "beliefs".)
I think a crux for me about whether this was bad manners (or, alternately phrased, whether discussions will go better or worse if more posts follow similar "manners") is whether the model I share in the post basically predicts the ordinary English meaning of "believing in". (In my book, ordinary English words and phrases that've survived many generations often map onto robustly useful concepts, at least compared to just-made-up jargon words; and so it's often good to let normal English words/concepts have a lot of effect on how we parse things; they've come by their memetic fitness honestly.) A related crux for me is whether the LW technical term "belief" was/is overshadowing many LWers' ability to understand some of the useful things normal people are up to with the word "belief".
I appreciate this post, as the basic suggestion looks [easy to implement, absent incentives people claim aren't or shouldn't be there], and so visibly seeing whether it is or isn't implemented can help make it more obvious what's going on. (And that works better if the possibility is common knowledge, e.g. via this post.)
Part of what's left out (on my not-yet-LW-tested picture): why and how the pieces within this "economy of mind" sometimes cohere into a "me", or into a "this project", such that the cohered piece can productively pool (money / choosing-power / etc.) across itself. Also: what caring is, and why or how certain kinds of caring let us unite for a long time in the service of something outside ourselves (something that retains this "relationship to the unknown", and also retains a "relationship to ourselves and our values").
I keep trying to write a post on "pride" that is meant to help fix this. But I haven't gotten the whole thing cogent, despite sinking several weeks into it spread across months.
My draft opening section of the post on 'pride'
Picture a person who is definitely not a fanatic – someone who cares about many different things, and takes pride in many different things.
Personally, I’m picturing Tiffany Aching from Terry Pratchett’s excellent book The Wee Free Men. (No major spoilers upcoming; but if you do read the book, it might help you get the vibe.)
Our person, let’s say, has lots of different projects, on lots of different scales, that she would intuitively say she “cares about for their own sake”, such as:
Each of these projects does several things at once:
My aim in this essay is to share a model of how (some) minds might work, on which Tiffany Aching is a normal expected instance of “mind-in-this-sense,” and a paperclipper is not.
I continue to use roughly this model often, and to reference it in conversation maybe once/week, and to feel dissatisfied with the writeup ("useful but incorrect somehow or leaving something out").
Oh. Um: I have ideas, but not good ones. Still, I think these, or almost any, are probably better than "persuade AIs to be afraid of ...". Examples: