Ok, I was probably not going to write the post anyway, but since no one seems to actively want it, your insistence that it requires this much extra care is enough to dissuade me.
I will say, though, that you may be committing a typical mind fallacy when you say "convincing is >>> costly to complying to the request" in your reply to Zack Davis' comment. I personally dislike doing this kind of lit-review-style research because, in my experience, it's a lot of trudging through bullshit with little payoff, especially in fields like social psychology, and especially when the only guidance I get is "ask ChatGPT for related Buddhist texts". I don't like using ChatGPT (or LLMs in general; it's a weakness of mine, I admit). Maybe after a few years of capabilities advances that will change.
And it seems that I was committing a typical mind fallacy as well, since I implicitly assumed that when you said "this topic has been covered extensively" you had specific writings in mind, and that all you needed to do was retrieve and link them. I now realize that assumption was incorrect, and I'm sorry for making it. It is clear now that I underestimated the cost you would incur in convincing me to do said research before making a post.
I hope this concept gets discussed more in places like LessWrong someday, because I think there may be a lot of good we can do in preventing this kind of suffering, and the first step to solving a problem is pointing at it. But it seems like now is not the time and/or I am not the person to do that.
By the time you have AIs capable of doing substantial work on AI R&D, they will also be able to contribute effectively to alignment research (including, presumably, secret self-alignment).
Humans do substantial work on AI R&D, but we haven't been very effective at alignment research. (At least, according to the view that says alignment is very hard, which typically also says that basically all of our current "alignment" techniques will not scale at all.)
Even if takeoff is harder than alignment, that problem becomes apparent at the point where the amount of AI labor available to work on those problems begins to explode, so it might still happen quickly from a calendar perspective.
Yup, this is very possible.
Not sure why you replied in three different places. I will (try to) reply to all of them here.
I did this so that you could easily reply to them separately, since they were separate responses.
I do not consider linking to those Aella and Duncan posts a literature review, nor do I consider them central examples of work on this topic.
I did not link them for that reason. I linked them to ask whether my understanding of the general problem you're pointing to is correct: "Especially bad consequences relative to other instances of this mistake because the topic relates to people’s relationship with their experience of suffering and potentially unfair dismissals of suffering, which can very easily cause damage to readers or encourage readers to cause damage to others."
I am not going to do a literature review on your behalf.
Fair. I was simply wondering whether or not you had something to back up your claim that this topic has been covered "quite extensively".
Your explanation of how you will be careful gave me no confidence; the cases I'm worried about are related to people modeling others as undergoing 'fake' suffering, and ignoring their suffering on that basis. This is one of the major nexuses of abuse stumbled into by people interested in cognition. You have to take extreme care not to be misread and wielded in this way, and it just really looks like you have no interest in exercising that care. You're just not going to anticipate all of the different ways this kind of frame can be damaging to someone and forbid them one by one.
I would like to be clear that I do not intend to claim that Newcomblike suffering is fake in any way. Suffering is a subjective experience. It is equally real whether it comes from physical pain, emotional pain, or an initially false belief that quickly becomes true. Hopefully posting it in a place like LessWrong will keep it mostly away from the eyes of those who will fail to see this point.
I ask again, though: how would a literature review help at all?
I'd look at Buddhist accounts of suffering as a starting point.
This does vibe as possibly relevant.
If you're going to invite people to sink hundreds of cumulative person hours into reading your thing, you really should actually try to make it good, and part of that is having any familiarity at all with relevant background material.
I'm not sure how to feel about this general attitude towards posting. I think with most things I would rather err on the side of posting something bad; I think a lot of great stuff goes unwritten because people's standards for themselves are too high (of course, Scott's law of advice reversal applies here, but given that I've only posted a handful of times, I think I'm on the "doesn't post enough" end of the spectrum). I try to start all of my posts with a TLDR, so that people who aren't interested or who think they might be harmed by my post can steer clear. Beyond this, I think it's the readers' responsibility to avoid content that will harm them or others.
Psych wards are horrible Kafkaesque nightmares, but I don't think they are Out to Get You in the way Zvi describes. Things that are Out to Get You feed on your slack. For example, social media apps consume your attention and casinos consume your money. They are incentivized to go after those who have a lot of slack to lose ("whales") and those who have few defenses against their techniques (see Tsvi's comment about desperation).
Psych wards are, to a first approximation, prisons: one of their primary functions is to destroy your slack so that you cannot use it to do something that society at large dislikes. In the prison case: committing crimes; in the psych ward case (for depression): killing yourself. They destroy your slack because they don't want you to have it. Things that Get You consume your slack because they want it for themselves.
How does reviewing literature help avoid this failure mode?
Could you point me to some specific examples of this? Or at least, could you tell me if these seem like correct examples:
If I write a post about Newcomblike suffering, I would probably want to encourage people to escape such situations without hurting others, and emphasize that, even if someone is ~directly inflicting this on you, thinking of it as "their fault" is counterproductive. Hate the game, not the players. They are in traps much the same as yours.
Where might I find such pre-existing literature? I have never seen this discussed before, though it's sort of alluded to in many of Zvi's posts, especially in the Immoral Mazes sequence.
I must admit, if you're talking about literature in the world of social psych outside LessWrong, I don't have much exposure to it, and I don't really consider it worth my time to take a deep dive there, since their standards for epistemic rigor are abysmal.
But if you have pointers to specific pieces of research, I'd love to see them.
This post seems to presuppose that:
These both seem false to me, or at least, not obviously true.
Many things in the world want you to suffer. Signalling suffering is useful in many social situations. For example, suffering is a sign that one has little slack, and so entities that are out to get you will target those who signal suffering less.
Through Newcomblike self-deception, a person can come to believe that they are suffering. The easiest way to make yourself think that you are suffering is to actually suffer. In this way, the self-deception hyperstitions itself into reality. Perhaps a large amount of human suffering is caused by this.
Solving this problem may be of great interest to those who want to reduce human suffering.
I may write a longer post about this with more details and a more complete argument. If you particularly want this, please comment or dm, as that will make me more likely to write it.
Basically all of it, except the user bans, is for low-quality content (and almost all of that LLM-written, at least these days). It's important to filter stuff like that out, and I'm very glad you guys are doing it, but I don't think this is keeping out many counterfactual Girards and Zizes. (I don't know as much about Torres and Nier.)
If we condition on "doesn't post obviously low-quality content", we're left with a distribution almost entirely filled with people we want in our community, yet one that still contains Girard and Ziz (and I'm guessing the others as well). My ass-numbers prior is at least 20:1 in favor of them being a good fit. Please correct me if I'm wrong about this.
Looking at each banned user and sorting their posts and comments by "old", I usually didn't find anything that would provide much of an update on this prior. Maybe at most a 1:3 likelihood ratio. However, I'm probably less attuned to these sorts of things than you and the other moderators. Do you know of specific warning signs that {Girard, Nier, Torres, Ziz and co.} exhibited early on that could have been stronger evidence of this?
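To make the arithmetic explicit (a rough sketch using my own made-up numbers, and reading that 1:3 as the most unfavorable case for "good fit"): multiplying the prior odds by the likelihood ratio gives the posterior odds,

$$\text{posterior odds} = \frac{20}{1} \times \frac{1}{3} = \frac{20}{3} \approx 7:1$$

still in favor of them being a good fit, which is why even the worst-case reading of what I found doesn't change the overall picture much.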
Edit: where -> we're