LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
The evidence is mostly that there are some circles where this sort of thing is more common, plus a subjective feeling of it helping. Most of my reasoning for thinking this is good is a mix of "some anecdata, but mostly 'the theory makes a lot of sense to me.'"
(In some close personal relationships, a related social-tech is saying "look, I really need to say out loud how this feels internally to me, without having to police myself about whether I'm being fair", and it has definitely felt helpful for turning what would have been an escalating fight into a cooperative process.)
But, part of the point of this post is to give an opportunity for people to take it as object and argue about it.
The point of it is not especially to make people feel better (I think it adds a slight saving throw against a conversation escalating more than it needs to, but, like, not an overwhelming one).
It's a rationality norm more than a politeness norm – the point is that it makes it more likely for you to notice that you're ranting / being uncharitable / psychologizing, and it helps other people notice "oh, yes, that happened" and "oh, I guess in this social scene this is a thing you are supposed to notice and flag as costly and not just do willy-nilly."
And, I think having a habit of noticing and tacking on a disclaimer makes it more likely that you go "hmm, do I actually really need to make this a full-fledged rant?" (and write it more carefully) or "is this psychologizing model really the only explanation for why this guy is doing/believing this dumb-looking thing?" (and then actually come up with a second theory and realize you were overconfident in your first one).
It adds scaffolding for other rationality practice.
The thing I most anticipate backfiring is people only ever doing the rant-with-disclaimer (which I've sometimes seen accumulate), without ever really trying to pay down the debt. I expect that to be aggravating for people on the receiving end.
So, a thing I consider an unsolved problem in this current thread is how to make the memetic payload here more naturally include "by doing this, I am taking on a bit of social debt."
Nod, to be clear I wasn't at all advocating "we deliberately have it self-modify to avoid money pumps." My whole point was "the incentive towards self-modifying is an important fact about reality to model while you are trying to ensure corrigibility."
i.e. you seem to be talking about "what we're trying to do with the AI", as opposed to "what problems will naturally come up as we attempt to train the AI to be corrigible."
You've stated that you don't think corrigibility is that hard, if you're trying to build narrow agents. It definitely seems easier if you're building narrow agents, and a lot of my hope does route through using narrower AI to accomplish specific technical things that are hard-but-not-that-hard.
The question is "do we actually have such things-to-accomplish, that Narrow AI can do, that will be sufficient to stop superintelligence being developed somewhere else?"
(Also, I do not get the sense from outside that this is what the Anthropic plan actually is)
The thing I meant to imply was something like:
<uncharitableRant>
[contents of uncharitableRant]
</uncharitableRant>
(the ways I've seen people do this include the complete brackets, or just having them afterwards as a sort of self-aware pseudo-joke, or, more commonly, spelling out the whole thing. I don't actually feel very opinionated on how exactly you do it)
Do you have a link to existing discussion of "VNM rationality is a dead end" that you think covers it pretty thoroughly?
My offhand gesture of a response is "I get the gist of why VNM rationality assumptions are generally not true in real life and you should be careful about what assumptions you're making here."
But, it seems like whether the next step is "and therefore the entire reasoning chain relying on them is sus enough you should throw it out" vs "the toy problem is still roughly mapping to stuff that is close-enough-to-true that the intuitions probably transfer" depends on the specifics.
I think I do get why baking in assumptions about belief/goal decomposition is something to be particularly worried about.
I assume there has been past argumentation about this, and am curious whether you think there is a version of the problem-statement that grapples with the generators of what MIRI was trying to do, but without making the mistakes you're pointing at here.
I definitely agree it is not the best kind of comment Thomas could have written, and I hope it's not representative of the average quality of comment in this discussion; it just seemed to me the LW mod reactions to it were extreme and slightly isolated-demand-for-rigor-y.
(I do want this thread to be one where, overall, people are some kind of politically careful, but I don't actually have that strong a guess as to what the best norms are. I view this as sort of the prelude to a later conversation with a better-set container.)
I think the thing Zack meant was content-free was your response to Thomas' response, which didn't actually explain the gears of why Thomas' comment felt tramplingly bad.
A few reasons I don't mind the Thomas comment:
should be invulnerable to all money-pumps, which is not a property we need or want.
Something seems interesting about your second paragraph, but, isn't part of the point here that 'very capable' (to the point of 'can invent important nanotech or whatever quickly') will naturally push something towards being the sort of agent that will try to self-modify into something that avoids money-pumps, whether you were aiming for that or not?
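(To make the money-pump jargon concrete for readers who haven't run into it: here's a toy sketch I'm adding, not anything from the original exchange, assuming the standard setup of an agent with cyclic preferences that will pay a small fee for any trade up to something it strictly prefers:)

```python
# Toy illustration (my addition, not from the thread): an agent with cyclic
# preferences C > B > A > C will pay a small fee for each "upgrade", cycle back
# to the item it started with, and end up strictly poorer – i.e. it gets money-pumped.

FEE = 1  # amount the agent will pay for any swap to a strictly preferred item

# Cyclic (intransitive) preferences: each key is strictly preferred to its value.
PREFERS_OVER = {"B": "A", "C": "B", "A": "C"}

def run_pump(item: str, money: int, rounds: int) -> tuple[str, int]:
    """Repeatedly offer the agent the item it prefers to its current one, for a fee."""
    for _ in range(rounds):
        item = next(k for k, v in PREFERS_OVER.items() if v == item)  # accept the upgrade
        money -= FEE                                                  # and pay for it
    return item, money

print(run_pump("A", money=10, rounds=9))  # ('A', 1): same item, 9 units poorer
```

The toy version is just meant to show what "avoids money-pumps" cashes out to – acyclic (and, with the other VNM axioms, utility-representable) preferences – which is the direction the self-modification pressure seems to push.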
fwiw this seems like basically what's happening to me. (The comment reads kinda defeatist about it, but I'm not entirely sure what you were going for, and the model seems right, if incomplete. [edit: I agree that several of the statements about entire groups are not literally true for the entire group; when I say 'basically right' I mean "the overall dynamic is an important gear, and I think among each group there's a substantial chunk of people who are tired in the way Thomas depicts"])
On my own end, when I'm feeling most tribal-ish or triggered, it's when someone/people are looking to me like they are "willfully not getting it". And, I've noticed a few times on my end where I'm sort of willfully not getting it (sometimes while trying to do some kind of intellectual bridging, which I bet is particularly annoying).
I'm not currently optimistic about solving twitter.
The angle I felt most optimistic about on LW is aiming for a state where a few prominent-ish* people... feel like they get understood by each other at the same time, and can chill out at the same time. This maybe works IFF there are some people who:
a) aren't completely burned out on the "try to communicate / actually have a good group epistemic culture about AI" project.
b) are prominent / intellectually-leader-y enough that, if they (a few people on multiple sides/angles of the AI-situation-issue), all chilled out at the same time, it'd meaningfully radiate out and give people more of a sense of "okay things are more chill now."
c) are willing to actually seriously doublecrux about it (i.e. have some actual open curiosity, both people trying to paraphrase/pass ITTs, both people trying to locate and articulate the cruxes beneath their cruxes, both people making an earnest effort to be open to changing their mind)
Shoulder Eliezer/Nate/JohnW/Rohin/Paul pop up to say "this has been tried, dude, for hundreds of hours", and my response is
(Maybe @Eli Tyre can say if he thinks True Doublecrux has ever been tried on this cluster of topics)
–––
...that was my angle something like 3 months ago. Since then, someone argued another key piece of the model:
There is something in the ecosystem that is going to keep generating prominent Things-Are-Probably-Okay people, no matter how much doublecruxing and changing minds happens. People in fact really want things to be okay, so whenever someone shows up with some kind of sophisticated sounding reasoning for why maybe things are okay, some kind of egregore will re-orient and elevate them to the position of "Things-Are-Probably-Okay-ish intellectual leader". (There might separately be something that keeps generating Things-Probably-Aren't-Okay people, maybe a la Val's model here. I don't think it tends to generate intellectual thought leaders, but, might be wrong).
If the true underlying reality turns out to be "actually, one should be more optimistic about alignment difficulty, or whether leading companies will do reasonable things by default", then, hypothetically, it could resolve in the other direction. But, if there's not a part of a plan that somehow deals with it, it makes sense for Not-Okay-ists to be less invested in actually trying.
–––
Relatedly: new people are going to keep showing up, who haven't been through 100 hours of attempted nuanced arguing, who don't get all the points, and people will keep being annoyed at them.
And there's something like... intergenerational trauma, where the earlier generation of people, who have attempted to communicate and are just completely fed up with people not listening, are often rude/dismissive of people who are still in the process of thinking through the issue (though, notably, sometimes while also still kinda willfully not listening), and then the newer generation is like "christ, why was that guy so dismissive?"
In particular, the newer person might totally have a valid nontrivial point they are making, but it's entangled with some other point the other person thinks is obviously dumb, so older Not-Okay-ists end up dismissing the whole thing.
–––
(Originally this used "pessimist" and "optimist" as the shorthand, but I decided I didn't like that because it is easier to interpret as "optimism/pessimism as a disposition, rather than a property of your current beliefs", which seemed to do more bad reifying.)
Or: the generalized version of this is, "notice when you are doing something you wouldn't endorse doing all the time, flag it with a quick observation, and apologize if it seems like it'd impose costs on others." That seems like a generally good metahabit to me.