Thanks for linking this post. I think it has a nice harmony with Prestige vs Dominance status games.
I agree that this is a dynamic that is strongly shaping AI Safety, but would specify that it's inherited from the non-profit space in general - EA originated with the claim that it could do outcome-focused altruism, but... there's still a lot of room for improvement, and I'm not even sure we're improving.
The underlying dynamics and feedback loops are working against us, and I don't see evidence that core EA funders/orgs are doing more than pay lip service to this problem.
Something in the physical ability of the top-down processes to control the bottom-up ones is damaged, possibly permanently.
Metaphorically, it's like the revolting parts don't just refuse to collaborate anymore; they also blow up some of the infrastructure that was previously used to control them.
This is scary; big if true, it would significantly change my own personal strategies and those I'd endorse to others - a switch from focusing on recovery to rehabilitation/adaptation.
I'd be grateful if you could elaborate on this part of your model and/or point me toward relevant material elsewhere.
Meek people (like me) may not see the worth in undertaking the risk of publicly revealing arguments or preferences. Embarrassment, shame, potentially being shunned for your revealed preferences, and so on -- there are many social risks to being public with your arguments and thought process.
Two of the three 'risks' you highlighted are things you have control over; you are an active participant in your feelings of shame and embarrassment[1] - they are strategies 'parts' of you are pursuing to meet your needs, and through inner work[2][3] you can stop relying on these self-limiting strategies.
The 3rd is a feature, not a bug. By and large, anyone who would shun you in this context is someone you want to be shunned by; someone who really isn't worth your time and energy.
The obvious exceptions are for those who find themselves in hostile cultures where revealing certain preferences poses the risk of literal harm.
Epistemic status: assertive/competitive, status blind autist who is having a great time being this way and loves convincing others to dip their toe in the water and give it a try; you might just find yourself enjoying it too :)
The only remedy I know of is to cultivate enjoying being wrong. This involves giving up a good bit of one's self-concept as a highly intelligent individual. This gets easier if you remember that everyone else is also doing their thinking with a monkey brain that can barely chin itself on rationality.
Some thoughts:
I have less trouble with this than most, and the areas where I do notice it arising lead me toward an interesting speculation.
I'm status blind: I very rarely worry about looking like an idiot, failing publicly, etc. (and mostly only did when I was much younger). There is no perceived/felt social cost to me of being wrong, and it often feels good to explicitly call out when I'm wrong in a social context - it feels like finding your way again after being lost.
I generally follow the 'strong opinions, loosely held' strategy - I guess at least partly because the shortest path to the right answer is often to be confidently wrong on the internet and wait for someone to correct you :D
However...
Where I do notice the 'ick field' arising, where I do notice motivated reasoning coming out in force - is in my relationships. Which makes total sense - being 'wrong' about my choice of life partner is hugely costly, so much is built on top of that belief.
Evaluating your relationships is often bad for your relationships; a common piece of relationship advice is 'Don't Keep Score'.
Perhaps relationships are a kind of self-fulfilling self-deception - they work because we engage in motivated reasoning, because we commit 'irrationally'. Or at least this strategy results in better outcomes than we'd get if we were more 'rational'.
And with my rough idea of the evolutionary environment, this makes total sense: you don't choose your family, your tribe, or often even your partner. If we weren't engaging in a whole bunch of motivated reasoning, the most important foundation of our survival/wellbeing - social bonds - would be significantly weakened.
And that ties in neatly with a common theme in the conversation around 'biases' - that they're features, not bugs.
I am very confused.
My first thought when reading this was 'huh, no wonder they're getting mixed results - they're doing it wrong'.
My second thought when returning to this a day later: good that they're doing it wrong - anything I did to improve our ability to understand and measure persuasion would be directly contributing to dangerous capabilities.
Counterfactually, if we don't create evals for this... are we not expected to notice that LLMs are becoming increasingly persuasive? More able to model and predict human psychology?
What is actually the 'safety' case for this research? What theory of change predicts this work will be net positive?
Re: 2
The most promising way is just raising children better.
See (which I'm sure you've already read): https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people
Alongside that though, I think the next biggest leverage point would be something like nationalising social media and retargeting development/design toward connection and flourishing (as opposed to engagement and profit).
This is one area where, if we didn't have multiple catastrophic time pressures, I'd be pretty optimistic about the future. These are incredibly high impact and tractable levers for changing the world for the better; part of the whole bucket of 'just stop doing the most stupid thing' stuff.
Is there anything useful we can learn from Crypto ASICs as to how this will play out? And specifically, how to actually bet on it?
To the extent that anecdata is meaningful:
I have met somewhere between 100 and 200 AI Safety people in the past ~2 years; people for whom AI Safety is their 'main thing'.
The vast majority of them are doing tractable/legible/comfortable things. Most are surprisingly naive, with less awareness of the space than I have (and I'm just a generalist lurker who finds this stuff interesting, not actively working on the problem).
Few are actually staring into the void of the hard problems, where 'hard' here is loosely defined as 'unknown unknowns, here be dragons, where do I even start'.
Fewer still progress from staring into the void to actually trying things.
I think some amount of this is natural and to be expected; I think even in an ideal world we'd probably still have a similar breakdown - a majority who aren't contributing (yet)[1], a minority who are - with the difference being more in the size of those groups.
I think it's reasonable to aim for a larger, higher-quality minority; I think it's tractable to achieve progress through mindfully shaping the funding landscape.
I think it's worth mentioning that all newbies start out useless, and not all newbies remain newbies. Some portion of the majority are actually people who will progress to being useful after they've gained experience and wisdom.