I am very confused.
My first thought when reading this was 'huh, no wonder they're getting mixed results - they're doing it wrong'.
My second thought when returning to this a day later: good - anything I do to contribute to the ability to understand and measure persuasion is literally directly contributing to dangerous capabilities.
Counterfactually, if we don't create evals for this... are we not expected to notice that LLMs are becoming increasingly persuasive? More able to model and predict human psychology?
What is actually the 'safety' case for this research? What theory of change predicts this work will be net positive?
Re: 2
The most promising way is just raising children better.
See (which I'm sure you've already read): https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people
Alongside that though, I think the next biggest leverage point would be something like nationalising social media and retargeting development/design toward connection and flourishing (as opposed to engagement and profit).
This is one area where, if we didn't have multiple catastrophic time pressures, I'd be pretty optimistic about the future. These are incredibly high impact and tractable levers for changing the world for the better; part of the whole bucket of 'just stop doing the most stupid thing' stuff.
Is there anything useful we can learn from Crypto ASICs as to how this will play out? And specifically, how to actually bet on it?
Replying to this because it seems a useful addition to the thread; assuming OP already knows this (and more).
1.) And none of the correct counterplays are 'look, my opponent is cheating/look, this game is unfair'. (Scrub mindset)
2.) You know what's more impressive than winning a fair fight? Winning an unfair one. While not always an option, and usually with high risk:reward, beating an opponent who has an asymmetric situational advantage is hella convincing; it affords a much higher ceiling (relative to a 'fair' game) to demonstrate just how much better than your opponent you are.
It's an interesting framework, I can see it being useful.
I think it's more useful when you consider both high-decoupling and low-decoupling to be failure modes. More specifically: when one is dominant and the other is neglected, you reliably end up with inaccurate beliefs.
You went over the mistakes of low-decouplers in your post, and provided a wonderful example of a high-decoupler mistake too!
"High decouplers will notice that, holding preferences constant, offering people an additional choice cannot make them worse off. People will only take the choice if it's better than any of their current options."
Aside from https://thezvi.wordpress.com/2017/08/12/choices-are-really-bad/ there's also the consideration of what choice I offer you, or how I frame the choice (see Kahneman's stuff).
And that's just considering it from the individual psychological level, but there are social/cultural levers and threads to pull here too.
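To make the 'choices are costly' point concrete, here's a minimal toy sketch (my own illustration, not anything from Zvi's post or Kahneman): as soon as evaluating options costs effort, adding an option you'd never take can leave you strictly worse off, preferences held constant.

```python
# Toy model (illustrative assumption): the agent takes the best option,
# but pays a fixed cognitive cost to evaluate each option offered.

def net_utility(option_values, eval_cost=0.5):
    """Value of the best option, minus the cost of evaluating all of them."""
    return max(option_values) - eval_cost * len(option_values)

base = [10.0, 7.0]            # two existing options
expanded = base + [8.0]       # add a third option, worse than the current best

print(net_utility(base))      # 10 - 0.5*2 = 9.0
print(net_utility(expanded))  # 10 - 0.5*3 = 8.5 -> worse off, same preferences
```

None of this touches the framing effects mentioned above; it's just the narrowest way the 'an extra choice can't hurt' step fails.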
I think the optimal functioning of this process is cyclical with both high decoupling phases and highly integrated phases, and the measure of balance is something like 'this isn't obviously wrong in either context'.
I think future technology all has AI as a prerequisite?
My high conviction hot take goes further: I think all positive future timelines have AI as a prerequisite. I expect that, sans AI, our future - our immediate future: decades, not centuries - is going to be the ugliest, and last, chapter in our civilization's history.
I have been in the position of trying to moderate a large and growing community - it was at 500k users last I checked, although I threw in the towel around 300k - and I know what a thankless, Sisyphean task it is.
I know what it is to have to explain the same - perfectly reasonable - rule/norm again and again and again.
I know what it is to try to cultivate and nurture a garden while hordes of barbarians trample all over the place.
But...
If it ain't broke, don't fix it.
I would argue that the majority of the people listed as penalized are net contributors to LessWrong, including some who are strongly net positive.
I've noticed y'all have been tinkering in this space for a while. I think you're trying super hard to protect LessWrong from the Eternal September, and you actually seem to be succeeding, which is no small feat, buuut...
I do wonder if the team needs a break.
I think there's a thing that happens to gardeners (and here I'm using that as a very broad archetype), where we become attached to and identify with the work of weeding - of maintaining, of day after day holding back entropy - and cease to take pleasure in the garden itself.
As that sets in, even new growth begins to seem like a weed.
Fine. You win. Take your upvote.
Big fan of both of your writings, this dialogue was a real treat for me.
I've been trying to find a satisfying answer to the seeming inverse correlation between 'wellbeing' and 'agency' (these are very loose labels).
You briefly allude to a potential mechanism for this.[1]
You also briefly allude to another mechanism with explanatory power for the inverse[2] - i.e. that while it might seem an individual is highly agentic, they are in fact little more than a host for a highly agentic egregore.
I'm engaged in that most quixotic endeavour of actually trying to save the world,[3][4] and thus I'm constantly playing with my world model and looking for levers to pull, dominos to push over, that might plausibly - and quickly - shift probability mass towards pleasant timelines.
I think germ theory is exactly the kind of intervention that works here - it's a simple map that even a child can understand, yet it's a 100x impact.
I think there's some kind of 'germ theory for minds', and I think we already have all the pieces - we just need to put them together in the right way. I think it's plausible that this is easy, rapidly scalable and instrumentally valuable to other efforts in the 'save the world' space.
But... I don't want to end up net negative on agency. In fact my primary objective is to end up strongly net positive. I need more people trying to change the world, not fewer.
Yet... that scale of ambition seems largely the preserve of people you'd be highly unlikely to describe as 'enlightened', 'balanced' or 'well adjusted'; it seems to require a certain amount of delusion to even (want to) try, and to benefit from unbalanced schema that are willing to sacrifice everything on the altar of success.
Most of the people who seem to successfully change the world are the people I'd least want to; whereas the people I'd most want to change the world seem the least likely to.
[1] Since the schools that removed social conditioning and also empowered practitioners to upend the social order tended to get targeted for destruction. (Or at least so I suspect, and some people on Twitter said "yes this did happen" when I speculated this out loud.)
[2] In the Buddhist model of human psychology, we are by default colonized by parasitic thought patterns, though I guess in some cases, like the aforementioned fertility-increasing religious memes, they should be thought of as symbiotes with a tradeoff, such as degrading the hosts' episteme.
[3] I don't expect to succeed, I don't expect to even matter, but it's a fun hobby.
[4] Also the world does actually seem to be in rather urgent need of saving; short of a miracle or two it seems like I'm unlikely to live to enjoy my midlife crisis.
See Also: Catch-22