Ah, thanks for pointing out the typo.
I'll probably create a post soon-ish with more visualizations covering cases like the ones you suggested.
You're right that the model is most pertinent to cases where we've already solved the alignment problem pretty well but still want to try other safety measures. I'm particularly thinking about cases where the AIs are so advanced that humans can't really supervise them well, so the AIs have to supervise each other. In that case, I'm not sure how p would behave as a function of AI capability. Maybe it's best to assume that p increases with capability, just so we're aware of what the worst case could look like?
I love the commentary about meaning and morality here: how Quirrell pushes back on Harry's "obvious theorems" about morality by pointing out that, in the end, everyone simply does what they want to do, with (I think) the suggestion that a Dark Lord is simply someone who is really good at getting what he wants, along with a lot of ambiguity about whether this is actually a good way to act.
And of course the pervasive strangeness of the fact that this conversation is occurring between rationalist wizards in an HP fan fiction just makes it even better...
Thanks for pointing this out -- it's helpful for placing this in idea space. Basically, what I'm describing here is an information cascade in which the actors fail to realize that others' beliefs are not independent.
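To make that concrete, here's a minimal toy simulation of the failure mode (my own illustrative sketch, not the model from the post; the function name, agent count, and signal accuracy are all made up for the example). Each agent gets a noisy private signal about a binary state and then treats every earlier announcement as if it were an independent signal, so once a couple of early announcements happen to agree, everyone afterwards just echoes them and their private information gets swamped.

```python
import random

def run_naive_cascade(n_agents=20, signal_accuracy=0.6, true_state=1, seed=0):
    """Toy sketch of an information cascade: agents announce beliefs in
    sequence and (wrongly) count every earlier announcement as an
    independent piece of evidence."""
    rng = random.Random(seed)
    announcements = []
    for _ in range(n_agents):
        # Private signal: correct with probability signal_accuracy.
        private = true_state if rng.random() < signal_accuracy else 1 - true_state
        # Naive tally: own signal plus every prior announcement, each
        # treated as if it were independent evidence.
        votes_for_1 = private + sum(announcements)
        votes_for_0 = (1 - private) + (len(announcements) - sum(announcements))
        if votes_for_1 > votes_for_0:
            announcement = 1
        elif votes_for_0 > votes_for_1:
            announcement = 0
        else:
            announcement = private  # tie: fall back on own signal
        announcements.append(announcement)
    return announcements

if __name__ == "__main__":
    for seed in range(5):
        print(run_naive_cascade(seed=seed))
```

Once two early announcements agree, every later agent's tally is dominated by the echoes, so the whole sequence locks in on roughly two signals' worth of real information -- which is the failure of independence I had in mind.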