Curated. This feels like an obvious idea (at least in retrospect), and I haven’t seen anyone else discuss it. The fact that you ran experiments and got interesting results puts this above my bar for curation.
I also appreciated the replies comparing it to ELK and debate paradigms. I’d love to see more discussion in the comments about how it relates to ELK.
I’m not very optimistic about this scaling to smarter models in domains where solutions are harder to verify, but I’m not confident in that take, and I hope I’m wrong. Either way, it likely helps with current models in easier-to-verify domains, and it seems like the implementation is close to ready, which is pretty cool.
Yeah, this always bothered me. And worse, "expected value" isn't about "value" as in what matters terminally; it's about "value" as in quantity.
Let me know if you wanna go to a sports bar and interact with some common folk some time.
Curated. This does indeed seem to be a common kind of bad argument around these parts which has not yet been named. I also appreciate Rohin's comment pointing out that it's not obvious what makes this kind of reasoning bad, as well as David Manheim's comment saying that what is needed is a way to distinguish cases where bounded search works well from cases where it works poorly. More generally, I like seeing posts that evaluate a common kind of reasoning, especially ones that inspire interesting engagement and/or disagreement in the replies. I would be excited to see more case studies of when this sort of reasoning works well or poorly, and maybe even a general theory to help us decide when it tends to work out well, e.g. when implemented by superforecasters across many topics.
Curated. I have wanted someone to write out an assessment of how the Risks from Learned Optimization arguments hold up in light of the evidence we have acquired over the last half decade. I particularly appreciated the breakdown of the potential sources of risk, the assessment of the degree to which we have encountered each problem, and the reassessment of how likely we are to run into them. I would love to see more posts that take arguments/models/concepts from before 2020, consider what predictions we should have made pre-2020 if those arguments/models/concepts were good, and then reassess them in light of our observations of progress in ML over the last five years.
Curated. This is a simple, seemingly obvious argument with important implications that I have nonetheless never heard before. I have heard similar considerations come up in conversations about whether someone should take a particular job at a capabilities lab, or whether some particular safety technique is worth working on, but it's valuable to generalize across those cases and have a central place for discussing the generalized argument.
I would love to see more pushback in the comments from those who are currently working on legible safety problems.
Is this coming just from the models having geographic data in their training? Much less impressive if so, but still cool.
To check, do you have particular people in mind for this hypothesis? It seems kinda rude to name them here, but could you maybe send me some guesses privately? I currently don't find this hypothesis as stated very plausible; or, like, sure, maybe, but I think it accounts for a relatively small fraction of the effect.
What would a class aimed at someone like me (has read LessWrong for many years, is familiar to some extent with the basics of LLM architecture and learning) have to cover to get me up to speed on AI futurism by your lights? I am imagining the output here being something like a bulleted list of 12-30 broad thingies.