Arturs — LessWrong

I am a Technical AI Governance researcher with interests in animal ethics, multilingual AI capabilities and safety, compute governance, and the economics of transformative AI. My background includes over 10 years of experience spanning project management, quantitative risk analysis and model validation in finance, and research in economics. I am also the founder and chair of the board at Effective Altruism Latvia and a board member of the animal advocacy organization Dzīvnieku brīvība.

These frontier models could still be vulnerable to stealth (e.g. "sleeper agent") attacks, to specialist models, and to stealth attacks by specialist models. The balance depends on the ability gap: if the top model is way ahead of the others, then maybe defence dominates attack efforts. But a big ability gap does not seem to be playing out. Instead, there are several models at or near the frontier, and lots of (more or less) open-source stuff not far behind.

Seems like a stark contrast between Bayesianism and the way a frequentist might approach things, i.e. do not reject the null hypothesis of no significant probability until convinced by evidence, either formal arguments or real-life mishaps. Labeling something as having P(x)~0 probably helps to compartmentalize things and focus on other tasks at hand. But it can lead to huge risks being neglected, as in this case of AI Alignment.

Edit: "premortem" seems like a useful exercise to align mind & gut
