This post is an informal explainer of our paper, which can be found on arXiv. This work was funded by the Advanced Research + Invention Agency (ARIA) Safeguarded AI Programme through project code MSAI-SE01-P005. Introduction There is an intuition that a powerful agent might have to contain some kind of...
As the current Dovetail research fellowship comes to a close, the fellows are giving talks on their projects. All are welcome to join! Unlike the previous cohort talks, these talks will be scheduled one at a time. This is partly because there are too many to do all in one...
This post is about one of the results described in the 2004 paper 'Information-theoretic approach to the study of control systems' by Hugo Touchette and Seth Lloyd.[1] The paper compares 'open-loop' and 'closed-loop' controllers (which we here call 'blind' and 'sighted' policies) for the task of reducing entropy and quantifies...
I recently had a conversation with a friend of a friend who has a very curious child around 5 years of age. I offered to answer some of their questions, since I love helping people understand the world. They sent me eight questions, and I answered them by hand-written letter....
I recently read Red Heart, a spy novel taking place in the core of a Chinese AGI project. Disclaimer that the author is my friend, and that I’m ideologically incentivized to promote stuff about AI safety! That said, I think you should read it. If nothing else, it’s a fun...
Every so often, I have this conversation: > Them: So you know how the other day we talked about whether we should leave for our trip on that Sunday or Monday? > Me: …doesn’t sound familiar… > Them: And you said it depended on what work you had left to...
I knew I wanted to do science and math from a very early age. And I didn’t want to spend my life investigating just some particular phenomenon; I wanted to understand “everything”. You obviously can’t do that in a literal sense, so I focused on understanding things that were increasingly...