Preamble > Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice, during the experiment, Sleeping Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her...
Polished from my shortform. Epistemic Status Thinking out loud. Introduction I've argued that system-wide/total optimisation for an objective function in the real world is so computationally intractable as to be prohibited by the laws of physics of our universe[1]. Yet it's clearly the case that e.g., evolution is optimising...
Related: * Contra "Strong Coherence" * why assume AGIs will optimize for fixed goals * Why The Focus on Expected Utility Maximisers? Background and Core Concepts I operationalised "strong coherence" as: > Informally: a system has immutable terminal goals. > > Semi-formally: a system's decision making is well described as...
Something I do frequently is run into a block of LaTeX that I'm not sure I fully grasp. At such a point, what I'd like to do is just copy the equation and paste it to GPT-4 and have it explain it to me step by step, then iterate on...
Preamble I heavily recommend @beren's "Deconfusing Direct vs Amortised Optimisation". It's a very important conceptual clarification that has changed how I think about many issues bearing on technical AI safety. Currently, it's the most important blog post I've read this year. This sequence (if I get around to completing it)...
Disclaimer I wrote this three months ago, and abandoned it[1]. I currently do not plan to return to it anytime soon, but I have nonetheless found myself in want of a document I could point to for why I'm not particularly enthused by or overly concerned with "recursive self improvement". As...
@beren discusses the assumption that intelligent systems would be well factored into a world model, objectives/values and a planning system. He highlights that this factorisation doesn't describe intelligent agents created by ML systems (e.g. model-free RL) well. Model-free RL agents don't have cleanly factored architectures but tend to...