Escaping the Löbian Obstacle
Earlier this year, while looking for an inroad to AI safety, I learned about the Löbian Obstacle, a problem encountered by 'purely logical' agents when trying to reason about and trust one another. In the original paper by Yudkowsky and Herreshoff [1], they show that a consequence of Löb's theorem is that an agent X can only "trust" the reasoning of an agent Y whose reasoning system is strictly weaker than X's own, where "trust" here means 'formally prove that the conclusions of the other agent's reasoning will be true'. As stated, this looks like a major problem if X is a human trying to build an artificially intelligent system Y, but it is also a problem for any individual (embedded) agent trying to reason about their own future behaviour. I'm not the first person to find this result counterintuitive, and for good reason. In this article I'm going to explain why a formal (purely syntactic) logic system alone is a poor model of the reasoning of embedded agents, and show that by fixing this, we remove the foundation for the difficulties arising from Löb's theorem. For the uninitiated, there is a handy survey of applications of Löb's theorem in AI safety research by Patrick LaVictoire [6].

Pure syntax

First, I should explain the formal set-up for applying Löb's theorem to agents. We model an agent's reasoning with a formal language, or logic, which I'll call L. Here I make the further assumption that this logic fits (or can be squeezed into) a formal language of the kind logicians are familiar with: the logic consists of some formal symbols or variables A, B, C, ... along with some logical connectives, operators and quantifiers for combining variables into expressions, or formulas. The agent is also assumed to carry some inference rules for manipulating formulas. Altogether, this data constitutes the syntax of L: its symbolic content and the rules for manipulating those symbols. Since we don't care precisely what the symbols in L refer to, we need go no further than this syntactic data to describe the agent's reasoning.
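Since the whole discussion hinges on it, it may help to see Löb's theorem stated explicitly. In the modal notation of provability logic, where □P abbreviates 'P is provable in L', the theorem reads:

```latex
% Löb's theorem, in provability-logic form:
% if L proves that the provability of P implies P, then L proves P.
\Box(\Box P \rightarrow P) \rightarrow \Box P
```

In words: L can never prove "if P is provable, then P is true" for a statement P unless it can already prove P outright. This is exactly why an agent cannot formally endorse the soundness of a reasoner as strong as itself.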