Morgan_Rogers

Escaping the Löbian Obstacle

Earlier this year, when looking for an inroad to AI safety, I learned about the Löbian Obstacle, which is a problem encountered by 'purely logical' agents when trying to reason about and trust one another. In the original paper of Yudkowsky and Herreshoff [1], they show that a consequence of Löb's theorem is that an agent X can only "trust" the reasoning of an agent Y with a strictly weaker reasoning system than themselves, where "trust" here means 'formally prove that the conclusions of the other agent's reasoning will be true'. As stated, this looks like a major problem if X is a human trying to build an artificially intelligent system Y, but it's also a problem for any individual (embedded) agent trying to reason about their own future behaviour. I'm not the first person to find this problem counterintuitive, and for good reason. In this article I'm going to explain why a formal (purely syntactic) logic system alone is a poor model of the reasoning of embedded agents, and show that by fixing this, we remove the foundation for the difficulties arising from Löb's theorem. For the uninitiated, there is a handy survey of application of Löb's theorem in AI safety research by Patrick LaVictoire [6]. Pure syntax First, I should explain the formal set-up for applying Löb's theorem to agents. We model an agent's reasoning with a formal language, or logic, which I'll call L. Here I shall make the further assumption that this logic fits (or can be squeezed into) a formal language of the kind logicians are familiar with: the logic consists of some formal symbols or variables A,B,C... along with some logical connectives, operators and quantifiers for combining variables into expressions, or formulas. The agent is also assumed to carry some inference rules for manipulating formulas. Altogether, this data constitutes the syntax of L (its symbolic content and the rules for manipulating those symbols). Since we don't care precisely what the symbols in L refer to, we need go

21Jun 16, 2021

Morgan_Rogers

Message

112

Goal-directedness: relativising complexity

This is the fourth post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. The funding has come to an end, but I expect to finish off this...

Aug 18, 20223

Goal-directedness: tackling complexity

This is the third post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. My strategy for achieving a formalisation of goal-directed behaviour is to equate it with...

Jul 2, 20228

Examining Armstrong's category of generalized models

This post is my capstone project for the AI Safety Fundamentals programme. I would like to thank the organizers of the programme for putting together the resources and community which have broadened my horizons in the field. Thanks to my cohort and facilitator @sudhanshu_kasewa for the encouragement. Thanks also to...

May 10, 202214

Goal-directedness: imperfect reasoning, limited knowledge and inaccurate beliefs

This is the second post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. In my first post I started thinking about goal-directedness in terms of explanations, and...

Mar 19, 20224

Goal-directedness: exploring explanations

This is the first post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. In my preliminary post, I described my basic intuitions about goal-directedness, and focussed on...

Feb 14, 202213

Goal-directedness: my baseline beliefs

In a short time I will be starting a project (funded by LTFF) under the supervision of Adam Shimi with the goal of deconfusing goal-directedness. I have decided to record the progress on this project on a biweekly basis here on LW, to test whether this helps to keep me...

Jan 8, 202221

Escaping the Löbian Obstacle

Jun 16, 202121

LESSWRONG
LW

LESSWRONG
LW

Morgan_Rogers

Morgan_Rogers

Morgan_Rogers

Escaping the Löbian Obstacle

Goal-directedness: my baseline beliefs

Examining Armstrong's category of generalized models

Goal-directedness: exploring explanations

Morgan_Rogers

Goal-directedness: relativising complexity

Goal-directedness: tackling complexity

Examining Armstrong's category of generalized models

Goal-directedness: imperfect reasoning, limited knowledge and inaccurate beliefs

Goal-directedness: exploring explanations

Goal-directedness: my baseline beliefs

Escaping the Löbian Obstacle

Escaping the Löbian Obstacle

Goal-directedness: my baseline beliefs

Examining Armstrong's category of generalized models

Goal-directedness: exploring explanations

Goal-directedness: relativising complexity

Goal-directedness: tackling complexity

Examining Armstrong's category of generalized models

Goal-directedness: imperfect reasoning, limited knowledge and inaccurate beliefs

Goal-directedness: exploring explanations

Goal-directedness: my baseline beliefs

Escaping the Löbian Obstacle