Escaping the Löbian Obstacle
Earlier this year, while looking for an inroad to AI safety, I learned about the Löbian Obstacle, a problem encountered by 'purely logical' agents when trying to reason about and trust one another. In the original paper by Yudkowsky and Herreshoff [1], they show that a consequence of Löb's theorem is that an agent X can only "trust" the reasoning of an agent Y whose reasoning system is strictly weaker than X's own, where "trust" here means 'formally prove that the conclusions of the other agent's reasoning will be true'. As stated, this looks like a major problem if X is a human trying to build an artificially intelligent system Y, but it is also a problem for any individual (embedded) agent trying to reason about their own future behaviour. I'm not the first person to find this result counterintuitive, and for good reason. In this article I'm going to explain why a formal (purely syntactic) logic system alone is a poor model of the reasoning of embedded agents, and show that by fixing this, we remove the foundation for the difficulties arising from Löb's theorem. For the uninitiated, there is a handy survey of applications of Löb's theorem in AI safety research by Patrick LaVictoire [6].

Pure syntax

First, I should explain the formal set-up for applying Löb's theorem to agents. We model an agent's reasoning with a formal language, or logic, which I'll call L. Here I make the further assumption that this logic fits (or can be squeezed into) a formal language of the kind logicians are familiar with: the logic consists of some formal symbols or variables A, B, C, ... along with some logical connectives, operators and quantifiers for combining variables into expressions, or formulas. The agent is also assumed to carry some inference rules for manipulating formulas. Altogether, this data constitutes the syntax of L: its symbolic content and the rules for manipulating those symbols. Since we don't care precisely what the symbols in L refer to, we need go no further than this syntactic data to describe the agent's reasoning.
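Since the whole discussion hinges on it, it may help to see Löb's theorem stated explicitly. In the modal notation of provability logic, where □P abbreviates 'P is provable in L', the theorem reads:

```latex
% Löb's theorem, in provability-logic form:
% if L proves that the provability of P implies P, then L proves P.
\Box(\Box P \rightarrow P) \rightarrow \Box P
```

In words: L can never prove "if P is provable, then P is true" for a statement P unless it can already prove P outright. This is exactly why an agent cannot formally endorse the soundness of a reasoner as strong as itself.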