User Comment Replies

(My understanding of) What Everyone in Technical Alignment is Doing and Why

I don't think the onus should be on the reader to infer x-risk motivations. In academic ML, it's the author's job to explain why the reader should care about the paper. I don't see why this should be different in safety. If it's hard to do that in the paper itself, you can always e.g. write a blog post explaining safety relevance (as mentioned by aogara, people are already doing this, which is great!).

There are often many different ways in which a paper might be intended to be useful for x-risks (and ways in which it might not be). Often the motivation for...

2Morpheus3y

On the other hand, there are a lot of reasons to believe the authors are delusional about the promise of their research and its theory of impact. What I personally get most out of posts like this is a third-party perspective that I can compare with my own.

AMA: Paul Christiano, alignment researcher

JohnMalin4yΩ13350

I wonder how valuable you find some of the more math/theory-focused research directions in AI safety. I.e., how much less impactful do you find them, compared to your favorite directions? In particular,

Vanessa Kosoy's learning-theoretic agenda, e.g., the recent sequence on infra-Bayesianism, or her work on traps in RL. Michael Cohen's research, e.g. the paper on imitation learning seems to go into a similar direction.
The "causal incentives" agenda (link).
Work on agent foundations, such as on cartesian frames. You have commented on MIRI's research in the pa

...

paulfchristiano4yΩ8160

I'm generally bad at communicating about this kind of thing, and it seems like a kind of sensitive topic to share half-baked thoughts on. In this AMA all of my thoughts are half-baked, and in some cases here I'm commenting on work that I'm not that familiar with. All that said, I'm still going to answer, but please read with a grain of salt and don't take it too seriously.

Vanessa Kosoy's learning-theoretic agenda, e.g., the recent sequence on infra-Bayesianism, or her work on traps in RL. Michael Cohen's research, e.g. the paper on imitation learning seems t

...

All of JohnMalin's Comments + Replies