All of JamesH's Comments + Replies

We do want ARENA participants to have quite a strong interest in AI safety, which is why we ask people to show evidence of substantial engagement with AI safety agendas (which is what that question is designed to do). However, we're not looking for perfect answers to either question, and neither should take hours of research if you engage consistently with LessWrong/Alignment Forum.

That said, you don't have to finish the application within 60-90 minutes; this is just a rough estimate of how long it would take someone who's engaged with A...

I think there's a mistake in 17: \sin(x) is not a diffeomorphism between (-\pi,\pi) and (-1,1) (since, e.g., it is not bijective between these intervals). Either you mean \sin(x/2), or the interval bounds should be (-\pi/2, \pi/2).

Zach Furman
Good catch, thanks! Fixed now.
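The correction in this thread can be sanity-checked numerically. The snippet below is only a sketch: the helper name `is_strictly_increasing` and the sample count are assumptions made for illustration, and a sampled monotonicity check is evidence of injectivity on the interval, not a proof of it (a diffeomorphism additionally needs a smooth inverse, i.e. a derivative that never vanishes on the open interval).

```python
import math

def is_strictly_increasing(f, a, b, n=1000):
    """Sample f at n interior points of the open interval (a, b)
    and check that the sampled values are strictly increasing."""
    xs = [a + (b - a) * k / (n + 1) for k in range(1, n + 1)]
    ys = [f(x) for x in xs]
    return all(y0 < y1 for y0, y1 in zip(ys, ys[1:]))

# sin(x) on (-pi, pi): not monotone, so not injective on this interval.
sin_full = is_strictly_increasing(math.sin, -math.pi, math.pi)

# The two proposed fixes: sin(x/2) on (-pi, pi), or sin(x) on (-pi/2, pi/2).
sin_half_arg = is_strictly_increasing(lambda x: math.sin(x / 2), -math.pi, math.pi)
sin_half_interval = is_strictly_increasing(math.sin, -math.pi / 2, math.pi / 2)

# A concrete collision inside (-pi, pi): sin(pi/6) and sin(5*pi/6) agree.
collision = math.isclose(math.sin(math.pi / 6), math.sin(5 * math.pi / 6))

print(sin_full, sin_half_arg, sin_half_interval, collision)
```

Both proposed fixes pass the sampled check, while plain \sin(x) on (-\pi,\pi) fails it and also takes the same value at two distinct points of the interval.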

ARENA might end up teaching this person some mech-interp methods they haven't seen before, although it sounds like they would be more than capable of teaching themselves any mech-interp methods. The other potential value-add for your acquaintance would be if they wanted to improve their RL or evals skills and have a week to conduct a capstone project with advisors. If they were mostly aiming to improve their mech-interp ability by doing ARENA, there would probably be better ways to spend their time.

Concretely, we see this project going something like this:

First, we want to build a solid enough theoretical understanding of InfraBayesian Physicalism (IBP). This will ultimately result in something like a distillation of IBP that we will use as a reference, and that we hope others will get a lot of use from.

In this process, we will be doing most of our testing in a theoretical framework. That is to say, we will be constructing model agents and seeing how InfraBayesian Physicalism actually deals with these in theory, whether it breaks down at any stage (as judged by us), and if...