All of James Fox's Comments + Replies

Thank you for your comment.

We are confident that ARENA's in-person programme is among the most cost-effective technical AI safety training programmes: 
- ARENA is highly selective, and so all of our participants have the latent potential to contribute meaningfully to technical AI safety work
- The marginal cost per participant is relatively low compared to other AI safety programmes since we only cover travel and accommodation expenses for 4-5 weeks (we do not provide stipends)
- The outcomes set out in the above post seem pretty strong (4/33 immediate t... (read more)

JaimeRV
Thanks for sharing this! Great to see the impact of ARENA! According to the OpenPhil public grant[1], this iteration of ARENA received £245,895, and with this you were able to achieve the points mentioned in this post, right? It is also great to hear that there are 4 new people working in AIS thanks to the program! It would be nice to know how you managed it (and what the counterfactual was). Getting 4 people through full hiring processes within 4 weeks seems impressive; did you manage it because they got jobs at orgs that were also at LISA, or were there other networking effects or factors that made this possible?

[1] https://www.openphilanthropy.org/grants/alignment-research-engineer-accelerator-ai-safety-technical-program-2024/

Sorry for not seeing this. Hopefully, the first paragraph of the summary answers this question.  We're excited about running more ARENA iterations exactly because its track record has been pretty strong.

I know you've acknowledged Friston at the end, but I'm just commenting for other interested readers' benefit that this is very close to Karl Friston’s active inference framework, which posits that all agents minimise the discrepancies (or prediction errors) between their internal representations of the world and their incoming sensory information through both action and perception.

mattmacdermott
It's worth emphasising just how closely related it is. Friston's expected free energy of a policy is

$$G(\pi) = \mathbb{E}_{Q(s_\tau \mid \pi)}\, D_{\mathrm{KL}}\!\left[Q(s_\tau \mid \pi) \,\|\, Q(s_\tau \mid o_\tau)\right] \;-\; \mathbb{E}_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau),$$

where the first term is the expected information gained by following the policy and the second is the expected 'extrinsic value'. The extrinsic value term $-\mathbb{E}_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau)$, translated into John's notation and setup, is precisely $\mathbb{E}[-\log P(X \mid M_2) \mid M_1(\theta)]$. Where John has optimisers choosing $\theta$ to minimise the cross-entropy of $X$ under $M_2$ with respect to $X$ under $M_1$, Friston has agents choosing $\pi$ to minimise the cross-entropy of preferences ($P$) with respect to beliefs ($Q$).

What's more, Friston explicitly thinks of the extrinsic value term $-\mathbb{E}_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau)$ as a way of writing expected utility (see the image below from one of his talks). In particular, $P$ is a way of representing real-valued preferences as a probability distribution. He often constructs $P$ by writing down a utility function and then taking a softmax (like in this rat T-maze example), which is exactly what John's construction amounts to. It seems that John is completely right when he speculates that he's rediscovered an idea well-known to Karl Friston.
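To make the softmax construction concrete, here is a minimal sketch (the outcome names and utilities are made up, not taken from either post) showing that once real-valued preferences are encoded as a distribution $P(o) \propto \exp(\mathrm{utility}(o))$, the extrinsic-value term $\mathbb{E}_{Q}[\ln P(o)]$ is just expected utility shifted by a constant:

```python
import numpy as np

# Hypothetical illustration of encoding preferences as a distribution:
# turn a utility function over outcomes into P via softmax, then note that
# the "extrinsic value" term E_{Q(o|pi)}[ln P(o)] equals expected utility
# minus the (policy-independent) log partition function.

outcomes = ["left_arm", "right_arm", "stay"]       # e.g. a rat T-maze
utility = np.array([1.0, 4.0, 0.0])                # made-up utilities

# Preference distribution: P(o) proportional to exp(utility(o))
log_Z = np.log(np.sum(np.exp(utility)))            # log partition function
log_P = utility - log_Z                            # log-softmax

# Beliefs about outcomes under some policy pi (made-up numbers)
Q = np.array([0.2, 0.7, 0.1])

extrinsic_value = np.sum(Q * log_P)                # E_Q[ln P(o)]
expected_utility = np.sum(Q * utility)             # E_Q[utility(o)]

# The two differ only by the constant log_Z, so maximising one is the
# same as maximising the other.
assert np.isclose(extrinsic_value, expected_utility - log_Z)
print(extrinsic_value, expected_utility - log_Z)
```

So ranking policies by extrinsic value and ranking them by expected utility pick out the same policies, which is the sense in which writing preferences as a softmax-ed distribution amounts to an expected-utility objective.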

Hi Vanessa, Thanks for your question! Sorry for taking a while to reply. The answer is yes if we allow for mixed policies (i.e., where an agent can correlate all of their decision rules for different decisions via a shared random bit), but no if we restrict agents to behavioural policies (i.e., the decision rules for each of an agent's decisions are independent because they can't access a shared random bit). This is analogous to the difference between mixed and behavioural strategies in extensive-form games, where (in general) a subgame pe... (read more)
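To illustrate the distinction concretely, here is a small sketch (a made-up two-decision example, not from the post): under a mixed policy the agent's two decision rules can be correlated through a shared random bit, whereas under a behavioural policy each decision is randomised independently, so the joint action distribution always factorises and some correlated joint distributions are unreachable:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_policy_sample():
    # One shared random bit determines both decisions, so the joint
    # distribution puts probability 0.5 on (0, 0) and 0.5 on (1, 1).
    shared_bit = int(rng.integers(2))
    return shared_bit, shared_bit

def behavioural_policy_sample(p1=0.5, p2=0.5):
    # Each decision uses its own independent randomisation, so the joint
    # distribution factorises: P(a1, a2) = P(a1) * P(a2).
    a1 = int(rng.random() < p1)
    a2 = int(rng.random() < p2)
    return a1, a2

n = 100_000
mixed = [mixed_policy_sample() for _ in range(n)]
behavioural = [behavioural_policy_sample() for _ in range(n)]

def freq(samples, pair):
    return sum(s == pair for s in samples) / len(samples)

# Mixed policy: roughly 0.5 mass on (1, 1) and none on (0, 1) or (1, 0).
# Behavioural policy with the same marginals: only 0.25 on (1, 1).
print("mixed       P(1,1) =", freq(mixed, (1, 1)))
print("behavioural P(1,1) =", freq(behavioural, (1, 1)))
```

The mixed policy puts roughly half its mass on the action pair (1, 1) while putting none on (0, 1) or (1, 0); no behavioural policy can reproduce that joint distribution, since independence forces $P(a_1, a_2) = P(a_1)P(a_2)$.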