The post has now been edited with the updated plots for open consultancy/debate.
Thanks for the comment Fabien. A couple of points:
Thanks for the comment Michael. Firstly, just wanted to clarify the framing of this literature review - when considering strengths and weaknesses of each threat model, this was done in light of what we were aiming to do: generate and prioritise alignment research projects -- rather than as an all-things-considered direct critique of each work (I think that is best done by commenting directly on those articles etc). I'll add a clarification of that at the top. Now to your comments:
To your 1st point: I think the lack of specific assumptions about the AGI dev...
I haven't considered this in great detail, but if there are variables, then I think the causal discovery runtime is . As we mention in the paper (footnote 5) there may be more efficient causal discovery algorithms that make use of certain assumptions about the system.
On adoption, perhaps if one encounters a situation where the computational cost is too high, one could coarse-grain their variables to reduce the number of variables. I don't have results on this at the moment but I expect that the presence of agency (no...
Thanks, this has now been corrected to say 'not terminal'.
Thanks for featuring our work! I'd like to clarify a few points, which I think each share some top-level similarities: our study is study of protocols as inference-only (which is cheap and quick to study, possibly indicative) whereas what we care more about it protocols for training (which is much more expensive, and will take longer to study) which was out of scope for this work, though we intend to look at that next based on our findings -- e.g. we have learnt that some domains are easier to work with than others, some baseline protocols are more meaning... (read more)