User Comment Replies

"Safety as a Scientific Pursuit" (2024)

Very much appreciate the link post. I'd been trying to write a summary/contextualisation for LW, and this is much better than anything I'd come up with.

I'd be very grateful for the LW community's thoughts (especially any pushback). I expect this is where the strongest counterarguments will come from.

"Safety as a Scientific Pursuit" (2024)

Tom McGrath (1y)

Thanks! I really like the inductive vs. deductive framing and would probably have used it if I'd thought of it.

"Acquisition of Chess Knowledge in AlphaZero": probing AZ over time

Tom McGrath (3y)

I'm one of the authors of this paper. Happy to answer questions or discuss if anyone is interested.

Dirichlet-to-Neumann (3y)

Although the technical details are way too difficult for me, as a chess player I found the article really interesting. When it was first released, AlphaZero seemed to play in a more human-like way than traditional engines such as Stockfish. Does your analysis support this conclusion?

"Acquisition of Chess Knowledge in AlphaZero": probing AZ over time

Tom McGrath (3y)

Thanks for the summary! Your first bullet point was my motivation for doing this. I think it's important to test out interpretability ideas in more challenging domains.

We didn't really do much interpretability in this paper; it's more meta-interpretability in a sense (i.e. studying whether interpretability should in principle be possible). I'd say section 4 is worth a look, especially section 4.5, which covers fundamental and practical challenges to probing. Section 7 has some non-negative matrix factorization (NMF) analysis, and we open-sourced the NMF factors, which you might find interesting.

Zac Hatfield-Dodds (3y)

I enjoyed the whole paper! It's just that "read sections 1 through 8" doesn't reduce the length much, and sections 5 and 6 have some nice short results that can be read on their own :-)

All of Tom McGrath's Comments + Replies