All of Ed Li's Comments + Replies

Ed Li

Some questions about the feasibility section:

About (2): what would 'crisp internal representations' look like? I think this is really useful to know, since we haven't figured this out for the brain or for LLMs (e.g. Interpreting Neural Networks through the Polytope Lens).

Moreover, the current methods used to compare the human brain's representations to those of ML models, like RSA, are quite high-level, or at the very least do not reveal any useful insights for interpretability on either side (please feel free to correct me). This question is not a point against me...
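For concreteness, here is a minimal sketch of what an RSA-style comparison typically looks like; the arrays below are hypothetical random data standing in for real brain recordings and model activations. The comparison only looks at the geometry of pairwise dissimilarities, never at individual features, which is part of why it feels high-level:

```python
# Minimal RSA sketch (assumptions: NumPy/SciPy available; random arrays
# stand in for real brain recordings and model activations).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(activations: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix (condensed upper triangle):
    1 - Pearson correlation between every pair of stimulus patterns.
    `activations` has shape (n_stimuli, n_features)."""
    return pdist(activations, metric="correlation")

def rsa_score(brain_acts: np.ndarray, model_acts: np.ndarray) -> float:
    """Spearman correlation between the two RDMs."""
    rho, _ = spearmanr(rdm(brain_acts), rdm(model_acts))
    return rho

# Toy usage: 50 stimuli, hypothetical voxel and hidden-unit dimensions.
rng = np.random.default_rng(0)
brain = rng.normal(size=(50, 200))   # 50 stimuli x 200 voxels
model = rng.normal(size=(50, 768))   # 50 stimuli x 768 hidden units
print(rsa_score(brain, model))
```

Note that the single scalar it produces says nothing about which features drive the (dis)similarity, which is the sense in which RSA is uninformative for mechanistic interpretability on either side.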

Arthur Conmy
By "crisp internal representations" I mean that  1. The latent variables the model uses to perform tasks are latent variables that humans use. Contra ideas that language models reasons in completely alien ways. I agree that the two cited works are not particularly strong evidence : (  2. The model uses these latent variables with a bias towards shorter algorithms (e.g shallower paths in transformers). This is important as it's possible that even when performing really narrow tasks, models could use a very large number of (individually understandable) latent variables in long algorithms such that no human could feasibly understand what's going on. I'm not sure what the end product of automated interpretability is, I think it would be pure speculation to make claims here.
Ed Li

Thank you so much for posting this. It feels weird to tick every single symptom mentioned here...


The burnout that 'Dmitry' experiences is a remarkably accurate description of what I am experiencing. Are there any further guides on how to manage this? It would help me so much; any help is appreciated :)