User Comment Replies

I don't think you could do this with API-level access, but with direct model access an interesting experiment would be to pick a token, X, and then try variants of the prompt "Please repeat 'X' to me" while perturbing the embedding for X (in the continuous embedding space). By picking random 2D slices of the embedding space, you could then produce church window plots showing what the model's understanding of the space around X looks like. Is there a small island around the true embedding which the model can repeat surrounded by confusion, or is...
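A rough sketch of the slicing step described above might look like the following. Everything here is an assumption made for illustration: `random_2d_slice`, `church_window`, `toy_embeddings`, and `toy_classify` are hypothetical names, and the classifier is only a nearest-neighbour lookup in a random toy embedding table. In the real experiment, `classify` would instead run the model on the "Please repeat 'X' to me" prompt with the perturbed vector substituted for X's embedding and return whichever token the model repeats.

```python
import numpy as np

def random_2d_slice(dim, rng):
    """Return two orthonormal directions spanning a random 2D plane."""
    # QR of a random (dim, 2) Gaussian matrix gives an orthonormal basis.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, 2)))
    return basis[:, 0], basis[:, 1]

def church_window(center, u, v, radius, steps, classify):
    """Label every point of a square grid in the plane through `center`."""
    offsets = np.linspace(-radius, radius, steps)
    labels = np.empty((steps, steps), dtype=int)
    for i, a in enumerate(offsets):
        for j, b in enumerate(offsets):
            # Perturb the embedding within the 2D slice and record the label.
            labels[i, j] = classify(center + a * u + b * v)
    return labels

# Toy stand-in for a model (assumption): 50 random 16-dimensional embeddings.
rng = np.random.default_rng(0)
toy_embeddings = rng.standard_normal((50, 16))
token_x = 7  # hypothetical index of the token X

def toy_classify(vec):
    """Stand-in for 'which token does the model repeat?': nearest embedding."""
    return int(np.argmin(np.linalg.norm(toy_embeddings - vec, axis=1)))

u, v = random_2d_slice(toy_embeddings.shape[1], rng)
grid = church_window(toy_embeddings[token_x], u, v,
                     radius=3.0, steps=41, classify=toy_classify)
```

The resulting `grid` is an integer array that could be rendered as a church window plot, for example with matplotlib's `imshow`, using one colour per label. Repeating this over many random slices would give some sense of whether the region around X's true embedding is a small island or something larger.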

All of Nick's Comments + Replies