After the best-explaining scale is applied to the feature direction vector, is the magnitude of the resulting vector similar to the magnitudes of the other token activation vectors in the prompt? If so, perhaps that fact can be used to approximate the best scale without manual tuning. For instance, the magnitudes of all the token activation vectors could be averaged, and the scale set to the ratio of this mean magnitude to the original feature direction vector's magnitude.
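
To make the suggestion concrete, here is a minimal sketch of that heuristic in plain Python. The function names (`l2_norm`, `estimate_scale`) and the toy vectors are hypothetical, chosen only for illustration. It assumes "magnitude" means the L2 norm and that the activations are available as plain lists of floats; whether the resulting scale actually matches the best-explaining one is the open question, not something this code shows.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v))


def estimate_scale(token_activations: Sequence[Vector], feature_direction: Vector) -> float:
    """Ratio of the mean token-activation magnitude to the feature direction's magnitude.

    Hypothetical helper: a heuristic guess at the scale, not a fitted value.
    """
    if not token_activations:
        raise ValueError("need at least one token activation vector")
    direction_norm = l2_norm(feature_direction)
    if direction_norm == 0.0:
        raise ValueError("feature direction must be non-zero")
    mean_norm = sum(l2_norm(a) for a in token_activations) / len(token_activations)
    return mean_norm / direction_norm


# Toy data (made up): two token activations with magnitudes 5 and 10.
token_activations = [[3.0, 4.0], [6.0, 8.0]]
feature_direction = [0.0, 2.0]  # magnitude 2

scale = estimate_scale(token_activations, feature_direction)
scaled_direction = [scale * x for x in feature_direction]
```

By construction, the scaled direction ends up with the same magnitude as the mean token activation, so comparing `scale` against the manually tuned value would be one way to check the idea.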
