All of Erik Garrison's Comments + Replies
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Erik Garrison
7mo
Could this affect distributed training methods that assume rotational invariance?