Erik Garrison
Comments
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Erik Garrison
2mo
Could this affect distributed training methods that assume rotational invariance?