Govind Pimpale
Posts
35 · Forecasting Frontier Language Model Agent Capabilities · 1mo · 0
54 · Do models know when they are being evaluated? · 1mo · 3
158 · Current safety training techniques do not fully transfer to the agent setting · 5mo · 9
46 · ~80 Interesting Questions about Foundation Model Agent Safety · 5mo · 4
69 · Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities · Ω · 8mo · 0