LESSWRONG

Govind Pimpale
307200

Posts

Forecasting Frontier Language Model Agent Capabilities (35 karma, 3mo, 0 comments)
Do models know when they are being evaluated? (59 karma, 3mo, 3 comments)
Current safety training techniques do not fully transfer to the agent setting (158 karma, 7mo, 9 comments)
~80 Interesting Questions about Foundation Model Agent Safety (46 karma, 7mo, 4 comments)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities (69 karma, Ω, 10mo, 0 comments)

Wikitag Contributions

No wikitag contributions to display.

Comments

No Comments Found