

fidgetsinner — LessWrong


312300

Posts

35 karma · Forecasting Frontier Language Model Agent Capabilities · 1y · 0 comments

57 karma · Do models know when they are being evaluated? · 1y · 9 comments

162 karma · Current safety training techniques do not fully transfer to the agent setting · 1y · 9 comments

48 karma · ~80 Interesting Questions about Foundation Model Agent Safety · 1y · 4 comments

69 karma · Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities · 2y · 0 comments

Wikitag Contributions

No wikitag contributions to display.

Comments

No comments found.