All of Steven Weiss's Comments + Replies

Solid post, but I think mentioning GPT in particular distracts from the main point. GPT is a generative model with no reward function, meaning it has no goals it's optimizing for. It isn't engineered for reliability (or any other goal), so it's not meaningful to compare its performance against humans on goal-oriented tasks.

If you assume a 1:20,000 annual suicide rate and that 40% of people can kill themselves within a minute (roughly, the US gun ownership rate), then with one decision per waking minute (16 hours/day), the odds of doing it on any given decision are ~1 : 20,000 * 60 * 16 * 365 * 0.4 ≈ 1 : 3,000,000,000, so the rate of not doing it per decision is ~99.99999997%.
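A quick sketch of that arithmetic (the 1-in-20,000 annual rate, the 16 waking hours, and the 40% able-to-act figure are this comment's assumptions, not established statistics):

```python
# Sketch of the per-decision arithmetic above. All inputs are the
# comment's assumptions, not established statistics.
annual_suicide_odds = 20_000        # assumed: 1 in 20,000 per person-year
decisions_per_year = 60 * 16 * 365  # one "decision" per waking minute, 16 h/day
can_act_fraction = 0.4              # assumed: share who could act within a minute

# Concentrate all suicides in the fraction who can act immediately,
# then spread that annual risk across every minute-level decision.
odds_against = annual_suicide_odds * decisions_per_year * can_act_fraction
not_doing_it = 1 - 1 / odds_against

print(f"odds of acting per decision: 1 in {odds_against:,.0f}")  # 1 in 2,803,200,000
print(f"rate of not doing it: {not_doing_it:.8%}")               # 99.99999996%
```

(The ~99.99999997% above comes from rounding the odds to 1 in 3 billion; the unrounded figure is ~99.99999996%.)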

IIUC, people aren't deciding whether to kill themselves once a minute, every minute. The thought only comes up when things are really rough, and thinking about it can take hours or days. That's probably a nitpick.

More importantly, an agent optimizing for not intentionally shooting itself in the face would pr...