Optimality is the tiger, and agents are its teeth
You've done it. You've built the machine. You've read the AI safety arguments, and you aren't stupid, so you've made sure to mitigate all the reasons people worry your system could be dangerous, and it wasn't even that hard. AI safety seems a tractable concern.

You've built a useful and intelligent system that operates along limited lines, with specifically placed deficiencies in its mental faculties that cleanly prevent it from doing unboundedly harmful things. You think. After all, your system is just a GPT, a pre-trained predictive text model. The model is intuitively smart, probably a good standard deviation or two better at intuition than any human who has ever lived, and it's fairly cheap to run, but it is just a cleverly tweaked GPT, not an agent with any reason to go out into the real world and do bad things in it.

* It doesn't have any wants. A tuned GPT system will answer your questions to the best of its ability because that's what it's trained to do, but only to the best of its current ability; it has no side-goal of becoming better at answering in the future. Nowhere is the model motivated to gather more resources to become a better thinker. There was never an opportunity during training to meta-learn that skill, because that was never the optimal thing for the model to be while it was trained.
* It doesn't plan. A GPT has no memory. Its mental time span is precisely one forward pass through the network, which at a depth of a few thousand means it can never come up with anything that requires more than the equivalent of maybe ten coherent seconds of human thought at once. There is a fearful worry that perhaps the model could start forming plans that span multiple instantiations, using one output to feed into the next input (the chaining pattern sketched below), but it's a text-prediction model, and that's directly at odds with its trained goal. The system was trained primarily by asking it to maximize actual probabil
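To make the worry about plans spanning multiple instantiations concrete, here is a minimal sketch of that chaining loop, assuming a hypothetical `generate` stub in place of any real model API. Every name in it is illustrative; the only point is that each call is memoryless while the loop threads state through the text it accumulates.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for one model instantiation (one completion)."""
    return f"[continuation conditioned on {len(prompt)} characters of context]"


def chained_instantiations(task: str, steps: int = 3) -> str:
    """Feed each instantiation's output back in as part of the next input.

    No single call remembers anything or plans past its own forward pass;
    any longer-horizon structure lives in the growing text itself.
    """
    context = task
    for _ in range(steps):
        completion = generate(context)
        context += "\n" + completion  # output becomes part of the next prompt
    return context


if __name__ == "__main__":
    print(chained_instantiations("Write a plan, one step at a time."))
```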
I generally don't care much about people's confidence levels. I don't Aumann-agree that hard. But I do care how much effort someone has put in, how settled an idea is, and whether it has been helpful or predictive. "Epistemic status: personal experience" is directly useful to me. I'll judge probability on the merits however confident someone is (maybe not if I knew their calibration curves, but I don't), but if I know what effort they did and didn't put in, I'll happily update directly on that. I don't think it's factually true that an epistemic status 'almost never' conveys something other than a confidence level.
Epistemic status: did a few minutes of informal searching to sanity-check my claims, which were otherwise off the cuff.