Disempowerment patterns in real-world AI usage
We’re publishing a new paper that presents the first large-scale analysis of potentially disempowering patterns in real-world conversations with AI.

> AI assistants are now embedded in our daily lives—used most often for instrumental tasks like writing code, but increasingly in personal domains: navigating relationships, processing emotions, or advising on...
I went down a rabbit hole on inference-from-goal-models a few years ago (albeit not coalitional ones) -- some slightly scattered thoughts below, which I'm happy to elaborate on if useful.
- Decision transformers are a great toy model: basically, you can make a decent "agent" by taking a predictive model over a world that contains agents (like Atari rollouts), conditioning on some 'goal' output (like the player eventually winning), and sampling what actions you'd predict to see from a given agent (see the sketch after this list). Some things which pop out of this:
- There's no utility function or even reward function
- You can't even necessarily query the probability that the goal will be reached
- There's no updating or learning -- whatever "beliefs" the agent has are baked into the predictive model and fixed at training time
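To make the conditioning trick concrete, here's a minimal sketch in Python. It's a hypothetical toy of my own (the names `rollout` and `sample_action` are made up, and it's a tabular stand-in rather than an actual transformer): fit a predictive model over rollouts of a world that contains agents, then "act" by conditioning on the outcome token "win" and sampling the actions the model expects to see in such trajectories. Nothing in it is a utility function, a reward function, or a belief update, which is the point of the bullets above.

```python
# Toy illustration (not the real decision transformer architecture):
# a counting model over (outcome, action) pairs stands in for a sequence
# model P(action_t | goal, history). Acting = sampling from it conditioned
# on the goal token "win".
import random
from collections import defaultdict

def rollout(policy_bias=0.5, length=5):
    """One trajectory from a toy world: choosing 'right' more often
    makes winning more likely."""
    actions = [("right" if random.random() < policy_bias else "left")
               for _ in range(length)]
    p_win = actions.count("right") / length
    outcome = "win" if random.random() < p_win else "lose"
    return outcome, actions

# "Training": count action frequencies conditioned on the eventual outcome,
# over rollouts from agents of varying competence. No reward is ever computed
# by the model itself -- it only sees trajectories.
counts = defaultdict(lambda: defaultdict(int))
for _ in range(10_000):
    outcome, actions = rollout(policy_bias=random.random())
    for a in actions:
        counts[outcome][a] += 1

def sample_action(conditioned_outcome):
    """Act by sampling from the predictive model conditioned on the goal
    token -- no planning and no belief update happens at 'run time'."""
    dist = counts[conditioned_outcome]
    return random.choices(list(dist), weights=list(dist.values()))[0]

# Conditioning on "win" shifts sampled behaviour toward 'right', purely
# because winning trajectories contained more 'right' actions.
print([sample_action("win") for _ in range(10)])
print([sample_action("lose") for _ in range(10)])
```

The design choice worth noticing is that "goal-directedness" here lives entirely in the conditioning variable: swap the goal token and you get a different apparent agent out of the same frozen model, without ever writing down what it "wants".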