Many people expect 2025 to be the Year of Agentic AI. I expect that too - at least in the sense of it being a big topic. But I expect people to eventually be disappointed, not because AI fails to make progress, but for more subtle reasons: these agents will not be well aligned (because, remember, alignment is an unsolved problem), and people will not trust them enough.
I'm not sure how the dynamic will unfold. There is trust in practical abilities: right now it is low, but it will only go up. There is trust in the agent itself: will it do what I want, or will it have or develop goals of its own? Can users trust that their agents are not influenced in all kinds of ways by all kinds of parties - the labs, the platforms, the websites the agents interact with, attackers, or governments? And it is not clear which aspect of trust will dominate at which stage. Maybe the biggest problems will manifest only after 2025, but we should see some of this already this year.
Why is the risk of human extinction hard to understand? The risk from a critical reactor or from atmospheric ignition was readily seen by the researchers involved. Why not for AI? Maybe the reason is inscrutable stacks of matrices instead of the comparably simple physical equations that described those phenomena. Mechinterp does help because it provides a relation between the weights and understandable phenomena. But I wonder if we can reduce things to a minimal reasoning model that doesn't know much about the world or even language but learns only to reason with minimal primitives - e.g., lambda calculus as its language - and learns to learn in a minimal world (one still complex enough that reasoning pays off). Many game worlds have too much visual detail and not enough agency and negotiation. I wouldn't be surprised if you could train a model with fewer than a million parameters to reason and scheme on much smaller data sets focused on this domain. The risk would arguably be small because such a model can't deal with the real world, but it might demonstrate many instrumental convergence results.
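A minimal sketch of what such an environment could look like - my own illustration, not something from the post: it assumes terms of the untyped lambda calculus as the "language" and uses normal-order reduction traces to normal form as the kind of reasoning data a sub-million-parameter model could be trained on.

```python
# Hypothetical sketch: a tiny lambda-calculus world for generating
# reasoning traces (term -> reduction steps -> normal form).
# Terms are tuples: ('var', name), ('lam', name, body), ('app', f, x).

def substitute(term, name, value):
    """Capture-naive substitution; fine for closed, freshly-named terms."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        if term[1] == name:          # bound variable shadows `name`
            return term
        return ('lam', term[1], substitute(term[2], name, value))
    return ('app', substitute(term[1], name, value), substitute(term[2], name, value))

def step(term):
    """One normal-order reduction step; returns None at normal form."""
    kind = term[0]
    if kind == 'app':
        f, x = term[1], term[2]
        if f[0] == 'lam':            # beta-reduce the outermost redex first
            return substitute(f[2], f[1], x)
        fs = step(f)
        if fs is not None:
            return ('app', fs, x)
        xs = step(x)
        if xs is not None:
            return ('app', f, xs)
    elif kind == 'lam':
        bs = step(term[2])
        if bs is not None:
            return ('lam', term[1], bs)
    return None

def trace(term, max_steps=100):
    """Full reduction trace - the supervision signal a small reasoner could learn from."""
    steps = [term]
    for _ in range(max_steps):
        nxt = step(steps[-1])
        if nxt is None:
            break
        steps.append(nxt)
    return steps

if __name__ == '__main__':
    # (\x. \y. x) a b  reduces to  a
    k = ('lam', 'x', ('lam', 'y', ('var', 'x')))
    term = ('app', ('app', k, ('var', 'a')), ('var', 'b'))
    for t in trace(term):
        print(t)
```

Normal-order reduction is chosen here because it reaches a normal form whenever one exists, so every generated trace terminates or hits the step cap; agency and negotiation would have to come from extra structure layered on top of this core.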
> many AI economics papers have a big role for principal-agent problems as their version of AI alignment.

I wasn't aware of such papers, but found some:
AIs can be copied on demand. So can entire teams and systems. There would be no talent or training bottlenecks. Customization of one becomes customization of all. A virtual version of you can be everywhere and do everything all at once.
This is "efficient copying of acquired knowledge" - what I predicted to be the fifth meta innovation, Robin Hanson asked about.
About archipelago: c2.com, the original wiki, tried this. They created a federated wiki, but that didn't seem to work. My guess: the volume was too low.
And LW already has all the filtering you need: just subscribe to the people and topics you are interested in. There is also the unfinished reading list.
I get that this may not feel like its own community. Within LW, this could be done with ongoing open threads about a topic. But that requires an organizer and participation. And we are back at volume. And at needing good writers.
Proof: https://en.wikipedia.org/wiki/Th%C3%ADch_Qu%E1%BA%A3ng_%C4%90%E1%BB%A9c