This cuts off some nuance; I would call it the projection of the collective intelligence agenda onto the AI safety frame of "eliminate the risk of very bad things happening", which I think is an incomplete way of looking at how to impact the future.
In particular, I tend to spend more time thinking about future worlds that are more like the current one: messy and confusing, with very terrible and very good things happening simultaneously. Much of the impact of collective intelligence tech (for good or ill) will come from how it determines the parameters of that world.
Thanks, this is a really helpful broad survey of the field. Would be useful to see a one-screen-size summary, perhaps a table with the orthodox alignment problems as one axis?
I'll add that the collective intelligence work I'm doing is not really "technical AI safety", but it is directly targeted at orthodox problems 11 ("Someone else will deploy unsafe superintelligence first") and 13 ("Fair, sane pivotal processes"), and it targets all alignment-difficulty worlds, not just the optimistic one (in particular, I think human coordination becomes more, not less, important in the pessimistic world). I write more about how I think about pivotal processes in general in AI Safety Endgame Stories, but it's broadly along the lines of von Neumann's:
For progress there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment.
Feels connected to his distrust of "quick, bright, standardized, mental processes" and his obsession with language. It's like his mind is relentlessly orienting to the territory, refusing to accept anyone else's map, which makes it harder to be a student but easier to discover something new. Reminds me of Geoff Hinton's advice not to read the literature before engaging with the problem yourself.
I like this a lot! A few scattered thoughts:
I know this isn't the central point of your life reviews section, but I'm curious whether your model has any lower bound on life review timing: if not minutes to hours, at least seconds? Milliseconds? (1 ms being a rough lower bound on the time for a signal to travel between two adjacent neurons.)
If it's at least milliseconds, it opens the strange metaphysical possibility of certain deaths (e.g. from very intense explosions) being exempt from life reviews.
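As a rough, illustrative back-of-the-envelope (my numbers, not from the post): a high-explosive detonation front travels at several km/s, so the time for it to traverse a roughly 15 cm brain is on the order of tens of microseconds, far below the ~1 ms single-synapse bound above:

\[
t \approx \frac{d_{\text{brain}}}{v_{\text{detonation}}} \approx \frac{0.15\ \text{m}}{8000\ \text{m/s}} \approx 2\times 10^{-5}\ \text{s} = 20\ \mu\text{s} \ll 1\ \text{ms},
\]

i.e. in that scenario the brain would be disrupted before even a single inter-neuron signal could complete.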
Really appreciated this exchange; Ben & Alex have rare conversational chemistry and an ability to sense-make productively at the edge of their world models.
I mostly agree with Alex on the importance of interfacing with extant institutional religion, though I'm less sure that one should side with pluralists over exclusivists. For example, exclusivist religious groups seem to be the only human groups currently able to reproduce themselves, probably because exclusivism confers protection against harmful memes and cultural practices.
I'm also pursuing the vision of a decentralized singleton as an alternative to Moloch or turnkey totalitarianism, although it's not obvious to me how the psychological insights of religious contemplatives are crucial here, rather than skilled deployment of social technology like the common law, nation states, mechanism design, cryptography, recommender systems, LLM-powered coordination tools, etc. Is there evidence that "enlightened" people, for some sense of "enlightened", are in fact better at cooperating with each other at scale?
If we do achieve existential security through building a stable decentralized singleton, it seems much more likely that it would be the result of powerful new social tech, rather than the result of intervention on individual psychology. I suppose it could be the result of both with one enabling the other, like the printing press enabling the Reformation.
I definitely agree there's some power-seeking equivocation going on, but I wanted to offer a less sinister explanation from my experiences in AI research contexts. A lot of the equivocation and blurring of boundaries seems to come from people trying to work on concrete problems and obtain empirical information, via a thought process like:
Not too different from how research psychologists will start out trying to understand the Nature of Mind and then run an n=20 study on undergrads because that's what they have the budget for. We can argue about how bad this equivocation is for academic research, but it's a pretty universal pattern and well understood within academic communities.
The unusual thing in AI is that researchers hold most of the decision-making power in key organizations, so these research norms leak out into the business world, and no one bats an eye at a "long-term safety research" team that mostly works on toy, short-term problems.
This is one reason I'm more excited about building up "AI security" as a field and hiring infosec people instead of ML PhDs. My sense is that the infosec community actually has good norms for thinking about and working on things-shaped-like-existential-risks, and the AI x-risk community should inherit those norms, not the norms of academic AI research.
Great, thought-provoking post. The AI research community certainly felt much more cooperative before it got an injection of startup/monopoly/winner-take-all thinking. Google Brain publishing the Transformer paper being a great example.
I wonder how much this truly is narrative, as opposed to AI being genuinely more winner-take-all than fusion in the economic sense. Certainly the hardware layer has proven quite winner-take-all so far, with NVDA taking a huge fraction of the profit; the same goes for adtech, the most profitable application of (last-generation) AI, where network effects and first-mover advantages have led to the dominance of a couple of companies.
Global foundation model development efforts being pooled into an international consortium like ITER or CERN seems quite good to me in terms of defusing race dynamics. Perhaps we will get there in a few years if private capital loses interest in funding 100B+ training runs.