I think your model of co-option dynamics here is non-obvious to me, and I currently think it's (probably) false for AI safety though I might just not be understanding it.
In particular, in adversarial situations I expect group coherence/coordination to make co-option harder rather than easier.
Like in general when I think through the logic of collective action and relax some of the game theory assumptions, or think through history when there are powerful vs less powerful actors in (often violent) conflicts, people being less coordinated usually makes them more susceptible to co-option rather than less. Think of the conquistadors playing up intra-Americas conflict in the Americas, the Achaemenid Empire sponsoring intra-Greek conflicts, broadly divide-and-conquer tactics in history, etc.
As I mentioned in a bunch of comments, I think "loose social movement that centrally coordinates via vibes and whoever is currently the highest-prestige thought-leader" is a format that is both low on coherence/coordination and high on ability to be co-opted.
Indeed I would feel a lot better about AI Safety/EA/Rationality on a lot of these dimensions if there was more formal membership, more things like courts, etc.
The worry here is that we have chosen a fundamental form of social organization that scores low on defensibility and high on resource acquisition, and moving away from that is now very difficult. Many alternatives (both in the "less coordinated" and the "more coordinated" directions) seem to me to be more defensible here.
Or to phrase it in the terms of the post:
I wish we either conquered less, or had more of a plan for how to defend what we conquered. Right now we are doing a lot of conquering, but without any plan for how to defend it, and that seems like it really has a pretty high chance of going badly.
more things like courts
Good point.
Literal courts are expensive, but there's a larger design space if we relax the constraints. Courts have to scale to state-size, interoperate with a huge variety of participants (lawyers, judges, police, other officers, and arbitrary citizens), be robust to certain kinds of adversarial attacks, ...
My cached thought is to have a norm of "if you're occupying an important exclusive social niche, such as company leader, thought leader, etc., then you have an obligation to debate representatives from major disagreeing relevant views". May require infrastructure for better debates to go well.
(A very natural-seeming extension of this point is "build a general-purpose optimization system to improve the world" -> "whoops it develops independent agency and kills everyone / is stolen from you by a sociopath who installs a totalitarian dictatorship". It always amuses me when the object-level and the meta-level dynamics mirror each other.)
Seconded. I think there is something small-scale-Pythian-ish going on here.
One way to frame this is that a "general-purpose optimization system (that can be used) to improve the world" needs to be strongly retargetable, and the simplest/cheapest/default-est ways to build such a system involve it being also easily corruptible, susceptible to something like "adversarial inputs", both from the inside ("develops independent agency and kills everyone") and from the outside (corrupted by external actors, or just "mundane" context disasters).
Thank you for writing this. I found it both a useful reference post I will be using when explaining this concept to people in the future, and impressively reflective.
As someone that thinks that EA has, in fact, conquered way more than it can defend, and should have declared "ideological bankruptcy" and either started over or retired to a quiet life in the mountains where it can't continue accelerating ASI long ago, it is admirable to see such honest reflection.
Well wishes to all the smart men that could not defend what they conquered. May they find a second chance and redemption, or at the least peaceful retirement...
What evil forces or people do you see as threatening to take control of the stick? How can we better support you so that you feel like it's less likely that this will happen?
I feel like going into this will predictably cause a demon thread, but as an obvious pointer that is hopefully on the less controversial side, my sense is the vibes of "how hard is AI Alignment" and "how much are we on track to build superintelligent systems safely" are really quite a lot downstream of the incentives frontier labs face, since >50% of talent-weighted people in AI safety work at the frontier labs.
This seems quite bad to me, and it is quite plausible the world would be better off if instead of there being a field of "AI Safety" that has this much of a central vibe, and is this directly exposed to some extremely strong incentives, it was more the case that a bunch of existing fields are thinking about this on their own terms, probably overall using worse epistemics and tools less well-suited to the task, but in a way that I think by default would be much less hijack-able.
(There are also many other things of this kind going on but I feel like those will be more controversial)
Edit: Or alternatively, that the "field of AI Safety" had something more akin to journals or membership or courts or other forms of social organization that could bring deliberation and intention to how all of these vibes are shifting. I do think right now I am thinking more in the direction of "maybe we should have conquered less", but I am also sympathetic to arguments of the form "but maybe we could just defend more?", I am just somewhat burned out on those dimensions.
I'm not entirely sure I'm convinced of the idea that the broad rationalist-EA-AI safety community isn't a confusing patchwork of metaphorical city states? I suppose the money and power is probably concentrated more than the vague culture is?
It is not the least federalist arrangement of interest groups!
I guess I invited lots of comments about specifically the rationality/EA community, though I am worried this discussion is trickier (and I am a bit worried it will cause my thinking on this to get badly anchored and worse).
But to respond nevertheless:
I think the weak point of the EA and rationality communities in this framework is more that they are generally not very defensible, not that they aren't a confusing patchwork of different interest groups. Large diffuse social communities without strong boundaries are always subject to capture by random fads, charismatic misaligned leaders, or changes in the information landscape. The EA community in particular just experiences a staggering amount of turnover in its leadership, while continuously presenting a large pile of resources for the taking for whoever can get influence within its ranks.
I don't think the terminology was clear. (I finished 100% of the essay and got to this comment before I understood why you picked the word.)
I sometimes consider quitting.
Seems like "quitting" is very different from stepping back to maintain what has been established and is realistically defensible? I think you may be overindexing on the George Washington example, where him quitting exemplified a central part of the principles he was advocating.
But maybe you mean something less obvious by "quitting"?
I think LessWrong and many other things I've built are in a confusing place as it relates to this post. At the present my thinking is roughly:
It does seem like overall the things this broader ecosystem has built are not that federalist, and not that defensible, but I sure think I have made things marginally more federalist and marginally more defensible so maybe that means I shouldn't quit but others should?
Also, IDK, I don't think LessWrong is that defensible. It's not like we have formal membership, and things are quite beholden to quite a lot of random memetic drift and it would if anything be more surprising than not for this site to still be roughly aligned with the culture that I am excited about in 10 years.
The track record of "online communities stay aligned with the interests of their founders or head admins" is really very weak, indeed so weak I have trouble thinking of almost any positive examples. I do think I've been doing a decent job in the last decade, but that doesn't buy me that much confidence for the next (especially as things will probably be pretty crazy with AI).
...I think you may be overindexing on the George Washington example, where him quitting exemplified
in 10 years
I struggle to understand the following. Since I don't believe that anyone could have any mission in an ASI-ruled world, the critical period is likely to be at most 5 years, not 10. Additionally, during the critical period I expect LW to stay the most important AI-related forum[1] where researchers exchange insights like Greenblatt's impression that most AIs are misaligned, Anthropic's Persona Selection Model or Harms' CAST. Finally, I think that Wikipedia is an online community which stayed aligned with the interests of its founders or head admins of creating the encyclopedia... until AI came and made the public lose interest in it.
The most important other mission of LW is clear philosophy and practical topics like Daycare illnesses.
the critical period is likely to be at most 5 years, not 10
Come on. Yes, timelines appear to be on the shorter side, but clearly it would be extreme hubris to stop planning around >5 year timelines! That really seems very dogmatic to me.
My median timeline is ~7 years until truly transformative AI. And I have quite a lot of probability on things longer than that!
Finally, I think that Wikipedia is an online community which stayed aligned with the interests of its founders or head admins of creating the encyclopedia
I strongly disagree! I think Wikipedia lost the way around 10 years ago.
Additionally, during the critical period I expect LW to stay the most important AI-related forum
Correct (probably unless I go and try to actively build a competing forum or shut down LessWrong). Why this concerns me is I think kind of clearly answered in the post.
I think Wikipedia lost the way around 10 years ago.
I think I agree, though I currently believe it continues to be strongly net positive for the world. My current guess is that it will lose its value to LLMs before it starts to be sufficiently politically captured to be net negative. I am interested to know if you think it is already net negative.
Wikipedia's principles require it to rely on external analysts of news,
This seems like a nice example of Wikipedia preventing itself from conquering what it cannot defend.
Curated.
Trying to be moral has many failure modes. I'm curating this ("Do not conquer what you cannot defend"), kind of in combination with the next post ("Let goodness conquer all that it can defend"). Together, they make both halves of a point that seems pretty important.
I think I grew up with something like the "innocence as the moral ideal" mindset, and it's been a shift in my adult life to think of myself as having the moral obligation to be powerful (if you want goodness to exist in the universe, someone needs to be defending it), and the moral obligation to be wise enough to do useful things with that power.
I think if I had written these two posts I would have framed them differently. ("conquer" sort of leans into a connotation of power that is specifically, ya know, the bad parts). But, naming things is hard, and the intensity of the word is doing some useful work.
Maybe the most important way ambitious, smart, and wise people leave the world worse off than they found it is by seeing correctly how some part of the world is broken and unifying various powers under a banner to fix that problem
I note that the generic hypotheticals of the great king, scientist, and advocate all end in a way where the conclusion is "it would have been better if the centralization never happened", while the actual historical cases are less clear. Yes, Rome fell, but the Pax Romana was long and many people's lives were better as a result, and it's unclear what the alternative would have been - possibly something much like the lives people lived after the fall and before the rise of Rome. And it's still remembered and analyzed and learned from to this day (unlike the work of the hypothetical scientist - I think it would be a more realistic hypothetical if people remembered and used her framework, given that it was a genuine advance, but just didn't make many further advances after that for a while because of academic incentives). Similarly with Singapore - if it grew 30x over 30 years, it seems like successors can make things worse than they are currently, but gettin...
King George III of Great Britain called him "the greatest man in the world" upon hearing the news
History trivia: There seem to be two versions of this story out there; a quick internet search suggests that they both come from the same person recounting a conversation with King George III but telling the story differently on different occasions.
In the one I originally heard, and have heard more often, the remark "If he does that, he will be the greatest man in the world" was not about Washington declining to run for a third term, but rather about the news t...
Executive Summary: LessWrong 2.0 as it actually exists runs at Bus Factor Habryka, and this is probably fine.
(epistemic status: I notice my thesis is confused, but want comments on it anyway. Writing a long comment since I don't have time to write a shorter one.)
If we compare this post directly to LessWrong, things become less clear to me, because I'm not certain which elements of LessWrong are designed to persist.
When we look at LessWrong 1.0 (before my time), then, as described by "what Alex of LessWrong 2.0 believes about history", it consists of (i) ...
opting out of this conversation.
I feel like I have a thesis about how generating cultural information in the modern world involves writing essays where, if you do well, you impact more people who didn't read the source material than did (e.g. my Korzybski point), and you are ignoring my central point.
Instead, you are aiming to persuade me of your point by using weaknesses in my analogies.
LessWrong is for learning about each other's models, not for having an argument. We're having an argument. I'm deliberately not engaging with your most recent points because I don't want arguments like this on my favourite website.
According to me (and potentially nobody else), I view LessWrong as a place where we do argument in the truth-building/philosophical/debate sense, and not in the shouting match sense. I think there is a way where we can do argument that works, but the above was not working for me.
[I felt like the above was getting into "shouting match" territory more than "squishing our different models of the situation together and attempting to do our best to get to Aumann's Agreement Theorem in real life".
(note this is mostly because I noticed myself getting defensive in my own head - your comments may have worked perfectly well on the same words posted by someone else).]
According to me, this is good. The reason that comes to mind is "we want LessWrong to be a place of repeated idea exchange, and therefore people getting alienated is bad because then they might stop posting - model-sharing leads to much less alienation than bad-tempered argument", although this may not be cruxy.
(epistemic status - typed quickly. Am interested if you disagree with my central point. My examples almost certainly have non-cruxy holes in them)
Marcus Aurelius ... was succeeded by his son Commodus.
Rephrasing of your motto: don't build a huge empire, because eventually that empire will grow corrupt. It's better to have an Archipelago of City-states, because when an individual city decays, it's not a global catastrophe.
But Eliezer seems to think that we need a global regulatory agency for AI. It's a plausible enough idea, but what happens when the agency falls into corruption like all the other crappy 3-letter agencies run by the US and the UN?
While Singapore continued to thrive under his son's leadership
Not related to your post's thrust at all, but: I broadly disagree with this clause, and expect future historians to demarcate a few overlooked choices under the era as instrumental to her decline.
I think this neglects an important aspect of checks to power: functioning feedback loops.
Your model seems to be “as long as the good people can defend their power then the system is good” but I think every person fails in some ways, and a more important criterion for successful leadership is the ability to get feedback about what’s going wrong (or right) and iterate.
If a system no longer accepts critique (or actively selects against it) that’s very likely a sign things have gone wrong. Ideally critique should be embraced and encouraged, and any organization’s first concerns should be to set up ways to maintain healthy feedback cycles and decrease blind spots.
I love it, but, of course, no good leader, in the moment, thinks that they are overconcentrating power. Each believes that they are only doing as much as is necessary for the greater good, and so your analysis can never hope to achieve more than to have every would-be conqueror question themselves, which all the better ones do anyway.
Maybe a decent heuristic for executing "If you make a plan that involves concentrating a bunch of power, especially in the name of goodness and justice, really actually think about whether you can defend that power from corruption and adversaries" is "try extra hard to bake structures & incentives that support your goal into your organization", or more glibly "don't big brain".
The addition this heuristic gives is "do" vs "really actually think"
Obviously, this is easier said than done.
The principle of not-for-life rulers (including even the founder) that Washington established by stepping down prevented a concentration of power (as presidents will swap every few years), so that when Washington died there wasn't as much of a chance that some bad ruler would take power for a long time.
In Washington's case, the power that could've been concentrated was presidential power + length of rule + lack of practice transferring power. Similarly for the king, Marcus Aurelius, and Singapore.
These seem to clump into a class of "ruler" scenarios. This c...
I agree with a lot of the ideas in this post, the dangers of centralization of power and the considerations that have to be made for long term success. I agree that the core problem is expansion without consideration of long-term defensibility. I do believe that protecting the core of the ideology of lesswrong and AI safety is a noble cause.
I have an almost orthogonal issue with your arguments: I think institutional robustness may be the wrong tool for the job, and potentially counterproductive. My problems are specifically the case for antifragility, and ...
I have little experience with online community building, but wrt keeping online communities aligned to the original vision, Duncan Sabien's call to "make more grayspaces" might have merit.
In summary, have a 2-tiered system where a select few gatekeepers determine who gets to promote from the open tier to the higher tier, with work from the higher tier treated as exemplary for those in the open tier. You could also frame the open tier as "for those who want to promote" so you have a mandate to kick people out when they're there for different reasons.
Alignm...
Napoleon was not an aggressor except against Russia and arguably Spain. In the other cases, he did not start fights; he finished them.
And he was not an aggressor at all against the peoples of Europe. He was an aggressor against the deeply conservative feudal nobility who were enemies of progress, reason, and efficiency. Napoleon was far more rationalist and humanist than everyone he fought against, except Britain.
Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post. So today we are re-inventing federalism.
Once upon a time there was a great king. He ruled his kingdom with wisdom and economically literate policies, and prosperity followed. Seeing this, the citizens of nearby kingdoms revolted against their leaders, and organized to join the kingdom of this great king.
While the kingdom's ability to defend itself against external threats grew with each person who joined the land, the kingdom's ability to defend itself against internal threats did not. One fateful evening, the king bit into a bologna sandwich poisoned by a rival noble. That noble quickly proceeded to behead his political enemies in the name of the dead king. The flag bearing the wise king's portrait known as "the great unifier" still flies in the fortified cities where his successor rules with an iron fist.
Once upon a time there was a great scientific mind. She developed a new theoretical framework that made large advances on the hardest scientific questions of the day. Seeing the promise of her work, new graduate students, professors, and corporate R&D teams flocked into the field, hungry to tackle new open problems and make their mark on the world. Within ten years, a vibrant new academic field had formed, with herself among its most respected members.
While the field's ability to make progress on the hard problems increased with each new researcher who joined the field, the field's ability to defend itself against the institutional incentives of the broader academic ecosystem did not. Low-quality researchers, seeing lucrative new opportunities for publication, began producing flashy results on the easier problems adjacent to her field with low attention to scientific rigor. Seeing their success, others began to join them, attracted to the social and financial rewards. Because she was conflict-averse and did not see it as her job to prosecute these people, a growing fraction of the field became careerists.
Twenty years later, her scientific field had become so diluted by uninteresting or irrelevant work that the great original problems remained unsolved, mired in bureaucracy, respectability politics, and academic warfare. Most of the scientists who joined early, attracted by the promise of great progress, stopped being scientists altogether and moved to industry. Almost nobody remembers her name in the history books.
Once upon a time there was a great advocate. She built a social movement around the protection of the rights of a marginalized group, and after many years of hard work, saw the day that the most severe forms of discrimination against the group had been outlawed, and wide social consensus had moved in favor of respecting the members of this group.
But in the success of the movement's aims, she also lost most of her authority. No longer having a compelling vision to offer the members of this movement, others who did became more influential. While she remained the acknowledged founder of the movement, she was no longer treated by the general public as its spokesperson. The press would always talk to the new, charismatic leaders of the movement who had the strongest and most unyielding views. She couldn't afford to make enemies in the movement that she considered hers, so she would publicly endorse the perspectives of these new leaders even when she privately disagreed with them.
Ten years later, her social movement had become so focused on purity and removing any remaining trace of its original enemy that it had begun causing substantially more harm than the original problem it was founded to address. In the history books, she would be briefly mentioned as one of the people who laid the groundwork for the new dark age.
Once upon a time Emperor Marcus Aurelius (himself a great general and a great leader) died in 180 AD, and was succeeded by his son Commodus. Commodus, whom historian Cassius Dio described as "a greater curse to the Romans than any pestilence or any crime", turned out to be interested in gladiator fighting much more than in governing the Roman Empire. The Pax Romana began its long descent into the Crisis of the Third Century, marking the start of the eventual collapse of the Roman Empire.
Once upon a time the French Revolution swept across France, bringing the people liberty and executing the corrupt French aristocracy in an unprecedented flurry of violence. Within a decade the idealistic leaders of the revolution would mostly all be dead, executed by the political machine they themselves had created. And within another few years, Napoleon Bonaparte would claim power and proceed to wage aggressive war across all of continental Europe for another decade.
Once upon a time Lee Kuan Yew built modern Singapore out of what was, at the time, a small regional trading post in Southeast Asia. Under his leadership, Singapore's GDP per capita grew 30x over 30 years. But Lee Kuan Yew is dead and his son just handed over power to Lawrence Wong, not a member of the Lee family. While Singapore continued to thrive under his son's leadership, I find myself very worried about what happens once the Singapore story depends on a third generation of leaders, and wonder if Singapore has in fact already peaked.
Once upon a time George Washington retired. George Washington, the Continental Army general who defeated the British army and successfully established the United States of America as an independent nation, and later the first United States president, served his two terms as president and then voluntarily relinquished power. King George III of Great Britain called him "the greatest man in the world" upon hearing the news. Some say this decision singlehandedly saved American democracy.
Do not conquer what you cannot defend.
At the heart of classical liberalism, a philosophy I have much sympathy for, is the belief that allowing many individuals to act freely and autonomously (especially when they are empowered by markets, democratic processes, and the scientific method) will tend to produce outcomes that are better than the outcomes that can be produced by central authorities.
Maybe the most important way ambitious, smart, and wise people leave the world worse off than they found it is by seeing correctly how some part of the world is broken and unifying various powers under a banner to fix that problem — only for the thing they have built to slip from their grasp and, in its collapse, destroy much more than anything previously could have.
I sometimes consider quitting. When I do, my friends and colleagues often react with bafflement. "How can you think that what you've done is bad for the world? Do you not think that you are steering this boat we are in together into a good direction? Do you really think a world without the AI Safety movement, without LessWrong, without Effective Altruism would be better?".
And in their heads when they visualize the alternative, I can only imagine that they see a great big emptiness where rationality and EA and AI Safety are. And they compare our current community against nothingness, and come to the conclusion that even if its leadership is kind of broken, and the incentives are kind of messed up, that this is still clearly better than no one in the world working on the things we care about.
But what I am worried about is that we conquered much more than we can defend. That the alternative to the work of me and others in the space is not nothingness, but a broken and dysfunctional and confusing patchwork of metaphorical city-states that barely does anything, but at least when any part of it fails, it doesn't all go down together, and in its distributed nature, promises much less nourishing food to predators and sociopaths.
In grug language: Smart man sees big problem. Often state of nature is many small things. Smart man make one big thing out of many small things to throw at big problem. But then evil man take big thing from smart man and make more problem. Or big thing grow legs and beat smart man without making problem go away. This is bad. Maybe better to throw small things at big problem and not make big thing, even if solve problem less. Or before make big thing have plan for how to not have big thing do evil.
But Moloch, in whom I sit lonely
"But what about Moloch" you say!
"Your principle betrays itself. If we want to have good things, we need to coordinate and work together. And death comes for us all, eventually, so nothing we build can truly be defended. Do you not see how one company owning one lake will produce more fish than 20 companies each polluting the commons until all fish are dead? Do you not see how having 20 AI companies all racing to the precipice is worse than having one clearly in the lead, even if the one that raced to the top might stray from the intentions of its creators?"
And you know, fair enough. Coordination problems are real. I am not saying that you should not centralize power.
Here I am arguing for a much narrower principle. Much has been written, and will continue to be written, about the tradeoff between freedom and justice. About small vs. big government. I am not trying to cover all of that.
Here I am just trying to highlight a single principle that seems robust across a wide range of tradeoffs: "If you make a plan that involves concentrating a bunch of power, especially in the name of goodness and justice, really actually think about whether you can defend that power from corruption and adversaries".
And if you can, then go ahead! When George Washington stepped down, he traded off direct power in favor of a system that would actually be able to defend the principles he cared about for much longer, birthing much of Western democracy. I am glad the US exists and covers almost all of the North American continent. Its leaders and founders did have a plan for defending what they conquered, and the world is better off for it.
But if your plan involves rallying a bunch of people under the banner of truth and goodness and justice, and your response to the question of "how are you going to ensure these people will stay on the right path?" is "they will stay on the right path because they will be truthseeking, good, and just people", or if as a billionaire your plan for distributing your wealth is "well, I'll hire some people to run a foundation for me to distribute all of my money according to my goals", then I think you are in for a bad time.