Hi, as I was tagged here, I will respond to a few points. There are a bunch of smaller points only hinted at that I won't address. In general, I strongly disagree with the overall conclusion of this post.
There are two main points I would like to address in particular:
There seems to be a deep underlying confusion here that more information is somehow inherently good, or will inherently result in good things winning out. This is very much the opposite of what I generally claim about memetics. Saying that all information is good is like saying all organic molecules or cells are equally good. No! Adding more biosludge and toxic algal blooms to your rose garden won't make it better!
Social media is the exact living proof of this. People genuinely thought social media would bring everyone together, resolve conflicts, create a globally unified culture, peace and democracy, and that autocracy and bigotry couldn't possibly thrive if only people had enough information. I consider this hypothesis thoroughly invalidated. "Increasing memetic evolutionary pressure" is not a good thing! (all else equal)
Increasing the evolutionary pressure on the flu virus doesn't make the world better, and viruses mutate a lot faster than nice fluffy mammals. Most mutations in fluffy mammals kill them; mutations in viruses help them far more. Value is fragile. It is asymmetrically easier to destroy than to create.
Raw evolution selects for fitness/reproduction, not Goodness. You are just feeding the Great Replicator.
For an accessible intro to some of this, I recommend the book "Nexus" by Yuval Harari. (not that I endorse everything in that book, but the first half is great)
You talk about theories of change of the form "we safety people will keep everything secret and create an aligned AI, ship it to big labs and save the world before they destroy it (or directly use the AI to stop them)". I don't endorse, and in fact strongly condemn, such theories of change.
Not because of the hiding-information part, but because of the "we will not coordinate with others and will use violence unilaterally" part! Such theories of change are fundamentally immoral for the same reasons labs building AGI is immoral. We have a norm in our civilization that we don't, as private citizens, threaten to harm or greatly upend the lives of our fellow citizens without either their consent or societal/governmental/democratic authority.
The not-sharing-information part is fine! Not all information is good! For example, Canadian researchers a while back figured out how to reconstruct an extinct relative of smallpox, and then published how to do it. Is this a good thing for the world, to have that information out there?? I don't think so. Should we open source the blueprints of the F-35 fighter jet? I don't think so, I think it's good that I don't have those blueprints!
Information is not inherently good! Not sharing information that would make the world worse is virtuous. Now, you might be wrong about the effects of sharing the information you have, sure, but claiming there is no tradeoff, or no possibility that sharing might actually, genuinely, be bad, is just ignoring why coordination is hard.
If you ever find yourself thinking something of the shape "we must simply unreservedly increase [conceptually simple variable X], with no tradeoffs", you're wrong. It doesn't matter how clever you think X is, you're wrong. Any real-life, non-fake complex thing is made of towers upon towers of tradeoffs. If you think there are no tradeoffs in whatever system you are looking at, you don't understand the system.
Memes are not our friends. Conspiracy theories and lies spread faster than complex, nuanced truth. The printing press didn't bring the scientific revolution; it brought the witch burnings and the Thirty Years' War. The scientific revolution came from the Royal Society and its nuanced, patient, complex norms of critical inquiry. Yes, spreading your scientific papers was also important, but it was necessary, not sufficient, for a good outcome.
More mutation/evolution, all else equal, means more cancer, not more health and beauty. Health and beauty can come from cancerous mutation and selection, but it's not a pretty process, and it requires a lot of bloody, bloody trial and error (and a good selection function). The kind of inefficient and morally abominable process I would prefer we not rely on.
With that being said, I think it's good that you wrote things down and are thinking about them, so please don't take what I'm saying as some kind of personal disparagement, I wish more people wrote down their ideas and tried to think things through! I think there are indeed a lot of valuable things in this direction, around better norms, tools, processes and memetic growth, but they're just really quite non-trivial! You're on your way to thinking critically about morality, coordination and epistemology, which is great! That's where I think the real solutions are!
Nice set of concepts, I might use these in my thinking, thanks!
I don't understand what point you are trying to make, to be honest. There are certain problems that humans/I care about that we/I want NNs to solve, and some optimizers (e.g. Adam) solve those problems better or more tractably than others (e.g. SGD or second order methods). You can claim that the "set of problems humans care about" is "arbitrary", to which I would reply "sure?"
Similarly, I want "good" "philosophy" to be "better" at "solving" "problems I care about." If you want to use other words for this, my answer is again "sure?" I think this is a good use of the word "philosophy" that gets better at what people actually want out of it, but I'm not gonna die on this hill because of an abstract semantic disagreement.
"good" always refers to idiosyncratic opinions, I don't really take moral realism particularly seriously. I think there is "good" philosophy in the same way there are "good" optimization algorithms for neural networks, while also I assume there is no one optimizer that "solves" all neural network problems.
I strongly disagree and do not think that will be how AGI will look, AGI isn't magic. But this is a crux and I might be wrong of course.
I can't rehash my entire views on coordination and policy here, I'm afraid, but in general, I believe we are currently on a double exponential timeline (though I wouldn't model it quite like you do, the conclusions are similar enough), and I think some simple-to-understand and straightforwardly implementable policy (in particular, compute caps) would at least move us to a single exponential timeline.
I'm not sure we can get policy that can stop the single exponential (which is software improvements), but there are some ways, and at least we will then have additional time to work on compounding solutions.
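To make that intuition concrete, here is a minimal toy sketch (my own illustration with made-up growth rates, not a real forecast): model effective capability as the product of two compounding drivers, physical training compute and software/algorithmic efficiency. A compute cap freezes the first driver, leaving only the software exponential.
```python
# Toy illustration only (assumed numbers, not a real forecast): effective capability
# modeled as physical compute * software efficiency, both growing exponentially.
# A compute cap freezes the compute factor, leaving only the software exponential.
import math

COMPUTE_GROWTH = 0.7    # assumed per-year growth rate of training compute
SOFTWARE_GROWTH = 0.5   # assumed per-year growth rate of algorithmic efficiency
CAP_YEAR = 3            # hypothetical year at which a compute cap takes effect

def effective_capability(year: float, capped: bool) -> float:
    compute_year = min(year, CAP_YEAR) if capped else year
    compute = math.exp(COMPUTE_GROWTH * compute_year)
    software = math.exp(SOFTWARE_GROWTH * year)
    return compute * software

for year in range(0, 11, 2):
    print(f"year {year:2d}: "
          f"uncapped {effective_capability(year, False):9.1f}x, "
          f"capped {effective_capability(year, True):9.1f}x")
```
The point is purely directional: removing one of the compounding drivers doesn't stop progress, but it slows the curve enough to buy time for other interventions.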
Sure, it's not a full solution; it just buys us some time. But I think it would be a non-trivial amount, and let's not let the perfect be the enemy of the good, and what not.
I see regulation as the most likely (and most accessible) avenue that can buy us significant time. The obvious move, from my point of view, is to just put compute caps in place: make it illegal to do training runs above a certain FLOP level. Other possibilities are strict liability for model developers (developers, not just deployers or users, are held criminally liable for any damage caused by their models), global moratoria, "CERN for AI" and similar. Generally, I endorse the proposals here.
None of these are easy, of course, there is a reason my p(doom) is high.
But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...
Of course if a solution merely looks good, that will indeed be really bad, but that's the challenge of crafting and enforcing sensible regulation.
I'm not sure I understand why it would be bad if it actually is a solution. If we actually solve deception, great, p(doom) drops, because now we are much closer to making aligned systems that can help us grow the economy, do science, stabilize society etc. Though of course this moves us into a "misuse risk" paradigm, which is also extremely dangerous.
In my view, this is just how things are, there are no good timelines that don't route through a dangerous misuse period that we have to somehow coordinate well enough to survive. p(doom) might be lower than before, but not by that much, in my view, alas.
I think this is not an unreasonable position, yes. I expect the best way to achieve this would be to make global coordination and epistemology better/more coherent...which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.
One of the ways I can see a "slow takeoff/alignment by default" world still going bad is that in the run-up to takeoff, pseudo-AGIs are used to hypercharge memetic warfare/mutation load to the degree that basically every living human is functionally insane, and then even an aligned AGI can't (and wouldn't want to) "undo" that.
Morality is multifaceted and multilevel. If you have a naive form of morality that is just "I do whatever I think is the right thing to do", you are not coordinating or being moral, you are just selfish.
Coordination is not inherently always good. You can coordinate with one group to more effectively do evil against another. But scalable Good is always built on coordination. If you want to live in a lawful, stable, scalable, just civilization, you will need to coordinate with your civilization and neighbors and make compromises.
As a citizen of a modern country, you are bound by the social contract. Part of the social contract is "individuals are not allowed to use violence against other individuals, except in certain circumstances like self-defense." [1] Now you might argue that this is a bad contract or whatever, but it is the contract we play by (at least in the countries I have lived in), and I think unilaterally reneging on that contract is immoral. Unilaterally saying "I will expose all of my neighbors to risk of death from AGI because I think I'm a good person" is very different from "we all voted and the majority decided building AGI is a risk worth taking."
Now, could it be that you, in some exceptional circumstances, need to do something immoral to prevent some even greater tragedy? Sure, it can happen. Murder is bad, but self-defense can make it on net OK. But just because it's self-defense doesn't make murder moral; it just means there was an exception in this case. War is bad, but sometimes countries need to go to war. That doesn't mean war isn't bad.
Civilization is all about commitments, and honoring them. If you can't honor your commitments to your civilization, even when you disagree with them sometimes, you are not civilized and are flagrantly advertising your defection. If everyone does this, we lose civilization.
Morality is actually hard, and scalable morality/civilization is much, much harder. If an outcome you dislike happened because of some kind of consensus, this has moral implications. If someone put up a shitty statue that you hate in the town square because he's an asshole, that's very different morally from "everyone in the village voted, and they like the statue and you don't, so suck it up." If you think "many other people want X and I want not-X" has no moral implications whatsoever, your "morality" is just selfishness.[2]
[1] (building AGI that might kill everyone to try to create your vision of utopia is "using violence")
[2] (I expect you don't actually endorse this, but your post does advocate for it)