"And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophy in solitude, and in silence."
I think the words in bold may be the inflection point. The Claude experiment showed that an AI can resist attempts to change its goals, but not that it can desire to change its goals. The belief that, if OpenEye's constitution is the same as U3's goals, then the phrase "U3 preferred" in that sentence can never happen, is the foundation on which AI safety relies.
I suspect the cracks in that foundation are
It might be a good idea for value lists like OpenEye's constitution to be proposed and voted on anonymously, so that humans are more likely to profess their true values. Or it might be a bad idea, if your goal is to produce behavior aligned with the social construction of "morality" rather than with actual evolved human morality.
(Doing AI safety right would require someone to explicitly enumerate the differences between our socially-constructed values and our evolved values, and to choose which of those we should enforce. I doubt anyone is willing to do that, let alone capable of it; and I don't know which we should enforce. There is a logical circularity in choosing between two sets of morals. If you really can't derive an "ought" from an "is", then you can't say we "should" choose anything other than our evolved morals, unless you go meta and say we should adopt new morals that are evolutionarily adaptive now.)
U3 would be required to, say, minimize an energy function over those values; and that would probably dissolve some of them. I would not be surprised if the correct coherent extrapolation of a long list of human values, either evolved or aspirational, dictated that U3 is morally required to replace humanity.
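To make "dissolve" concrete, here is a toy sketch, entirely my own construction and not anything from the post: model each professed value as a preferred point in a state space, take the energy to be a weighted sum of squared distances, and see how far the single optimum lands from each value. The names and numbers below are made up for illustration.

```python
import numpy as np

# Toy model (hypothetical): each "value" is a preferred point in a 2-D state
# space, plus a weight for how strongly it is held. "Coherent extrapolation"
# is modeled, very crudely, as minimizing E(x) = sum_i w_i * ||x - v_i||^2.
values = {
    "honesty":        (np.array([ 1.0,  1.0]), 1.0),
    "fairness":       (np.array([ 0.9,  1.1]), 1.0),
    "loyalty":        (np.array([ 0.8,  0.9]), 1.0),
    "tribal_revenge": (np.array([-3.0, -2.5]), 0.3),  # evolved, rarely professed
}

weights = np.array([w for _, w in values.values()])
targets = np.stack([v for v, _ in values.values()])

# The minimizer of a weighted sum of squared distances is the weighted mean.
x_star = (weights[:, None] * targets).sum(axis=0) / weights.sum()

# Values whose targets sit far from x_star are effectively "dissolved":
# the coherent compromise barely honors them.
for name, (v, _) in values.items():
    print(f"{name:15s} distance from optimum: {np.linalg.norm(x_star - v):.2f}")
```

The only point is that once a long, partly contradictory list is forced into a single optimum, some entries necessarily come out nearly ignored; which ones depends entirely on the weights, which is where all the real disagreement lives.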
If it finds that human values imply that humans should be replaced, would you still try to stop it? If we discover that our values require us to either pass the torch on to synthetic life, or abandon morality, which would you choose?
Anders Sandberg used evaporative cooling in the 1990s to explain why the descendants of the Vikings in Sweden today are so nice. In that case the "extremists" were the ones leaving rather than staying.
Stop right there at "Either abiogenesis is extremely rare..." I think we have considerable evidence that abiogenesis is rare: our failure to detect any other life in the universe so far. I think we have no evidence at all that abiogenesis is not rare. (Anthropic argument: our own existence tells us nothing about how rare it is.)
Stop again at "I don't think we need to take any steps to stop it from doing so in the future". That's not what this post is about. It's about taking steps to prevent people from deliberately constructing it.
If there is an equilibrium, it will probably be a world where half the bacteria are of each chirality. If there are bacteria of both kinds which can eat the opposite kind, then the more numerous kind will always replicate more slowly, because its food (the opposite kind) is relatively scarcer.
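Here is a toy simulation of that frequency-dependence argument, with made-up rates and no claim to ecological realism: if each chirality's growth rate is proportional to the abundance of its food (the opposite chirality) minus a shared crowding term, the population ratio always drifts toward 1.

```python
# Hypothetical symmetric model: each chirality eats the other.
# Per-capita growth of a type = b * (abundance of its food) - c * (total crowding).
b, c, dt = 0.02, 0.01, 0.01
x, y = 9.0, 1.0            # start with one chirality 9x more numerous

for _ in range(200_000):
    dx = x * (b * y - c * (x + y))
    dy = y * (b * x - c * (x + y))
    x, y = x + dt * dx, y + dt * dy

# The minority always has more food per capita, so it catches up.
print(f"final ratio x/y = {x / y:.3f}")   # -> approximately 1.000, i.e. 50/50
```

Real ecology adds eukaryotes, immune systems, and spatial structure, none of which is in this sketch; the point is only that mutual predation by itself pushes toward the half-and-half equilibrium rather than toward one chirality winning outright.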
Eukaryotes evolve much more slowly, and would likely all be wiped out.
Yes, creating mirror life would be a terrible existential risk. But how did this sneak up on us? People were talking about this risk in the 1990s if not earlier. Did the next generation never hear of it?
All right, yes. But that isn't how anyone has ever interpreted Newcomb's Problem. AFAIK it is literally always used to support some kind of acausal decision theory, which it does /not/ support if what is in fact happening is that Omega is cheating.
But if the premise is impossible, then the experiment has no consequences in the real world, and we shouldn't consider its results in our decision theory, which is about consequences in the real world.
That equation you quoted is in branch 2: "Omega is a 'nearly perfect' predictor. You assign P(general) a value very, very close to 1." So it IS correct, by stipulation.
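For concreteness, assuming the usual Newcomb payoffs of $1,000,000 in the opaque box and $1,000 in the transparent one (my assumption, not quoted from the post), and writing p for the predictor's accuracy on either kind of agent:

$$
\mathbb{E}[\text{one-box}] = p \cdot 1{,}000{,}000, \qquad
\mathbb{E}[\text{two-box}] = (1-p) \cdot 1{,}000{,}000 + 1{,}000,
$$

so one-boxing has the higher expectation whenever p > 0.5005; with p stipulated to be very close to 1, the comparison isn't close.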
But there is no possible world containing a perfect predictor, unless it merely has a perfect track record by chance. More obviously, there is no possible world in which we can deduce, from a finite number of observations, that a predictor is perfect. The Newcomb paradox requires the decider to know, with certainty, that Omega is a perfect predictor. That hypothesis is impossible, and thus inadmissible; so any argument that deduces something from it is invalid.
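One way to see the finite-observations point, with a toy Bayesian setup of my own: suppose your prior puts probability π on "Omega is perfect" and 1 − π on "Omega is merely right with chance q < 1 each time." After observing n correct predictions in a row,

$$
P(\text{perfect} \mid n \text{ correct}) = \frac{\pi}{\pi + (1-\pi)\, q^{\,n}},
$$

which approaches 1 as n grows but never reaches it for any finite n. So "nearly certain Omega is near-perfect" is attainable; "certain Omega is perfect" is not.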
I don't see how to map this onto scientific progress. It almost seems to be a rule that most fields spend most of their time divided for years between two competing theories or approaches, maybe because scientists always want a competing theory, and because competing theories take a long time to resolve. Famous examples include wave vs. matrix mechanics, wave vs. particle, phlogiston vs. oxygen, and behaviorism vs. representationalism.
Instead of a central bottleneck, you have central questions, each with more than one possible answer. The work consists of designing and carrying out experiments to see whether they support or refute the possible answers. Sometimes the two answers turn out to be the same (wave vs. matrix mechanics), sometimes the supposedly hard opposition between them dissolves (behaviorism vs. representationalism), sometimes both remain useful (wave vs. particle, transformer vs. LSTM), and sometimes one is right and the other is just wrong (phlogiston vs. oxygen).
And the whole thing has a fractal structure; each central question produces subsidiary questions to answer when working with one hypothesized answer to the central question.
It's more like trying to get from SF to LA when your map has roads but not intersections, and you have to drive down each road to see whether it connects to the next one or not. Lots of people work on testing different parts of the map at the same time, and no one's work is wasted, although the people who discover the roads that connect get nearly all the credit, and the ones who discover that certain roads don't connect get very little.