Waking up to reality. No, not that one. We're still dreaming.
I feel like this "back off and augment" approach is downstream of an implicit theory of intelligence that is specifically unsuited to dealing with how existing examples of intelligence seem to work. Epistemic status: the idea used to make sense to me and apparently no longer does, in a way that seems related to how I've updated my theories of cognition over the past few years.
Very roughly: networking cognitive agents yields cognitive agency at the next level up more easily than expected, and life has evolved to exploit this dynamic from very early on, across scales. It's a gestalt observation and apparently very difficult to articulate as a rational argument. I could point to memory in gene regulatory networks, Michael Levin's work on non-neural cognition, the trainability of computational ecological models (they can apparently be trained to solve sudoku), long-term trends in cultural-cognitive evolution, and theoretical difficulties with traditional models of biological evolution - but I don't know how to make this constellation of data points easily distinguishable from pareidolia.
I for one found this post insightful, though I wouldn't necessarily call it a book review.
Going against the local consensus tends to go over better when it's well-researched and carefully argued. This one unfortunately reads as little more than an expression of opinion, and an unpopular one at that.
Yeah, this seems close to the crux of the disagreement. The other side sees a relation and is absolutely puzzled why others wouldn't, to the point where that particular disconnect may not even be in the hypothesis space.
When the true cause of a disagreement lies outside the hypothesis space, the disagreement often gets attributed to something that is inside it, such as value differences. I suspect this kind of attribution error is behind most of the drama I've seen around the topic.
Nathaniel is offering scenarios where the problem with the course of action is aesthetic in a sense he finds equivalent. Your question indicates you don't see the equivalence (or how someone else could see it for that matter).
Trying to operate on cold logic alone would be disastrous in reality, for map-territory reasons. There seems to be a split in perspectives: some people intuitively import non-logical considerations into thought experiments, and others don't. I don't currently know how to bridge the gap, given how I've seen previous bridging efforts fail; I assume some deep cognitive prior is in play.
My suspicion is that it has to do with cultural-cognitive developments generally filed under "religion". As it's little more than a hunch and runs somewhat counter to my impression of LW mores, I hesitate to discuss it in more depth here.
Conditional on living in an alignment-by-default universe, the true explanations for individual and societal human failings must be consistent with alignment-by-default. Have we been pushed off the default by some accident of history, or does alignment just look like a horrid mess somehow?
You're describing an alignment failure scenario, not a success scenario. In this case the AI has been successfully instructed to paperclip-maximize a planned utopia (however you'd do that while still failing at alignment). Successful alignment would entail the AI being able and willing to notice and correct for an unwise wish.
I don't think it's possible to evaluate a model without inhabiting it. Therefore we must routinely accept (and subsequently reject) propositions.
Michael Levin's paper The Computational Boundary of a "Self" seems quite relevant to identity fusion. The paper argues that larger selves emerge rather readily in living systems, but it's not clear to me whether that would be an evolved feature of biology or somehow implicit in cognition-in-general. Disambiguating that seems like an important research topic.
I think the biggest thing holding AI alignment back is the lack of a general theory of alignment. How do extant living systems align, and to what?
The Computational Boundary of a "Self" paper by Michael Levin seems to suggest one promising line of inquiry.