Some language to simplify a few of the places where the debate got stuck.
Analyzing how to preserve or act on preferences is a coherent thing to do, and it's possible to do so without assuming a one true universal morality. Assume a preference ordering, and now you're in the land of is, not ought, where there can be a correct answer (the action with the highest expected value).
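A minimal sketch of the point in code, where every outcome, probability, and utility is a made-up illustration: once the preference ordering is fixed, "which action is correct" is settled by arithmetic rather than by any further moral premise.

```python
# Toy illustration: once a preference ordering (here, a utility function) is
# fixed, "which action is correct" is a factual question about expected value.
# All outcomes, probabilities, and utilities are made up for illustration.

utility = {"cake": 10.0, "bread": 4.0, "nothing": 0.0}

# Each action induces a probability distribution over outcomes.
actions = {
    "bake_carefully": {"cake": 0.8, "bread": 0.1, "nothing": 0.1},
    "bake_quickly":   {"cake": 0.3, "bread": 0.4, "nothing": 0.3},
}

def expected_utility(outcome_dist, utility):
    """Expected value of an action under the fixed utilities."""
    return sum(p * utility[o] for o, p in outcome_dist.items())

# The answer follows from the numbers, not from any further moral premise.
best = max(actions, key=lambda a: expected_utility(actions[a], utility))
print(best)  # -> bake_carefully (8.4 vs 4.6 expected utility)
```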
Let existence be defined to mean everything, all the math, all the indexical facts. "Ah, but you left out-" Nope, throw that in too. Everything. Existence is a pretty handy word for that; let's reserve it for that purpose. As for any points about how our observations are compatible with multiple implementations: we've already lumped those into our description of a "unique reality".
Noise is only noise with respect to a prediction, so the notion is coherent to discuss. One can abstract away from certain details for the purpose of making a specific prediction; call the stuff that can be abstracted away "noise relative to that prediction".
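A tiny worked example of this relativity, with made-up temperatures: the same hour-to-hour variation is abstracted away as noise when predicting the daily mean, but is exactly the signal when predicting the 3pm reading.

```python
import statistics

# Hypothetical hourly temperatures for one day (all numbers made up).
hourly = [10, 9, 9, 8, 8, 9, 11, 13, 16, 18, 20, 21,
          22, 22, 21, 20, 18, 16, 14, 13, 12, 11, 11, 10]

# Prediction 1: "what is today's mean temperature?" For this prediction the
# hour-to-hour swings can be abstracted away; they are noise relative to it.
daily_mean = statistics.mean(hourly)
residuals = [t - daily_mean for t in hourly]

# Prediction 2: "what is the temperature at 3pm?" For this prediction those
# same swings are the signal, not noise.
three_pm = hourly[15]

print(round(daily_mean, 2), round(statistics.pstdev(residuals), 2), three_pm)
```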
Inclusive genetic fitness (IGF) led to weirdos that like ice cream, but predictive loss may be a purer target than IGF. If we don't press down on that target insanely hard, it's quite plausible that we get all the way to significantly superhuman generality without any unfortunate parallels to that issue. If you work at a frontier AI lab, probably don't build agents in stupid ways or enable their being built too quickly; that seems like the greatest liability at present.
Human Intelligence Enhancement via Learning:
Intelligence enhancement could entail cognitive enhancements that increase the rate or throughput of cognition, expand memory, or use BCIs or AI harnesses to offload work and agency or to complement existing skills and awareness.
In the vein of strategies that could eventually lead to ASI alignment by leveraging human enhancement, there is an alternative to biological or other direct enhancements that act on cognitive hardware: instead, externalize one's world model and some of the agency needed to improve it. This could look like interacting with a system intended to elicit that world model and formalize it as a Bayesian network or hidden Markov model, with operations for exploring it further, such as resolving inconsistencies and gaps, and communicating the relevant details back to the user in a feedback loop (a toy sketch of such a loop follows the list of benefits below).
This strategy has a number of benefits; for example, it could:
- mitigate risks associated with direct biological enhancement, such as instability following large leaps in capability, or health risks that could follow from changing the physical demands of the brain or otherwise moving away from a stable equilibrium
- reduce the distance to understanding AI systems that operate at a higher level of intelligence or that use more complete world models
- sidestep some of the burden of having people with radically different degrees of agency and responsibility which could result from more direct forms of enhancement
- be near-term actionable by using AI models similar to those available today
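A toy sketch of the elicitation-and-feedback loop described above, assuming a minimal hand-rolled discrete Bayesian-network representation; the variable names, example beliefs, and consistency checks are illustrative assumptions, not a specification of any particular system.

```python
# Toy sketch: externalize a user's world model as a small discrete Bayesian
# network, then loop back with detected gaps and inconsistencies. The data
# structures, checks, and example beliefs are all illustrative assumptions.

from typing import Dict, List, Tuple

# Each node: {"parents": [names], "cpt": {tuple_of_parent_states: {state: prob}}}
world_model: Dict[str, dict] = {
    "rain": {"parents": [], "cpt": {(): {"yes": 0.2, "no": 0.8}}},
    "wet_grass": {
        "parents": ["rain", "sprinkler"],  # "sprinkler" never elicited -> a gap
        "cpt": {
            ("yes", "on"):  {"yes": 0.99, "no": 0.01},
            ("yes", "off"): {"yes": 0.90, "no": 0.10},
            ("no", "on"):   {"yes": 0.90, "no": 0.10},
            ("no", "off"):  {"yes": 0.10, "no": 0.80},  # sums to 0.9 -> inconsistency
        },
    },
}

def find_gaps(model: Dict[str, dict]) -> List[str]:
    """Variables referenced as parents but never elicited from the user."""
    referenced = {p for node in model.values() for p in node["parents"]}
    return sorted(referenced - set(model))

def find_inconsistencies(model: Dict[str, dict]) -> List[Tuple[str, tuple]]:
    """Conditional-probability rows whose entries do not sum to ~1."""
    return [(name, row)
            for name, node in model.items()
            for row, dist in node["cpt"].items()
            if abs(sum(dist.values()) - 1.0) > 1e-6]

def feedback(model: Dict[str, dict]) -> List[str]:
    """Turn detected issues into questions to send back to the user."""
    qs = [f"You treat '{g}' as a cause but haven't stated beliefs about it; what are they?"
          for g in find_gaps(model)]
    qs += [f"Your probabilities for '{n}' given parents={row} don't sum to 1; revise?"
           for n, row in find_inconsistencies(model)]
    return qs

for question in feedback(world_model):
    print(question)
```

Keeping the representation explicit in this way is what lets the system surface gaps and inconsistencies mechanically and hand them back to the user, rather than asking the user to introspect them unaided.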
The grabby-aliens model doesn't even work as an explanation for what it purports to explain. The enormous majority of conscious beings in such a universe model are members of grabby species that have expanded to fill huge volumes and have histories of interstellar capability going back hundreds of billions of years or more.
If this universe model is correct, why is this not what we observe?
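A back-of-the-envelope version of the anthropic point, using loudly hypothetical round numbers (none of them come from the grabby-aliens literature; only the lopsidedness of the ratio matters):

```python
# All figures below are hypothetical round numbers chosen only to show the
# shape of the anthropic argument; the conclusion is the lopsided ratio.

observers_per_settled_system = 1e10   # assumed population of one settled system
systems_per_grabby_civ = 1e9          # assumed volume a grabby civ eventually fills
expansion_era_years = 1e11            # assumed span of a grabby civ's history
years_per_observer_life = 1e2         # assumed observer lifespan

pre_expansion_observers = 1e11        # assumed observers per civ before expansion

grabby_observers = (observers_per_settled_system * systems_per_grabby_civ
                    * expansion_era_years / years_per_observer_life)

print(f"{grabby_observers / pre_expansion_observers:.0e}")  # ~1e17
# Under numbers like these, essentially all observers are members of
# long-established grabby civilizations, which is not the kind of observer
# we find ourselves to be.
```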
'Alignment' has been used to refer both to aligning a single AI model and to the harder problem of aligning all AIs. This difference in usage has led to some confusion. Alignment is not solved by aligning a single AI model, but by a strategy that prevents catastrophic misalignment or misuse from any AI.