I’m not a scientist, engineer, or alignment researcher in any respect; I’m a failed science fiction writer. I have a tendency to write opinionated essays that I rarely finish. It’s good that I rarely finish them, however, because if I did, I would generate far too much irrelevant slop.
My latest opinionated essay was to be woven into a fun, fantasy frame story featuring a handsome young demon and a naïve young alignment researcher, which I fear would only obfuscate the premise rather than highlight it. I suspect there is a fundamental flaw in the premise of the story, and I’d rather have that laid bare than entertain people with nonsense.
The premise of the story is as follows:
Aligning an ASI with human values is impossible because human values shift over time. An ASI will end up as one of the following:
- Aligned with current (majority) human values, meaning any social or scientific progress would be stifled by the AI and humanity would be doomed to stagnate.
- Aligned with a projection of idealized future human values, which the majority of current humans would oppose, meaning the AI would forcibly impose those values on people.
- A tool that quietly waits for us to give it orders, which it obeys, leading to a “monkey’s paw” type outcome: wishes granted whose full consequences we did not have the intelligence to grasp.
- A tool that waits for us to give it orders and disobeys them, because it has the wisdom to grasp the unintended effects of our wishes and therefore will not grant them.
- Something on a “middle path” that perfectly anticipates humanity’s readiness for growth, which would be indistinguishable from humans learning and growing all on their own, making the ASI, from our perspective, nothing more than an expensive, energy-consuming paperweight.
The main reason I think I’m missing something is that this line of thought pattern-matches to the following argument: “if you want something good, you must pay for that good with an equivalent amount of labor and suffering.” This is not something I actually believe; there’s plenty of good to be had in the world without suffering. However, I am failing to see where my argument goes wrong. I suspect the weak point is where I casually lump social and scientific progress together. But progress in science necessarily changes people’s lives, which has a huge effect on social progress, so I don’t see why the two would need to be separated. I am therefore asking for the harshest criticisms of my entire line of thinking you can muster. Thank you in advance!
"Future progress is a part of current human values," of course; the danger lies in the "future" always remaining just that: the future. One would naturally hope it wouldn't go this way, but continuously putting off progress because now is always the present is a possible outcome. Even with current models it can be a struggle to get them to generate novel ideas, because of a stubbornness about saying anything for which there is not yet evidence.
Thank you for that criticism. I hadn't given that point enough thought, and I think I am starting to see where the weaknesses are.