If your conclusion is that we don't know how to do value alignment, I and I think most alignment thinkers would agree with you. If the conclusion is that AGI is useless, I don't think it is at all. There are a lot of other goals you could give it beyond directly doing what humanity as a whole wants in any sense. One is taking instructions from some (hopefully trustworthy) humans; another is following some elaborate set of rules that gives humans more freedom and opportunity to go on deciding what they want as history unfolds.
I agree that the values future humans would adopt can only be reached through a process of societal progression. People have expressed this idea by saying that human values are path-dependent.
So, if our goal were simply to observe the values that emerge naturally from human efforts alone, an AGI would indeed be nothing more than a paperweight. However, the values humanity develops with the assistance of an AGI aren’t necessarily worse—if anything, I’d suspect they’d be better.
The world as it stands is highly competitive and often harsh. If we had external help that allowed us to focus more on what we truly want—like eliminating premature death from cancer or accidents, or accelerating technological progress for creative and meaningful projects—we’d arrive at a very different future. But I don’t think that future would be worse; in fact, I suspect it would be significantly better. It would be less shaped by competition, including warfare, and less constrained by the tragedy of involuntary death.
An AGI that simply fulfills human desires without making rigid commitments about the long-term future seems like it would be a net positive—potentially a massive one.
I think the paradox you mention is generally accepted to be an unsolved problem with value alignment: we don't know how to ask for what we will want, since what we will want could be many different things depending on the path taken, and we don't know which path, which values, and which world would in any sense be best.
This is commonly listed as one of the big difficulties with alignment. I think it only sort of is. I think we really want the future to remain in human hands for at least some time. The "long reflection" is one term people use to describe this.
In the meantime, you either have an AGI whose goal is to facilitate that long reflection, according to some criteria, or you have an AGI that takes instructions from some human(s) you trust.
This is one reason I think instruction-following AGI is easier and more likely than value-aligned AGI. The reasoning there is pretty different from the conceptual difficulties with value alignment you mention here; it's just easier to specify "do what this person meant by what he tells you" than to specify what we mean by human values - and you get to change your mind by instructing it to shut down. Even if that worked, we'd have to worry: if we solve alignment, do we die anyway? Having a bunch of different humans in charge of a bunch of different AGIs could produce severe conflicts. But that whole logic is separable from the initial puzzle you posed.
"If your conclusion is that we don't know how to do value alignment, I and I think most alignment thinkers would agree with you. If the conclusion is that AGI is useless, I don't think it is at all."
Sort of- I worry that it may be practically impossible for current humans to align AGI to the point of usefulness.
"If we had external help that allowed us to focus more on what we truly want—like eliminating premature death from cancer or accidents, or accelerating technological progress for creative and meaningful projects—we’d arrive at a very different future..."
Aligned with current (majority) human values, meaning any further social or scientific progress would be stifled by the AI and humanity would be doomed to stagnate.
Only true when current values are taken naively, because future progress is part of current human values (otherwise we would not all be agreeing with you that preventing it would be a bad outcome). It is hard to coherently generalize and extrapolate human values so that future progress is included, but not necessarily impossible.
"Future progress is a part of current human values" - of course. The danger lies in the "future" always being just that: the future. One would naturally hope it wouldn't go this way, but continuously putting off the future, because now is always the present, is a possible outcome. It can even be a struggle to get current models to generate novel ideas, because of a stubbornness not to say anything for which there is not yet evidence.
Thank you for that criticism - I hadn't given that point enough thought, and I think I am starting to see where the weaknesses are.
I don't have harsh criticism for you, sorry -- I think the problem/paradox you are pointing to is a serious one. I don't think it's hopeless, but I do think there's a good chance we'll end up in one of the first three cases you describe.
I’m not a scientist, engineer, or alignment researcher in any respect; I’m a failed science fiction writer. I have a tendency to write opinionated essays that I rarely finish. It’s good that I rarely finish them, however, because if I did, I would generate far too much irrelevant slop.
My latest opinionated essay was to be woven into a fun fantasy frame story featuring a handsome young demon and a naïve young alignment researcher, which I fear would only obfuscate the premise rather than highlight it. I suspect there is a fundamental flaw in the premise of the story, and I’d rather have that laid bare than entertain people with nonsense.
The premise of the story is as follows:
Aligning an ASI with human values is impossible due to the shifting nature of human values. Either an ASI will be:
The main reason I think I’m missing something is that this line of thought pattern-matches to the following argument: “if you want something good, you must pay for it with an equivalent amount of labor and suffering.” This is not actually something I believe - there’s plenty of good to be had in the world without suffering. However, I am failing to see where my argument goes wrong. I suspect I’m falling down around where I casually lump social and scientific progress together. But progress in science necessarily changes people’s lives, which has a huge effect on social progress, so I don’t see why it would be necessary to separate the two. I am therefore asking for the harshest criticisms of my entire line of thinking that you can muster. Thank you in advance!