I am not a computer scientist and do not know much about complexity theory. However, it's a field that interests me, so I occasionally browse some articles on the subject. I was brought to https://www.simonsfoundation.org/mathematics-and-physical-science/approximately-hard-the-unique-games-conjecture/ by a link on Scott Aaronson's blog, and read the article to reacquaint myself with the Unique Games Conjecture, which I had partially forgotten about. If you are not familiar with the UGC, that article will explain it to you better than I can.
One phrase in the article stuck out to me: "there is some number of colors k for which it is NP-hard (that is, effectively impossible) to distinguish between networks in which it is possible to satisfy at least 99% of the constraints and networks in which it is possible to satisfy at most 1% of the constraints". I think this sentence is concerning for those interested in the possibility of creating FAI.
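To make the quoted statement concrete, here is a minimal sketch (my own illustration, not from the article) of a unique-games-style instance in Python. Each edge constraint is a permutation of k colors, and the goal is to find a vertex coloring satisfying as many edge constraints as possible; the vertex names, permutations, and brute-force search below are purely illustrative assumptions.

```python
from itertools import product

# Toy unique-games-style instance (illustrative only).
# Each vertex gets one of k colors; each edge (u, v) carries a permutation pi,
# and the constraint is: color[v] == pi[color[u]].
k = 3
vertices = ["a", "b", "c", "d"]
edges = {
    ("a", "b"): (1, 2, 0),   # pi maps the color of "a" to the color required for "b"
    ("b", "c"): (2, 0, 1),
    ("c", "d"): (0, 2, 1),
    ("d", "a"): (1, 0, 2),
}

def satisfied_fraction(coloring):
    """Fraction of edge constraints satisfied by a given vertex coloring."""
    hits = sum(coloring[v] == pi[coloring[u]] for (u, v), pi in edges.items())
    return hits / len(edges)

# Brute force over all colorings -- fine for a toy instance, hopeless in general.
best = max(
    (dict(zip(vertices, colors)) for colors in product(range(k), repeat=len(vertices))),
    key=satisfied_fraction,
)
print(best, satisfied_fraction(best))
```

The conjecture, as quoted, is that for some large enough k it is NP-hard even to distinguish instances where some coloring satisfies nearly all constraints from instances where no coloring satisfies more than a tiny fraction; the exhaustive search above only works because the instance is tiny.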
It is impossible to perfectly satisfy human values, as matter and energy are limited, and so will be the capabilities of even an enormously powerful AI. Thus, in trying to maximize human happiness, we are dealing with a constraint-satisfaction problem closely analogous to the UGC's coloring problem: many constraints, not all of which can be satisfied at once. Additionally, our values themselves are ill-formed. Human values are numerous, ambiguous, even contradictory. Given the complexities of human value systems, I think it's safe to say we're dealing with a particularly nasty variation of the problem, worse than what the computer scientists studying it have dealt with.
Not all specific instances of complex optimization problems are subject to the UGC and thus NP-hard, of course, so this does not in itself mean that building an FAI is impossible. Also, even if maximizing human values is NP-hard (or maximizing the probability of maximizing human values, and so on up the chain), we can still assess a machine's code and actions heuristically. However, even the best heuristics are limited, as the UGC itself demonstrates. At bottom, all heuristics must rely on inflexible assumptions of some sort.
The UGC is a conjecture, and unlike many computational complexity conjectures (such as P != NP and P = BPP), there is substantially more doubt about whether it is true. There are serious attempts, for example, to show that solving unique games lies in BQP; if that is the case, then the UGC cannot be true unless NP is contained in BQP, which is widely believed to be false.
Note also that the UGC isn't so narrow: it essentially says that a whole host of different approximation problems are hard. If the UGC is true, then one should doubt that recursive self-improvement will happen in general, which makes most of the focus on human morality less relevant.
(I personally lean towards the UGC being false, but only weakly, and it is, strictly speaking, outside my area of expertise.)
This is interesting; can you expand on it? I feel there clearly are some arguments in complexity theory against AI as an existential risk, and that these arguments deserve more attention.
To sidetrack a bit, as I've argued in a comment, if it turns out that many important problems are practically unsolvable on realistic timescales, any superintelligence would be unlikely to gain a strategic advantage. The support for this idea is much more concrete than the specul...