I am not a computer scientist and do not know much about complexity theory. However, it's a field that interests me, so I occasionally browse some articles on the subject. I was brought to https://www.simonsfoundation.org/mathematics-and-physical-science/approximately-hard-the-unique-games-conjecture/ by a link on Scott Aaronson's blog, and read the article to reacquaint myself with the Unique Games Conjecture, which I had partially forgotten about. If you are not familiar with the UGC, that article will explain it to you better than I can.

One phrase in the article stuck out to me: "there is some number of colors k for which it is NP-hard (that is, effectively impossible) to distinguish between networks in which it is possible to satisfy at least 99% of the constraints and networks in which it is possible to satisfy at most 1% of the constraints". I think this sentence is concerning for those interested in the possibility of creating FAI.
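To make the quoted statement concrete, here is a minimal sketch (in Python, with a made-up toy instance, so the node names, colors, and permutations are illustrative only) of the kind of constraint network the article is talking about: each edge carries a permutation constraint relating the colors of its two endpoints, and the question is what fraction of the constraints any coloring can satisfy.

```python
from itertools import product

# Toy "unique games" instance (made-up data): k colors, and each edge (u, v)
# carries a permutation pi so that the constraint is satisfied exactly when
# color[v] == pi[color[u]].
k = 3
nodes = ["a", "b", "c", "d"]
edges = {
    ("a", "b"): (1, 2, 0),
    ("b", "c"): (0, 2, 1),
    ("c", "d"): (2, 0, 1),
    ("d", "a"): (1, 0, 2),
}

def satisfied_fraction(coloring):
    """Fraction of edge constraints satisfied by a {node: color} assignment."""
    good = sum(1 for (u, v), pi in edges.items() if coloring[v] == pi[coloring[u]])
    return good / len(edges)

# Brute force over all k**len(nodes) colorings.  Fine for a toy instance, but
# this exhaustive search is the step the UGC conjectures cannot be carried out
# efficiently, even approximately, on large instances (assuming P != NP).
best = max(
    satisfied_fraction(dict(zip(nodes, colors)))
    for colors in product(range(k), repeat=len(nodes))
)
print(f"best achievable fraction of constraints: {best:.2f}")
```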

It is impossible to perfectly satisfy human values, as matter and energy are limited, and so will be the capabilities of even an enormously powerful AI. Thus, in trying to maximize human happiness, we are dealing with a problem that's essentially isomorphic to the UGC's coloring problem. Additionally, our values themselves are ill-formed. Human values are numerous, ambiguous, even contradictory. Given the complexities of human value systems, I think it's safe to say we're dealing with a particularly nasty variation of the problem, worse than what computer scientists studying it have dealt with.

Not all specific instances of complex optimization problems are subject to the UGC and thus NP-hard, of course. So this does not in itself mean that building an FAI is impossible. Also, even if maximizing human values is NP-hard (or maximizing the probability of maximizing human values, or maximizing the probability of maximizing the probability of maximizing human values), we can still assess a machine's code and actions heuristically. However, even the best heuristics are limited, as the UGC itself demonstrates. At bottom, all heuristics must rely on inflexible assumptions of some sort.

26 comments

Thus, in trying to maximize human happiness, we are dealing with a problem that's essentially isomorphic to the UGC's coloring problem.

This strikes me as a very bold claim. Also, while I understand that some problems may have no solution, even approximately, except by brute force, these seem rare in that humans do actually manage to optimise things in real life.

Humans rarely optimize, or even approximate optimization. And I think we only get as close as we do to optimal because we have had many years of evolution honing our motivations. An AI would be created much more quickly and in a larger mindspace. Well-intended criteria can go awry and often do. A moral example of this would be the Repugnant Conclusion, and a logical example would be Newcomb's Paradox.

In those cases, we look at the anticipated outcomes of certain actions and judge those outcomes to be flawed, and so we can revisit our assumptions and tweak our criteria. But if the UGC is true, this tactic will not work in complex cases where there are many constraints. We have nothing to rely on to avoid subtler but equally suboptimal outcomes except our own intuitions about satisficing criteria, even though these intuitions are very weak. Simply hoping that no such subtle difficulties exist seems like a bad plan to me.

I don't think the difficulties in adapting our moral intuitions into a self-consistent formal system (e.g. the Repugnant Conclusion) are a problem of insufficient optimisation power per se; it's more that there are multiple systems within the brain (morality, intuitions, logic) and these are not operating in perfect harmony. This doesn't mean that each individual system is not working OK in its own way.

Surely designing an engine, or writing a novel, or (... you get the gist) are complex optimising problems with many constraints, and yet it seems that humans can at least approximately solve these problems far faster than brute-force trial and error.

AFAICT the UGC was saying that there exist insoluble problems, not that these problems are actually common. It seems to me like Gödel's incompleteness theorem: there are statements which are true but which cannot be proved, but this doesn't mean that no statement can be proved, or that mathematics is pointless. At the end of the day, regardless of whether or not the fundamental underpinnings of mathematics are on shaky ground, or whether there are unprovable theorems and unsolvable problems, the actual mathematics that allows us to build aeroplanes works, and the planes fly.

Humans do not optimize things. We stack satisficer on satisficer and sometimes approximate optimization. That's fine for us. But an AI would be working in a much larger mindspace. It has no already-established satisficing criteria. Deciding which satisficing criteria we want to use and which we do not is an enormously difficult problem.

[This comment is no longer endorsed by its author]

"there is some number of colors k for which it is NP-hard (that is, effectively impossible) to distinguish between networks in which it is possible to satisfy at least 99% of the constraints and networks in which it is possible to satisfy at most 1% of the constraints". I think this sentence is concerning for those interested in the possibility of creating FAI.

Not really. This sounds very similar to an argument I've heard that misuses Rice's theorem to conclude that creating a provably friendly AI is impossible. Here's the argument, as I remember it: "By Rice's theorem, there is no general method for determining whether any given AI is an FAI. Therefore it is impossible to build an AI that you can prove is an FAI." The first step of the argument is correct. However, the next step does not follow; just because there is no method that can correctly tell you which of all possible AIs is friendly, it does not follow that there does not exist a method to prove that some particular AI is friendly. You could use the exact same argument to "prove" that there is no program that provably halts; after all, there is no general method for determining whether an arbitrary program halts. But of course writing a program that provably halts is actually quite easy.
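To illustrate that last sentence with a deliberately trivial sketch (the function name and numbers are mine): here is a particular program together with a one-line halting proof, which is all the argument requires.

```python
def provably_halting_sum(n: int) -> int:
    """Adds up n, n-1, ..., 1.

    Halting proof: the loop variable is a non-negative integer that strictly
    decreases on every iteration, so the loop executes at most n times.
    """
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total

print(provably_halting_sum(10))  # 55
```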

It looks to me like your argument makes the same error, but starting from this graph coloring result that I don't know anything about, instead of from Rice's theorem. The problem of classifying networks based on how many of a given set of constraints they can satisfy may be NP-hard, but that does not mean that it is difficult to find some network in which you can satisfy most or even all of the constraints.

It is possible that I'm misunderstanding your argument, but even if I am, I'm not particularly worried that it could be a real problem. You don't give a concrete reason to believe that this coloring problem has much to do with FAI. In particular, how does it relate to FAI in a way that it doesn't relate to an operating system? Operating systems are also programs that we want to satisfy a complicated set of constraints. But we can make those.

I think my reply to Lurker, above, might clarify some things. To answer your question:

Making an operating system is easy. Deciding which operating system should be used is harder. This is true despite the fact that an operating system's performance on most criteria can easily be assessed. Assessing whether an operating system is fast is easier than assessing whether a universe is "just", for example. Also, choosing one operating system from a set of preexisting options is much easier than choosing one future from all the possibilities that can be created (unimaginably many). Finally, there are far fewer tradeoffs involved in the choice of an operating system than there are in the choice of satisficing criteria. You might trade money for speed, or speed for usability, and be able to guess correctly a reasonably high percentage of the time. But few other tradeoffs exist, so the situation is comparatively a simple one. If comparing two AIs with different satisficing protocols, it would be very hard to guess which AI did a better job just by looking at the results they created, unless one failed spectacularly. But the fact that it's difficult to judge a tradeoff doesn't mean it doesn't exist. Because I think human values are complicated, I think such tradeoffs are very likely to exist.

On LW, we are used to imagining spectacular failures resulting from tiny mistakes. An AI that's designed to make people smile creates a monstrous stitching of fleshy carpet. But we are not used to imagining failures that are less obvious, but still awful. I think that's a problem.

On LW, we are used to imagining spectacular failures resulting from tiny mistakes. An AI that's designed to make people smile creates a monstrous stitching of fleshy carpet. But we are not used to imagining failures that are less obvious, but still awful. I think that's a problem.

I think people understand that this is a danger, but by nature, one can't spend a lot of time imagining these situations. Also, UGC has little to do with this problem.

Real physics is local. The graphs, to the extent that there are any, are embedded in metric spaces, have small upper bounds on the number of edges per vertex, are planar, and so on; generally there is plenty of exploitable structure beyond the pure graph-theoretical problem. This is why I do not think hardness results on abstract graph-theoretical problems will be a great obstacle for practical problems.

Real physics is local, but many practical problems require understanding how the local aspects interact. Consider for example the traveling salesman problem: this is a purely local problem in statement, but many versions or variants of it occur in practical contexts where actually solving hard instances matters. For example, the bottleneck version shows up in circuit board design. Similarly, graph bandwidth is "local" in nature but shows up in chip design, and tough instances do show up in practice. Similarly, the pathwidth problem has practical applications in compiler design, general circuit design, and CPU design.

This also fails to appreciate the central interesting thing about UGC: hardness of UGC translates into hardness of approximating many problems which are not phrased in a graph way. Many different NP-complete problems have practical applications, not just for an AI trying to go FOOM but also for industrial applications now. Consider for example the cutting stock problem, which has variants relevant to many different industries and where slightly better solutions really do lead to real savings.

It is also worth noting that this shouldn't be surprising: most NP-complete problems are of the form where one can phrase the problem so that one has some sort of local information and one is interested in a solution that satisfies all the necessary local restrictions as well as some very weak global condition.

Real physics is local.

Where are you getting this from?

Every single physical theory that is currently considered fundamental is local, from general relativity to quantum mechanics.

I dislike the Wikipedia article on the subject; it gives far too much credence to the fringe notion that maybe there is a way to exploit entanglement to get faster-than-light information transfer.

The quantum nonlocality article is much better; it correctly points out that

it (quantum nonlocality) does not allow for faster-than-light communication, and hence is compatible with special relativity.

Does quantum nonlocality count as not being real physics?

I was unclear; of course it is real physics. By "real" I mean simply something that occurs in reality, which quantum nonlocality certainly does.

Quantum nonlocality, despite its name, is actually local in a very important sense, just like the rest of physics: information never moves faster than c.

It follows from special relativity.

UGC is a conjecture, and unlike many computational complexity conjectures (such as P != NP and P = BPP), there's substantially more doubt about whether it is true. There are serious attempts, for example, to show that solving unique games lives in BQP, and if that's the case, then UGC cannot be true unless NP is contained in BQP, which is widely believed to be false.

Note that UGC also isn't so narrow: it essentially says that a whole host of different approximation problems are tough. If UGC is true, then one should doubt recursive self-improvement will happen in general, which makes most of the focus on human morality less relevant.

(I personally lean towards UGC being false, but only very weakly, and it is strictly speaking outside my area of expertise.)

If UGC is true, then one should doubt recursive self-improvement will happen in general

This is interesting, can you expand on this? I feel there clearly are some arguments in complexity theory against AI as an existential risk, and that these arguments would deserve more attention.

To sidetrack a bit, as I've argued in a comment, if it turns out that many important problems are practically unsolvable on realistic timescales, any superintelligence would be unlikely to gain a strategic advantage. The support for this idea is much more concrete than the speculations in the OP of this thread; for example, there are many problems in economics that we don't know how to solve efficiently despite investing a lot of effort.

This is interesting, can you expand on this? I feel there clearly are some arguments in complexity theory against AI as an existential risk, and that these arguments would deserve more attention.

Sure.

There's a basic argument that if P != NP then we should expect recursive self-improvement to be hard because many of the problems that would be involved in self-improvement (e.g. memory management, circuit design, protein folding) are NP-hard or NP-complete.

This argument in this form suffers from two problems:

First, it isn't clear what the constants in question are. There could be an exponential time algorithm for solving your favorite NP-complete problem that is nonetheless so efficient on instances of any size that matters that it doesn't end up impacting things. It isn't completely clear how serious this problem is. On the one hand, as of right now, it looks like "has a polynomial time algorithm if and only if it has a practical algorithm" is a decent heuristic even though it has exceptions. On the other hand, we've seen even in the last few years important examples where careful improvements to algorithms can drastically improve practical running speed. One major example is linear programming.
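As a toy illustration of this first point, using made-up running-time functions rather than real algorithms: an exponential running time with a small enough constant in the exponent can undercut a modest polynomial at every input size anyone is likely to care about.

```python
# Hypothetical running-time functions, purely for illustration.
def exp_time(n):
    return 2 ** (n / 100)   # exponential asymptotics, tiny constant in the exponent

def poly_time(n):
    return n ** 4           # polynomial asymptotics

# Find the first input size at which the exponential finally overtakes the polynomial.
crossover = next(n for n in range(2, 100_000) if exp_time(n) > poly_time(n))
print(crossover)  # a bit under 5000; below that, the "worse" asymptotics are cheaper in practice
```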

Second, most of the NP-complete problems we want to solve in practice are optimization problems. So one might hope that even if a problem is NP-hard, getting close to an optimal solution will still be doable efficiently. Maybe for most of these problems, we can expect that as the problem instances get large we actually get really efficient approximations for things which are very close to the best possible solutions.

UGC addresses both these points: the first point in a weak way, and the second point in a strong way. Prior to the Unique Games Conjecture, there was a major theorem called the PCP theorem. This is one of the few unambiguously deep results in computational complexity, and it has important consequences saying that in many cases approximation is tough. If UGC is true, then the idea that approximation is tough becomes even more true. Consider for example the problem of minimum vertex cover (which shows up in a bunch of practical contexts and is NP-complete). Now, there is an algorithm which will approximate minimum vertex cover within a factor of 2 or so; the algorithm is simple, and it isn't a bad idea to try to work it out. PCP implies that no polynomial time algorithm can do better than about 1.3 times the best possible, but there's a gap between 2 and 1.3. It turns out that UGC implies that a factor of 2 is really the best possible unless P=NP (more formally, that approximating minimum vertex cover within any constant factor better than 2 would be NP-hard). This is discussed in some detail (with spoilers for the above recommended exercise) here. This is one of many problems where UGC implies that some fairly straightforward algorithm really is the best possible unless P=NP. More precisely, if UGC is true, then in a wide variety of circumstances beating the approximations one gets from semidefinite programming is not doable.
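For anyone who wants the spoiler for the exercise mentioned above, here is a minimal sketch of the standard factor-2 approximation (take both endpoints of any uncovered edge); the toy graph and function name are made up for illustration.

```python
def vertex_cover_2approx(edges):
    """Greedy factor-2 approximation for minimum vertex cover.

    Repeatedly take an uncovered edge and add *both* of its endpoints.  The
    chosen edges form a matching, and any cover must contain at least one
    endpoint of each matched edge, so the result is at most twice optimal.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# Toy graph: a path a-b-c-d plus the edge b-d.  The optimum cover is {b, c}
# (size 2); the approximation returns {a, b, c, d} (size 4), within factor 2.
print(vertex_cover_2approx([("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]))
```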

So, if one believes UGC, then optimization is tough. This addresses primarily issue 2. How does this address issue 1? Well, as a rough heuristic, if all these approximation problems are NP-hard even for very weak approximations, and P != NP, then one should expect that even tightening one's demand for the closeness of approximation a tiny amount will cause the constants (if not the asymptotics as well) to shoot up.

This isn't the entire story: for example, it ignores the possibility that quantum computing may impact things. Right now, the general consensus is that it is very unlikely that a quantum computer will be able to solve NP-complete problems in polynomial time. Formally, we believe that NP is not contained in BQP. But since BQP contains P, this is a much stronger claim than P != NP. On the other hand, if one believes UGC, and if one also believes that NP is not contained in BQP, then one should also believe that many approximate optimization problems cannot be done substantially better on a quantum computer than they can on a classical computer. Some people have also raised the possibility that quantum computers, even if they don't improve the asymptotics for some problems, may somehow improve the constants: there's no intrinsic, theoretical, or strong empirical reason to believe this (with the exception of Grover's algorithm), but it seems like enough of a concern that I consider the idea of letting any nascent AI have access to a functioning quantum computer to be a Really Bad Idea (TM).

It is also worth noting that even if P=NP in some strong sense, this doesn't mean that recursive self-improvement necessarily can happen, although it does make it more likely: physical limits and other issues may still come into play. It is worth keeping in mind that even if one has really fast optimization algorithms, one cannot optimize beyond what is actually possible. To put it another way, if you have a bunch of needles in a haystack, even if you can search haystacks really fast, if none of the needles are made of gold you won't find any gold needles.

Replying separately (rather than editing) to make sure this comment gets seen.

It is worth noting that although UGC may not be true, weaker versions of UGC yield many similar results. So, for example, one can get most results of UGC if one believes that NP <= P^U/poly, where U is a unique games oracle; then one would get similar results under the assumption that the polynomial hierarchy does not collapse. Also, if instead of believing that unique games is NP-hard one believes only that it has no polynomial time algorithm, then one gets some of the same results or very similar ones.

Since many people don't assign that high a probability to UGC, it may be worth asking what evidence should cause us to update to assign a higher probability to UGC (short of a proof). There seem to be four obvious lines of evidence:

First, showing that some problems expected to be NP-intermediate (e.g. factoring, graph isomorphism, ring isomorphism) can be reduced to unique games. (Remark: the obvious candidate here would be graph isomorphism. I'm not aware of such a reduction but my knowledge of the literature here is not very good.) This would be strong evidence for at least some of the weaker versions of UGC, and thus evidence for UGC.

Second, showing that the attempted methods to solve unique games fail for intrinsic reasons that cannot be overcome by those methods. Right now, the two noteworthy methods are Sums of Squares and QAOA. If there are fundamental barriers to variants of either method working that would make UGC substantially more plausible.

Third, proving more of the inapproximability results implied by UGC by methods independent of UGC. There's been some progress on this, but it may turn out that the bounds implied by UGC are the correct ones even if UGC itself is false. At the same time, even as these sorts of results are proved, they make UGC less directly useful for trying to estimate whether recursive self-improvement can substantially occur.

Fourth, it may be possible to prove that some unexpected collapse occurs when one is given access to a unique games oracle, a collapse that would be unlikely unless UGC is true. Obvious candidates would be something like NP < co-NP^U, or NP < P^U/poly.

If the UGC is true, then some classes of constraint satisfaction problems are hard to approximate in the worst case. There's:

  • no reason to believe that "satisfying human values" in general falls into one of these classes;

  • no reason to expect that our specific instance of human values would be worst-case (this in particular is unlikely, since nobody is choosing human values adversarially with this problem in mind);

  • no particularly compelling support for the UGC being true.

In addition, given that you haven't made any concrete appeal to complexity theory, it seems unfair to rely on the mere mention of mathematics to lend support to your argument.

If the UGC is true, then some classes of constraint satisfaction problems are hard to approximate in the worst case.

What do you mean by classes? I'm pretty sure the UGC applies to all sorts of constraint satisfaction problems, not just a certain kind of them.

no reason to expect that our specific instance of human values would be worst-case (this in particular is unlikely, since nobody is choosing human values adversarially with this problem in mind);

I agree. But what are the odds it's best case, either? I'm not saying that we're doomed and have no chance of evaluating things. I'm saying that this is a potential difficulty I hope someone looks into further.

What do you mean by classes? I'm pretty sure the UGC applies to all sorts of constraint satisfaction problems, not just a certain kind of them.

Some CSPs already have efficient exact or approximate algorithms. (For example, we can approximate Max-Cut to within a factor of 0.878, or test if a graph has a 2-coloring, in polynomial time.) The UGC is normally stated about a single class of CSPs, though I'm sure it has implications for some other classes as well.
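To illustrate the second example (this is just a standard breadth-first bipartiteness check, with an adjacency-list representation and toy graph of my own choosing, not tied to any particular CSP formalism): deciding 2-colorability is easy, in contrast to 3-colorability, which is NP-complete.

```python
from collections import deque

def two_coloring(adjacency):
    """Return a proper 2-coloring if one exists, else None (graph not bipartite)."""
    color = {}
    for start in adjacency:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle: no 2-coloring exists
    return color

# A 4-cycle is 2-colorable.
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(two_coloring(square))  # e.g. {'a': 0, 'b': 1, 'c': 0, 'd': 1}
```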

I agree. But what are the odds it's best case, either? I'm not saying that we're doomed and have no chance of evaluating things. I'm saying that this is a potential difficulty I hope someone looks into further.

The impression I've always gotten was that average-case instances of many NP-complete problems are believed to be tractable. JoshuaZ points out that this may not be true, but it seems from some skimming that results along that direction are less like "a typical instance is hard" and more like "we can generate random hard instances semi-efficiently".

This is interesting from a cryptographic point of view: it would be nice to have a one-way function whose hardness depended on P vs NP rather than a weaker assumption. But if we encounter a constraint satisfaction problem in the real world, it is not going to come even from an adversarially chosen random distribution.

It is, in fact, likely that such a problem will have some additional exploitable structure that will make it easier than the typical instance (which we already don't know is hard). An example of this is metric TSP. This would be the "best case" you're referring to.
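To make the metric TSP example concrete, here is a minimal sketch of the classic MST-doubling heuristic (the function name and toy coordinates are mine): build a minimum spanning tree and visit its vertices in preorder. The triangle inequality guarantees the resulting tour costs at most twice the optimum, which is exactly the kind of exploitable structure the general TSP lacks.

```python
import math

def mst_preorder_tour(points):
    """Factor-2 approximation for Euclidean (metric) TSP: MST plus preorder walk."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])

    # Prim's algorithm for a minimum spanning tree rooted at vertex 0.
    parent = [0] * n
    best = [math.inf] * n
    in_tree = [False] * n
    best[0] = 0.0
    children = {i: [] for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u

    # Preorder walk of the tree.  Shortcutting past already-visited vertices
    # never increases the cost, thanks to the triangle inequality.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]

# Toy instance: the four corners of a unit square (optimal tour length is 4).
print(mst_preorder_tour([(0, 0), (0, 1), (1, 1), (1, 0)]))
```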

no reason to expect that our specific instance of human values would be worst-case (this in particular is unlikely, since nobody is choosing human values adversarially with this problem in mind);

Part of why UGC matters is that one can use it to show that for many problems even approximating is tough. This, together with (more widely believed) conjectures, suggests that for many of those problems the average instance is almost as hard as the hardest instances. (I agree, however, with most of the rest of your comment.)

Can you elaborate on how we go from "approximating is hard" to "the average instance is hard"? (Or point me to somewhere I can read up on it?)

This is pushing the limits of my knowledge of the subject. Formalizing this is pretty difficult, and actually making such claims mathematically precise is currently open. See e.g. here (pdf), and there's been more technical work like this (pdf). Right now, the main reasons for believing it are essentially empirical: for most natural NP-complete problems, the average cases look to be about as hard as the worst cases. Given that, one would expect the same thing for the approximation problems. One direction for formalization is to use variants of the exponential time hypothesis, but one then needs in this context to distinguish between "the average is hard" and "random instances are hard."

It is possible to use padding arguments to construct NP-complete problems where most random instances are fairly easy, but the set of hard instances of a given size still has to grow faster than any polynomial in the length of the input, or one would have NP in P/poly (by hard-coding the solutions to the rare hard instances), and that's strongly believed not to happen.

One of the unfortunate limitations of modern complexity theory is that problems which look isomorphic sometimes have very different complexity properties. Another awkwardness is that worst-case complexity isn't a reliable guide to practical difficulty. "This sorta feels like a coloring problem" isn't enough to show it's intractable on the sort of instances we care about.

Separately, it's not actually clear to me whether complexity is good or bad news. If you think that predicting human desires and motivations is computationally infeasible, you should probably worry less about superintelligent AI, since that complexity barrier will prevent the AI from being radically effective at manipulating us.

It would seem to require an unusually malicious universe for a superhuman AI to be feasible, for that AI to be able to manipulate us efficiently, but for it to be infeasible for us to write a program to specify constraints that we would be happy with in retrospect.

...but for it to be infeasible for us to write a program to specify constraints that we would be happy with in retrospect.

The point I believe 27chaos is trying to make isn't that writing down the constraints would necessarily be hard (although it very likely is), but that trying to satisfy them may be tough.