That's for linking the post! I quite liked it, and I agree that computational complexity doesn't pose a challenge to general intelligence. I do want to dispute your notion that "if you hear that a problem is in a certain complexity class, that is approximately zero evidence of any conclusion drawn from it". The world is filled with evidence, and it's unlikely that closely related concepts give approximately zero evidence for each other unless they are uncorrelated or there are adversarial processes present. Hearing that list-sorting is O(n*log(n)) is pretty strong evidence that it's easy to do, and hearing that simulating quantum mechanics is not in P is pretty strong evidence that it's hard to do. Sure, there are lots of exceptions, but computational complexity is in fact a decent heuristic, especially if you go with average-case complexity of an approximation, rather than worst-case complexity of an exact answer.
Obviously, if computational complexity really forbid general intelligence...then we wouldn't have it.
Technically, list sorting is O(n) if done a certain way...we just do it the O(n log n) way because it uses slightly less memory and is simpler for us (which is kind of related to Gwern's rant, and he does technically mention a specialized one, though he doesn't realize what it means, and puts it in a section about giving up generality for speed). I actually invented an example of that type of algorithm when I was teaching myself c++ using sorting algorithms and wanted to beat the built-in algorithms. In practice, sorting algorithms are often multiple different algorithms of different complexity classes mashed together...that then end up still being O(n log n) with better constant factors. We already optimize the hell out of constant factors for important problems.
I agree with you rather than Gwern. In my experience, complexity class really is a good indicator for how much compute something takes. It's not a perfect proxy, and can definitely be goodharted, but it is still effective. Often, the constant factors are closely related to the complexity class too. Even describing an 'NP' problem is vastly more difficult than a 'P' problem, because there is just so much more to it.
I would claim that difficulty describing the process is actually a good proxy for how large the constant factors are likely to be, since they are what you have to keep in mind even for problem size 0, where we aren't even trying to solve the issue, just keep it in mind.
How can list sorting be O(n)? There are n! ways to sort a list, which means that it's impossible to have a list sorting algorithm faster than O(log(n!)) = O(n*log(n)).
He probably means bucket sort or basically same concept radix sort. You might call that cheating, but if the range of values you sort is small, then you can think of it as linear.
Yes, technically it solves only a subproblem of the general problem of sorting: the case where the number of possible values is bounded.
Morpheus is incorrect, I am actually referring to a generalization of 'counting sort' that I independently reinvented (where you are no longer limited to small positive numbers). I had never heard of sub n log n sorts at that time either. It's actually quite simple. In fact, it's how humans prefer to sort things. If you look at the object, and just know where it goes, put it there. The downside is just that there has to be a known place to put it. The O(n log n) lower bound is only for comparison sorts.
On a computer, we can just make a place to put it. If each value is unique, or we're sorting numbers, or each value that is the same does not need further sorting, we can implement this in constant time per object being sorted. Also linear space, with space being O(m + n) where n is the number of objects being sorted and m being the range of values they can hold. This only works well if m is reasonable in size, not being too much larger than n. For instance, you might be sorting integers between 0 and 100, requiring only a hundred and one extra spots in memory, and be sorting 1,000,000,000 of them. (This is conceptually related to the bucket sort mentioned by Morpheus, but they are distinct, and this is truly linear).
Worst case is usually that you are sorting things whose values are an unknown 32bit integer, in which case you would need about 16GB of memory just for the possible values. This is normally unjustifiable, but if you were sorting a trillion numbers (and had a large amount or RAM), it's nothing. (And the fact that it is linear would be quite a godsend.) Another positive is that this sort has virtually no branching, and is this much easier on a per instruction basis too. (Branching kills performance).
This can also be integrated as a sub-routine (that requires very little memory) in an overall sorting routine similar to (but better than) quicksort, and then it is more similar to bucket sort, but kind of reversed.
I feel I should mention an important implementation detail. If you do not know the range of possible values, it is often a good idea to check, as most generative processes only use a subset. This is also a linear time process, that only takes two values in memory, but does involve some branching. It is better to just know and put that in, but the ability to do this makes it a much more practical algorithm.
Has anyone ever suggested that the algorithm for intelligence might just be a tractable solution to an NP-complete problem, and that we live in a P=NP (or approximate) universe? I don't believe this, because it doesn't map to anything I understand about AI or the human brain, but it's an interesting idea and I'm sure someone has pursued it before.
That would be shocking in a very fun sort of way…after breaking all cryptography on the internet, what sort of practical applications would such a discovery have?
Well, in this universe, NP and AI are secretly the same problem, so we'd get all the benefits of AI (singularity?) plus all the benefits of solving NP-class problems at the same time. I can't think of an application of NP-class problem solving that would catch notice compared to AI, though. We already solved protein folding. Reduced carbon emissions from men who happen to be in sales and do a lot of travel, perhaps.
Protein folding is nothing compared with a general solution to finding proofs up to any given length if they exist or confirming that they don't, or optimal solutions to virtually every scientific problem that can be formulated in any suitably verifiable way.
If solving NP-complete problems in polynomial time is the hallmark of intelligence, then humans are as dumb as rocks and any actual AI will blow us out of the water.
If you want to be pedantic, randomization doesn't buy you anything for your complexity classes, because probably "P=BPP"[1](BPP is the class of problems where you can give probabilistic yes and no answers), so if you can solve it with randomness, it is already in P (if P = BPP), thus already easy to begin with. I'd say complexity classes are pretty important for realizing when your algorithm just isn't going to work, and you need a different approach. For example, no matter how theoretically satisfying you will not get to AGI, if you just use lots of compute and implement naive inference with Bayes Nets (https://en.wikipedia.org/wiki/Bayesian_network).
(Computational chemistry may be intractable, even with approximations, on classical hardware—but what about if one has a quantum computer with a few hundred qubits, enough that one can do quantum simulation?) The importance of constant factors is one of the major traps in practical use of complexity classes: a fancy algorithm with a superior complexity class may easily be defeated by a simpler algorithm with worse complexity but faster implementation.
This example contradicts the point he's trying to make here. Quantum computers are thought to be asymptomatically faster at some tasks (including quantum simulation) but have much, much worse constant factors for individual operations.
Basically an article about why knowing a problem is in a certain complexity class does not mean AGI can't do that problem. It's some evidence, but usually isn't enough unless 2-3 or more assumptions are right..
The biggest reasons are these:
Basically, we can tolerate not getting the algorithm to work every time, since we can just run it again and we can reduce error rates to levels far below the cosmic ray error rate. Also, the perfect is not the enemy of the good, so we can accept a 99.999% approximation, and that's enough to design new technologies.
Lastly, constant factors are an issue:
You should really read Gwern's full post, it has a lot of insights.
One final note on complexity: Protein Folding was predicted to never progress that much because it was NP-Hard for computers. Suffice it to say with AlphaFold 2 and Deepmind releasing 200 million proteins, the prediction is very wrong now.
EDIT: I no longer endorse the strong version of no evidence from complexity theory of my argument above, so I am removing parts of my post and defending a weaker claim than I originally did.