Thanks for writing this up! Your "amplified weak" version of the conjecture (with complexity bounds increasing exponentially in 1/ε) seems plausible to me. So if you could amplify the original (weak) conjecture to this unconditionally, it wouldn't significantly decrease my credence in the principle. But it would be nice to have this bound on what the dependence on ε would need to be.
The statements are equivalent if only a tiny fraction (tending to 0) of random reversible circuits satisfy $P$. We think this is very likely to be true, since it is a very weak consequence of the conjecture that random reversible circuits (of sufficient depth) are pseudorandom permutations. If it turned out not to be true, it would no longer make sense to think of $P$ as an "outrageous coincidence", and so I think we would have to abandon the conjecture. So in short we are happy to consider either version (though I agree that "for which $P$ is false" is a bit more natural).
The hope is to use the complexity of the statement rather than mathematical taste.
If it takes me 10 bits to specify a computational possibility that ought to happen 1% of the time, then we shouldn't be surprised to find around 10 (~1% of $2^{10}$) occurrences. We don't intend the no-coincidence principle to claim that these should all happen for a reason.
Instead, we intend the no-coincidence principle to claim that if such coincidences happen much more often than we would have expected them to by chance, then there is a reason for that. Or put another way: if we applied $k$ bits of selection to the statement of a coincidence at a level far below $2^{-k}$, then there is a reason for it. (Hopefully the "outrageous" qualifier helps to indicate this, although we don't know whether Gowers meant quite the same thing as us.)
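To spell out the arithmetic: if a class of coincidences can each be specified in $k$ bits and each holds by chance with probability $p$, then the expected number of occurrences is about
$$2^k \cdot p,$$
which for $k = 10$ and $p = 1\%$ is roughly $10$ – no surprise at all. The principle is only meant to apply when what we observe is far beyond this baseline, e.g. when $p \ll 2^{-k}$ and we nevertheless find an instance.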
The formalization reflects this distinction: the property $P$ is chosen to be so unlikely that we wouldn't expect it to happen for any circuit at all by chance, not merely that we wouldn't expect it to happen for a single random circuit. Hence by the informal principle, there ought to be a reason for any occurrence of property $P$.
For the informal no-coincidence principle, it's important to us (and to Gowers IIUC) that a "reason" is not necessarily a proof, but could instead be a heuristic argument (in the sense of this post). We agree there are certainly apparently outrageous coincidences that may not be provable, such as Chebyshev's bias (discussed in the introduction to the post). See also John Conway's paper On Unsettleable Arithmetical Problems for a nice exposition of the distinction between proofs and heuristic arguments (he uses the word "probvious" for a statement with a convincing heuristic argument).
Correspondingly, our formalization doesn't bake in any sort of proof system. The verification algorithm $V$ only has to correctly distinguish circuits that might satisfy property $P$ from random circuits using the advice string $\pi$ – it doesn't necessarily have to interpret $\pi$ as a proof and verify its correctness.
It's not, but I can understand your confusion, and I think the two are related. To see the difference, suppose hypothetically that 11% of the first million digits in the decimal expansion of $\pi$ were 3s. Inductive reasoning would say that we should expect this pattern to continue. The no-coincidence principle, on the other hand, would say that there is a reason (such as a proof or a heuristic argument) for our observation, which may or may not predict that the pattern will continue. But if there were no such reason and yet the pattern continued, then the no-coincidence principle would be false, whereas inductive reasoning would have been successful.
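For a sense of scale in this hypothetical, here is a quick back-of-the-envelope check (illustrative only): if the digits behaved like independent uniform draws, an 11% frequency of 3s among the first million digits would sit about 33 standard deviations above the expected 10%.

```python
import math

n = 1_000_000        # number of digits considered
p = 0.1              # chance that a uniformly random digit equals 3
observed = 0.11 * n  # the hypothetical 11% count

mean = n * p
std = math.sqrt(n * p * (1 - p))  # binomial standard deviation
z = (observed - mean) / std
print(f"mean = {mean:.0f}, std = {std:.0f}, z = {z:.1f}")  # mean = 100000, std = 300, z = 33.3
```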
So I think one can view the no-coincidence principle as a way to argue in favor of induction (in the context of formally-defined phenomena): when there is a surprising pattern, the no-coincidence principle says that there is a reason for it, and this reason may predict that the pattern will continue (although we can't be sure of this until we find the reason). Interestingly, one could also use induction to argue in favor of the no-coincidence principle: we can usually find reasons for apparently outrageous coincidences in mathematics, so perhaps they always exist. But I don't think they are the same thing.
Good question! We also think that NP ≠ co-NP. The difference between 99% (our conjecture) and 100% (NP = co-NP) is quite important, essentially because 99% of random objects "look random", but not 100%. For example, consider a uniformly random string $x \in \{0,1\}^n$ for some large $n$. We can quite confidently say things like: the number of 0s in $x$ is between $0.49n$ and $0.51n$; there is no streak of $2\log_2 n$ alternating 0s and 1s; etc. But these only hold with 99% confidence (more precisely, with probability tending to 1), not 100%.
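To illustrate with a quick sketch (the balance tolerance and streak length below are illustrative choices): checks like these pass for the vast majority of random strings, but nothing rules out an unlucky string that fails them.

```python
import random

def looks_random(bits):
    """Sanity checks that hold with probability tending to 1, but not with certainty."""
    n = len(bits)
    # Check 1: roughly balanced number of 0s and 1s (tolerance of 1% of n).
    if abs(bits.count(0) - n / 2) > 0.01 * n:
        return False
    # Check 2: no alternating streak 0101... of length about 2*log2(n).
    streak_len, run = 2 * n.bit_length(), 1
    for i in range(1, n):
        run = run + 1 if bits[i] != bits[i - 1] else 1
        if run >= streak_len:
            return False
    return True

trials, n = 200, 100_000
passes = sum(looks_random([random.randint(0, 1) for _ in range(n)]) for _ in range(trials))
print(f"{passes}/{trials} passed")  # almost always all of them, but this is not guaranteed
```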
Going back to the conjecture statement, the job of the verification algorithm $V$ is much harder for 100% than for 99%. For 100%, $V$ has to definitively tell (with the help of a certificate) whether a circuit has property $P$. Whereas for 99%, it simply has to spot (again with the help of a "certificate" of sorts) any structure at all that reveals the circuit to be non-random in a way that could cause it to have property $P$. For example, $V$ could start by checking the proportions of different types of gates, and if these differed too much from those of a random circuit, immediately reject the circuit out of hand for being "possibly structured". Footnote 6 has another example of structure that could cause a circuit to have property $P$, which seems much harder for a 100% version of $V$ to deal with.
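As a toy sketch of this kind of cheap screening (the circuit encoding, gate names, and threshold below are all made up for illustration; this is not the verifier from the paper):

```python
from collections import Counter
import math

def gate_mix_looks_random(circuit, expected_freq, tol_sigmas=6.0):
    """Reject a circuit out of hand if its gate-type proportions deviate wildly
    from those of a random circuit. Passing this check proves nothing; failing
    it flags the circuit as possibly structured."""
    counts = Counter(gate for gate, *_wires in circuit)
    m = len(circuit)
    for gate, freq in expected_freq.items():
        mean, std = m * freq, math.sqrt(m * freq * (1 - freq))
        if abs(counts[gate] - mean) > tol_sigmas * std:
            return False  # proportions too far from random: reject immediately
    return True

# Hypothetical encoding: a circuit is a list of (gate_type, *wires) tuples.
circuit = [("toffoli", 0, 1, 2), ("cnot", 3, 4), ("toffoli", 5, 6, 7)]
print(gate_mix_looks_random(circuit, {"toffoli": 0.5, "cnot": 0.5}))  # True
```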
Before reversible circuits, we first considered a simpler setting: triangle counting. The no-coincidence principle in that setting turned out to be true, but for a relatively uninteresting reason: the domain was not rich enough. Nevertheless, I think this result serves as a helpful exercise for people trying to get to grips with our definitions, as well as providing more of the story about how we ended up with our reversible circuits statement.
In the triangle counting setting, we consider the distribution over undirected 3-partite graphs on 3 groups of $n$ vertices obtained by taking each of the $3n^2$ possible edges to be present independently with probability $p$. We take $1/n \ll p \ll 1$, so there are many more edges than vertices, but far fewer than the total number of possible edges.
For a randomly sampled graph $G$, one can check that the number of triangles in $G$ has mean $\mu = n^3p^3$ and some variance $\sigma^2$ (which can also be computed explicitly). So if $G$ has at least $\mu + Cn\sigma$ triangles (for a suitable constant $C$), then we consider this to be an "outrageous coincidence", since this exceeds the mean by $Cn$ standard deviations. (This is "outrageous" because if the number of triangles were normally distributed with this mean and variance, then the probability of exceeding this threshold for any of the $2^{3n^2}$ possible graphs would tend to zero.) This motivates the following statement, which turns out to be true:
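For concreteness, here is a small simulation of this setup (the values $n = 60$ and $p = n^{-1/2}$ are illustrative choices satisfying $1/n \ll p \ll 1$): a triangle uses one vertex from each group, so the triangle count is a cyclic trace of the three bipartite adjacency matrices, and it concentrates around the mean $n^3p^3$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
p = n ** -0.5  # any p with 1/n << p << 1; this particular choice is illustrative

# Bipartite adjacency matrices between the three groups of n vertices.
A = (rng.random((n, n)) < p).astype(int)  # group 1 -- group 2
B = (rng.random((n, n)) < p).astype(int)  # group 2 -- group 3
C = (rng.random((n, n)) < p).astype(int)  # group 3 -- group 1

# (ABC)_{ii} counts the triangles through vertex i of group 1, so the total is the trace.
triangles = int(np.trace(A @ B @ C))
print(f"observed: {triangles}, mean n^3 p^3 ≈ {n**3 * p**3:.0f}")
```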
Proposition (No-coincidence principle for triangle counting). For any $n$, there is a linear-time verification algorithm $V$ that receives as input:

- a graph $G$ sampled as above, and
- an advice string $\pi$,

such that:

- if $G$ has at least $\mu + Cn\sigma$ triangles (an "outrageous" number, as above), then there is some $\pi$ for which $V(G, \pi) = 1$; and
- for a random $G$, with probability tending to 1, there is no $\pi$ for which $V(G, \pi) = 1$.
(Note that the result would be trivial if we allowed $V$ to run in polynomial time, since it could then directly count the number of triangles, so we need to be more fine-grained than that.)
Dmitry Vaintrob and Aryan Bhatt were able to generalize this result to polygon counting (note that we consider graphs on $k$ groups of vertices arranged in a cycle, not $k$-partite graphs in general, so there are $kn^2$ possible edges, not $\binom{k}{2}n^2$) and to permanents. So we concluded that neither of these settings seemed to be rich enough to produce an interesting no-coincidence principle (even though permanents are #P-hard to compute), and moved on to considering circuits instead.
Hint for polygon counting:
Write the number of polygons as a cyclic trace $\operatorname{tr}(A_1 A_2 \cdots A_k)$, where $A_i$ is the bipartite adjacency matrix between the $i$-th and $(i+1)$-th groups of vertices.
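Concretely, with $A_i$ as above, the count can be computed as follows (a minimal sketch):

```python
import numpy as np

def count_polygons(mats):
    """Count k-gons using one vertex from each of k cyclically arranged groups,
    where mats[i] is the bipartite adjacency matrix between group i and group i+1 (mod k)."""
    prod = np.eye(mats[0].shape[0], dtype=int)
    for M in mats:
        prod = prod @ M
    return int(np.trace(prod))  # the cyclic trace tr(A_1 A_2 ... A_k)

rng = np.random.default_rng(0)
k, n, p = 5, 30, 0.2
mats = [(rng.random((n, n)) < p).astype(int) for _ in range(k)]
print(count_polygons(mats))
```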
It sounds like we are not that far apart here. We've been doing some empirical work on toy systems to try to make the leap from mechanistic interpretability "stories" to semi-formal heuristic explanations. The max-of-k draft is an early example of this, and we have more ambitious work in progress along similar lines. I think of this work in a similar way to you: we are not trying to test empirical assumptions (in the way that some empirical work on frontier LLMs is, for example), but rather to learn from the process of putting our ideas into practice.
For those who are interested in the mathematical details, but would like something more accessible than the paper itself, see this talk I gave about the paper:
Yes, by "unconditionally" I meant "without an additional assumption". I don't currently see why the Reduction-Regularity assumption ought to be true (I may need to think about it more).