Jacob_Hilton

Yes, by "unconditionally" I meant "without an additional assumption". I don't currently see why the Reduction-Regularity assumption ought to be true (I may need to think about it more).

Thanks for writing this up! Your "amplified weak" version of the conjecture (with complexity bounds increasing exponentially in 1/ε) seems plausible to me. So if you could amplify the original (weak) conjecture to this unconditionally, it wouldn't significantly decrease my credence in the principle. But it would be nice to have this bound on what the dependence on ε would need to be.

The statements are equivalent if only a tiny fraction (tending to 0) of random reversible circuits satisfy P. We think this is very likely to be true, since it is a very weak consequence of the conjecture that random (sufficiently deep) reversible circuits are pseudorandom permutations. If it turned out not to be true, it would no longer make sense to think of P as an "outrageous coincidence", and so I think we would have to abandon the conjecture. So in short, we are happy to consider either version (though I agree that "for which P is false" is a bit more natural).

The hope is to use the complexity of the statement rather than mathematical taste.

If it takes me 10 bits to specify a computational possibility that ought to happen 1% of the time, then we shouldn't be surprised to find around 10 (~1% of 2^10) occurrences. We don't intend the no-coincidence principle to claim that these should all happen for a reason.
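
To spell out the back-of-the-envelope count behind the previous paragraph (my own arithmetic, only making the numbers explicit):

$$2^{10} \times 1\% \approx 10 \text{ expected occurrences},$$

so finding roughly 10 such coincidences is exactly what chance predicts, and none of them individually calls for a reason.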

Instead, we intend the no-coincidence principle to claim that if such coincidences happen much more often than we would have expected them to by chance, then there is a reason for that. Or put another way: if we applied k bits of selection to the statement of a coincidence whose probability is far smaller than 2^(-k), then there is a reason for it. (Hopefully the "outrageous" qualifier helps to indicate this, although we don't know whether Gowers meant quite the same thing as us.)

The formalization reflects this distinction: the property P is chosen to be so unlikely that we wouldn't expect it to happen for any circuit at all by chance, not merely that we wouldn't expect it to happen for a single random circuit. Hence, by the informal principle, there ought to be a reason for any occurrence of property P.
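
As a rough sketch of the counting behind this choice (a union-bound heuristic of my own; δ and N are placeholder symbols rather than notation from the post): if a single random circuit satisfies P with probability at most δ, and there are N circuits of the relevant size, then the expected number of circuits satisfying P by chance is at most

$$N \cdot \delta,$$

and P is chosen so that this product tends to zero, rather than merely δ itself.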

For the informal no-coincidence principle, it's important to us (and to Gowers IIUC) that a "reason" is not necessarily a proof, but could instead be a heuristic argument (in the sense of this post). We agree there are certainly apparently outrageous coincidences that may not be provable, such as Chebyshev's bias (discussed in the introduction to the post). See also John Conway's paper On Unsettleable Arithmetical Problems for a nice exposition of the distinction between proofs and heuristic arguments (he uses the word "probvious" for a statement with a convincing heuristic argument).

Correspondingly, our formalization doesn't bake in any sort of proof system. The verification algorithm V only has to correctly distinguish circuits that might satisfy property P from random circuits, using the advice string π – it doesn't necessarily have to interpret π as a proof and verify its correctness.

It's not, but I can understand your confusion, and I think the two are related. To see the difference, suppose hypothetically that 11% of the first million digits in the decimal expansion of π were 3s. Inductive reasoning would say that we should expect this pattern to continue. The no-coincidence principle, on the other hand, would say that there is a reason (such as a proof or a heuristic argument) for our observation, which may or may not predict that the pattern will continue. But if there were no such reason and yet the pattern continued, then the no-coincidence principle would be false, whereas inductive reasoning would have been successful.
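
For concreteness, here is a small script (my own illustration, not part of the original discussion) that carries out the hypothetical observation: compute many digits of π and measure the frequency of 3s. In reality the frequency comes out close to 10%, which is why the 11% scenario would be so striking. The constant N_DIGITS is an arbitrary choice of mine, set to 100,000 rather than a million to keep the runtime short.

```python
# Measure the frequency of the digit 3 in the decimal expansion of pi.
from mpmath import mp, nstr

N_DIGITS = 100_000          # bump to 1_000_000 to match the thought experiment
mp.dps = N_DIGITS + 10      # working precision, with a few guard digits

pi_value = +mp.pi                          # force evaluation at this precision
pi_string = nstr(pi_value, N_DIGITS + 1)   # "3.14159..." with N_DIGITS decimals
fractional = pi_string.split(".")[1][:N_DIGITS]

count_of_3s = fractional.count("3")
print(f"frequency of 3s: {count_of_3s / len(fractional):.4%}")
```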

So I think one can view the no-coincidence principle as a way to argue in favor of induction (in the context of formally-defined phenomena): when there is a surprising pattern, the no-coincidence principle says that there is a reason for it, and this reason may predict that the pattern will continue (although we can't be sure of this until we find the reason). Interestingly, one could also use induction to argue in favor of the no-coincidence principle: we can usually find reasons for apparently outrageous coincidences in mathematics, so perhaps they always exist. But I don't think they are the same thing.

Good question! We also think that NP ≠ co-NP. The difference between 99% (our conjecture) and 100% (NP = co-NP) is quite important, essentially because 99% of random objects "look random", but not 100%. For example, consider a uniformly random bit string x of some large length n. We can quite confidently say things like: the number of 0s in x is close to n/2 (say, between 0.49n and 0.51n); there is no streak of (say) 10 log₂(n) alternating 0s and 1s; etc. But these only hold with 99% confidence (more precisely, with probability tending to 1), not 100%.
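
A quick simulation of the flavour of statement I mean (my own sketch; the length n and the thresholds are arbitrary choices of mine):

```python
# Sample a uniformly random bit string and check two "looks random" properties:
# the fraction of 0s is close to 1/2, and there is no very long alternating
# streak 0101... Each holds with probability tending to 1, but not always.
import math
import random

n = 1_000_000
x = [random.getrandbits(1) for _ in range(n)]

zeros = x.count(0)
print(f"fraction of 0s: {zeros / n:.4f}")   # very close to 0.5

longest, current = 1, 1
for i in range(1, n):
    current = current + 1 if x[i] != x[i - 1] else 1
    longest = max(longest, current)

# The longest alternating streak is typically around log2(n) ~ 20 here,
# far below a threshold like 10 * log2(n) ~ 200.
print(f"longest alternating streak: {longest}")
```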

Going back to the conjecture statement, the job of the verification algorithm V is much harder for 100% than for 99%. For 100%, V has to definitively tell (with the help of a certificate) whether a circuit has property P. Whereas for 99%, it simply has to spot (again with the help of a "certificate" of sorts) any structure at all that reveals the circuit to be non-random in a way that could cause it to have property P. For example, V could start by checking the proportions of different types of gates, and if these differed too much from a random circuit, immediately reject the circuit out of hand for being "possibly structured". Footnote 6 has another example of structure that could cause a circuit to have property P, which seems much harder for a 100%-V to deal with.
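
To make the "reject out of hand" step concrete, here is a toy sketch of what such a first-pass check might look like. Everything in it — the gate encoding, the gate set, the threshold, the function name — is hypothetical and purely illustrative; it is not the verification algorithm from the post, just the simplest possible example of spotting non-random structure.

```python
# Toy first-pass check for a 99%-style verifier: compare observed gate-type
# proportions against those of a random circuit, and reject circuits whose
# proportions deviate by too many standard deviations as "possibly structured".
import math
import random
from collections import Counter

def looks_random_gate_mix(gates, expected_freqs, z_threshold=6.0):
    """Return False if any gate type's count deviates from its expectation
    by more than z_threshold standard deviations."""
    m = len(gates)
    counts = Counter(gates)
    for gate_type, q in expected_freqs.items():
        mean = m * q
        std = math.sqrt(m * q * (1 - q))
        if abs(counts.get(gate_type, 0) - mean) > z_threshold * std:
            return False   # reject out of hand
    return True

# Hypothetical gate mix for a "random" circuit on this gate set.
expected = {"TOFFOLI": 0.5, "CNOT": 0.25, "NOT": 0.25}
random_circuit = random.choices(list(expected), weights=list(expected.values()), k=10_000)
structured_circuit = ["NOT"] * 10_000   # an obviously non-random gate mix

print(looks_random_gate_mix(random_circuit, expected))       # almost always True
print(looks_random_gate_mix(structured_circuit, expected))   # False
```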

Before reversible circuits, we first considered a simpler setting: triangle counting. The no-coincidence principle in that setting turned out to be true, but for a relatively uninteresting reason, because the domain was not rich enough. Nevertheless, I think this result serves as a helpful exercise for people trying to get to grips with our definitions, as well as providing more of the story about how we ended up with our reversible circuits statement.

In the triangle-counting setting, we consider the distribution D over undirected 3-partite graphs on 3 groups of n vertices, obtained by taking each of the 3n² possible edges (between vertices in different groups) to be present independently with probability p. We take p so that there are many more edges than vertices, but far fewer than the total number of possible edges.

For a randomly sampled G ∼ D, one can check that the number of triangles in G has mean n³p³ (there are n³ possible triangles, each present with probability p³) and an explicitly computable variance. If G has a triangle count exceeding this mean by sufficiently many standard deviations, then we consider this to be an "outrageous coincidence". (This is "outrageous" because if the number of triangles were normally distributed with this mean and variance, then the probability of such an excess for any of the 2^(3n²) possible graphs would tend to zero.) This motivates the following statement, which turns out to be true:

Proposition (No-coincidence principle for triangle counting). For any p as above, there is a linear-time verification algorithm V that receives as input:

  • A 3-partite graph G on 3 groups of n vertices, represented as a list of edges
  • An advice string π

such that:

  • For all graphs G with an outrageously large number of triangles (in the sense above), if n is sufficiently large then there exists π, with length linear in the number of edges of G, such that V(G, π) = 1.
  • For G ∼ D, the probability that there is any π with V(G, π) = 1 tends to zero.

(Note that the result would be trivial if we allowed V to run in polynomial time, since it could then directly count the number of triangles, so we need to be more fine-grained than that.)
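
As a quick numerical sanity check of the mean n³p³ mentioned above (my own simulation; the particular values of n and p are arbitrary, and the trace-of-matrix-product trick is just one convenient way to count triangles):

```python
# Sample a random 3-partite graph G ~ D and compare its triangle count with
# the predicted mean n^3 * p^3. A triangle uses one vertex from each group,
# so the count equals trace(A @ B @ C) for the three biadjacency matrices.
import numpy as np

n, p = 300, 0.05
rng = np.random.default_rng(0)

A = (rng.random((n, n)) < p).astype(np.int64)   # edges between groups 1 and 2
B = (rng.random((n, n)) < p).astype(np.int64)   # edges between groups 2 and 3
C = (rng.random((n, n)) < p).astype(np.int64)   # edges between groups 3 and 1

observed = int(np.trace(A @ B @ C))
print(f"observed triangles:  {observed}")
print(f"predicted mean n³p³: {n**3 * p**3:.0f}")
```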

Dmitry Vaintrob and Aryan Bhatt were able to generalize this result to polygon counting (note that we consider graphs on k groups of n vertices arranged in a cycle, not k-partite graphs in general, so there are kn² possible edges, not k(k−1)n²/2) and to permanents. So we concluded that neither of these settings seemed to be rich enough to produce an interesting no-coincidence principle (even though permanents are #P-hard to compute), and moved on to considering circuits instead.

Hint for polygon counting:

Write the number of polygons as a cyclic trace tr(A_1 A_2 ⋯ A_k), where A_i is the n × n biadjacency matrix between group i and group i + 1 (indices mod k).
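
Here is a small numerical check of that identity (my own sketch; the notation A_1, …, A_k for the biadjacency matrices between consecutive groups follows the hint above, and the parameters are arbitrary):

```python
# Count k-gons (one vertex per group, groups arranged in a cycle) via the
# cyclic trace tr(A_1 A_2 ... A_k), and verify against brute force.
from itertools import product
import numpy as np

k, n, p = 4, 6, 0.5
rng = np.random.default_rng(1)

# A[i] holds the edges between group i and group (i + 1) mod k.
A = [(rng.random((n, n)) < p).astype(np.int64) for _ in range(k)]

via_trace = int(np.trace(np.linalg.multi_dot(A)))

# Brute force: pick one vertex per group and require all k consecutive edges.
via_brute_force = sum(
    all(A[i][verts[i], verts[(i + 1) % k]] for i in range(k))
    for verts in product(range(n), repeat=k)
)

print(via_trace, via_brute_force)   # the two counts agree
```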

It sounds like we are not that far apart here. We've been doing some empirical work on toy systems to try to make the leap from mechanistic interpretability "stories" to semi-formal heuristic explanations. The max-of-k draft is an early example of this, and we have more ambitious work in progress along similar lines. I think of this work in a similar way to you: we are not trying to test empirical assumptions (in the way that some empirical work on frontier LLMs is, for example), but rather to learn from the process of putting our ideas into practice.

For those who are interested in the mathematical details, but would like something more accessible than the paper itself, see this talk I gave about the paper:
