LESSWRONG
LW

Comment Permalink

I think it would be much more interesting and helpful to exhibit a case of software with a vulnerability where it's really hard for someone to verify the claim that the vulnerability exists.

Conditional on such counterexamples existing, I would usually expect to not notice them. Even if someone displayed such a counterexample, it would presumably be quite difficult to verify that it is a counterexample. Therefore a lack of observation of such counterexamples is, at most, very weak evidence against their existence; we are forced to fall back on priors.

I get the impression that you have noticed the lack of observed counterexamples, and updated that counterexamples are rare, without noticing that you would also mostly not observe counterexamples even if they were common. (Though of course this is subject to the usual qualifiers about how it's difficult to guess other peoples' mental processes, you have better information than I about whether you indeed updated in such a way, etc.)

That said, if I were to actively look for such counterexamples in the context of software, the obfuscated C code competition would be one natural target.

We can also get indirect bits of evidence on the matter. For instance, we can look at jury trials, and notice that they are notoriously wildly unreliable in practice. That suggests that, relative to the cognition of a median-ish human, there must exist situations in which one lawyer can point out the problem in another's logic/evidence, and the the median-ish human will not be able verify it. Now, one could argue that this is merely because median-ish humans are not very bright (a claim I'd agree with), but then it's rather a large jump to claim that e.g. you or I is so smart that analogous problems are not common for us.

See in context

71 Verification Is Not Easier Than Generation In General

by johnswentworth

6th Dec 2022

AI Alignment Forum

1 min read

71 Ω 28

People who’ve spent a lot of time thinking about P vs NP often have the intuition that “verification is easier than generation”. It’s easier to verify a solution to some equations than to find a solution. It’s easier to verify a password than to guess it. That sort of thing. The claim that it is easier to verify solutions to such problems than to generate them is essentially the claim that P ≠ NP, a conjecture which is widely believed to be true. Thus the intuition that verification is generally easier than generation.

The problem is, this intuition comes from thinking about problems which are in NP. NP is, roughly speaking, the class of algorithmic problems for which solutions are easy to verify. Verifying the solution to some equations is easy, so that problem is in NP.

I think a more accurate takeaway would be that among problems in NP, verification is easier than generation. In other words, among problems for which verification is easy, verification is easier than generation. Rather a less impressive claim, when you put it like that.

With that in mind, here is an algorithmic problem for which generation is easier than verification.

Predicate: given a program, does it halt?
Generation problem: generate a program which halts.
Verification problem: given a program, verify that it halts.

The generation problem is trivial. The verification problem is uncomputable.

That’s it for the post, you all can argue about the application to alignment in the comment section.

Frontpage

71 Ω 28

Mentioned in

191My AI Model Delta Compared To Christiano

72Conditioning Predictive Models: Outer alignment via careful conditioning

New Comment

27 comments, sorted by

top scoring

Click to highlight new comments since: Today at 9:08 AM

[-]paulfchristiano2yΩ297428

I think most people's intuitions come from more everyday experiences like:

It's easier to review papers than to write them.
Fraud is often caught using a tiny fraction of the effort required to perpetrate it.
I can tell that a piece of software is useful for me more easily than I can write it.

These observations seem relevant to questions like "can we delegate work to AI" because they are ubiquitous in everyday situations where we want to delegate work.

The claim in this post seems to be: sometimes it's easier to create an object with property P than to decide whether a borderline instance satisfies property P. You chose a complicated example but you just as well have used something very mundane like "Make a pile of sand more than 6 inches tall." I can do the task by making a 12 inch pile of sand, but if someone gives me a pile of sand that is 6.0000001 inches I'm going to need very precise measurement devices and philosophical clarification about what "tall"means.

I don't think this observation undermines the claim that "it is easier to verify that someone has made a tall pile of sand than to do it yourself." If someone gives me a 6.000001 inch tall pile of sand I can say "could you make it taller?" And if I can ask for a program that halts and someone gives me a program that looks for a proof of false in PA, I can just say "try again."

I do think there are plenty of examples where verification is not easier than generation (and certainly where verification is non-trivial). It's less clear what the relevance of that is.

[-]johnswentworth2yΩ122710

I don't think the generalization of the OP is quite "sometimes it's easier to create an object with property P than to decide whether a borderline instance satisfies property P". Rather, the halting example suggests that verification is likely to be harder than generation specifically when there is some (possibly implicit) adversary. What makes verification potentially hard is the part where we have to quantify over all possible inputs - the verifier must work for any input.

Borderline cases are an issue for that quantifier, but more generally any sort of adversarial pressure is a potential problem.

Under that perspective, the "just ask it to try again on borderline cases" strategy doesn't look so promising, because a potential adversary is potentially optimizing against me - i.e. looking for cases which will fool not only my verifier, but myself.

As for the everyday experiences you list: I agree that such experiences seem to be where peoples' intuitions on the matter often come from. Much like the case in the OP, I think people select for problems for which verification is easy - after all, those are the problems which are most legible, easiest to outsource (and therefore most likely to be economically profitable), etc. On the other hand, once we actively look for cases where adversarial pressure makes verification hard, or where there's a "for all" quantifier, it's easy to find such cases. For instance, riffing on your own examples:

It's only easier to review papers than to write them because reviewers do not actually need to catch all problems. If missing a problem in a paper review resulted in a death sentence, I expect almost nobody would consider themselves competent to review papers.
Likewise with fraud: it's only easier to catch than to perpetrate if we're not expected to catch all of it, or even most of it.
It's easier to write secure software than to verify that a piece of software is secure.

[-]paulfchristiano2yΩ11192

If including an error in a paper resulted in a death sentence, no one would be competent to write papers either.
For fraud, I agree that "tractable fraud has a meaningful probability of being caught," and not "tractable fraud has a very high probability of being caught." But "meaningful probability of being caught" is just what we need for AI delegation.
Verifying that arbitrary software is secure (even if it's actually secure) is much harder than writing secure software. But verifiable and delegatable work is still extremely useful for the process of writing secure software.

To the extent that any of these problems are hard to verify, I think it's almost entirely because of the "position of the interior" where an attacker can focus their effort on hiding an attack in a single place but a defender needs to spread their effort out over the whole attack surface.

But in that case we just apply verification vs generation again. It's extremely hard to tell if code has a security problem, but in practice it's quite easy to verify a correct claim that code has a security problem. And that's what's relevant to AI delegation, since in fact we will be using AI systems to help oversee in this way.

If you want to argue that e.g. writing secure software is fundamentally hard to verify, I think it would be much more interesting and helpful to exhibit a case of software with a vulnerability where it's really hard for someone to verify the claim that the vulnerability exists.

Rather, the halting example suggests that verification is likely to be harder than generation specifically when there is some (possibly implicit) adversary.

Rice's theorem says there are a lot of programs where you can't tell if they will halt. But if I want to write a program that will/won't halt, I'm just going to write a program for which it's obvious. And if I asked you to write a program that will/won't halt and you write the kind of program where I can't tell, I'm just going to send it back.

Now that could still be hard. You could put a subtle problem in your code that makes it so it halts eventually even though it looks like it obviously doesn't. But Rice's theorem doesn't say anything about that.

And reiterating the previous point, if someone wrote down a program that looks like it obviously doesn't halt, but secretly it does because of an adversarial trick, then I would very strongly expect that someone could point out my mistake to me and I would conclude that it is no longer obvious whether it halts. Counterexamples to this kind of optimism would be way more impactful.

[-]Joachim Bartosik2y30

But in that case we just apply verification vs generation again. It's extremely hard to tell if code has a security problem, but in practice it's quite easy to verify a correct claim that code has a security problem. And that's what's relevant to AI delegation, since in fact we will be using AI systems to help oversee in this way.

I know you said that you're not going to respond but in case you feel like giving a clarification I'd like to point out that I'm confused here.

Yes it usually easy to verify that a specific problem exists if the exact problem is pointed out to you[1].

But it's much harder to verify claim that there are no problems, this code is doing exactly what you want.

And AFAIK staying in a loop:

1) AI tells us "here's a specific problem"

2) We fix the problem then

3) Go back to step 1)

Doesn't help with anything? We want to be in a state where AI says "This is doing exactly what you want" and we have reasons to trust that (and that is hard to verify).

EDIT to add: I think I didn't make it clear enough what clarification I'm asking for.

Do you think it's possible to use AI which will point out problems (but which we can't trust when it says everything is ok) to "win"? It would be very interesting if you did and I'd love to learn more.
Do you think that we could trust AI when it says that everything is ok? Again that'd be very interesting.
Did I miss something? I'm curious to learn what but that's just me being wrong (but that's not new path to win interesting).

Also it's possible that there are two problems, each problem is easy to fix on its own but it's really hard to fix them both at the same time (simple example: it's trivial to have 0 false positives or 0 false negatives when testing for a disease; it's much harder to eliminate both at the same time).

[1] Well it can be hard to reliably reproduce problem, even if you know exactly what the problem is (I know because I couldn't write e2e tests to verify some bug fixes).

[-]johnswentworth2yΩ11-2

I think it would be much more interesting and helpful to exhibit a case of software with a vulnerability where it's really hard for someone to verify the claim that the vulnerability exists.

That said, if I were to actively look for such counterexamples in the context of software, the obfuscated C code competition would be one natural target.

[-]Dweomite2y92

For instance, we can look at jury trials, and notice that they are notoriously wildly unreliable in practice. That suggests that, relative to the cognition of a median-ish human, there must exist situations in which one lawyer can point out the problem in another's logic/evidence, and the the median-ish human will not be able verify it.

This is something of a tangent, but juries' unreliability does not particularly suggest that conclusion to me. I immediately see three possible reasons for juries to be unreliable:

The courts may not reliably communicate to juries the criteria by which they are supposed to decide the case
The jurors may decide to ignore the official criteria and do something else instead
The jurors may know the official criteria and make a sincere attempt to follow them, but fail in some way

You're supposing that the third reason dominates. I haven't made a serious study of how juries work in practice, but my priors say the third reason is probably the least significant, so this is not very convincing to me.

(I also note that you'd need to claim that juries are inconsistent relative to the lawyers' arguments, not merely inconsistent relative to the factual details of the case, and it's not at all obvious to me that juries' reputation for unreliability is actually controlled in that way.)

[-]paulfchristiano2yΩ791

Conditional on such counterexamples existing, I would usually expect to not notice them. Even if someone displayed such a counterexample, it would presumably be quite difficult to verify that it is a counterexample. Therefore a lack of observation of such counterexamples is, at most, very weak evidence against their existence; we are forced to fall back on priors.

You can check whether there are examples where it takes an hour to notice a problem, or 10 hours, or 100 hours... You can check whether there are examples that require lots of expertise to evaluate. And so on. the question isn't whether there is some kind of magical example that is literally impossible to notice, it's whether there are cases where verification is hard relative to generation!
You can check whether you can generate examples, or whether other people believe that they can generate examples. The question is about whether a slightly superhuman AI can find examples, not whether they exist (and indeed whether they exist is more unfalsifiable, not because of the difficulty of recognizing them but because of the difficulty of finding them).
You can look for examples in domains where the ground truth is available. E.g. we can debate about the existence of bugs or vulnerabilities in software, and then ultimately settle the question by running the code and having someone demonstrate a vulnerability. If Alice claims something is a vulnerability but I can't verify her reasoning, then she can still demonstrate that it was correct by going and attacking the system.
I've looked at e.g. some results from the underhanded C competition and they are relatively easy for laypeople to recognize in a short amount of time when the attack is pointed out. I have not seen examples of attacks that are hard to recognize as plausible attacks without significant expertise or time, and I am legitimately interested in them.

I'm bowing out here, you are welcome to the last word.

[-]Dweomite2y32

What makes verification potentially hard is the part where we have to quantify over all possible inputs - the verifier must work for any input.

I feel like it should be possible to somehow factor this discussion into two orthogonal claims, where one of the claims is something like "doing something for all inputs is harder than doing it for just one input" and the other is something like "it is harder to identify one particular example as noteworthy than it is to verify the noteworthiness of that particular example".

And it seems likely to me that both claims are true if you separate them like that.

[-]Vanessa Kosoy2yΩ203922

deserves a little more credit than you give it. To interpret the claim correctly, we need to notice $P$ and $N P$ are classes of decision problems, not classes of proof systems for decision problems. You demonstrate that for a fixed proof system it is possible that generating proofs is easier than verifying proofs. However, if we fix a decision problem and allow any valid (i.e. sound and complete) proof system, then verifying cannot be harder than generating. Indeed, let $S_{1}$ be some proof system and $A$ an algorithm for generating proofs (i.e. an algorithm that finds a proof if a proof exists and outputs "nope" otherwise). Then, we can construct another proof system $S_{2}$ , in which a "proof" is just the empty string and "verifying" a proof for problem instance $x$ consists of running $A (x)$ and outputting "yes" if it found an $S_{1}$ -proof and "no" otherwise. Hence, verification in $S_{2}$ is no harder than generation in $S_{1}$ . Now, so far it's just $P \subseteq N P$ , which is trivial. The non-trivial part is: there exist problems for which verification is tractable (in some proof system) while generation is intractable (in any proof system). Arguably there are even many such problems (an informal claim).

[-]hold_my_fish2y1418

I'm afraid that this take is incredibly confused, so much that it's hard to know where to start with correcting it.

Maybe the most consequential error is the misunderstanding of what "verify" means in this context. It means "checking a proof of a solution" (which in the case of a decision problem in NP would be a proof of a "yes" answer). In a non-mathematical context, you can loosely think of "proof" as consisting of reasoning, citations, etc.

That's what went wrong with the halting problem example. The generator did not support their claim that the program halts. If they respond to this complaint by giving us a proof that's too hard, we can (somewhat tautologically) ensure that our verifier job is easy by sending back any program+proof pair where the proof was too hard to verify.

[-]localdeity2y136

The most trivial programs that halt are also trivially verifiable as halting, though. Yes, you can't have a fully general verification algorithm, but e.g. "does the program contain zero loops or recursive calls" covers the most trivial programs (for my definition of "trivial", anyway). I think this example of yours fails to capture the essence—which I think is supposed to be "someone easily spewing out lots of incomprehensible solutions to a problem".

In fact, more generally, if there is a trivial algorithm to generate solutions, then this immediately maps to an algorithm for verifying trivial solutions: run the trivial algorithm for a reasonable number of steps (a number such that the generation would no longer be called "trivial" if it took that long) and see if the proposed solution appears among them. The hardest part is knowing what that algorithm was—but by assumption the algorithm was trivial, which puts some limits on how hard finding it can be.^[1]

The main path that might still qualify as "trivially generating lots of solutions that are hard to verify by someone who has some notion of how the generation works" is if the generation process includes randomness. E.g. make a comprehensible program, then insert a bunch of trash and permute things around. Or, say, have a number maze and the question is "Is there a path through these numbers such that sha256(those numbers) = 0xabcde?"—and the person who made the maze did so by picking random numbers, computing their sha256, putting them into a maze, and filling the rest of the maze with more random numbers. I probably would have recommended that example.^[2]

This brings it back to NP, incidentally: if the generation process was trivial, then it must have taken a "reasonable" amount of time; and if it used N bits of randomness, then a verifier has 2^N possibilities to try, and if it guesses right the first time (a nondeterministic machine) then it terminates quickly. But this time the verification process is NP, while the generation process is P.

^{^}
Though if we concern ourselves with not "trivial" algorithms but rather "algorithms that aren't too complicated"—which might be approximated by "algorithms specified in under N characters"—then there are exponentially many algorithms to sort through.
^{^}
Or, a version that meets more rigorous constraints: The predicate is, "Is this an n x k grid of bytes such that there is a nonrepeating path through the grid, such that sha256(those bytes), xored with the top row of bytes, yields zero?" The generation algorithm is to pick random bytes, sha256 them, draw a random path in a grid and put them there, fill out the rest of the grid with random bytes, and put the sha256 result as the top row of bytes.

[-]Richard_Kennaway2y93

It's not clear to me what the slogan is intended to mean, and the example only confuses me further. In the first paragraph of the OP, "generate" appears to mean "find a solution to a given problem". In the supposed counterexample, it means "find a problem that has a given solution". These are very different things. The former task is hard for problems in NP–P, the latter task is typically trivial.

[-]Richard_Kennaway2y50

"Verification" is also vague.

"Does this program compute this function?" is a hard problem — unsolvable.

"Is this a proof that this program computes this function?" is an easy problem — primitive recursive in any reasonable encoding scheme, maybe of near-linear complexity.

Both of these might be described as "verification".

[-]TekhneMakre2y92

E.g. "does this plan avoid having a steganographically encoded world-ending message hidden in it" is more co-NPish than NP-ish. Like, just showing me the plan doesn't make it easy to verify that there isn't a hidden message, even if there isn't a hidden message. Checking whether a sequence of actions is the first half of a strategy to take over the world is potentially more like P-space.

[-]Closed Limelike Curves2y73

I think there's a fundamental asymmetry in the case you mentioned--it's not verifying whether a program halts that's difficult, it's writing an algorithm that can verify whether any program halts. In other words, the problem is adversarial inputs. To keep the symmetry, we'd need to say that the generation problem is "generate all computer programs that halt," which is also not possible.

I think a better example would be, how hard is it to generate a semiprime? Not hard at all: just generate 2 primes and multiply them. How hard is it to verify a number is semiprime? Very hard, you'd have to factor it.

[-]Lao Mein2y60

I think this is kinda like the no free lunch theorem - it sounds kinda profound but relies on using the set of all problems/data distributions, which is very OOD from any problem that originates in physical reality.

What examples of practical engineering problems actually have a solution that is harder to verify than to generate?

And I thought uncomputability is only true for programs that don't halt? A blank program can definitely be verified to halt.

[-]Joachim Bartosik2y177

What examples of practical engineering problems actually have a solution that is harder to verify than to generate?

My intuition says that we're mostly engineering to avoid problems like that, because we can't solve them by engineering. Or use something other than engineering to ensure that problem is solved properly.

For example most websites don't allow users to enter plain html. Because while it's possible to write non-harmful html it's rather hard to verify that a given piece of html is indeed harmless. Instead sites allow something like markdown or visual editors which make it much easier to ensure that user-generated content is harmless. (that's example of engineering to avoid having to verify something that's very hard to verify)

Another example is that some people in fact can write html for those websites. In many places there is some process to try and verify they're not doing anything harmful. But those largely depend on non-engineering to work (you'll be fired and maybe sued if you do something harmful) and the parts that are engineering (like code reviews) can be fooled because they rely on assumption of your good intent to work (I think; I've never tried to put harmful code in any codebase I've worked with; I've read about people doing that).

[-]johnswentworth2y20

I endorse this answer.

[-]Richard_Kennaway2y20

And I thought uncomputability is only true for programs that don't halt? A blank program can definitely be verified to halt.

For any single program, the question "does this program halt?" is computable, for a trivial reason. Either the program that prints "Yes" or the program that prints "No" correctly answers the question, although we might not know which one. The uncomputability of the halting problem is the fact that there is no program that can take as input any program whatever and say whether it halts.

[-]neveronduty2y30

I am exactly in this post's target audience. Epic post.

[-]drocta2y30

First, I want to summarize what I understand to be what your example is an example of:
"A triple consisting of
1) A predicate P
2) the task of generating any single input x for which P(x) is true
3) the task of, given any x (and given only x, not given any extra witness information), evaluating whether P(x) is true
"

For such triples, it is clear, as your example shows, that the second task (the 3rd entry) can be much harder than the first task (the 2nd entry).

_______

On the other hand, if instead one had the task of producing an exhaustive list of all x such that P(x), this, I think, cannot be easier that verifying whether such a list is correct (provided that one can easily evaluate whether x=y for whatever type x and y come from), as one can simply generate the list, and check if it is the same list.

Another question that comes to mind is: Are there predicates P such that the task of verifying instances which can be generated easily, is much harder than the task of verifying those kinds of instances?

It seems that the answer to this is also "yes": Consider P to be "is this the result of applying this cryptographic hash function to (e.g.) a prime number?". It is fairly easy to generate large prime numbers, and then apply the hash function to it. It is quite difficult to determine whether something is the hash of a prime number (... maybe assume that the hash function produces more bits for longer inputs in order to avoid having pigeonhole principle stuff that might result in it being highly likely that all of the possible outputs are the hash of some prime number. Or just put a bound on how big the prime numbers can be in order for P to be true of the hash.)

(Also, the task of "does this machine halt", given a particular verifying process that only gets the specification of the machine and not e.g. a log of it running, should probably(?) be reasonably easy to produce machines that halt but which that particular verifying process will not confirm that quickly.
So, for any easy-way-to-verify there is an easy-way-to-generate which produces ones that the easy-way-to-verify cannot verify, so that seems to be another reason why "yes", though, there may be some subtleties here?)

[-]Aprillion2y30

TypeError: Comparing different "solutions".

How do I know that I generated a program that halts?

a) I can prove to myself that my program halts => the solution consists of both the program and the proof => the verification problem is a sub-problem of the generation problem.

b) I followed a trusted process that is guaranteed to produce valid solutions => the solution consists of both the program and the history that generated the proof of the process => if the history is not shared between the 2 problems, then you redefined "verification problem" to include generation of all of the necessary history, and that seems to me like a particularly useless point of view (for the discussion of P vs NP, not useless in general).

In the latter point of view, you could say:

Predicate: given a set of numbers, is the first the sum of the latter 2?
Generation problem: provide an example true solution: "30 and prime factors of 221"
Verification problem: verify that 30 is the sum of prime factors of 221

WTF does that have to say about P vs NP? ¯\_(ツ)_/¯

[-][anonymous]2y30

Predicate: given a program, does it halt?
Generation problem: generate a program which halts.
Verification problem: given a program, verify that it halts.

In this example, I'm not sure that generation is easier than verification in the most relevant sense.

In AI alignment, we have to verify that some instance of a class satisfies some predicate. E.g. is this neural network aligned, is this plan safe, etc. But we don't have to come up with an algorithm that verifies if an arbitrary example satisfies some predicate.

The above example is only relevant if there is some program P that halts, where it's easier to generate P than to verify that P halts. If that's the case, then it there might be some dangerous output O, where it's easier for the AI to generate O than it is for us to verify O.

[-]quetzal_rainbow2y30

Generally, statement "solutions of complex problems are easy to verify" is false. Your problem can be EXPTIME-complete, but not in NP, especially if NP=P, because EXPTIME-complete problems are strictly not in P.

And even if some problem is NP-problem, we often don't know verification algorithm.

[-]ryan_b2y20

Would it be correct to consider things like proof by contradiction and proof by refutation as falling on the generation side, as they both rely on successfully generating a counterexample?

Completely separately, I want to make an analogy to notation in the form of the pi vs tau debate. Short background for those who don't want to wade through the link (though I recommend it, it is good fun): pi, the circle constant, is defined as the ratio between the diameter of a circle and its circumference; tau is defined as the ratio between the radius of a circle and its circumference. Since the diameter is twice the radius, tau is literally just 2pi, but our relentless habit of pulling out or reducing away that 2 in equations makes everything a little harder and less clear than it should be.

The bit which relates to this post is that it turns out that pi is the number of measurement. If we were to encounter a circle in the wild, we could characterize the circle with a length of string by measuring the circle around (the circumference) and at its widest point (the diameter). We cannot measure the radius directly. By contrast, tau is the number of construction: if we were to draw a circle in the wild, the simplest way is to take two sticks secured together at an angle (giving the radius between the two points), hold one point stationary and sweep the other around it one full turn (the circumference).

Measurement is a physical verification process, so I analogize pi to verification. Construction is the physical generation process, so I analogize tau to generation.

I'm on the tau side in the debate, because cost of adjustment is small and the clarity gains are large. This seems to imply that tau, as notation, captures the circle more completely than pi does. My current feeling, based on a wildly unjustified intuitive leap, is that this implies generation would be the more powerful method, and therefore it would be "easier" to solve problems within its scope.

[-]Towards_Keeperhood2y20

(Not sure if I'm missing something, but my initial reaction:)

There's a big difference between being able to verify for some specific programs if they have a property, and being able to check it for all programs.

For an arbitrary TM, we cannot check whether it outputs a correct solution to a specific NP complete problem. We cannot even check that it halts! (Rice's theorem etc.)

Not sure what alignment relevant claim you wanted to make, but I doubt this is a valid argument for it.

[-]Towards_Keeperhood2y10

Likewise, for some specific programs we can verify that they halt.

Moderation Log