People who’ve spent a lot of time thinking about P vs NP often have the intuition that “verification is easier than generation”. It’s easier to verify a solution to some equations than to find a solution. It’s easier to verify a password than to guess it. That sort of thing. The claim that it is easier to verify solutions to such problems than to generate them is essentially the claim that P ≠ NP, a conjecture which is widely believed to be true. Thus the intuition that verification is generally easier than generation.
The problem is, this intuition comes from thinking about problems which are in NP. NP is, roughly speaking, the class of algorithmic problems for which solutions are easy to verify. Verifying the solution to some equations is easy, so that problem is in NP.
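To make the NP intuition concrete, here is a sketch that uses subset sum as a stand-in example. Subset sum is my choice, not the post's, which mentions equations and passwords. Checking a proposed answer is one pass over it, while the naive search tries up to 2^n subsets. Nobody has proved that the search can't be done in polynomial time; that is exactly the open P vs NP question.

```python
from itertools import combinations


def verify(nums, target, certificate):
    # Verification: a single cheap pass. The certificate is a list of
    # distinct, in-range indices whose elements should sum to target.
    return (
        len(set(certificate)) == len(certificate)
        and all(0 <= i < len(nums) for i in certificate)
        and sum(nums[i] for i in certificate) == target
    )


def generate(nums, target):
    # Generation by naive search: tries every subset of indices,
    # up to 2**len(nums) of them, and returns the first that works.
    for r in range(len(nums) + 1):
        for subset in combinations(range(len(nums)), r):
            if sum(nums[i] for i in subset) == target:
                return list(subset)
    return None
```

For example, with `nums = [3, 34, 4, 12, 5, 2]` and `target = 9`, `generate` finds `[2, 4]` (4 + 5), and `verify` confirms it immediately.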
I think a more accurate takeaway would be that among problems in NP, verification is easier than generation. In other words, among problems for which verification is easy, verification is easier than generation. Rather a less impressive claim, when you put it like that.
With that in mind, here is an algorithmic problem for which generation is easier than verification.
Predicate: given a program, does it halt?
Generation problem: generate a program which halts.
Verification problem: given a program, verify that it halts.
The generation problem is trivial. The verification problem is uncomputable.
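The asymmetry can be sketched in a toy model. It assumes, purely for illustration, that a "program" is a Python generator function and that each `yield` is one computation step. Producing a halting program takes one line. The best a general-purpose verifier can do is run the program for some budget. That check is one-sided: it can confirm halting, but when the budget runs out the answer is only "unknown". By Turing's result, no choice of budget turns this into a verifier that works for every program.

```python
from typing import Callable, Iterator, Optional

# Toy model (an assumption): a program is a generator function,
# and each `yield` counts as one step of computation.
Program = Callable[[], Iterator[None]]


def generate_halting_program() -> Program:
    # Generation is trivial: a program that does nothing and stops.
    def program() -> Iterator[None]:
        return
        yield  # unreachable; its presence makes this a generator function
    return program


def loops_forever() -> Iterator[None]:
    # A program that never halts.
    while True:
        yield


def verify_halts(program: Program, max_steps: int) -> Optional[bool]:
    # Resume the program at most max_steps times.
    # True  -> it halted within the budget, so halting is verified.
    # None  -> unknown: it may halt later, or never.
    it = program()
    for _ in range(max_steps):
        try:
            next(it)
        except StopIteration:
            return True
    return None
```

Here `verify_halts(generate_halting_program(), 10)` returns `True`, while `verify_halts(loops_forever, 1000)` returns `None`. The verifier can't tell that program apart from one that would halt on step 1001.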
That’s it for the post; you all can argue about the application to alignment in the comment section.
I know you said that you're not going to respond, but in case you feel like giving a clarification, I'd like to point out that I'm confused here.
Yes, it's usually easy to verify that a specific problem exists if the exact problem is pointed out to you[1].
But it's much harder to verify the claim that there are no problems, i.e. that this code is doing exactly what you want.
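A small sketch of that asymmetry, using a made-up function and spec (both hypothetical, for illustration only). Confirming a reported bug is one evaluation. Confirming there are no bugs means covering the whole input domain, and any partial check can miss the one bad input.

```python
def candidate_abs(x: int) -> int:
    # Hypothetical code under review, with a bug hidden at a single input.
    if x == -123_456:
        return x
    return -x if x < 0 else x


def spec(x: int) -> int:
    # What we actually want the code to do.
    return abs(x)


def check_reported_problem(x: int) -> bool:
    # Verifying a pointed-out problem: one call, one comparison.
    return candidate_abs(x) != spec(x)


def verify_no_problems(lo: int, hi: int) -> bool:
    # Verifying "no problems" on [lo, hi] needs every input in the range;
    # real input domains are usually far too large for this.
    return all(candidate_abs(x) == spec(x) for x in range(lo, hi + 1))
```

`check_reported_problem(-123_456)` is `True` right away, whereas `verify_no_problems(-1000, 1000)` also returns `True` and so gives false reassurance, because the bug lies outside the checked range.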
And AFAIK, staying in a loop like this doesn't help with anything, right?
1) The AI tells us "here's a specific problem"
2) We fix the problem
3) Go back to step 1)
We want to be in a state where the AI says "This is doing exactly what you want" and we have reasons to trust that (and that is hard to verify).
EDIT to add: I think I didn't make it clear enough what clarification I'm asking for.
Also, it's possible that there are two problems, each of which is easy to fix on its own, but which are really hard to fix both at the same time (simple example: it's trivial to have 0 false positives or 0 false negatives when testing for a disease; it's much harder to eliminate both at the same time).
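The disease-testing example can be made concrete with made-up data (the patient list below is purely illustrative). Each trivial test drives one error count to zero by giving up entirely on the other. Getting both to zero requires actually knowing who is sick.

```python
# Hypothetical ground truth for ten patients: True = has the disease.
has_disease = [True, False, False, True, False, False, False, True, False, False]


def error_counts(predictions, truth):
    # Returns (false positives, false negatives).
    false_pos = sum(p and not t for p, t in zip(predictions, truth))
    false_neg = sum(t and not p for p, t in zip(predictions, truth))
    return false_pos, false_neg


always_positive = [True] * len(has_disease)   # never misses a sick patient
always_negative = [False] * len(has_disease)  # never alarms a healthy one
```

With three sick patients out of ten, `error_counts(always_positive, has_disease)` is `(7, 0)` and `error_counts(always_negative, has_disease)` is `(0, 3)`. Only a test that reproduces `has_disease` exactly scores `(0, 0)`.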
[1] Well, it can be hard to reliably reproduce a problem, even if you know exactly what the problem is (I know because I couldn't write end-to-end (e2e) tests to verify some bug fixes).