[Some thoughts that are similar but different to my previous comment;]
I suspect you can often just prove the behavioral selection theorem and structural selection theorem in separate, almost independent steps.
Behavior essentially serves as an "interface", and a given behavior can be implemented by any number of different structures. So it would make sense that you need to prove something about structure separately (and that you can prove it for multiple different types of structural assumption).
Further claims: for any given structural class,
A structural class is something like programs, or Markov chains, or structural causal models. The point of specifying structure is to in some way model how the system might actually be shaped in real life. So it seems to me that any of these will be specified with a finite string over a finite alphabet. This comes with the natural simplicity measure of the length of the specification string, and there are exponentially fewer short strings than long ones.[1]
So let's say you want to prove that your thing X which has behavior B has specific structure S. Since structure S has a fixed description length, you almost automatically know that it's exponentially less likely for X to be one of the infinitely many structures with description length longer than S. (Something similar holds for being within delta of S) The remaining issue is whether there are any other secret structures that are shorter than S (or of similar length) that X could be instead.
Technically, you could have a subset of strings that didn't grow exponentially. For example, you could, for some reason, decide to specify your Markov chains using only strings of zeros. That would grow linearly rather than exponentially. But this is clearly a less natural specification method.
For some reason the "only if" always throws me off. It reminds me of the unless
keyword in ruby, which is equivalent to if not
, but somehow always made my brain segfault.
It's maybe also worth saying that any other description method is a subset of programs (or is incomputable and therefore not what real-world AI systems are). So if the theoretical issues in AIT bother you, you can probably make a similar argument using a programming language with no while loop, or I dunno, finite MDPs whose probability distributions are Gaussian with finite parameter descriptions.
Yeah, I think structural selection theorems matter a lot, for reasons I discussed here.
This is also one reason why I continue to be excited about Algorithmic Information Theory. Computable functions are behavioral, but programs (= algorithms) are structural! The fact that programs can be expressed in the homogeneous language of finite binary strings gives a clear way to select for structure; just limit the length of your program. We even know exactly how this mathematical parameter translates into real-world systems, because we can know exactly how many bits our ML models take up on the hard drives.
And I think you can use algorithmic information distance to well-define just how close to agent-structured your policy is. First, define the specific program A that you mean to be maximally agent-structured (which I define as a utility-maximizing program). If your policy (as a program) can be described as "Program A, but different in ways X" then we have an upper bound for how close it is to agent-structured it is. X will be a program that tells you how to transform A into your policy, and that gives us a "distance" of at most the length of X in bits.
For a given length, almost no programs act anything like A. So if your policy is only slightly bigger than A, and it acts like A, then it's probably of the form "A, but slightly different", which means it's agent-structured. (Unfortunately this argument needs like 200 pages of clarification.)
FWIW I think this would be a lot less like "tutoring" and a lot more like "paying people to tell you their opinions". Which is a fine thing to want to do, but I just want to make sure you don't think there's any kind of objective curriculum that comprises AI alignment.
Nice! Yeah I'd be happy to chat about that, and also happy to get referrals of any other researchers who might be interested in receiving this funding to work on it.
FWIW I have used Perplexity twice since you mentioned it, it was somewhat helpful both times, but also, both times the citations had errors. By that I mean it would say something and then put a citation number next to it, but what it said was not in the cited document.
Positive feedback, I am happy to see the comment karma arrows pointing up and down instead of left and right. I have some degree of left-right confusion and was always click and unclicking my comments votes to figure out which was up and down.
Also appreciate that the read time got put back into main posts.
(Comment font stuff looks totally fine to me, both before and after this change.)