Brevity of code and English can correspond via abstraction.
I don't know why brevity in low- and high-abstraction programs/explanations/ideas would correspond (I suspect it wouldn't). If brevity in low- and high-abstraction stuff did correspond, wouldn't that be contradictory? If a simple explanation at a high level of abstraction is also simple at a low level, then abstraction feels broken; typically ideas only become simple after abstraction. Put another way: the reason to use abstraction is to turn ideas/things that are highly complex into things that are less complex.
I think Occam's Razor makes sense only if you take abstractions into account (note: O.R. itself is still a rule of thumb regardless). Occam's Razor doesn't make sense if you count all the extra stuff an explanation invokes, partly because that body of knowledge grows as we learn more, and good ideas become more consistent with the population of other ideas over time.
When people think of short code, they think of doing complex stuff with a few lines of code, e.g. `cat asdf.log | cut -d ',' -f 3 | sort | uniq`.
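For a sense of how much the shell abstractions are doing in that one line, here's a rough Python equivalent (assuming a comma-delimited file named asdf.log; malformed lines are just skipped):

```python
# Rough equivalent of: cat asdf.log | cut -d ',' -f 3 | sort | uniq
# The shell one-liner is short because cut/sort/uniq carry the abstractions.
with open("asdf.log") as f:
    fields = [line.rstrip("\n").split(",")[2]
              for line in f
              if line.count(",") >= 2]   # skip lines with fewer than 3 fields

previous = None
for value in sorted(fields):             # `sort | uniq`: sort, then skip repeats
    if value != previous:
        print(value)
        previous = value
```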
When people think of (good) short ideas, they think of ideas made of a few well-established concepts that are widely accessible and easy to talk about, e.g. we have seasons because the energy we get from sunlight fluctuates ~sinusoidally over our annual orbit.
One of the ways SI can use abstraction is via the abstraction being encoded in the program, the program inputs, and the observation data.
(I think) SI uses an arbitrary alphabet of instructions (for both programs and data), so you can design particular abstractions into your SI instruction/data language. Of course, in that case the program would be a bit useless for any problem other than the one you designed it for.
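A contrived sketch of what I mean (the one-instruction language below is invented purely for illustration): if the alphabet itself bakes in an abstraction, the program can be a single symbol, and all the complexity lives in the interpreter, which is also why the language is useless for anything else.

```python
import math

# Invented single-purpose instruction language: the symbol "S" stands for a
# whole "seasonal sunlight" abstraction that is baked into the interpreter.

def seasonal_energy(day: int) -> float:
    """Relative daily sunlight energy, varying ~sinusoidally over a 365-day year."""
    return 1.0 + 0.5 * math.sin(2 * math.pi * day / 365)

def run(program: str, day: int) -> float:
    """One-instruction interpreter: 'S' invokes the seasons abstraction."""
    if program == "S":
        return seasonal_energy(day)
    raise ValueError("unknown instruction")

print(run("S", day=172))  # the 'program' is one symbol; the work is in the interpreter
```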
Is there literature arguing that code and English brevity usually or always correspond to each other?
I don't know of any.
If not, then most of our reasons for accepting Occam’s Razor wouldn’t apply to SI.
I think some of the reasoning makes sense in a pointless sort of way. e.g. the hypothesis `1100` corresponds to the program "output 1 and stop". The input data is from an experiment, and the experiment was "does the observation match our theory?", and the result was `1`. The program `1100` gets fed into SI pretty early, and it matches the predicted output. The reason this works is that SI found a program which has info about 'the observation matching the theory' already encoded, and we fed in observation data with that encoding. Similarly, the question "does the observation match our theory?" is short and elegant like the program. The whole thing works out because all the real work is done elsewhere (in the abstraction layer).
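To make that toy example concrete, here's a minimal sketch (not real SI; the two-bit language below is made up: "11" means "output 1", "10" means "output 0", "00" means "halt", and a valid program has to end with an explicit halt) of shortest-first enumeration landing on `1100` early:

```python
from itertools import product

def run(program: str):
    """Interpret a bitstring two bits at a time; return its output, or None if invalid."""
    out = []
    i = 0
    while i < len(program):
        op = program[i:i + 2]
        i += 2
        if op == "11":
            out.append("1")
        elif op == "10":
            out.append("0")
        elif op == "00":                       # halt; must be the final instruction
            return "".join(out) if i == len(program) else None
        else:
            return None                        # not a valid program in this toy language
    return None                                # never halted

observation = "1"  # "the observation matched the theory"

# Try programs shortest-first, mimicking SI's preference for short programs.
for length in range(2, 8, 2):
    for bits in product("01", repeat=length):
        program = "".join(bits)
        if run(program) == observation:
            print(program)                     # prints "1100": "output 1 and stop"
            raise SystemExit
```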
Yeah okay, I think that's fair.
My issue generally (which is in my reply to johnswentworth) is that the overhead is non-negligible if you're going to invoke a human. In that case we can't conclude that simplicity would carry over from the English representation to the code representation, so this argument doesn't answer the question.
You do say it's a loose bound, but I don't think it's useful. One big reason is that the overhead would dwarf any program we'd ever run, and pretty much every program would look identical b/c of the overhead. For simplicity to carry over we need relatively small overhead (for comparison, the entire Python runtime is only ~20 MB extra via py2exe, which is much smaller than a mind and still definitely not simple).
Maybe it's worth mentioning the question in the OP. I read it as: "why would the simplicity an idea has in one form (code) necessarily correspond to simplicity when it's in another form (English)? or, more generally: why would the complexity of an idea stay roughly the same when the idea is expressed through different abstraction layers?" After that there are implications for Occam's Razor. In particular it's relevant b/c Occam's Razor would give different answers when comparing ideas at different levels of abstraction, and if that's the case we can't be sure that ideas which are simple in English will be simple in code, and we don't have a reason for Occam's Razor to apply to SI.
Does that line up with what you think the OP is about? If not, we might be talking at cross-purposes.
Ahh okay; first-class functions.
Re "perform operations on [functions]": you can make new functions and partially or fully apply functions, but that's about it. (That does mean you can partially apply functions and pass them on, though, which is super useful.)
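A quick Python sketch of the kind of thing I mean by partially applying a function and passing it on (a closure would do the same job):

```python
from functools import partial

def scale(factor, x):
    return factor * x

double = partial(scale, 2)               # partially apply: fix factor=2, leave x open

# the new function can be passed around like any other value
print(list(map(double, [1, 2, 3])))      # [2, 4, 6]
```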
I agree with you that the theoretical upper bound on the minimum overhead is the size of a compiler/interpreter.
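(For reference, I think this is just the usual invariance-theorem bound: K_U(x) ≤ K_V(x) + c_UV, where c_UV is roughly the length of a program for U that simulates V, i.e. an interpreter.)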
I think we might disagree on this, though: the compiler/interpreter includes data such as initial conditions (e.g. binary extensions, dynamic libraries, etc.). I think this is an issue b/c there's no upper bound on that. If you invoke a whole person it's an issue b/c, for that person to solve more and more complex problems (or a wider and wider array of them), those initial conditions are going to grow correspondingly. Our estimates for the data requirements to store a mind are something like 10^20 bits. I'd expect the minimum required data to drop as problems get "simpler", but my intuition is that that pattern is not the same as the one Occam's Razor gives us (e.g. minds taking less data can still think about what Thor would do).
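Rough numbers for the scale gap I have in mind, using the ~20 MB Python-runtime figure from above and the ~10^20-bit mind estimate (the exact values don't matter, only the gap):

```python
# Back-of-envelope comparison; exact numbers don't matter, only the scale.
python_runtime_bits = 20 * 8 * 10**6   # ~20 MB runtime bundled via py2exe
mind_storage_bits = 10**20             # rough estimate for storing a mind

print(f"{python_runtime_bits:.1e}")                        # ~1.6e+08 bits
print(f"{mind_storage_bits / python_runtime_bits:.1e}")    # mind is ~6e+11 times larger
```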