Here's a quick sketch of a constructive version (a rough, purely illustrative code outline follows the numbered list):
1) build a superintelligence that can model both humans and the world extremely accurately over long time-horizons. It should be approximately Bayesian, and capable of modelling its own uncertainties about both humans and the world, i.e. capable of executing the scientific method
2) use it to model, across a statistically representative sample of humans, how desirable they would say a specific state of the world X is
3) also model whether the modeled humans are in a state (drunk, sick, addicted, dead, suffering from religious fanaticism, etc) that for humans is negatively correlated with accuracy on evaluative tasks, and decrease the weight of their output accordingly
4) determine whether the humans would change their minds later, after learning more, thinking for longer, experiencing more of X, learning about or experiencing subsequent consequences of state X, etc. - if so, update their output accordingly
5) implement some chosen (and preferably fair) averaging algorithm over the opinions of the sample of humans
6) sum over the number of humans alive in state X and integrate over time
7) estimate error bars by predicting when and ...
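To make the control flow concrete, here is a minimal, purely illustrative Python outline of steps 2-7. Every function stub is a hypothetical stand-in for a capability the superintelligent world-model from step 1 is assumed to provide; the stub bodies exist only so the sketch executes, and none of the names come from the original proposal.

```python
# Purely illustrative sketch of steps 2-7. Each "placeholder" below stands in
# for a capability the superintelligent world-model (step 1) is assumed to supply.

def model_human_judgment(human, state, t):        # step 2: how desirable would this human say X is?
    return 0.0                                    # placeholder

def reliability_weight(human, state, t):          # step 3: downweight drunk, sick, addicted, ... evaluators
    return 1.0                                    # placeholder

def reflective_update(human, opinion, state, t):  # step 4: what would they think after learning/experiencing more?
    return opinion                                # placeholder

def fair_average(weighted_opinions):              # step 5: some chosen (preferably fair) averaging algorithm
    total_w = sum(w for w, _ in weighted_opinions)
    return sum(w * o for w, o in weighted_opinions) / total_w if total_w else 0.0

def num_humans_alive(state, t):                   # step 6: population alive in state X at time t
    return 0                                      # placeholder

def evaluate_world_state(state, human_sample, timesteps):
    """Score state X by averaging weighted, reflectively-updated judgments from a
    representative sample, summing over humans alive, and integrating over time."""
    total = 0.0
    for t in timesteps:
        opinions = []
        for human in human_sample:
            o = model_human_judgment(human, state, t)
            w = reliability_weight(human, state, t)
            o = reflective_update(human, o, state, t)
            opinions.append((w, o))
        total += fair_average(opinions) * num_humans_alive(state, t)
    return total  # step 7 (error bars) would wrap this score in Knightian uncertainty estimates
```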
The pessimizing over Knightian uncertainty is a graduated way of telling the model to basically "tend to stay inside the training distribution". Adjusting its strength enough to overcome the Look-Elsewhere Effect means we estimate how many bits of optimization pressure we're applying and then do the pessimizing harder depending on that number of bits, which, yes, is vastly higher for all possible states of matter occupying an 8 cubic meter volume than for a 20-way search (the former is going to be a rather large multiple of Avogadro's number of bits, the latter is just over 4 bits). So we have to stay inside what we believe we know a great deal harder in the former case. In other words, the point you're raising is already addressed, in a quantified way, by the approach I'm outlining. Indeed, on some level the main point of my suggestion is that there is a quantified and theoretically motivated way of dealing with exactly this problem. The handwaving above is just a very brief summary, accompanied by a link to a much more detailed post containing and explaining the details with a good deal less handwaving.
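To make the arithmetic concrete, here's a toy Python illustration. The penalty-scales-with-bits rule shown is just for illustration (a linear-in-bits penalty I picked for the sketch); the linked post spells out the actual functional form.

```python
import math

def optimization_bits(num_candidates):
    """Bits of optimization pressure applied by selecting 1 option out of num_candidates."""
    return math.log2(num_candidates)

def pessimized_score(mean_estimate, knightian_uncertainty, num_candidates, k=1.0):
    """Toy rule: penalize the estimate more heavily the more optimization pressure was applied."""
    return mean_estimate - k * knightian_uncertainty * optimization_bits(num_candidates)

print(optimization_bits(20))  # ~4.32 bits for a 20-way search
# "All possible states of matter in an 8 cubic meter volume" corresponds to an
# astronomically larger candidate count, hence vastly more bits and a much
# harsher penalty, i.e. a much stronger push to stay inside the training distribution.
```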
Trying to explain this piecemeal in a comments section isn't very efficient: ...
If your argument is, "if it is possible for humans to produce some (verbal or mechanical) output, then it is possible for a program/machine to produce that output", then that's true, I suppose?
I don't see why you specified "finite depth boolean circuit".
While it does seem like the number of states for a given region of space is bounded, I'm not sure how relevant this is. Not all possible functions from states to {0,1} (or to some larger discrete set) are implementable as some possible state, for cardinality reasons.
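Spelled out, the counting argument is roughly:

$$\bigl|\{\, f : S \to \{0,1\} \,\}\bigr| \;=\; 2^{|S|} \;>\; |S|,$$

so even for a bounded set of states $S$, most such functions have no state that implements them.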
I guess maybe that's why you mentioned th...
The relevance to alignment is that the state you want is the one that is reached.
I think the main problem with the argument in the linked text is that it is too static. One is not looking for a static outcome; one is looking for a process with some properties.
And it might be that the set of properties one wants is contradictory. (I am not talking about my viewpoint, but about a logical possibility.)
For example, it might potentially be the case that there are no processes where superintelligence is present and the chances of "bad" things with "badness" e...
Your proof fails to fully account for the fact that any ASI must actually exist in the world. It would affect the world through more than just its outputs - e.g. if its computation produces heat, that heat would also affect the world. Your proof does not show that the sum of all effects of the ASI on the world (both its intended outputs and the side effects of performing its computation) could be aligned. Further, real computation takes time - it's not enough for the aligned ASI to produce the right output, it also needs to produce it at the right time. You did not prove that to be possible.
Over time I have seen many people assert that “Aligned Superintelligence” may not even be possible in principle. I think that is incorrect and I will give a proof - without explicit construction - that it is possible.
The meta problem here is that you gave a "proof" (in quotes because I haven't verified it myself as correct) using your own definitions of "aligned" and "superintelligence", but if people asserting that it's not possible in principle have different definitions in mind, then you haven't actually shown them to be incorrect.
We’ll say that a state is in fact reachable if a group of humans could in principle take actions with actuators - hands, vocal cords, etc - that could realize that state.
The main issue here is that groups of humans may in principle be capable of a great many things, but there's a vast chasm between "in principle" and "in practice". A superintelligence worthy of the name would likely be able to come up with plans that we wouldn't in practice be able to even check exhaustively, which is the sort of issue that we want alignment for.