"X is possible in principle" means X is in the space of possible mathematical things (as an independent claim to whether humans can find it).
Thanks. When you say “in the space of possible mathematical things”, do you mean “hypothetically possible in physics” or “possible in the physical world we live in”?
The claim "alignment is solvable in principle" means "there are possible worlds where alignment is solved."
Consequently, the claim "alignment is unsolvable in principle" means "there are no possible worlds where alignment is solved."
Thanks!
By ‘possible worlds’, do you mean ‘possible to be reached from our current world state’?
And what do you mean by ‘alignment’? I know that can sound like an unnecessary question. But if it’s not specified, how can people soundly assess whether it is technically solvable?
Here's how I specify terms in the claim:
Typically, I have seen researchers make this claim confidently in one sentence. Sometimes it's backed by a loose analogy. [1]
This claim is cruxy. If alignment is not solvable, then the alignment community is not viable. But little is written that disambiguates and explicitly reasons through the claim.
Have you claimed that ‘AGI alignment is solvable in principle’?
If so, can you elaborate on what you mean by each term? [2]
Below I'll also try to specify each term, since I support research here by Sandberg & co.
Some analogies I've seen a few times (rough paraphrases):
E.g. what does ‘in principle’ mean? Does it assert that the problem described is solvable based on certain principles, or some model of how the world works?