Compared to other people on this site, this is part of my alignment optimism. I think there are natural abstractions in the moral landscape that make agents converge towards cooperation and similar behaviours. I read this post recently, where Leo Gao made an argument that concave agents generally don't exist because they stop existing. I think there are pressures that push agents towards certain parts of the value landscape.
Like, I agree that the orthogonality thesis is presumed to be true way too often. It is more an argument that such convergence may not happen by default, but I'm also uncertain about how much evidence the thesis actually gives you.
The orthogonality thesis says that it's invalid to conclude benevolence from the premise of powerful optimization; it gestures at counterexamples. It's entirely compatible with benevolence being very likely in practice. You might then want to separately ask whether benevolence is in fact likely. But you do need to ask; that's the point of the orthogonality thesis, its narrow scope.
"an assumption that objective norms / values do not exist. In my opinion AGI would not make this assumption"
The question isn't whether every AGI would or would not make this assumption, but whether the assumption is actually true, and therefore whether a powerful AGI could in fact have a wide range of goals or values, including ones that are alien or contrary to common human values.
I think it's highly unlikely that objective norms/values exist, and I think weak versions of orthogonality (not literally ANY goal is possible, but enough bad ones remain possible that we should still be worried) are true. Even more strongly, I think it hasn't been shown that those weak versions are false, and we should take the possibility very seriously.
The orthogonality thesis is not about the existence or nonexistence of "objective norms/values", but about whether a specific agent could have a specific goal. The thesis says that for any specific goal, there can be an intelligent agent that has that goal.
To simplify: the question is not "is there an objective definition of good?", where we probably disagree, but rather "can an agent be bad?", where I suppose we both agree the answer is clearly yes.
More precisely: "can a very intelligent agent be bad?" Still, the answer is yes. (Even if there is such a thing as "objective norms/values", the agent can simply choose to ignore them.)
The Orthogonality Thesis (as well as the Fact–value distinction) is based on the assumption that objective norms / values do not exist. In my opinion, an AGI would not make this assumption; it is a logical fallacy, specifically an argument from ignorance. As black swan theory says, there are unknown unknowns, which in this context means that objective norms / values may exist, they just haven't been discovered yet. Why does the Orthogonality Thesis have so much recognition?