A stupid question: in all the active discussions about (U)FAI I see a lot of talk about goals. I see no one talking about constraints. Why is that?
If you think that you can't make constraints "stick" in a self-modifying AI, you shouldn't be able to make a goal hierarchy "stick" either. And if you assume that we CAN program in an inviolable set of goals, I don't see why we can't program in an inviolable set of constraints as well.
And yet this idea is obvious and trivial -- so what's wrong with it?
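To make the distinction concrete, here is a minimal sketch (all names are hypothetical, and this is not any proposed FAI architecture) of an agent that maximizes a goal function while treating constraints as hard filters on its actions:

```python
# Minimal sketch of "goal vs. constraint": the goal scores actions,
# the constraints veto them outright. All names are illustrative.
from typing import Callable, Iterable, List

def choose_action(
    actions: Iterable[str],
    goal: Callable[[str], float],             # higher score = better serves the goal
    constraints: List[Callable[[str], bool]], # True = action is permitted
) -> str:
    # A constraint is inviolable: one failed check removes the action
    # from consideration entirely, no matter how well it scores.
    permitted = [a for a in actions if all(check(a) for check in constraints)]
    if not permitted:
        raise RuntimeError("every available action violates a constraint")
    # The goal is only maximized over what the constraints allow.
    return max(permitted, key=goal)

# Example: "deceive" scores highest on the goal but is constrained away.
actions = ["persuade", "deceive", "wait"]
goal = lambda a: {"persuade": 0.9, "deceive": 1.0, "wait": 0.0}[a]
constraints = [lambda a: a != "deceive"]  # an inviolable "never deceive" rule
print(choose_action(actions, goal, constraints))  # -> "persuade"
```

On this framing, the question is why the second argument (the constraints) should be any harder to make inviolable than the first (the goal).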
It's less an issue of value drift* -- which does need to be solved for both goals and constraints -- and more an issue of the complexity of the system.
A well-designed goal hierarchy has an upper bound on its complexity. Even if the full definition of human terminal values is too complicated to fit in a single human head, it can at least be extrapolated from things that fit within multiple human brains.
Even the best constraint hierarchies do not share that benefit. Constraint systems in the real world are built around the full complexity of our moral and ethical systems, which has no comparable upper bound.
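A toy illustration of that asymmetry (my own example, with made-up prohibitions): a one-line goal function scores every possible outcome, while a constraint list only covers the cases its authors thought to enumerate:

```python
# Toy contrast (illustrative names only): the goal below is one line and
# covers every numeric outcome; the constraint list must enumerate each
# prohibited behavior, and silently misses anything nobody anticipated.

def goal(outcome: float) -> float:
    # Compact objective: "bring the outcome close to 1.0".
    return -abs(outcome - 1.0)

FORBIDDEN = {"deceive", "coerce", "steal"}  # each prohibition written by hand

def permitted(action: str) -> bool:
    # A novel harmful action passes unless someone anticipated it, so the
    # list has to grow with the complexity of the situations it governs.
    return action not in FORBIDDEN

print(permitted("blackmail"))  # True -- the hand-written list misses it
```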