I think the constraint-based problems are more intuitive. As someone who thinks about this regularly, the classical examples had an abstract, alignment-theoretic texture, while the constraint-based ones seemed more relatable to something I’d actually be doing on a daily basis.
Which constraint-based example works best depends on the audience. If all your readers are familiar with the process of completing literature reviews, go for that one; otherwise, the CEO problem seems most natural.
Variable constraint optimisation: a wedding planner, tasked with maximising the couple's satisfaction with the event, subject to the constraint that it fit within a given budget.
That sounds like classical constrained optimisation. Does the wedding planner have power to increase the budget?
Can you help find the most intuitive example of reward function learning?
In reward function learning, there is a set of possible non-negative reward functions, R, and a learning process ρ which takes in a history of actions and observations and returns a probability distribution over R.
If π is a policy, Hm is the set of histories of length m, and Pπ(hm) is the probability of hm∈Hm given that the agent follows policy π, then the expected value of π at horizon m is:

Eρ(π) = ∑_{hm∈Hm} Pπ(hm) ∑_{R∈R} ρ(R|hm) R(hm),
where R(hm) is the total R-reward over the history hm. Problems can occur if ρ is riggable (this used to be called "biasable", but that term was overloaded), or influenceable.
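As a toy illustration of the expected-value computation above, here is a minimal sketch in Python. All names (`expected_value`, `rho`, the histories and reward functions) are hypothetical, and the numbers are made-up toy data:

```python
def expected_value(histories, p_pi, rho, rewards):
    """Eρ(π) = ∑_h Pπ(h) ∑_R ρ(R|h) R(h).

    histories: list of histories of length m
    p_pi: dict mapping history -> probability under policy π
    rho: function(history) -> dict mapping reward name -> posterior probability
    rewards: dict mapping reward name -> function(history) -> total R-reward
    """
    total = 0.0
    for h in histories:
        posterior = rho(h)  # ρ's distribution over reward functions, given h
        total += p_pi[h] * sum(posterior[r] * rewards[r](h) for r in rewards)
    return total

# Two toy histories, two candidate reward functions
histories = ["h1", "h2"]
p_pi = {"h1": 0.5, "h2": 0.5}
rewards = {"R1": lambda h: 1.0, "R2": lambda h: 0.0}

# A history-dependent learning process: h1 favours R1, h2 favours R2
def rho(h):
    return {"R1": 0.8, "R2": 0.2} if h == "h1" else {"R1": 0.2, "R2": 0.8}

print(expected_value(histories, p_pi, rho, rewards))  # ≈ 0.5
```

Note that ρ is conditioned on the history, which is exactly what opens the door to rigging: a policy can steer towards histories whose posterior favours easy-to-satisfy reward functions.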
There's an interesting subset of value learning problems, which could be termed "constrained optimisation with variable constraints" or "variable constraints optimisation". In that case, there is an overall reward R, and every R′∈R is the reward R subject to a set of constraints C. This can be modelled by having C(hm) be 1 if the constraints are met and 0 if they are not.
Then if we define RC(hm)=R(hm)C(hm), and let ρ be a distribution over 𝒞, the set of possible constraints, the equation changes to:

Eρ(π) = ∑_{hm∈Hm} Pπ(hm) ∑_{C∈𝒞} ρ(C|hm) R(hm)C(hm).
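The variable-constraints version only changes the inner sum: the base reward is fixed, and ρ's uncertainty is over which 0/1 constraint applies. A minimal sketch, with all names and numbers hypothetical:

```python
def expected_constrained_value(histories, p_pi, rho, base_reward, constraints):
    """Eρ(π) = ∑_h Pπ(h) ∑_C ρ(C|h) R(h)C(h).

    base_reward: function(history) -> total R-reward
    constraints: dict mapping constraint name -> function(history) -> 0 or 1
    rho: function(history) -> dict mapping constraint name -> probability
    """
    total = 0.0
    for h in histories:
        posterior = rho(h)  # ρ's distribution over constraints, given h
        total += p_pi[h] * sum(
            posterior[c] * base_reward(h) * constraints[c](h) for c in constraints
        )
    return total

# Toy histories: either the agent stays in budget or it overspends
histories = ["stay_in_budget", "overspend"]
p_pi = {"stay_in_budget": 0.7, "overspend": 0.3}
base_reward = lambda h: 10.0

# C1: a binding budget constraint; C2: no effective constraint
constraints = {
    "C1": lambda h: 1 if h == "stay_in_budget" else 0,
    "C2": lambda h: 1,
}
rho = lambda h: {"C1": 0.5, "C2": 0.5}

print(expected_constrained_value(histories, p_pi, rho, base_reward, constraints))  # ≈ 8.5
```

Here overspending still collects half the reward, because ρ puts 50% weight on the unconstrained C2; if ρ can be rigged towards C2, overspending becomes fully rewarded.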
If ρ is riggable or influenceable, similar sorts of problems occur.
Intuitive examples
Here I'll present some examples of reward function learning or variable constraints optimisation, and I'm asking readers to give their opinions on which one seems the most intuitive and the easiest to explain to outsiders. You're also welcome to suggest new examples if you think they work better.