Basically, the AI does the following:
1. Create a list of possible futures that it could cause.
2. For each person alive at the time of the AI's activation, and for each candidate future:
   1. Simulate convincing that person that the future is going to happen.
   2. If the person would try to help the AI, add 1 to the utility of that future; if the person would try to stop the AI, subtract 1 from the utility of that future.
3. Cause the future with the highest utility (sketched in code below).
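To make the selection rule concrete, here's a minimal Python sketch of that loop. It's only an illustration of the scoring and argmax step: `simulate_persuaded_reaction`, `possible_futures`, and `people_at_activation` are hypothetical placeholders for machinery the proposal leaves unspecified.

```python
# Minimal sketch of the proposed loop. `simulate_persuaded_reaction(person, future)`
# is a hypothetical helper that returns "help" or "oppose" after simulating
# convincing the person that `future` is what will happen.

def score_future(future, people):
    """+1 for each person who would help the AI, -1 for each who would oppose it."""
    utility = 0
    for person in people:
        reaction = simulate_persuaded_reaction(person, future)  # hypothetical
        if reaction == "help":
            utility += 1
        elif reaction == "oppose":
            utility -= 1
    return utility

def choose_future(possible_futures, people_at_activation):
    # Cause (here: return) the future with the highest approval-based utility.
    return max(possible_futures, key=lambda f: score_future(f, people_at_activation))
```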
One thing I'd be concerned about is that there are a lot of possible futures that sound really appealing, that a normal human would sign off on, but that are actually terrible (similar concept: siren worlds).
For example, in a world of Christians the AI would score highly on a future where they get to eternally rest and venerate God, which would get really boring after about five minutes. In a world of Rationalists the AI would score highly on a future where they get to live on a volcano island with catgirls, which would also get really boring after about five minutes.
There are potentially lots of futures like this (including ones that might work for a wider range of humans). Because the metric (inferred approval once the future has been explained) is different from the goal (whether the future is actually good), and the optimisation pressure increases with the number of futures considered, I would expect the metric to be Goodharted.
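To illustrate that worry with a toy model of my own (not part of the proposal): treat each future's approval score as its true value plus independent noise, and watch how much the approval-maximising choice overestimates its own value as the pool of candidate futures grows.

```python
# Toy optimiser's-curse illustration (my own assumption, not the proposal):
# each future has a true value and an "approval" proxy = true value + noise.
# Maximising approval increasingly selects for large noise rather than large
# true value, and the overestimate grows with the number of futures considered.

import random

def mean_overestimate(n_futures, noise=1.0, trials=500):
    total = 0.0
    for _ in range(trials):
        futures = [(random.gauss(0, 1), random.gauss(0, noise))
                   for _ in range(n_futures)]
        # Pick the future whose proxy (true value + error) is highest.
        true_value, error = max(futures, key=lambda f: f[0] + f[1])
        total += error  # how much the approval score overstated this future's value
    return total / trials

for n in (10, 100, 10_000):
    print(n, round(mean_overestimate(n), 2))
```

The printed overestimate grows with `n`, which is the sense in which more candidate futures means more Goodharting of the approval metric.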
Some possible questions this raises: