It is an open problem to propose a limited AI that would be relevant to the value achievement dilemma: an agent cognitively constrained along some dimensions that render it much safer, yet still able to perform some task useful enough to prevent catastrophe.
Consider an Oracle AI constrained to output only proofs, in higher-order logic (HOL), of input theorems. These proofs are then checked by a simple and secure-seeming verifier running in a sandbox, whose exact code is unknown to the Oracle; the verifier outputs 1 if the proof is valid and 0 otherwise, then discards the proof-data. Suppose also that the Oracle is in a shielded box, etcetera.
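The information flow in this setup can be sketched in a few lines of code. This is a toy sketch only: the domain (compositeness certificates standing in for HOL proofs) and all function names are my illustrative assumptions, not anything specified above. The point it illustrates is the protocol shape: an untrusted searcher proposes a certificate, a simple trusted checker verifies it, and only a single bit, never the proof-data itself, escapes the sandbox.

```python
# Toy model of the Provability Oracle protocol (illustrative assumptions:
# "theorems" are claims that n is composite, "proofs" are nontrivial factors).

def sandboxed_verify(theorem_n: int, proof_d: int) -> int:
    """Trusted, simple verifier: return 1 iff proof_d certifies that
    theorem_n is composite. The witness is discarded; only one bit escapes."""
    ok = 1 < proof_d < theorem_n and theorem_n % proof_d == 0
    return 1 if ok else 0

def untrusted_oracle(theorem_n: int) -> int:
    """Stand-in for the Oracle: an untrusted search for a certificate.
    Nothing it returns is believed directly; it is only fed to the verifier."""
    for d in range(2, theorem_n):
        if theorem_n % d == 0:
            return d
    return 0  # no certificate found

bit = sandboxed_verify(91, untrusted_oracle(91))
print(bit)  # → 1: 91 = 7 * 13, so "91 is composite" is certified
```

The design choice doing the safety work here is that the verifier is far simpler than the searcher, so trusting the output bit requires trusting only the small checker, not the powerful search process.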
It's possible that this Provability Oracle has been so constrained that it is cognitively containable (it has no classes of options we don't know about). If the verifier is unhackable, it gives us trustworthy knowledge that a theorem is provable. But this limited system is not obviously useful in a way that enables humanity to extricate itself from its larger dilemma. Nobody has yet stated a plan which could save the world if only we had a superhuman capacity to detect which theorems were provable in Zermelo-Fraenkel set theory.
Saying "The solution is for humanity to only build Provability Oracles!" does not resolve the value achievement dilemma because humanity does not have the coordination ability to 'choose' to develop only one kind of AI over the indefinite future, and the Provability Oracle has no obvious use that prevents non-Oracle AIs from ever being developed. Thus our larger value achievement dilemma would remain unsolved. It's not obvious how the Provability Oracle would even constitute significant strategic progress.
Describe a cognitive task or real-world task for an AI to carry out that would make great progress on the value achievement dilemma if executed correctly, and that can be done with a limited AI that: