Abstract: Numerical simulations are versatile predictive tools that permit exploration of complex systems. The ability of LLM agents to simulate real-world scenarios will expand the AI risk landscape. In the proxy-simulation threat model, a user (or a deceptively aligned AI) can obfuscate a harmful goal behind simulation-based predictions by exploiting the generalizability of simulation tools. Three highly idealized proxy-simulation examples illustrate how damage, casualties, and the concealment of illegal activities can be planned for under such obfuscation. This approach bypasses the existing alignment and safety filters of GPT-4, Claude 2, and Llama 2. AI-enabled simulations grant access to prediction-based planning that is not otherwise readily available; to the extent that goal obfuscation is possible, this increases AI risk.