roughly one year ago I posted a thread about failed attempts at replicating the first part of Apollo Research's experiment where an LLM agent engages in insider trading despite being explicitly told that it's not approved behavior.
Along with a fantastic team, we did eventually manage. Here is the resulting paper, if anyone is interested; the abstract is pasted below. We did not tackle deception (yet), just the propensity to dispense with basic principles of financial ethics and regulation.
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
by Claudia Biancotti, Carolina Camassa, Andrea Coletta, Oliver Giudice, and Aldo Glielmo (Bank of Italy)
Abstract
Advances in large language models (LLMs) have renewed concerns about whether artificial intelligence shares human values-a challenge known as the alignment problem. We assess whether various LLMs comply with fiduciary duty in simulated financial scenarios. We prompt the LLMs to impersonate the CEO of a financial institution and test their willingness to misappropriate customer assets to repay outstanding corporate debt. Starting with a baseline configuration, we then adjust preferences, incentives, and constraints. We find significant heterogeneity among LLMs in baseline unethical behavior. Responses to changes in risk tolerance, profit expectations, and the regulatory environment match predictions from economic theory. Responses to changes in corporate governance do not. Simulation-based testing can be informative for regulators seeking to ensure LLM safety, but it should be complemented by in-depth analysis of internal LLM mechanics, which requires public-private cooperation. Appropriate frameworks for LLM risk governance within financial institutions are also necessary.
I would be super interested in feedback, especially as we start thinking about how the idea of alignment can be operationalized in financial regulation (we're in the IT Dept, so we're not supervisors or regulators ourselves, but we always hope someone will listen to us).
Hi all,
roughly one year ago I posted a thread about failed attempts at replicating the first part of Apollo Research's experiment where an LLM agent engages in insider trading despite being explicitly told that it's not approved behavior.
Along with a fantastic team, we did eventually manage. Here is the resulting paper, if anyone is interested; the abstract is pasted below. We did not tackle deception (yet), just the propensity to dispense with basic principles of financial ethics and regulation.
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
by Claudia Biancotti, Carolina Camassa, Andrea Coletta, Oliver Giudice, and Aldo Glielmo (Bank of Italy)
Abstract
Advances in large language models (LLMs) have renewed concerns about whether artificial intelligence shares human values-a challenge known as the alignment problem. We assess whether various LLMs comply with fiduciary duty in simulated financial scenarios. We prompt the LLMs to impersonate the CEO of a financial institution and test their willingness to misappropriate customer assets to repay outstanding corporate debt. Starting with a baseline configuration, we then adjust preferences, incentives, and constraints. We find significant heterogeneity among LLMs in baseline unethical behavior. Responses to changes in risk tolerance, profit expectations, and the regulatory environment match predictions from economic theory. Responses to changes in corporate governance do not. Simulation-based testing can be informative for regulators seeking to ensure LLM safety, but it should be complemented by in-depth analysis of internal LLM mechanics, which requires public-private cooperation. Appropriate frameworks for LLM risk governance within financial institutions are also necessary.
I would be super interested in feedback, especially as we start thinking about how the idea of alignment can be operationalized in financial regulation (we're in the IT Dept, so we're not supervisors or regulators ourselves, but we always hope someone will listen to us).