Dylan Cope

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

This research was completed for London AI Safety Research (LASR) Labs 2024 by Yohan Mathew, Ollie Matthews, Robert McCarthy and Joan Velja. The team was supervised by Nandi Schoots and Dylan Cope (King’s College London, Imperial College London). Find out more about the programme and express interest in upcoming iterations...

Sep 25, 2024•37

31 karma

Member for 7 years

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

Yohan Mathew, joanv, robert mccarthy, ollie, Nandi, Dylan Cope

The full paper can be found here, while a short video presentation covering the highlights of the paper is here (note that some graphs have been updated since the presentation).

Introduction

Collusion in multi-agent systems is defined as two or more agents covertly coordinating to the disadvantage of other agents [6]. Steganography is the practice of concealing information within a message while avoiding detection....
