Paul Christiano: "Catastrophic Misalignment of Large Language Models"
Talk recording: https://www.youtube.com/watch?v=FyTb81SS_cs (no transcript or chat replay yet)
Talk page: https://pli.princeton.edu/events/2024/catastrophic-misalignment-large-language-models
We're thrilled to invite you to attend the virtual Princeton AI Alignment and Safety Seminar (PASS)!
Ensuring safe behavior by aligning increasingly capable models is crucial, and PASS offers a virtual, collaborative platform for researchers from various backgrounds and institutions to explore these vital issues. Bi-weekly talks will be live-streamed, with opportunities for audience engagement, and recordings will be shared via our PLI-Info YouTube channel. Our inaugural lineup features esteemed experts including Paul Christiano, Aleksander Madry, Dan Hendrycks, Irene Solaiman, John Schulman, and Jacob Steinhardt. Stay informed and receive seminar reminders by joining our mailing list: https://tinyurl.com/pass-mailing
Our first talk will be on Tuesday, March 19th at 2pm Eastern Time. The livestream will be here, and a recording will be posted to the channel afterwards. Please submit your questions for our speakers! Paul Christiano from the Alignment Research Center will be speaking on "Catastrophic Misalignment of Large Language Models." The abstract is below.
I’ll discuss two possible paths by which AI systems could be so misaligned that they attempt to deceive and disempower their human operators. I’ll review the current state of evidence about these risks, what we might hope to learn over the next few years, and how we could become confident that the risk is adequately managed.