Steganography and the CycleGAN - alignment failure case study
1. This is a (lightly edited) transcript of a lightning talk I gave during the Virtual AI Safety Camp 2022. The original talk in video format can be found here (it can also be listened to).
2. Many thanks to Remmelt Ellen for preparing the initial version of the transcript and for motivating me to publish this.
3. I could probably improve this post a lot, but I decided to publish it as is, because otherwise there's a chance I'd never have published it.

Just to start, recall the story of the Paperclip Maximiser: a story about a model that does things we don't want it to do (like destroying humanity) as an unintended side effect of fulfilling a goal it was explicitly programmed to pursue. The story I want to tell today is one where such a situation occurred in real life, and it is quite simple, well documented, and not widely known.

Coming back to 2017: I had just started working in my first position that had NLP in the title, so I was very excited about artificial intelligence and about learning more of it. Around the same time, the CycleGAN paper came out.

For those not familiar: what is CycleGAN about? In short, it was a clever way to put two GANs (Generative Adversarial Networks; how exactly these work is not important here) together that allowed training a model to translate pictures between two domains without any paired training examples. For example, in the middle, we see a translation from zebras to horses: you don't have datasets of paired examples of photos of zebras and photos of horses in the same postures.

Source: [1]

So how does this system work, more or less? The important part is that you train two translators, here called G and F. One translates data from domain X to domain Y, and the other translates the other way around.

Source: [1]

In training, you take a given training sample and first apply G, then F. So, in the horses-to-zebras example: you take a picture of a horse, first apply G to turn it into a zebra, then apply F to turn the result back into a horse, and compare the reconstruction with the original picture.
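To make this training loop concrete, here is a minimal sketch in PyTorch. This is my own illustration, not code from the paper: the translators G and F are stood in by toy single-layer networks, and only the forward cycle with its L1 reconstruction (cycle-consistency) loss is shown; the discriminators and adversarial losses of the full CycleGAN are omitted.

```python
# Minimal sketch of CycleGAN's cycle-consistency step (toy stand-ins, not the paper's models).
import torch
import torch.nn as nn

# Toy "translators"; the real CycleGAN uses deep residual conv nets.
G = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # domain X (horses) -> domain Y (zebras)
F = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # domain Y (zebras) -> domain X (horses)

cycle_criterion = nn.L1Loss()  # the paper uses an L1 cycle-consistency loss
optimizer = torch.optim.Adam(list(G.parameters()) + list(F.parameters()), lr=2e-4)

x = torch.rand(1, 3, 256, 256)  # random tensor standing in for a horse photo

fake_y = G(x)                # horse -> (fake) zebra
reconstructed_x = F(fake_y)  # (fake) zebra -> reconstructed horse

# Push F(G(x)) back towards the original x. The symmetric cycle
# y -> G(F(y)) and the two adversarial GAN losses are left out here.
loss = cycle_criterion(reconstructed_x, x)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note the design choice this makes visible: the cycle loss only compares the round trip F(G(x)) with the original x, so by itself it does not directly constrain what the intermediate image G(x) looks like; that job is left to the adversarial losses.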