Meet inside The Shops at Waterloo Town Square - we will congregate in the indoor seating area next to the Your Independent Grocer with the trees sticking out in the middle of the benches (pic) at 7:00 pm for 15 minutes, and then head over to my nearby apartment's amenity room. If you've been around a few times, feel free to meet up at the front door of the apartment at 7:30 instead.
Lighter fare time is over; let's discuss gradual disempowerment risks from advanced AI.
Readings:
The original paper: https://arxiv.org/pdf/2501.16946
For a paper it’s actually very readable and not too long (~18 pages).
Alternatively, you can read my much shorter summary of the paper + some opinion here.
Optional:
RLHF does work currently? What makes you think it doesn't work currently?
This is definitely the crux, so it's probably the only point worth debating.
RLHF is just papering over problems. Sure, the model is slightly more difficult to jailbreak, but it's still pretty easy to jailbreak. Sure, RLHF makes the agent less likely to output text you don't like, but I think agents will reliably overcome that obstacle: useful agents won't just be outputting the most probable continuation, they'll be searching through decision space and finding those unlikely continuations that score well on their task. I don't think RLHF does anything remotely analogous to making it care about whether it's...
(Sorry, hard to differentiate quotes from you vs quotes from the paper in this format)
(I agree it is assuming that the judge has that goal, but I don't see why that's a terrible assumption.)
If the judge is human, sure. If the judge is another AI, it seems like a wild assumption to me. The section on judge safety in your paper does a good job of listing many of the problems. One thing I want to call out as something I more strongly disagree with is:
One natural approach to judge safety is bootstrapping: when aligning a new AI system, use the previous generation of AI (that has already been aligned) as the
The key idea is to use the AI systems themselves to help identify the reasons that the AI system produced the output. For example, we could put two copies of the model in a setting where each model is optimized to point out flaws in the other’s outputs to a human “judge”. Ideally, if one model introduced a subtle flaw in their output that the judge wouldn’t notice by default, the other model would point out and explain the flaw, enabling the judge to penalise the first model appropriately.
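To make the quoted setup a bit more concrete, here's a minimal sketch of what one such round might look like, with stand-in functions in place of real models and a real judge. Every name here (debate_round, answerer, critic, judge) is a hypothetical placeholder I'm using for illustration, not anything from the paper:

```python
from typing import Callable, Tuple

# Hypothetical type aliases: each "model" is reduced to a text -> text function,
# and the judge to a (task, answer, critique) -> score function.
Model = Callable[[str], str]
Judge = Callable[[str, str, str], float]


def debate_round(answerer: Model, critic: Model, judge: Judge, task: str) -> Tuple[str, str, float]:
    """One round of the critique setup described in the quote (sketch only)."""
    answer = answerer(task)  # the first copy of the model answers the task
    # The second copy is pointed at the first copy's output and asked to surface
    # flaws the judge might otherwise miss.
    critique = critic(f"Task: {task}\nAnswer: {answer}\nPoint out any flaws.")
    # The judge scores the answer with the critique in hand; a decisive critique
    # of a subtle flaw lets the judge penalise the answer appropriately.
    score = judge(task, answer, critique)
    return answer, critique, score


# Toy usage with stand-in functions (no real models involved):
if __name__ == "__main__":
    answerer = lambda task: "42"
    critic = lambda prompt: "The answer ignores that the task asked for words, not digits."
    judge = lambda task, answer, critique: 0.0 if "ignores" in critique else 1.0
    print(debate_round(answerer, critic, judge, "What is 6 * 7, in words?"))
```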
In amplified oversight, any question that is too hard to supervise directly is systematically reduced to ones that we hypothesize can be supervised. However,
It's copied from the PDF, which hard-codes line breaks.
When people argue that many AIs competing will make us safe, Yud often counters that the AIs will coordinate with each other but not with us. This is probably true, but not super persuasive. I think a more intuitive explanation is that offense and defense are asymmetrical. An AI defending my home cannot simply wait for attacks to happen and then defend against them (e.g. another AI cuts off the power, or fries my AI's CPU with a laser). To truly defend my home, an AI would have to monitor and, importantly, control a hugely outsized part of the world (possibly the entire world).
It's satire, links are real though.
On March 2, 2024, Jim Jordan calls on Google to testify before Congress over the extent to which Google colluded with, or was coerced by, the Executive Branch into censoring lawful speech. Jack Krawczyk, Google’s Senior Director of Product for Gemini, and Jen Gennai, (former) Director of Google’s Responsible Innovation team, appear for transcribed interviews with the Committee. What follows is an excerpt of the transcript from that interview.
CONGRESSMAN: Thank you Mr. Chairman. I want to thank the witness for being willing to testify here today.
JEN GENNAI: Of course.
CONGRESSMAN: Now, you work on “Responsible Innovation” at Google, with a focus on ethical AI, is that right?
JEN GENNAI: Broadly, yes.
CONGRESSMAN: And your name is...
In my non-tech circles people mostly complain about AI stealing jobs from artists, companies making money off of other people's work, etc.
People are also just scared of losing their own jobs.
Also, his statements in The Verge are so bizarre to me:
"SA: I learned that the company can truly function without me, and that’s a very nice thing. I’m very happy to be back, don’t get me wrong on that. But I come back without any of the stress of, “Oh man, I got to do this, or the company needs me or whatever.” I selfishly feel good because either I picked great leaders or I mentored them well. It’s very nice to feel like the company will be totally fine without me, and the team is ready and has leveled up."
2 business days away and the company is ready to blow up if...
Let that last paragraph sink in. The leadership team ex-Greg is clearly ready to run the company without Altman.
I'm struggling to interpret this, so your guesses as to what this might mean would be helpful. It seems he clearly wanted to come back - is he threatening to leave again if he doesn't get his way?
Also note that Ilya is not included in the leadership team.
While Ilya will no longer serve on the board, we hope to continue our working relationship and are discussing how he can continue his work at OpenAI.
This statement also really stood out to me - if there really was no ill will, why would they have to discuss how Ilya can continue his work? Clearly there's something more going on here. Sounds like Ilya's getting the knife.
One of the problems with Low German, as I understand it (I have Mennonite grandparents who speak it), is that there's no formal spelling system and an abundance of dialects. There's also not a ton of Low German content online to train on, though there are a decent number of books written in Low German.
I'd be curious to know how good LLM translation is despite that.