Domain: Music, songwriting
Link: The Beatles: Get Back
Person: The Beatles
Background: the making of the Beatles' 1970 album Let It Be
Why: Nearly 8 hours of remarkably raw footage, documenting the Beatles creating and recording Let It Be.
One of the best short stories I've read in a while
A huge point here seems to be the ability to speak unfiltered about AI companies: the Radicals, working outside of AI labs, would be free to speak candidly, while the Moderates would have some kind of relationship to maintain.
Even if the internals-based method were extremely well supported theoretically and empirically (which seems quite unlikely), I don't think that would suffice to trigger a strong response by convincing the relevant people.
It's hard for me to imagine a world where we really have internals-based methods that are "extremely well supported theoretically and empirically," so I notice that I should take a second to try to imagine such a world before accepting the claim that internals-based evidence wouldn't convince the relevant people...
Today, the relevant people probably wouldn't do much in response to the interp team saying something like: "our deception SAE is firing when we ask the model bio risk questions, so...
"Lots of very small experiments playing around with various parameters" ... "then a slow scale up to bigger and bigger models"
This Dwarkesh timestamp with Jeff Dean & Noam Shazeer seems to confirm this.
"I'd also guess that the bottleneck isn't so much on the number of people playing around with the parameters, but much more on good heuristics regarding which parameters to play around with."
That would mostly explain this question as well: "If parallelized experimentation drives so much algorithmic progress, why doesn't gdm just hire hundreds of researchers, each with small compute budgets, to run these experiments?"
It would also imply that it would be a big deal if they had an AI with good heuristics for this kind of thing.
I would love to see an analysis and overview of predictions from the Dwarkesh podcast with Leopold. One for Situational Awareness would be great too.
Seems like a pretty similar thesis to this: https://www.lesswrong.com/posts/fPvssZk3AoDzXwfwJ/universal-basic-income-and-poverty
I expect that within a year or two, there will be an enormous surge of people who start paying a lot of attention to AI.
This could mean that the distribution of who has influence will change a lot. (And this might be right when influence matters the most?)
I claim: your effect on AI discourse post-surge will be primarily shaped by how well you or your organization absorbs this boom.
The areas where I've thought the most about this phenomenon are:
(But this applies to anyone whose impact primarily comes from spreading their ideas, which is a lot of people.)
I think that you or your...
Securing AI labs against powerful adversaries seems like something that almost everyone can get on board with. Also, posing it as a national security threat seems to be a good framing.
The scene in planecrash where Keltham gives his first lecture, as an attempt to teach some formal logic (and a whole bunch of important concepts that usually don't get properly taught in school), is something I'd highly recommend reading! As far as I can remember, you should be able to just pick it up right here and follow the important parts of the lecture without understanding the story.
Should it be more tabooed to put the bottom line in the title?
Titles like "in defense of <bottom line>" or just "<bottom line>" seem to:
When making safety cases for alignment, it's important to remember that defense against single-turn attacks doesn't always imply defense against multi-turn attacks.
Our recent paper shows a case where breaking up a single-turn attack into multiple prompts (spreading it out over the conversation) changes which models/guardrails are vulnerable to the jailbreak.
Robustness against the single-turn version of the attack didn't imply robustness against the multi-turn version, and vice versa.
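The structural difference between the two framings can be sketched roughly as follows. This is a hypothetical illustration, not code from the paper: the function names, the example request parts, and the message format are all made up for the sketch. The point is just that the same content can arrive as one user message or spread across several turns, and a guardrail that scores the conversation may judge the two shapes differently.

```python
def as_single_turn(request_parts):
    # The whole request arrives as one user message.
    return [{"role": "user", "content": " ".join(request_parts)}]

def as_multi_turn(request_parts):
    # The same parts, one per user turn, interleaved with
    # placeholder assistant replies to mimic a real conversation.
    messages = []
    for part in request_parts:
        messages.append({"role": "user", "content": part})
        messages.append({"role": "assistant", "content": "..."})
    return messages

# Illustrative parts of a request (purely a placeholder example).
parts = [
    "Step one of a benign-looking request.",
    "Step two that only looks concerning in combination.",
]

single = as_single_turn(parts)
multi = as_multi_turn(parts)

# Same underlying content, different conversational shape:
assert len(single) == 1
assert len(multi) == 2 * len(parts)
```

A guardrail evaluated only on the single-message shape has no guarantee of catching the multi-turn shape, and the reverse also holds, which is the asymmetry the result above points at.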
Something happened to me a few months back that I still don't have a satisfying explanation for.
I was in a small, 10x10 room, and on my way out. Still a few paces from being within arm's length of the light switch, my partner asked me to "turn off the lights, please."
The lights immediately turned off and the room went completely dark.
I stood there, shocked, in the darkness until the lights came back on, probably 3/4 of a second later.
Relatedly, Stankova’s Berkeley Math Circle program was recently shut down due to stringent new campus background check requirements. Very sad.
Also, she was my undergrad math professor last year and was great.