I work primarily on AI Alignment. Scroll down to my pinned Shortform for an idea of my current work and who I'd like to collaborate with.
Website: https://jacquesthibodeau.com
Twitter: https://twitter.com/JacquesThibs
GitHub: https://github.com/JayThibs
Another option I didn't mention is to build a company with the intention of getting acquired. This is generally seen as a strange strategy, and VCs don't like it, since you'd be unlikely to deliver massive returns for them (most acquisitions are considered failures). That said, acquisition activity in the AI space is quite high. Then again, VCs may be concerned the founders will just get acquihired instead.
One way this might work is to have essentially no meaningful revenue for a few years and still build something the big labs really want at some point in the future (unclear what this would be, but a non-AI-safety company like Bun can get acquired with virtually no revenue afaict, though they only raised $7 million in funding in 2022). From an AI safety perspective, it's unclear how this would play out, since your goal might be to have your tech disseminated across all companies.
LW feature request (low on the importance scale):
It would be nice to be able to refresh the TTS for a post if it has been edited. I was reading this post, and it was a bit confusing to follow the audio since the post had been edited after the audio was generated.
Hmm, my thought was that devs (or at least Anthropic folks) have improved their ability to estimate how much AI is helping us since the release of the first truly agentic model? My feeling is that most top-end people should be better calibrated despite the moving target. Most people in the study had spent less than 50 hours with the tools (except for one of the folks who performed well), so I don’t think we can use the study to say much about how things change over the course of months or a year of usage and training (unless we do another study, I guess).
In terms of the accurate prediction, I’m not recalling what exactly made me believe this, though if you look at the first chart in the METR thread, the confidence intervals of the predicted uplift from the devs are below the 38%. The average dev thought they were 24% faster at the beginning of the study (so, in fact, they probably underestimated their uplift a bit).
I think there is nuance about the downlift study that would be helpful to highlight:
This is not to say that it’s true that Anthropic employees are getting that high of an uplift, but it may make the claim a bit more believable.
I’ve looked into this as part of my goal of accelerating safety research and automating as much of it as we can. It was one of the primary things I imagined we would do when we pushed for the non-profit path. We eventually went for-profit because we expected there would not be enough money disbursed to do this, especially in a short-timelines world.
I am again considering going non-profit to pursue this goal, among others. I’ll send you and others a proposal for what I imagine this would look like in the grander scheme.
I’ve been in AI safety for a while now and feel like I’ve formed a fairly comprehensive view of what would accelerate safety research, what would reduce power concentration, what it takes to automate research more safely as capabilities increase, and more.
I’ve tried to make this work as part of a for-profit, but it is incredibly hard to tackle the hard parts of the problem in that setting. Since tackling those is my intention, I’m again considering whether a non-profit will have to do, despite the unique difficulties that come with it.
Most AI safety plans include “automating AI safety research.” There’s a need for better clarity on what that actually looks like.
There are at least four things that get conflated in the term “automated research”:
For AI safety, the crux of many disagreements is whether one believes that:
Ultimately, this seems like a highly important question to clarify, since I believe it is driving many people to be optimistic about AI safety progress, at least to the point that it allows them to keep chugging along the capabilities tech tree. Gaining clarity much sooner on what would convince people otherwise seems important.
Relevant: https://www.lesswrong.com/posts/88xgGLnLo64AgjGco/where-are-the-ai-safety-replications
I think doing replications is great, and it’s one of the areas where I think automated research will be helpful soon. I replicated the Subliminal Learning paper on the day of its release because it was fairly easy to grab the paper, docs, codebases, etc. to replicate quickly.
Short timelines, slow takeoff vs. Long timelines, fast takeoff
Because chain-of-thought in the current paradigm seems like great news for AI safety, some people seem to have the following expectations:
Short timelines: CoT reduces risks, but shorter preparation time increases the odds of catastrophe.
Long timelines: the current paradigm is not enough; therefore, CoT may stop being relevant, which may increase the odds of catastrophe. We have more time to prepare (which is good), but we may get a faster takeoff than the current paradigm suggests. A discontinuous takeoff may therefore introduce significantly more risk despite the longer timelines.
So, perhaps counterintuitively for some, you could have these two groups:
1. Slow (smooth, non-discontinuous) takeoff, low p(doom), takeoff happens in the next couple of years. [People newer to AI safety seem more likely to expect this imo]
Vs.
2. Fast takeoff (discontinuous capability increase w.r.t. time), high p(doom), (actual) takeoff happens in 8-10 years. [seems more common under the MIRI / traditional AI safety researchers cluster]
I’m not saying those are the only two groups, but I think part of it speaks to how some people are feeling about the current state of progress and safety.
As a result, I think it’s pretty important to gain better clarity on whether we expect the current paradigm to scale without fundamental changes, and, if not, to understand what would come after it and how it would change the risks.
That’s not to say we shouldn’t weigh short timelines more highly because they are more immediate, but there are multiple factors to weigh here.
I agree that other forums would engage with even worse norms, but I'm personally happy to keep the bar high and hold a high standard for these discussions, regardless of what others do elsewhere. My hope is that we never stop striving for better, especially since, for alignment, the stakes are far higher than in most other domains, so we need a higher standard of frankness.
I shared the following as a bio for EAG Bay Area 2024. I'm sharing it here in case it reaches someone who wants to chat or collaborate.
Hey! I'm Jacques. I'm an independent technical alignment researcher with a background in physics and experience in government (social innovation, strategic foresight, mental health and energy regulation). Link to Swapcard profile. Twitter/X.
CURRENT WORK
TOPICS TO CHAT ABOUT
POTENTIAL COLLABORATIONS
TYPES OF PEOPLE I'D LIKE TO COLLABORATE WITH