jacquesthibs

I work primarily on AI Alignment. Scroll down to my pinned Shortform for an idea of my current work and who I'd like to collaborate with.

Website: https://jacquesthibodeau.com

Twitter: https://twitter.com/JacquesThibs

GitHub: https://github.com/JayThibs 

LinkedIn: https://www.linkedin.com/in/jacques-thibodeau/ 

Sequences

On Becoming a Great Alignment Researcher (Efficiently)

Wiki Contributions

Comments

Sorted by

I shared the following as a bio for EAG Bay Area 2024. I'm sharing this here if it reaches someone who wants to chat or collaborate.

Hey! I'm Jacques. I'm an independent technical alignment researcher with a background in physics and experience in government (social innovation, strategic foresight, mental health and energy regulation). Link to Swapcard profile. Twitter/X.

CURRENT WORK

  • Collaborating with Quintin Pope on our Supervising AIs Improving AIs agenda (making automated AI science safe and controllable). The current project involves a new method allowing unsupervised model behaviour evaluations. Our agenda.
  • I'm a research lead in the AI Safety Camp for a project on stable reflectivity (testing models for metacognitive capabilities that impact future training/alignment).
  • Accelerating Alignment: augmenting alignment researchers using AI systems. A relevant talk I gave. Relevant survey post.
  • Other research that currently interests me: multi-polar AI worlds (and how that impacts post-deployment model behaviour), understanding-based interpretability, improving evals, designing safer training setups, interpretable architectures, and limits of current approaches (what would a new paradigm that addresses these limitations look like?).
  • Used to focus more on model editing, rethinking interpretability, causal scrubbing, etc.

TOPICS TO CHAT ABOUT

  • How do you expect AGI/ASI to actually develop (so we can align our research accordingly)? Will scale plateau? I'd like to get feedback on some of my thoughts on this.
  • How can we connect the dots between different approaches? For example, connecting the dots between Influence Functions, Evaluations, Probes (detecting truthful direction), Function/Task Vectors, and Representation Engineering to see if they can work together to give us a better picture than the sum of their parts.
  • Debate over which agenda actually contributes to solving the core AI x-risk problems.
  • What if the pendulum swings in the other direction, and we never get the benefits of safe AGI? Is open source really as bad as people make it out to be?
  • How can we make something like the d/acc vision (by Vitalik Buterin) happen?
  • How can we design a system that leverages AI to speed up progress on alignment? What would you value the most?
  • What kinds of orgs are missing in the space?

POTENTIAL COLLABORATIONS

  • Examples of projects I'd be interested in: extending either the Weak-to-Strong Generalization paper or the Sleeper Agents paper, understanding the impacts of synthetic data on LLM training, working on ELK-like research for LLMs, experiments on influence functions (studying the base model and its SFT, RLHF, iterative training counterparts; I heard that Anthropic is releasing code for this "soon") or studying the interpolation/extrapolation distinction in LLMs.
  • I’m also interested in talking to grantmakers for feedback on some projects I’d like to get funding for.
  • I'm slowly working on a guide for practical research productivity for alignment researchers to tackle low-hanging fruits that can quickly improve productivity in the field. I'd like feedback from people with solid track records and productivity coaches.

TYPES OF PEOPLE I'D LIKE TO COLLABORATE WITH

  • Strong math background, can understand Influence Functions enough to extend the work.
  • Strong machine learning engineering background. Can run ML experiments and fine-tuning runs with ease. Can effectively create data pipelines.
  • Strong application development background. I have various project ideas that could speed up alignment researchers; I'd be able to execute them much faster if I had someone to help me build my ideas fast. 

I'm currently in the Catalyze Impact AI safety incubator program. I'm working on creating infrastructure for automating AI safety research. This startup is attempting to fill a gap in the alignment ecosystem and looking to build with the expectation of under 3 years left to automated AI R&D. This is my short timelines plan.

I'm looking to talk (for feedback) to anyone interested in the following:

  • AI control
  • Automating math to tackle problems as described in Davidad's Safeguarded AI programme.
  • High-assurance safety cases
  • How to robustify society in a post-AGI world
  • Leverage large amounts of inference-time compute to make progress on alignment research
  • Short timelines
  • Profitability while still reducing overall x-risk
  • Are someone with an entrepreneurial spirit and can spin out traditional business within the org to fund the rest of the work (thereby reducing investor pressure)

If you're interested in chatting or giving feedback, please DM me!

Are you or someone you know:

1) great at building (software) companies
2) care deeply about AI safety
3) open to talk about an opportunity to work together on something

If so, please DM with your background. If someone comes to mind, also DM. I am looking thinking of a way to build companies in a way to fund AI safety work.

Agreed, but I will find a way.

Hey Ben and Jesse!

This comment is more of a PSA:

I am building a startup focused on making this kind of thing exceptionally easy for AI safety researchers. I’ve been working as an AI safety researcher for a few years. I’ve been building an initial prototype and I am in the process of integrating it easily into AI research workflows. So, with respect to this post, I’ve been actively working towards building a prototype for the “AI research fleets”.

I am actively looking for a CTO I can build with to +10x alignment research in the next 2 years. I’m looking for someone absolutely cracked and it’s fine if they already have a job (I’ll give my pitch and let them decide).

If that’s you or you know anyone who could fill that role (or who I could talk to that might know), then please let me know!

For alignment researchers or people in AI safety research orgs: hit me up if you want to be pinged for beta testing when things are ready.

For orgs, I’d be happy to work with you to setup automations or give a masterclass on the latest AI tools/automation workflows and maybe provide a custom report (with a video overview) each month so that you can focus on research rather than trying new tools that might not be relevant to your org.

Additional context:

“When we say “automating alignment research,” we mean a mix of Sakana AI’s AI scientist (specialized for alignment), Transluce’s work on using AI agents for alignment research, test-time compute scaling, and research into using LLMs for coming up with novel AI safety ideas. This kind of work includes empirical alignment (interpretability, unlearning, evals) and conceptual alignment research (agent foundations).

We believe that it is now the right time to take on this project and build this startup because we are nearing the point where AIs could automate parts of research and may be able to do so sooner with the right infrastructure, data, etc.

We intend to study how our organization’s work can integrate with the Safeguarded AI thesis by Davidad.”

I’m currently in London for the month as part of the Catalyze Impact programme.

If interested, send me a message on LessWrong or X or email (thibo.jacques @ gmail dot com).

It has basically significantly accelerated my ability to build fully functional websites very quickly. To the point where it was basically a phase transition between me building my org’s website and not building it (waiting for someone with web dev experience to do it for me).

I started my website by leveraging the free codebase template he provides on his github and covers in the course.

I mean that it's a trade secret for what I'm personally building, and I would also rather people don't just use it freely for advancing frontier capabilities research.

Is this because it would reveal private/trade-secret information, or is this for another reason?

Yes (all of the above)

Load More