jacquesthibs

I work primarily on AI Alignment. Scroll down to my pinned Shortform for an idea of my current work and who I'd like to collaborate with.

Website: https://jacquesthibodeau.com

Twitter: https://twitter.com/JacquesThibs

GitHub: https://github.com/JayThibs 

LinkedIn: https://www.linkedin.com/in/jacques-thibodeau/ 

Sequences

On Becoming a Great Alignment Researcher (Efficiently)

Comments

I shared the following as a bio for EAG Bay Area 2024. I'm sharing it here in case it reaches someone who wants to chat or collaborate.

Hey! I'm Jacques. I'm an independent technical alignment researcher with a background in physics and experience in government (social innovation, strategic foresight, mental health and energy regulation). Link to Swapcard profile. Twitter/X.

CURRENT WORK

  • Collaborating with Quintin Pope on our Supervising AIs Improving AIs agenda (making automated AI science safe and controllable). The current project involves a new method allowing unsupervised model behaviour evaluations. Our agenda.
  • I'm a research lead in the AI Safety Camp for a project on stable reflectivity (testing models for metacognitive capabilities that impact future training/alignment).
  • Accelerating Alignment: augmenting alignment researchers using AI systems. A relevant talk I gave. Relevant survey post.
  • Other research that currently interests me: multi-polar AI worlds (and how that impacts post-deployment model behaviour), understanding-based interpretability, improving evals, designing safer training setups, interpretable architectures, and limits of current approaches (what would a new paradigm that addresses these limitations look like?).
  • Used to focus more on model editing, rethinking interpretability, causal scrubbing, etc.

TOPICS TO CHAT ABOUT

  • How do you expect AGI/ASI to actually develop (so we can align our research accordingly)? Will scale plateau? I'd like to get feedback on some of my thoughts on this.
  • How can we connect the dots between different approaches? For example, connecting Influence Functions, Evaluations, Probes (detecting a truthful direction), Function/Task Vectors, and Representation Engineering to see if, together, they give us a better picture than the sum of their parts (a minimal probe sketch follows this list).
  • Debate over which agenda actually contributes to solving the core AI x-risk problems.
  • What if the pendulum swings in the other direction, and we never get the benefits of safe AGI? Is open source really as bad as people make it out to be?
  • How can we make something like the d/acc vision (by Vitalik Buterin) happen?
  • How can we design a system that leverages AI to speed up progress on alignment? What would you value the most?
  • What kinds of orgs are missing in the space?
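
As a tiny illustration of the probing idea from the list above, here is a minimal sketch of fitting a linear "truth direction" probe. The activations and labels are random stand-ins I made up for the example; in practice, they would be extracted from a chosen layer of an actual model:

```python
# Minimal sketch: fit a linear probe for a "truthful" direction on
# hidden activations. The activations below are random stand-ins;
# in practice you'd extract them from a chosen layer of an LLM.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, d_model = 512, 768

# Hypothetical activations for true vs. false statements (assumed data).
acts_true = rng.normal(0.3, 1.0, size=(n_samples, d_model))
acts_false = rng.normal(-0.3, 1.0, size=(n_samples, d_model))

X = np.vstack([acts_true, acts_false])
y = np.array([1] * n_samples + [0] * n_samples)

probe = LogisticRegression(max_iter=1000).fit(X, y)

# The normalized weight vector is a candidate "truth direction" that
# other approaches (e.g., representation engineering) could reuse.
truth_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
print("probe accuracy:", probe.score(X, y))
```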

POTENTIAL COLLABORATIONS

  • Examples of projects I'd be interested in: extending either the Weak-to-Strong Generalization paper or the Sleeper Agents paper, understanding the impacts of synthetic data on LLM training, working on ELK-like research for LLMs, running experiments on influence functions (studying the base model and its SFT, RLHF, and iteratively trained counterparts; I heard that Anthropic is releasing code for this "soon"), or studying the interpolation/extrapolation distinction in LLMs.
  • I’m also interested in talking to grantmakers for feedback on some projects I’d like to get funding for.
  • I'm slowly working on a guide to practical research productivity for alignment researchers, tackling low-hanging fruit that can quickly improve productivity in the field. I'd like feedback from people with solid track records and from productivity coaches.

TYPES OF PEOPLE I'D LIKE TO COLLABORATE WITH

  • Strong math background, can understand Influence Functions enough to extend the work.
  • Strong machine learning engineering background. Can run ML experiments and fine-tuning runs with ease. Can effectively create data pipelines.
  • Strong application development background. I have various project ideas that could speed up alignment researchers; I'd be able to execute them much faster if I had someone to help me build my ideas fast. 

I’d be curious to know if there’s variability in the “hours worked per week” given that people might work more hours during a short program vs a longer-term job (to keep things sustainable).

Imagine there were an AI suggestion tool that could predict the reasons why you agree/disagree-voted on a comment, and you just had to click one of the generated answers to provide a bit of clarity at low cost.
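
A minimal sketch of what such a tool could look like, assuming an OpenAI-style chat API; the model choice and prompt are my placeholders, not part of the original idea:

```python
# Hypothetical sketch of a vote-reason suggester: generate a few
# candidate reasons for an agree/disagree vote, which the user could
# then pick from with one click. Model and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def suggest_vote_reasons(comment: str, vote: str, n_reasons: int = 3) -> list[str]:
    prompt = (
        f"A reader {vote}-voted this forum comment:\n\n{comment}\n\n"
        f"Suggest {n_reasons} short, distinct reasons they might have had. "
        "One reason per line, no numbering."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

# Example: suggest_vote_reasons("Scaling alone won't get us to AGI.", "disagree")
```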

Completely agree. I remember a big shift in my performance when I went from "I'm just using programming so that I can eventually build a startup, where I'll eventually code much less" to "I am a programmer, and I am trying to become exceptional at it." The shift in mindset was super helpful.

This is one of the reasons I think 'independent' research is valuable, even if that value isn't immediately obvious from an output standpoint (papers, for example).

That said, I've definitely had the thought, "I should niche down into a specific area where there is already a bunch of infrastructure I can leverage and churn out papers with many collaborators, because I expect to be in a more stable funding situation as an independent researcher. It would also make it much easier to pivot into a role at an organization if I want to, or if it becomes necessary. It would definitely be a much more stable situation for me." (And I also agree that specialization is often underrated.)

Ultimately, I decided not to do this because I felt there were already enough people in alignment/governance who would take that option, given the financial and social incentives and the fact that already-published directions seem more promising. However, since this choice means I produce less output, I hope grantmakers take it into consideration for my future grant applications.

I think it depends on you and how you write. English isn't my first language, so I've found it useful. I also don't accept around 50% of the suggestions. But yeah, looking at the plan now, I think I could get off the Pro plan and see if I'm okay not paying for it.

It's definitely not the thing I care about most on the list.

There are multiple courses, though the platform is fairly new. They have one on full-stack development (using Cursor and other tools) and one on Replit Agents. I've been following it to learn web development quickly, and I think it's a good starting point for getting an overview of building an actual product on a website that you can eventually sell or get people to use.

Somewhat relevant blog post by @NunoSempere: https://nunosempere.com/blog/2024/09/10/chance-your-startup-will-succeed/

As an aside, I've long considered samplers under-investigated and suspected they would lead to some capability boosts. Sampling is also one of the things I'd consider testing to improve LLMs for automated/augmented alignment research.
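
To make the sampler idea concrete, here's a toy sketch of entropy-aware sampling: measure the entropy of the next-token distribution and adapt how aggressively you sample. The thresholds and temperatures are made-up placeholders, not from any particular implementation:

```python
# Toy sketch of entropy-aware sampling: when the next-token
# distribution is low-entropy (confident), take the top token; when
# it is high-entropy (uncertain), sharpen the distribution before
# sampling. All thresholds below are illustrative assumptions.
import numpy as np

def entropy_aware_sample(logits: np.ndarray, rng: np.random.Generator,
                         high_entropy_threshold: float = 3.0) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))  # in nats

    if entropy < 0.5:
        return int(np.argmax(probs))  # confident: greedy decoding
    # Uncertain: lower the temperature as entropy grows, so sampling
    # concentrates on the strongest candidates instead of derailing.
    temperature = 0.7 if entropy < high_entropy_threshold else 0.3
    scaled = np.exp((logits - logits.max()) / temperature)
    scaled /= scaled.sum()
    return int(rng.choice(len(logits), p=scaled))

rng = np.random.default_rng(0)
print(entropy_aware_sample(rng.normal(size=50), rng))
```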

The Importance of Entropy

Given that there's been a lot of talk about using entropy during sampling of LLMs lately (related GitHub), I figured I'd share a short post I wrote for my website before it became a thing:

Imagine you're building a sandcastle on the beach. As you carefully shape the sand, you're creating order from chaos - this is low entropy. But leave that sandcastle for a while, and waves, wind, and footsteps will eventually reduce it back to a flat, featureless beach - that's high entropy.

Entropy is nature's tendency to move from order to disorder, from concentration to dispersion. It's why hot coffee cools down, why ice cubes melt in your drink, and why it's easier to make a mess than to clean one up. In the grand scheme of things, entropy is the universe's way of spreading energy out evenly, always moving towards a state of balance or equilibrium.
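
For those who want the textbook version of this intuition, Boltzmann's formula and the second law capture it (standard physics, stated here for reference):

$$S = k_B \ln \Omega, \qquad \Delta S_{\text{universe}} \ge 0$$

A flat beach corresponds to vastly more microstates $\Omega$ than a sandcastle, so the drift toward flatness is overwhelmingly probable.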

Related to entropy: the Earth radiates back into space approximately the same amount of energy it absorbs from the Sun. But the Sun delivers that energy as relatively few high-energy photons (mostly visible and near-infrared), while the Earth radiates it back as far more photons, each carrying much lower energy (mostly infrared).

If the Earth didn't radiate back the same energy, the Earth would heat up continuously, which would obviously be unstable.
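
As a back-of-envelope check (standard radiative-equilibrium numbers, my addition): in equilibrium, absorbed solar power equals emitted thermal power, and since a thermal photon's typical energy scales with the emitter's temperature ($\langle E \rangle \approx 2.7\,k_B T$), the Earth emits roughly twenty low-energy photons for every high-energy solar photon it absorbs:

$$P_{\text{in}} = P_{\text{out}}, \qquad \frac{N_{\text{Earth}}}{N_{\text{Sun}}} \approx \frac{T_{\text{Sun}}}{T_{\text{Earth}}} \approx \frac{5800\ \text{K}}{290\ \text{K}} \approx 20$$

That factor of roughly 20 in photon count corresponds to the entropy increase the next paragraphs rely on.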

The cool thing is that entropy (the tendency of energy to spread out, e.g., the universe expanding or a fart spreading across the room) is possibly what made life happen, and a constant stream of low-entropy energy (high-energy photon packets) from the Sun was necessary for it.

A constant stream of low-entropy energy from the Sun may favour structures that dissipate energy, thereby increasing entropy (keeping the total energy constant while spreading it out). Entropy is an important ingredient in the emergence of life: how we went from random clumps of atoms to plants to the many complex organisms on Earth.

Dissipative structures: Living organisms are complex systems that maintain their organization by dissipating energy and matter. They take in low-entropy energy (sunlight/food) and release higher-entropy energy (heat), increasing the universe's entropy while maintaining their own order.

Life isn't just an accident but potentially an inevitable consequence of thermodynamics. Organisms can be thought of as highly efficient entropy producers, accelerating the universe's march toward maximum entropy while creating local pockets of increased order and complexity.

The emergence of life might be a natural result of physical laws, occurring wherever conditions allow for the formation of systems that can effectively dissipate energy.

One thing I'd like to ponder more: if entropy is a necessary ingredient for the emergence of life, what could that mean for AI? Due to entropy, the world has been biased towards increasingly complex organisms. How does that trend shape the future of the universe? Will we see an unprecedented acceleration of the universe's march toward maximum entropy?
