Charbel-Raphaël

Charbel-Raphael Segerie

https://crsegerie.github.io/ 

Living in Paris


What do you think of my point about Scott Aaronson? Also, since you agree with points 2 and 3, it seems you also think that the most useful work from last year didn't require advanced physics. Isn't that in tension with your disagreement with point 1?

I think I do agree with some points in this post. This failure mode is the same one I mentioned about why people do interpretability, for instance (see the section "Outside view: The proportion of junior researchers doing Interp rather than other technical work is too high"), and I do think this generalizes somewhat to the whole field of alignment. But I'm highly skeptical that recruiting a bunch of physicists to work on alignment would be that productive:

  • Empirically, we've already kind of tested this, and it doesn't work.
    • I don't think that what Scott Aaronson produced while at OpenAI really helped AI safety: he did exactly what is criticized in the post, streetlight research using techniques he was already familiar with from his previous field, and I don't think the author of the OP would disagree with me. Maybe n=1, but it was one of the most promising shots.
    • Two years ago, I was doing field-building and trying to source talent, selecting primarily on pure intellect and raw IQ. I organized the Von Neumann Symposium around the problem of corrigibility, and I targeted IMO laureates and individuals from the best school in France, ENS Ulm, which arguably has the highest concentration of future Nobel laureates in the world. However, pure intelligence doesn't work. In the long run, the individuals who succeeded in the field weren't the valedictorians from France's top school, but rather those who were motivated, had read The Sequences, were EA people, possessed good epistemology, and were willing to share their work online (maybe you are going to say that the people I was targeting were too young, but I think my little empirical experience is already much better than the speculation in the OP).
    • My prediction is that if you tried to put a group of skilled physicists in a room, first, it's not even clear you would find that many motivated people in this reference class, and I don't think the few who would be motivated would produce good-quality work.
    • The scoring system for the ML4Good bootcamps reflects this insight: we use multiple indicators and don't rely solely on raw IQ to select participants, because there is little correlation between raw high IQ and producing quality work over the long term.
  • I believe the biggest mistake in the field is trying to solve "Alignment" rather than focusing on reducing catastrophic AI risks. Alignment is a confused paradigm; it's a conflationary alliance term that has sedimented over the years. It's often unclear what people mean when they talk about it: Safety isn't safety without a social model.
    • Think about what has been most productive in reducing AI risks so far. My short list would be:
      • The proposed SB 1047 legislation.
      • The short statement on AI risks.
      • Frontier AI Safety Commitments, AI Seoul Summit 2024, to encourage labs to publish their responsible scaling policies.
      • Scary demonstrations that showcase toy models of deception, alignment faking, etc., and that build more scientific consensus, which is very much needed.
    • As a result, the field of "Risk Management" is more fundamental for reducing AI risks than "AI Alignment." In my view, the theoretical parts of the alignment field have contributed far less to reducing existential risks than responsible scaling policies or the draft of the EU AI Act's Code of Practice for General-Purpose AI Systems, which is currently not far from being the state of the art for AI risk management. Obviously, it's still incomplete, but that's the direction I think is most productive today.
  • Relatedly, the Swiss cheese model of safety is underappreciated in the field. This model has worked across other industries and seems to be what works for the only general intelligence we know: humans. Humans use a mixture of safety strategies that we could imitate for AI safety (see this draft). However, the agent foundations community seems to be completely neglecting this.

I've been testing this with @Épiphanie Gédéon for a few months now, and it's really, really good for doing more intellectually challenging work. In my opinion, the most important point in the post is that it doesn't help that much during peak-performance moments, but we're not at our peak that often, so it's super important. It's really a big productivity boost, especially for cognitively demanding tasks or things where we struggle to "eat the frog". So, I highly recommend it.

But the person involved definitely needs to be pretty intelligent to keep up and to make recommendations that aren't useless. Sometimes it can feel more like co-working; there are quite a few different ways it can work, more or less passive or active. But overall, generally speaking, we recommend trying it for at least a few days.

It took me quite a while to take the plunge because there's a social aspect: this kind of thing isn't very common in France, and it's not considered a real job. Honestly, though, it should be a real job for intellectual professions, in my opinion. And it's not an easy one.

I often find myself revisiting this post; it has profoundly shaped my philosophical understanding of numerous concepts. The notion of conflationary alliances introduced here is crucial for identifying and disentangling/dissolving many ambiguous terms and resolving philosophical confusion. I think this applies not only to consciousness but also to situational awareness, pain, interpretability, safety, alignment, and intelligence, to name a few.

I referenced this blog post in my own post, My Intellectual Journey to Dis-solve the Hard Problem of Consciousness, during a period when I was plateauing and making no progress in better understanding consciousness. I now believe that much of my confusion has been resolved.

I think the concept of conflationary alliances is almost indispensable for effective conceptual work in AI safety research. For example, it helps clarify the distinction between "consciousness" and "situational awareness." This will become increasingly important as AI systems grow more capable and public discourse becomes more polarized around their moral status and consciousness.

Highly recommended for anyone seeking clarity in their thinking!

I don't think Tournesol is really mature currently, especially for non-French content, and I'm not sure they are trying to do governance work; it's mainly a technical project, which is already cool.

Yup, we should create an equivalent of the Nutri-Score for different recommendation AIs. 

"I really don't know how tractable it would be to pressure compagnies" seems weirdly familiar.  We already used the same argument for AGI safety, and we know that governance work is much more tractable than expected.

I'm a bit surprised this post has so little karma and engagement. I would be really interested to hear from people who think this is a complete distraction.

Fair enough.

I think my main problem with this proposal is that, under the current paradigm of AIs (GPTs, foundation models), I don't see how you would implement ATA, and I'm not sure this is really a priority.

I believe we should not create a Sovereign AI. Developing a goal-directed agent of this kind will always be too dangerous. Instead, we should aim for a scenario similar to CERN, where powerful AI systems are used for research in secure labs, but not deployed in the economy. 

I don't want AIs to take over.
