Pierre Peigné — LessWrong

Workshop Report: Why current benchmarks approaches are not sufficient for safety?

I’m sharing the report from the workshop held during the AI, Data, Robotics Forum in Eindhoven, a European event bringing together policymakers, industry representatives, and academics to discuss the challenges and opportunities in AI, data, and robotics. This report provides a snapshot of the current state of discussions on benchmarking within these spheres.

Speakers: Peter Mattson, Pierre Peigné and Tom David

Observations

Safety and robustness are essential for AI systems to transition from innovative concepts and research to reliable products and services that deliver real value. Without these qualities, the potential benefits of AI may be overshadowed by failures and safety concerns, hindering adoption and trust in the technology.
AI research and development have transitioned from traditional

...

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

Quentin FEUILLADE--MONTIXI, Pierre Peigné

This post is part of a sequence on LLM Psychology.

@Pierre Peigné wrote the details section in argument 3 and the other weird phenomenon. The rest is written in the voice of @Quentin FEUILLADE--MONTIXI.

Intro

Before diving into what LLM psychology is, it is crucial to clarify the nature of the subject we are studying. In this post, I’ll challenge the commonly debated stochastic parrot hypothesis for state-of-the-art large language models (≈GPT-4), and in the next post, I’ll shed light on what LLM psychology actually is.

The stochastic parrot hypothesis suggests that LLMs, despite their remarkable capabilities, don't truly comprehend language. They are like mere parrots, replicating human speech patterns without truly grasping the essence of the...

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

Pierre Peigné

This work was produced as part of the SERI MATS 3.0 Cohort under the supervision of Lee Sharkey.

Many thanks to Lee Sharkey for his advice and suggestions.

TL;DR: it is possible to speed up the extraction of superposed features with sparse autoencoders by using an informed initialization of the sparse dictionary. Evaluated on toy data, the informed initialization scheme gives the following results:

Immediate MMCS of ~0.65 (versus MMCS < 0.30 at the start with the original orthogonal initialization)
Up to ~10% speedup in reaching 0.99 MMCS of the superposed features with some initialization methods that rely on collecting rare features.
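
To make the metric in these results concrete, here is a minimal sketch of how MMCS could be computed. It assumes that MMCS stands for mean max cosine similarity: for each ground-truth feature, take the highest cosine similarity with any dictionary element, then average over the ground-truth features. The function names and the toy vectors are illustrative assumptions, not taken from the post.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmcs(true_features: List[Vector], dictionary: List[Vector]) -> float:
    """Assumed definition: mean over true features of the best match in the dictionary."""
    best_matches = [
        max(cosine_similarity(feature, atom) for atom in dictionary)
        for feature in true_features
    ]
    return sum(best_matches) / len(best_matches)


# Hypothetical toy example: two ground-truth features in a 3-dimensional space.
true_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Same directions, but permuted and rescaled: cosine similarity ignores both.
perfect_dictionary = [[0.0, 2.0, 0.0], [3.0, 0.0, 0.0]]
# One atom sits halfway between the two features, the other is orthogonal to both.
partial_dictionary = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

perfect_score = mmcs(true_features, perfect_dictionary)
partial_score = mmcs(true_features, partial_dictionary)
```

Under this definition, `perfect_score` is 1.0 because every true feature has an exactly aligned atom, while `partial_score` is about 0.71 because the best available atom is 45 degrees away from each feature.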

The main ideas:

The data contains the (sparsely activating) true features, which can be used to initialize the dictionary
However, the rare features are

...
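
The first idea above (the data contains the true features, so it can seed the dictionary) could be sketched as follows. This is only an illustration under stated assumptions: it initializes each dictionary element from a randomly chosen non-zero data sample, normalized to unit length. It is not the post's actual scheme, whose handling of rare features is cut off in this excerpt; `init_dictionary_from_data` and the toy samples are hypothetical.

```python
import math
import random
from typing import List


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0.0 else list(vector)


def init_dictionary_from_data(
    data: List[List[float]], n_atoms: int, rng: random.Random
) -> List[List[float]]:
    """Hypothetical informed initialization: seed atoms from non-zero data samples."""
    nonzero_samples = [row for row in data if any(x != 0.0 for x in row)]
    if n_atoms > len(nonzero_samples):
        raise ValueError("not enough non-zero samples to seed the dictionary")
    chosen = rng.sample(nonzero_samples, n_atoms)
    return [normalize(row) for row in chosen]


# Hypothetical toy data: sparse activations of a few underlying directions.
toy_data = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],  # inactive sample, skipped by the initializer
    [0.0, 2.0, 0.0, 0.0],
    [0.3, 0.0, 0.4, 0.0],  # two features active at once
    [0.0, 0.0, 1.5, 0.0],
]

dictionary = init_dictionary_from_data(toy_data, n_atoms=3, rng=random.Random(0))
```

Because sparse samples often contain a single active feature, atoms seeded this way can start close to true feature directions, which is one plausible reading of why an informed start would beat an orthogonal one.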

Clarifying mesa-optimization

Marius Hobbhahn, Pierre Peigné

Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort.

Thanks to Jérémy Scheurer, Nicholas Dupuis and Evan Hubinger for feedback and discussion.

When people talk about mesa-optimization, they sometimes say things like “we’re searching for the optimizer module” or “we’re doing interpretability to find out whether the network can do internal search”. An uncharitable interpretation of these claims is that the researchers expect the network to have something like an “optimization module” or “internal search algorithm” that is clearly different and distinguishable from the rest of the network (to be clear, we think it is fine to start with probably wrong mechanistic models).

In this post, we want to argue why...

Pierre Peigné's Shortform

Pierre Peigné

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.