luke_emberson — LessWrong

Replying to Introducing the Epoch Capabilities Index (ECI)

I'm interested in the question of how you plan to re-base the index over time.

Currently, we've chosen to scale things such that Claude 3.5 Sonnet gets 130 and GPT-5 (medium) gets 150. As we add new benchmarks, the rough plan is to try to maintain that. We're also planning on adding some way for users to define their own subset of benchmarks, in case you disagree with our choices. That should let you see how things would look under various hypothetical "rebasings".
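One simple way to pin two anchor scores like this is an affine rescaling of whatever raw capability estimates the underlying model produces. The sketch below assumes that form, which the comment above does not confirm. The raw values and the helper name `make_rescaler` are hypothetical; only the two target scores (130 and 150) come from the comment.

```python
def make_rescaler(raw_low, raw_high, target_low=130.0, target_high=150.0):
    """Return an affine map sending raw_low -> target_low and raw_high -> target_high."""
    if raw_high == raw_low:
        raise ValueError("anchor models need distinct raw scores")
    slope = (target_high - target_low) / (raw_high - raw_low)

    def rescale(raw):
        return target_low + slope * (raw - raw_low)

    return rescale


# Hypothetical raw capability estimates (assumed for illustration, not real ECI internals).
raw = {"Claude 3.5 Sonnet": 0.4, "GPT-5 (medium)": 2.4, "some other model": 1.4}

rescale = make_rescaler(raw["Claude 3.5 Sonnet"], raw["GPT-5 (medium)"])
eci_like = {name: rescale(value) for name, value in raw.items()}
```

Under this assumption, re-fitting with new benchmarks would change the raw values, and re-applying the same two anchors would keep the published scale comparable.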

The index excludes models from before 2023, which is understandable, since they couldn't use benchmarks released after that date.

To be clear, that's not why we didn't include older models. There is...

Replying to Introducing the Epoch Capabilities Index (ECI)

luke_emberson 4mo

Another consideration is that we tend to prioritize evaluations on frontier models, so coverage for smaller models is spottier.

Replying to Introducing the Epoch Capabilities Index (ECI)

luke_emberson 4mo

Here's one framing: getting a higher ECI score requires making progress on (multiple) benchmarks that other models find difficult. Making progress on METR instead involves being more consistently successful at longer coding tasks.

So ECI tracks general capabilities on a "difficulty-weighted" scale, and seems better suited to understanding the pace of progress in general, but it's also an abstract number. There's currently no mapping like "ECI of X == AGI", or a human ECI baseline. On the other hand, METR's benchmark has a nice concrete interpretation, but is narrower.

We're working on mapping ECI to more interpretable metrics (in fact, METR Time Horizons is one candidate), as well as allowing users to choose a subset of underlying benchmarks if they would prefer to weight ECI more heavily towards particular skills like coding.

Also note that we don't currently include METR's benchmarks as inputs to ECI, but we may add them in future iterations.

Introducing the Epoch Capabilities Index (ECI)

luke_emberson, YafahEdelman, Jsevillamol 4mo

We at Epoch AI have recently released a new composite AI capability index called the Epoch Capabilities Index (ECI), based on nearly 40 underlying benchmarks.

Some key features...

Saturation-proof: ECI "stitches" benchmarks together, to enable comparisons even as individual benchmarks become saturated.
Global comparisons: Models can be compared, even if they were never evaluated on the same benchmarks.
Difficulty-based task weighting: ECI uses a simple statistical model (similar to those used in Item Response Theory) under which models are deemed more capable if they score well on difficult benchmarks, and benchmarks are deemed more difficult if capable models are unable to score highly on them.
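To make the features above concrete, here is a minimal sketch of an IRT-style fit. It is not Epoch's actual model: it assumes a one-parameter logistic form, sigmoid(capability - difficulty), fitted by plain gradient descent on a cross-entropy loss. The model names, benchmark names, and scores are all made up. Note that model_a and model_c share no benchmark, yet they end up on one scale because model_b was evaluated on both.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit(scores, steps=2000, lr=0.5):
    """Fit capability[m] and difficulty[b] so that
    sigmoid(capability[m] - difficulty[b]) approximates each observed score.
    Only observed (model, benchmark) pairs contribute to the loss.
    """
    capability = {m: 0.0 for m, _ in scores}
    difficulty = {b: 0.0 for _, b in scores}
    for _ in range(steps):
        grad_c = dict.fromkeys(capability, 0.0)
        grad_d = dict.fromkeys(difficulty, 0.0)
        for (m, b), y in scores.items():
            # Gradient of cross-entropy with respect to (capability - difficulty).
            err = sigmoid(capability[m] - difficulty[b]) - y
            grad_c[m] += err
            grad_d[b] -= err
        for m in capability:
            capability[m] -= lr * grad_c[m]
        for b in difficulty:
            difficulty[b] -= lr * grad_d[b]
    return capability, difficulty


# Hypothetical accuracy scores in [0, 1]; missing pairs were never evaluated.
scores = {
    ("model_a", "easy_bench"): 0.5,
    ("model_b", "easy_bench"): 0.9,
    ("model_b", "hard_bench"): 0.3,
    ("model_c", "hard_bench"): 0.7,
}

capability, difficulty = fit(scores)
```

In this toy fit the capabilities come out ordered model_a < model_b < model_c, and hard_bench is estimated as more difficult than easy_bench. Only differences between the fitted numbers are meaningful, since adding a constant to every capability and difficulty leaves the predictions unchanged.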

ECI will allow us to track trends in capabilities over longer spans of...

Replying to Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

luke_emberson 1y

Furthermore, without unprecedented changes in redistribution, declining labor share also translates into a structural decline in household consumption power, as humans lose their primary means of earning the income needed to participate in the economy as consumers.

This holds only if the labor share of income shrinks faster than purchasing power grows. Overall, I still think the misaligned economy argument goes through if household consumption power grows in absolute terms but "human preference aligned dollars" shrinks as a fraction of total dollars spent.
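A small worked example of the condition above, under a deliberately crude assumption that is not in the quoted post: household income is approximated as labor share times total output, ignoring capital income and transfers. The function name and all numbers are hypothetical.

```python
def consumption_power_ratio(share_before, share_after, output_growth):
    """Factor by which household consumption power changes, assuming
    household income = labor share * total output (a simplifying assumption)."""
    return (share_after / share_before) * output_growth


# Labor share halves (60% -> 30%) while output quadruples: households gain in absolute terms.
grows = consumption_power_ratio(0.6, 0.3, output_growth=4.0)

# Same fall in labor share with only 1.5x output growth: households lose in absolute terms.
shrinks = consumption_power_ratio(0.6, 0.3, output_growth=1.5)
```

In the first case household consumption power doubles even though households' share of total spending has halved, which is the scenario where the misaligned-economy argument would still apply.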

Replying to The Best Tacit Knowledge Videos on Every Subject

luke_emberson 2y

Domain: Piano

Link: Seymour Bernstein Teaches Piano https://youtu.be/pRLBBJLX-dQ?si=-6EIvGDRyw0aJ0Sq

Person: Seymour Bernstein

Background: Pianist and composer, performed with the Chicago Symphony Orchestra, Adjunct Associate Professor of Music and Music Education at New York University.

Why: Tonebase (a paid music learning service) recorded a number of free-to-watch conversations with Bernstein in which he plays through or teaches a piece. Bernstein was about 90 years old at the time of recording and shares an incredible amount of tacit knowledge, especially about body mechanics when playing piano.

Replying to Best arguments against instrumental convergence?

luke_emberson 3y

Re: specific claims to falsify, I generally buy the argument.

If I had to pick out specific aspects which seem weaker, I think they would mostly be related to our confusion around agent foundations. It isn't trivially obvious to me that the way we describe "intelligence" or "goals" within the instrumental convergence argument is a good match for the way current systems operate (though it seems close enough, and we shouldn't expect to be wrong in a way that makes the situation better).

Replying to Best arguments against instrumental convergence?

luke_emberson 3y

I would agree that instrumental convergence is probably not a necessary component of AI x-risk, so you're correct that "crux" is a bit of a misnomer.

However, in my experience it is one of the primary arguments people rely on when explaining their concerns to others. The correlation between credence in instrumental convergence and AI x-risk concern seems very high. IMO it is also one of the most concerning legs of the overall argument.

If somebody made a compelling case that we should not expect instrumental convergence by default in the current ML paradigm, I think the overall argument for x-risk would have to look fairly different from the one that is usually put forward.

Best arguments against instrumental convergence?

luke_emberson

When people debating AI x-risk on Twitter talk past each other, my impression is that a significant crux is whether or not the individual buys the instrumental convergence argument.

I wouldn't be surprised if the supermajority of people who don't buy the idea simply haven't engaged with it enough, and I think it is common to have a negative gut reaction to high levels of confidence about something so seemingly far-reaching. That said, I'm curious whether there are any strong arguments against it. I'm looking for something stronger than "that's a big claim, and I don't see any empirical proof."