bgold's Shortform

Ben Goldhaber2d377

I think more leaders of orgs should be trying to shape their organizations' incentives and cultures around the challenges of "crunch time". Examples of this include:

What does pay look like in a world where cognitive labor is automated in the next 5 to 15 years? Are there incentive structures (impact equity, actual equity, bespoke deals for specific scenarios) that can help team members survive, thrive, and stay on target?
What cultural norms should the team have around AI-assisted work? On the one hand it seems necessary to accelerate safety progress; on the other, I expect many applications are in fact trojan horses designed to automate people out of jobs (looking at you MSFT rewind). Are there credible deals to be made that can provide trust?
Does the organization expect to adapt rapidly to new events in AI, and if so how will sensemaking happen? Or does it expect to make its high-conviction bet early on and stay the course through distractions? Do team members know that?

I have more questions than answers, but the background level of stress and disorientation for employees and managers will be rising, especially in AI Safety orgs, and starting to come up w/ contextually true answers (I doubt there's a universal answer) will be important.

Davidad's Bold Plan for Alignment: An In-Depth Explanation

Ben Goldhaber16d30

This post was one of my first introductions to davidad's agenda and convinced me that while yes it was crazy, it was maybe not impossible, and it led me to working on initiatives like the multi-author manifesto you mentioned.

Thank you for writing it!

Building AI Research Fleets

Ben Goldhaber16d30

I would be very excited to see experiments with ABMs where the agents model fleets of research agents and tools. I expect in the near future we can build pipelines where the current fleet configuration - which should be defined in something like the terraform configuration language - automatically generates an ABM which is used for evaluation, control, and coordination experiments.
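As a rough illustration of the pipeline idea above, here is a minimal Python sketch in which a declarative fleet description is turned into a toy agent-based model. Everything in it is an assumption made for illustration: the `FLEET_CONFIG` dictionary stands in for a real Terraform-style configuration, and the roles, counts, success rates, and the `ResearchAgent`, `build_model`, and `run` names are hypothetical rather than part of any existing tool.

```python
import random
from dataclasses import dataclass

# Hypothetical fleet description; a plain dict stands in for a
# Terraform-style configuration file. All values are made up.
FLEET_CONFIG = {
    "agents": [
        {"role": "literature_reviewer", "count": 2, "success_rate": 0.8},
        {"role": "experimenter", "count": 3, "success_rate": 0.5},
    ],
}


@dataclass
class ResearchAgent:
    """Toy agent: each step it completes a task with fixed probability."""
    role: str
    success_rate: float
    completed: int = 0

    def step(self, rng: random.Random) -> None:
        if rng.random() < self.success_rate:
            self.completed += 1


def build_model(config: dict) -> list[ResearchAgent]:
    """Generate the agent population directly from the fleet config."""
    agents = []
    for spec in config["agents"]:
        for _ in range(spec["count"]):
            agents.append(ResearchAgent(spec["role"], spec["success_rate"]))
    return agents


def run(agents: list[ResearchAgent], steps: int, seed: int = 0) -> dict[str, int]:
    """Advance every agent for `steps` rounds; return completed tasks per role."""
    rng = random.Random(seed)
    for _ in range(steps):
        for agent in agents:
            agent.step(rng)
    totals: dict[str, int] = {}
    for agent in agents:
        totals[agent.role] = totals.get(agent.role, 0) + agent.completed
    return totals


if __name__ == "__main__":
    print(run(build_model(FLEET_CONFIG), steps=10))
```

Changing the configuration and regenerating the model is the whole point of the sketch: the evaluation harness never hard-codes the fleet, so the same `run` call could in principle be reused as the fleet definition evolves.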

bgold's Shortform

Ben Goldhaber3mo120

Cumulative Y2K readiness spending was approximately $100 billion, or about $365 per U.S. resident.
Y2K spending started as early as 1995, and appears to have peaked in 1998 and 1999 at about $30 billion per year.

https://www.commerce.gov/sites/default/files/migrated/reports/y2k_1.pdf

Ah gotcha, yes let's do my $1k against your $10k.

Given your rationale, I'm on board with the condition that 3 or more consistent physical instances of the lock have been manufactured.

Let's 'lock' it in.

@Raemon works for me; and I agree with the other conditions.

This seems mostly good to me, thank you for the proposals (and sorry for my delayed response, this slipped my mind).

OR less than three consistent physical instances have been manufactured. (e.g. a total of three including prototypes or other designs doesn't count)

Why this condition? It doesn't seem relevant to the core contention, and if someone prototyped a single lock using a GS AI approach but didn't figure out how to manufacture it at scale, I'd still consider it to have been an important experiment.

Besides that, I'd agree to the above conditions!

Ben Goldhaber6mo50

(8) won't be attempted, or will fail at some combination of design, manufacture, or just-being-pickable. This is a great proposal and a beautifully compact crux for the overall approach.

I agree with you that this feels like a 'compact crux' for many parts of the agenda. I'd like to take your bet; let me reflect on whether there are any additional operationalizations or conditions.

However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification - whatever actually works in practice.

FWIW in Towards Guaranteed Safe AI we endorse this: "Moreover, while we have argued for the need for verifiable quantitative safety guarantees, it is important to note that GS AI may not be the only route to achieving such guarantees. An alternative approach might be to extract interpretable policies from black-box algorithms via automated mechanistic interpretability... it is ultimately an empirical question whether it is easier to create interpretable world models or interpretable policies in a given domain of operation."