All of reverendfoom's Comments + Replies

Are astronomical suffering risks (s-risk) considered a subset of existential risks (x-risk) because they "drastically curtail humanity’s potential"? Or is this concern not taken into account for this research program?

DanielFilan
I suppose that types of s-risks that did drastically curtail humanity's potential would count, but s-risks that don't have that issue (e.g. humanity decides to suffer massively, but still has the potential to do lots of other things) would not.

I appreciate the response and stand corrected.

The point about it being an iterated prisoner's dilemma is a good one, and I would rather there be more such ACX instances where he shares even more of his thinking due to our cooperative/trustworthy behavior, than this to be the last one or have the next ones be filtered PR-speak.

A small number of people in the alignment community repeatedly getting access to better information and being able to act on it beats the value of this one single post staying open to the world. And even in the case of "the cat being out of the bag," hiding/removing the post would probably do good as a gesture of cooperation.

Another point about "defection" is the question of which action counts as a defection, and with respect to whom.

Sam Altman is the leader of an organization with a real chance of bringing about the literal end of the world, and I find any and all information about his thoughts and his organization to be of the highest interest for the rest of humanity.

Not disclosing whatever such information one comes into contact with (except when disclosure would speed up potentially even-less-alignment-focused competitors) is a defection against the rest of us.

If this were an off-the-record meeting with a...

[This comment is no longer endorsed by its author]

The point of the defection/cooperation thing isn't just that cooperation is a kindness to Sam Altman personally, which can be overridden by the greater good. The point is that generally cooperative behavior, and generally high amounts of trust, can make everyone better off. If it were true, as you said, that:

he's a head of state in control of WMDs and should be (and expect to be) treated as such

and as a consequence, he e.g. expected someone to record him during the Q&A, then he would presumably not have done the Q&A in the first place, or would have sha...

Did he really speak that little about AI Alignment/Safety? Does anyone have additional recollections on this topic?

The only relevant parts so far seem to be these two:

Behavioral cloning probably much safer than evolving a bunch of agents. We can tell GPT to be empathic.

And:

Chat access for alignment helpers might happen.

Both of which are very concerning.

"We can tell GPT to be empathetic" assumes it can be aligned in the first place so you "can tell" it what to do, and "be empathetic" is a very vague description of what a good utility function would ... (read more)

As a general point, these notes should not be used to infer anything about what Sam Altman thought was important enough to talk a lot about, or what his general tone/attitude was. This is because:

  • The notes are filtered through what the note-takers thought was important. There's a lot of stuff that's missing.
  • What Sam spoke about was mostly a function of what he was asked about (it was a Q&A after all). If you were there live you could maybe get some idea of how he was inclined to interpret questions, what he said in response to more open questions, etc.
...

Very interested in this, especially in how to balance or resolve the trade-off between high inner coordination (people agreeing fast and completely on actions and/or beliefs) and high "outer" coordination (coordination with reality, i.e. converging fast and strongly on the right things): in other words, how to avoid echo chambers/groupthink without devolving into bickering and splintering into factions.