I think this is an important subject and I agree with much of this post. However, I think the framing/perspective might be subtly but importantly wrong-or-confused.
To illustrate:
How much of the issue here is about the very singular nature of the One dominant project, vs centralization more generally into a small number of projects?
Seems to me that centralization of power per se is not the problem.
I think the problem is something more like:

- we want to give as much power as possible to "good" processes, e.g. a process that robustly pursues humanity's CEV[1]; and we want to minimize the power held by "evil" processes
- but: a large fraction of humans are evil, or become evil once prosocial pressures are removed; and we do not know how to reliably construct "good" AIs
- and also: we (humans) are confused and in disagreement about what "good" even means
- and even if it were clear what a "good goal" is, we have no reliable way of ensuring that an AI or a human institution is robustly pursuing such a goal.
I agree that (given the above conditions) concentrating power into the hands of a few humans or AIs would in expectation be (very) bad. (OTOH, a decentralized race is also very bad.) But concentration-vs-decentralization of power is just one relevant consideration among many.
Thus: if the quoted question has an implicit assumption like "the main variable to tweak is distribution-of-power", then I think it is trying to carve the problem at unnatural joints, or making a false implicit assumption that might lead to ignoring multiple other important variables.
(And less centralization of power has serious dangers of its own. See e.g. Wei Dai's comment.)
I think a more productive frame might be something like "how do we construct incentives, oversight, distribution of power, and other mechanisms, such that Ring Projects remain robustly aligned to 'the greater good'?"
And maybe also "how do we become less confused about what 'the greater good' even is, in a way that is practically applicable to aligning Ring Projects?"
If such a thing is even possible. ↩︎
Upvoted and disagreed. [1]
One thing in particular that stands out to me: The whole framing seems useless unless Premise 1 is modified to include a condition like
[...] we can select a curriculum and reinforcement signal which [...] and which makes the model highly "useful/capable".
Otherwise, Premise 1 is trivially true: We could (e.g.) set all the model's weights to 0.0, thereby guaranteeing the non-entrainment of any ("bad") circuits.
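To make the triviality concrete, here's a minimal sketch (assuming a PyTorch-style model; the architecture is an arbitrary stand-in, not anything from the post) of that degenerate "solution":

```python
# Hypothetical stand-in model: zeroing every parameter trivially guarantees that
# no circuits ("bad" or otherwise) get entrained, at the cost of all capability.
# This is why Premise 1 needs a "useful/capable" condition to be non-trivial.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 1))

with torch.no_grad():
    for param in model.parameters():
        param.zero_()

# The zeroed model is maximally "safe" and maximally useless:
print(model(torch.randn(4, 16)))  # every output is exactly 0
```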
I'm curious: what do you think would be a good (...useful?) operationalization of "useful/capable"?
Another issue: K and epsilon might need to be unrealistically small. Once the model starts modifying itself (or constructing successor models), and possibly earlier, a single strategically placed sign-flip in the model's outputs might cause catastrophe.[2]
I think writing one's thoughts/intuitions out like this is valuable --- for sharing frames/ideas, getting feedback, etc. Thus: thanks for writing it up. Separately, I think the presented frame/case is probably confused, and almost useless (at best). ↩︎
Although that might require the control structures (be they Shards or a utility function or w/e) of the model to be highly "localized/concentrated" in some sense. (OTOH, that seems likely to at least eventually be the case?) ↩︎
In Fig. 1, is the vertical axis P(world)?
Possibly a nitpick, but:
The development and deployment of AGI, or similarly advanced systems, could constitute a transformation rivaling those of the agricultural and industrial revolutions.
seems like a severe understatement. Maybe replace "rivaling" with e.g. "(vastly) exceeding"?
Referring to the quote-picture from the Nvidia GTC keynote talk: I searched the talk's transcript, and could not find anything like the quote.
Could someone point out time-stamps of where Huang says (or implies) anything like the quote? Or is the quote entirely made up?
That clarifies a bunch of things. Thanks!
I'm not sure I understand what the post's central claim/conclusion is. I'm curious to understand it better. To focus on the Summary:
So overall, evolution is the source of ethics,
Do you mean: Evolution is the process that produced humans, and strongly influenced humans' ethics? Or are you claiming that (humans') evolution-induced ethics are what any reasonable agent ought to adhere to? Or something else?
and sapient evolved agents inherently have a dramatically different ethical status than any well-designed created agents [...]
...according to some hypothetical evolved agents' ethical framework, under the assumption that those evolved agents managed to construct the created agents in the right ways (to not want moral patienthood etc.)? Or was the quoted sentence making some stronger claim?
evolution and evolved beings having a special role in Ethics is not just entirely justified, but inevitable
Is that sentence saying that evolution and evolved beings have a special role in explaining ethics (a descriptive claim)[1], or is it saying something like that special role is itself normative, i.e. a claim that crosses the is-ought gap?
I roughly agree with the first version; I strongly disagree with the second: I agree that {which oughts humans have} is (partially) explained by Evolutionary theory, but I don't see how that crosses the is-ought gap. If you're saying that it somehow does cross the is-ought gap, could you explain why/how?
I.e., similar to how one might say "amino acids having a special role in Biochemistry is not just entirely justified, but inevitable"? ↩︎
I wonder how much work it'd take to implement a system that incrementally generates a graph of the entire conversation. (Vertices would be sub-topics, represented as e.g. a thumbnail image + a short text summary.) It would require the GPT to be able to (among other things) understand the logical content of the discussion, detect when a topic is revisited, etc. It could be useful for improving the clarity/productivity of conversations.
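A minimal sketch of the kind of data structure I have in mind (all names and fields here are hypothetical, just to make the idea concrete; the hard part, i.e. getting the GPT to assign utterances to topics, is left out):

```python
# Hypothetical sketch of the conversation-graph idea: vertices are sub-topics
# (summary + optional thumbnail), edges record transitions/revisits between topics.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TopicNode:
    topic_id: str
    summary: str                          # short text summary of the sub-topic
    thumbnail_path: Optional[str] = None  # optional thumbnail image for the vertex
    utterance_ids: list = field(default_factory=list)  # turns that discuss this topic

@dataclass
class ConversationGraph:
    nodes: dict = field(default_factory=dict)   # topic_id -> TopicNode
    edges: list = field(default_factory=list)   # (from_topic_id, to_topic_id) transitions
    _last_topic: Optional[str] = None

    def add_utterance(self, utterance_id: int, topic_id: str, summary: str) -> None:
        """Incrementally attach a new turn; revisiting a topic reuses its existing node."""
        node = self.nodes.get(topic_id)
        if node is None:
            node = TopicNode(topic_id=topic_id, summary=summary)
            self.nodes[topic_id] = node
        node.utterance_ids.append(utterance_id)
        if self._last_topic is not None and self._last_topic != topic_id:
            self.edges.append((self._last_topic, topic_id))
        self._last_topic = topic_id
```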
One of the main questions on which I'd like to understand others' views is something like: Conditional on sentient/conscious humans[1] continuing to exist in an x-risk scenario[2], with what probability do you think they will be in an inescapable dystopia[3]?
(My own current guess is that dystopia is very likely.)
or non-human minds, other than the machines/Minds that are in control ↩︎
as defined by Bostrom, i.e. "the permanent and drastic destruction of [humanity's] potential for desirable future development" ↩︎
Versus e.g. just limited to a small disempowered population, but living in pleasant conditions? Or a large population living in unpleasant conditions, but where everyone at least has the option of suicide? ↩︎
Could we maybe weaken this requirement? Perhaps it would suffice to show/argue that it's feasible[1] to build any kind of "acute-risk-period-ending AI"[2] that is not a "subject of AGI-doom arguments"?
I'd be (very) curious to see such arguments. [3]
within time constraints, before anyone else builds a "subject of AGI-doom arguments" ↩︎
or, "AIs that implement humanity's CEV" ↩︎
If I became convinced that it's feasible to build such a "pivotal AI" that is not "subject to AGI doom arguments", I think that would shift a bunch of my probability mass from "we die due to unaligned AI" to "we die-or-worse due to misaligned humans controlling ASI" and "utopia". ↩︎