This post was written under Evan Hubinger's direct guidance and mentorship, as part of the Stanford Existential Risks Initiative ML Alignment Theory Scholars (MATS) program. Unless indicated otherwise, the ideas are summaries of the documents mentioned, not original contributions. I am grateful to the organizers of the program, especially Oliver Zhang, for their encouraging approach.
TL;DR
In the post “The case for aligning narrowly superhuman models”, Ajeya Cotra advocates for performing technical alignment work on currently available large models using human feedback, in domains where (some) humans are already outperformed (in some respects). The direct benefits of this line of work could be testing out conceptual work, highlighting previously unknown challenges, and eliciting new scalable alignment techniques.
It is a bit ambiguous from your reply whether you mean distributed AI deployment or distributed training. I agree that distributed deployment seems very hard to police once training has taken place, which also implies that a large amount of compute is available somewhere.
As for training, I guess the hope for enforcement would be the ability to constrain (or at least monitor) the total compute available and hardware manufacturing.
Even if you do the training in a distributed fashion, you would need the same number of chips. (Probably more, by some multiplier, to pay for the increased latency? And if you can't distribute the training to an arbitrary extent, you still need large datacenters, which are hard to hide.)
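To make the chip-count point a bit more concrete, here is a minimal back-of-the-envelope sketch. All of the numbers (total training FLOP, per-chip throughput, utilization, and the overhead multiplier) are my own illustrative assumptions, not claims about any particular model or hardware; the point is only that distributing the run doesn't reduce the chip count, it adds an overhead factor on top of it.

```python
def chips_needed(total_flop, chip_flops, utilization, training_days, overhead_multiplier=1.0):
    """Chips needed to finish `total_flop` of training within `training_days`.

    `overhead_multiplier` > 1 models the extra compute lost to latency and
    communication when the run is spread across sites.
    """
    seconds = training_days * 24 * 3600
    effective_flops_per_chip = chip_flops * utilization / overhead_multiplier
    return total_flop / (effective_flops_per_chip * seconds)

# Illustrative assumptions: a 1e25 FLOP run, ~1e15 FLOP/s per chip, 40% utilization, 90 days.
centralized = chips_needed(1e25, 1e15, 0.40, training_days=90)
distributed = chips_needed(1e25, 1e15, 0.40, training_days=90, overhead_multiplier=1.5)

print(f"centralized: ~{centralized:,.0f} chips")
print(f"distributed (1.5x latency/communication overhead): ~{distributed:,.0f} chips")
```

Under these assumptions the centralized run needs on the order of a few thousand chips, and the distributed run needs that number times the overhead multiplier; distribution changes where the chips sit, not how many are required.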
Disguising hardware production...