by LWLW

LWLW:

Making the (tenuous) assumption that humans remain in control of AGI, won't it just be an absolute shitshow of attempted power grabs over who gets to tell the AGI what to do? For example, supposing OpenAI is the first to build AGI, is it really plausible that Sam Altman ends up the one actually in charge, when multiple researchers will have been interacting with the model far earlier and far more frequently than he has? I have a hard time believing every researcher will sit by and watch Sam Altman become more powerful than anyone has ever dreamed of when there's a chance they're one prompt away from seizing that power for themselves.

You're assuming that:
- There is a single AGI instance running.
- There will be a single person telling that AGI what to do.
- The AGI's obedience to that person will be total.

I can see these assumptions holding approximately true if we get really, really good at corrigibility and if, at the same time, running inference on some discontinuously more capable future model is absurdly expensive. I don't find that scenario very likely, though.