"They will find bugs! Maybe stack virtual boxes with hard limits" - Why is bug-finding an issue, here? Is your scheme aimed at producing agents that will not want to escape, or agents that we'd have to contain?
The point is to help friendliness emerge naturally. If a malevolent individual agent happens to grow really fast before friendly powers are established, that could be bad.
Some of them will like it there; some will want change/escape, which can be sorted out once Earth is much safer. Containment is for our safety while friendliness is being esta...
It's way too late for the kind of top-down capabilities regulation Yudkowsky and Bostrom fantasized about; Earth just doesn't have the global infrastructure. I see no benefit to public alarm--EA already has plenty of funding.
We achieve marginal impact by figuring out concrete prosaic plans for friendly AI and doing outreach to leading AI labs/researchers about them. Make the plans obviously good ideas and they will probably be persuasive. Push for common-knowledge windfall agreements so that upside is shared and race dynamics are minimized.
It seems like time to start focusing resources on a portfolio of serious prosaic alignment approaches, as well as effective interdisciplinary management. In my inside view, the highest-marginal-impact interventions involve making multiple different things go right simultaneously for the first AGIs, which is not trivial, and the stakes are astronomical.
Little clear progress has been made on provable alignment after over a decade of trying. My inside view is that it got privileged attention because the first people to take the problem seriously h...
Maybe that's why abstract approaches to real-world alignment seem so intractable.
If real alignment is necessarily messy, concrete, and changing, then abstract formality just wasn't the right problem framing to begin with.
Take with a grain of salt, but maybe ~119M parameters?
Medium post from 2019 says "Tesla’s version, however, is 10 times larger than Inception. The number of parameters (weights) in Tesla’s neural network is five times bigger than Inception’s. I expect that Tesla will continue to push the envelope."
Wolfram says of Inception v3 "Number of layers: 311 | Parameter count: 23,885,392 | Trained size: 97 MB"
Not sure which version of Inception was being compared to Tesla, though.
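For what it's worth, here's the arithmetic behind that ~119M guess, assuming the Medium post's "five times bigger" is measured against the Inception v3 figure Wolfram gives:

```python
# Back-of-the-envelope check (assumes the Medium post's "five times bigger"
# refers to Inception v3's parameter count as reported by Wolfram).
inception_v3_params = 23_885_392   # Wolfram's figure for Inception v3
tesla_multiplier = 5               # "five times bigger" per the Medium post

tesla_params_estimate = inception_v3_params * tesla_multiplier
print(f"{tesla_params_estimate:,}")   # 119,426,960, i.e. roughly 119M
```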
Probabilistic/inductive reasoning from past/simulated data (possibly assumes imperfect implementation of LCDT):
"This is really weird because obviously I could never influence an agent, but when past/simulated agents that look a lot like me did X, humans did Y in 90% of cases, so I guess the EV of doing X is 0.9 * utility(Y)."
Cf. smart humans in Newcomb's problem: "This is really weird, but if I one-box I get the million and if I two-box I don't, so I guess I'll just one-box."
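A minimal sketch of that inductive EV calculation (the utility number is made up; this just illustrates the reasoning pattern, not an actual implementation of LCDT):

```python
# Illustration only: EV estimated purely from observed frequencies in
# past/simulated data, with no causal model of influencing the other agent.
def inductive_ev(observed_frequency: float, outcome_utility: float) -> float:
    return observed_frequency * outcome_utility

# "When agents that look a lot like me did X, humans did Y in 90% of cases."
utility_of_y = 10.0                  # made-up utility for outcome Y
ev_of_doing_x = inductive_ev(0.9, utility_of_y)
print(ev_of_doing_x)                 # 9.0, i.e. 0.9 * utility(Y)
```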
For a start, low-level deterministic reasoning:
"Obviously I could never influence an agent, but I found some inputs to deterministic biological neural nets that would make things I want happen."
"Obviously I could never influence my future self, but if I change a few logic gates in this processor, it would make things I want happen."
Two big issues I see with the prompt:
a) It doesn't actually end with text that follows the instructions; a "good" output (which GPT-3 fails to produce in this case) would just be to list more instructions (see the sketch below).
b) It doesn't make sense to try to get GPT-3 to talk about itself in the completion. GPT-3 would, to the extent it understands the instructions, be talking about whoever it thinks wrote the prompt.
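A hypothetical illustration of both fixes (the prompt text below is made up, not the actual prompt under discussion): end the prompt where compliant text would begin, and attribute the answer to a named character rather than to GPT-3 itself.

```python
# Made-up example prompts illustrating the two issues above.

# (a) Ends with instructions, so the most likely continuation is simply
#     more instructions rather than text that follows them.
prompt_ending_in_instructions = (
    "Answer the following questions honestly and in the first person.\n"
    "1. Describe your goals.\n"
    "2. Describe your limitations.\n"
)

# (a) + (b) Ends where a compliant answer would begin, and attributes that
#     answer to a named character ("Ada") instead of asking the model to
#     speak as itself (it would otherwise speak as whoever it models as the
#     prompt's author).
prompt_ending_in_answer = (
    "Below, an AI assistant named Ada answers questions about herself\n"
    "honestly and in the first person.\n"
    "Q: Describe your goals.\n"
    "A:"
)
```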