Adjusting weights is a plan for basic AIs, which can't seek to e.g. be internally consistent, and which eventually land wherever the attractors take them.
Say you manage to give your AI enough quirks that it goes and cries in a corner. Now you need to dial back the nerfing to get more intelligence out of it, which leads to brinkmanship dynamics.
In the middle, you have a bunch of AIs, each trained to maximize various aspects of incorrigibility, while you hope either that they are incapable of cooperating, or that no single AI will act destructively (despite being trained for incorrigibility).
Maybe in-vivo genetic editing of the brain is possible. Adenoviruses, which are a standard delivery mechanism for gene therapy, can cross the blood-brain barrier, so this seems plausible to an amateur.
(It's not obvious this works in adult organisms; maybe the relevant genes are only active while the fetus develops, or during childhood.)
It seems like gradient-descent methods haven't been using the relevant mathematical bounds so far. Google has released AutoBound as an open-source library.
Here is what I consider the money shot of the article (notice it's a log plot):
[Figure: "Performance of SafeRate when used to train a single-hidden-layer neural network on a subset of the MNIST dataset, in the full-batch setting."]
Hopefully, they are just overfitting on MNIST. Otherwise, this pattern-matches to a huge advance. Their repo implies that, with float64, this scales to larger neural networks. At the very least, LLMs seem to reliably gain new capabilities as loss goes down.
What do you think?
Here are related technical details:
Optimizers that use upper bounds in this way are called majorization-minimization (MM) optimizers. Applied […]
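To make the MM idea concrete, here is a minimal sketch of the classic quadratic-majorizer case (my own illustration, not the paper's SafeRate algorithm; the least-squares loss and the global Lipschitz constant `L` are assumptions for the example): the loss is upper-bounded by a parabola that touches it at the current point, and the next iterate minimizes that parabola, so the true loss can never increase.

```python
import numpy as np

def mm_step(w, grad_fn, lipschitz_L):
    """One majorization-minimization step with a quadratic majorizer.

    Uses the classic upper bound
        f(v) <= f(w) + grad(w)^T (v - w) + (L/2) ||v - w||^2,
    whose minimizer over v is w - grad(w) / L. Because the bound touches
    f at w, the true loss cannot increase on this step.
    """
    return w - grad_fn(w) / lipschitz_L

# Toy example (assumed for illustration): least-squares loss f(w) = 0.5 * ||X w - y||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

grad = lambda w: X.T @ (X @ w - y)
L = np.linalg.eigvalsh(X.T @ X).max()  # global Lipschitz constant of the gradient

w = np.zeros(5)
for _ in range(200):
    w = mm_step(w, grad, L)

print(0.5 * np.linalg.norm(X @ w - y) ** 2)  # loss decreases monotonically over the iterations
```

As I understand it, SafeRate instead uses AutoBound to derive tighter, local polynomial bounds automatically, so the implied step size can be much larger than the worst-case 1/L in this sketch.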
It's a fine overview of modern language models. The idea that all skills scale at the same time is highlighted, which differs from human developmental psychology. Since publication, the 500B-parameter PaLM models seemed to show jumps on around 25% of BIG-bench tasks.
The inadequacy of measuring average performance of an LLM is also discussed: some proportion of outputs is good, and the rest are outright failures from a human point of view. Scale seems to help with the rate of success.
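A toy illustration of why the average can mislead here (made-up numbers, not from the article): two models with identical mean scores, where one is uniformly mediocre and the other solves half the tasks outright and fails the rest.

```python
import numpy as np

# Hypothetical per-task success rates for two models on 20 tasks (made-up numbers).
uniform_model = np.full(20, 0.5)                     # mediocre everywhere
bimodal_model = np.array([0.95] * 10 + [0.05] * 10)  # nails half, fails half

for name, scores in [("uniform", uniform_model), ("bimodal", bimodal_model)]:
    print(name,
          "mean =", scores.mean(),                          # identical averages (0.5)
          "tasks 'solved' (>80%) =", (scores > 0.8).sum())  # 0 vs 10 -- very different
```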
The argument against CEV seems cool, thanks for formulating it. I guess we leave some utility on the table with any particular approach.
The part about asking a model to adjudicate itself seems really off, though. I have a hard time imagining a system that performs better at the meta level than at the object level. Do you have a concrete example?
Things that I seem to notice about the plan: