It's imaginable to do this work but not remember any of it, i.e. to avoid having that work leave traces that can accumulate; but that seems like a delicate, probably unnatural carving.
Is the implication here that modern NNs don't do this? My own tendency would be to think that they are doing a lot of this -- doing a bunch of reasoning which gets thrown away rather than saved. So it seems like modern NNs have simply managed to hit this delicate, unnatural carving. (Which in turn suggests that it is not so delicate, and even, not so unnatural.)
Yes, I think there's stuff that humans do that's crucial for what makes us smart, that we have to do in order to perform some language tasks, and that the LLM doesn't do when you ask it to do those tasks, even when it performs well in the local-behavior sense.
Probably no current AI system qualifies as a "strong mind", for the purposes of this post? Adding various kinds of long-term memory is a very natural and probably instrumentally convergent improvement to make to LLM-based systems, though.
I expect that as LLM-based systems get smarter and more agentic, they'll naturally start hitting on this strategy for self-improvement on their own. If you ask GPT-4 for improvements one could make to LLMs, it will come up with the idea of adding various kinds of memory. AutoGPT and similar systems are not yet good enough to actually implement these ideas autonomously, but I expect that will change in the near future, and that it will be pretty difficult to get comparable performance out of a memoryless system. As you go even further up the capabilities ladder, it probably gets hard to avoid developing memory, intentionally or accidentally or as a side effect.
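For concreteness, here's a minimal sketch of the kind of memory augmentation I have in mind; the `llm` function, the retrieval scheme, and the class name are hypothetical stand-ins, not any particular product's API:

```python
# Minimal sketch: an LLM agent loop augmented with persistent long-term memory.
from difflib import SequenceMatcher

def llm(prompt: str) -> str:
    """Hypothetical stand-in for any text-completion model call."""
    return f"(model response to: {prompt[:40]}...)"

class MemoryAugmentedAgent:
    def __init__(self) -> None:
        self.memory: list[str] = []  # persists across tasks, unlike a context window

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Crude string-similarity retrieval; a real system might use embeddings.
        return sorted(self.memory,
                      key=lambda m: SequenceMatcher(None, m, query).ratio(),
                      reverse=True)[:k]

    def act(self, task: str) -> str:
        context = "\n".join(self.recall(task))
        answer = llm(f"Relevant notes:\n{context}\n\nTask: {task}")
        # The crucial step: traces of the work accumulate rather than being thrown away.
        self.memory.append(f"Task: {task}\nAnswer: {answer}")
        return answer

agent = MemoryAugmentedAgent()
agent.act("Summarize the design constraints.")
agent.act("Now revisit them briefly.")  # informed by the stored trace of the first task
```

The whole point is the append in `act`: traces of past work persist and shape future work, which is exactly what a per-conversation context window throws away.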
Adding long-term memory is risky in the sense that it can accumulate weirdness -- like how Bing cut off conversation length to reduce weirdness, even though the AI technology could maintain some kind of coherence over longer conversations.
So I guess that there are competing forces here, as opposed to simple convergent incentives.
Probably no current AI system qualifies as a "strong mind", for the purposes of this post?
I am reading this post as an argument that current AI technology won't produce "strong minds", and I'm pushing back against this argument. EG:
An AI can simply be shut down, until it's able to and wants to stop you from shutting it down. But can an AI's improvement be shut down, without shutting down the AI? This can be done for all current AI systems, since they are produced in the framework of finding a fairly limited system by a series of tweaks. Just stop tweaking the system, and it will now behave as a fixed (perhaps stochastic) function that doesn't provide earth-shaking capabilities.
I suspect that the ex quo that puts a mind on a trajectory to being very strong, is hard to separate from the operation of the mind. Some gestures at why:
Tsvi appears to take the fact that you can stop gradient descent without stopping the main operation of the NN to be evidence that the whole setup isn't on a path to produce strong minds.
To me this seems similar to pointing out that we could freeze genetic evolution, and humans would remain about as smart; and then extrapolating from this, to conclude that humans (including genetic evolution) are not on a path to become much smarter.
Although I'll admit that's not a great analogy for Tsvi's argument.
I think it's a good comparison, though I do think they're importantly different. Evolution figured out how to make things that figure out how to figure stuff out. So you turn off evolution, and you still have an influx of new ability to figure stuff out, because you have a figure-stuff-out figure-outer. It's harder to get the human to just figure stuff out without also figuring out more about how to figure stuff out, which is my point.
Tsvi appears to take the fact that you can stop gradient descent without stopping the main operation of the NN to be evidence that the whole setup isn't on a path to produce strong minds.
(I don't see why it appears that I'm thinking that.) Specialized to NNs, what I'm saying is more like: If/when NNs make strong minds, it will be because the training---the explicit-for-us, distal ex quo---found an NN that has its own internal figure-stuff-out figure-outer, and then the figure-stuff-out figure-outer did a lot of figuring out how to figure stuff out, so the NN ended up with a lot of ability to figure stuff out. But a big chunk of the leading edge of that ability to figure stuff out came from the NN's internal figure-stuff-out figure-outer, not "from the training"; so you can't turn off the NN's figure-stuff-out figure-outer just by pausing training. I'm not saying that the setup can't find an NN-internal figure-stuff-out figure-outer (though I would be surprised if that happens with the exact architectures I'm aware of currently existing).
If the mind becomes much more capable than the surrounding minds, it does so by being on a trajectory of creativity: something about the mind implies that it generates understanding that is novel to the mind and its environment.
I don't really understand this claim enough to evaluate it. Can you expand a bit on what you mean by it? I'm unsure about the rest of the post because it's unclear to me what the premise that your top-line claim rests on means.
If a mind comes to understand a bunch of stuff, there are probably some compact reasons that it came to understand a bunch of stuff. What could such reasons be? The mind might copy a bunch of understanding from other minds. But if the mind becomes much more capable than surrounding minds, that's not the reason, assuming that much greater capabilities required much more understanding. So it's some other reason. I'm describing this situation as the mind being on a trajectory of creativity.
[Metadata: crossposted from https://tsvibt.blogspot.com/2023/01/a-strong-mind-continues-its-trajectory.html. First completed January 29, 2023.]
A very strong mind is produced by a trajectory of creativity. A trajectory of creativity that produces a very strong mind is hard to separate from the mind's operation. So a strong mind continues on its trajectory of creativity as long as it is active.
A strong mind comes from a trajectory of creativity
If a mind is highly capable, it got to that point by gaining understanding in a voyage of novelty. If the mind gains understanding that is novel for all the surrounding minds (e.g., preexisting humans), it does so through creativity: generating novelty, rather than merely copying it. If the mind becomes much more capable than the surrounding minds, it does so by being on a trajectory of creativity: something about the mind implies that it generates understanding that is novel to the mind and its environment. If the mind is on a trajectory of creativity that brought it to the point of being highly capable, its trajectory of creativity probably carries the mind much further, making the mind much more capable than it already is.
The ex quo of a mind's creativity is the element (collection of elements) out of which comes novel structure. The ex quo of a modern AI system is almost entirely the search (i.e. training) apparatus, which is clearly separated out from running the found system. (The ex quo isn't entirely the search apparatus. Some non-zero creativity happens in the collision of elements that happens in, say, a single run of a stable diffusion image model or a large transformer language model. But it's not much creativity, and the found structure is about as temporarily grasped as possible.)
The proximal ex quo is that out of which novel structure comes directly. The distal ex quo is that out of which novel structure comes indirectly. So the mental context that's set up when a particular idea comes to you, and the other dark matter that goes into that abduction, is the proximal ex quo; human evolution is the distal ex quo; and the history of the development of your brain is an intermediate ex quo.
Trajectory and operation are hard to separate
An AI can simply be shut down, until it's able to and wants to stop you from shutting it down. But can an AI's improvement be shut down, without shutting down the AI? This can be done for all current AI systems, since they are produced in the framework of finding a fairly limited system by a series of tweaks. Just stop tweaking the system, and it will now behave as a fixed (perhaps stochastic) function that doesn't provide earth-shaking capabilities.
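(To make "just stop tweaking" concrete for gradient-trained networks: the tweaking and the operation are literally separate loops. A minimal PyTorch sketch, with a toy model and a stand-in objective:)

```python
# Sketch: shutting down a model's *improvement* without shutting down the model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# The improvement process: a series of tweaks found by search (here, a toy objective).
for _ in range(100):
    x = torch.randn(32, 8)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Just stop tweaking the system": freeze the parameters and stop the update loop.
for p in model.parameters():
    p.requires_grad_(False)
model.eval()

# From here on, the system behaves as a fixed function of its inputs.
with torch.no_grad():
    y = model(torch.randn(1, 8))
```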
I suspect that the ex quo that puts a mind on a trajectory to being very strong, is hard to separate from the operation of the mind. Some gestures at why:
Making doesn't imply understanding
Evolution is a distal ex quo of human understanding. But there's clearly an ex quo more proximal than evolution for, say, scientific understanding: human thought and investigation. Setting up an evolution that can produce humans doesn't imply that you understand how humans do science.
The way we make neural networks today is by setting up a distal ex quo (the search process). A more proximal ex quo for a neural net comes from the accumulated hidden features: they set up the context in which the next little tweak is beneficial. We can know how to make neural nets that work well without knowing much about how the series of tweaks in context build up the computations that end up performing well at the given task.
We can nevertheless turn off the ex quo of current AI systems, because that ex quo is almost entirely the distal ex quo: the explicit search process, not anything in the found system's ongoing operation. We can't, however, turn off the proximal ex quo of human scientific understanding just by turning off the distal ex quo: scientific creativity doesn't require that genetic evolution continue. We can, in a haphazard way, turn off some human creativity while retaining some operation, e.g. by taking sedatives or by punishing creativity.
The ex quo is self-created
The learning is itself being learned. So from our perspective, "what's learned" (as distinct from [the learning process that we explicitly set up]), taken as an undifferentiated blob, includes [the learning process that the mind sets up for itself].
Mental operation includes creativity
Learning beyond a certain point has to be online learning. Online learning produces mental elements that combine synchronic and diachronic functions: the elements both participate in currently crystallized skills, and also in the production of new skills. (Analogy: a healthy living codebase is made of small components that both perform their function well and also make themselves and their context readily available to be effectively understood, modified, and extended.) Note: here "learning" is used as a metonym for creativity; to me "learning" softly excludes, for example, problem solving and imagining, which creativity includes if they involve grasping new ideas. See here.
So, the operation of the mind includes learning processes, even if our explicitly-set-up search processes have been shut off. More generally, the operation of the mind to perform even familiar tasks is set up to continue being creative.
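(A minimal sketch of what "online learning" means here, with an arbitrary stand-in task: a single pass both exercises the crystallized skill and reshapes it, so there is no separate training loop that could be halted while leaving operation intact.)

```python
# Sketch: online learning, where operating and learning happen in the same pass.
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def serve_and_learn(x: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    pred = model(x)  # synchronic: the currently crystallized skill performs the task
    loss = (pred - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()       # diachronic: the same pass reshapes the skill for next time
    return pred.detach()

for _ in range(10):
    x = torch.randn(1, 4)
    serve_and_learn(x, x.sum(dim=1, keepdim=True))  # arbitrary stand-in task
```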
A strong mind continues its trajectory of creativity
If a mind's operation and creativity can't be separated, then the mind can't operate without also exercising its creativity. A very strong mind became strong by being on a strong trajectory of accumulating novelty through creativity. So as long as a very strong mind continues to operate, it continues forward on its trajectory of creativity, gaining more and more understanding.