For the record, I do think this is something worth mathematically formalizing. Perhaps someday you should come back to this, or restart this, or even "dump" your notes/thinking on this in an unedited form.
This is a terrible framework/approach to it, and I don't often link to this post when I link to alignment stuff I wrote up. I think I was off base: genealogy/lineage is not the right meta-approach/framework, and there's a lot of premature rigour here that is now useless.
I now have different intuitions about how to approach it and have some sketches (on my shortform the rough thoughts about formalising optimisation) laying some groundwork for it, but I doubt I'll complete that groundwork anytime soon.
Formalising returns in cognitive reinvestment is not a current research priority for me, but the groundwork does factor through research I see as highly promising for targeting the hard problems of alignment, and once the groundwork is complete, this part would be pretty easy.
It's also important for formalising my thinking/arguments re: takeoff dynamics (which aren't relevant to the hard problems, but are very important for governance/strategy).
It's crushing my motivation to see no engagement with this post.
I'd like to continue posting my thinking on takeoff dynamics here, but it's really demotivating when no one engages with it.
I don't see anything to engage with here. It's all setup and definitions and throat-clearing so far; of course one could argue with them, but that's true of every formalization of everything, they're always unrealistic and simplifying, that's the point of having them. Perhaps it leads to some interesting conclusion one doesn't want to believe, at which point one could go back and ponder the premises to think about what to reject or learn about the bare formalization itself, but as it is...
That's fair. I'll update on this for the future.
I do think/hope sequels to this would have more content to engage with.
Thanks for the reply.
P.S: I sent you a follow request on Twitter. My UN is "CineraVerinia".
I would be grateful if you accepted it.
P.S: I sent you a follow request on Twitter. My UN is "CineraVerinia".
It is impossible to look up specific follow requests, sorry.
The thing you are trying to study ("returns on cognitive reinvestment") is probably one of the hardest things in the world to understand scientifically. It requires understanding both the capabilities of specific self-modifying agents and the complexity of the world. It also depends on what problem you are focusing on: the shape of the curve may be very different for chess vs. something like curing disease. Why? Because I can simulate chess on a computer, so throwing more compute at it leads to some returns. I can't simulate human biology in a computer; we have to actually have people in labs doing complicated experiments just to understand one tiny bit of human biology. So having more compute / cognitive power in any given agent isn't necessarily going to speed things along; you also need a way of manipulating things in labs (either humans or robots doing lots of experiments). Maybe in the future an AI could read massive numbers of scientific papers and synthesize them into new insights, but precisely what sort of "cognitive engine" is required to do that is also very controversial (could GPT-N do it?).
Are you familiar with the debate about Bloom et al and whether ideas are getting harder to find? (https://guzey.com/economics/bloom/ , https://www.cold-takes.com/why-it-matters-if-ideas-get-harder-to-find/). That's relevant to predicting take-off.
The other post I always point people to is this one by Chollet.
I don't necessarily agree with it but I found it stimulating and helpful for understanding some of the complexities here.
So basically, this is a really complex thing, and throwing some definitions and math at it isn't going to be very useful, I'm sorry to say. Throwing math and definitions at stuff is easy. Modeling data by fitting functions is easy. Neither is very useful in terms of actually being able to predict in novel situations (i.e. extrapolation / generalization), which is what we need to predict AI take-off dynamics. Actually understanding things mechanistically and coming up with explanatory theories that can withstand criticism and repeated experimental tests is very hard. That's why people typically break hard questions/problems down into easier sub-questions/problems.
So basically, this is a really complex thing, and throwing some definitions and math at it isn't going to be very useful, I'm sorry to say. Throwing math and definitions at stuff is easy. Modeling data by fitting functions is easy. Neither is very useful in terms of actually being able to predict in novel situations (i.e. extrapolation / generalization), which is what we need to predict AI take-off dynamics.
I disagree. The theoretical framework is a first step to allow us to reason more clearly about the topic. I expect to eventually bridge the gap between the theoretical and the empirical. In fact, I just added some concrete empirical research directions I think could be pursued later on:
Even Further Future Directions
Some stuff I might like to do (much) later on. I would like to eventually bridge this theoretical framework to empirical work with neural networks. I'll describe in brief two approaches to do that I'm interested in.
Estimating RCR From ML History
We could try to estimate the nature and/or behaviour of RCR across particular ML architectures by e.g. looking at progress across assorted performance benchmarks (and perhaps the computational resources [data, flops, parameter size, etc.] required to reach each benchmark) and comparing across various architectural and algorithmic lineage(s) for ML models. We'd probably need to compile a comprehensive genealogy of ML architectures and algorithms in pursuit of this approach.
This estimation may be necessary, because we may be unable to measure RCR across an agent's genealogy before it is too late (if e.g. the design of more capable successors is something that agents can only do after crossing the human barrier).
Directly Measuring RCR in the Subhuman to Near Human Ranges
I am not fully convinced of the assumption behind that danger, though. There is no complete map/full description of the human brain. No human has the equivalent of their "source code" or "model weights" with which to start designing a successor. It seems plausible that we could equip sufficiently subhuman (generality) agents with detailed descriptions/models of their own architectures, and some inbuilt heuristics/algorithms for how they might vary those designs to come up with new ones. We could select a few of the best candidate designs, train all of them to a similar extent, and evaluate. We could repeat the experiment iteratively, across many generations of agents.
We could probably extrapolate the lineages pretty far (we might be able to reach the near-human domain without the experiment becoming too risky). Though there's a point in the capability curve at which we would want to stop such experiments. And I wouldn't be surprised if it turned out that the agents could reach superhuman ability in designing successors (able to improve their architectures faster than humans can), without reaching human generality across the full range of cognitive tasks.
(It may be wise not to test those assumptions if we did decide to run such an experiment).
Conclusions
Such empirical projects are far beyond the scope of this series (and my current research abilities). However, it's something I might try to attempt in a few years after upskilling some more in AI/ML.
Recall that I called this "a rough draft of the first draft of one part of the nth post of what I hope to one day turn into a proper sequence". There's a lot of surrounding context that I haven't gotten around to writing yet. And I do have a coherent narrative of where this all fits together in my broader project to investigate takeoff dynamics.
The formalisations aren't useless; they serve to refine and sharpen thinking. Making things formal forces you to make explicit some things you'd left implicit.
Disclaimers
This is a rough draft of the first part of the nth post of what I hope to turn into a proper sequence investigating AI takeoff dynamics. It's not entirely self-contained material. There's a lot of preceding and subsequent context that I have not written.
If you don't immediately understand the problem I'm trying to solve, why it's important, or why I chose the approach I did, that may be why. I will try to briefly explain it though, and do what I can to contextualise it.
I've written some tangible material on one aspect of the problem and am sharing it here for feedback and validation.
Some Prior Contextualising Work
My vision for where this fits into my broader attempt to investigate takeoff dynamics is in a thesis about the "hardness" of intelligence, and especially the hardness of intelligence with respect to itself (reflexive hardness?). I haven't written up my thoughts on the hardness of intelligence in a version that I fully endorse, but here's an even rougher draft that sketches out what the concept means and how it might affect takeoff dynamics.
Do note that I don't fully endorse that draft, and my notation and formalisms in it are strictly superseded by any notation and formalisms I use here. A polished and refined version of the ideas in that draft will probably serve as the introductory post in my sequence on investigating takeoff dynamics (once I get around to stitching together all my disparate thoughts on the topic).
In the broader literature, this post can be viewed as an attempt to formalise how to measure "returns on cognitive reinvestment" as outlined by Yudkowsky in "Intelligence Explosion Microeconomics".
A significant (and important) way in which I disagree with Yudkowsky's framing of "returns on cognitive reinvestment" is that I'm thinking almost entirely about architectural and algorithmic improvements, not improvements from access to more computational resources (FLOPs, training data, hardware, etc.).
I have some other disagreements with Yudkowsky's framing and approach, but they won't be addressed in this post.
Introduction
I would like to describe how to measure returns on cognitive reinvestment (RCR) in a more structured manner. The aim is to define, or specify how one might define, a function that captures the concept. I am most interested in the shape of that function (is it sublinear, linear, or superlinear? Does the shape change across different intervals on the capability curve?) and its complexity class (logarithmic, sublinear polynomial, superlinear polynomial, exponential, superexponential, etc.?). An exact specification of the function would be informative, but it's not strictly necessary.
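As a rough illustration of the kind of object I have in mind (a sketch only, borrowing the successor notation α_i, α_{i+1} and the capability measure ξ introduced below, and assuming for simplicity that the per-generation gain depends only on the parent's capability level c):

$$R(c) := \xi(\alpha_{i+1}) - \xi(\alpha_i), \quad \text{where } \xi(\alpha_i) = c$$

The questions above then become: how does R grow as c grows, and in which complexity class does it live?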
Specifically, I want to define upper bounds on RCR. My interest in upper bounds is for reasoning better about takeoff dynamics. An upper bound on RCR constrains takeoff dynamics and gives us some form of safety assurances. E.g. we might be able to make predictions like:
And retrodictions like:
The accuracy of the retrodictions could inform how confident we are in the predictions. And if the model's retrodictions/predictions are sufficiently accurate, we could use its further out predictions as a form of safety guarantee.
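To illustrate how such a bound constrains things (a hedged sketch, not part of the formal apparatus below): if g were a non-decreasing upper bound on the per-generation capability gain as a function of current capability, then capabilities n generations out could not exceed the n-fold iterate of capability-plus-gain:

$$\xi(\alpha_{i+1}) \leq \xi(\alpha_i) + g(\xi(\alpha_i)) \;\implies\; \xi(\alpha_n) \leq f^{\,n}(\xi(\alpha_0)), \quad f(c) := c + g(c)$$

Whether that iterate grows slowly or explodes within a small number of generations is precisely the kind of question an upper bound on RCR would let us answer.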
An interest in determining what safety guarantees we do in fact have on AI takeoff dynamics is my sole motivation for this line of inquiry.
(The courses of action I would pursue differ considerably depending on whether superhuman AI is 8, 20, or 50 years away. To better plan what to do with my life, I want a better handle on how AI capability development will evolve.)
Some Necessary Apparatus
Some apparatus that I'll be using in this piece:
Some Needed Terminology and Yet More Apparatus
A sufficiently intelligent agent would be able to instantiate another agent:
Let us consider the most capable child that a parent can create using all their resources within a given time frame to be the "successor" of that parent.
For a given agent α_i, I will denote its successor thus: α_{i+1}.
A successor can of course create its own successor, and the "growth rate" of cognitive capabilities across generations of agents is what we're trying to determine.
To allow a sensible comparison across generations of agents, we'd fix the given time frame in which a parent has to instantiate their successor. That is, the length of time defining a generation is held constant.
Let the length of a generation be: τ.
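For concreteness, one way to write the successor definition above (a sketch consistent with the verbal definition; C_τ(α_i) is my shorthand here for the set of agents that α_i can instantiate within one generation of length τ, which will also include α_i itself once self succession is introduced below):

$$\alpha_{i+1} := \underset{\alpha \in C_\tau(\alpha_i)}{\arg\max}\ \xi(\alpha)$$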
Some Assumptions and Lemmas
Self-Modification Assumption (SMA)
Self-modification is a particular case of procreation in which the child and the parent are the same agent. The cases of self-modification that are most interesting to us are those in which the parent succeeds itself (this is the case that arises in "recursive self-improvement").
So for subsequent considerations of self-modification here, we'll treat self-modification as succession.
In those cases, it can be handled analogously to other forms of succession without loss of generality.
Note that we'll be considering agents that undergo significant self-modification to be distinct from the original agent.
Given an arbitrary agent α_v, if α_v undergoes significant self-modification within a generation to succeed itself, the new agent will be represented as α_{v+1}.
I will not be justifying this assumption; just take it as axiomatic.
Self Succession Assumption (SSA)
If we permit agents to take a self-modification of "no significant action" within a generation, then the original agent (modulo whatever resources it had acquired) would become its own successor.
We'll grant this allowance and refer to the cases where an agent succeeds itself without significant self-modification as "self succession". Whenever self succession occurs, we'll represent the successor using the same symbol as the original agent.
Given an arbitrary agent α_v, if α_v succeeds itself during a generation, the resulting successor will be represented as α_v.
We'll refer to cases where the agent does not succeed itself (including by self-modification) as "distinct succession".
The notation used to refer to the successor will allow us to distinguish self succession from distinct succession.
Self succession will prove useful when an agent is unable to create a more capable child within a generation. By allowing the agent to succeed itself by default and acquire new resources, we can permit the agent to "roll over" the creation of a successor across generations. This will enable us to measure RCR more accurately even if the returns diminish over time such that distinct successors do not emerge in some generations.
SSA has many ramifications for our treatment of τ and for the measurement of Ξ. These ramifications will be considered at more length in a subsequent post.
Successor Existence Lemma (SEL)
$$\forall \alpha_x \in A\ \exists\, \alpha_{x+1}\, (\alpha_{x+1} \in A)$$
That is: "for every agent, there exists a successor to that agent".
This follows trivially from SMA. Via SMA, the agent can succeed itself. If a self-modification of "no action" is taken during a generation, then the resulting agent (the original agent) becomes its successor (assuming the agent does not create any more capable children during that generation).
I will refer to cases where the agent becomes its own successor without taking significant self-modification actions as "self succeeding".
Successor Parity Lemma (SPL)
$$\forall \alpha_x \in A,\ \xi(\alpha_{x+1}) \geq \xi(\alpha_x)$$
That is: "the successor of every agent is at least as intelligent as the original agent".
This follows trivially from SEL:
$$\xi(\alpha_{x+1}) = \begin{cases} \xi(\alpha_x) & \alpha_x = \alpha_{x+1} \\ \xi(\alpha_x) + k,\ k \in \mathbb{R}^+ & \alpha_x \neq \alpha_{x+1} \end{cases}$$
That is: if the agent self succeeds, its successor is exactly as intelligent as it is; if it has a distinct successor, that successor is strictly more intelligent. In either case ξ(α_{x+1}) ≥ ξ(α_x).
Successor Superiority Assumption (SSA)
$$\exists\, \alpha_v \text{ with successor } \alpha_{v+1} : \xi(\alpha_{v+1}) > \xi(\alpha_v)$$
That is: "there exists an agent whose successor is strictly more intelligent than itself".
This assumption is not as inherently obvious as SMA, so it does need justification. It's not necessarily the case that agents are able to instantiate agents smarter than themselves.
However, the entire concept of AI takeoff dynamics (the sole reason I decided to investigate this topic) takes it as an implicit assumption that we will eventually be able to create par-human (and eventually superhuman) AI. Perhaps we will not. But as I'm situating my investigation within the context of AI takeoff dynamics, I feel confident making this implicit assumption explicit.
Note: I'm not saying that all agents would have successors smarter than themselves, just that there is at least one such agent. (Even if there is only one such agent, the assumption is satisfied.)
I'll refer to those agents who have successors smarter than them as "SSA-satisfying agents" or "SSA-S agents".
$$A^* = \{\alpha_x \mid \xi(\alpha_{x+1}) > \xi(\alpha_x)\}$$
That is: we're using A∗ to represent the set of all agents whose successor is more capable than they are.
Aligned Children Assumption (ACA)
"All children are fully aligned with the values and interests of their parents."
This is not necessarily a realistic assumption. Nonetheless, I am choosing to make it.
My reason for this is that I'm (most) interested in upper bounds on RCR, and if all agents have aligned children within a generation, they can use said aligned children to build even more capable children (the most capable of which becomes the successor).
I guess this can be thought of as a best-case analysis of RCR (what's RCR like under the most favourable assumptions?). Analyses trying to demonstrate a lower bound to RCR or to measure it more accurately should not make this assumption.
Genealogy of Agents
I will refer to a line of succession involving an agent as a "lineage". I'll attempt to specify that more rigorously below.
For any two given agents (αk,αv), let:
For a given agent αx, let:
I would like to define two more concepts related to a given lineage: "head" and "tail" (as for why I chose those names, you might have noticed that a lineage can be modelled as a linked list.)
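As a toy illustration of the linked-list view (a hypothetical sketch only; the Agent class, the xi values, and the successor pointers are stand-ins for the formal apparatus, not part of it):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Agent:
    name: str
    xi: float                            # stand-in for the capability measure ξ
    successor: Optional["Agent"] = None  # None (or self) marks the end of the lineage

def tail(agent: Agent) -> Agent:
    """Follow successor pointers to the final descendant (the 'tail' defined below)."""
    while agent.successor is not None and agent.successor is not agent:
        agent = agent.successor
    return agent

# A three-generation lineage: a0 -> a1 -> a2, with a0 as the 'head'
a2 = Agent("a2", xi=3.0)
a1 = Agent("a1", xi=2.0, successor=a2)
a0 = Agent("a0", xi=1.0, successor=a1)
assert tail(a0) is a2
```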
Head
For a given lineage Lαx, I'll denote the "head" (think root ancestor) as hαx .
You could read this as "the head of the lineage containing αx".
$$h_{\alpha_x} : \alpha_y \in L_{\alpha_x} \mid \forall \alpha_k \in L_{\alpha_x},\ \alpha_y \preceq \alpha_k$$
(the agent that is an ancestor of, or equal to, all other agents in the lineage).
Tail
For a given lineage Lαx, I'll denote the "tail" (think final descendant) as tαx .
You could read this as "the tail of the lineage containing αx".
$$t_{\alpha_x} : \alpha_y \in L_{\alpha_x} \mid \forall \alpha_k \in L_{\alpha_x},\ \alpha_y \succeq \alpha_k$$
(the agent that is a descendant of, or equal to, all other agents in the lineage).
Which Lineage?
Our core investigation is the nature of the change in cognitive capabilities across agent lineages (explicitly for the purpose of reasoning better about takeoff dynamics). To a first approximation, we might pick a reference lineage to examine.
It seems that a natural method of inquiry is to pick a "head" (α0) and then investigate how cognitive capabilities change across its descendants with each generation.
Because of our interest in takeoff dynamics, I suppose that our starting agent must be a member of A∗. This is because if it weren't, its successor would be itself, and its lineage would contain only the original agent (the demonstration of this is left as an exercise for the reader).
One might even take a stronger position. We might insist that our starting agent be the least intelligent agent capable of creating a more intelligent successor. The reasons for this might be:
I am not fully convinced by all of the above reasons, but we do need to pick a particular member of A for α_0, and the only choice for which there seem to be reasonable arguments is the least capable member of the set.
Thus, one potential definition of α0 might be:
$$\alpha_0 = \underset{\alpha_x \in A}{\arg\min}\ \xi(\alpha_x)$$
There are other ways we could potentially define α_0, but I think I'll tentatively accept the above for now.
An extra constraint on α_0 that I find interesting is insisting that α_0 has a lineage which contains the global optimum.
My reason for adding this extra constraint is again that I am most interested in an upper bound on RCR.
One way to formalise the above constraint is:
$$\alpha_0 : \forall \alpha_k \in A,\ \xi(t_{\alpha_0}) \geq \xi(\alpha_k)$$
That is: our chosen "head" has a "tail" that is at least as capable as every other agent.
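Putting the two constraints together (a tentative consolidation of the argmin definition and the optimum-containing-lineage constraint, not a final specification):

$$\alpha_0 := \underset{\alpha_x \in A}{\arg\min}\ \xi(\alpha_x) \quad \text{subject to} \quad \forall \alpha_k \in A,\ \xi(t_{\alpha_0}) \geq \xi(\alpha_k)$$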
Some Next Steps
Some things I might like to do for my next part(s), or in future rewrites of this part.
Areas of Improvement
Places where I can improve my draft of this part:
There'll be a Sequel Right?
Stuff I'd like to cover in sequels to this post:
Even Further Future Directions
Some stuff I might like to do (much) later on. I would like to eventually bridge this theoretical framework to empirical work with neural networks. I'll describe in brief two approaches to do that I'm interested in.
Estimating RCR From ML History
We could try to estimate the nature and/or behaviour of RCR across particular ML architectures by e.g. looking at progress across assorted performance benchmarks (and perhaps the computational resources required to reach each benchmark) and comparing across various architectural and algorithmic lineage(s) for ML models. We'd probably need to compile a comprehensive genealogy of ML architectures and algorithms in pursuit of this approach.
This estimation may be necessary, because we may be unable to measure RCR across an agent's genealogy before it is too late (if e.g. the design of more capable successors is something that agents can only do after crossing the human barrier).
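A minimal sketch of what this estimation might look like in practice (everything here is hypothetical and purely illustrative: the lineage edges, the benchmark scores standing in for ξ, and the choice to treat one architectural step as one generation are all assumptions, not claims about real measurements):

```python
# Hypothetical child -> parent edges in a genealogy of ML architectures
lineage = {
    "transformer": "seq2seq+attention",
    "seq2seq+attention": "seq2seq",
    "seq2seq": "lstm-lm",
}

# Hypothetical benchmark scores standing in for the capability measure ξ
score = {
    "lstm-lm": 0.42,
    "seq2seq": 0.48,
    "seq2seq+attention": 0.57,
    "transformer": 0.71,
}

# Per-generation gains along the lineage: a crude empirical proxy for RCR
gains = [(score[parent], score[child] - score[parent])
         for child, parent in lineage.items()]

# Inspect whether the gain grows or shrinks as parent capability increases,
# i.e. the "shape" of the returns curve this series is after.
for parent_capability, gain in sorted(gains):
    print(f"parent capability {parent_capability:.2f} -> gain {gain:+.2f}")
```

A real version would also have to control for the computational resources spent to reach each benchmark, as noted above.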
Directly Measuring RCR in the Subhuman to Near Human Ranges
I am not fully convinced of the assumption behind that danger, though. There is no complete map/full description of the human brain. No human has the equivalent of their "source code" or "model weights" with which to start designing a successor. It seems plausible that we could equip sufficiently subhuman (generality) agents with detailed descriptions/models of their own architectures, and some inbuilt heuristics/algorithms for how they might vary those designs to come up with new ones. We could select a few of the best candidate designs, train all of them to a similar extent, and evaluate (the same computational resources should be expended in both training and inference). We could repeat the experiment iteratively, across many generations of agents.
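A schematic of the experimental loop just described (a sketch under strong assumptions: propose_variants, train, and evaluate are placeholder callables standing in for the agent's variation heuristics, a fixed-budget training setup, and whatever capability evaluation is used; none of them are specified here):

```python
def run_generations(initial_design, n_generations, k_candidates,
                    propose_variants, train, evaluate, compute_budget):
    """Iteratively let the current best design propose and evaluate successors.

    propose_variants(design, k): the agent's inbuilt heuristics for varying its
        own architecture description into k candidate designs.
    train(design, budget): trains a candidate under a fixed compute budget so
        comparisons across candidates (and generations) stay fair.
    evaluate(trained_agent): a stand-in for the capability measure ξ.
    """
    best_design = initial_design
    history = []
    for generation in range(n_generations):
        candidates = propose_variants(best_design, k_candidates)
        trained = [train(c, compute_budget) for c in candidates]
        scored = [(evaluate(a), c) for a, c in zip(trained, candidates)]
        best_score, best_design = max(scored, key=lambda t: t[0])
        history.append((generation, best_score))
    return history  # per-generation capability: an empirical RCR curve
```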
We could probably extrapolate the lineages pretty far (we might be able to reach the near-human domain without the experiment becoming too risky). Though there's a point in the capability curve at which we would want to stop such experiments. And I wouldn't be surprised if it turned out that the agents could reach superhuman ability in designing successors (able to improve their architectures faster than humans can), without reaching human generality across the full range of cognitive tasks.
(It may be wise not to test those assumptions if we did decide to run such an experiment).
Closing Remarks
Such empirical projects are far beyond the scope of this series (and my current research abilities). However, it's something I might try to attempt in a few years after upskilling some more in AI/ML.