It seems to me that there are roughly two types of "boundary" to think about: ceilings and floors.
Both floors and ceilings have a flavor of "the basic stuff that's actually happening" -- the interior is built out of a lot of boundary stuff, and small changes to the boundary will create large shifts in the interior. However, floors and ceilings are very different. Tweaking the floor is relatively dangerous, while tweaking the ceiling is relatively safe. Returning to the AlphaGo analogy, the floor is like the model of the game which allows tree search. The floor is what allows us to create a ceiling. Tweaks to the floor will tend to create large shifts in the ceiling; tweaks to the ceiling will not change the floor at all.
(Perhaps other examples won't have as clear a floor/ceiling division as AlphaGo; or, perhaps they still will.)
What remains unanswered, though, is whether there is any useful way of talking about doing this (the whole thing, including the self-improvement R&D) well, doing it rationally, as opposed to doing it in a way that simply “seems to work” after the fact.
[...] Is there anything better than simply bumbling around in concept-space, in a manner that perhaps has many internal structures of self-justification but is not known to work as a whole? [...]
Can you represent your overall policy, your outermost strategy-over-strategies considered as a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?
My intuition is that the situation differs, somewhat, for floors and ceilings.
Thanks, the floor/ceiling distinction is helpful.
I think "ceilings as they exist in reality" is my main interest in this post. Specifically, I'm interested in the following:
In more detail: it's one thing to be able to assess quick heuristics, and it's another (and better) thing to be able to assess quick heuristics quickly. It's possible (maybe) to imagine a convenient situation where the theory of each "speed class" among fast decisions is compressible enough to distill down to something which can be run in that speed class and still provide useful guidance. In this case there's a possibility for the theory to tell us why our behavior as a whole is justified, by explaining how our choices are "about as good as can be hoped for" during necessarily fast/simple activity that can't possibly meet our more powerful and familiar notions of decision rationality.
However, if we can't do this, it seems like we face an exploding backlog of justification needs: every application of a fast heuristic now requires a slow justification pass, but we're constantly applying fast heuristics and there's no room for the slow pass to catch up. So maybe a stronger agent could justify what we do, but we couldn't.
I expect helpful theories here to involve distilling-into-fast-enough-rules on a fundamental level, so that "an impractically slow but working version of the theory" is actually a contradiction in terms.
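To make that "justified within its own speed class" idea concrete, here is a minimal toy sketch of my own (not anything from the post or comments): a fast knapsack heuristic that ships with an equally fast per-instance certificate. The fractional-relaxation bound costs no more than the heuristic itself, so each quick decision comes packaged with a quick statement of how far from optimal it can possibly be.

```python
# Toy illustration (my own example): a fast heuristic plus a fast certificate
# of "about as good as can be hoped for", computed in the same speed class.

def greedy_knapsack(values, weights, capacity):
    """Greedy-by-density heuristic, O(n log n). Returns (heuristic_value, upper_bound)."""
    items = [i for i in range(len(values)) if weights[i] <= capacity]
    items.sort(key=lambda i: values[i] / weights[i], reverse=True)

    # Greedy integral packing, plus a best-single-item fallback; together these
    # are guaranteed to reach at least half of the true 0/1 optimum.
    remaining, greedy_value = capacity, 0.0
    for i in items:
        if weights[i] <= remaining:
            remaining -= weights[i]
            greedy_value += values[i]
    heuristic_value = max(greedy_value, max(values[i] for i in items))

    # Fractional relaxation, same O(n log n) cost: an upper bound on the true
    # optimum, so heuristic_value / upper_bound is a cheap quality certificate.
    remaining, upper_bound = capacity, 0.0
    for i in items:
        take = min(weights[i], remaining)
        upper_bound += values[i] * take / weights[i]
        remaining -= take
        if remaining == 0:
            break
    return heuristic_value, upper_bound

value, bound = greedy_knapsack([60, 100, 120], [10, 20, 30], 50)
print(value, bound, value / bound)  # 160, 240, ~0.67 -- certified without any slow exact search
```

Whether real agents' fast heuristics admit certificates this cheap is exactly the open question; the sketch only shows what a "theory that runs in the same speed class" could look like.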
The way I understand your division of floors and ceilings, the ceiling is simply the highest level of meta there is, and the agent *typically* has no way of questioning it. The ceiling is just "what the algorithm is programmed to do". AlphaGo is programmed to update the network weights in a certain way in response to the training data.
What you call the floor for AlphaGo, i.e. the move evaluations, is not even a boundary (in the sense nostalgebraist defines it); that would just be the object-level (no meta at all) policy.
I think this structure will be the same for any known agent algorithm, where by "known" I mean "we know how it works", rather than "we know that it exists". However, humans seem to be different? When I try to introspect, it all seems to be mixed up, with object-level heuristics influencing meta-level updates. The ceiling and the floor are all mixed together. Or maybe not? Maybe we are just the same, i.e. having a definite, hard-coded top level, a highest level of meta. Some evidence of this is that sometimes I just notice emotional shifts and/or decisions being made in my brain, and I just know that no normal reasoning I can do will have any effect on this shift/decision.
What you call the floor for AlphaGo, i.e. the move evaluations, is not even a boundary (in the sense nostalgebraist defines it); that would just be the object-level (no meta at all) policy.
I think in general the idea of the object level policy with no meta isn't well-defined, if the agent at least does a little meta all the time. In AlphaGo, it works fine to shut off the meta; but you could imagine a system where shutting off the meta would put it in such an abnormal state (like it's on drugs) that the observed behavior wouldn't mean very much in terms of its usual operation. Maybe this is the point you are making about humans not having a good floor/ceiling distinction.
But, I think we can conceive of the "floor" more generally. If the ceiling is the fixed structure, e.g. the update for the weights, the "floor" is the lowest-level content -- e.g. the weights themselves. Whether thinking at some meta-level or not, these weights determine the fast heuristics by which a system reasons.
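As a minimal toy of that reading (my own illustration, not AlphaGo and not anything from the original post): the "ceiling" is the fixed training loop and update rule, which nothing inside the system ever edits, and the "floor" is the weights, which the loop constantly rewrites and which every fast decision is computed from.

```python
import random

# "Ceiling": the fixed outermost procedure -- this loop and update rule, which
# the system never modifies. "Floor": the lowest-level content -- the weights,
# which change constantly and determine the fast heuristics at any moment.

weights = [random.uniform(-1, 1) for _ in range(3)]   # the floor

def act(observation):
    # A single fast heuristic evaluation -- the floor in action.
    return sum(w * x for w, x in zip(weights, observation))

def training_step(observation, target, lr=0.01):
    # The ceiling: a hard-coded update rule the agent never gets to question.
    error = act(observation) - target
    for i, x in enumerate(observation):
        weights[i] -= lr * error * x

for _ in range(1000):
    obs = [random.uniform(-1, 1) for _ in range(3)]
    training_step(obs, target=obs[0] - 2 * obs[1])    # arbitrary toy target

print(weights)  # the floor has shifted; the ceiling (the code itself) has not
```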
I still think some of what nostalgebraist said about boundaries seems more like the floor than the ceiling.
The space "between" the floor and the ceiling involves constructed meta levels, which are larger computations (ie not just a single application of a heuristic function), but which are not fixed. This way we can think of the floor/ceiling spectrum as small-to-large: the floor is what happens in a very small amount of time; the ceiling is the whole entire process of the algorithm (learning and interacting with the world); the "interior" is anything in-between.
Of course, this makes it sort of trivial, in that you could apply the concept to anything at all. But the main interesting thing is how an agent's subjective experience seems to interact with floors and ceilings. IE, we can't access floors very well because they happen "too quickly", and besides, they're the thing that we do everything with (it's difficult to imagine what it would mean for a consciousness to have subjective "access to" its neurons/transistors). But we can observe the consequences very immediately, and reflect on that. And the fast operations can be adjusted relatively easily (e.g. updating neural weights). Intermediate-sized computational phenomena can be reasoned about, and accessed interactively, "from the outside" by the rest of the system. But the whole computation can be "reasoned about but not updated" in a sense, and becomes difficult to observe again (not "from the outside" the way smaller sub-computations can be observed).
I can never tell whether they’ve never thought about the things I’m thinking about, or whether they sped past them years ago. They do seem very smart, that’s for sure.
Whenever I have a great idea, it turns out that someone at MIRI considered it five years earlier. This simultaneously makes me feel very smart and rather disappointed. With that being said, here are some relevant things:
Thing #1:
Oh, you’re so very clever! By now you’ve realized you need, above and beyond your regular decision procedure to guide your actions in the outside world, a “meta-decision-procedure” to guide your own decision-procedure-improvement efforts.
This is a nitpick but it's an important one in understanding how meta-stuff works here: If you've decided that you need a decision procedure to decide when to update your decision procedure, then whatever algorithm you used to make that decision is already meta. This is because your decision procedure is thinking self-referentially. Given this, why would it need to build a whole new procedure for thinking about decision procedures when it could just improve itself?
This has a number of advantages because it means that anything you learn about how to make decisions can also be directly used to help you make decisions about how you make decisions--ad infinitum.
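A rough sketch of what I mean, as a toy of my own invention rather than a real architecture: a single `decide` routine whose option set includes edits to its own parameters, so anything learned about ranking options applies equally to ranking changes to itself.

```python
# Toy sketch: one decision routine, no separate meta-procedure. "Improve how I
# decide" is just another option, scored the same way as external actions.

params = {"exploration": 0.3}   # part of how the routine itself decides

def score(option, params):
    # Placeholder scoring rule; in a real agent this would itself be learned.
    return option["estimated_value"] - params["exploration"] * option["risk"]

def decide(options, params):
    best = max(options, key=lambda o: score(o, params))
    if best["kind"] == "self_edit":
        # A decision about the decision-maker, made by the same machinery.
        params[best["target"]] = best["new_value"]
    return best["name"]

options = [
    {"kind": "external", "name": "gather_food", "estimated_value": 1.0, "risk": 0.2},
    {"kind": "self_edit", "name": "lower_exploration", "target": "exploration",
     "new_value": 0.1, "estimated_value": 1.2, "risk": 0.1},
]
print(decide(options, params), params)
```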
Thing #2:
You are a business. You do retrospectives on your projects. You’re so very clever, in fact, that you do retrospectives on your retrospective process, to improve it over time. But how do you improve these retro-retros? You don’t. They’re in your boundary.
This case reminded me a lot of Eliezer on Where Recursive Justification Hits Rock Bottom except placed in a context where you can modify your level of recursion.
You need to justify that your projects are good so you do retrospectives. But you need to justify why your retrospectives are good so you do retrospectives on those. But you need to justify why your retro-retros are good too right? To quote Eliezer:
Should I trust my brain? Obviously not; it doesn't always work. But nonetheless, the human brain seems much more powerful than the most sophisticated computer programs I could consider trusting otherwise. How well does my brain work in practice, on which sorts of problems?
So there are a couple questions here. The easy question:
Q: How do I justify the way I'm investing my resources?
A: You don't. You just invest them to the best of your ability and hope for the best.
And the more interesting question:
Q: What is the optimal level of meta-justification I use in investing my resources?
A1: This still isn't technically knowable information. However, there are plenty of unjustified priors that might be built into you which cause you to make a decision. For instance, you might keep going up the meta-levels until you see diminishing returns and then stop. Or you might just never go above three levels of meta because you figure that's excessive. Depends on the AI.
A2: Given that Thing #1 is true, you don't need any meta-decision algorithms--you just need a self-referential decision algorithm. In this case, we just have the answer to the easy question: You use the full capabilities of your decision algorithm and hope for the best (and sometimes your decision algorithm makes decisions about itself instead of decisions about physical actions)
I don't understand Thing #1. Perhaps, in the passage you quote from my post, the phrase "decision procedure" sounds misleadingly generic, as if I have some single function I use to make all my decisions (big and small) and we are talking about modifications to that function.
(I don't think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as "decisions" that aren't being made using the whole function, which contradicts the original premise.)
Instead, I'm just talking about the ordinary sort of case where you shift some resources away from doing X to thinking about better ways to do X, where X isn't the whole of everything you do.
Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, "I'm managing the recursion to the best of my ability" can mean something better than just "I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary." But then this seems to threaten the Embedded Agency programme, because it would mean we can't make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.
(The discussion of self-justification in this post was originally motivated by the topic of external assessment, on the premise that if we are powerful enough to assess a proposed AGI in a given way, it must also be powerful enough to assess itself in that way. And contrapositively, if the AGI can't assess itself in a given way then we can't assess it in that way either.)
(I don't think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as "decisions" that aren't being made using the whole function, which contradicts the original premise.)
Even if the decision function has a lot of sub-structure, I think that in the context of AGI
Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, "I'm managing the recursion to the best of my ability" can mean something better than just "I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary." But then this seems to threaten the Embedded Agency programme, because it would mean we can't make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.
So "I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary" is not exactly true because, in all relevant scenarios, we're the ones who build the AI. It's more like "So I work on exactly N levels and then my decisions at level N+1 were deemed irrelevant by the selection pressures that created me which granted me this decision-function that deemed further levels irrelevant."
If we're okay with leveraging normative or empirical assumptions about the world, we should be able to assess AGI (or have the AGI assess itself) with methods that we're comfortable with.
In some sense, we have practical examples of what this looks like. N, the level of meta, can be viewed as a hyperparameter of our learning system. However, in data science, hyperparameters perform differently for different problems so people often use Bayesian optimization to iteratively pick the best hyperparameters. But, you might say, our Bayesian hyperparameter optimization process requires its own priors--it too has hyperparameters!
But no one really bothers to optimize these for a couple reasons--
#1. As we increase the level of meta in a particular optimization process, we tend to see diminishing returns on the improved model performance
#2. Meta-optimization is prohibitively expensive: Each N-level meta-optimizer generally needs to consider multiple possibilities of (N-1)-level optimizers in order to pick the best one. Inductively, this means your N-level meta-optimizer's computational cost is around O(x^N), where x represents the number of (N-1)-level optimizers each N-level optimizer needs to consider. (See the toy sketch below.)
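Here's a toy cost-benefit calculation, with entirely made-up numbers, just to picture #1 and #2 together: each extra level of meta multiplies cost by x while the gain it buys shrinks geometrically, so a simple stopping rule ("go up another level only while the marginal gain exceeds the marginal cost") halts at a small N.

```python
# Toy numbers (invented for illustration): cost of N levels of meta ~ x**N,
# while the performance gain from each additional level decays geometrically.

x = 5            # candidate (N-1)-level optimizers each N-level optimizer considers
base_cost = 1.0  # cost of the object-level (N = 0) computation
gain = 10.0      # performance gain from the first level of meta
decay = 0.3      # assumed diminishing-returns factor per extra level

N, total_value = 0, 0.0
while True:
    marginal_cost = base_cost * (x ** (N + 1) - x ** N)
    marginal_gain = gain * decay ** N
    if marginal_gain <= marginal_cost:
        break
    N += 1
    total_value += marginal_gain

print(f"stop at N = {N} level(s) of meta; total cost ~ {base_cost * x ** N}, value gained ~ {total_value:.1f}")
```

Nothing in the sketch justifies the decay factor itself, of course -- that is exactly the unprovable assumption discussed next.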
But #1. can't actually be proved. It's just an assumption that we think is true because we have a strong observational prior for it being true. Maybe we should question how human brains generate their priors but, at the end of the day, the way we do this questioning is still determined by our hard-coded algorithms for dealing with probability.
The upshot is that, when we look at problems similar to the one we face with embedded agency, we still use the Eliezerian approach. We just happen to be very confident in our boundary for reasons that cannot be rigorously justified.
I don't understand your argument for why #1 is impossible. Consider a universe that'll undergo heat death in a billion steps. Consider the agent that implements "Take an action if PA+<steps remaining> can prove that it is good." using some provability checker algorithm that takes some steps to run. If there is some faster provability checker algorithm, it's provable that it'll do better using that one, so it switches when it finds that proof.
Just a quick note: Sometimes there is a way out of this kind of infinite regress by implementing an algorithm that approximates the limit. Of course, you can also be put back into an infinite regress by asking if there is a better approximation.
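One familiar instance of "approximate the limit instead of climbing the regress" (my example, not the commenter's): value iteration on a small MDP solves directly for the fixed point that ever-deeper lookahead would converge to, up to a tolerance -- and, as the comment notes, you can always ask for a better approximation by tightening that tolerance.

```python
# Value iteration: rather than computing 1-step, 2-step, 3-step, ... lookahead
# forever, iterate the Bellman update until it approximately reaches the fixed
# point the infinite regress converges to.

states, gamma = [0, 1], 0.9
# transition[s][a] = (next_state, reward); a tiny made-up problem
transition = {0: {0: (0, 0.0), 1: (1, 1.0)},
              1: {0: (0, 2.0), 1: (1, 0.0)}}

V = {s: 0.0 for s in states}
while True:
    new_V = {s: max(reward + gamma * V[next_state]
                    for next_state, reward in transition[s].values())
             for s in states}
    if max(abs(new_V[s] - V[s]) for s in states) < 1e-6:  # "good enough" stand-in for the limit
        V = new_V
        break
    V = new_V

print(V)  # approximate fixed point; a smaller tolerance = a better approximation
```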
A lot of what you write here seems related to my notion of Turing Reinforcement Learning. In Turing RL we consider an AI comprising a "core" RL agent and an "envelope" which is a computer on which the core can run programs (somewhat similarly to neural Turing machines). From the point of view of the core, the envelope is a component of its environment (in addition to its usual I/O), about which it has somewhat stronger priors than about the rest. Such a system learns how to make optimal use of the envelope's computing resources. Your "boundary" corresponds to the core, which is the immutable part of the algorithm that produces everything else. Regarding the "justification" of why a particular core algorithm is correct, the justification should come from regret bounds we prove about this algorithm w.r.t. some prior over incomplete models. Incomplete models are the solution to "even if you could obtain a perfect model of your world and beings like you, you wouldn’t be able to fit it inside your own head". Instead of obtaining a perfect model, the agent learns all patterns (incomplete models) in the world that it can fit into its head, and exploits these patterns for gain. More precisely, in Turing RL the agent starts with some small class of patterns that the core can fit into its head, and bootstraps from those to a larger class of patterns, accounting for a cost-benefit analysis of resource use. This way, the regret bound satisfied by the core algorithm should lead to even stronger guarantees for the system as a whole (for example this).
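For readers who haven't seen Turing RL, here is one crude way to picture the core/envelope split as code. This is my own toy sketch of the interface only, not the commenter's actual formalism: the envelope is modeled as part of the environment, and "run a program on the envelope, at a resource cost, and read back the result" is just another thing the core can choose to do.

```python
# Crude interface sketch (a toy picture of the core/envelope split only, not
# the commenter's actual formalism): the envelope is a computer that shows up
# as part of the core's environment, and using it costs resources.

class Envelope:
    """A computer the core can submit programs to, at a compute cost."""
    def __init__(self, budget):
        self.budget = budget

    def run(self, program, args, cost):
        if cost > self.budget:
            return None            # not enough compute left
        self.budget -= cost
        return program(*args)      # the result comes back as an observation

class Core:
    """Stand-in for the core RL agent; it follows a fixed script instead of learning."""
    def choose(self, observation):
        # A real core would learn when envelope compute is worth its cost; this
        # stub always asks the envelope to refine its current estimate.
        refine = lambda x: 0.5 * x + 1.0
        return refine, (observation,), 10   # (program, args, cost)

envelope, core, obs = Envelope(budget=100), Core(), 4.0
for _ in range(3):
    program, args, cost = core.choose(obs)
    result = envelope.run(program, args, cost)
    if result is not None:
        obs = result               # envelope output feeds back into the core's observations

print(obs, envelope.budget)        # obs ~ 2.25, remaining budget 70
```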
'Doing it well' seems to be very load-bearing there. I think you're sneaking in an 'all' in the background? Like, in order to be defined as superintelligent it must do better than X in all domains, or something?
My current answer is something hand wavy about the process just trying to ungoodhart itself (assuming that the self and world model as given start off goodharted) and the chips fall where they may.
It's not really about doing well/better in all domains, it's about being able to explain how you can do well at all of the things you do, even if that isn't nearly everything. And making that explanation complete enough to be convincing, as an argument about the real world assessed using your usual standards, while still keeping it limited enough to avoid self-reference problems.
Why did you spend it as you did, then? You cannot answer, ever, without your answer invoking something on the boundary.
It seems like if the O(N^(n-1)) algorithm (algorithm 2) is better than the O(N^n) algorithm (algorithm 1), then there is an amount of time such that, after that time has elapsed (after adopting the new algorithm), the reduction in resource consumption will equal the cost spent finding the new algorithm. This might be called "breaking even", and doesn't seem to invoke anything on the boundary.
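A quick back-of-the-envelope version of that break-even point, with constants invented purely for illustration: if a decision cost c·N^n operations before and costs c·N^(n-1) after, and the search for the better algorithm consumed S operations, you break even once the accumulated per-decision savings reach S.

```python
# Break-even arithmetic for the parable's algorithm upgrade (all constants
# invented for illustration).

N, n, c = 1000, 3, 1.0      # problem size, exponent, cost per operation
search_cost = 5e9           # operations diverted into algorithm design

old_cost = c * N ** n       # O(N^n) per decision
new_cost = c * N ** (n - 1) # O(N^(n-1)) per decision
savings_per_decision = old_cost - new_cost

break_even = search_cost / savings_per_decision
print(f"the upgrade pays for itself after ~{break_even:.1f} decisions")
```

Whether you could have known S and the savings in advance is, of course, the parable's question.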
I think that this infinite regress can be converted into a loop. Consider an infinite sequence of layers, in which the job of layer n+1 is to optimise layer n. Each layer is a piece of programming code. After the first couple of layers, these layers will start to look very similar. You could have layer 3 being able to optimize both layer 2 and layer 3.
One model is that your robot just sits and thinks for an hour. At the end of that hour, it designs what it thinks is the best code it can come up with, and runs that. To the original AI, anything outside the original hour is external; it is answering the question "what pattern of bits on this hard disk will lead to the best outcome?" It can take all these balances and tradeoffs into account in whatever way it likes. If it hasn't come up with any good ideas yet, it could copy its code, add a crude heuristic that makes it run randomly when thinking (to avoid the predators), and think for longer.
preamble
Sometimes I wonder what the MIRI-type crowd thinks about some issue related to their interests. So I go to alignmentforum.org, and quickly get in over my head, lost in a labyrinth of issues I only half understand.
I can never tell whether they’ve never thought about the things I’m thinking about, or whether they sped past them years ago. They do seem very smart, that’s for sure.
But if they have terms for what I’m thinking of, I lack the ability to find those terms among the twists of their mirrored hallways. So I go to tumblr.com, and just start typing.
parable (1/3)
You’re an “agent” trying to take good actions over time in a physical environment under resource constraints. You know, the usual.
You currently spend a lot of resources doing a particular computation involved in your decision procedure. Your best known algorithm for it is O(N^n) for some n.
You’ve worked on the design of decision algorithms before, and you think this could perhaps be improved. But to find it, you’d have to shift some resources away from running the algorithm for a time, putting them into decision algorithm design instead.
You do this. Almost immediately, you discover an O(N^(n-1)) algorithm. Given the large N you face, this will dramatically improve all your future decisions.
Clearly (…“clearly”?), the choice to invest more in algorithm design was a good one.
Could you have anticipated this beforehand? Could you have acted on that knowledge?
parable (2/3)
Oh, you’re so very clever! By now you’ve realized you need, above and beyond your regular decision procedure to guide your actions in the outside world, a “meta-decision-procedure” to guide your own decision-procedure-improvement efforts.
Your meta-decision-procedure does require its own resource overhead, but in exchange it tells you when and where to spend resources on R&D. All your algorithms are faster now. Your decisions are better, their guiding approximations less lossy.
All this, from a meta-decision-procedure that’s only a first draft. You frown over the resource overhead it charges, and wonder whether it could be improved.
You try shifting some resources away from “regular decision procedure design” into “meta-decision-procedure-design.” Almost immediately, you come up with a faster and better procedure.
Could you have anticipated this beforehand? Could you have acted on that knowledge?
parable (3/3)
Oh, you’re so very clever! By now you’ve realized you need, above and beyond your meta-meta-meta-decision-procedure, a “meta-meta-meta-meta-decision-procedure” to guide your meta-meta-meta-decision-procedure-improvement efforts.
Way down on the object level, you have not moved for a very long time, except to occasionally update your meta-meta-meta-meta-rationality blog.
Way down on the object level, a dumb and fast predator eats you.
Could you have anticipated this beforehand? Could you have acted on that knowledge?
the boundary
You’re an “agent” trying to take good actions, et cetera. Your actions are guided by some sort of overall “model” of how things are.
There are, inevitably, two parts to your model: the interior and the boundary.
The interior is everything you treat as fair game for iterative and reflective improvement. For “optimization,” if you want to put it that way. Facts in the interior are subject to rational scrutiny; procedures in the interior have been judged and selected for their quality, using some further procedure.
The boundary is the outermost shell, where resource constraints force the regress to stop. Perhaps you have a target and an optimization procedure. If you haven’t tested the optimization procedure against alternatives, it’s in your boundary. If you have, but you haven’t tested your optimization-procedure-testing-procedure against alternatives, then it’s in your boundary. Et cetera.
You are a business. You do retrospectives on your projects. You’re so very clever, in fact, that you do retrospectives on your retrospective process, to improve it over time. But how do you improve these retro-retros? You don’t. They’re in your boundary.
Of everything you know and do, you trust the boundary the least. You have applied less scrutiny to it than anything else. You suspect it may be shamefully suboptimal, just like the previous boundary, before you pushed it into the interior.
embedded self-justification
You would like to look back on the resources you spend – each second, each joule – and say, “I spent it the right way.” You would like to say, “I have a theory of what it means to decide well, and I applied it, and so I decided well.”
Why did you spend it as you did, then? You cannot answer, ever, without your answer invoking something on the boundary.
How did you spend that second? On looking for a faster algorithm. Why? Because your R&D allocation procedure told you to. Why follow that procedure? Because it’s done better than others in the past. How do you know? Because you’ve compared it to others. Which others? Under what assumptions? Oh, your procedure-experimentation procedure told you. And how do you know it works? Eventually you come to the boundary, and throw up your hands: “I’m doing the best I can, okay!”
If you lived in a simple and transparent world, maybe you could just find the optimal policy once and for all. If you really were literally the bandit among the slot machines – and you knew this, perfectly, with credence 1 – maybe you could solve for the optimal explore/exploit behavior and then do it.
But your world isn’t like that. You know this, and know that you know it. Even if you could obtain a perfect model of your world and beings like you, you wouldn’t be able to fit it inside your own head, much less run it fast enough to be useful. (If you had a magic amulet, you might be able to fit yourself inside your own head, but you live in reality.)
Instead, you have detailed pictures of specific fragments of the world, in the interior and subject to continuous refinement. And then you have pictures of the picture-making process, and so on. As you go further out, the pictures get coarser and simpler, because their domain of description becomes ever vaster, while your resources remain finite, and you must nourish each level with a portion of those resources before the level above it even becomes thinkable.
At the end, at the boundary, you have the coarsest picture, a sort of cartoon. There is a smiling stick figure, perhaps wearing a lab coat to indicate scientific-rational values. It reaches for the lever of a slot machine, labeled “action,” while peering into a sketch of an oscilloscope, labeled “observations.” A single arrow curls around, pointing from the diagram back into the diagram. It is labeled “optimization,” and decorated with cute little sparkles and hearts, to convey its wonderfulness. The margins of the page are littered with equations, describing the littlest of toy models: bandit problems, Dutch book scenarios, Nash equilibria under perfect information.
In the interior, there are much richer, more beautiful pictures that are otherwise a lot like this one. In the interior, meta-learning algorithms buzz away on a GPU, using the latest and greatest procedures for finding procedures, justified in precise terms in your latest paper. You gesture at a whiteboard as you prioritize options for improving the algorithms. Your prioritization framework has gone through rigorous testing.
Why, in the end, do you do all of it? Because you are the little stick figure in the lab coat.
coda
What am I trying to get at, here?
Occasionally people talk about the relevance of computational complexity issues to AI and its limits. Gwern has a good page on why these concerns can’t place useful bounds on the potential of machine intelligence in the way people sometimes argue they do.
Yet, somehow I feel an unscratched itch when I read arguments like Gwern’s there. They answer the question I think I’m asking when I seek them out, but at the end I feel like I really meant to ask some other question instead.
Given computational constraints, how “superhuman” could an AI be? Well, it could just do what we do, but sped up – that is, it could have the same resource efficiency but more resources per unit time. That’s enough to be scary. It could also find more efficient algorithms and procedures, just as we do in our own research – but it would find them ever faster, more efficiently.
What remains unanswered, though, is whether there is any useful way of talking about doing this (the whole thing, including the self-improvement R&D) well, doing it rationally, as opposed to doing it in a way that simply “seems to work” after the fact.
How would an AI’s own policy for investment in self-improvement compare to our own (to yours, to your society’s)? Could we look at it and say, “this is better”? Could the AI do so? Is there anything better than simply bumbling around in concept-space, in a manner that perhaps has many internal structures of self-justification but is not known to work as a whole? Is there such a thing as (approximate) knowledge about the right way to do all of it that is still small enough to fit inside the agent on which it passes judgment?
Can you represent your overall policy, your outermost strategy-over-strategies considered as a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?
What is really known about the best way to spend the next unit of resources? I mean, known at the level of the resource-spenders, not as a matter of external judgment? Can anything definite be said about the topic in general except “it is possible to do better or worse, and it is probably possible to do better than we do now?” If not, what standard of rationality do we have left to apply beyond toy models, to ourselves or our successors?