Potentially challenging example: let's say there's a server that's bottlenecked on some poorly optimized algorithm, and you optimize it to be less redundant, freeing resources that immediately get used for a wide range of unknown but presumably good tasks.
Superficially, this seems like an optimization that increased the description length. I believe the way this is solved in the OP is that the distributions are constructed so as to assign an extra-long description length to undesirable states, even if those undesirable states are naturally simpler and more homogeneous than the desirable ones.
I am quite suspicious that this risks ending up with improper probability distributions. Maybe that's OK.
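To spell out why I worry about impropriety (my own framing, not necessarily how the OP sets things up): under the usual coding correspondence, description lengths and probabilities are tied together by the Kraft inequality, so padding some codes without shortening others leaves you with a sub-normalized distribution.

```latex
% Code lengths L(x) correspond to weights p(x) = 2^{-L(x)}, and a prefix-free code
% must satisfy the Kraft inequality:
\sum_x 2^{-L(x)} \le 1
% If you start from a proper distribution q and add a surcharge c(x) \ge 0 to the
% description length of undesirable states, the implied weights become
\tilde{p}(x) = q(x)\, 2^{-c(x)}, \qquad \sum_x \tilde{p}(x) < 1 \text{ whenever some } c(x) > 0,
% so you either accept an improper (sub-normalized) distribution or renormalize,
% which in turn shortens the codes assigned to the remaining states.
```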
Writing the part that I didn't get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It'd be a massive technical challenge of course, because atoms don't really sit still and let you look at them and position them. But with sufficient work, it seems like someone could figure it out.
This doesn't really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can't assign such a copy a goal. You might get an Age of Em-adjacent situation out of it, though not even quite that.
To reverse-engineer people in order to make AI, you'd instead want to identify separate faculties with interpretable effects and reconfigurable interfaces. This can be done for some of the human faculties, because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there's just no reason to suppose that this should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there's lots of reason to think humans are primarily adapted to handling those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently homogeneous places where AI can operate. The practical consequence is that there will be a direct correspondence between each part of the human work that went into preparing the AI and each part of the activities the AI engages in, which will (with caveats) eliminate alignment problems, because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don't worry much about 'website misalignment', because generally there's a direct correspondence between the behavior of a website and the underlying code, templates and database tables. This didn't have to be true, in the sense that there are many short programs whose behavior isn't straightforwardly attributable to their source code and yet could in principle be very influential, but we don't know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI: since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, the apps will be alignable but not very independently agentic.
(The major caveat is that people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won't lead to the traditional doom scenarios, because those depend too heavily on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
- After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
I've grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it's much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time- or space-scale. For instance, an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn't meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (e.g. if you set your children up in an advantageous situation, that continues paying fitness dividends even after you die).
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution's information bandwidth, due to using an easy approximation (independent Bernoulli genotypes; linear, short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes; quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you get the ordinary speed of evolution; and if you also have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance. Since the speed of evolution is proportional to genetic variance, this makes the shared niche evolve far faster than normal. And if organisms then pass from the mixture niche back out into the specialized niches, those niches benefit from the fast evolution too.
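(For concreteness, the "speed proportional to genetic variance" claim is just the standard single-trait response-to-selection result from quantitative genetics - stated here as a sketch, nothing specific to the niche story:)

```latex
% Breeder's/Lande equation: per-generation response of a trait's mean to selection
\Delta \bar{z} \;=\; \sigma_A^2 \, \beta
% \sigma_A^2 = additive genetic variance of the trait, \beta = selection gradient.
% A mixture niche that pools organisms from many specialized niches inflates \sigma_A^2,
% so for the same selection gradient the per-generation response \Delta \bar{z} is larger.
```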
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power by expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, ... . Each of them used to be evolving individually, but genes also flowed between them in various ways. Though I suspect this undercounts the number of niches, because there are also often subniches.)
And then obviously, beyond these points, individual intelligence and evolution focus on different things - what's happening recently vs. what happened deep in the past. Neither is perfect; society has changed a lot, which makes what happened deep in the past less relevant than it could have been, but at the same time what's happening recently (I argue) intrinsically struggles with rare, powerful factors.
- If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
- If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
Part of the trouble is that if you just study the organism in isolation, you just get some genetic or phenotypic properties. You don't have any good way of knowing which of these are the important ones and which aren't.
You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically aggregate "small-scale" understanding (like an autoregressive convolutional model that predicts the next time step from the previous ones) into "large-scale" understanding (being able to understand how a system could behave in extreme situations by learning how it acts normally). But I've studied a bunch of different approaches to this, and ultimately it doesn't really seem feasible. (Typically the small-scale understanding learned is only valid near the regime it was originally observed in, and the methods for aggregating small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
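To illustrate the kind of "small-scale understanding" I mean, here's a toy sketch (a plain linear autoregressive fit standing in for the convolutional model; the setup and numbers are made up for illustration). The fitted local rule predicts well inside the regime it was trained on, but rolling it forward from an extreme state just extrapolates the same in-regime dynamics rather than telling you how the real system behaves out there.

```python
# Toy sketch: fit a "small-scale" autoregressive model on normal-regime data,
# then roll it forward from an extreme state it never saw during fitting.
import numpy as np

rng = np.random.default_rng(0)

# Normal-regime data: a mildly noisy oscillation around zero.
t = np.arange(2000)
x = np.sin(0.1 * t) + 0.1 * rng.standard_normal(t.size)

# Linear autoregressive model of order k, fit by least squares:
#   x[n] ~ sum_j w[j] * x[n-k+j]   (a causal convolution with kernel w)
k = 8
X = np.stack([x[i:i + k] for i in range(len(x) - k)])  # past windows
y = x[k:]                                              # next values
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step predictions are accurate inside the regime the model was fit on...
print("in-regime RMSE:", np.sqrt(np.mean((X @ w - y) ** 2)))

# ...but rolled forward from an extreme state, the rule just applies the same
# in-regime dynamics; it has no way to represent whatever the real system would
# actually do at that amplitude (saturate, break, change phase, etc.).
state = [10.0] * k  # an "extreme" initial condition never seen in the data
for _ in range(50):
    state.append(float(np.dot(w, state[-k:])))
print("rollout from extreme state (last values):", np.round(state[-5:], 3))
```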
- If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
- Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
First, I want to emphasize that durability and strength are about as far toward the easy end as you can get, because e.g. durability is a common property seen in lots of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogeneous environment because otherwise intelligence couldn't develop.
Another complication is that you've got to consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency, because profit-maximizing companies don't want money tied up in durability or strength that you're not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent - and as a consequence, those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of '"durability and strength will be helpful" is nevertheless a (higher-level) learnable pattern'. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it's relatively far from falling naturally out of the methods.
One complication here is that AI is currently ~never designing mechanical things, which makes this somewhat harder to talk about.
(I should maybe write more but it's past midnight and also I guess I wonder how you'd respond to this.)
If there's some big object, then it's quite possible for it to diminish into a large number of similar obstacles, and I'd agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.
However, my assertion wasn't that intelligence cannot handle almost all obstacles, it was that consequentialism can't convert intelligence into powerful agency. It's enough for there to be rare powerful obstacles in order for this to fail.
"Stupidly obstinate" is a root-cause analysis of obstinate behavior. Like an alternative root cause might be conflict, for instance.
At first glance, your linked document seems to match this. The herald who calls the printer "pig-headed" does so in direct connection with calling him "dull", which at least in modern terms would be considered a way of calling him stupid? Or maybe I'm missing some of the nuances due to not knowing the older terms/not reading your entire document?
I mean, we exist and we are at least somewhat intelligent, which implies a strong upper bound on the heterogeneity of the environment.
We don't just use intelligence.
On the other hand, words like "durability" imply the possibility of categorization, which itself implies intelligence. If the environment is sufficiently heterogeneous, you are durable one second and evaporate the next.
???
Vaporization is prevented by outer space, which drains away energy.
It's not clear why you say durability implies intelligence; surely trees are durable without intelligence.
Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t.
I think the necessity of intelligence for tradition exists on a much more fundamental level than that. Intelligence allows people to form an extremely rich model of the world with tons of different concepts. If one had no intelligence at all, one wouldn't even be able to copy the traditions. Like consider a collection of rocks or a forest; it can't pass any tradition on to itself.
But conversely, just as intelligence cannot be converted into powerful agency, I don't think it can be used to determine which traditions should be copied and which ones shouldn't.
There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.
It seems to me that you are treating any variable attribute that's highly correlated across generations as a "tradition", to the point where not doing something is considered on the same ontological level as doing something. That is the sort of ontology that my LDSL series is opposed to.
I'm probably not the best person to make the case for tradition, as (despite my critique of intelligence) I'm still a relatively strong believer in equilibration and reinvention.
I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?
Whenever there's an example of this that's too embarrassing, or too big an obstacle to applying LLMs in a wide range of practical applications, a bunch of people point it out, and a fix gets developed that allows the LLMs to learn it.
The biggest class of relevant examples would be things that never occur in the training data at all - e.g. things from my job, innovations like how to build a good fusion reactor, the social relationships between the world's elites, etc. Though I expect you'd feel these are "cheating", because the LLM doesn't get a chance to learn them?
The things in question often aren't things that most humans have a chance to learn, or even would benefit from learning. Often it's enough if just one person realizes and handles them, and alternatively, if nobody handles them, then you just lose whatever depended on them. Intelligence is a universal way to catch on to common patterns; things other than common patterns matter too, but there's no corresponding universal solution for those.
I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even when they're wrong, they're often wrong in very unusual and interesting ways, and that you don't tend to troll, etc.)
You ran way deeper into the "except essentially by copying someone else's conclusion blindly, and that leaves you vulnerable to deception" point than I meant you to. My main point is that humans have grounding on important factors that we've acquired through non-intelligence-based means. I bring up the possibility of copying others' conclusions because, for many of those factors, LLMs still have access to this grounding by copying them.
It might be helpful to imagine what it would look like if LLMs couldn't copy human insights. For instance, imagine a planet with life much like Earth's, but with no species capable of language. We could imagine setting up a bunch of cameras or other sensors on the planet and training a self-supervised learning algorithm on the feeds. It could surely learn a lot about the world that way - but it also seems like it would struggle with a lot of things. The exact things it would struggle with might depend a lot on how strong a prior you build into the algorithm, how dynamic the sensors are, and whether there are also ways for it to perform interventions on the planet. But for instance, even recognizing the continuity of animals' lives as they wander off-screen would require either a lot of prior knowledge built into the algorithm or a very powerful learning algorithm (e.g. Solomonoff induction can use a simplicity prior to infer that there must be an entire planet full of animals off-screen, but that's computationally intractable).
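(For reference, the simplicity prior I have in mind is the standard Solomonoff/universal prior - writing it out just to be concrete about why it's intractable:)

```latex
% Universal (Solomonoff) prior over observation sequences x:
M(x) \;=\; \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|}
% U is a universal prefix machine and |p| is the program length in bits.
% The sum ranges over all programs whose output starts with the observed data,
% which is why exact Solomonoff induction is uncomputable rather than merely expensive.
```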
(Also, again, you still need to distinguish between "Is intelligence a useful tool for bridging lots of common gaps that other methods cannot handle?" vs. "Is intelligence sufficient on its own to detect deception?". My claim is that the answer to the former is yes and to the latter is no. To detect deception, you don't just use intelligence but also other facets of human agency.)
I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?
First, some things that might seem like nitpicks but are moderately important to my position:
But more fundamentally, my objection to this question is that I doubt the meaningfulness of a positive theory of how humans survive and thrive. "Intelligence" and "consequentialism" are fruitful explanations of certain things because they can be fairly straightforwardly constructed, have fairly well-characterizable properties, and can even be fairly well localized anatomically in humans (e.g. to parts of the brain).
Like, one can quibble with the details of what counts as intelligence vs. understanding vs. consequentialism, but under the model where intelligence is about the ability to make use of patterns, you can hand a bunch of data to computer scientists and tell them to get to work untangling the patterns, and it turns out there are some fairly general algorithms that work on all sorts of datasets and patterns. (I find it quite plausible that we've already "achieved superhuman intelligence" in the sense that if you give both me and a transformer a big dataset that neither of us is pre-familiar with to study, then, at least given sufficiently much data, the transformer will eventually clearly outperform me at predicting the next token.) And these fairly general algorithms are probably more or less the same sort of thing that much of the human brain is doing.
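(By "predicting the next token" I just mean the standard autoregressive objective, sketched here for concreteness:)

```latex
% Next-token (autoregressive) loss over a sequence x_1, ..., x_T:
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_1, \ldots, x_{t-1}\right)
% Lower held-out loss = better next-token prediction; that's the sense in which the
% transformer would eventually outperform me on a sufficiently large unfamiliar dataset.
```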
Thus "intelligence" factors out relatively nicely as a concept that can be identified as a major contributor to human success (I think intelligence is the main reason humans outperformed other primates). But this does not mean that the rest of human success can equally well be factored out into a small number of nicely attributable and implementable concepts. (Like, some of it probably can, but there's not as much reason to presume that all of it can. "Durability" and "strength" are examples of things that fairly well can, and indeed we have definitely achieved far-superhuman strength. These are purely physical though, whereas a lot of the important stuff has a strong cognitive element to it - though I suspect it's not purely cognitive...)
(Not sure if by "runtime" you mean "time spent running" or "memory/program state during the running time" (or something else)? I was imagining memory/program state in mine, though that is arguably a simplification since the ultimate goal is probably something to do with the business.)