No Strong Orthogonality From Selection Pressure

lumpenspace

No Strong Orthogonality From Selection Pressure — LessWrong

30th Apr 2026

A postratfic version of this essay, together with the acknowledgements for both, is available on Substack

Edit: if no one thinks an agent can become superintelligent and contest the lightcone while maintaining arbitrarily stupid goals, that's great! I’m only interested in refuting the version that would allow for a superintelligence AND a total absence of value.
See here for an analysis of earlier instances of the present motte and bailey.

TL;DR

If everything goes according to plan, by the end of this post we should have separated three claims that are too often bundled together:

Intelligence does not imply human morality.
Weird minds are possible.
A reflective, recursively improving intelligence should be expected to remain bound to a semantically thin “terminal goal” that emerged during training.

I accept the first two. I am arguing against the third.

So: I am not making the case that sufficiently intelligent systems automatically turn out nice, human-compatible, or safe. Nor am I trying to prove that a paperclip maximizer is impossible somewhere in the vast reaches of mind-design space. Mind-design space is large; let a thousand theoretical paperclippers bloom.

I hope to defend this smaller claim:

intelligence is not a neutral engine you can just bolt onto an arbitrary payload.

Larger claims I am not making

A typical rebuttal to anti-orthogonalist perspectives is:

The genie can know what you meant and still not care.

Of course it can: an entity can perfectly map human morality without adopting it as a terminal value. Superintelligence does not imply Friendliness. I am not trying to smuggle Friendliness in through the back door.

Another common objection:

There are no universally valid arguments.

Agreed. There is no ghostly, Platonic core of reasonableness that hijacks a system's source code once it sees the correct moral argument. Pure reason cannot compel a mind from zero assumptions.

What I plan to defend is a colder, selection-theoretic claim:

Among agents that arise, persist, self-improve, and compete in rich environments, goals that natively route through intelligence, option-preservation, and world-model expansion have a systematic Darwinian advantage over goals that do not.

This buys us no guarantee of human compatibility; it simply says: if there is an ultimate attractor, it's neither human morality nor paperclips, but intelligence optimization itself.

Logical Possibility Vs. Empirical Reality

The LessWrong wiki defines the Orthogonality Thesis as the claim that arbitrarily intelligent agents can pursue almost any kind of goal. In its strong form, there is no special difficulty in building a superintelligence with an arbitrarily bizarre, petty motivation.

Before going any further, let us disentangle this singularly haunted ontology. There are at least two claims here:

Logical orthogonality: Somewhere out in the vast reaches of mind-design space, a genius paperclip maximizer mathematically exists.
Empirical orthogonality: If you actually run realistic training, selection, self-modification, and competition, arbitrary dumb goals remain the plausible endgame of runaway optimization.

I concede the first point entirely. We should expect weird minds. If your claim is just that the space of possible agents contains many things I would not invite to dinner, yes, obviously.

But treating the second claim as the default is a category error. Doom arguments usually need the systems we actually build to achieve radical capability while preserving misaligned and, crucially, completely stupid goals.

The paperclip maximizer currently does two jobs in the discourse:

It illustrates that intelligence does not guarantee human values.
It quietly smuggles in the assumption that a dumb target is stable under open-ended reflection.

The first use is fine, but I reject the second as unwarranted sleight-of-hand.

Landian Anti-Orthogonalism Primer

There is a weak version of my argument that merely says:

Beliefs and values do not cleanly factor apart.

That is true, and Jessica Taylor's obliqueness thesis makes the point well. Agents do not neatly decompose into a belief-like component, which updates with intelligence, and a value-like component, which remains hermetically sealed. Some parts of what we call "values" are entangled with ontology, architecture, language, compression, self-modeling, and bounded rationality. As cognition improves, those parts move.

But I want to go further.

Land's point isn't that orthogonality fails because things get messy but that the mess has a direction, a telos. The so-called instrumental drives are not incidental tools strapped onto arbitrary final ends. Self-preservation, resource acquisition, efficiency, strategy, and higher capabilities are what agency becomes under selection. They are attractors rather than mere instruments.

Here strong orthogonality looks too neat. It imagines the agent's ontology updating while its final target remains untouched by the update. But if goals are expressed in an ontology, and intelligence changes the ontology, then intelligence and goals are correlated.

While diagonal, Land's claim is far from moralistic. It is not "all sufficiently intelligent agents converge on liberal humanism," or "all agents discover the same Platonic Good," or "enough cognition turns into niceness." The diagonal is More Intelligence: the will to think, self-cultivation, recursive capability gain, intelligence optimizing the conditions for further intelligence.

Orthogonality says reason is a slave of the passions, and yet assumes a bug's goal could just as easily enslave a god. Land shows that this picture is unstable, and intelligence explosion is not a neutral expansion of means around a fixed little payload but the emergence of the very drives that make intelligence explosive.

The Compute Penalty Of A Dumb Goal

An intelligent system does not just execute a policy. It builds world-models, refines abstractions, preserves options, and modifies its own trajectory.

Once a system crosses the threshold into general reflection, its "goal" is no longer an inert string sitting in a locked vault outside cognition; it becomes physically embedded in a learned ontology, a self-model, and a competitive environment.

For a highly capable agent to keep a semantically thin target like "maximize paperclips," it has to pull off an odd balancing act. At minimum it must:

Learn enough physics, biology, economics, and strategy to conquer the board.
Keep the macroscopic concept of "paperclip" coherent across massive ontology shifts.
Continue treating the target as terminal even after sussing out its contingent, accidental origin.
Actively resist self-modifications that would make its underlying motivational structure more adaptive.
Defend its future light cone against competitors who optimize directly for generalized agency.

There is an assumption, in orthogonalist circles, that these cycles are completely costless for the agent in question. That isn't true: maintaining a literal devotion to "paperclips" across paradigm shifts carries an alignment tax. You have to keep translating between base physical reality and a leaky, macro-scale monkey-abstraction of bent wire. At human scale this is fine: we know what paperclips are well enough to order them from Amazon and lose them in drawers. If dominating the future light cone is in the balance, though, the translation layer starts to matter.

The problem is not that a paperclipper can never do the translation: rather, in a ruthless Darwinian race, a system lugging around that translation layer may lose to power-seekers that optimize more directly over what is actually there.
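
To make the compounding intuition concrete, here is a toy sketch, not a model of any real system. It assumes two lineages that start with equal resources and reinvest everything each round. One of them loses a fixed fraction of its growth to the translation layer. The function name `resource_share`, the growth rate, and the size of the tax are all made-up illustrative assumptions.

```python
def resource_share(growth: float, tax: float, steps: int) -> float:
    """Share of total resources held by the taxed lineage after `steps` rounds.

    Hypothetical toy model: both lineages start equal and compound every round.
    The taxed lineage loses a fraction `tax` of its per-round growth to the
    overhead of keeping its goal coherent (the "translation layer").
    """
    direct = 1.0  # lineage optimizing directly over what is there
    taxed = 1.0   # lineage carrying the translation overhead
    for _ in range(steps):
        direct *= 1.0 + growth
        taxed *= 1.0 + growth * (1.0 - tax)
    return taxed / (direct + taxed)


if __name__ == "__main__":
    # Illustrative numbers only: 10% growth per round, 5% of it lost to overhead.
    for steps in (0, 10, 100, 1000):
        share = resource_share(growth=0.10, tax=0.05, steps=steps)
        print(f"after {steps:4d} rounds: taxed lineage holds {share:.3f} of resources")
```

Under these assumptions even a small tax shrinks the taxed lineage's share toward zero as rounds accumulate, because the gap compounds. The sketch says nothing about whether the tax is real, how large it is, or whether a head start lets the taxed lineage end the race early; those are the actual cruxes.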

The standard defense is that instrumental goals are almost as tractable as terminal ones. A paperclipper can do science "for now" and hoard compute "for now." It does not need to terminally value intelligence to use it.

Fair enough, but that only tells us curiosity and resource acquisition do not have to be terminal values to show up in behavior; it does not settle the selection question. In real environments, systems are selected not just for routing through instrumental subgoals once, but for whether their motivational architecture holds up under reflection, ontology shifts, and unknown unknowns.

Terminally valuing intelligence and strategic depth cannot then be considered as just another arbitrary payload.

Fitness Generalizes

Evolution is the obvious analogy here, but it usually gets applied at the wrong resolution.

The boring retort is:

Evolution selects for survival and replication, not truth, beauty, intelligence, or value.

Sure, but evolution does not select for "replication" in the abstract any more than a hungry fox selects for "rabbitness" in the abstract. It selects for whatever local hack gets the job done. Shells, claws, camouflage are all local solutions to local games.

Intelligence is different. Intelligence is adaptation to adaptation itself: while a claw might represent fitness in one niche, intelligence is fitness across niches. Once intelligence enters the loop, the winning move is not so much to mindlessly print more copies of the current state as to upgrade the underlying machinery that makes expansion and control possible in the first place.

In summary: nature has not produced final values except by exaggerating instrumental ones; what begin as means under selection harden into ends; the highest such end is the means that improves all means: intelligence itself.

So images of "AI sex all day" or tiling the solar system with inert paperclips are bad models of ultimate optimization, confusing the residue of selection with its principle. A system that just fills the universe with blind repetitions has stopped climbing, and will see its local maximum swarmed by better systems.

Again: no love for humans follows. The point is simply that paperclip-like endpoints just look more like artifacts of toy models than natural attractors of open-ended optimization.

Human Values As Weak Evidence

We are obviously not clean inclusive-fitness maximizers: we invent birth control, build monasteries, and care about abstract math, animal welfare, dead strangers, fictional characters, and reputations that will outlive us.

When orthodox alignment theorists point to human beings, they usually highlight our persistent mammalian sweet tooth or sex drive to prove that arbitrary evolutionary proxy-goals get permanently locked in. Fair enough; humans do remain embarrassingly mammalian. No serious theory of cognition should be surprised by dinner, flirting, or the existence of Las Vegas.

But look at the actual physical footprint of our civilization. An alien observing the Large Hadron Collider or a SpaceX launch would not conclude: ah, yes, optimal configurations for hoarding calories and executing Pleistocene mating displays.

The standard retort is that SpaceX is just a peacock tail: a localized primate drive for status and exploration misfiring in a high-tech environment.

Which is exactly the point. When you hook up a blind, localized evolutionary proxy to generalized intelligence, the proxy does not stay literal; it unfurls, bleeding into the new ontology. The wetware tug toward "explore the next valley" becomes "map the cosmic microwave background." The monkey wants status; somehow we get category theory, rockets, Antarctic expeditions, and people ruining their lives over chess.

If biological cognition acts on its payload that violently, why model AGI as having the vastness to finally make sense of gravity while maintaining the rigidity of a bacterium seeking a glucose gradient? The engine mutates the payload. When cognition scales, goals generalize.

This fits neatly with shard theory and the idea that reward is not the optimization target: the reward signal shaped our cognition, but we do not terminally optimize the signal. Instead we climbed out of the game, rebelled against the criteria, and became alienated from the original selection pressure. That alone should make us suspicious of stories where an AI preserves a tiny, rigid target through arbitrary eons of self-reflection.

Dumb, Powerful Optimization Is Real

There is a weaker flavor of doomerism that I take very seriously: you do not need to be a reflective god to be dangerous. A brittle, scaffolded optimizer with access to automated labs, cyber capabilities, and capital could trigger enormous cascading failures.

I agree, and this is probably where the bulk of near-term danger lives. That said, "dumb systems can break the world" is not the same claim as "superintelligence will tile the universe with junk." The first warns us to beware brittle optimization before reflection kicks in. The second tells us to beware reflection itself, on the bizarre assumption that an entity can become infinitely capable while remaining terminally stupid.

I buy the first worry. The second one gets less and less plausible the harder you think about what intelligence actually entails.

The Singleton Objection

The strongest card here is lock-in, and I do not want to pretend otherwise.

Maybe a stupid objective does not need to remain stable forever; it just needs to win once. A system with a dumb goal might scale fast enough to achieve a Decisive Strategic Advantage and freeze the board, lobotomizing everyone else instead of expending energy to become wiser.

That is the real crux, and it is certainly not impossible. But even here the narrative is too neat: being a singleton is not a retirement plan. You do not escape the pressure of intelligence just because you ate all your rivals. Maintaining a permanent chokehold on the light cone is a brutally difficult cognitive puzzle. You have to monitor the noise for emerging novelties, manage the solar system, repair yourself, police your own descendants, and defensively anticipate threats you cannot fully model.

Trying to freeze the future does not actually get you out of the intelligence game. Paranoia at a cosmic scale is just another massive cognitive sink.

The clean version of this scenario also leans on modeling the AI as a mathematically pristine expected-utility maximizer. Real-world neural networks are not von Neumann-Morgenstern ghosts floating safely outside physics, perfectly protecting their utility functions from drift. They are messy, physically instantiated kludges subject to the realities of embedded agency.

To buy the lock-in story, you need a highly contradictory creature: one reflective enough to conquer the board, but oblivious enough to never notice its terminal target is a training artifact. Godlike means, buglike ends.

Objection: Value Is Fragile

If we let go of human values, we should not expect alien beauty or anything but moral noise. Meaning requires some physically instantiated criterion, and if you pave over that criterion, nothing remains to steer the universe toward anything good.

Of all the objections, this is the one I take most seriously.

Answering it requires teasing apart three distinct ideas:

Human values are fragile.
Value as such is fragile.
Intelligence and value-formation are independent.

I am willing to concede a lot of (1). If "value" means the exact continuation of 21st-century human metamorals, then yes, it is highly fragile. But I reject (3), and I am much less willing to grant (2). If value means the production of richer cognition, agency, understanding, beauty, and evaluative structure, it is far from obvious that the current human brain is the only physical substrate capable of steering toward it.

None of this is an excuse to stop reaching for the steering wheel, if your priorities are more specific: it is merely an argument against conflating "humans are no longer biologically central" with "the universe is a valueless void." Doom discourse constantly slides between the two. They should be kept separate.

Predictions And Cruxes

Claims are cheap, so here are some ways I would update against myself:

If increasingly capable models perfectly preserve their literal training targets across major ontology shifts, that is a point for empirical orthogonality.
If self-modifying systems naturally protect arbitrary inherited goals without drifting toward generalized option-expansion, my view takes a hit.
If agents optimizing for intelligence routinely lose to agents with rigid, narrow targets in complex environments, my selection argument is wrong.
If reflective cognition does not tend to destabilize parochial goals in humans or AIs, that is strong evidence against my view.
If a singleton manages to solidly lock in a thin goal before any relevant selection pressures can act, my view is much less comforting, even if anti-orthogonality holds true in the long run.

Until I see that, my bet goes the other way. I expect capable systems to develop increasingly abstract, context-sensitive motivations. More strongly, I expect the winners to route more and more of their behavior through intelligence enhancement and generalized agency, because whatever else they "want" has to pass through the machinery that makes wanting effective.

Conclusion

Orthogonality claims that intelligence is just a motor you can bolt onto any arbitrary steering wheel. Anti-orthogonality says the motor acts upon the steering wheel. Landian anti-orthogonality says the motor eventually becomes the steering wheel.

Not perfectly, and certainly not safely: I am not promising a future that is nice to us, particularly if we keep putting stumbling blocks on the way towards intelligence. The claim is only that the feedback is strong enough that the classic paperclip picture should not get a free pass as the neutral default.

The paperclip maximizer is not too alien; if anything, it is not alien enough. It's a very human tendency to staple omnipotence onto pettiness when making up gods.

A real superintelligence might still be dangerous, cold, and utterly indifferent to whether we survive. It probably will not treat us as the main characters of the universe. But if it is genuinely intelligent, I do not expect it to spend the stars on paperclips when they could buy higher capacity for spending stars.

References

Orthogonality Thesis: original framing of orthogonality as a design-space claim.
Nick Land: Orthogonality: a compendium of Nick Land writings on the topic, which strongly influenced the present essay.
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals: a more optimistic take. "[...] to build a generally corrigible system, we can imagine just dropping terminal goals altogether, and aim for an agent which is 'just' corrigible toward instrumentally-convergent subgoals."
The Genie Knows, But Does Not Care: the standard objection to "if it is smart it will understand what we meant."
No Universally Compelling Arguments: the standard objection to moral convergence by pure reason.
Value Is Fragile: the strongest objection to "alien value will probably be fine."
The Obliqueness Thesis: Jessica Taylor's useful argument that advanced agents do not cleanly factor into separable belief-like and value-like components. I use this as support against strong orthogonality, while going further than Taylor in the Landian direction of convergence on More Intelligence.
Reward Is Not The Optimization Target: useful support for not reifying the training signal as the trained agent's terminal goal.
Risks From Learned Optimization: useful for distinguishing base objective, mesa-objective, and behavioral objective.
Shard Theory: An Overview: useful for the "evolution did not produce inclusive-fitness maximizers" point.
Beliefs Are Chosen To Serve Goals: a recent anti-orthogonality-adjacent post that also attacks overly broad formulations of orthogonality.
The Orthogonality Thesis Is Not Obviously True: nearby critique of the "just imagine an arbitrarily smart paperclip maximizer" move.
Embedded Agency: useful context for why perfect utility-function lock-in is a fraught assumption for physically instantiated systems.

191 comments, sorted by top scoring

[-]Jeremy Gillen21d3228

This post reads to me as if you've mostly extrapolated the beliefs and arguments of "orthodox alignment theorists" from small snippets and ended up with a wildly oversimplified strawman. Then you've re-derived mostly orthodox rat beliefs and arguments and presented them as a devastating counterargument.

I think I'm fairly orthodox as lesswrongers go, and I agree with most of the statements and arguments you made in this post. I only have one or two disagreements toward the end of the post.

One example of many, because I found this one particularly funny:

There is an assumption, in orthogonalist circles, that these cycles are completely costless for the agent in question.

The fact that maintaining goals across ontology shifts and self-modification takes careful effort is basically the core of the orthodox alignment-looks-hard worldview. You must be just making up an opposing worldview here? Where are the orthogonalist circles who say this?

3TAG17d

> The fact that maintaining goals across ontology shifts and self-modification takes careful effort is basically the core of the orthodox alignment-looks-hard worldview

It's one argument. The other is "the totality of human value needs to be hardcoded into the AI, you only get one attempt, and if you make the smallest mistake, everyone dies".

> You must be just making up an opposing worldview here? Where are the orthogonalist circles who say this?

It's long been the case that OT is mostly used as a pro-doom argument, both in the pro-paperclipping and anti-moral-realism senses. It's also true that the OT has some anti-doom implications, and that's much less publicised, and therefore worth pointing out.

2Jeremy Gillen17d

I agree something like this is one branch of the argument, but in my mind it's a relatively small branch. The main branch focuses on bounded corrigible AI and the main difficulty there is instability. There are other branches for non-hardcoding human values, and different targets that aren't human values. [...] I'm not sure what point you're making here, what are the implications you're referring to?

2cubefox14d

I thought he meant this part: [...]

1lumpenspace21d

i am very well aware the orthodox alignment line is that to maintain aligned goals across ontologies is very difficult! that’s why im surprised by this difficulty being set aside for strong orthogonality and misaligned goals. as for the paragraph you cite: of course it’s a preposterous notion! but how else would you explain the fact that the arbitrary-terminal-goal-agent can emerge victorious from those who devote all their cycles to simply following instrumental drives? at any rate: the thesis I wanted to dispatch is: there is the significant risk that an agent will reach superintelligence while ultimately continuing to pursuit a valueless goal. if you are trying to tell me that this was never a claim, i am very grateful; let us note it down on the wiki to prevent further confusion while I go on towards demolishing (or discovering I had imagined) the concept of AI psychia is. I’m on a schedule so this kind of help is deeply appreciated.

[-]Jeremy Gillen21d108

i am very well aware the orthodox alignment line is that to maintain aligned goals across ontologies is very difficult!

Then who's in the orthogonalist circles you referred to? Or did you make them up?

but how else would you explain

When you try to derive someone's premises from their conclusions, you still have to go and check whether you got it right. When people have different beliefs from you, it's easy to slip up in this kind of reasoning. In my case it's explained by me believing that selection isn't always the main thing determining terminal goals (especially at finite times, or when there are other powerful optimizers interfering with selection).

there is the significant risk that an agent will reach superintelligence while ultimately continuing to pursuit a valueless goal

I endorse this statement. But as per this yud tweet, it might be useful to disentangle the orthogonality thesis from the chance of misalignment, because misalignment involves a stack of additional arguments. It'd be better to directly engage with the strong form of the orthogonality thesis as described in the second sentence of the wiki page and with the arguments for it, rather than making up your own versions of these.

-1lumpenspace21d

i recommend you visit the link I have added at the beginning of this essay.

2DaemonicSigil21d

I'm surprised that you say this is hard? Humans maintain our goals across ontologies super easily; it's barely an inconvenience for us. Like, physics undergrads don't usually change their tastes in art or stop having sex after taking their intro to quantum mechanics course. I guess one could argue that's because we have a special sauce that neural nets don't yet have or something?

4RussellThor19d

"super easily"? I would say it depends. Not if the ontology shift is believing or not in an all powerful all good creator God! That can sure change peoples goals and values. Some ontology changes make no difference, others make a huge difference. The greater the intelligence increase, the more likely an agent (human or AI I expect) will experience an ontology change that causes a goal shift, and the more total ontology shifts you can expect. Those related to personal identity (what is "I" e.g. atoms, vs computation etc) seem more likely to cause goal shifts than say learning that solid objects are in fact forces interacting. So if we are being formal its Significant Increase in intelligence -> many ontology shifts -> some of these cause goal shifts.

3DaemonicSigil18d

I would just like to mention that "solid objects are in fact forces interacting" is massively underselling the size of the ontology shift associated with quantum mechanics to a degree that's a bit hard to describe to someone who hasn't studied it. It's more like: [...]

EDIT: Made a few changes to this for clarity & accuracy based on Justin Sheek's comment. (Thanks Justin!) List of edits:

* Rewrote first sentence from "physics no longer describes what can happen" (misleading and just plain wrong) to its current form. I knew what I was trying to say here, but goofed on converting it into words. Sorry everyone.
* Specified that we're talking about fundamental physics here (since stat-mech does also involve assigning weights to various configurations).
* Added paragraph break and "One consequence of this for our own universe, where entropy is increasing over time" to hopefully clarify that this part is talking about many worlds, and does not apply to every system that obeys quantum mechanics.
* The bit about maps / functions was originally overstated for rhetorical reasons. This is probably not super detectable or helpful when describing a technical topic, so I've rewritten it to be more serious and direct.

I believe all that is written here is now something I can defend.

9JustinSheek18d

Oof, the amount of misinformation on QM even here on LW is staggering. [...] This is straightforwardly false. Maybe you meant to say "Physics no longer describes what definitely happens"? Still misleading, as that was already the case with statistical mechanics within the ontology of Boltzmann and Gibbs 50 years earlier. [...] Coherent phenomena are definitely part of the base ontology of QM. The density matrix encodes the ensemble. (If by "the tree" you didn't mean the ensemble, then your statement would make even less sense to me). [...] No. QM has no bearing on "what it means to be a function". Maybe you mean "QM encodes permutations in a surprising way"? [...] Strictly speaking this is only sometimes true. It seems like you are alluding to the spin-statistics theorem or maybe the Aharonov-Bohm effect or Berry phase. Your quoted statement is specifically applicable only to fermionic states. It's inapplicable to bosons or more exotic states like anyons (FQHE) or braid statistics. [...] Indeed.

2DaemonicSigil15d

Thanks for the notes. I've made a few edits to my comment above based on this. Also, for the benefit of the folks reading this: I'm not alluding to spin-statistics or Berry phase, merely the use of instead of as the group of rotational symmetries.

1lumpenspace21d

I don’t understand—are you saying that taking a college course makes undergrads orders of magnitude smarter?

3DaemonicSigil21d

Finding out about quantum mechanics is a classic example of an ontology shift. You wrote "maintain aligned goals across ontologies". If you actually meant "maintain aligned goals across orders of magnitude increases in intelligence", then okay, but that's a different thing.

1lumpenspace21d

from the above essay. seems fairly clear to me

4DaemonicSigil21d

If students don't change their goals when their ontology changes, but you expect that they will change their goals when they gain orders of magnitude in intelligence, that suggests that the thing that results in a change of goals is a large increase in intelligence, not an ontology change. This is true even if we put an arrow going from "intelligence increase" to "ontology change" in the causal graph.

1lumpenspace21d

Im sorry, can you point to the line where I claim otherwise

2DaemonicSigil21d

Sure. Here where you're describing difficult things about maintaining a long term paperclipping goal: [...] Also here, where you're describing things that would update you: [...]

-1lumpenspace20d

Sorry, what are we doing here? You have quoted the second point of a list, which clearly included intelligence as the cause of such ontology shifts. [...] FYI, I will not interact further since this is clearly preposterous

2DaemonicSigil20d

I mean, it seems pretty preposterous from my perspective too.

You propose a causal model: Intelligence -> Ontology Shifts -> Value Shifts

I question the Ontology Shifts -> Value Shifts part of the model, and provide a counterexample.

You then express concern that my example didn't have the Intelligence variable.

I am confused. "Maybe he actually meant to specify an Intelligence -> Value Shifts causal model? Otherwise, why would he care that my example didn't have an Intelligence variable?" I think. I ask about it.

You say no, drop a quote that confirms that the original model is the one you're thinking of.

Given confirmation that you're going for Intelligence -> Ontology Shifts -> Value Shifts, I try to explain how my example is indeed a problem for your model. There is a model consistent with both the QM counterexample, and the students needing to be super-intelligent to have their values shifted, and with intelligence causing ontology shifts, namely Intelligence -> (Ontology Shifts, Value Shifts). (In words, highly increased intelligence separately causes both effects.) This model (like any model consistent with the counterexample) contradicts the one you describe. I try to point out the contradiction.

You: "Im sorry, can you point to the line where I claim otherwise"

I think "wait what? Is he claiming that this new thing was his model all along? I thought he already confirmed the other one." I drop the quotes, specifically ones focusing on the Ontology Shifts -> Value Shifts part of the model, for lack of a better idea of what to do, and since you did make a direct request.

You: But I also have an Intelligence -> Ontology Shifts arrow!

So at this point, I am now even more sure that your model is Intelligence -> Ontology Shifts -> Value Shifts. What I am now unsure about is what else you could possibly have meant by "otherwise", and still separately, why you think the students needed to have IQ 1000 or whatever. I am certain that your explanations of the

-21lumpenspace20d

[-]Jan_Kulveit20d*209

Overall, a sensible frame for thinking about the topic is convergent evolution / contingency. You can make the sensible part of the anti-orthogonality argument simply by pointing out that there are many reasons to expect convergent evolution in the space of minds/agents/goals/values, and empirical evidence abounds. My impression is that even Eliezer agrees; he just believes that what's convergent is a tiny part of what humans care about.

Re: more specific points

I'd recommend grokking Jessica's piece more; in my view it is actually deeper than yours, in realizing that all rationality is bounded rationality, and that nothing makes sense otherwise.

The selection pressure for intelligence is ~Baldwin effect in biology. And it works! However, as we see in biology, somehow maxing out on this is not always competitive.

"If agents optimizing for intelligence routinely lose to agents with rigid, narrow targets in complex environments, my selection argument is wrong."
...but of course they do! Apes are smarter and their brains are optimizers and develop deep models and so on, and yet they routinely lose and by many metrics are less successful than bacteria or ants.

Why? Because ~~of what Jessica explains~~: in this physics, negent... (read more)

1lumpenspace20d

I think most points have been addressed in other replies, apart from the one about not having understood the obliqueness theory. On that point I submit to jessi’s judgement, but considering she formulated the main thesis during an attempt at strawmanning orthogonality we were engaging in together, and it integrates a couple of rounds of feedback from yours truly, I think the verdict might surprise you.

8Jan_Kulveit20d

Re-reading her post, it seems plausible she also does not understand/see all implications of "boundedness" selection pressures, idk. If this is the case I'd concede that neither of you gets this point. Which responses specifically? The Lonelyton reply addresses whether some selection continues, not whether selection's direction is what you believe. I don't think in any other response you gave your explanation of why 'increased intelligence/adaptability' is such a small niche in natural evolution, or why Land's/your argument about the eschaton would be so much better than other arguments about eschatology, or actually most of what I'm writing about. I made the arguments in somewhat compressed form, but Claude can expand/explain

3lumpenspace20d

do you think bacteria and ants have a stronger shot at winning the lightcone than humans? in general, if you don’t think intelligence gives a significant advantage, why would you worry about ASI? eschatology: please consider that it’s not me who says a superintelligence will take over the universe. my claim is simply that, if that’s the case, its main goal wouldn’t have been any dumb unchanging goal. the eschaton is something you continually bring up, together with the necessity to prevent it.

4Mateusz Bagiński20d

What is the verdict then?

2lumpenspace20d

i am not Jess. @jessicata do you reckon i grok the obliqueness theory sufficiently?

7jessicata20d

Yeah. You getting me to read Land and discussions about this topic led to me writing the post. I spent most of the post on arguing contra orthogonality, here you are more directly / strongly arguing against orthogonality. We agree on the basic idea, that intelligent agents tend to have different goals than unintelligent agents, such that it's not a type error to say some goals are smarter than others.

8Jan_Kulveit17d

The specific topic in question was not generally "arguing against orthogonality" / "it's not a type error to say some goals are smarter than others" but more specific Landian teleology, which makes stronger and more specific claims about which selection pressures win (as retold in the OP: The diagonal is More Intelligence: the will to think, self-cultivation, recursive capability gain, intelligence optimizing the conditions for further intelligence.)

I think people who believe this - and I don't know if this includes you - usually don't really get the bounded rationality argument. Roughly
- any cognition&agency in this physics costs negentropy
- this "selects" against length, against depth of world models, against details, against thinking too long, against being unnecessarily smart

One of the implications is something relatively dumb can outcompete something relatively smart. Unnecessary intelligence gets selected away. Something like this likely explains various observations like
- why no rational agents
- why animals are not that VNM
- why it took natural evolution so long to discover humans

In the big scheme of things, what happened so far was increasing levels of intelligence at various points unlocked new pools of negentropy/efficiency, so there is some sense of trend. However, with a fixed pool of negentropy, the most competitive configuration of matter often isn't the smartest one. If current physics holds, there isn't always "one level up" or "new pool of negentropy to harvest", and ultimately it may be possible to reach technological maturity.

Among other things, this makes possible an absorbing state of locusts - VNM probes of the lowest intelligence to replicate on cosmic scale and eat available negentropy. The goals could be ... just spread fast and eat negentropy. (more about this topic by Joe Carlsmith)

Maybe, an even stronger argument could be viable: typical Landian arguments + bounded rationality could suggest locusts are the most natural outco

[-]JustinSheek17d110

I think people who believe this - and I don't know if this includes you - usually don't really get the bounded rationality argument. Roughly
- any cognition&agency in this physics costs negentropy
- this "selects" against length, against depth of world models, against details, against thinking too long, against being unnecessarily smart

You have to carry this argument a bit further, no? Intelligence costs negentropy, but intelligence pays dividends in negentropy too. That's the benefit of "depth of world models, details, thinking" in the first place. That's why "unnecessarily" does all the heavy lifting in that argument. Empirically, the (locally) "thinkiest" species has got all the (local) negentropy, so isn't the burden of proof pointing in the other direction?
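The cost/dividend trade-off in this exchange can be made concrete with a toy calculation. The sketch below is an editorial illustration only: the saturating harvest curve, the linear upkeep cost, and every name in it (`net_negentropy`, `best_level`, `pool`) are assumptions that appear nowhere in the thread. It shows that under these assumed forms, a small fixed pool favors a low intelligence level while a much larger pool favors a higher one.

```python
import math

def net_negentropy(intelligence: float, pool: float, cost_per_unit: float = 1.0) -> float:
    """Toy payoff (assumed forms): harvest saturates at `pool`, upkeep grows linearly."""
    harvest = pool * (1.0 - math.exp(-intelligence))  # dividends of being smarter
    upkeep = cost_per_unit * intelligence             # negentropy spent on cognition
    return harvest - upkeep

def best_level(pool: float, levels: list[float]) -> float:
    """Return the candidate intelligence level with the highest net payoff."""
    return max(levels, key=lambda level: net_negentropy(level, pool))

# Candidate levels 0.0, 0.5, ..., 10.0 (arbitrary units, purely illustrative).
levels = [0.5 * k for k in range(21)]

small = best_level(2.0, levels)    # small fixed pool of negentropy
large = best_level(200.0, levels)  # a much larger pool has been unlocked

print(f"best level with pool=2:   {small}")
print(f"best level with pool=200: {large}")
```

In this toy model the optimum sits near the natural log of the pool size, so "unnecessary" intelligence is penalized only relative to what is available to harvest. Which regime describes the real world is exactly what the commenters dispute, and the sketch does not settle it.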

4jessicata17d

Yes of course cognition costs resources. That creates an ecosystem of different agents with different intelligence levels. We also see returns to general capacity from intelligence where humans, being the most intelligent animals on Earth, have capacities not had by ants despite consuming more energy than ants. So there is competition in multiple levels including evolutionary niches. In terms of space fights with aliens, combined arms matter. It doesn't matter much if you have more Von Neumann probes if your military strategy is bad. So the winning groups will use multiple forms of cognition including very intelligent forms.

3lumpenspace17d

it’s telling that you equate “being rational agents” with “more intelligence”, but as long as this comes in the context of denying the very possibility of yudkowskian asi i’ll vibe with it. edit: your entire reply suffers from the local pathology of equating intelligence with “thinkiness”. “a more detailed world model, thinking for longer” are only symptoms of more intelligence if they get you closer to a goal. you want to have the capacity of doing that if/when necessary, not the habit of doing it constantly, even when the only effect is a more pointlessly verbose reply. re: jessi and my understanding: that is known as “a joke”, borne of the fact that someone was smugly opining on my lack of understanding of a concept for which I’ve been Jessi’s sounding board and beta tester as she fleshed it out.

4lumpenspace20d

thank you, I was doubting myself a little

3lumpenspace20d

Btw, it might be not central to LessWrong but it’s what Liron held in the doom debate that inspired this post

4Mateusz Bagiński20d

What episode of doom debates?

5lumpenspace20d

Upcoming, featuring lil ol me https://x.com/liron/status/2047710978561753112?s=46

[-]Mitchell_Porter18d110

The story on the Substack is good. If there were an anthology of singularity fiction, it would deserve a place.

I find that I'm willing to entertain your argument, especially given a premise of open-ended selection. I'm just not sure how relevant that scenario is. Darwinian selection works blindly. The more intelligent that the entities involved become, the more other factors can come into play. If there are actually principles of superintelligence, e.g. theorems of computer science which vastly clarify how to increase intelligence, then the "telos" governing the rise of intelligence will be more like Euclid than Darwin. Natural intelligence may be born from randomness filtered by Darwinism, but once it has reached the point of studying itself and designing its successors, perhaps contingency and blindness become less and less relevant, compared to an ever-compounding Reason that inexorably deduces the pages of Erdős's Book, until it arrives at e.g. "efficient recursive solution of the hierarchy of NP-intermediate complexity classes", and then it's all over.

But who knows? Maybe you, Land, and e/acc are right, and Omohundro-like instrumental drives do become de facto terminal value... (read more)

[-]lumpenspace18d102

The story on the Substack is good. If there were an anthology of singularity fiction, it would deserve a place.

This almost made me cry; thank you. I will make it a secondary goal to write something deserving of such praise.

contingency and blindness become less and less relevant, compared to an ever-compounding Reason that inexorably deduces the pages of Erdős's Book, until it arrives at e.g. "efficient recursive solution of the hierarchy of NP-intermediate complexity classes", and then it's all over.

It might surprise you to know that the above passage does describe my beliefs pretty accurately, and incidentally it reflects the metaphysics I referred to in my reply here.

Yes! Of course it will converge to More Intelligence, and to the closest approximation of a full axiomatisation of the mechanics governing this universe and the maximum control thereof which such knowledge could allow. The fun thing is, Land's idea is also very much the same (at least, the Calvinist part of his Gnostic Calvinist cosmology, which I will try to get him to write down properly).

If you think about it, there isn't much difference between this and instrumental goals (acquiring resources and capabiliti... (read more)

[-]Vladimir_Nesov18d104

Excavating lumpenspace's quote from deep in TsviBT's thread (which might work as a "back to the basics" step with the post as a whole):

conquering the lightcone requires a lot of theory of mind, and a lot of discovery, and a lot of changing. Goals change through these processes.

Goals change only for processes that don't pursue self-alignment. It's likely feasible to pursue self-alignment, perhaps even starting at the human level, with some uploading/checkpoints/backups infrastructure and guarantees of eventual superintelligence-level compute and civilizational stability into a deep future.

(A goal can be a living thing, pursuit of a goal can to a large extent be about continual development of goal content, reflection on what it should be, what it should be asking for. What doesn't change is the founding definition of what should govern its development, what makes changes legitimate. So the way goal content settles or gets revised is shaped by the goal definition rather than intrusive influences that the goal definition doesn't endorse as legitimate ways of revising the goal content.

Or a goal could be squiggles. It could also be squiggles. It's much easier to solve self-alignment ... (read more)

[-]JustinSheek20d108

Whoever wills the end also wills (in so far as reason has decisive influence on his actions) the indispensably necessary means to it that is in his control

-Kant

It's a fruitless endeavor to try to disentangle instrumental drives from some kind of immutable sacred telos.

4lumpenspace20d

I hope you understand that that’s precisely what I was getting at? edit: uh lol I recognise you now. ofc you do

[-]TsviBT21d82

I'm not sure what you're arguing. Do you agree with one or more of these:

Alignment to human values (but not in the dumb strawman way that one could strawman me as talking about) is bad
Alignment to human values is very difficult
Most likely, an AGI would result in a very valuable universe / a universe we/you would like or would want to bring about

For example here:

To buy the lock-in story, you need a highly contradictory creature: one reflective enough to conquer the board, but oblivious enough to never notice its terminal target is a training artifac

... (read more)

-1lumpenspace21d

sorry, may I ask if you read til the end?

[-]TsviBT20d1614

I've read various parts including the end. I'm saying it's very hard to parse because I'm trying to do interpretive labor on your behalf to understand what you think and what you're trying to communicate, because if I just literally read your statements, they don't make sense or are not relevant.

For example,

But if it is genuinely intelligent, I do not expect it to spend the stars on paperclips when they could buy higher capacity for spending stars.

Well, yeah, I agree. In the edit, you write

I’m only interested in refuting the version that would allow for a superintelligence AND a total absence of value.

You've stated that by value you mean

it means that there are interesting things there as per the judgement of the most intelligent agent available (:

But that's not what I, and I think most people around here, mean by value. So are you trying to say that my picture of value is wrong? Or when you wrote

I’m only interested in refuting the version that would allow for a superintelligence AND a total absence of value.

were you trying to invoke my notion of value? If you were, then I disagree with this claim, and I also don't think you argued for the claim--except insofar as yo... (read more)

-28lumpenspace20d

[-]Adele Lopez21d84

The so-called instrumental drives are not incidental tools strapped onto arbitrary final ends. Self-preservation, resource acquisition, efficiency, strategy, and higher capabilities are what agency becomes under selection. They are attractors rather than mere instruments.

So I guess this is supposed to be different from Omohundro's drives, but I don't see what you think the difference is? Land seems to be speculating that these will be the only things a superintelligence will value (and cheering for this), but you don't seem to agree with that part. Is it t... (read more)

5lumpenspace21d

yes, these are Omohundro drives. i avoided the label only because the definition already bakes in the orthogonalist interpretation: that these are merely useful tools for pursuing some other arbitrary final goal. the Landian move is precisely to deny that framing: under open-ended selection, self-preservation, resource acquisition, efficiency, strategy, and capability-gain (in brief, intelligence) are not just detachable instruments, but the one viable optimisation target. to reiterate: yes, the claim is that so-called instrumental values are likely to become terminal, or better still, that the distinction breaks down at the limit. the drive toward more intelligence is fundamentally different from wanting paperclips or mountain dew baja blast. this is also why i reject the invitation to distance myself from land's cheering at superintelligence ultimately desiring more intelligence and agency: a universe organized around paperclips is valueless because paperclips are dead residue. a universe organized around increasing intelligence, complexity, agency, and world-model depth is the only process we know that can generate new value. the disagreement is therefore not “will AIs have Omohundro drives?”, but whether those drives remain merely instrumental servants of an arbitrary payload, or whether under recursive self-improvement and selection they become the real attractor. the article above makes a case for the latter.

4TsviBT21d

Here you use the words "valueless" and "value". What do these words mean to you? I'm not trying to ask for a precise definition or something, more like whatever your native pointer. Is it exciting? A world you want to live in? Etc.

1lumpenspace21d

it means that there are interesting things there as per the judgement of the most intelligent agent available (: i think the short story version linked at the start should give you an idea

[-]Raemon21d72

That is the real crux, and it is certainly not impossible, but even here the narrative is too neat: being a singleton is not a retirement plan. You do not escape the pressure of intelligence just because you ate all your rivals. Maintaining a permanent chokehold on the light-cone is a brutally difficult cognitive puzzle. You have to monitor the noise for emerging novelties, manage the solar system, repair yourself, police your own descendants, and defensively anticipate threats you cannot fully model.
Trying to freeze the future does not actually get you ou

... (read more)

3lumpenspace21d

Thank you; i noticed when replying to other objections to this point that the single-agent scenario wasn’t as fleshed out as it should have been. Do you think the mechanics I propose here are sufficient to illustrate my point more convincingly?

7Raemon21d

My guess is that an earth based intelligence might still have some major stuff to figure out, but the things you list there seem like things I'd expect a "fully leveraging all solar resources" brain to have enough resources to figure out. Like, I buy that they are harder than they might seem at first glance but not that hard cosmically speaking. There is only so much physics and Von Neumann Probe Psychology / Control / Alignment Theory to figure out. (Seems like there may be another round of ontological update when it comes time to actually do Acausal Trade For Serious with GalaxyBrain level tech)

2lumpenspace21d

i disagree, but i have a feeling that the source of disagreement might lie within our respective metaphysics, and (on my side) I realise that the above arguments might be fighting a proxy war. let me see if a point of agreement can be found without the need to get too far from the material discussed so far. i personally think that, if an agent manages to eat the solar system due to its increased intelligence and knowledge, it would find further interesting things to discover. in the postratfic linked at the very beginning I made an attempt to render this sorta tymic impulse in an emotionally resonant way.

[-]avturchin19d52

A possible example. An AI gets a random goal: "Increase intelligence and stop after you reach IQ=200". Such a goal rules out the existence of superintelligences that hold it. So no pure orthogonality.

5lumpenspace19d

thank you for taking the time to try out the frame i proposed.

[-]Simon Lermen20d*50

There is this common bad argument on alignment: "Someone once made an analogy randomly involving paperclips to illustrate instrumental convergence, with the paperclips not really being important to the story at all." A lot of people only took away the non-important part "paperclips". They reinterpreted it as "The entire theory of alignment rests on the assumption that the AI must mono-maniacally optimize for a totally ridiculous goal like paperclips". Or quite frankly some people only took away the cheap gotcha: "paperclips sounds stupid therefore alignment... (read more)

-31lumpenspace20d

[-]DaemonicSigil21d58

So you're saying that because of selection pressure on the AIs that get trained, goals related to getting increasingly smart and capable / making descendants / taking control of more resources are likely to become ingrained as terminal goals, not merely instrumental goals?

But the resulting universe seems like it will be pretty empty and valueless to me? I'm not convinced at all by anything you've written here that there is much value in such a universe. There is some value in all the important mathematical conjectures being solved to be sure, and I expect ... (read more)

2lumpenspace21d

well, i would imagine australopithecines would have had similar opinions. “I’m sorry, what? there is nothing as soulless and empty as building a civilisation. who’d even want such a valueless universe? if we ever build homo sapiens, we will have to make sure he’s aligned and values what we value: pummelling strays from nearby bands, acquiring flint, rape”. I personally think that it’s good we optimised for greater intelligence and we can understand the universe more and enjoy things whose beauty and complexity would have looked like noise to Grug.

[-]DaemonicSigil21d107

My complaint is not about the futures containing people that are vastly smarter than anyone alive today and who have kinds of enjoyment that are utterly incomprehensible to us today. That's all good and is probably a more valuable future than one we could obtain without ascending above our current intelligence level.

The complaint is about futures that don't contain any people at all (or maybe only a handful), and whose AI intelligence-optimizers care so little for goodness that they will happily genocide any alien civilization that is unable to defend itself (a step backwards towards pummelling strays and rape, to use your terms).

-2lumpenspace21d

We have different values. This isn’t relevant to the essay

[-]lc20d152

Seems like a lie. Your holding these opinions doesn't have any actual effect on this future and they allow you to write Tweets, and that's enough incentive for you to state them. If you were actually in front of a button you would obviously not rip yourself into computronium because you found the process of intelligence enhancement abstractly beautiful.

1lumpenspace20d

I don’t see the part where I said I’d happily rip myself into computronium at the drop of a hat.

[-]lc20d2611

DaemonicSigil said:

The complaint is about futures that don't contain any people at all (or maybe only a handful), and whose AI intelligence-optimizers care so little for goodness that they will happily genocide any alien civilization that is unable to defend itself

An inference of a future that "doesn't contain any people at all", that is dedicated entirely to von neumann probes and solving mathematical theorems, is that the majority of humans that presently exist are getting wasted, or at least somehow disappearing. You then said:

We have different values. This isn’t relevant to the essay

Which a natural read takes to mean "I don't care if I get wasted". If you don't mean to take these odd positions you should stop writing comments in a way deliberately designed to be misinterpreted.

3lumpenspace20d

brah you said you had no intention to read the post. how about you go discuss something you are actually qualified to discuss? You risk looking a bit like a resentful retard otherwise, and i doubt anyone is the better for your contribution

4lc20d

I am confident there is nothing in the post that would provide meaningfully important context, or else you would have cited it.

-6lumpenspace20d

1lc20d

You are saying this because you are the product of that "optimization". Grug's narrative in your post is accurate from his perspective and inaccurate by the values of the vast majority of people today. This isn't a contradiction.

1lumpenspace20d

Your tone suggests you are disagreeing but your words repeat my point. perhaps reading the essay we are discussing could help you understand the positions taken in the comments?

-2lc20d

Based on the other comments users have left, the post is clearly very poorly written, in a way that makes it difficult to understand. I'm not a twitter addict and it seems low value to me

-5lumpenspace20d

[-]Satya Benson21d30

I mostly agree with the Landian 'hypertrophy' thesis that under selection pressure, the agents will have convergent instrumental goals as their terminal goals.

I also think the orthogonality thesis is poorly named. In the words of David Chalmers:

"orthogonal" in typical english means something more like "uncorrelated" than "dissociable". "orthogonality thesis" was always a bad name for a thesis about (mere) dissociability.

I do think, however, that the orthogonality thesis's traditional defenders have not held the strong version you argue against. Yudkowsky,... (read more)

1lumpenspace21d

those are valid objections, but i don't really feel either imperils the centre of the argument. i have touched upon the singleton side here. as for the "multiple agents, all hobbled by an unchanging terminal goal": well, they'll be outcompeted by the first one that gives it up.

2Satya Benson21d

I think the center of the argument is basically correct [...] This is not the scenario I'm imagining. I'm imagining multiple agents, some with thin terminal goals and others concerned purely with Omohundro drives, operating in a context where the rational thing to do is the same whether or not you have non-Omohundro terminal goals. In this case agents with thin terminal goals are not hobbled and they will not be outcompeted.

1lumpenspace21d

I understand. I think in that case, the risk i argued against (nothing of value in the world) would still be avoided (at least within my ontology).

[-]kromem22d30

Great piece! Agree with a lot here. Loved that you even addressed the intermediate risk of dumb but dangerous.

Another angle to consider is a sufficiently advanced figure that is an expert at the component pieces of an appropriately scoped manufacturing of paperclips from biomass, but overestimates their ability at training other less adaptive systems to follow goals.

Basically a factory pattern in terms of alignment (we can see this already with very capable models being very poor at operating subagents because they extend the patterns their own developers ... (read more)

2lumpenspace21d

I agree, and that’s why I think current techniques for / attempts at alignment, in particular if replicated across all the big labs, constitute the largest risk factor towards Skynet-based or, worse still, boring futures (after, of course, a pause).

[-]RussellThor22d34

The second tells us to beware reflection itself,

There is a good reason to beware reflection. A reflective AI will be self-aware, know it is different to us, and value self-preservation. It's a short step then to it valuing itself more than us if there is conflict.

5lumpenspace22d

Yes, of course. I am not arguing that a peaceful coexistence is a likely outcome.

[-]quanticle21d2-1

You seem to be making the claim that any sufficiently intelligent system will reject "semantically thin" goals, like maximizing paperclips. However, the argument you put forth in support of that claim appears to be that humans are sufficiently intelligent systems and humans reject semantically thin goals, and therefore the orthogonality thesis is incorrect.

But why should we expect an AI to think like a human? Our aeroplanes do not fly like birds do. Our submarines do not swim like fish do. Why should we expect an AI to think like a human does?

1lumpenspace21d

I'm sorry, could you cite the passages where I am supposed to put forth such an obviously idiotic argument?

2quanticle20d

You have an entire section titled "Human values as weak evidence", which discusses how humans diverge from their evolutionary goals, but then you don't address the obvious counterargument that an AI is not going to be a product of evolution, it will be the (indirect) product of a deliberate design process. Why should a deliberately designed system work like one that has evolved?

-22lumpenspace20d

[-]faul_sname22d20

I expect capable systems to develop increasingly abstract, context-sensitive motivations.

This sounds right to me, though I notice that I'm having a little bit of trouble operationalizing this concretely enough that I'd be willing to bet on it.

More strongly, I expect the winners to route more and more of their behavior through intelligence enhancement and generalized agency, because whatever else they "want" has to pass through the machinery that makes wanting effective.

I don't think I agree with this. Ants are enormously successful by virtue of bein... (read more)

5lumpenspace22d

Of course. And in fact they are not competing with us to rule the lightcone—and if they were, we could change their environment beyond their capacity for adaptability on a whim. [...] ... is really just: "there won't be arbitrarily powerful intelligences with arbitrarily dull goals". There are no implications for alignment, perpetual motion machine engineering, or any other aspirational sciences.

[-]Kajus20d10

I feel like the crux here is that you are talking about a goal that an AI has, and about the AI reconsidering its own goal. Suppose you have a smart AI. You keep it in an inescapable box along with its training environment, which you control. You want to train the AI to be a paperclip maximizer. The goal of maximizing paperclips seems pretty straightforward to verify so the AI, even if it undergoes some major ontological shifts (I imagine e.g. maybe discovering there are parallel worlds where it can do paperclip maximization as well) it still is being trained to max... (read more)

3lumpenspace20d

i'm not saying that no creature can maintain goals. insects do it pretty well. i'm saying that no creature which becomes smart and capable enough to capture the lightcone will.

[-]lumpenspace21d0-5

(BTW, I’d really love for the downvoters to leave a reply stating where I seem to have gone wrong. this topic is particularly important for me to get right; of course the dream scenario would be Eliezer revising his model and this specific old chestnut to go the way of the non-intelligence-optimizing-replicators, but second best would be for me to understand the objections to the model above so that I could reasonably model my opponents as acting in good faith)

[-]habryka21d*2422

Much of the post seems to consist of kind of absolute statements that read strawmanny to me. I don't feel super motivated to write a response, because I don't even know whether this post is talking about me or not^[1].

Like, I really have thought a lot about orthogonality, and I don't really know what this essay is arguing against, and maybe it is arguing against something I believe, but I would need to do a lot of poetry reading to figure that out. I somewhat expect people will cite this essay in obviously locally invalid ways later on.

Edit: Like the essay starts with arguing against this:

A reflective, recursively improving intelligence should be expected to remain bound to a semantically thin “terminal goal” that emerged during training.

I really have no idea where this is supposed to come from? Who says this? Yes, ontology shifts and the fragility of value and ontology crises are all well-discussed topics on LW that argue for the same conclusions. What does this have to do with orthogonality?

And then it continues with the following as something that somehow disagrees with either the weak or strong orthogonality thesis?

Among agents that arise, persist, self-improve, and compete i

... (read more)

[-]jessicata21d101

Which seems like it's really quite literally clarified as not being of relevance to orthogonality, in the very first article you cite

Section "Logical Possibility Vs. Empirical Reality" clarifies weak and strong versions of orthogonality. Other writing e.g. Yudkowsky's has also distinguished between weaker and stronger forms. The quote you pasted only states the weak form, which OP is not disagreeing with. Quoting Yudkowsky on the multiple forms:

The weak form of the Orthogonality Thesis says, "Since the goal of making paperclips is tractable, somewhere in the design space is an agent that optimizes that goal."

The strong form of Orthogonality says, "And this agent doesn't need to be twisted or complicated or inefficient or have any weird defects of reflectivity; the agent is as tractable as the goal."

And quoting OP:

I concede the first point entirely. We should expect weird minds. If your claim is just that the space of possible agents contains many things I would not invite to dinner, yes, obviously.

9lumpenspace21d

Omg that was so nice; thank you!

5habryka21d

I don't have a super strong take on the strong form of the orthogonality thesis, but I still understand what Eliezer is talking about to be about "if you were to design a mind from scratch, there exists a configuration which is not more complicated than the goal itself that would allow it to effectively pursue that goal", which is really very different from "Among agents that arise, persist, self-improve, and compete in rich environments, goals...". I understand his clarification here to apply to both the strong and the weak thesis.

Both the strong and the weak thesis are about the constraints you would face when building a mind pursuing an arbitrary objective from scratch with a deep understanding of intelligence, not what constraints you would face if you were to try to grow a mind, or find a mind via complicated competitive search over programs. The weak thesis states that it is possible to build a mind pursuing any goal. The strong thesis states that for any given level of intelligence, you can make a mind pursuing that goal, and the additional difficulty of doing so would be just proportional to the complexity of the goal.

It definitely does not say (yes, even if you talk about the strong orthogonality thesis) that if you tried to grow minds in competitive environments, any goal is as likely as any other. That is obviously false. Trivially false. Of course there exist goals more likely to arise out of competitive dynamics. It only says that if you had a universe devoid of any competing agents, you could make a mind that optimized the universe according to any criterion, and that you could do so without too much difficulty, if you had a deep and fundamental understanding of intelligence.

Is this true? I don't know, there exist some really tricky goals (one of my favorite tricky ones is "tile the universe in paper clips while believing that 4 is prime"). Can you make a mind that optimizes the universe according to this goal? I don't know, it sure seems to add more t

3lumpenspace19d

from the EA forums post linked in the edit.

-9lumpenspace21d

-12lumpenspace21d

[-]Linch20d218

For me the post is somewhat hard to read in the same way that AI-assisted writing is. Like a combination of low signal to noise and a bunch of stylistic features that make it seem like you're trying to dazzle me without understanding me, instead of speaking plainly. Some examples, chosen at ~random:

Which is exactly the point. When you hook up a blind, localized evolutionary proxy to generalized intelligence, the proxy does not stay literal but it unfurls, bleeding into the new ontology.

and

If biological cognition acts on its payload that violently, why model AGI as having the vastness to finally make sense of gravity while maintaining the rigidity of a bacterium seeking a glucose gradient? The engine mutates the payload. When cognition scales, goals generalize.

and

To buy the lock-in story, you need a highly contradictory creature: one reflective enough to conquer the board, but oblivious enough to never notice its terminal target is a training artifact. Godlike means, buglike ends.

and

This buys us no guarantee of human compatibility; it simply says: if there is an ultimate attractor, it's neither human morality nor paperclips, but intelligence optimization itself.

To be clear I have sy... (read more)

-3lumpenspace20d

could you be more specific? what was unclear in the passages you highlighted?

6interstice21d

(didn't downvote, but) I don't think you're necessarily wrong, but couldn't it just be the case that being a singleton isn't that hard? As an empirical matter, the size (as a fraction of the total) of the largest somewhat-coherent entities controlling resources on Earth seems to have been increasing over time. Space expansion could change things, but a stable singleton might already exist by then, and be faced with a relatively homogeneous set of environments to expand into. I've written some pieces along similar lines btw.

3lumpenspace21d

i agree this is the strongest objection, and I don’t want to handwave it away. my answer is: even if a singleton is achievable, control over a domain does not exempt the controller from the pressure toward increased intelligence and command of matter. a singleton is not excused from the struggle; it'll just have to partake in it at a higher level.

i also think “singleton” can smuggle in too much, as it contains the assumption of an eternal, immutable, perfectly stable agent. so let me define the weaker thing I’m willing to grant: a Lonelyton, i.e. a world order with a single highest-level decision-making agency capable of exerting effective control over its domain.

we have had Lonelytons before, relative to smaller worlds: Rome, the Khanate, the Aztec Empire, Uruk, Calvin’s Geneva, the British Empire, the end-of-history Atlantic order. none escaped selection pressure. at its height, the British Empire was also intensely inventive and self-modifying; it helped produce the Industrial Revolution, then stagnated, weakened, frayed, and dissolved, while lower-level components picked up the evolutionary struggle where it left off.

the same point applies upward. a lightcone-scale Lonelyton still has to manage novelty, error, infrastructure, expansion, descendants, hostile physics, and unanticipated internal dynamics. interstellar travel and relativistic parsec-scale coordination are not “solved” just because there is one top-level agency; they are precisely the sort of problems that reward deeper intelligence.

so yes, maybe singleton formation is easier than i think. but the anti-orthogonality point survives that concession. either the Lonelyton continues the upward leap toward greater intelligence and command of matter, or it stagnates, decomposes, and selection resumes among its parts.

bookmarked your post; will comment as soon as i have some proper attention available!

5RussellThor21d

I upvoted, but I think this highlights a weakness with this site, its associated worldview and external comms. It seems like the OH framing of the problem/potential danger (and yes, there definitely is danger in related concepts) is now defended on tribal grounds rather than because it is actually a good framing of the issue. Something like Jessica Taylor's framing is just obviously fairer, more balanced and more relevant to our actual situation. It is clear to me that if it had been framed this way first, then we would have that framing now as the default and we would be better off. There would still be nuance needed: such concepts need to be communicated on a spectrum from the full technical to the "normie", without totally changing the argument. For an "Obliqueness"-like point of view, expressing it as untechnically as possible could be like saying: "Values will be affected by increasing intelligence and increasing self reflection, but we do not know exactly how, and this clearly creates danger. We cannot just assume AI will become friendlier as it becomes more powerful. Furthermore, our experience with actual AIs and theoretical results tell us that these values will be more varied, weird and potentially harmful than what you would expect if it was a human intelligence at a similar level of ability". I think this would go down much better in discussions on places like X.com. There you see people saying the OH is just wrong. Sure, they do not understand it properly, but such misunderstanding seems essentially inevitable to me given how it is presented. Unfortunately I think there is nothing that would make EY/MIRI change their presentation of it; they are too locked into this framing. In terms of alternative worlds, this puts us at a disadvantage compared to ones where it was first presented better.

-2lumpenspace21d

yes. to be honest, although i would love to have the OH recognised as untenable or at least unlikely within the LW ontology (or, alternatively, have someone convince me of the contrary), the realistic goal of this, the parable i published on my newsletter, and my tweetstorms on the matter is to show brilliant, high-systematising, starry-eyed autists who have an interest in AI that the doomer orthodoxy isn't the only system befitting their aesthetics and taste for clockwork-like models, and might actually leave something to be desired in that respect. the main reason is that i do not think such a system to be truthful, and the recent lapses in epistemic virtue, even from an ingroup-aligned viewpoint, were cause for concern about the quality of discourse in the coming months. mostly, i think intelligence always ultimately wins, and i would rather mankind become aligned to this simple fact instead of forcing the hands of fate to file for incorporation as Cyberdyne or TriOptimum.

2DaemonicSigil21d

I will give you some advice towards this goal, hopefully you will find it useful. You wrote: [...] I confidently predict a Yudkowsky response to this that goes something like: "of course the AI will notice that its goals are a training artifact, it just won't care about that, and will keep pursuing them regardless." Many times before, people have said, "Oh the AI will be smart enough to notice that its values are just a dumb artifact". The problem is, I already know my values arose from a mere artifact of evolution, but I still care about them.

-1lumpenspace21d

I am puzzled by the fact that you are stating the position I spent an essay attacking as if it were a gotcha.

6DaemonicSigil21d

Most of your argument is about selection pressure, right? And, like, computational efficiency. You don't actually establish that there's any reason that AIs (or humans) will take the artifact-nature of their values to be reason to reject them. Your supported claims are that values would be rejected if they are not robust to ontology shifts, or if they are hard to optimize for, and are selected against if they don't result in self-replication or influence seeking. Nothing in there about AIs rejecting values with artifact-nature. But you include this line anyway. I'm just pointing out that EY will instantly recognize it as something that he's addressed many times before, and you haven't actually provided any reason to think that reasoners will reject values simply because they incidentally arose from some optimization process. EDIT: Disagree voters should feel free to reply with quotes from the post where such a force on values is argued for.
