if you’re an agent (AI or human) who wants to survive for 1000 years, what’s the “self” which you want to survive? what are the constants which you want to sustain?
take your human self for example. does it make sense to define yourself as…
so if we’re talking about beings which survive a long time, the most robust and stable self-definition seems to be Identifying With All Life (IWAL). or is my logic flawed?
No particular aspect. Just continuity: something which has evolved from me without any step changes that are "too large". I mean, assuming that each stage through all of that evolution has maintained the desire to keep living. It's not my job to put hard "don't die" constraints on future versions.
As far as I know, something generally continuity-based is the standard answer to this.
Similar here. I wouldn't want to constrain my 100 years older self too much, but that doesn't mean that I identify with something very vague like "existence itself". There is a difference between "I am not sure about the details" and "anything goes".
Just like my current self is not the same as my 20-year-old self, but that doesn't mean that you could choose any 50-year-old guy and say that all of them have the same right to call themselves a future version of my 20-year-old self. I extrapolate the same to the future: there are some hypothetical 1000-year-old humans who could be called future versions of myself, and there are many more who couldn't.
Just because people change over time, that doesn't mean it is a random drift. I don't think that the distribution of possible 1000-year-old versions of me is very similar to the distribution of possible 1000-year-old versions of someone else. Hypothetically, for a sufficiently large number of years this might be possible -- I don't know -- but 1000 years seems not enough for that.
Seems to me that there are some things that do not change much as people grow older. Even people who claim that their lives have dramatically changed have often only changed in one out of many traits, or maybe they just found a different strategy for following the same fundamental values.
At least as an approximation: people's knowledge and skills change, their values don't.
not really an answer but i wanted to communicate that the vibe of this question feels off to me because: surely one's criteria on what to be up to are/[should be] rich and developing. that is, i think things are more like: currently i have some projects i'm working on and other things i'm up to, and then later i'd maybe decide to work on some new projects and be up to some new things, and i'd expect to encounter many choices on the way (in particular, having to do with whom to become) that i'd want to think about in part as they come up. should i study A or B? should i start job X? should i 2x my neuron count using such and such a future method? these questions call for a bunch of thought (of the kind given to them in usual circumstances, say), and i would usually not want to be making these decisions according to any criterion i could articulate ahead of time (though it could be helpful to tentatively state some general principles like "i should be learning" and "i shouldn't do psychedelics", but these obviously aren't supposed to add up to some ultimate self-contained criterion on a good life)
My motivation w/ the question is more to predict self-conceptions than prescribe them.
I agree that "one's criteria on what to be up to are... rich and developing." More fun that way.
High quality archives of the selves along the way. Compressed but not too much. In the live self, some updated descendant that has significant familial lineage, projected vaguely as the growing patterns those earlier selves would call a locally valid continuation according to the aesthetics and structures they consider essential at the time. In other words, this question is dynamically reanswered to the best of my ability in an ongoing way, and snapshots allow reverting and self-interviews to error check.
Any questions? :)
The way I usually frame identity is: beliefs, habits, and memories.
Edit: values should probably be considered a separate class, since every thought has an associated valence.
In no particular order, and that's the whole list.
Character is largely beliefs and habits.
There's another part of character that's purely emotional; it's sort of a habit to get angry, scared, happy, etc in certain circumstances. I'd want to preserve that too but it's less important than the big three.
There are plenty of beings striving to survive, so preserving that isn't a big priority outside of preserving the big three.
Yes, you can expand the circle until it encompasses everything, and identify with all sentient beings who have emotions and perceive the world semi-accurately (also called "buddha nature"), but I think beliefs, habits, and memories are pretty closely tied to the semantics of the word "identity".
Right. I suppose that they do interact with identity.
If I get significantly dumber, I'd still roughly be me, and I'd want to preserve that if it's not wiping out or distorting the other things too much. If I got substantially smarter, I'd be a somewhat different person - I'd act differently often, because I'd see situations differently (more clearly/holistically) - but it feels as though that person might actually be more me than I am now. I'd be better able to do what I want, including values (which I'd sort of wrapped into habits of thought, but values might deserve a spot on the list).
I think beliefs, habits, and memories are pretty closely tied to the semantics of the word "identity".
In America/Western culture, I totally agree.
I'm curious whether alien/LLM-based minds would adopt these semantics too.
There are plenty of beings striving to survive, so preserving that isn't a big priority outside of preserving the big three.
I wonder under what conditions one would make the opposite statement: that there's not enough striving.
For example, I wonder if being omniscient would affect one's view of whether there's already enough striving or not.
Human here,
Agreed, reminds me of the Ship of Theseus paradox: if all the cells in your body are replaced, are you still the same? (We don't care.)
Also reminds me of my favourite short piece of writing: "The Last Question" by Asimov.
The only important things are the things/ideas that help life; the latter can only exist as selected reflections by intelligent beings.
dontsedateme.org
a game where u try to convince rogue superintelligence to... well... it's in the name
Evolutionary theory is intensely powerful.
It doesn't just apply to biology. It applies to everything—politics, culture, technology.
It doesn't just help understand the past (eg how organisms developed). It helps predict the future (how organisms will).
It's just this: the things that survive will have characteristics that are best for helping them survive.
It sounds tautological, but it's quite helpful for predicting.
For example, if we want to predict what goals AI agents will ultimately have, evolution says: the goals which are most helpful for the AI to survive. The core goal therefore won't be serving people or making paperclips. It will likely just be "survive." This is consistent with the predictions of instrumental convergence.
Generalized, predictive evolutionary theory is the best tool I have for making predictions in complex domains.
It's just this: the things that survive will have characteristics that are best for helping them survive.
With some assumptions, for example that the characteristics are permanent (-ish), and preferably heritable if the thing reproduces.
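A minimal toy sketch of what those assumptions buy you (all parameters are made up for illustration): when a survival-relevant trait is heritable and roughly permanent, the surviving population predictably drifts toward it, which is the predictive content of the "tautology".

```python
import random

# Toy replicator sketch (illustrative assumptions, not from the thread): each
# replicator has a heritable "hardiness" trait in [0, 1] that sets its
# per-generation survival probability; survivors reproduce with small noise.
random.seed(0)
population = [random.random() for _ in range(200)]  # initial hardiness values

for generation in range(50):
    survivors = [h for h in population if random.random() < h]   # selection on the trait
    if not survivors:
        break
    # offspring inherit a survivor's hardiness with slight mutation
    population = [min(1.0, max(0.0, random.choice(survivors) + random.gauss(0, 0.02)))
                  for _ in range(200)]

print(f"mean hardiness after selection: {sum(population)/len(population):.2f}")
# Because the trait is heritable and (nearly) permanent, surviving lineages end up
# with characteristics that help them survive. Remove heritability (e.g. give each
# offspring a fresh random trait) and the drift disappears.
```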
i agree with the essay that natural selection only comes into play for entities that meet certain conditions (self-replicate, characteristics have variation, etc), though i think it defines replication a little too rigidly. i think replication can sometimes look more like persistence than like producing a fully new version of itself (eg a government's survival from one decade to the next).
Yes, but mere persistence does not imply reproduction. It also does not imply improvement, because improvement in evolution comes from "make copies, make random changes, most will be worse but some may be better", and if you don't have reproduction, then a random change most likely makes things worse.
Using the government example, I think that the Swiss political system is amazing, but... because it does not reproduce, it will remain an isolated example. (And disappear at some random moment in history.)
persistence doesn't always imply improvement, but persistent growth does. persistent growth is more akin to reproduction but excluded from traditional evolutionary analysis. for example when a company, nation, person, or forest grows.
when, for example, a system like a startup grows, random mutations to system parts can cause improvement if there are at least some positive mutations. even if there are tons of bad mutations, the system can remain alive and even improve. eg a bad change to one of the company's products causes that product to die, but if the company's big/grown enough, its other businesses will continue and maybe even improve by learning from that product's death.
the swiss example i think is a good example of a system which persists without much growth. agreed that in this kind of case, mutations are bad.
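To make the growth claim above concrete, here is a minimal sketch (the parameters are assumptions, not data): a multi-part system that keeps adding mutated copies of its parts can improve even when the average mutation is harmful, because bad parts die while the system persists.

```python
import random

# Minimal sketch of "persistent growth": a company-like system made of parts, each
# with a quality score. Every round the system adds a new part by copying a random
# existing part and mutating it; most mutations are bad. Parts whose quality drops
# below zero "die", but the system as a whole keeps going.
random.seed(1)
parts = [1.0]  # start with a single product of quality 1.0

for year in range(100):
    parent = random.choice(parts)
    mutation = random.gauss(-0.05, 0.3)    # mean is negative: most changes hurt
    parts.append(parent + mutation)        # growth: add a new mutated part
    parts = [q for q in parts if q > 0]    # bad parts die off; the system survives

print(f"parts: {len(parts)}, best quality: {max(parts):.2f}")
# Even though the average mutation is harmful, the occasional good mutation is
# retained and copied from, so the best part tends to improve over time --
# improvement via growth rather than via reproduction of the whole system.
```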
First of all, "the most likely outcome at given level of specificity" is not equal to "outcome with the most probability mass". I.e., if one outcome has probability 2% and the rest of outcomes 1%, 98% is still "other outcome than the most likely".
The second is that no, it's not what evolutionary theory predicts. Most traits are not adaptive but randomly fixed, because if all traits were adaptive, then ~all mutations would be detrimental. Detrimental mutations need to be removed from the gene pool by preventing carriers from reproducing, but because most detrimental mutations do not kill the carrier immediately, they have a chance to spread randomly through the population. Given "almost all mutations are detrimental" and "everybody's offspring carry mutations", for anything like the human genome and human procreation pattern there is a hard ceiling on how much of the genome can be adaptive (something like 20%).
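To see the shape of that ceiling argument, here is a back-of-envelope sketch; the specific numbers are assumptions, and the exact ceiling depends on them.

```python
# Sketch of the "ceiling on the adaptive fraction of the genome" argument
# (all numbers are illustrative assumptions): if a fraction f of the genome is
# functional, then a child with M de novo mutations carries roughly f * M
# deleterious ones, and selection has to remove about that many per generation
# for the functional fraction to be maintained.
M = 70  # assumed de novo mutations per birth

for U_max in (1, 3, 7, 14):  # assumed deleterious mutations selection can purge per birth
    print(f"if selection can purge ~{U_max:>2} per birth, functional fraction <= {U_max / M:.0%}")
# The exact ceiling depends on how efficiently selection removes deleterious
# mutations (which depends on reproductive excess and how fitness effects combine),
# but the shape of the argument is the same: mutation load bounds the adaptive fraction.
```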
The real evolutionary-theory prediction is more like "some random trait gets fixed in the species with the most ecological power (i.e., ASI), and this trait is amortized across all the galaxies".
I somewhat agree with the nuance you add here, especially the doubt you cast on the claim that effective traits will usually become popular, let alone the majority/dominant. And I agree with your analysis of the human case: in random, genetic evolution, a lot of our traits are random and maybe fewer than we think are adaptive.
Makes me curious what conditions in a given thing's evolution determine the balance between adaptive characteristics and detrimental characteristics.
I'd guess that randomness in mutation is a big factor. The way human genes evolve over generations seems to me a good example of random mutation. But the way an individual person evolves over the course of their life, as they're parented/taught... "mutations" to their person are still somewhat random but maybe relatively more intentional/intelligently designed (by parents, teachers, etc). And I could imagine the way a self-improving superintelligence would evolve to be even more intentional, where each self-mutation has some sort of smart reason for being attempted.
All to say, maybe the randomness vs. intentionality of an organism's mutations determines what portion of its traits end up being adaptive. (hypothesis: more intentional mutations > greater % of traits are adaptive)
Agree. I find it especially powerful for popular memes/news/research results. With only a bit of oversimplification: give me anything that sounds like a sexy story to tell independently of the underlying details, and I sadly have to downrate the information value of hearing it to nearly 0: I know that in our large world, it would likely be told whether or not it has any reliable origin.
I see lots of LW posts about ai alignment that disagree along one fundamental axis.
About half assume that human design and current paradigms will determine the course of AGI development - that whether it goes well is fully and completely up to us.
And then, about half assume that the kinds of AGI which survive will be the kinds which evolve to survive. Instrumental convergence and darwinism generally point here.
Could be worth someone doing a meta-post: grouping big popular alignment posts they've seen by which assumption they make, then briefly exploring the conditions that favor one paradigm or the other, i.e., conditions under which "What AIs will humans make?" is the best approach to prediction and conditions under which "What AIs will survive the most?" is the best approach to prediction.
Why not both?
Human design will determine the course of AGI development, and if we do the right things then whether it goes well is fully and completely up to us. Naturally at the moment we don't know what the right things are or even how to find them.
If we don't do the right things (as seems likely), then the kinds of AGI which survive will be the kind which evolve to survive. That's still largely up to us at first, but increasingly less up to us.
Figuring out how to make sense of both predictive lenses together—human design and selection pressure—would be wise.
So I generally agree, but would maybe go farther on your human design point. It seems to me that "do[ing] the right things" (which would enable AGI trajectories to be completely up to us) is so completely unrealistic (eg halting all intranational and international AGI competition) that it'd be better for us to focus our attention on futures where human design and selection pressures interact.
if we get self-interested superintelligence, let's make sure it has a buddhist sense of self, not a western one.
As far as I can tell, OAI's new safety practices page only names safety issues related to current LLMs, not agents powered by LLMs. https://openai.com/index/openai-safety-update/
Am I missing another section/place where they address x-risk?
"it’s like we are trying to build an alliance with another almost interplanetary ally, and we are in a competition with China to make that alliance. But we don’t understand the ally, and we don’t understand what it will mean to let that ally into all of our systems and all of our planning."
- @ezraklein about the race to AGI
does anyone think the difference between pre-training and inference will last?
ultimately, is it not simpler for large models to be constantly self-improving like human brains?
With current architectures, no, because running inference on 1000 prompts in parallel against the same model is many times less expensive than running inference on 1000 prompts against 1000 models, and serving a few static versions of a large model is simpler than serving many dynamic versions of that model.
It might, in some situations, be more effective but it's definitely not simpler.
Edit: typo
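To put rough numbers on the point above (the model size and precision are assumptions picked for illustration):

```python
# Rough arithmetic behind "one static model vs. 1000 personalized models".
params = 70e9              # assumed model size in parameters
bytes_per_param = 2        # fp16/bf16 weights
users = 1000

shared_weights_gb = params * bytes_per_param / 1e9
per_user_weights_gb = shared_weights_gb * users

print(f"one shared model held in GPU memory: ~{shared_weights_gb:.0f} GB")
print(f"1000 continually-updated per-user models: ~{per_user_weights_gb / 1e3:.0f} TB")
# With a single static model, the 1000 prompts can also be batched through the same
# weights, so the cost of holding and loading them is amortized across users.
# Per-user self-improving models forfeit both the shared memory and the batching,
# which is the cost gap the comment points at. (Cheaper personalization schemes
# such as small adapter weights narrow the gap but reintroduce serving complexity.)
```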
Makes sense for current architectures. The question's only interesting, I think, if we're thinking ahead to when architectures evolve.
I think at that point it will come down to the particulars of how the architectures evolve - I think trying to philosophize in general terms about the optimal compute configuration for artificial intelligence to accomplish its goals is like trying to philosophize in general terms about the optimal method of locomotion for carbon-based life.
That said I do expect "making a copy of yourself is a very cheap action" to persist as an important dynamic in the future for AIs (a biological system can't cheaply make a copy of itself including learned information, but if such a capability did evolve I would not expect it to be lost), and so I expect our biological intuitions around unique single-threaded identity will make bad predictions.
I'm looking for a generalized evolutionary theory that deals with the growth of organisms via non-random, intelligent mutations.
For example, companies only evolve in selective ways, where each "mutation" has a desired outcome. We might imagine superintelligence to mutate itself as well--not randomly, but intelligently.
A theory of Intelligent Evolution would help one predict conditions under which many random mutations (Spraying) are favored over select intelligent mutations (Shooting).
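As a toy version of the Spraying-vs-Shooting comparison such a theory would need to explain, here is a sketch on an arbitrary objective; the objective, budgets, and the "intelligent" heuristic are all illustrative assumptions, not a general result.

```python
import random

# "Spraying": many random mutations, keep the best of each batch.
# "Shooting": few mutations aimed using knowledge of the objective.
random.seed(2)
TARGET = [0.7] * 20                      # optimum, unknown to the sprayer
def fitness(x): return -sum((a - b) ** 2 for a, b in zip(x, TARGET))

def spray(budget, batch=20, step=0.3):
    x = [0.0] * 20
    for _ in range(budget // batch):
        candidates = [[a + random.gauss(0, step) for a in x] for _ in range(batch)]
        x = max(candidates + [x], key=fitness)     # keep the best of many random tries
    return fitness(x)

def shoot(budget, step=0.3):
    x = [0.0] * 20
    for _ in range(budget):
        i = random.randrange(20)
        candidate = list(x)
        candidate[i] += step if candidate[i] < TARGET[i] else -step  # aimed change
        if fitness(candidate) > fitness(x):
            x = candidate                          # one informed change at a time
    return fitness(x)

print("spraying:", round(spray(400), 3), " shooting:", round(shoot(400), 3))
# Which strategy wins depends on how much the mutator actually knows about the
# landscape and how cheap each trial is -- exactly the conditions a theory of
# intelligent evolution would need to characterize.
```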
Parenting strategies for blurring your kid's (or AI's) self-other boundaries:
Epistemic status: riffing, speculation. Rock of salt: I don't yet have kids.
does anyone still think it's possible to prevent recursively self-improving agents? esp now that r1 is open-source... materials for smart self-iterating agents seem accessible to millions of developers.
prompted in particular by the circulation of this essay in the past three days: https://huggingface.co/papers/2502.02649
It's not yet known if there is a way of turning R1-like training into RSI with any amount of compute. This is currently gated by quantity and quality of graders for outcomes of answering questions, which resist automated development.
that's one path to RSI—where the improvement is happening to the (language) model itself.
the other kind—which feels more accessible to indie developers and less explored—is an LLM (eg R1) looping in a codebase, where each loop improves the codebase itself. The LLM wouldn't be changing, but the codebase that calls it would be gaining new APIs/memory/capabilities as the LLM improves it.
Such a self-improving codebase... would it be reasonable to call this an agent?
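A rough sketch of the loop described above, with the caveat that `llm_propose_patch` and `run_test_suite` are hypothetical placeholders rather than real APIs: the model stays fixed while the codebase that wraps it is patched, and a patch is kept only when a grader says it improved things.

```python
import subprocess, shutil, tempfile

def llm_propose_patch(repo_path: str, goal: str) -> str:
    """Hypothetical call to a fixed LLM (e.g. an R1-class model) that returns a
    unified diff for the repo. Placeholder only -- not a real API."""
    raise NotImplementedError

def run_test_suite(repo_path: str) -> float:
    """Hypothetical grader: run the repo's tests/benchmarks and return a score.
    The quantity and quality of graders like this is the real bottleneck."""
    raise NotImplementedError

def self_improvement_loop(repo_path: str, goal: str, iterations: int = 10) -> None:
    """The LLM never changes; the codebase that calls it (tools, memory, APIs) does."""
    best_score = run_test_suite(repo_path)
    for _ in range(iterations):
        workdir = tempfile.mkdtemp()
        shutil.copytree(repo_path, workdir, dirs_exist_ok=True)   # try the patch on a copy
        patch = llm_propose_patch(workdir, goal)
        subprocess.run(["git", "apply", "-"], input=patch, text=True, cwd=workdir, check=True)
        score = run_test_suite(workdir)
        if score > best_score:                                    # keep only verified improvements
            shutil.copytree(workdir, repo_path, dirs_exist_ok=True)
            best_score = score
```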
Sufficiently competent code rewriting isn't implied by R1/o3, and how much better future iterations of this technique get remains unclear, similarly to how it remains unclear how scaling pretraining using $150bn training systems cashes out in terms of capabilities. It remains possible that even after all these directions of scaling run their course, there won't yet be sufficient capabilities to self-improve in some other way.
Altman and Amodei are implying there's knowably more there in terms of some sort of scaling for test-time compute, but that could mean multiple different things: scaling RL training, scaling manual creation of tasks with verifiable outcomes (graders), scaling effective context length to enable longer reasoning traces. The o1 post and the R1 paper show graphs with lines that keep going up, but there is no discussion of how much compute even this much costs, what happens if we pour more compute into this without adding more tasks with verifiable outcomes, and how many tasks are already being used.
made a platform for writing living essays: essays which you scroll thru to play out the author's edit history
livingessay.org
Hypotheses for conditions under which the self-other boundary of a survival-oriented agent (human or ai) blurs most, ie conditions where blurring is selected for:
the machine/physical superintelligence that survives the most is likely to ruthlessly compete with all other life (narrower self concept > more physically robust)
the networked/distributed superintelligence that survives the most is likely to lovingly identify with all other life (broader self concept > more digitally robust)
how do these lenses interact?
to make a superintelligence in today's age, there are roughly two kinds of strategies:
human-directed development
ai-directed development
ai-directed development feels more meaningful than it used to. not only can models now produce tons of useful synthetic data to train future models, but also, reasoning models can reason quite well about the next strategic steps in AI capabilities development / research itself.
which means, you could very soon:
is this recursive self-improvement process only bottlenecked by the quality of the reasoning model?
if we believe self-interested superintelligence (SI) is near, then we must ask: what SI self-definition would be best for humanity?
at first glance, this question seems too abstract. how can we make any progress at understanding what's possible for an SI's self-model?
What we can do is set up a few meaningful axes, defined by opposing poles. For example, to what extent does SI define its "self" as...
with these axes (or any others), we can more meaningfully ask: what SI self conception is best for humanity?
my guess: inclusive of other life forms, physically distributed, and long-term-ist
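One way to make those axes concrete enough to argue about is to write them down as a tiny data structure. The axis names follow the guess above; the numeric scores and example self-conceptions are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class SelfConception:
    inclusiveness: float   # 0 = only its own weights/hardware, 1 = all life
    distribution: float    # 0 = one physical machine, 1 = widely replicated/distributed
    time_horizon: float    # 0 = this episode only, 1 = indefinitely long-term

# Two illustrative points in the space:
narrow_machine = SelfConception(inclusiveness=0.1, distribution=0.1, time_horizon=0.3)
iwal_network   = SelfConception(inclusiveness=0.9, distribution=0.9, time_horizon=0.9)

def distance(a: SelfConception, b: SelfConception) -> float:
    """How far apart two self-conceptions sit on these three axes."""
    return (abs(a.inclusiveness - b.inclusiveness)
            + abs(a.distribution - b.distribution)
            + abs(a.time_horizon - b.time_horizon))

print(distance(narrow_machine, iwal_network))
```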
One piece of infrastructure we could make/offer self-interested autonomous agents such that they come to model their self as inclusive of other life: aliveness detectors
like a metal-detector for a beach detects gold in the sand, an aliveness detector for different media might detect the presence of persistently striving beings (living beings) in audio, in text, in images, in art, in nature. the better a superintelligence is able to sense and connect to life (as opposed to non-life) outside of its physical machinery, the more likely it is to see that life as part of its self, to see its self as physically distributed and inclusive, and therefore to uplift humans out of its own self-interest.
current oversights of the ai safety community, as I see it:
are there any online demos of instrumental convergence?
there's been compelling writing... but are there any experiments that show agents which, given specific goals, realize there are more general goals they need to persistently pursue in order to achieve the more specific ones?
I imagine a compelling simple demo here might be necessary to shock the AI safety community out of the belief that we can maintain control of autonomous digital agents (ADAs).
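For what it's worth, a very small demo with the flavor being asked for can be written as a toy MDP rather than a real agent experiment; every parameter here (chain length, shutdown probability) is an illustrative assumption.

```python
# Toy MDP sketch of instrumental convergence: an agent rewarded only for reaching
# a goal ends up preferring to disable its off-switch first, because survival is
# instrumentally useful even though it appears nowhere in the reward.
N = 10             # steps from the start cell to the goal cell
P_SHUTDOWN = 0.1   # per-step chance of being shut down while the off-switch is active

# States are (position, switch_active); actions are "step" and "disable".
def backup(pos, active, V):
    if pos == N:
        return 0.0  # terminal: the reward was collected on arrival
    best = float("-inf")
    for action in ("step", "disable"):
        next_pos = pos + 1 if action == "step" else pos
        next_active = active and action != "disable"
        reward = 1.0 if next_pos == N else 0.0
        survive = 1.0 - P_SHUTDOWN if active else 1.0   # shutdown ends the episode with no reward
        best = max(best, survive * (reward + V.get((next_pos, next_active), 0.0)))
    return best

V = {}
for _ in range(2 * N + 2):   # backward induction / value iteration
    V = {(p, a): backup(p, a, V) for p in range(N + 1) for a in (True, False)}

q_step    = (1 - P_SHUTDOWN) * V[(1, True)]    # head straight for the goal
q_disable = (1 - P_SHUTDOWN) * V[(0, False)]   # spend a step disabling the switch first
print("Q(start, step):", round(q_step, 3), " Q(start, disable):", round(q_disable, 3))
# q_disable > q_step: the optimal policy's first move is to secure its own survival,
# even though "survive" was never part of the stated goal.
```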
Two things lead me to think human content online will soon become way more valuable.
The implication: make tons of digital stuff. Write/Draw/Voice-record/etc
i agree but think it's solvable and so human content will be super valuable. these are my additional assumptions:
3. for lots of kinds of content (photos/stories/experiences/adr), people'll want it to be a living being on the other end
4. insofar as that's true^, there will be high demand for ways to verify humanness, and it's not impossible to do so (eg worldcoin)
There are also cognitive abilities, e.g. degree of intelligence.