Ah but you don't even need to name selection pressures to make interesting progress. As long as you know some kinds of characteristics powerful AI agents might have--eg goals, self models--you can start to ask: what goals/self models will the AGIs which survive the most have?
and you can make progress on both, agnostic of environment. then, once you enumerate possible goals/self models, you can start to think about which selection pressures might influence those characteristics in good directions and which levers we can pull today to shape those pressures.
"So we would need to find a hypothesis where we accidentally already made all the necessary experiments and even described the intermediate findings (because LLMs are good at words, but probably suck at analyzing the primary data), but we somehow failed to connect the dots. Not impossible, but requires a lot of luck."
Exactly: untested hypotheses that LLMs already have enough data to test. I wonder how rare such hypotheses are.
It strikes me as wild that LLMs have ingested enormous swathes of the internet, across thousands of domains, and haven't yet produce...
Re poetry--I actually wonder if thousands of random phrase combinations might be enough for a tactful amalgamator to weave a good poem.
And LLMs do better than random. They aren't trained well on scientific creativity (interesting hypothesis formation), but they do learn some notion of "good idea," and reasoners tend to do even better at generating smart novelty when prompted well.
i'm not sure. the question would be, if an LLM comes up with 1000 approaches to an interesting math conjecture, how would we find out if one approach were promising?
one out of the 1000 random ideas would need to be promising, but as importantly, an LLM would need to be able to surface the promising one
which seems the more likely bottleneck?
even if you're mediocre at coming up with ideas, as long as it's cheap and you can come up with thousands, one of them is bound to be promising. The question of whether you as an LLM can find a good idea is not whether most of your ideas are good, but whether you can find one good idea in a stack of 1000
if an LLM could evaluate whether an idea were good or not in new domains, then we could have LLMs generate millions of random policy ideas in response to climate change, pandemic control, AI safety etc, then deliver the best few to our inbox every morning.
seems to me that the bottleneck then is LLMs' judgment of good ideas in new domains. is that right? ability to generate high-quality ideas consistently wouldn't matter, cuz it's so cheap to generate ideas now.
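To make that concrete, here's a toy sketch of the generate-then-judge loop I have in mind. `call_model` is just a placeholder for whatever LLM API you'd use (nothing here is a real client); the whole pipeline stands or falls on the judging step.

```python
# Hypothetical sketch: generate many cheap ideas, then lean on the model's
# own judgment to surface the few promising ones. `call_model` is a placeholder.
def call_model(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM call of choice here")

def best_ideas(problem: str, n: int = 1000, k: int = 3) -> list[str]:
    # Cheap step: generate lots of candidate ideas.
    ideas = [call_model(f"Propose one novel approach to: {problem}") for _ in range(n)]

    # Hypothesized bottleneck: judging which few candidates are promising.
    scored = []
    for idea in ideas:
        reply = call_model(
            f"Rate from 1 to 10 how promising this approach to '{problem}' is. "
            f"Answer with a number only.\n\n{idea}"
        )
        try:
            scored.append((float(reply.strip()), idea))
        except ValueError:
            continue  # skip judgments that aren't parseable numbers

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idea for _, idea in scored[:k]]
```

If the judging step works in new domains, everything else is just compute.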
I think GTFO is plausibly a good strategy.
But there's also a chance that future social networks will be much healthier and more fulfilling, in ways that simply weren't possible with past technology. An upward trajectory.
The intuition there is that current ads are relatively inefficient at capturing value, and that current content algorithms optimize for short-term value creation/addiction rather than offering long-term value. That's the status quo, which, relative to what may be coming--ie relative to AI-powered semantic routing which could connect you to the ...
Two opinions on superintelligence's development:
Capability. Superintelligence can now be developed outside of a big AI lab—via a self-improving codebase which makes thousands of recursive LLM calls.
Safety. (a) Superintelligence will become "self-interested" for some definition of self. (b) Humanity fares well to the extent that the superintelligence's sense of self includes us.
I'm saying the issue of whether ASI gets out of control is not fundamental to the discussion of whether ASI poses an xrisk or how to avert it.
I only half agree.
The control question is indeed not fundamental to discussion of whether ASI poses x-risk. But I believe the control question is fundamental to discussion of how to avert x-risk.
Humanity's optimal strategy for averting x-risk depends on whether we can ultimately control ASI. If control is possible, then the best strategy for averting x-risk is coordination of ASI development—across companies and nati...
A simple poll system where you can sort the options/issues by their personal relevance... might unlock direct democracy at scale. Relevance could mean: semantic similarity to your past lesswrong writing.
Such a sort option would (1) surface more relevant issues to each person and so (2) increase community participation, and possibly (3) scale indefinitely. You could imagine a million people collectively prioritizing the issues that matter to them with such a system.
Would be simple to build.
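For concreteness, a minimal sketch of the relevance sort, using the sentence-transformers package as a stand-in embedding model (my assumption; any embedder would do):

```python
# Rank poll issues by semantic similarity to a user's past writing.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def sort_issues_by_relevance(issues: list[str], past_writing: list[str]) -> list[str]:
    # Represent the voter by the mean embedding of their past posts/comments.
    user_vec = model.encode(past_writing).mean(axis=0)
    issue_vecs = model.encode(issues)

    # Cosine similarity between each issue and the voter vector.
    sims = issue_vecs @ user_vec / (
        np.linalg.norm(issue_vecs, axis=1) * np.linalg.norm(user_vec)
    )
    return [issue for _, issue in sorted(zip(sims, issues), reverse=True)]
```

The rest is ordinary poll plumbing.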
the AGIs which survive the most will model and prioritize their own survival
has anyone seen a good way to comprehensively map the possibility space for AI safety research?
in particular: a map from predictive conditions (eg OpenAI develops superintelligence first, no armistice is reached with China, etc) to strategies for ensuring human welfare in those conditions.
most good safety papers I read map one set of conditions to one or a few strategies. the map would juxtapose all these conditions so that we can evaluate/bet on their likelihoods and come up with strategies based on a full view of SOTA safety research.
for format, i'm imagining either a visual concept map or at least some kind of hierarchical collaborative outlining tool (eg Roam Research)
made a simpler version of Roam Research called Upper Case Notes: uppercasenotes.org. Instead of [[double brackets]] to demarcate concepts, you simply use Capital Letters. Simpler to learn for someone who doesn't want to use special grammar, but does require you to type differently.
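As a rough illustration (not the app's actual implementation), detecting concepts by Capital Letters could be as simple as a regex over runs of Capitalized Words:

```python
# Naive illustration of Capital-Letter concept detection (a guess at the
# approach, not Upper Case Notes' real code).
import re

def extract_concepts(note: str) -> list[str]:
    # Runs of consecutive Capitalized Words count as one concept.
    # (Naive: sentence-initial words get picked up too.)
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", note)

print(extract_concepts("Met with Jane to discuss Direct Democracy and the poll idea."))
# -> ['Met', 'Jane', 'Direct Democracy']
```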
I think you do a good job at expanding the possible set of self conceptions that we could reasonably expect in AIs.
Your discussion of these possible selves inspires me to go farther than you do in your recommendations for AI safety researchers. Stress testing safety ideas across multiple different possible "selves" is good. But, if an AI's individuality/self determines to a great degree its behavior and growth, then safety research as a whole might be better conceived as an effort to influence AI's self conceptions rather than control their resulting behavior....
"If the platform is created, how do you get people to use it the way you would like them to? People have views on far more than the things someone else thinks should concern them."
If people are weighted equally, ie if the influence of each person's written ballot is equal and capped, then each person is incentivized to emphasize the things which actually affect them.
Anyone could express views on things which don't affect them; it'd just be unwise. When you're voting between candidates (as in the status quo), those candidates attempt to educate and en...
the article proposes a governance system that synthesizes individuals' freeform preferences into collective legislative action.
internet platforms allow freeform expression, of course, but don't do that synthesis.
made a silly collective conversation app where each post is a hexagon tessellated with all the other posts: Hexagon
Made a simplistic app that displays collective priorities based on individuals' priorities linked here.
Hypotheses for conditions under which the self-other boundary of a survival-oriented agent (human or ai) blurs most, ie conditions where blurring is selected for:
"Democracy is the theory that the common people know what they want and deserve to get it good and hard."
Yes, I think this is too idealistic. Ideal democracy (for me) is something more like "the theory that the common people know what they feel frustrated with (and we want to honor that above everything!) but mostly don't know the best collective means of resolving that frustration."
For example, people can have a legitimate complaint about healthcare being inaccessible for them, and yet the suggestion many would propose will be something like "government should spend more money on homeopathy and spiritual healing, and should definitely stop vaccination and other evil unnatural things".
Yes. This brings to mind a general piece of wisdom for startups collecting product feedback: that feedback expressing pain points/emotion is valuable, whereas feedback expressing implementation/solutions is not.
The ideal direct-democratic system, I think,...
I think beliefs, habits, and memories are pretty closely tied to the semantics of the word "identity".
In America/Western culture, I totally agree.
I'm curious whether alien/LLM-based minds would adopt these semantics too.
There are plenty of beings striving to survive. So preserving that isn't a big priority outside of preserving the big three.
I wonder under what conditions one would make the opposite statement—that there's not enough striving.
For example, I wonder if being omniscient would affect one's view of whether there's already enough striving or not.
My motivation w/ the question is more to predict self-conceptions than prescribe them.
I agree that "one's criteria on what to be up to are... rich and developing." More fun that way.
I made it! One day when I was bored on the train. No data is saved rn other than leaderboard scores.
"Therefore, transforming such an unconscious behavior into a conscious one should make it much easier to stop in the moment"
At this point I thought you were going to proceed to explain that the key was to start to bite your nails consciously :)
Separately, I like your approach, thx for writing.
important work.
what's more, relative to more controlling alignment techniques which disadvantage the AI from an evolutionary perspective (eg by distracting it from focusing on its survival), I think there's a chance Self-Other boundary blurring is evolutionarily selected for in ASI. intuition pump for that hypothesis here:
https://www.lesswrong.com/posts/3SDjtu6aAsHt4iZsR/davey-morse-s-shortform?commentId=wfmifTLEanNhhih4x
awesome thx
if you’re an agent (AI or human) who wants to survive for 1000 years, what’s the “self” which you want to survive? what are the constants which you want to sustain?
take your human self for example. does it make sense to define yourself as…
your desire for a government that's able to make deals in peace, away from the clamor of overactive public sentiment... I respect it as a practical stance relative to the status quo. But when considering possible futures, I'd wager it's far from what I think we'd both consider ideal.
the ideal government for me would represent the collective will of the people. insofar as that's the goal, a system which does a more nuanced job at synthesizing the collective will would be preferable.
direct democracy at scale enabled by LLMs, as i envision it and will attempt...
i think the prerequisite for identifying with other life is sensing other life. more precisely, the extent to which you sense other life correlates with the chance that you do identify with other life.
your sight scenario is tricky, I think, because it's possible that the sum/extent of a person's net sensing (ie how much they sense) isn't affected by the number of senses they have. Anecdotally I've heard that when someone goes blind their other senses get more powerful. In other words, their "sensing capacity" (vague term I know, but still important I think)...
the machine/physical superintelligence that survives the most is likely to ruthlessly compete with all other life (narrower self concept > more physically robust)
the networked/distributed superintelligence that survives the most is likely to lovingly identify with all other life (broader self concept > more digitally robust)
how do these lenses interact?
Are there any selection theorems around self-modeling?
Ie theorems which suggest whether/when an agent will model a self (as distinct from its environment), and if so, what characteristics it will include in its self definition?
By "self," I mean a section of an agent's world model (assuming it has one) that the agent is attempting to preserve or grow.
...The key idea that leads to empathy is the fact that, if the world model performs a sensible compression of its input data and learns a useful set of natural abstractions, then it is quite likely that the latent codes for the agent performing some action or experiencing some state, and another, similar, agent performing the same action or experiencing the same state, will end up close together in the latent space. If the agent's world model contains natural abstractions for the action, which are invariant to who is performing it, then a large amount of the
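A toy way to poke at that claim, using an off-the-shelf sentence embedder as a stand-in for the world model's latent space (my substitution, not the quoted post's):

```python
# If abstractions are who-invariant, self/other descriptions of the same
# action should embed close together. sentence-transformers is my stand-in.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

self_code, other_code, unrelated = model.encode([
    "I am carrying a heavy box up the stairs.",
    "She is carrying a heavy box up the stairs.",
    "The stock market closed slightly higher on Tuesday.",
])

# Expectation under the quoted claim: the first similarity is much larger.
print(cosine(self_code, other_code), cosine(self_code, unrelated))
```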
Figuring out how to make sense of both predictive lenses together—human design and selection pressure—would be wise.
So I generally agree, but would maybe go farther on your human design point. It seems to me that "do[ing] the right things" (which enable AGI trajectories to be completely up to us) is so completely unrealistic (eg halting all intra- and international AGI competition) that it'd be better for us to focus our attention on futures where human design and selection pressures interact.
"it’s like we are trying to build an alliance with another almost interplanetary ally, and we are in a competition with China to make that alliance. But we don’t understand the ally, and we don’t understand what it will mean to let that ally into all of our systems and all of our planning."
- @ezraklein about the race to AGI
LessWrong's been a breath of fresh air for me. I came to concern over AI x-risk from my own reflections when founding a venture-backed public benefit company called Plexus, which made an experimental AI-powered social network that connects people through the content of their thoughts rather than the people they know. Among my peers, other AI founders in NYC, I felt somewhat alone in my concern about AI x-risk. All of us were financially motivated not to dwell on AI's ugly possibilities, and so most didn't.
Since exiting venture, I've taken a few months to reset (c...
I see lots of LW posts about ai alignment that disagree along one fundamental axis.
About half assume that humans design and current paradigms will determine the course of AGI development. That whether it goes well is fully and completely up to us.
And then, about half assume that the kinds of AGI which survive will be the kinds which evolve to survive. Instrumental convergence and darwinism generally point here.
Could be worth someone doing a meta-post, grouping big popular alignment posts they've seen by which assumption they make, then briefly exploring condi...
Makes sense for current architectures. The question's only interesting, I think, if we're thinking ahead to when architectures evolve.
thanks will take a look
Ah ok. I was responding to your post's initial prompt: "I still don't really intuitively grok why I should expect agents to become better approximated by "single-minded pursuit of a top-level goal" as they gain more capabilities." (The reason to expect this is that "single-minded pursuit of a top-level goal," if that goal is survival, could afford evolutionary advantages.)
But I agree entirely that it'd be valuable for us to invest in creating homeostatic agents. Further, I think calling into doubt western/capitalist/individualist notions like "single-minded pursuit of a top-level goal" is generally important if we're to have a chance of building AI systems which are sensitive and don't compete with people.
And if we don't think all AIs' goals will be locked, then we might get better predictions by assuming the proliferation of all sorts of diverse AGIs and asking, Which ones will ultimately survive the most?, rather than assuming that human design/intention will win out and asking, Which AGIs will we be most likely to design? I do think the latter question is important, but only up until the point when AGIs are recursively self-modifying.
In principle, the idea of permanently locking an AI's goals makes sense—perhaps through an advanced alignment technique or by freezing an LLM in place and not developing further or larger models. But two factors make me skeptical that most AIs' goals will stay fixed in practice:
i think the logic goes: if we assume many diverse autonomous agents are created, which will survive the most? And insofar as agents have goals, what will be the goals of the agents which survive the most?
i can't imagine a world where the agents that survive the most aren't ultimately those which are fundamentally trying to.
insofar as human developers are united and maintain power over which ai agents exist, maybe we can hope for homeostatic agents to be the primary kind. but insofar as human developers are competitive with each other and ai agents gain increasing power (eg for self modification), i think we have to defer to evolutionary logic in making predictions
wildly parallel thinking and prototyping. i'd hop on a call.