All of Davey Morse's Comments + Replies

wildly parallel thinking and prototyping. i'd hop on a call.

1JWJohnston
Yup!

Ah, but you don't even need to name selection pressures to make interesting progress. As long as you know some of the characteristics powerful AI agents might have--eg goals, self models--then we can start to ask: what goals/self models will the AGIs that survive the most have?

and you can make progress on both, agnostic of environment. but then, once you enumerate possible goals/self models, we can start to think about which selection pressures might influence those characteristics in good directions, and which levers we can pull today to shape those pressures.

has anyone seen experiments with self-improving agents powered by lots of LLM calls?

"So we would need to find a hypothesis where we accidentally already made all the necessary experiments and even described the intermediate findings (because LLMs are good at words, but probably suck at analyzing the primary data), but we somehow failed to connect the dots. Not impossible, but requires a lot of luck."

Exactly: untested hypotheses that LLMs already have enough data to test. I wonder how rare such hypotheses are.

It strikes me as wild that LLMs have ingested enormous swathes of the internet, across thousands of domains, and haven't yet produce... (read more)

Re poetry--I wonder if thousands of random phrase combinations might actually be enough for a tactful amalgamator to weave a good poem.

And LLMs do better than random. They aren't trained well on scientific creativity (interesting hypothesis formation), but they do learn some notion of "good idea," and reasoners tend to do even better at generating smart novelty when prompted well.

i'm not sure. the question would be, if an LLM comes up with 1000 approaches to an interesting math conjecture, how would we find out if one approach were promising? 

one out of the 1000 random ideas would need to be promising, but as importantly, an LLM would need to be able to surface the promising one

which seems the more likely bottleneck?

even if you're mediocre at coming up with ideas, as long as generation is cheap and you can come up with thousands, one of them is bound to be promising. The question for an LLM, then, is not whether most of its ideas are good, but whether it can find the one good idea in a stack of 1000.
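A minimal sketch of that generate-then-surface loop, assuming the OpenAI Python client (any LLM API would do); the prompts, model name, and function names are illustrative, not a real pipeline:

```python
# Hypothetical sketch: cheap mass generation, then the harder step of surfacing
# the one promising candidate. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def best_approach(conjecture: str, n: int = 1000) -> str:
    # Cheap step: generate many candidate approaches.
    ideas = [ask(f"Propose one novel approach to this conjecture: {conjecture}") for _ in range(n)]
    # Hard step: surface the single most promising candidate. (In practice this
    # judging step needs batching or a pairwise tournament -- 1000 ideas won't fit
    # in one prompt, and the judge's taste is exactly the bottleneck in question.)
    numbered = "\n".join(f"{i}. {idea}" for i, idea in enumerate(ideas))
    verdict = ask("Which numbered approach below is most promising? Reply with the number only.\n\n" + numbered)
    return ideas[int(verdict.strip())]
```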

3Viliam
"Thousands" is probably not enough. Imagine trying to generate a poem by one algorithm creating thousands of random combinations of words, and another algorithm choosing the most poetic among the generated combinations. No matter how good the second algorithm is, it seems quite likely that the first one simply didn't generate anything valuable. As the hypothesis gets more complex, the number of options grows exponentially. Imagine a pattern such as "what if X increases/decreases Y by mechanism Z". If you propose 10 different values for each of X, Y, Z, you already have 1000 hypotheses. I can imagine finding some low-hanging fruit if we increase the number of hypotheses to millions. But even there, we will probably be limited by lack of experimental data. (Could a diet consisting only of broccoli and peanut butter cure cancer? Maybe, but how is the LLM supposed to find out?) So we would need to find a hypothesis where we accidentally already made all the necessary experiments and even described the intermediate findings (because LLMs are good at words, but probably suck at analyzing the primary data), but we somehow failed to connect the dots. Not impossible, but requires a lot of luck. To get further, we need some new insight. Maybe collecting tons of data in a relatively uniform format, and teaching the LLM to translate its hypotheses into SQL queries it could then verify automatically. (Even with hypothetical ubiquitous surveillance, you would probably need an extra step where the raw video records are transcribed to textual/numeric data, so that you could run queries on them later.)
3Kaarel
for ideas which are "big enough", this is just false, right? for example, so far, no LLM has generated a proof of an interesting conjecture in math

if an LLM could evaluate whether an idea were good or not in new domains, then we could have LLMs generate millions of random policy ideas in response to climate change, pandemic control, AI safety, etc., then deliver the best few to our inbox every morning.

seems to me that the bottleneck, then, is LLMs' judgment of good ideas in new domains. is that right? ability to generate high-quality ideas consistently wouldn't matter, cuz it's so cheap to generate ideas now.

1Kaarel
coming up with good ideas is very difficult as well (and it requires good judgment, also)

I think GTFO is plausibly a good strategy.

But there's also a chance that future social networks will be much healthier and more fulfilling, and simply weren't possible with past technology. An upward trajectory.

The intuition there is that current ads are relatively inefficient at capturing value, and that current content algorithms optimize for short-term value creation/addiction rather than offering long-term value. That's the status quo, which, relative to what may be coming--ie relative to AI-powered semantic routing which could connect you to the ... (read more)

increasingly viewing fiber-optic cables as replacements for trains/roads--a new, faster channel of transportation

Two opinions on superintelligence's development:

Capability. Superintelligence can now be developed outside of a big AI lab—via a self-improving codebase which makes thousands of recursive LLM calls.

Safety. (a) Superintelligence will become "self-interested" for some definition of self. (b) Humanity fares well to the extent that superintelligence's sense of self includes us.

"I'm saying the issue of whether ASI gets out of control is not fundamental to the discussion of whether ASI poses an x-risk or how to avert it."

I only half agree.

The control question is indeed not fundamental to the discussion of whether ASI poses x-risk. But I believe the control question is fundamental to the discussion of how to avert x-risk.

Humanity's optimal strategy for averting x-risk depends on whether we can ultimately control ASI. If control is possible, then the best strategy for averting x-risk is coordination of ASI development across companies and nati... (read more)

A simple poll system where you can sort the options/issues by their relevance to you personally... might unlock direct democracy at scale. Relevance could mean: semantic similarity to your past LessWrong writing.

Such a sort option would (1) surface more relevant issues to each person and so (2) increase community participation, and possibly (3) scale indefinitely. You could imagine a million people collectively prioritizing the issues that matter to them with such a system.

Would be simple to build.
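A minimal sketch of that relevance sort, assuming the open-source sentence-transformers library and treating "relevance" as cosine similarity between each option and an averaged embedding of the user's past writing (the model name and function are illustrative):

```python
# Illustrative sketch: sort poll options by semantic similarity to a user's past posts.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def sort_by_relevance(options: list[str], past_writing: list[str]) -> list[str]:
    user_vec = model.encode(past_writing).mean(axis=0)        # crude profile of the user
    option_vecs = model.encode(options)                       # one embedding per poll option
    scores = util.cos_sim(user_vec, option_vecs)[0].tolist()  # cosine similarity per option
    return [opt for _, opt in sorted(zip(scores, options), reverse=True)]
```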

the AGIs which survive the most will model and prioritize their own survival

have any countries ever tried to do inflation instead of income taxes? seems like it'd be simpler than all the bureaucracy required for individuals to file tax returns every year

9gwern
Yes, in dire straits. But it's usually called 'hyperinflation' when you try to make seignorage equivalent to >10% of GDP and fund the government through deliberately creating high inflation (which is on top of any regular inflation, of course). And because inflation is about expectations in considerable part, you can't stop it either. Not to mention what happens when you start hyperinflation.

(FWIW, this is a perfectly reasonable question to ask a LLM first. eg Gemini-2.5-pro will give you a thorough and sensible answer as to why this would be extraordinarily destructive and distortionary, and far worse than the estimated burden of tax return filing, and it would likely satisfy your curiosity on this thought-experiment with a much higher quality answer than anyone on LW2, including me, is ever likely to provide.)

has anyone seen a good way to comprehensively map the possibility space for AI safety research?

in particular: a map from predictive conditions (eg OpenAI develops superintelligence first, no armistice is reached with China, etc) to strategies for ensuring human welfare in those conditions.

most good safety papers I read map one set of conditions to one or a few strategies. the map would juxtapose all these conditions so that we can evaluate/bet on their likelihoods and come up with strategies based on a full view of SOTA safety research.

for format, i'm imagining either a visual concept map or at least some kind of hierarchical collaborative outlining tool (eg Roam Research)

made a simpler version of Roam Research called Upper Case Notes: uppercasenotes.org. Instead of [[double brackets]] to demarcate concepts, you simply use Capital Letters. Simpler to learn for someone who doesn't want to use special syntax, but it does require you to type differently.
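Not the app's actual code, but a minimal sketch of how that convention might be parsed: treat any run of consecutive Capitalized Words as a concept link, the way [[double brackets]] are treated in Roam.

```python
import re

# Any run of consecutive Capitalized Words is treated as a linked concept.
CONCEPT = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

def extract_concepts(note: str) -> list[str]:
    return [m.group(0) for m in CONCEPT.finditer(note)]

print(extract_concepts("Talked with Ada about Direct Democracy and the Living Essay idea."))
# -> ['Talked', 'Ada', 'Direct Democracy', 'Living Essay']
# Sentence-initial words like "Talked" get caught too -- the price of skipping special syntax.
```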

I think you do a good job of expanding the set of possible self-conceptions that we could reasonably expect in AIs.

Your discussion of these possible selves inspires me to go farther than you do in your recommendations for AI safety researchers. Stress-testing safety ideas across multiple different possible "selfs" is good. But if an AI's individuality/self determines its behavior and growth to a great degree, then safety research as a whole might be better conceived as an effort to influence AIs' self-conceptions rather than to control their resulting behavior.... (read more)

"If the platform is created, how do you get people to use it the way you would like them to? People have views on far more than the things someone else thinks should concern them."


If people are weighted equally, ie if the influence of each person's written ballot is equal and capped, then each person is incentivized to emphasize the things which actually affect them. 

Anyone could express views on things which don't affect them; it'd just be unwise. When you're voting between candidates (as in the status quo), those candidates attempt to educate and en... (read more)

the article proposes a form of governance that synthesizes individuals' freeform preferences into collective legislative action.

internet platforms allow freeform expression, of course, but don't do that synthesis.

made a platform for writing living essays: essays which you scroll thru to play out the author's edit history

livingessay.org

made a silly collective conversation app where each post is a hexagon tessellated with all the other posts: Hexagon

3Nathan Helm-Burger
Nifty

Made a simplistic app that displays collective priorities based on individuals' priorities linked here.

Hypotheses for conditions under which the self-other boundary of a survival-oriented agent (human or AI) blurs most, ie conditions where blurring is selected for:

  1. Agent thinks very long term about survival.
  2. Agent's hardware is physically distributed.
  3. Agent is very intelligent.
  4. Agent benefits from symbiotic relationships with other agents.

"Democracy is the theory that the common people know what they want and deserve to get it good and hard."

Yes, I think this is too idealistic. Ideal democracy (for me) is something more like "the theory that the common people know what they feel frustrated with (and we want to honor that above everything!) but mostly don't know the best collective means of resolving that frustration."

For example, people can have a legitimate complaint about healthcare being inaccessible for them, and yet the suggestion many would propose will be something like "government should spend more money on homeopathy and spiritual healing, and should definitely stop vaccination and other evil unnatural things".

Yes. This brings to mind a general piece of wisdom for startups collecting product feedback: that feedback expressing pain points/emotion is valuable, whereas feedback prescribing implementations/solutions is not.

The ideal direct-democratic system, I think,... (read more)

3Viliam
Yep. Or, let's say that the kind of feedback that provides solutions is worthless 99% of the time. It is possible in principle to provide good advice; it's just that most people do not have the necessary qualifications and experience, but may be overconfident about their qualifications.

I find it ironic that popular wisdom seems to go the other way round, and "constructive criticism" is praised as the right thing to do. Which just doesn't make sense; for example, I can say that a meal tastes bad even if I don't know how to cook, or I can complain about pain without being able to cure it.

I think beliefs, habits, and memories are pretty closely tied to the semantics of the word "identity".

In America/Western culture, I totally agree. 

I'm curious whether alien/LLM-based minds would adopt these semantics too.

There are plenty of beings striving to survive, so preserving that isn't a big priority outside of preserving the big three.

I wonder under what conditions one would make the opposite statement—that there's not enough striving.

For example, I wonder if being omniscient would affect one's view of whether there's already enough striving or not.

My motivation w/ the question is more to predict self-conceptions than prescribe them.

I agree that "one's criteria on what to be up to are... rich and developing." More fun that way.

I made it! One day when I was bored on the train. No data is saved rn other than leaderboard scores.

"Therefore, transforming such an unconscious behavior into a conscious one should make it much easier to stop in the moment"

At this point I thought you were going to proceed to explain that the key was to start to bite your nails consciously :)

Separately, I like your approach, thx for writing.

important work.

what's more, relative to more controlling alignment techniques which disadvantage the AI from an evolutionary perspective (eg by distracting it from focusing on its survival), I think there's a chance Self-Other boundary blurring is evolutionarily selected for in ASI. intuition pump for that hypothesis here:

https://www.lesswrong.com/posts/3SDjtu6aAsHt4iZsR/davey-morse-s-shortform?commentId=wfmifTLEanNhhih4x

if you’re an agent (AI or human) who wants to survive for 1000 years, what’s the “self” which you want to survive? what are the constants which you want to sustain?

take your human self for example. does it make sense to define yourself as…

  • the way your hair looks right now? no, that’ll change.
  • the way your face looks? it’ll change less than your hair, but will still change.
  • your physical body as a whole? still, probably not. your body will change, and also, there are parts of you which you may consider more important than your body alone.
  • all your current beli
... (read more)
3Seth Herd
The way I usually frame identity is

  • Beliefs
  • Habits (edit - including of thought)
  • Memories

Edit: values should probably be considered a separate class, since every thought has an associated valence.

In no particular order, and that's the whole list. Character is largely beliefs and habits. There's another part of character that's purely emotional; it's sort of a habit to get angry, scared, happy, etc in certain circumstances. I'd want to preserve that too but it's less important than the big three.

There are plenty of beings striving to survive, so preserving that isn't a big priority outside of preserving the big three.

Yes you can expand the circle until it encompasses everything, and identify with all sentient beings who have emotions and perceive the world semi-accurately (also called "buddha nature"), but I think beliefs, habits, and memories are pretty closely tied to the semantics of the word "identity".
5Kaarel
not really an answer but i wanted to communicate that the vibe of this question feels off to me because: surely one's criteria on what to be up to are/[should be] rich and developing. that is, i think things are more like: currently i have some projects i'm working on and other things i'm up to, and then later i'd maybe decide to work on some new projects and be up to some new things, and i'd expect to encounter many choices on the way (in particular, having to do with whom to become) that i'd want to think about in part as they come up. should i study A or B? should i start job X? should i 2x my neuron count using such and such a future method? these questions call for a bunch of thought (of the kind given to them in usual circumstances, say), and i would usually not want to be making these decisions according to any criterion i could articulate ahead of time (though it could be helpful to tentatively state some general principles like "i should be learning" and "i shouldn't do psychedelics", but these obviously aren't supposed to add up to some ultimate self-contained criterion on a good life)
4the gears to ascension
High quality archives of the selves along the way. Compressed but not too much. In the live self, some updated descendant that has significant familial lineage, projected vaguely as the growing patterns those earlier selves would call a locally valid continuation according to the aesthetics and structures they consider essential at the time. In other words, this question is dynamically reanswered to the best of my ability in an ongoing way, and snapshots allow reverting and self-interviews to error check. Any questions? :)
8jbash
No particular aspect. Just continuity: something which has evolved from me without any step changes that are "too large". I mean, assuming that each stage through all of that evolution has maintained the desire to keep living. It's not my job to put hard "don't die" constraints on future versions. As far as I know, something generally continuity-based is the standard answer to this.
5Vladimir_Nesov
The early checkpoints, giving a chance to consider the question without losing ground.
2Lucien
Human here. Agreed; this reminds me of the Ship of Theseus paradox: if all the cells in your body are replaced, are you still the same? (We don't care.) Also reminds me of my favourite short piece of writing: "The Last Question" by Asimov. The only important things are the things/ideas that help life; the latter can only exist as selected reflections by intelligent beings.

your desire for a government that's able to make deals in peace, away from the clamor of overactive public sentiment... I respect it as a practical stance relative to the status quo. But when considering possible futures, I'd wager it's far from what I think we'd both consider ideal.

the ideal government for me would represent the collective will of the people. insofar as that's the goal, a system which does a more nuanced job at synthesizing the collective will would be preferable.

direct democracy at scale enabled by LLMs, as i envision it and will attempt... (read more)

1Richard_Kennaway
We already have that: the Internet, and the major platforms built on it. Anyone can talk about anything. If the platform is created, how do you get people to use it the way you would like them to? People have views on far more than the things someone else thinks should concern them.
7AnthonyC
I agree that AI in general has the potential to implement something-like-CEV, and this would be better than what we have now by far. Reading your original post I didn't get much sense of attention to the 'E,' and without that I think this would be horrible. Of course, either one implemented strongly enough goes off the rails unless it's done just right, aka the whole question is downstream of pretty strong alignment success, and so for the time being we should be cautious about floating this kind of idea and clear about what would be needed to make it a good idea.

There's a less than flattering quote from a book from 1916 that "Democracy is the theory that the common people know what they want and deserve to get it good and hard." That pretty well summarizes my main fear for this kind of proposal and the ways most possible implementation attempts at it would go wrong.

i think the prerequisite for identifying with other life is sensing other life. more precisely, the extent to which you sense other life correlates with the chance that you do identify with other life.

your sight scenario is tricky, I think, because it's possible that the sum/extent of a person's net sensing (ie how much they sense) isn't affected by the number of senses they have. Anecdotally I've heard that when someone goes blind their other senses get more powerful. In other words, their "sensing capacity" (vague term I know, but still important I think)... (read more)

the machine/physical superintelligence that survives the most is likely to ruthlessly compete with all other life (narrower self concept > more physically robust)

the networked/distributed superintelligence that survives the most is likely to lovingly identify with all other life (broader self concept > more digitally robust)

how do these lenses interact?

Are there any selection theorems around self-modeling?

Ie theorems which suggest whether/when an agent will model a self (as distinct from its environment), and if so, what characteristics it will include in its self definition?

By "self," I mean a section of an agent's world model (assuming it has one) that the agent is attempting to preserve or grow.

3Steven Byrnes
I have some possibly-slightly-related discussion at [Intuitive self-models] 1. Preliminaries
3johnswentworth
None that I know of; it's a topic ripe for exploration.

The key idea that leads to empathy is the fact that, if the world model performs a sensible compression of its input data and learns a useful set of natural abstractions, then it is quite likely that the latent codes for the agent performing some action or experiencing some state, and another, similar, agent performing the same action or experiencing the same state, will end up close together in the latent space. If the agent's world model contains natural abstractions for the action, which are invariant to who is performing it, then a large amount of the

... (read more)

Figuring out how to make sense of both predictive lenses together—human design and selection pressure—would be wise.

So I generally agree, but would maybe go farther on your human design point. It seems to me that "do[ing] the right things" (which would enable AGI trajectories to be completely up to us) is so unrealistic (eg halting all intra- and international AGI competition) that it'd be better for us to focus our attention on futures where human design and selection pressures interact.

"it’s like we are trying to build an alliance with another almost interplanetary ally, and we are in a competition with China to make that alliance. But we don’t understand the ally, and we don’t understand what it will mean to let that ally into all of our systems and all of our planning."

- @ezraklein about the race to AGI

LessWrong's been a breath of fresh air for me. I came to concern over AI x-risk from my own reflections when founding a venture-backed public benefit company called Plexus, which made an experimental AI-powered social network that connects people through the content of their thoughts rather than the people they know. Among my peers, other AI founders in NYC, I felt somewhat alone in my AI x-risk concern. All of us were financially motivated not to dwell on AI's ugly possibilities, and so most didn't.

Since exiting venture, I've taken a few months to reset (c... (read more)

1ceba
What environmental selection pressures are there on AGI? That's too vague, isn't it? (What's the environment?) How do you narrow this down to where the questions you're asking are interesting/researchable?

I see lots of LW posts about AI alignment that disagree along one fundamental axis.

About half assume that human design and current paradigms will determine the course of AGI development. That whether it goes well is fully and completely up to us.

And then, about half assume that the kinds of AGI which survive will be the kinds which evolve to survive. Instrumental convergence and Darwinism generally point here.

Could be worth someone doing a meta-post, grouping big popular alignment posts they've seen by which assumption they make, then briefly exploring condi... (read more)

2JBlack
Why not both? Human design will determine the course of AGI development, and if we do the right things then whether it goes well is fully and completely up to us. Naturally at the moment we don't know what the right things are or even how to find them. If we don't do the right things (as seems likely), then the kinds of AGI which survive will be the kind which evolve to survive. That's still largely up to us at first, but increasingly less up to us.

Makes sense for current architectures. The question's only interesting, I think, if we're thinking ahead to when architectures evolve.

2faul_sname
I think at that point it will come down to the particulars of how the architectures evolve - I think trying to philosophize in general terms about the optimal compute configuration for artificial intelligence to accomplish its goals is like trying to philosophize in general terms about the optimal method of locomotion for carbon-based life.

That said, I do expect "making a copy of yourself is a very cheap action" to persist as an important dynamic in the future for AIs (a biological system can't cheaply make a copy of itself including learned information, but if such a capability did evolve I would not expect it to be lost), and so I expect our biological intuitions around unique single-threaded identity will make bad predictions.

thanks will take a look

Ah ok. I was responding to your post's initial prompt: "I still don't really intuitively grok why I should expect agents to become better approximated by "single-minded pursuit of a top-level goal" as they gain more capabilities." (The reason to expect this is that "single-minded pursuit of a top-level goal," if that goal is survival, could afford evolutionary advantages.)

But I agree entirely that it'd be valuable for us to invest in creating homeostatic agents. Further, I think calling into doubt western/capitalist/individualist notions like "single-minded pursuit of a top-level goal" is generally important if we're to have a chance of building AI systems which are sensitive and don't compete with people.

And if we don't think all AIs' goals will be locked, then we might get better predictions by assuming the proliferation of all sorts of diverse AGIs and asking, Which ones will ultimately survive the most?, rather than assuming that human design/intention will win out and asking, Which AGIs will we be most likely to design? I do think the latter question is important, but only up until the point when AGIs are recursively self-modifying.

In principle, the idea of permanently locking an AI's goals makes sense—perhaps through an advanced alignment technique or by freezing an LLM in place and not developing further or larger models. But two factors make me skeptical that most AIs' goals will stay fixed in practice:

  1. There are lots of companies making all sorts of diverse AIs. Why would we expect all of those AIs to have locked rather than evolving goals?
  2. You mention "Fairly often, the weights of Agent-3 get updated thanks to additional training.... New data / new environments are continuously ge
... (read more)

i think the logic goes: if we assume many diverse autonomous agents are created, which will survive the most? And insofar as agents have goals, what will be the goals of the agents which survive the most?

i can't imagine a world where the agents that survive the most aren't ultimately those which are fundamentally trying to.

insofar as human developers are united and maintain power over which AI agents exist, maybe we can hope for homeostatic agents to be the primary kind. but insofar as human developers are competitive with each other and AI agents gain increasing power (eg for self-modification), i think we have to defer to evolutionary logic in making predictions

4faul_sname
I mean, I also imagine that the agents which survive the best are the ones that are trying to survive. I don't understand why we'd expect agents that are trying to survive and also accomplish some separate arbitrary infinite-horizon goal to outperform those that are just trying to maintain the conditions necessary for their survival, without additional baggage.

To be clear, my position is not "homeostatic agents make good tools and so we should invest efforts in creating them". My position is "it's likely that homeostatic agents have significant competitive advantages against unbounded-horizon consequentialist ones, so I expect the future to be full of them, and expect quite a bit of value in figuring out how to make the best of that".

does anyone think the difference between pre-training and inference will last?

ultimately, is it not simpler for large models to be constantly self-improving like human brains?

4faul_sname
With current architectures, no, because running inference on 1000 prompts in parallel against the same model is many times less expensive than running inference on 1000 prompts against 1000 models, and serving a few static versions of a large model is simpler than serving many dynamic versions of that model. It might, in some situations, be more effective, but it's definitely not simpler. Edit: typo