osmarks - LessWrong

We probably use a mix of strategies. Certainly people take "delve" and "tapestry" as LLM signals these days.

Average humans can't distinguish LLM writing from human writing, presumably from lack of exposure and lack of effort (https://arxiv.org/abs/2502.12150 shows that it is not an extremely hard problem). We are much more Online than average.

How AI Takeover Might Happen in 2 Years

osmarks1moΩ230

Why is it a narrow target? Humans fall into this basin all the time -- loads of human ideologies exist that self-identify as prohuman, but justify atrocities for the sake of the greater good.

AI goals can maybe be broader than human goals, or at least broader than human goals subject to the constraint that lots of people (in an ideology) endorse them at once.

and the best economic models we have of AI R&D automation (e.g. Davidson's model) seem to indicate that it could go either way but that more likely than not we'll get to superintelligence really quickly after full AI R&D automation.

I will look into this. Is that takeoffspeeds.com?

Economic Topology, ASI, and the Separation Equilibrium

osmarks2mo10

Abundance elsewhere: Human-legible resources exist in vastly greater quantities outside Earth (asteroid belt, outer planets, solar energy in space) making competition inefficient

It's harder to get those (starting from Earth) than things on Earth, though.

Intelligence-dependent values: Higher intelligence typically values different resource classes - just as humans value internet memes (thank god for nooscope.osmarks.net), money, and love while bacteria "value" carbon

Satisfying higher-level values has historically required us to do vast amounts of farming and strip-mining and other resource extraction.

Synthesis efficiency: Advanced synthesis or alternative acquisition methods would likely require less energy than competing with humans for existing supplies

It is barely "competition" for an ASI to take human resources. The claim that synthesis would take less energy does not seem plausible for bulk mass-energy.

Negotiated disinterest: Humans have incentives to abandon interest in overlap resources:

Right, but we still need lots of things the ASI also probably wants.

Economic Topology, ASI, and the Separation Equilibrium

osmarks2mo10

ASI utilizing resources humans don't value highly (such as the classic zettaflop-scale hyperwaffles, non-Euclidean eigenvalue lubbywubs, recursive metaquine instantiations, and probability-foam negentropics)

One-way value flows: Economic value flowing into ASI systems likely never returns to human markets in recognizable form

If it also values human-legible resources, this seems to posit those resources flowing to the ASI and never returning, which does not actually seem good for us, nor is it the same thing as effective isolation.

How AI Takeover Might Happen in 2 Years

osmarks2mo*Ω230

Sorry, I forgot how notifications worked here.

I agree, but there's a way for it to make sense: if the underlying morals/values/etc. are aggregative and consequentialist.

I agree that this could make an AGI with some kind of slightly prohuman goals act this way. It seems to me that being "slightly prohuman" in that way is an unreasonably narrow target, though.

are you sure it is committed to the relationship being linear like that?

It does not specifically say there is a linear relationship, but I think the posited RSI mechanisms are very sensitive to this. Edit: this problem is mentioned explicitly ("More than ever, compute is the lifeblood of AI development, and the ‘bottleneck’ is deciding how to use it."), but it doesn't seem to be directly addressed beyond the idea of building "research taste" into the AI, which seems somewhat tricky because that's quite a long-horizon task with bad feedback signals.

How AI Takeover Might Happen in 2 Years

osmarks2moΩ141

I don't find the takeover part especially plausible. It seems odd for something which cares enough about humans to keep them around like that to also kill the vast majority of us earlier, when there are presumably better ways.

This seems broadly plausible up to there though. One unaddressed thing is that algorithmic progress might be significantly bottlenecked on compute to run experiments, such that adding more researchers roughly as smart as humans doesn't lead to corresponding amounts of progress.

The Gentle Romance

osmarks2mo30

https://gwern.net/idea#deep-learning has a sketch of it.

The Gentle Romance

osmarks2mo90

I am reminded of Scott's "whispering earring" story (https://www.reddit.com/r/rational/comments/e71a6s/the_whispering_earring_by_scott_alexander_there/). But I'm not sure whether that's actually bad in general rather than specifically because the earring is maybe misaligned.

How to prevent collusion when using untrusted models to monitor each other

osmarks4mo30

I don't expect them to have human-legible CoT forever. o1/o3 already seem to veer into strangeness sometimes.
