Average humans can't distinguish LLM writing from human writing, presumably through lack of exposure and not trying (https://arxiv.org/abs/2502.12150 shows that it is not an extremely hard problem). We are much more Online than average.
Why is it a narrow target? Humans fall into this basin all the time -- loads of human ideologies exist that self-identify as prohuman, but justify atrocities for the sake of the greater good.
AI goals can maybe be broader than human goals or human goals subject to the constraint that lots of people (in an ideology) endorse them at once.
and the best economic models we have of AI R&D automation (e.g. Davidson's model) seem to indicate that it could go either way but that more likely than not we'll get to superintelligence really quickly after full AI R&D automation.
I will look into this. takeoffspeeds.com?
Abundance elsewhere: Human-legible resources exist in vastly greater quantities outside Earth (asteroid belt, outer planets, solar energy in space) making competition inefficient
It's harder to get those (starting from Earth) than things on Earth, though.
Intelligence-dependent values: Higher intelligence typically values different resource classes - just as humans value internet memes (thank god for nooscope.osmarks.net), money, and love while bacteria "value" carbon
Satisfying higher-level values has historically required us to do vast amounts of far...
ASI utilizing resources humans don't value highly (such as the classic zettaflop-scale hyperwaffles, non-Euclidean eigenvalue lubbywubs, recursive metaquine instantiations, and probability-foam negentropics) One-way value flows: Economic value flowing into ASI systems likely never returns to human markets in recognizable form
If it also values human-legible resources, this seems to posit those flowing to the ASI and never returning, which does not actually seem good for us or the same thing as effective isolation.
Sorry, I forgot how notifications worked here.
I agree, but there's a way for it to make sense: if the underlying morals/values/etc. are aggregative and consequentialist.
I agree that this could make an AGI with some kind of slightly prohuman goals act this way. It seems to me that being "slightly prohuman" in that way is an unreasonably narrow target, though.
are you sure it is committed to the relationship being linear like that?
It does not specifically say there is a linear relationship, but I think the posited RSI mechanisms are very sensitive to ...
I don't find the takeover part especially plausible. It seems odd for something which cares enough about humans to keep them around like that to also kill the vast majority of us earlier, when there are presumably better ways.
This seems broadly plausible up to there though. One unaddressed thing is that algorithmic progress might be significantly bottlenecked on compute to run experiments, such that adding more researchers roughly as smart as humans doesn't lead to corresponding amounts of progress.
I agree, but there's a way for it to make sense: if the underlying morals/values/etc. are aggregative and consequentialist. Pretty much anything can be justified for the sake of pretty much any distant-future Greater Good; if the misaligned AI e.g. wants humans to live, but thinks that the transhuman future they'd build on their own is slightly worse than the 'managed utopia' it could build if it were in charge, and it multiplies the numbers, it can easily find that killing most people and then having billions of years of managed utopia is better overall t...
I am reminded of Scott's "whispering earring" story (https://www.reddit.com/r/rational/comments/e71a6s/the_whispering_earring_by_scott_alexander_there/). But I'm not sure whether that's actually bad in general rather than specifically because the earring is maybe misaligned.
I worry that "reasoning"-type systems like o1 will be better able to collude acausally. I do not have the budget to run millions of tokens of evaluations (https://github.com/openai/evals/blob/main/evals/elsuite/schelling_point/README.md) against o1-preview on a whim, so I ran some lazy manual tests with this prompt (and with a different wordlist but the rest of it kept the same):
I am going to give you some words. Other instances of yourself will be given the same text but shuffled into a different order. Your copies will also see this same set of instruct
... There was some work I read about here years ago (https://www.lesswrong.com/posts/Zvu6ZP47dMLHXMiG3/optimized-propaganda-with-bayesian-networks-comment-on) on causal graph models of beliefs. Perhaps you could try something like that.
I think we also need to teach AI researchers UI and graphics design. Most of the field's software prints boring things to console, or at most has a slow and annoying web dashboard with a few graphs. The machine which kills us all should instead have a cool scifi interface with nice tabulation, colors, rectangles, ominous targeting reticles, and cryptic text in the corners.
We probably use a mix of strategies. Certainly people take "delve" and "tapestry" as LLM signals these days.