I will admit I'm not an expert here. The intuition behind this is that if you grant extreme performance at mathsy things very soon, it doesn't seem unreasonable that the AIs will make some radical breakthrough in the hard sciences surprisingly soon, while still being bad at many other things. In the scenario, note that it's a "mathematical framework" (implicitly a sufficiently big advance in what we currently have such that it wins a Nobel) but not the final theory of everything, and it's explicitly mentioned empirical data bottlenecks it.
Thanks for these speculations on the longer-term future!
while I do think Mars will be exploited eventually, I expect the moon to be first for serious robotics effort
Maybe! My vague Claude-given sense is that the Moon is surprisingly poor in important elements though.
not being the fastest amongst them all (because replicating a little better will usually only get a little advantage, not an utterly dominant one), combined with a lot of values being compatible with replicating fast, so value alignment/intent alignment matters more than you think
This is a good...
I built this a few months ago: https://github.com/LRudL/devcon
Definitely not production-ready and might require some "minimal configuration and tweaking" to get working.
Includes a "device constitution" that you set; if you visit a website, Claude will judge whether the page follows that written document, and if not it will block you, and the only way past it is winning a debate with it about why your website visit is in-line with your device constitution.
I found it too annoying but some of my friends liked it.
...However, I think there is a group of people who over-optimize for Direction and neglect the Magnitude. Increasing Magnitude often comes with the risk of corrupting the Direction. For example, scaling fast often makes it difficult to hire only mission-aligned people, and it requires you to give voting power to investors that prioritizes profit. To increase Magnitude can therefore feel risky, what if I end up working at something that is net-negative for the world? Therefore it might be easier for one's personal sanity to optimize for Direction, to do someth
As far as I know, my post started the recent trend you complain about.
Several commenters on this thread (e.g. @Lucius Bushnaq here and @MondSemmel here) mention LessWrong's growth and the resulting influx of uninformed new users as the likely cause. Any such new users may benefit from reading my recently-curated review of Planecrash, the bulk of which is about summarising Yudkowsky's worldview.
...i continue to feel so confused at what continuity led to some users of this forum asking questions like, "what effect will superintelligence have on the economy?" or
- The bottlenecks to compute production are constructing chip fabs; electricity; the availability of rare earth minerals.
Chip fabs and electricity generation are capital!
Right now, both companies have an interest in a growing population with growing wealth and are on the same side. If the population and its buying power begins to shrink, they will be in an existential fight over the remainder, yielding AI-insider/AI-outsider division.
Yep, AI buying power winning over human buying power in setting the direction of the economy is an important dynamic that I'm ...
Great post! I'm also a big (though biased) fan of Owain's research agenda, and share your concerns with mech interp.
...I'm therefore coining the term "prosaic interpretability" - an approach to understanding model internals [...]
Concretely, I've been really impressed by work like Owain Evans' research on the Reversal Curse, Two-Hop Curse, and Connecting the Dots[3]. These feel like they're telling us something real, general, and fundamental about how language models think. Despite being primarily empirical, such work is well-formulated conceptually, and
If takeoff is more continuous than hard, why is it so obvious that there exists exactly one superintelligence rather than multiple? Or are you assuming hard takeoff?
Also, your post writes about "labor-replacing AGI" but writes as if the world it might cause near-term lasts eternally
If things go well, human individuals continue existing (and humans continue making new humans, whether digitally or not). Also, it seems more likely than not that fairly strong property rights continue (if property rights aren't strong, and humans aren't augmented to be competit...
I already added this to the start of the post:
...Edited to add: The main takeaway of this post is meant to be: Labour-replacing AI will shift the relative importance of human v non-human factors of production, which reduces the incentives for society to care about humans while making existing powers more effective and entrenched. Many people are reading this post in a way where either (a) "capital" means just "money" (rather than also including physical capital like factories and data centres), or (b) the main concern is human-human inequality (rather than br
For example:
Note, firstly, that money will continue being a thing, at least unless we have one single AI system doing all economic planning. Prices are largely about communicating information. If there are many actors and they trade with each other, the strong assumption should be that there are prices (even if humans do not see them or interact with them). Remember too that however sharp the singularity, abundance will still be finite, and must therefore be allocated.
Though yes, I agree that a superintelligent singleton controlling a command economy means this breaks...
Zero to One is a book that everyone knows about, but somehow it's still underrated.
Indefinite v definite in particular is a frame that's stuck with me.
Indefinite:
Definite:
I think I agree with all of this.
(Except maybe I'd emphasise the command economy possibility slightly less. And compared to what I understand of your ranking, I'd rank competition between different AGIs/AGI-using factions as a relatively more important factor in determining what happens, and values put into AGIs as a relatively less important factor. I think these are both downstream of you expecting slightly-to-somewhat more singleton-like scenarios than I do?)
EDIT: see here for more detail on my take on Daniel's takes.
Overall, I'd emphasize as the main p...
the strategy of saving money in order to spend it after AGI is a bad strategy.
This seems very reasonable and likely correct (though not obvious) to me. I especially like your point about there being lots of competition in the "save it" strategy because it happens by default. Also note that my post explicitly encourages individuals to do ambitious things pre-AGI, rather than focus on safe capital accumulation.
Shapley value is not that kind of Solution. Coherent agents can have notions of fairness outside of these constraints. You can only prove that for a specific set of (mostly natural) constraints, Shapeley value is the only solution. But there’s no dutchbooking for notions of fairness.
I was talking more about "dumb" in the sense of violates the "common-sense" axioms that were earlier established (in this case including order invariance by assumption), not "dumb" in the dutchbookable sense, but I think elsewhere I use "dumb" as a stand-in for dutchbookable so...
It doesn't contain anything I would consider a spoiler.
If you're extra scrupulous, the closest things are:
Important other types of capital, as the term is used here, include:
Capital is not just money!
Why would an AI want to transfer resources to someone just because they have some fiat currency?
Because humans and other AIs will accept fiat currency as an input and give you valuable things as an output.
Surely they have some better way of coordinating exchanges.
All the infra for fiat currency exists; I don't see why the AIs would need to reinvent that, unless they're hiding fr...
To be somewhat more fair, the worry here is that in a regime where you don't need society anymore because AIs can do all the work for your society, value conflicts become a bigger deal than today, because there is less reason to tolerate other people's values if you can just found your own society based on your own values, and if you believe in the vulnerable world hypothesis, as a lot of rationalists do, then conflict has existential stakes, and even if not, can be quite bad, so one group controlling the future is better than inevitable conflict.
So to sum...
This post seems to misunderstand what it is responding to
fwiw, I see this post less as "responding" to something, and more laying out considerations on their own with some contrasting takes as a foil.
(On Substack, the title is "Capital, AGI, and human ambition", which is perhaps better)
that material needs will likely be met (and selfish non-positional preferences mostly satisfied) due to extreme abundance (if humans retain control).
I agree with this, though I'd add: "if humans retain control" and some sufficient combination of culture/economics/politics/in...
If you have [a totalising worldview] too, then it's a good exercise to put it into words. What are your most important Litanies? What are your noble truths?
The Straussian reading of Yudkowsky is that this does not work. Even if your whole schtick is being the arch-rationalist, you don't get people on board by writing out 500 words explicitly summarising your worldview. Even when you have an explicit set of principles, it needs to have examples and quotes to make it concrete (note how many people Yudkowsky quotes and how many examples he gives i...
Every major author who has influenced me has "his own totalising and self-consistent worldview/philosophy". This list includes Paul Graham, Isaac Asimov, Joel Spolsky, Brett McKay, Shakyamuni, Chuck Palahniuk, Bryan Caplan, qntm, and, of course, Eliezer Yudkowsky, among many others.
Maybe this is not the distinction you're focused on, but to me there's a difference between thinkers who have a worldview/philosophy, and ones that have a totalising one that's an entire system of the world.
Of your list, I only know of Graham, Asimov, Caplan, and, of cours...
I copy-pasted markdown from the dev version of my own site, and the images showed up fine on my computer because I was running the dev server; images now fixed to point to the Substack CDN copies that the Substack version uses. Sorry for that.
Thanks for the review! Curious what you think the specific fnords are - the fact that it's very space-y?
What do you expect the factories to look like? I think an underlying assumption in this story is that tech progress came to a stop on this world (presumably otherwise it would be way weirder, and eventually spread to space).
I was referring to McNamara's government work, forgot about his corporate job before then. I agree there's some SpaceX to (even pre-McDonnell Douglas merger?) Boeing axis that feels useful, but I'm not sure what to call it or what you'd do to a field (like US defence) to perpetuate the SpaceX end of it, especially over events like handovers from Kelly Johnson to the next generation.
That most developed countries, and therefore most liberal democracies, are getting significantly worse over time at building physical things seems like a Big Problem (see e.g. here). I'm glad this topic got attention on LessWrong through this post.
The main criticism I expect could be levelled on this post is that it's very non-theoretical. It doesn't attempt a synthesis of the lessons or takeaways. Many quotes are presented but not analysed.
(To take one random thing that occurred to me: the last quote from Anduril puts significant blame on McNamara. From m...
This post rings true to me because it points in the same direction as many other things I've read on how you cultivate ideas. I'd like more people to internalise this perspective, since I suspect that one of the bad trends in the developed world is that it keeps getting easier and easier to follow incentive gradients, get sucked into an existing memeplex that stops you from thinking your own thoughts, and minimise the risks you're exposed to. To fight back against this, ambitious people need to have in their heads some view of how uncomfortable chasing of ...
It's striking that there are so few concrete fictional descriptions of realistic AI catastrophe, despite the large amount of fiction in the LessWrong canon. The few exceptions, like Gwern's here or Gabe's here, are about fast take-offs and direct takeover.
I think this is a shame. The concreteness and specificity of fiction make it great for imagining futures, and its emotional pull can help us make sense of the very strange world we seem to be heading towards. And slower catastrophes, like Christiano's What failure looks like, are a large fraction of a lot...
Really like the song! Best AI generation I've heard so far. Though I might be biased since I'm a fan of Kipling's poetry: I coincidentally just memorised the source poem for this a few weeks ago, and also recently named my blog after a phrase from Hymn of Breaking Strain (which was already nicely put to non-AI music as part of Secular Solstice).
I noticed you had added a few stanzas of your own:
...As the Permian Era ended, we were promised a Righteous Cause,
To fight against Oppression or take back what once was ours.
But all the Support for our Troops didn'
The AI time estimates are wildly high IMO, across basically every category. Some parts are also clearly optional (e.g. spending 2 hours reviewing). If you know what you want to research, writing a statement can be much shorter. I have previously applied to ML PhDs in two weeks and gotten an offer. The recommendation letters are the longest and most awkward to request at such notice, but two weeks isn't obviously insane, especially if you have a good relationship with your reference letter writers (many students do things later than is recommended, no refer...
You have restored my faith in LessWrong! I was getting worried that despite 200+ karma and 20+ comments, no one had actually nitpicked the descriptions of what actually happens.
The zaps of light are diffraction limited.
In practice, if you want the atmospheric nanobots to zap stuff, you'll need to do some complicated mirroring because you need to divert sunlight. And it's not one contiguous mirror but lots of small ones. But I think we can still model this as basic diffraction with some circular mirror / lens.
Intensity , where is th...
Do the stories get old? If it's trying to be about near-future AI, maybe the state-of-the-art will just obsolete it. But that won't make it bad necessarily, and there are many other settings than 2026. If it's about radical futures with Dyson spheres or whatever, that seems like at least a 2030s thing, and you can easily write a novel before then.
Also, I think it is actually possible to write pretty fast. 2k/day is doable, which gets you a good length novel in 50 days; even x3 for ideation beforehand and revising after the first draft only gets you to 150 days. You'd have to be good at fiction beforehand, and have existing concepts to draw on in your head though
Good list!
I personally really like Scott Alexander's Presidential Platform, it hits the hilarious-but-also-almost-works spot so perfectly. He also has many Bay Area house party stories in addition to the one you link (you can find a bunch (all?) linked at the top of this post). He also has this one from a long time ago, which has one of the best punchlines I've read.
Agreed! Transformative AI is hard to visualise, and concrete stories / scenarios feel very lacking (in both disasters and positive visions, but especially in positive visions).
I like when people try to do this - for example, Richard Ngo has a bunch here, and Daniel Kokotajlo has his near-prophetic scenario here. I've previously tried to do it here (going out with a whimper leading to Bostrom's "disneyland without children" is one of the most poetic disasters imaginable - great setting for a story), and have a bunch more ideas I hope to get to.
But overall: ...
I've now posted my entries on LessWrong:
I'd also like to really thank the judges for their feedback. It's a great luxury to be able to read many pages of thoughtful, probing questions about your work. I made several revisions & additions (and also split the entire thing into parts) in response to feedback, which I think improved the finished sequence a lot, and wish I had had the time to engage even more with the feedback.
[...] instead I started working to get evals built, especially for situational awareness
I'm curious what happened to the evals you mention here. Did any end up being built? Did they cover, or plan to cover, any ground that isn't covered by the SAD benchmark?
On a meta level, I think there's a difference in "model style" between your comment, some of which seems to treat future advances as a grab-bag of desirable things, and our post, which tries to talk more about the general "gears" that might drive the future world and its goodness. There will be a real shift in how progress happens when humans are no longer in the loop, as we argue in this section. Coordination costs going down will be important for the entire economy, as we argue here (though we don't discuss things as galaxy-brained as e.g. Wei Dai's rela...
Regarding:
In my opinion you are still shying away from discussing radical (although quite plausible) visions. I expect the median good outcome from superintelligence involves everyone being mind uploaded / living in simulations experiencing things that are hard to imagine currently. [emphasis added]
I agree there's a high chance things end up very wild. I think there's a lot of uncertainty about what timelines that would happen under; I think Dyson spheres are >10% likely by 2040, but I wouldn't put them >90% likely by 2100 even conditioning on no rad...
Re your specific list items:
...
- Listen to new types of music, perfectly designed to sound good to you.
- Design the biggest roller coaster ever and have AI build it.
- Visit ancient Greece or view all the most important events of history based on superhuman AI archeology and historical reconstruction.
- Bring back Dinosaurs and create new creatures.
- Genetically modify cats to play catch.
- Design buildings in new architectural styles and have AI build them.
- Use brain computer interfaces to play videogames / simulations that feel 100% real to all senses, but which are not co
But then, if the model were to correctly do this, it would score 0 in your test, right? Because it would generate a different word pair for every random seed, and what you are scoring is "generating only two words across all random seeds, and furthermore ensuring they have these probabilities".
I think this is where the misunderstanding is. We have many questions, each question containing a random seed, and a prompt to pick two words and have e.g. a 70/30 split of the logits over those two words. So there are two "levels" here:
I was wondering the same thing as I originally read this post on Beren's blog, where it still says this. I think it's pretty clearly a mistake, and seems to have been fixed in the LW post since your comment.
I raise other confusions about the maths in my comment here.
I was very happy to find this post - it clarifies & names a concept I've been thinking about for a long time. However, I have confusions about the maths here:
...Mathematically, direct optimization is your standard AIXI-like optimization process. For instance, suppose we are doing direct variational inference optimization to find a Bayesian posterior parameter from a data-point , the mathematical representation of this is:
By contrast, the amortized objective optimizes some other set of parameters $\p
For the output control task, we graded models as correct if they were within a certain total variation distance of the target distribution. Half the samples had a requirement of being within 10%, the other of being within 20%. This gets us a binary success (0 or 1) from each sample.
Since models practically never got points from the full task, half the samples were also an easier version, testing only their ability to hit the target distribution when they're already given the two words (rather than the full task, where they have to both decide the two words themselves, and match the specified distribution).
The scenario does not say that AI progress slows down. What I imagined to be happening is that after 2028 or so, there is AI research being done by AIs at unprecedented speeds, and this drives raw intelligence forward more and more, but (1) the AIs still need to run expensive experiments to make progress sometimes, and (2) basically nothing is bottlenecked by raw intelligence anymore so you don't really notice it getting even better.