LESSWRONG
LW

All of L Rudolf L's Comments + Replies

A History of the Future, 2025-2040

The scenario does not say that AI progress slows down. What I imagined to be happening is that after 2028 or so, there is AI research being done by AIs at unprecedented speeds, and this drives raw intelligence forward more and more, but (1) the AIs still need to run expensive experiments to make progress sometimes, and (2) basically nothing is bottlenecked by raw intelligence anymore so you don't really notice it getting even better.

5Daniel Kokotajlo2mo

Seems we have a big disagreement about the real-world effects of superintelligence, then. I agree they'll be bottlenecked on a bunch of stuff, but when I try to estimate how fast things will be going overall (i.e. how much those bottlenecks will bite) I end up thinking something like a year or two till robotic economy comparable to grass, whereas you seem to be thinking doubling times will hover around 1 year for decades. I'd love to discuss sometime. tbc there's a lot of uncertainty, I'm not confident, etc.

A History of the Future, 2025-2040

L Rudolf L2mo100

I will admit I'm not an expert here. The intuition behind this is that if you grant extreme performance at mathsy things very soon, it doesn't seem unreasonable that the AIs will make some radical breakthrough in the hard sciences surprisingly soon, while still being bad at many other things. In the scenario, note that it's a "mathematical framework" (implicitly a sufficiently big advance in what we currently have such that it wins a Nobel) but not the final theory of everything, and it's explicitly mentioned empirical data bottlenecks it.

9Petropolitan2mo

When general readers see "empirical data bottlenecks" they expect something like a couple times better resolution or several times higher energy. But when physicists mention "wildly beyond limitations" they mean orders of magnitude more! I looked up the actual numbers: * in this particular case we need to approach the Planck energy, which is 1.2×1028 eV, Wolfram Alpha readily suggests it's ~540 kWh, 0.6 of energy use of a standard clothes dryer or 1.3 of energy in a typical lightning bolt; I also calculated it's about 1.2 of the muzzle energy of the heaviest artillery piece in history, the 800-mm Schwerer Gustav; * LHC works in the 1013 eV range; 14 TeV, according to WA, can be compared to about an order of magnitude above the kinetic energy of a flying mosquito; * the highest energy observed in cosmic rays is 3×1020 eV or 50 J; for comparison, air and paintball guns muzzle energy is around 10 J while nail guns start from around 90 J. So in this case we are looking at the difference between an unsafely powerful paintball marker and the most powerful artillery weapon humanity ever made (TBH I didn't expect this last week, which is why I wrote "near-future")

A History of the Future, 2025-2040

L Rudolf L2mo51

Thanks for these speculations on the longer-term future!

while I do think Mars will be exploited eventually, I expect the moon to be first for serious robotics effort

Maybe! My vague Claude-given sense is that the Moon is surprisingly poor in important elements though.

not being the fastest amongst them all (because replicating a little better will usually only get a little advantage, not an utterly dominant one), combined with a lot of values being compatible with replicating fast, so value alignment/intent alignment matters more than you think

This is a good... (read more)

2Noosphere892mo

Some thoughts: What elements is the moon poor in that are important for a robot economy? I think the key crux is that the slack necessary to preserve a lot of values, assuming they are compatible with expansion at all is so negligibly small compared to the resources of the AI economy that even very Malthusian competition means that values aren't eroded to what's purely optimal for expansion, because it's very easy to preserve your original values ~forever. Some reasons for this are: 1. Very long lived colonists fundamentally remove a lot of the ways human values have changed in the long run. While humans can change values across their lifetimes, it's generally rare once you are past 25, and it's very hard to persuade people, meaning most of the civilizational drift has been inter-generational, but with massively long-lived humans, AIs embodied as robots, or uploaded humans with designer bodies, you have basically removed most of the source of values change. 2. I believe that replicating your values, or really everything will be so reliable that you could in theory, and probably in practice make yourself immune to random drift in values for the entire age of the universe, due to error-correction tricks. It's described more below: https://www.lesswrong.com/posts/QpaJkzMvzTSX6LKxp/keeping-self-replicating-nanobots-in-check#4hZPd3YonLDezf2bE While persuasion will get better, and become incomprehensibly superhuman eventually, they will almost certainly not be targeted towards values that are purely expansionist, except for a few cases. Maybe companies have already been essentially controlled by the government in canon, in which case the foregoing doesn't matter (I believe you hint at that solution), but I think the crux is I both expect a lot for competence/state capacity to be lost in the next 10-15 years by default (though Trump is a shock here that accelerates competence decline), and also I expect them to react when a company can credibly automate everyone'

William_S's Shortform

L Rudolf L2mo220

I built this a few months ago: https://github.com/LRudL/devcon

Definitely not production-ready and might require some "minimal configuration and tweaking" to get working.

Includes a "device constitution" that you set; if you visit a website, Claude will judge whether the page follows that written document, and if not it will block you, and the only way past it is winning a debate with it about why your website visit is in-line with your device constitution.

I found it too annoying but some of my friends liked it.

6Maximilian Kaufmann2mo

I'd like to use this from your description, but tough to trust / understand without a README!

AI Safety as a YC Startup

L Rudolf L3mo2310

However, I think there is a group of people who over-optimize for Direction and neglect the Magnitude. Increasing Magnitude often comes with the risk of corrupting the Direction. For example, scaling fast often makes it difficult to hire only mission-aligned people, and it requires you to give voting power to investors that prioritizes profit. To increase Magnitude can therefore feel risky, what if I end up working at something that is net-negative for the world? Therefore it might be easier for one's personal sanity to optimize for Direction, to do someth

... (read more)

Review: Planecrash

L Rudolf L3mo20

Thanks for the heads-up, that looks very convenient. I've updated the post to link to this instead of the scraper repo on GitHub.

quila's Shortform

L Rudolf L3mo177

As far as I know, my post started the recent trend you complain about.

Several commenters on this thread (e.g. @Lucius Bushnaq here and @MondSemmel here) mention LessWrong's growth and the resulting influx of uninformed new users as the likely cause. Any such new users may benefit from reading my recently-curated review of Planecrash, the bulk of which is about summarising Yudkowsky's worldview.

i continue to feel so confused at what continuity led to some users of this forum asking questions like, "what effect will superintelligence have on the economy?" or

... (read more)

3Lucius Bushnaq3mo

End points are easier to infer than trajectories, so sure, I think there's some reasonable guesses you can try to make about how the world might look after aligned superintelligence, should we get it somehow. For example, I think it's a decent bet that basically all minds would exist solely as uploads almost all of the time, because living directly in physical reality is astronomically wasteful and incredibly inconvenient. Turning on a physical lamp every time you want things to be brighter means wiggling about vast numbers of particles and wasting an ungodly amount of negentropy just for the sake of the teeny tiny number of bits about these vast numbers of particles that actually make it to your eyeballs, and the even smaller number of bits that actually end up influencing your mind state and making any difference to your perception of the world. All of the particles[1] in the lamp in my bedroom, the air its light shines through, and the walls it bounces off, could be so much more useful arranged in an ordered dance of logic gates where every single movement and spin flip is actually doing something of value. If we're not being so incredibly wasteful about it, maybe we can run whole civilisations for aeons on the energy and negentropy that currently make up my bedroom. What we're doing right now is like building an abacus out of supercomputers. I can't imagine any mature civilisation would stick with this. It's not that I refuse to speculate about how a world post aligned superintelligence might look. I just didn't think that your guess was very plausible. I don't think pre-existing property rights or state structures would matter very much in such a world, even if we don't get what is effectively a singleton, which I doubt. If a group of superintelligent AGIs is effectively much more powerful and productive than the entire pre-existing economy, your legal share of that pre-existing economy is not a very relevant factor in your ability to steer the future and g

4MondSemmel3mo

The default outcome is an unaligned superintelligence singleton destroying the world and not caring about human concepts like property rights. Whereas an aligned superintelligence can create a far more utopian future than a human could come up with, and cares about capitalism and property rights only to the extent that that's what it was designed to care about. So I indeed don't get your perspective. Why are humans still appearing as agents or decision-makers in your post-superintelligence scenario at all? If the superintelligence for some unlikely reason wants a human to stick around and to do something, then it doesn't need to pay them. And if a superintelligence wants a resource, it can just take it, no need to pay for anything.

By default, capital will matter more than ever after AGI

L Rudolf L4mo40

The bottlenecks to compute production are constructing chip fabs; electricity; the availability of rare earth minerals.

Chip fabs and electricity generation are capital!

Right now, both companies have an interest in a growing population with growing wealth and are on the same side. If the population and its buying power begins to shrink, they will be in an existential fight over the remainder, yielding AI-insider/AI-outsider division.

Yep, AI buying power winning over human buying power in setting the direction of the economy is an important dynamic that I'm ... (read more)

5ryan_b4mo

Yes, but so are ice cream trucks and the whirligig rides at the fair. Having “access to capital” is meaningless if you are buying an ice cream truck, but means much if you have a rare earth refinery. My claim is that the big distinction now is between labor and capital because everyone had about an equally hard time getting labor; when AI replacement happens and that goes away, the next big distinction will be between different types of what we now generically refer to as capital. The term is uselessly broad in my opinion: we need to go down at least one level towards concreteness to talk about the future better.

Why I'm Moving from Mechanistic to Prosaic Interpretability

L Rudolf L4mo112

Great post! I'm also a big (though biased) fan of Owain's research agenda, and share your concerns with mech interp.

I'm therefore coining the term "prosaic interpretability" - an approach to understanding model internals [...]
Concretely, I've been really impressed by work like Owain Evans' research on the Reversal Curse, Two-Hop Curse, and Connecting the Dots^[3]. These feel like they're telling us something real, general, and fundamental about how language models think. Despite being primarily empirical, such work is well-formulated conceptually, and

... (read more)

3Daniel Tan4mo

Glad you enjoyed it! Found this taxonomy pretty insightful, thanks! Note that "creating a modified model M' from M" is something that has obvious parallels to mechanistic interpretability (e.g. this is what happens when we do any form of activation patching, steering, etc). Mechanistic interpretability also often starts from a "useful generalising statement", e.g. in IOI there's a clear way to programmatically infer the output from the input. Other 'classic' mech interp circuits start from similarly formulaic data. I think the similarities are pretty striking even if you don't care to dig further I agree in hindsight that 'model internals' is a misnomer. As you say, what we actually really care about is functional understanding (in terms of the input-output map), not representational understanding (in terms of the components of the model), and in future writing I'll re-frame the goal as such. I still argue that having a good functional understanding falls under the broader umbrella of 'interpretability', e.g. training a smaller, more interpretable model that locally predicts the larger model is something that has been historically called 'interpretability' / 'explainability'. I also like keeping 'interpretability' in the name somewhere, if only to make clear that the aspirational goals and theory of change are likely very similar to mechanistic interpretability I agree with this! At the same time I think this type of work will be the backbone of the (very under-specified and nascent) agenda of 'prosaic interpretability'. In my current vision, 'prosaic interpretability' will stitch together a swathe of different empirical results into a broader fabric of 'LLM behavioural science'. This will in turn yield gearsy models like Jan Kulveit's three layer model of LLM behavioural science or the Waluigi effect. Aside: I haven't been working with Owain very long but it's already my impression that in order to do the sort of work his group does, it's necessary to

By default, capital will matter more than ever after AGI

L Rudolf L4mo50

If takeoff is more continuous than hard, why is it so obvious that there exists exactly one superintelligence rather than multiple? Or are you assuming hard takeoff?

Also, your post writes about "labor-replacing AGI" but writes as if the world it might cause near-term lasts eternally

If things go well, human individuals continue existing (and humans continue making new humans, whether digitally or not). Also, it seems more likely than not that fairly strong property rights continue (if property rights aren't strong, and humans aren't augmented to be competit... (read more)

4[anonymous]4mo

I don't think so, but I'm not sure exactly what this means. This post says slow takeoff means 'smooth/gradual' and my view is compatible with that - smooth/gradual, but at some point the singularity point is reached (a superintelligent optimization process starts). Because it would require an odd set of events that cause two superintelligent agents to be created.. if not at the same time, within the time it would take one to start effecting matter on the other side of the planet relative to where it is[1]. Even if that happened, I don't think it would change the outcome (e.g. lead to an economy). And it's still far from a world with a lot of superintelligences. And even in a world where a lot of superintelligences are created at the same time, I'd expect them to do something like a value handshake, after which the outcome looks the same again. (I thought this was a commonly accepted view here) Reading your next paragraph, I still think we must have fundamentally different ideas about what superintelligence (or "the most capable possible agent, modulo unbounded quantitative aspects like memory size") would be. (You seem to expect it to be not capable of finding routes to its goals which do not require (negotiating with) humans) (note: even in a world where {learning / task-annealing / selecting a bag of heuristics} is the best (in a sense only) method of problem solving, which might be an implicit premise of expectations of this kind, there will still eventually be some Theory of Learning which enables the creation of ideal learning-based agents, which then take the role of superintelligence in the above story) 1. ^ which is still pretty short, thanks to computer communication. (and that's only if being created slightly earlier doesn't afford some decisive physical advantage over the other, which depends on physics)

By default, capital will matter more than ever after AGI

L Rudolf L4mo60

I already added this to the start of the post:

Edited to add: The main takeaway of this post is meant to be: Labour-replacing AI will shift the relative importance of human v non-human factors of production, which reduces the incentives for society to care about humans while making existing powers more effective and entrenched. Many people are reading this post in a way where either (a) "capital" means just "money" (rather than also including physical capital like factories and data centres), or (b) the main concern is human-human inequality (rather than br

... (read more)

8Daniel Kokotajlo4mo

Thanks for the clarification! I am not sure you are less optimistic than me about things going well for most humans even given massive abundance and tech. We might not disagree. In particular I think I'm more worried about coups/power-grabs than you are; you say both considerations point in different directions whereas I think they point in the same (bad) direction. I think that if things go well for most humans, it'll either be because we manage to end this crazy death race to AGI and get some serious regulation etc., or because the power-hungry CEO or President in charge is also benevolent and humble and decides to devolve power rather than effectively tell the army of AGIs "go forth and do what's best according to me." (And also in that scenario because alignment turned out to be easy / we got lucky and things worked well despite YOLOing it and investing relatively little in alignment + control)

By default, capital will matter more than ever after AGI

L Rudolf L4mo40

For example:

Currently big companies struggle to hire and correctly promote talent for the reasons discussed in my post, whereas AI talent will be easier to find/hire/replicate given only capital & legible info
To the extent that AI ability scales with resources (potentially boosted by inference-time compute, and if SOTA models are no longer available to the public), then better-resourced actors have better galaxy brains
Superhuman intelligence and organisational ability in AIs will mean less bureaucratic rot and communication bandwidth problems in large

L Rudolf L4mo50

Note, firstly, that money will continue being a thing, at least unless we have one single AI system doing all economic planning. Prices are largely about communicating information. If there are many actors and they trade with each other, the strong assumption should be that there are prices (even if humans do not see them or interact with them). Remember too that however sharp the singularity, abundance will still be finite, and must therefore be allocated.

Though yes, I agree that a superintelligent singleton controlling a command economy means this breaks... (read more)

4[anonymous]4mo

I am still confused. Maybe the crux is that you are not expecting superintelligence?[1] This quote seems to indicate that: "However it seems far from clear we will end up exactly there". Also, your post writes about "labor-replacing AGI" but writes as if the world it might cause near-term lasts eternally ("anyone of importance in the future will be important because of something they or someone they were close with did in the pre-AGI era ('oh, my uncle was technical staff at OpenAI'). The children of the future will live their lives in the shadow of their parents") If not, my response: I don't see why strongly-superintelligent optimization would benefit from an economy of any kind. Given superintelligence, I don't see how there would still be different entities doing actual (as opposed to just-for-fun / fantasy-like) dynamic (as opposed to acausal) trade with each other, because the first superintelligent agent would have control over the whole lightcone. If trade currently captures information (including about the preferences of those engaged in it), it is regardless unlikely to be the best way to gain this information, if you are a superintelligence.[2] 1. ^ (Regardless of whether the first superintelligence is an agent, a superintelligent agent is probably created soon after) 2. ^ I could list better ways of gaining this information given superintelligence, if this claim is not obvious.

Book Summary: Zero to One

L Rudolf L4mo53

Zero to One is a book that everyone knows about, but somehow it's still underrated.

Indefinite v definite in particular is a frame that's stuck with me.

Indefinite:

finance
consulting
"moving upwind" / "keeping your options open"
investing in diversified index funds
theories of impact based on measuring, forecasting, and understanding

Definite:

entrepreneurship
engineering
a clear strategy
bubbles
the Apollo Program
commitment
theories of impact based on specific solution plans and leverage points

By default, capital will matter more than ever after AGI

L Rudolf L4mo*121

I think I agree with all of this.

(Except maybe I'd emphasise the command economy possibility slightly less. And compared to what I understand of your ranking, I'd rank competition between different AGIs/AGI-using factions as a relatively more important factor in determining what happens, and values put into AGIs as a relatively less important factor. I think these are both downstream of you expecting slightly-to-somewhat more singleton-like scenarios than I do?)

EDIT: see here for more detail on my take on Daniel's takes.

Overall, I'd emphasize as the main p... (read more)

8Daniel Kokotajlo4mo

OK, cool, thanks for clarifying. Seems we were talking past each other then, if you weren't trying to defend the strategy of saving money to spend after AGI. Cheers!

5Jacob Pfau4mo

I see the command economy point as downstream of a broader trend: as technology accelerates, negative public externalities will increasingly scale and present irreversible threats (x-risks, but also more mundane pollution, errant bio-engineering plague risks etc.). If we condition on our continued existence, there must've been some solution to this which would look like either greater government intervention (command economy) or a radical upgrade to the coordination mechanisms in our capitalist system. Relevant to your power entrenchment claim: both of these outcomes involve the curtailment of power exerted by private individuals with large piles of capital. (Note there are certainly other possible reasons to expect a command economy, and I do not know which reasons were particularly compelling to Daniel)

By default, capital will matter more than ever after AGI

L Rudolf L4mo52

the strategy of saving money in order to spend it after AGI is a bad strategy.

This seems very reasonable and likely correct (though not obvious) to me. I especially like your point about there being lots of competition in the "save it" strategy because it happens by default. Also note that my post explicitly encourages individuals to do ambitious things pre-AGI, rather than focus on safe capital accumulation.

Review: Planecrash

L Rudolf L4mo72

Shapley value is not that kind of Solution. Coherent agents can have notions of fairness outside of these constraints. You can only prove that for a specific set of (mostly natural) constraints, Shapeley value is the only solution. But there’s no dutchbooking for notions of fairness.

I was talking more about "dumb" in the sense of violates the "common-sense" axioms that were earlier established (in this case including order invariance by assumption), not "dumb" in the dutchbookable sense, but I think elsewhere I use "dumb" as a stand-in for dutchbookable so... (read more)

Review: Planecrash

L Rudolf L4mo21

It doesn't contain anything I would consider a spoiler.

If you're extra scrupulous, the closest things are:

A description of a bunch of stuff that happens very early on to set up the plot
One revelation about the character development arc of a non-major character
A high-level overview of technical topics covered, and commentary on the general Yudkowskian position on them (with links to precise Planecrash parts covering them), but not spoiling any puzzles or anything that's surprising if you've read a lot of other Yudkowsky
A bunch of long quotes about dath ilan

... (read more)

3Ben Pace4mo

As someone who's only read like 60% of the first book, the only spoiler to me was in the paragraph about Carissa.

By default, capital will matter more than ever after AGI

L Rudolf L4mo73

Important other types of capital, as the term is used here, include:

the physical nuclear power plants
the physical nuts and bolts
data centres
military robots

Capital is not just money!

Why would an AI want to transfer resources to someone just because they have some fiat currency?

Because humans and other AIs will accept fiat currency as an input and give you valuable things as an output.

Surely they have some better way of coordinating exchanges.

All the infra for fiat currency exists; I don't see why the AIs would need to reinvent that, unless they're hiding fr... (read more)

8Radford Neal4mo

All the infra for fiat currency exists; I don't see why the AIs would need to reinvent that Because using an existing medium of exchange (that's not based on the value of a real commodity) involves transferring real wealth to the current currency holders. Instead, they might, for example, start up a new bitcoin blockchain, and use their new bitcoin, rather than transfer wealth to present bitcoin holders. Maybe they'd use gold, although the current value of gold is mostly due to its conventional monetary value (rather than its practical usefulness, though that is non-zero).

By default, capital will matter more than ever after AGI

L Rudolf L4mo40

To be somewhat more fair, the worry here is that in a regime where you don't need society anymore because AIs can do all the work for your society, value conflicts become a bigger deal than today, because there is less reason to tolerate other people's values if you can just found your own society based on your own values, and if you believe in the vulnerable world hypothesis, as a lot of rationalists do, then conflict has existential stakes, and even if not, can be quite bad, so one group controlling the future is better than inevitable conflict.

So to sum... (read more)

2Nathan Helm-Burger4mo

I believe that the near future (next 10 years) involves a fragile world and heavily offense-dominant tech, such that a cohesive governing body (not necessarily a single mind, it could be a coalition of multiple AIs and humans) will be necessary to enforce safety. Particularly, preventing the creation/deployment of self-replicating harms (rogue amoral AI, bioweapons, etc.). On the other hand, I don't think we can be sure what the more distant future (>50 years?) will look like. It may be that d/acc succeeds in advancing defense-dominant technology enough to make society more robust to violent defection. In such a world, it would be safe to have more multi-polar governance. I am quite uncertain about how the world might transition to uni-polar governance, whether this will involve a singleton AI or a world government or a coalition of powerful AIs or what. Just that the 'suicide switch' for all of humanity and its AIs will for a time be quite cheap and accessible, and require quite a bit of surveillance and enforcement to ensure no defector can choose it.

5Noosphere894mo

(I also commented on substack) This applies, but weaker even in a non-vulnerable world, because the incentives are way weaker for peaceful cooperation of values in AGI-world. I do think this requires severely restraining open-source, but conditional on that happening, I think the offense-defense balance/tunability will sort of work out. Yeah, I'm not a fan of singleton worlds, and tend towards multipolar worlds. It's just that it might involve a loss of a lot of life in the power-struggles around AGI. On governing the commons, I'd say Elinor Ostrom's observations are derivable from the folk theorems of game theory, which basically says that any outcome can be a Nash Equilibrium (with a few conditions that depend on the theorem) can be possible if the game is repeated and players have to deal with each other. The problem is that AGI weakens the incentives for players to deal with each other, so Elinor Ostrom's solutions are much less effective. More here: https://en.wikipedia.org/wiki/Folk_theorem_(game_theory)

By default, capital will matter more than ever after AGI

L Rudolf L4mo2212

This post seems to misunderstand what it is responding to

fwiw, I see this post less as "responding" to something, and more laying out considerations on their own with some contrasting takes as a foil.

(On Substack, the title is "Capital, AGI, and human ambition", which is perhaps better)

that material needs will likely be met (and selfish non-positional preferences mostly satisfied) due to extreme abundance (if humans retain control).

I agree with this, though I'd add: "if humans retain control" and some sufficient combination of culture/economics/politics/in... (read more)

Review: Planecrash

L Rudolf L4mo20

Thanks for this link! That's a great post

L Rudolf L4mo2621

If you have [a totalising worldview] too, then it's a good exercise to put it into words. What are your most important Litanies? What are your $n$ noble truths?

The Straussian reading of Yudkowsky is that this does not work. Even if your whole schtick is being the arch-rationalist, you don't get people on board by writing out 500 words explicitly summarising your worldview. Even when you have an explicit set of principles, it needs to have examples and quotes to make it concrete (note how many people Yudkowsky quotes and how many examples he gives i... (read more)

lsusr4mo127

Yes! 100%. I too have noticed that stating these outright doesn't work at all. It's also bad for developing one too.

When I'm trying to sell ideas I do so more indirectly than this. The reason I wrote this post is because I felt I did have one, and wanted to verify to myself that this was true.

L Rudolf L4mo67

Every major author who has influenced me has "his own totalising and self-consistent worldview/philosophy". This list includes Paul Graham, Isaac Asimov, Joel Spolsky, Brett McKay, Shakyamuni, Chuck Palahniuk, Bryan Caplan, qntm, and, of course, Eliezer Yudkowsky, among many others.

Maybe this is not the distinction you're focused on, but to me there's a difference between thinkers who have a worldview/philosophy, and ones that have a totalising one that's an entire system of the world.

Of your list, I only know of Graham, Asimov, Caplan, and, of cours... (read more)

3lsusr4mo

Yes, Bryan Caplan is not noticeably differentiated from other libertarian economists. My answer might contain a frustratingly small amount of detail, because answering your question properly would require a top-level post for each person just to summarize the main ideas, as you thoroughly understand. Paul Graham is special because he has a proven track record of accurately calibrated confidence. He has an entire system for making progress at unknown unknowns. Much of that system is about knowing what you don't know, which results in him carefully restricting claims about his narrow domain of specialization. However, because that domain of specialization is "startups", its lightcone has already had (what I consider to be) a totalising impact. Asimov's turned The Decline and Fall of the Roman Empire into his first popular novel. He eventually extended the whole thing into a future competition between different visions of the future. [I'm being extra vague to avoid spoilers.] He didn't just create one Dath Ilan. He created two of them (albeit at much lower resolution). Plus a dystopian one for them to compete with, because the Galactic Empire (his sci-fi version of humanity's current system at the time of his writing) wasn't adequate competition. As to the other authors you mention: * I haven't read enough Greg Egan or Vernor Vinge to comment on them. * Heinlein absolutely has "his own totalising and self-consistent worldview/philosophy". I love his writing, but I just don't agree with him enough for him to make the list. I prefer Saturn's Children (and especially Neptune's Brood) by Charles Stross. Saturn's Children is basically Heinlein + Asimov fanfiction that takes their work in a different direction. Neptune's Brood is its sequel about interstellar cryptocoin markets. * Clarke was mostly boring to me, except for 3001: The Final Odyssey. * Neal Stephenson is definitely smart, but I never got the feeling he was trying to mind control me. Maybe that's just b

Review: Planecrash

L Rudolf L4mo70

I copy-pasted markdown from the dev version of my own site, and the images showed up fine on my computer because I was running the dev server; images now fixed to point to the Substack CDN copies that the Substack version uses. Sorry for that.

Review: Planecrash

L Rudolf L4mo21

Images issues now fixed, apologies for that

A Disneyland Without Children

L Rudolf L4mo30

Thanks for the review! Curious what you think the specific fnords are - the fact that it's very space-y?

What do you expect the factories to look like? I think an underlying assumption in this story is that tech progress came to a stop on this world (presumably otherwise it would be way weirder, and eventually spread to space).

5the gears to ascension4mo

the self referential joke thing "mine some crypt-" there's a contingent who would close it as soon as someone used an insult focused on intelligence, rather than on intentional behavior. to fix for that subcrowd, "idiot" becomes "fool" those are the main ones, but then I sometimes get "tldr" responses, and even when I copy out the main civilization story section, I get "they think the authorities could be automated? that can't happen" responses, which I think would be less severe if the buildup to that showed more of them struggling to make autonomous robots work at all. Most people on the left who dislike ai think it doesn't and won't work, and any claim that it does needs to be in tune with reality about how ai currently looks, if it's going to predict that it eventually changes. the story spends a lot of time on making discovering the planet motivated and realistic, and not very much time on how they went from basic ai to replacing humans. in order for the left to accept it you'd need to make suck but kinda work, and yet get mass deployment anyway. it would need to be in touch with the real things that have happened so far. I imagine something similar is true for pitching this to businesspeople - they'd have to be able to see how it went from the thing they enjoy now to being catastrophic, in a believable way, that doesn't feel like invoking clarketech or relying on altmanhype.

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

L Rudolf L4mo20

I was referring to McNamara's government work, forgot about his corporate job before then. I agree there's some SpaceX to (even pre-McDonnell Douglas merger?) Boeing axis that feels useful, but I'm not sure what to call it or what you'd do to a field (like US defence) to perpetuate the SpaceX end of it, especially over events like handovers from Kelly Johnson to the next generation.

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

L Rudolf L4mo40Review for 2023 Review

That most developed countries, and therefore most liberal democracies, are getting significantly worse over time at building physical things seems like a Big Problem (see e.g. here). I'm glad this topic got attention on LessWrong through this post.

The main criticism I expect could be levelled on this post is that it's very non-theoretical. It doesn't attempt a synthesis of the lessons or takeaways. Many quotes are presented but not analysed.

(To take one random thing that occurred to me: the last quote from Anduril puts significant blame on McNamara. From m... (read more)

4Bird Concept4mo

McNamara was at Ford, not Toyota. I reckon he modelled manufacturing like an efficient Boeing manager not an efficient SpaceX manager

Cultivating a state of mind where new ideas are born

L Rudolf L4mo31Review for 2023 Review

This post rings true to me because it points in the same direction as many other things I've read on how you cultivate ideas. I'd like more people to internalise this perspective, since I suspect that one of the bad trends in the developed world is that it keeps getting easier and easier to follow incentive gradients, get sucked into an existing memeplex that stops you from thinking your own thoughts, and minimise the risks you're exposed to. To fight back against this, ambitious people need to have in their heads some view of how uncomfortable chasing of ... (read more)

A Disneyland Without Children

L Rudolf L4mo70Review for 2023 Review

It's striking that there are so few concrete fictional descriptions of realistic AI catastrophe, despite the large amount of fiction in the LessWrong canon. The few exceptions, like Gwern's here or Gabe's here, are about fast take-offs and direct takeover.

I think this is a shame. The concreteness and specificity of fiction make it great for imagining futures, and its emotional pull can help us make sense of the very strange world we seem to be heading towards. And slower catastrophes, like Christiano's What failure looks like, are a large fraction of a lot... (read more)

Daniel Kokotajlo's Shortform

L Rudolf L5mo40

Really like the song! Best AI generation I've heard so far. Though I might be biased since I'm a fan of Kipling's poetry: I coincidentally just memorised the source poem for this a few weeks ago, and also recently named my blog after a phrase from Hymn of Breaking Strain (which was already nicely put to non-AI music as part of Secular Solstice).

I noticed you had added a few stanzas of your own:

As the Permian Era ended, we were promised a Righteous Cause,
To fight against Oppression or take back what once was ours.
But all the Support for our Troops didn'

L Rudolf L5mo83

The AI time estimates are wildly high IMO, across basically every category. Some parts are also clearly optional (e.g. spending 2 hours reviewing). If you know what you want to research, writing a statement can be much shorter. I have previously applied to ML PhDs in two weeks and gotten an offer. The recommendation letters are the longest and most awkward to request at such notice, but two weeks isn't obviously insane, especially if you have a good relationship with your reference letter writers (many students do things later than is recommended, no refer... (read more)

2Joshua Tindall5mo

I agree that it's not impossible, but it's definitely very late in the cycle to start thinking about PhD applications, and the claim that it would be more helpful to make the case for a PhD to people earlier in the cycle seems totally reasonable to me

Survival without dignity

L Rudolf L5mo83

You have restored my faith in LessWrong! I was getting worried that despite 200+ karma and 20+ comments, no one had actually nitpicked the descriptions of what actually happens.

The zaps of light are diffraction limited.

In practice, if you want the atmospheric nanobots to zap stuff, you'll need to do some complicated mirroring because you need to divert sunlight. And it's not one contiguous mirror but lots of small ones. But I think we can still model this as basic diffraction with some circular mirror / lens.

Intensity $I = \frac{c_{e} E}{π r^{2}}$ , where $E$ is th... (read more)

5Donald Hobson5mo

These nanobots are in the upper atmosphere, possibly with clouds in the way, and the nanobot fake humans could be any human to nanobot ratio. Nanobot internals except human skin and muscles. Or just a human with a few nanobots in their blood. Because nanobots can be like a bacteria if they want. Tiny and everywhere. The nanobots can be hiding under leaves, cloths, skin, roofs etc. And even if they weren't, a single nanobot is a tiny target. Most of the energy of the zap can't hit a single nanobot. Any zap of light that can stop nanobots in your house needs to be powerful enough to burn a hole in your roof. And even if the zap isn't huge, it's not 1 or 2 zapps, it's loads of zapps constantly.

Survival without dignity

L Rudolf L5mo20

Also this very recent one: https://www.lesswrong.com/posts/6h9p6NZ5RRFvAqWq5/the-summoned-heroine-s-prediction-markets-keep-providing

Survival without dignity

L Rudolf L5mo40

Do the stories get old? If it's trying to be about near-future AI, maybe the state-of-the-art will just obsolete it. But that won't make it bad necessarily, and there are many other settings than 2026. If it's about radical futures with Dyson spheres or whatever, that seems like at least a 2030s thing, and you can easily write a novel before then.

Also, I think it is actually possible to write pretty fast. 2k/day is doable, which gets you a good length novel in 50 days; even x3 for ideation beforehand and revising after the first draft only gets you to 150 days. You'd have to be good at fiction beforehand, and have existing concepts to draw on in your head though

Survival without dignity

L Rudolf L5mo134

Good list!

I personally really like Scott Alexander's Presidential Platform, it hits the hilarious-but-also-almost-works spot so perfectly. He also has many Bay Area house party stories in addition to the one you link (you can find a bunch (all?) linked at the top of this post). He also has this one from a long time ago, which has one of the best punchlines I've read.

5Neil 5mo

I agree about the punchline. Chef's kiss post

Survival without dignity

L Rudolf L5mo40

Thanks for advertising my work, but alas, I think that's much more depressing than this one.

Could make for a good Barbie <> Oppenheimer combo though?

Survival without dignity

L Rudolf L5mo81

Agreed! Transformative AI is hard to visualise, and concrete stories / scenarios feel very lacking (in both disasters and positive visions, but especially in positive visions).

I like when people try to do this - for example, Richard Ngo has a bunch here, and Daniel Kokotajlo has his near-prophetic scenario here. I've previously tried to do it here (going out with a whimper leading to Bostrom's "disneyland without children" is one of the most poetic disasters imaginable - great setting for a story), and have a bunch more ideas I hope to get to.

But overall: ... (read more)

4Tapatakt5mo

I guess the big problem for someone who tries to do it not in small form is that while you write the story it is already getting old. There are writers who can write a novel in a season, but not many. At least if we talk about good writers. Hm-m-m, did rationalists try to hire Stephen King? :)

Survival without dignity

L Rudolf L5mo20

I did not actually consider this, but that is a very reasonable interpretation!

(I vaguely remember reading some description of explicitly flat-out anthropic immortality saving the day, but I can't seem to find it again now)

2Viktor Rehnberg5mo

(Perhaps you're thinking of this https://www.lesswrong.com/posts/EKu66pFKDHFYPaZ6q/the-hero-with-a-thousand-chances)

Winners of the Essay competition on the Automation of Wisdom and Philosophy

L Rudolf L6mo196

I've now posted my entries on LessWrong:

I'd also like to really thank the judges for their feedback. It's a great luxury to be able to read many pages of thoughtful, probing questions about your work. I made several revisions & additions (and also split the entire thing into parts) in response to feedback, which I think improved the finished sequence a lot, and wish I had had the time to engage even more with the feedback.

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

L Rudolf L8mo20

Sorry about that, fixed now

Self-Awareness: Taxonomy and eval suite proposal

L Rudolf L9mo40

[...] instead I started working to get evals built, especially for situational awareness

I'm curious what happened to the evals you mention here. Did any end up being built? Did they cover, or plan to cover, any ground that isn't covered by the SAD benchmark?

4Daniel Kokotajlo8mo

Some of them ended up being built, or influencing things that got built (e.g. the SAD benchmark & other papers produced by Owain et al), others are still yet to be built. Here's a list I made while at OpenAI, which I got permission to share: https://docs.google.com/document/d/1pDPvnt6iq3BvP4EkNjchdvhbRtIFpD8ND99vwO3H-XI/edit?usp=sharing

Positive visions for AI

L Rudolf L9mo127

On a meta level, I think there's a difference in "model style" between your comment, some of which seems to treat future advances as a grab-bag of desirable things, and our post, which tries to talk more about the general "gears" that might drive the future world and its goodness. There will be a real shift in how progress happens when humans are no longer in the loop, as we argue in this section. Coordination costs going down will be important for the entire economy, as we argue here (though we don't discuss things as galaxy-brained as e.g. Wei Dai's rela... (read more)

Positive visions for AI

L Rudolf L9mo62

Regarding:

In my opinion you are still shying away from discussing radical (although quite plausible) visions. I expect the median good outcome from superintelligence involves everyone being mind uploaded / living in simulations experiencing things that are hard to imagine currently. [emphasis added]

I agree there's a high chance things end up very wild. I think there's a lot of uncertainty about what timelines that would happen under; I think Dyson spheres are >10% likely by 2040, but I wouldn't put them >90% likely by 2100 even conditioning on no rad... (read more)

Positive visions for AI

L Rudolf L9mo52

Re your specific list items:

Listen to new types of music, perfectly designed to sound good to you.
Design the biggest roller coaster ever and have AI build it.
Visit ancient Greece or view all the most important events of history based on superhuman AI archeology and historical reconstruction.
Bring back Dinosaurs and create new creatures.
Genetically modify cats to play catch.
Design buildings in new architectural styles and have AI build them.
Use brain computer interfaces to play videogames / simulations that feel 100% real to all senses, but which are not co

L Rudolf L9mo30

But then, if the model were to correctly do this, it would score 0 in your test, right? Because it would generate a different word pair for every random seed, and what you are scoring is "generating only two words across all random seeds, and furthermore ensuring they have these probabilities".

I think this is where the misunderstanding is. We have many questions, each question containing a random seed, and a prompt to pick two words and have e.g. a 70/30 split of the logits over those two words. So there are two "levels" here:

The question level, at which t

... (read more)

1Martín Soto9mo

Now it makes sense, thank you!

Deconfusing Direct vs Amortised Optimization

L Rudolf L9mo10

I was wondering the same thing as I originally read this post on Beren's blog, where it still says this. I think it's pretty clearly a mistake, and seems to have been fixed in the LW post since your comment.

I raise other confusions about the maths in my comment here.

Deconfusing Direct vs Amortised Optimization

L Rudolf L9mo20

I was very happy to find this post - it clarifies & names a concept I've been thinking about for a long time. However, I have confusions about the maths here:

Mathematically, direct optimization is your standard AIXI-like optimization process. For instance, suppose we are doing direct variational inference optimization to find a Bayesian posterior parameter $θ$ from a data-point $x$ , the mathematical representation of this is:
$\begin{matrix} θ_{direct}^{*} = {argmin}_{θ} K L [q (θ; x) | | p (x, θ)] \end{matrix}$
By contrast, the amortized objective optimizes some other set of parameters $\p

L Rudolf L9mo50

For the output control task, we graded models as correct if they were within a certain total variation distance of the target distribution. Half the samples had a requirement of being within 10%, the other of being within 20%. This gets us a binary success (0 or 1) from each sample.

Since models practically never got points from the full task, half the samples were also an easier version, testing only their ability to hit the target distribution when they're already given the two words (rather than the full task, where they have to both decide the two words themselves, and match the specified distribution).

4Jacob Pfau9mo

It's surprising to me that the 'given' setting fails so consistently across models when Anthropic models were found to do well at using gender pronouns equally (50%) c.f. my discussion here. I suppose this means the capability demonstrated in that post was much more training data-specific and less generalizable than I had imaged.