Vladimir_Nesov

The human delegation and verification vs. generation discussion is in instrumental values mode, so what matters there is alignment of instrumental goals via incentives (and the practical difficulty of gaming them too much), not alignment of terminal values. Verifying all work is impractical compared to setting up sufficient incentives to align instrumental values with the task.

For AIs, that corresponds to mundane intent alignment, which also works fine while AIs don't have practical options to coerce or disassemble you; once they do, ambitious value alignment (suddenly) becomes relevant. But verification/generation is mostly relevant for setting up incentives for AIs that are not too powerful (what it would do to ambitious value alignment is anyone's guess, but probably nothing good). Just as a fox's den is part of its phenotype, incentives set up for AIs might take the form of weight updates or psychological drives, but that doesn't necessarily make them part of the AI's more reflectively stable terminal values once it's no longer at your mercy.

It's like reading papers: skimming a lot of them on some topic is quite valuable in the long run, even the superficially uninteresting ones, and even if you only end up paying attention to 1% of their combined text. In this case, the advantage is over finding the relevant random tweets yourself.

if you think such a model is sufficiently behind the capabilities and efficiency frontiers as to be useless, one can also release the weights

The weights also encode the architecture, which can contain key algorithmic secrets that competitors will quickly adopt in their future, stronger models.

So you'll train open weights models in a way that doesn't use the algorithmic/architectural secrets of your closed weights models, and you won't be releasing weights for closed weights models until the world independently stumbles on enough algorithmic secrets for the ones in your model to stop being advantageous, which could take years.

OpenAI is going to remove GPT-4.5 from the API on July 14, 2025.

At the time I read that as an early announcement of when they are releasing GPT-4.5-thinking (with a controllable thinking budget), possibly to be called "GPT-5", so that the non-thinking GPT-4.5 becomes obsolete. The first GB200 NVL72s might be coming online about now, which should allow both fast serving and more reasonable pricing for very large models, even with reasoning.

I don’t get it. Have these people not heard of prices?

The issue with very large models is that you need some minimal number of GPUs to keep them in memory, and you can't serve them at all with fewer GPUs than that. If almost nobody uses the model, you are still paying for all the time of those GPUs. If GPT-4.5 is a 1:8 sparse MoE model pretrained in FP8 (the announcement video mentioned training in low precision) on 100K H100s (the Azure Goodyear campus), it could be about 5e26 FLOPs. At 1:8 sparsity, the compute-optimal tokens-per-active-param ratio is about 3x the dense ratio, and the dense ratio is about 40 tokens per param. So that gives 830B active params, 6.7T total params, and 100T training tokens.
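
For concreteness, here is a minimal sketch of the arithmetic behind those numbers, assuming the standard C ≈ 6·N·D approximation for transformer training compute; the 5e26 FLOPs, 1:8 sparsity, and 40 tokens-per-param dense ratio are this comment's guesses, not confirmed figures:

```python
# Back-of-envelope check of the GPT-4.5 size estimate above.
# Assumptions (from this comment, not confirmed numbers): ~5e26 training FLOPs,
# FP8 pretraining, 1:8 MoE sparsity, a compute-optimal dense ratio of ~40 tokens
# per param (tripled at 1:8 sparsity), and the C ~= 6 * N * D approximation.

compute_flops = 5e26               # assumed training compute
tokens_per_active_param = 3 * 40   # 1:8 sparsity ~ 3x the dense ratio
sparsity = 8                       # total params : active params

# C = 6 * N * D with D = r * N  =>  N = sqrt(C / (6 * r))
active_params = (compute_flops / (6 * tokens_per_active_param)) ** 0.5
total_params = sparsity * active_params
training_tokens = tokens_per_active_param * active_params

print(f"active params:   {active_params / 1e9:.0f}B")    # ~830B
print(f"total params:    {total_params / 1e12:.1f}T")     # ~6.7T
print(f"training tokens: {training_tokens / 1e12:.0f}T")  # ~100T
```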

The reason I chose 1:8 sparsity in the estimate is that a GB200 NVL72 rack has about 13 TB of HBM, so 6.7T total params in FP8 fit comfortably, leaving space for the KV-cache. A GB200 NVL72 rack costs about $3M, as Huang recently announced. Alternatively you might need 12 nodes of H200 (140 GB of HBM per chip, 96 chips in total), which is 3 racks; this will work more slowly and so will serve fewer requests for the same GPU-time, and these racks might cost about $6M.
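
And a quick sanity check that a model of that size fits in the hardware mentioned above, using the approximate HBM capacities from this comment rather than official specs:

```python
# Rough memory-fit check for the serving hardware discussed above (a sketch;
# HBM capacities and rack prices are this comment's approximate figures).

total_params = 6.7e12
bytes_per_param_fp8 = 1
weights_tb = total_params * bytes_per_param_fp8 / 1e12  # ~6.7 TB of weights

gb200_nvl72_hbm_tb = 13       # ~13 TB HBM per GB200 NVL72 rack (~$3M)
h200_cluster_hbm_tb = 96 * 0.14  # 12 nodes x 8 H200s = 96 chips * ~140 GB (~$6M)

print(f"weights in FP8:            {weights_tb:.1f} TB")
print(f"fits in one NVL72 rack:    {weights_tb < gb200_nvl72_hbm_tb}")
print(f"KV-cache headroom (NVL72): {gb200_nvl72_hbm_tb - weights_tb:.1f} TB")
print(f"fits in 96 H200 chips:     {weights_tb < h200_cluster_hbm_tb}")
```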

So that's an anchor for the fixed costs: to get down to marginal costs, there need to be enough active users to keep several times that amount of hardware occupied most of the time. If there aren't enough users, you still need to pay for the dedicated time of at least those 3 racks of H200 or 1 rack of GB200 NVL72, and that's not something you can reasonably price.

So I guess our expectations about the future are similar, but you see the same things as a broadly positive distribution of outcomes, while I see it as a broadly negative distribution. And Yudkowsky sees the bulk of the outcomes both of us are expecting (the ones with significant disempowerment) as quickly leading to human extinction.

Another big reason why I put a lot of weight on the possibility of "we survive indefinitely, but are disempowered" is I think muddling through is non-trivially likely to just work, and muddling through on alignment gets us out of extinction, but not out of disempowerment by humans or AIs by default.

Right, the reason I think muddling through is non-trivially likely to just work to get a moderate disempowerment outcome is that AIs are going to be sufficiently human-like in their psychology, and hold sufficiently human-like sensibilities from their training data or LLM base models, that they won't like things like needless loss of life or autonomy when it's trivially cheap to avoid. Not because the alignment engineers figure out how to put this care in deliberately; they might be able to amplify it, or avoid losing it, or end up ruinously scrambling it.

The reason it might appear expensive to preserve the humans is the race to launch the von Neumann probes to capture the most distant reachable galaxies, which under the accelerating expansion of the universe keep irreversibly escaping if you don't catch them early. So AIs wouldn't want to lose any time playing politics with humanity, or forgo eating Earth as early as possible, and such. But as the cheapest option that preserves everyone, AIs can just digitize the humans and restore them later when it's more convenient. They probably won't be doing that if they care more, but it's still an option, a very, very cheap one.

but not out of disempowerment by humans or AIs by default

I don't think "disempowerment by humans" is a noticeable fraction of possible outcomes, it's more like a smaller silent part of my out-of-model 5% eutopia that snatches defeat from the jaws of victory, where humans somehow end up in charge and then additionally somehow remain adamant for the cosmic all always in keeping the other humans disempowered. So the first filter is that I don't see it likely that humans end up in charge at all, that AIs will be doing any human's bidding with an impact that's not strictly bounded, and the second filter is that these impossibly-in-charge humans don't ever decide to extend potential for growth to the others (or even possibly to themselves).

If humans do end up non-disempowered in the more likely eutopia timelines (following from the current irresponsible breakneck AGI development regime), that's only because the AIs give them leave to grow up arbitrarily far in a broad variety of self-directed ways, which the AIs decide to bestow for some reason I don't currently see. Eventually some originally-humans then become peers of the AIs rather than specifically in charge, and so they won't even be in a position to permanently disempower the other originally-humans, if that's somehow in their propensity.

and 10-25% existential risk in total, with the rest of the probability being a somewhat survivable kind of initial chaos followed by some level of disempowerment

Bostrom's existential risk is about curtailment of long-term potential, so my guess is that any significant level of disempowerment would technically fall under "existential risk". So your "10-25% existential risk" is probably severe disempowerment plus extinction plus some stranger things, but not the whole of what should classically count as "existential risk".

I consider eutopia with disempowerment to actually be mostly fine by my values, so long as I can delegate to more powerful AIs who do execute on my values.

Again, if they do execute on your values, including the possible preference for you to grow under your own rather than their direction, far enough that you are as strong as they might be, then this is not a world in a state of disempowerment as I'm using this term, even if you personally start out or choose to remain somewhat disempowered compared to AIs that exist at that time.

A world in which verification was just as hard as generation, or verification is harder than generation is a very different world than our world, and would predict that delegation to solve a problem basically totally fails

I think in human delegation, alignment is more important than verification. There is certainly some amount of verification, but not nearly enough to prevent sufficiently Eldritch reward hacking, which just doesn't happen that often with humans, and so society keeps functioning, mostly. The purpose of verification of the tasks is in practice more about incentivising and verifying alignment of the counterparty, not directly about verifying the state of their work, even if it does take the form of verifying their work.

Subjectively, it seems the things that are important to track are facts and observations, possibly specific sources (papers, videos), but not your own path or the provisional positions expressed at various points along the way. So attention to detail, but not to the detail of your own positions or externally expressed statements about them; that's rarely of any value. You track the things encountered along your own path, so long as they remain relevant and not otherwise, but never the path itself.

If you change your mind about something, then it behooves you to not then behave as if your previous beliefs are bizarre, surprising, and explainable only as deliberate insults or other forms of bad faith.

That's the broadly accepted norm. My point is that I think it's a bad norm, one that damages the effectiveness of lightness and does nothing useful.

I think being inconsistent and contradicting yourself and not really knowing your own position on most topics is good, actually, as long as you keep working to craft better models of those topics and not just flail about randomly. Good models grow and merge from disparate fragments, and in the growing pains those fragments keep intermittently getting more and less viable. Waiting for them to settle makes you silent about the process, while talking it through is not a problem as long as the epistemic status of your discussion is clear.

Sticking to what you've said previously, simply because it's something you happened to have said before, opposes lightness, stepping with the winds of evidence at the speed of their arrival (including logical evidence from your own better understandings as they settle). Explicitly noting the changes to your point of view, either declaring them publicly or even taking the time to note them privately for yourself, can make this slightly inconvenient, and that can have a significant impact on the ability to actually make progress on the numerous tiny things that are not explicitly seen as some important project that ought to be taken seriously and given the effort. There is little downside to this, as far as I can tell, except for the norms with some influence that resist this kind of behavior, norms which, if given leave, can hold influence even inside one's own mind.

My point doesn't depend on the ability or willingness of the original poster/commenter to back up or clearly make any claim, or even to participate in the discussion; it's about their initial post/comment creating a place where others can discuss its topic, for topics where that happens too rarely for whatever reason. If the original poster/commenter ends up fruitfully participating in that discussion, even better, but that is not necessary; the original post/comment can still be useful in expectation.

(You are right that tailcalled specifically is vagueposting a nontrivial amount, even in this thread the response to my request for clarification ended up unclear. Maybe that propensity crosses the threshold for not ignoring the slop effect of individual vaguepostings in favor of vague positive externalities they might have.)

I think vague or poorly crafted posts/comments are valuable when there is a firm consensus in the opposite direction of their point, because they champion a place and a permission to discuss dissent on that topic when such discussion has otherwise become too sparse (this only applies if it really is sparse on the specific topic). A low quality post/comment can still host valuable discussion, and downvoting the post/comment into the negatives punishes that discussion.

(Keeping such comments below +5 or something still serves the point you are making. I'm objecting specifically to pushing the karma into the negatives, which makes the Schelling point and the discussion below it less convenient to see. This of course stops applying if the same author does this too often.)

My take is that the most likely outcome is still eutopia-with-disempowerment for baseline humans, but for transhumans I'd expect eutopia straight-up.

This remains ambiguous with respect to the distinction I'm making in the post section I linked. If baseline humans don't have the option to escape their condition arbitrarily far, under their own direction from a very broad basin of allowed directions, I'm not considering that eutopia. If some baseline humans choose to stay that way, them not having any authority over the course of the world still counts as a possible eutopia that is not disempowerment in my terms.

The following statement mostly suggests the latter possibility for your intended meaning:

That said, for those with the wish and will to upgrade to transhumanism, my most likely outcome is still eutopia.

By the eutopia/disempowerment distinction I mean more the overall state of the world, rather than conditions for specific individuals, let alone temporary conditions. There might be pockets of disempowerment in a eutopia (in certain times and places), and pockets of eutopia in a world of disempowerment (individuals or communities in better than usual circumstances). A baseline human who has no control of the world but has a sufficiently broad potential for growing up arbitrarily far is still living in a eutopia without disempowerment.

60% on eutopia without complete disempowerment

So similarly here, "eutopia without complete disempowerment" but still with significant disempowerment is not in the "eutopia without disempowerment" bin in my terms. You are drawing different boundaries in the space of timelines.

The probabilities on the scenarios, conditional on AGI and then ASI being reached by us, is probably 60% on eutopia without complete disempowerment, 30% on complete disempowerment by either preventing us from using the universe or killing billions of present day humans, and 10% on it killing us all.

My expectation is more like model-uncertainty-induced 5% eutopia-without-disempowerment (I don't have a specific sense of why AIs would possibly give us more of the world than a little bit if we don't maintain control in the acute risk period through takeoff), 20% extinction, and the rest is a somewhat survivable kind of initial chaos followed by some level of disempowerment (possibly with growth potential, but under a ceiling that's well-below what some AIs get and keep, in cosmic perpetuity). My sense of Yudkowsky's view is that he sees all of my potential-disempowerment timelines as shortly leading to extinction.

I believe verification is easier than generation in general

I think the correct thesis that sounds like this is that whenever verification is easier than generation, it becomes possible to improve generation, and therefore it's useful to pay attention to where that happens to be the case. But in the wild either can be easier, and once most instances where verification is easier than generation have been used up to improve their generation counterparts, the remaining situations where verification is easier get very unusual and technical.
