A mere 5% chance that the plane will crash during your flight is consistent with finding this extremely concerning and doing everything in your power to avoid getting on it. "Alignment is impossible" is neither necessary for great concern nor implied by it.
I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified.
My point is that chairs and humans can be considered in a similar way.
The most straightforward way of finding a world-model is just predicting your sensory input. But then you're not guaranteed to get a model in which something corresponding to "real objects" can be easily identified.
There's the world as a whole that generates observations, and there are particular objects on their own. A model that cares about individual objects needs to consider them separately from the world. The same object should still make sense in a different world/situation, so there are many possibilities for how an object can be when placed in some context and allowed to develop. This is useful for modularity, but also for formulating properties of particular objects in a way that doesn't get distorted by the influence of the rest of the world. Human preferences are one such property.
Models, or real objects, or things, capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them. Hypotheses are another toy example.
One of the features of models/things seems to be how they capture the many possibilities of a system simultaneously, rather than isolated particular possibilities. So what I gestured at was that when considering models of humans, the real objects or models behind a human capture the many possibilities of the way that human could be, rather than only the actuality of how they are. And this seems useful for figuring out their preferences.
Path-dependence is the way outcomes depend on the path that was taken to reach them. A path-independent outcome is convergent: it's always the same destination regardless of the path that was taken. Human preferences seem to be path-dependent on human timescales; growing up in Egypt may lead to a persistently different mindset than the same human growing up in Canada would have.
For anything related to human judgement, in theory this isn't why it's not doing well.
The facts are in there, but not in the form of a sufficiently good reward model that can tell as well as human experts which answer is better or whether a step of an argument is valid. In the same way, RLHF still does better with humans on some queries; it hasn't been fully automated to superior results by replacing humans with models in all cases.
Creating an inhumanly good model of a human is related to formulating their preferences. A model captures many possibilities and the way many hypothetical things are simulated in the training data. Thus it's a step towards eliminating path-dependence of particular life stories (and the preferences they motivate), by considering these possibilities altogether. Even if some of the possible life stories interact with distortionary influences, others remain untouched, and so must continue deciding their own path, for there are no external influences there and they are the final authority for what counts as aiding them anyway.
Creativity is RL, converting work into closing the generation-discrimination gap wherever it's found (or laboriously created by developing good taste). The resulting generations can be noteworthy novelties; imitating them makes it easier to close the gap, reducing the need for creativity.
A reasoning model depends on starting from a sufficient base model that captures the relevant considerations. Solving AIME is like winning at chess, except the rules of chess are trivial, and the rules of AIME are much harder. But the rules of AIME are still not that hard, it's using them to win that is hard.
In the real world, the rules get much harder than that, so it's unclear how far o1 can go if the base model doesn't get sufficiently better (at knowing the rules), and it's unclear how much better it needs to get. Plausibly it needs to get so good that o1-like post-training won't be needed for it to pursue long chains of reasoning on its own, as an emergent capability. (This includes the possibility that RL is still necessary in some other way, as an engine of optimization to get better at rules of the real world, that is to get better reward models.)
Having preferences is very different from knowing them. There's always a process of reflection that refines preferences, so any current guess is always wrong at least in detail. For a decision theory to have a shot at normativity, it needs to be able to adapt to corrections and ideally anticipate their inevitability (not locking in the older guess and preventing further reflection; instead facilitating further reflection and being corrigible).
Orthogonality asks the domain of applicability to be wide enough that neither various initial guesses nor longer-term refinements to them fall out of scope. When a theory makes assumptions about value content, that makes it a moral theory rather than a decision theory. A moral theory explores particular guesses about preferences of some nature.
So in the way you use the term, quantum immortality seems to be a moral theory, involving claims that quantum suicide can be a good idea. For example "use QI to earn money" is a recommendation that depends on this assumption about preferences (of at least some people in some situations).
Use of repeated data was first demonstrated in the 2022 Galactica paper (Figure 6 and Section 5.1), at 2e23 FLOPs but without a scaling law analysis that compares with unique data or checks what happens for different numbers of repeats that add up to the same number of tokens-with-repetition. The May 2023 paper does systematic experiments with up to 1e22 FLOPs datapoints (Figure 4).
So that's what I called "tiny experiments". When I say that it wasn't demonstrated at scale, I mean 1e25+ FLOPs, which is true for essentially all research literature[1]. Anchoring to this kind of scale (and being properly suspicious of results several orders of magnitude lower) is relevant because we are discussing the fate of 4e27 FLOPs runs.
The largest datapoints in measuring the Chinchilla scaling laws for Llama 3 are 1e22 FLOPs. This is then courageously used to choose the optimal model size for the 4e25 FLOPs run that uses 4,000 times more compute than the largest of the experiments. ↩︎
Still consistent with great concern. I'm pointing out that O O's point isn't locally valid: observing concern shouldn't translate into observing a belief that alignment is impossible.