All of jaan's Comments + Replies

jaan32

correct! i’ve tried to use this symmetry argument (“how do you know you’re not the clone?”) over the years to explain the multiverse: https://youtu.be/29AgSo6KOtI?t=869

jaan40

interesting! still, aestivation seems to easily trump the black hole heat dumping, no?

Wei Dai124

From Bennett et al.'s reply to the aestivation paper:

Thus we come to our first conclusion: a civilization can freely erase bits without forgoing larger future rewards up until the point when all accessible bounded resources are jointly thermalized.

They don't mention black holes specifically, but my interpretation of this is that a civilization can first dump waste heat into a large black hole, and then later when the CMB temperature drops below that of the black hole, reverse course to use Hawking radiation of the black hole as energy source and CMB as ... (read more)
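A back-of-the-envelope sketch of the comparison being described here (standard Hawking-temperature and Landauer formulas; the framing is illustrative and not taken from the quoted reply):

```latex
% Hawking temperature of a black hole of mass M:
T_H = \frac{\hbar c^{3}}{8\pi G M k_B}
    \approx 6\times10^{-8}\,\mathrm{K}\;\frac{M_\odot}{M}
% Landauer bound: erasing one bit at ambient temperature T costs at least
E_{\min} = k_B T \ln 2
% "Reverse course" condition: once cosmic expansion cools the CMB below the
% hole's temperature,
T_{\mathrm{CMB}}(t) < T_H(M)
% the hole becomes the hotter reservoir, so its Hawking radiation can power
% computation (and cheap bit erasure) against the colder CMB sky.
```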

jaan126

dyson spheres are for newbs; real men (and ASIs, i strongly suspect) starlift.

8Wei Dai
Yes, advanced civilizations should convert stellar matter 100% into energy using something like the Hawking radiation of small black holes, then dump waste heat into large black holes.
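A rough sketch of why "small for conversion, large for heat dumping" follows from how Hawking emission scales with mass (standard formulas, rough numbers, purely illustrative):

```latex
% Hawking power and evaporation time scale steeply with mass:
P \propto \frac{1}{M^{2}}, \qquad
t_{\mathrm{evap}} \approx 5120\,\pi\,\frac{G^{2}M^{3}}{\hbar c^{4}}
                  \sim 10^{67}\,\mathrm{yr}\,\left(\frac{M}{M_\odot}\right)^{3}
% A hole of order 10^{11} kg radiates its whole mass-energy away within
% billions of years (a fast matter-to-energy converter), while a stellar-mass
% or larger hole stays cold (T_H \propto 1/M) and serves as a waste-heat sink.
```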
jaan2312

thank you for continuing to stretch the overton window! note that, luckily, the “off-switch” is now inside the window (though just barely so, and i hear that big tech is actively - and very myopically - lobbying against on-chip governance). i just got back from a UN AIAB meeting and our interim report does include the sentence “Develop and collectively maintain an emergency response capacity, off-switches and other stabilization measures” (while the rest of the report assumes that AI will not be a big deal any time soon).

jaan40

thanks! basically, i think that the top priority should be to (quickly!) slow down the extinction race. if that’s successful, we’ll have time for more deliberate interventions — and the one you propose sounds confidently net positive to me! (with sign uncertainties being so common, confident net positive interventions are surprisingly rare).

jaan30

i might be confused about this but “witnessing a super-early universe” seems to support “a typical universe moment is not generating observer moments for your reference class”. but, yeah, anthropics is very confusing, so i’m not confident in this.

owencb100

OK hmm I think I understand what you mean.

I would have thought about it like this:

  • "our reference class" includes roughly the observations we make before observing that we're very early in the universe
    • This includes stuff like being a pre-singularity civilization
  • The anthropics here suggest there won't be lots of civs later arising and being in our reference class and then finding that they're much later in universe histories
  • It doesn't speak to the existence or otherwise of future human-observer moments in a post-singularity civilization

... but as you say anthropics is confusing, so I might be getting this wrong.

2plex
By my models of anthropics, I think this goes through.
jaan127

the three most convincing arguments i know for OP’s thesis are:

  1. atoms on earth are “close by” and thus much more valuable to a fast-running ASI than the atoms elsewhere.

  2. (somewhat contrary to the previous argument), an ASI will be interested in quickly reaching the edge of the hubble volume, as that’s slipping behind the cosmic horizon — so it will starlift the sun for its initial energy budget.

  3. robin hanson’s “grabby aliens” argument: witnessing a super-young universe (as we do) is strong evidence against it remaining compatible with biological life for l

... (read more)
4ryan_greenblatt
I've thought a bit about actions to reduce the probability that AI takeover involves violent conflict. I don't think there are any amazing-looking options. If governments were generally more competent, that would help. Having some sort of apparatus for negotiating with rogue AIs could also help, but I expect this is politically infeasible and not that leveraged to advocate for on the margin.
2Mitchell_Porter
In preparation for what?
6owencb
I think point 2 is plausible but doesn't super support the idea that it would eliminate the biosphere; if it cared a little, it could be fairly cheap to take some actions to preserve at least a version of it (including humans), even if starlifting the sun. Point 1 is the argument which I most see as supporting the thesis that misaligned AI would eliminate humanity and the biosphere. And then I'm not sure how robust it is (it seems premised partly on translating our evolved intuitions about discount rates over to imagining the scenario from the perspective of the AI system).
2owencb
Wait, how does the grabby aliens argument support this? I understand that it points to "the universe will be carved up between expansive spacefaring civilizations" (without reference to whether those are biological or not), and also to "the universe will cease to be a place where new biological civilizations can emerge" (without reference to what will happen to existing civilizations). But am I missing an inferential step?
jaan135

i would love to see competing RSPs (or, better yet, RTDPs, as @Joe_Collman pointed out in a cousin comment).

jaanΩ81812

Sure, but I guess I would say that we're back to nebulous territory then—how much longer than six months? When, if ever, does the pause end?

i agree that, if hashed out, the end criteria may very well resemble RSPs. still, i would strongly advocate for a scaling moratorium until widely (internationally) acceptable RSPs are put in place.

I'd be very surprised if there was substantial x-risk from the next model generation.

i share the intuition that the current and next LLM generations are unlikely to be an xrisk. however, i don't trust my (or anyone else's) intuitions stron... (read more)

jaanΩ184226

the FLI letter asked for “pause for at least 6 months the training of AI systems more powerful than GPT-4” and i’m very much willing to defend that!

my own worry with RSPs is that they bake in (and legitimise) the assumptions that a) near term (eval-less) scaling poses trivial xrisk, and b) there is a substantial period during which models trigger evals but are existentially safe. you must have thought about them, so i’m curious what you think.

that said, thank you for the post, it’s a very valuable discussion to have! upvoted.

4evhub
Sure, but I guess I would say that we're back to nebulous territory then—how much longer than six months? When, if ever, does the pause end? I agree that this is mostly baked in, but I think I'm pretty happy to accept it. I'd be very surprised if there was substantial x-risk from the next model generation. But also I would argue that, if the next generation of models do pose an x-risk, we've mostly already lost—we just don't yet have anything close to the sort of regulatory regime we'd need to deal with that in place. So instead I would argue that we should be planning a bit further ahead than that, and trying to get something actually workable in place further out—which should also be easier to do because of the dynamic where organizations are more willing to sacrifice potential future value than current realized value. Yeah, I agree that this is tricky. Theoretically, since we can set the eval bar at any capability level, there should exist capability levels that you can eval for and that are safe but scaling beyond them is not. The problem, of course, is whether we can effectively identify the right capability levels to evaluate in advance. The fact that different capabilities are highly correlated with each other makes this easier in some ways—lots of different early warning signs will all be correlated—but harder in other ways—the dangerous capabilities will also be correlated, so they could all come at you at once. Probably the most important intervention here is to keep applying your evals while you're training your next model generation, so they trigger as soon as possible. As long as there's some continuity in capabilities, that should get you pretty far. Another thing you can do is put strict limits on how much labs are allowed to scale their next model generation relative to the models that have been definitively evaluated to be safe. And furthermore, my sense is that at least in the current scaling paradigm, the capabilities of the next model generation
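A minimal sketch of the "keep applying your evals while you're training the next model generation" idea described above; every name below is hypothetical, and this is not any lab's actual pipeline:

```python
def train_with_continuous_evals(model, train_step, eval_suite,
                                total_steps, eval_every=1_000):
    """Run dangerous-capability evals at fixed checkpoint intervals during
    training and halt the run as soon as any threshold triggers.

    eval_suite: dict mapping eval name -> callable(model) -> bool,
    where True means the capability threshold was crossed.
    (Illustrative sketch only; none of these are real APIs.)
    """
    for step in range(1, total_steps + 1):
        train_step(model)
        if step % eval_every == 0:
            triggered = [name for name, check in eval_suite.items() if check(model)]
            if triggered:
                # Stop scaling immediately and hand off to whatever response
                # process the RSP (or moratorium regime) prescribes.
                return {"status": "paused", "step": step, "triggered": triggered}
    return {"status": "completed", "step": total_steps, "triggered": []}
```

The same skeleton also accommodates the other suggestion in the comment: capping total_steps (or compute) relative to the last generation that was definitively evaluated as safe.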
jaan42

the werewolf vs villager strategy heuristic is brilliant. thank you!

2jimrandomh
Credit to Benquo's writing for giving me the idea.
jaan30

if i understand it correctly (i may not!), scott aaronson argues that hidden variable theories (such as bohmian / pilot wave) imply hypercomputation (which should count as evidence against them): https://www.scottaaronson.com/papers/npcomplete.pdf

6Mitchell_Porter
If hypercomputation is defined as computing the uncomputable, then that's not his idea. It's just a quantum speedup better than the usual quantum speedup (defining a quantum complexity class DQP that is a little bigger than BQP). Also, Scott's Bohmian speedup requires access to what the hidden variables were doing at arbitrary times. But in Bohmian mechanics, measuring an observable perturbs complementary observables (i.e. observables that are in some kind of "uncertainty relation" to the first) in exactly the same way as in ordinary quantum mechanics.  There is a way (in both Bohmian mechanics and standard quantum mechanics) to get at this kind of trajectory information, without overly perturbing the system evolution - "weak measurements". But weak measurements only provide weak information about the measured observable - that's the price of not violating the uncertainty principle. A weak measuring device is correlated with the physical property it is measuring, but only weakly.  I mention this because someone ought to see how it affects Scott's Bohmian speedup, if you get the history information using weak measurements. (Also because weak measurements may have an obscure yet fundamental relationship to Bohmian mechanics.) Is the resulting complexity class DQP, BQP, P, something else? I do not know. 
jaan142

interesting, i’ve had bewelltuned.com in my reading queue for a few years now -- i take your comment as an upvote!

myself i swear by FDT (somewhat abstract, sure, but seems to work well) and freestyle dancing (the opposite of abstract, but also seems to work well). also coding (eg, just spent several days using pandas to combine and clean up my philanthropy data) -- code grounds one in reality.
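For what it's worth, the combine-and-clean pass mentioned above typically looks something like this in pandas (the file and column names here are made up, not the actual data):

```python
import pandas as pd

# Hypothetical input files / columns, purely for illustration.
grants_2021 = pd.read_csv("grants_2021.csv")
grants_2022 = pd.read_csv("grants_2022.csv")

combined = pd.concat([grants_2021, grants_2022], ignore_index=True)

# Typical cleanup: normalize text fields, parse dates, drop exact duplicates.
combined["organization"] = combined["organization"].str.strip().str.lower()
combined["date"] = pd.to_datetime(combined["date"], errors="coerce")
combined = combined.drop_duplicates(subset=["organization", "date", "amount"])

combined.to_csv("grants_combined_clean.csv", index=False)
```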

jaan113

having seen the “kitchen side” of the letter effort, i endorse almost all of zvi’s points here. one thing i’d add is that one of my hopes in urging the letter along was to create common knowledge that a lot of people (we’re going to get to 100k signatures it looks like) are afraid of the thing that comes after GPT4. like i am.

thanks, everyone, who signed.

EDIT: basically this: https://twitter.com/andreas212nyc/status/1641795173972672512

jaan249

while it’s easy to agree with some abstract version of “upgrade” (as in try to channel AI capability gains into our ability to align them), the main bottleneck to physical upgrading is the speed difference between silicon and wet carbon: https://www.lesswrong.com/posts/Ccsx339LE9Jhoii9K/slow-motion-videos-as-ai-risk-intuition-pumps
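To make the speed gap concrete, here is the arithmetic for an illustrative 10^6x subjective speedup (the exact factor is my assumption for the example, not a figure from the linked post):

```latex
% One human second, seen from the fast side:
1\ \mathrm{s} \times 10^{6} = 10^{6}\ \text{subjective seconds}
                            \approx 11.6\ \text{subjective days}
% One hour-long human meeting:
3600\ \mathrm{s} \times 10^{6} \approx 3.6\times10^{9}\ \mathrm{s}
                               \approx 114\ \text{subjective years}
```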

9Jed McCaleb
Yeah to be clear I don't think "upgrading" is easy. It might not even be possible in a way that makes it relevant. But I do think it offers some hope in an otherwise pretty bleak landscape.
jaan32

yup, i tried invoking church-turing once, too. worked about as well as you’d expect :)

jaan51

looks great, thanks for doing this!

one question i get every once in a while and wish i had a canonical answer to is (probably can be worded more pithily):

"humans have always thought their minds are equivalent to whatever's their latest technological achievement -- eg, see the steam engines. computers are just the latest fad that we currently compare our minds to, so it's silly to think they somehow pose a threat. move on, nothing to see here."

note that the canonical answer has to work for people whose ontology does not include the concepts of "computation"... (read more)

2Richard_Kennaway
Most of the threat comes from the space of possible super-capable minds that are not human. (This does not mean that human-like AIs would be less dangerous, only that they are a small part of the space of possibilities.)
2Ben Livengood
Agents are the real problem. Intelligent goal-directed adversarial behavior is something almost everyone understands whether it is other humans or ants or crop-destroying pests. We're close to being able to create new, faster, more intelligent agents out of computers.
2Lone Pine
I think the technical answer comes down to the Church-Turing thesis and the computability of the physical universe, but obviously that's not a great answer for the compscidegreeless among us.
jaan42

the potentially enormous speed difference (https://www.lesswrong.com/posts/Ccsx339LE9Jhoii9K/slow-motion-videos-as-ai-risk-intuition-pumps) will almost certainly be an effective communications barrier between humans and AI. there’s a wonderful scene of AIs vs humans negotiation in william hertling’s “A.I. apocalypse” that highlights this.

jaan223

i agree that there's the 3rd alternative future that the post does not consider (unless i missed it!):

3. markets remain in an inadequate equilibrium until the end of times, because those participants (like myself!) who expect short timelines remain in too small a minority to "call the bluff".

see the big short for a dramatic depiction of such a situation.

great post otherwise. upvoted.

 

4soth02
Coincidentally, that scene in The Big Short takes place on January 11 (2007) :D
jaan*10

yeah, this seems to be the crux: what CEV will prescribe for spending the altruistic (reciprocal cooperation) budget on. my intuition continues to insist that purchasing the original star systems from UFAIs is pretty high on the shopping list, but i can see arguments (including a few you gave above) against that.

oh, btw, one sad failure mode would be getting clipped by a proto-UFAI that’s too stupid to realise it’s in a multi-agent environment or something.

ETA: and, tbc, just like interstice points out below, my “us/me” label casts a wider net than “us in this particular everett branch where things look particularly bleak”.

jaan73

roger. i think (and my model of you agrees) that this discussion bottoms out in speculating what CEV (or equivalent) would prescribe.

my own intuition (as somewhat supported by the moral progress/moral circle expansion in our culture) is that it will have a nonzero component of “try to help out the fellow humans/biologicals/evolved minds/conscious minds/agents with diminishing utility function if not too expensive, and especially if they would do the same in your position”.

So8res107

tbc, i also suspect & hope that our moral circle will expand to include all fellow sentients. (but it doesn't follow from that that paying paperclippers to unkill their creators is a good use of limited resources. for instance, those are resources that could perhaps be more efficiently spent purchasing and instantiating the stored mindstates of killed aliens that the surviving-branch humans meet at the edge of their own expansion.)

but also, yeah, i agree it's all guesswork. we have friends out there in the multiverse who will be willing to give us some... (read more)

jaan266

yeah, as far as i can currently tell (and influence), we’re totally going to use a sizeable fraction of FAI-worlds to help out the less fortunate ones. or perhaps implement a more general strategy, like a mutual insurance pact of evolved minds (MIPEM).

this, indeed, assumes that human CEV has diminishing returns to resources, but (unlike nate in the sibling comment!) i’d be shocked if that wasn’t true.

So8res115

one thing that makes this tricky is that, even if you think there's a 20% chance we make it, that's not the same as thinking that 20% of Everett branches starting in this position make it. my guess is that whether we win or lose from the current board position is grossly overdetermined, and what we're fighting for (and uncertain about) is which way it's overdetermined. (like how we probably have more than one in a billion odds that the light speed limit can be broken, but that doesn't mean that we think that one in every billion photons breaks the limit.) ... (read more)

jaan70

sure, this is always a consideration. i'd even claim that the "wait.. what about the negative side effects?" question is a potential expected value spoiler for pretty much all longtermist interventions (because they often aim for effects that are multiple causal steps down the road), and as such not really specific to software.

jaan80

great idea! since my metamed days i’ve been wishing there was a prediction market for personal medical outcomes — it feels like the manifold mechanism might be a good fit for this (eg, at the extreme end, consider the “will this be my last market if i undertake the surgery X at Y?” question). should you decide to develop such an aspect at some point, i’d be very interested in supporting/subsidising.

3Austin Chen
Yes, that's absolutely the kind of prediction market we'd love to enable at Manifold! I'd love to chat more about specifically the personal medical use case, and we'd already been considering applying to SFF -- let's get in touch (I'm akrolsmir@gmail.com).
jaan50

actually, the premise of david brin’s existence is a close match to moravec’s paragraph (not a coincidence, i bet, given that david hung around similar circles).

jaan250

confirmed. as far as i can tell (i’ve talked to him for about 2h in total) yi really seems to care, and i’m really impressed by his ability to influence such official documents.

jaan150

indeed, i even gave a talk almost a decade ago about the evolution:humans :: humans:AGI symmetry (see below)!

what confuses me though is that the "is general reasoner" and "can support cultural evolution" properties seemed to emerge pretty much simultaneously in humans -- a coincidence that requires its own explanation (or dissolution). furthermore, eliezer seems to think that the former property is much more important / discontinuity-causing than the latter. and, indeed, outsized progress being made by individual human reasoners (scientists/inventors/etc.) see... (read more)

1Gram Stone
If information is 'transmitted' by modified environments and conspecifics biasing individual search, marginal fitness returns on individual learning ability increase, while from the outside it looks just like 'cultural evolution.'
4Vanessa Kosoy
I think that these properties encourage each other's evolution. When you're a more general reasoner, you have a bigger hypothesis space, specifying a hypothesis requires more information, so you also benefit more from transmitting information. Conversely, once you can transmit information, general reasoning becomes much more useful since you effectively have access to much bigger datasets.
9Vaniver
David Deutsch (in The Beginning of Infinity) argues, as I recall, that they're basically the same faculty. In order to copy someone else / "carry on a tradition", you need to model what they're doing (so that you can copy it), and similarly for originators to tell whether students are correctly carrying on the tradition. The main thing that's interesting about his explanation is how he explains the development of general reasoning capacity, which we now think of as a tradition-breaking faculty, in the midst of tradition-promoting selection. If you buy that story, it ends up being another example of treacherous turn from human history (where individual thinkers, operating faster than cultural evolution, started pursuing their own values).
jaan230

amazing post! scaling up the community of independent alignment researchers sounds like one of the most robust ways to convert money into relevant insights.

jaan30

indeed they are now. retrocausality in action? :)

1AnthonyC
Obligatory: https://xkcd.com/2480/
jaan70

well, i've always considered human life extension as less important than "civilisation's life extension" (ie, xrisk reduction). still, they're both very important causes, and i'm happy to support both, especially given that they don't compete much for talent. as for the LRI specifically, i believe they simply haven't applied to more recent SFF grant rounds.