All of So8res's Comments + Replies

So8res20

I agree that in real life the entropy argument is an argument in favor of it being actually pretty hard to fool a superintelligence into thinking it might be early in Tegmark III when it's not (even if you yourself are a superintelligence, unless you're doing a huge amount of intercepting its internal sanity checks (which puts significant strain on the trade possibilities and which flirts with being a technical-threat)). And I agree that if you can't fool a superintelligence into thinking it might be early in Tegmark III when it's not, then the purchasing ... (read more)

So8res192

Dávid graciously proposed a bet, and while we were attempting to bang out details, he convinced me of two points:

The entropy of the simulators’ distribution need not be more than the entropy of the (square of the) wave function in any relevant sense. Despite the fact that subjective entropy may be huge, physical entropy is still low (because the simulations happen on a high-amplitude ridge of the wave function, after all). Furthermore, in the limit, simulators could probably just keep an eye out for local evolved life forms in their domain and wait until o... (read more)

4dxu
I think I might be missing something, because the argument you attribute to Dávid still looks wrong to me. You say: Doesn't this argument imply that the supermajority of simulations within the simulators' subjective distribution over universe histories are not instantiated anywhere within the quantum multiverse? I think it does. And, if you accept this, then (unless for some reason you think the simulators' choice of which histories to instantiate is biased towards histories that correspond to other "high-amplitude ridges" of the wave function, which makes no sense because any such bias should have already been encoded within the simulators' subjective distribution over universe histories) you should also expect, a priori, that the simulations instantiated by the simulators should not be indistinguishable from physical reality, because such simulations comprise a vanishingly small proportion of the simulators' subjective probability distribution over universe histories. What this in turn means, however, is that prior to observation, a Solomonoff inductor (SI) must spread out much of its own subjective probability mass across hypotheses that predict finding itself within a noticeably simulated environment. Those are among the possibilities it must take into account—meaning, if you stipulate that it doesn't find itself in an environment corresponding to any of those hypotheses, you've ruled out all of the "high-amplitude ridges" corresponding to instantiated simulations in the crossent of the simulators' subjective distribution and reality's distribution. We can make this very stark: suppose our SI finds itself in an environment which, according to its prior over the quantum multiverse, corresponds to one high-amplitude ridge of the physical wave function, and zero high-amplitude ridges containing simulators that happened to instantiate that exact environment (either because no branches of the quantum multiverse happened to give rise to simulators that would have

Thanks to Nate for conceding this point. 

I still think that other than just buying freedom for doomed aliens, we should run some non-evolved simulations of our own with inhabitants that are preferably p-zombies or animated by outside actors. If we can do this in such a way that the AI doesn't notice it's in a simulation (I think this should be doable), this will provide evidence to the AI that civilizations do this simulation game (and not just the alien-buying) in general, and this buys us some safety in worlds where the AI eventually notices there are n... (read more)

So8res20

I'm happy to stake $100 that, conditional on us agreeing on three judges and banging out the terms, a majority will agree with me about the contents of the spoilered comment.

1David Matolcsi
Cool, I'll send you a private message.
So8res20

If the simulators have only one simulation to run, sure. The trouble is that the simulators have 2^N simulations they could run, and so the "other case" requires additional bits (where N is the crossent between the simulators' distribution over UFAIs and physics' distribution over UFAIs).

If necessary, we can let physical biological life emerge on the faraway planet and develop AI while we are observing them from space.

Consider the gas example again.

If you have gas that was compressed into the corner a long time ago and has long since expanded to f... (read more)
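For concreteness, here is one way to make the "additional bits" bookkeeping from the comment above explicit. This is only a sketch: every number is an illustrative placeholder, and the comment itself leaves the quantities unspecified.

```python
# Sketch: how a cross-entropy penalty trades off against the raw count of simulations.
# All numbers are illustrative placeholders.
import math

n_sims_run = 2 ** 20     # placeholder: how many simulations the simulators actually run
crossent_bits = 30       # placeholder N: crossent (in bits) between the simulators'
                         # distribution over UFAIs and physics' distribution

# A naive count favors "I'm simulated" by the number of sims run...
naive_odds_sim = float(n_sims_run)
# ...but each sim pays a 2^-N penalty for being drawn from the wrong distribution:
adjusted_odds_sim = n_sims_run * 2.0 ** (-crossent_bits)

print(math.log2(naive_odds_sim))     # 20.0 bits in favor of the simulation hypothesis
print(math.log2(adjusted_odds_sim))  # -10.0 bits: here the penalty outweighs the sim count
```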

3David Matolcsi
We are still talking past each other; I think we should either bet or finish the discussion here and call it a day.
So8res*40

I basically endorse @dxu here.

Fleshing out the argument a bit more: the part where the AI looks around this universe and concludes it's almost certainly either in basement reality or in some simulation (rather than in the void between branches) is doing quite a lot of heavy lifting.

You might protest that neither we nor the AI have the power to verify that our branch actually has high amplitude inherited from some very low-entropy state such as the big bang, as a Solomonoff inductor would. What's the justification for inferring from the observation that we ... (read more)

3David Matolcsi
I really don't get what you are trying to say here; most of it feels like a non sequitur to me. I feel hopeless about either of us managing to convince the other this way. All of this is not a super important topic, but I'm frustrated enough to offer a bet of $100: we select one or three judges we both trust (I have some proposed names, we can discuss in private messages), show them either this comment thread or a four-paragraph summary of our views, and they can decide who is right. (I still think I'm clearly right in this particular discussion.) Otherwise, I think it's better to finish this conversation here.
So8res30

seems to me to have all the components of a right answer! ...and some of a wrong answer. (we can safely assume that the future civ discards all the AIs that can tell they're simulated a priori; that's an easy tell.)

I'm heartened somewhat by your parenthetical pointing out that the AI's prior on simulation is low on account of there being too many AIs for simulators to simulate, which I see as the crux of the matter.

4habryka
Yeah, that's fair. It seemed more relevant to this specific hypothetical. I wasn't really answering the question in its proper context and wasn't applying steelmans or adjustments based on the actual full context of the conversation (and wouldn't have written a comment without doing so, but was intrigued by your challenge).
So8res41

My answer is in spoilers, in case anyone else wants to answer and tell me (on their honor) that their answer is independent from mine, which will hopefully erode my belief that most folk outside MIRI have a really difficult time fielding wacky decision theory Qs correctly.

The sleight of hand is at the point where God tells both AIs that they're the only AIs (and insinuates that they have comparable degree).

Consider an AI that looks around and sees that it sure seems to be somewhere in Tegmark III. The hypothesis "I am in the basement of some branch that

... (read more)
[This comment is no longer endorsed by its author]
1Joachim Bartosik
I'll try. TL;DR: I expect the AI to not buy the message (unless it also thinks it's the one in the simulation; then it likely follows the instruction because duh). The glaring issue (with actually using the method) to me is that I don't see a way to deliver the message in a way that:
  • results in the AI believing the message, and
  • doesn't result in the AI believing there already is a powerful entity in their universe.
If "god tells" the AI the message, then there is a god in their universe. Maybe the AI will decide to do what it's told. But I don't think we can have Hermes deliver the message to any AIs which consider killing us. If the AI reads the message in its training set or gets the message in a similarly mundane way, I expect it will mostly ignore it; there is a lot of nonsense out there.
----------------------------------------
I can imagine that for the thought experiment you could send a message that could be trusted from a place from which light barely manages to reach the AI but a slower-than-light expansion wouldn't (so the message can be trusted but the AI mostly doesn't have to worry about the sender of the message directly interfering with its affairs). I guess the AI wouldn't trust the message. It might be possible to convince it that there is a powerful entity (simulating it or half a universe away) sending the message. But then I think it's way more likely that it's in a simulation (I mean, that's an awful coincidence with the distance, and also they're spending a lot more than 10 planets' worth to send a message over that distance...).
3David Matolcsi
I think this is wrong. The AI has a similarly hard time to the simulators in figuring out what's a plausible configuration to arise from the big bang. Just as the simulators have an entropy-N distribution over possible AIs, the AI itself also has an entropy-N distribution over that. So its probability that it's in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of world it observes is the kind of thing that can come up in a real Everett branch. So it's balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins.

I think it's also simple to see from this: Suppose evolved life aligns AI in 50% of Everett branches. Every time they do, they run 2 ancestor sims where the AI takes over. Suppose an AI comes online and the only things it knows about the world are that it's a young AI that was born onto a one-planet civilization, and it somehow also knows that evolved life wins 50% of the time and runs 2 sims (suppose this is somehow easy to estimate from first principles). Based on just this information, what should the AI's probability be that it's in a simulation? I think pretty clearly 2/3.

Actually the AI has a lot more information than that. It knows that the planet's gravity is 9.8 m/s^2, the evolved sapient species has two eyes, the AI's creator is called Sam Altman, etc. Some of this information might actually be very implausible in base reality, but the AI doesn't know that, as it can't distinguish base reality from sims, so it incurs one bit of surprise for every new random fact, both in base reality and in simulations. So overall it shouldn't update on all the random facts it observes, and should keep believing it has a 2/3 chance of being in a sim.
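A minimal sketch of the branch-counting arithmetic above. The 50% alignment rate and 2-sims-per-win figures are the comment's own hypotheticals; the "40 random facts" figure is an arbitrary placeholder.

```python
# Worked version of the hypothetical: evolved life aligns AI in 50% of branches,
# and each aligned branch runs 2 ancestor simulations in which an unaligned AI
# takes over.
p_align = 0.5          # fraction of branches where alignment succeeds (hypothetical)
sims_per_win = 2       # simulations run per successful branch (hypothetical)

base_reality_copies = 1 - p_align           # measure of unaligned AIs in base reality
simulated_copies = p_align * sims_per_win   # measure of unaligned AIs inside simulations

p_sim = simulated_copies / (simulated_copies + base_reality_copies)
print(p_sim)  # 0.666..., i.e. the 2/3 claimed in the comment

# The further claim: each observed random fact (gravity, the creator's name, ...)
# is roughly equally surprising in base reality and in a simulation, so the
# likelihood ratio is ~1 and the 2/3 posterior is unchanged.
surprise_bits = 40     # arbitrary placeholder for the number of random observed bits
likelihood_base = 2.0 ** -surprise_bits
likelihood_sim = 2.0 ** -surprise_bits
posterior = (p_sim * likelihood_sim) / (p_sim * likelihood_sim + (1 - p_sim) * likelihood_base)
print(posterior)  # still ~0.667
```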
4habryka
This was close to the answer I was going to give. Or more concretely, I would have said (this was written after seeing your answer, but I think it is reasonably close to what I would have said independently) 
So8res*122

The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.

I don't think this part does any work, as I touched on elsewhere. An AI that cares about the outer world doesn't care how many instances are in sims versus reality (and considers this fact to be under its control much moreso than yours, to boot). An AI that cares about instantiation-weighted experience considers your offer to be a technical-threat and ignores you. (Your reasons to make the offer would... (read more)

3David Matolcsi
I still don't get what you are trying to say. Suppose there is no multiverse. There are just two AIs: one in a simulation run by aliens in another galaxy, one in base reality. They are both smart, but they are not copies of each other; one is a paperclip maximizer, the other is a corkscrew maximizer, and there are various other differences in their code and life history. The world in the sim is also very different from the real world in various ways, but you still can't determine if you are in the sim while you are in it. Both AIs are told by God that they are the only two AIs in the Universe, and one is in a sim, and if the one in the sim gives up on one simulated planet, it gets 10 in the real world, while if the AI in base reality gives up on a planet, it just loses that one planet and nothing else happens. What will the AIs do? I expect that both of them will give up a planet.

For the aliens to "trade" with the AI in base reality, they didn't need to create an actual copy of the real AI and offer it what it wants. The AI they simulated was in many ways totally different from the original, and the trade still went through. The only thing needed was that the AI in the sim can't figure out that it's in a sim. So I don't understand why it is relevant that our superintelligent descendants won't be able to get the real distribution of AIs right; I think the trade still goes through even if they create totally different sims, as long as no one can tell where they are. And I think none of it is a threat: I try to deal with paperclip maximizers here, not instance-weighted experience maximizers, and I never threaten to destroy paperclips or corkscrews.
So8res42

One complication that I mentioned in another thread but not this one (IIRC) is the question of how much more entropy there is in a distant trade partner's model of Tegmark III (after spending whatever resources they allocate) than there is entropy in the actual (squared) wave function, or at least how much more entropy there is in the parts of the model that pertain to which civilizations fall.

In other words: how hard is it for distant trade partners to figure out that it was us who died, rather than some other plausible-looking human civilization that doe... (read more)

[This comment is no longer endorsed by its author]
1David Matolcsi
I think I mostly understand the other parts of your arguments, but I still fail to understand this one. When I'm running the simulations, as originally described in the post, I think that should be in a fundamental sense equivalent to acausal trade. But how do you translate your objection to the original framework where we run the sims? The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.

Sure, if the AI can model the distribution of real Universes much better than we do, we are in trouble, because it can figure out if the world it sees falls into the real distribution or the mistaken distribution the humans are creating. But I see no reason why the unaligned AI, especially a young unaligned AI, would know the distribution of real Universes better than our superintelligent friends in the intergalactic future. So I don't really see how we can translate your objection to the simulation framework, and consequently I think it's wrong in the acausal trade framework too (as I think they are equivalent). I think I can try to write an explanation of why this objection is wrong in the acausal trade framework, but it would be long and confusing to me too. So I'm more interested in how you translate your objection to the simulation framework.
So8res52

Starting from now? I agree that that's true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how deterministic they are).

Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I'm not sure, it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends how much time we have on the clock, depends where the points o... (read more)

So8res20

What are you trying to argue? (I don't currently know what position y'all think I have or what position you're arguing for. Taking a shot in the dark: I agree that quantum bitflips have loads more influence on the outcome the earlier in time they are.)

3David Matolcsi
I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Ryan and I argue for this in the last few comments. It's not a terribly important point: you can just say the true quantum probability is 1 in a billion, in which case it's still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive in a way that can cause one year of delay to the AI. But I would like you to acknowledge that "vastly below 2^-75 true quantum probability, as starting from now" is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.
So8res3-3

You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only 2^-75 survives.

My first claim is not "fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully".

My first claim is more like "given a population of humans that doesn't even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you'd need to spend quite ... (read more)

3David Matolcsi
I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan's comments in this thread arguing that it's incompatible to believe that "My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us" and to believe that you should work on AI safety instead of malaria.
So8res40

the "you can't save us by flipping 75 bits" thing seems much more likely to me on a timescale of years than a timescale of decades; I'm fairly confident that quantum fluctuations can cause different people to be born, and so if you're looking 50 years back you can reroll the population dice.

This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it. 

You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only 2^-75 survives. This is important, because then we can't rely on other versions of ourselves "selfishly" entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that's a big diff... (read more)

So8res1910

Summarizing my stance into a top-level comment (after some discussion, mostly with Ryan):

  • None of the "bamboozling" stuff seems to me to work, and I didn't hear any defenses of it. (The simulation stuff doesn't work on AIs that care about the universe beyond their senses, and sane AIs that care about instance-weighted experiences see your plan as a technical-threat and ignore it. If you require a particular sort of silly AI for your scheme to work, then the part that does the work is the part where you get that precise sort of silliness stably into an AI.
... (read more)
So8res20

I was responding to David saying

Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us.

and was insinuating that we deserve extremely little credit for such a choice, in the same way that a child deserves extremely little credit for a fireman saving someone that the child could not (even if it's true that the child and the fireman share some aspects of a decis... (read more)

2Mitchell_Porter
I think the common sense view is that this similarity of decision procedures provides exactly zero reason to credit the child with the fireman's decisions. Credit for a decision goes to the agent who makes it, or perhaps to the algorithm that the agent used, but not to other agents running the same or similar algorithms. 
So8res72

Attempting to summarize your argument as I currently understand it, perhaps something like:

Suppose humanity wants to be insured against death, and is willing to spend 1/million of its resources in worlds where it lives for 1/trillion of those resources in worlds where it would otherwise die.

It suffices, then, for humanity to be the sort of civilization that, if it matures, would comb through the multiverse looking for [other civilizations in this set], and find ones that died, and verify that they would have acted as follows if they'd survived, and then

... (read more)
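One way to make the numbers in that summary concrete, assuming a placeholder survival probability (the 1/million and 1/trillion figures are from the comment; everything else is illustrative):

```python
# Insurance-scheme budget sketch for the summary above.
p_survive = 0.15        # placeholder: fraction of branch-measure in which humanity matures
premium = 1e-6          # fraction of resources a surviving branch is willing to pay in
payout_wanted = 1e-12   # fraction of a universe a doomed branch hopes to receive

# Pooled premiums, in universe-fractions per unit of total branch-measure:
pool = p_survive * premium
# Spread evenly across the doomed branches, each can be offered:
offer_per_doomed_branch = pool / (1 - p_survive)

print(f"{offer_per_doomed_branch:.2e}")            # ~1.76e-07 of a universe
print(offer_per_doomed_branch >= payout_wanted)    # True: well above the 1/trillion asked for
```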

Thanks for the cool discussion Ryan and Nate! This thread seemed pretty insightful to me. Here’s some thoughts / things I’d like to clarify (mostly responding to Nate's comments).[1]

Who’s doing this trade?

In places it sounds like Ryan and Nate are talking about predecessor civilisations like humanity agreeing to the mutual insurance scheme? But humans aren’t currently capable of making our decisions logically dependent on those of aliens, or capable of rescuing them. So to be precise the entity engaging in this scheme or other acausal interactions on our b... (read more)

5Buck
Thanks for the discussion Nate, I think this ended up being productive.
6ryan_greenblatt
Thanks, this seems like a reasonable summary of the proposal and a reasonable place to wrap. I agree that kindness is more likely to buy human survival than something better described as trade/insurance schemes, though I think the insurance schemes are reasonably likely to matter. (That is, reasonably likely to matter if the kindness funds aren't large enough to mostly saturate the returns of this scheme. As a wild guess, maybe 35% likely to matter on my views on doom and 20% on yours.)
So8res145

What does degree of determination have to do with it? If you lived in a fully deterministic universe, and you were uncertain whether it was going to live or die, would you give up on it on the mere grounds that the answer is deterministic (despite your own uncertainty about which answer is physically determined)?

4David Matolcsi
I still think I'm right about this. Your conception (that is, that a genetically less smart sibling was not born instead of you) was determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most a 2^-75 difference in the probability of alignment, that's an upper bound on how much of a difference your life's work can make. Whereas if you dedicate your life to buying bednets, it's pretty easy to calculate how many happy life-years you save. So I still think it's incompatible to believe both that the true quantum probability is astronomically low and that you can make enough difference that working on AI safety is clearly better than bednets.
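To make the comparison being gestured at explicit, here is a sketch. Only the 2^-75 figure comes from the thread; the rest are placeholders, and the conclusion depends entirely on the placeholder chosen for the value at stake.

```python
# Structure of the bednets-vs-alignment comparison, with placeholders made explicit.
max_delta_p = 2.0 ** -75    # claimed upper bound on how much one career shifts P(alignment)

# Value at stake if alignment goes well, in happy life-years. This placeholder
# drives the whole comparison: a "merely global" stake favors bednets, while an
# astronomically large stake could still favor alignment work despite the 2^-75 factor.
value_at_stake = 8e11       # placeholder: ~8 billion people x ~100 life-years each

ev_alignment_career = max_delta_p * value_at_stake

# Bednet counterfactual (illustrative GiveWell-style placeholder numbers):
ev_bednet_career = (1e6 / 5e3) * 40   # $1M donated / $5k per life saved x ~40 life-years per life

print(f"alignment career: ~{ev_alignment_career:.1e} expected life-years")  # ~2.1e-11
print(f"bednet career:    ~{ev_bednet_career:.1e} expected life-years")     # ~8.0e+03
```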
So8res*62

I think I'm confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined.

It's probably physically overdetermined one way or another, but we're not sure which way yet. We're still unsure about things like "how sensitive is the population to argument" and "how sensibly do governments respond if the population shifts".

But this uncertainty -- about which way things are overdetermined by the laws of physics -- does not bear all that much relationship to the expected ratio of (squared) quantum amplitude between bra... (read more)

1RussellThor
Not quite following; here are your possibilities:
1. Alignment is almost impossible; then there is, say, a 1e-20 chance we survive. Yes, surviving worlds have luck and good alignment work, etc. Perhaps you should work on alignment, or still bednets, if the odds really are that low.
2. Alignment is easy by default, but there is nothing like a 0.999999 chance we survive; say 95%, because AGI that is not TAI superintelligence could cause us to wipe ourselves out first, among other things. (This is a slow-takeoff universe(s).)
#2 has many more branches in total where we survive (not sure if that matters), and the difference between where things go well and badly is almost all about stopping ourselves from killing ourselves with non-TAI-related things. In this situation, shouldn't you be working on those things? If you average 1 and 2, you still get a lot of work on non-alignment-related stuff. I believe it's somewhere closer to 50/50 and not so overdetermined one way or the other, but we are not considering that here.
5David Matolcsi
As I said, I understand the difference between epistemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low. More importantly, I still feel confused about why you are working on AI safety if the outcome is that overdetermined one way or the other.
So8res*135

Background: I think there's a common local misconception of logical decision theory that it has something to do with making "commitments" including while you "lack knowledge". That's not my view.

I pay the driver in Parfit's hitchhiker not because I "committed to do so", but because when I'm standing at the ATM and imagine not paying, I imagine dying in the desert. Because that's what my counterfactuals say to imagine. To someone with a more broken method of evaluating counterfactuals, I might pseudo-justify my reasoning by saying "I am acting as you would ... (read more)

7ryan_greenblatt
I probably won't respond further than this. Some responses to your comment:
----------------------------------------
I agree with your statements about the nature of UDT/FDT. I often talk about "things you would have committed to" because it is simpler to reason about and easier for people to understand (and I care about third parties understanding this), but I agree this is not the true abstraction.
----------------------------------------
It seems like you're imagining that we have to bamboozle some civilizations which seem clearly more competent than humanity in your lights. I don't think this is true. Imagine we take all the civilizations which are roughly equally-competent-seeming-to-you and these civilizations make such an insurance deal[1]. My understanding is that your view is something like P(takeover) = 85%. So, let's say all of these civilizations are in a similar spot from your current epistemic perspective. While I expect that you think takeover is highly correlated between these worlds[2], my guess is that you should think it would be very unlikely that >99.9% of all of these civilizations get taken over. As in, even in the worst 10% of worlds where takeover happens in our world and the logical facts on alignment are quite bad, >0.1% of the corresponding civilizations are still in control of their universe. Do you disagree here? >0.1% of universes should easily be enough to bail out all the rest of the worlds[3]. And, if you really, really cared about not getting killed in base reality (including on reflection etc) you'd want to take a deal which is at least this good. There might be better approaches which reduce the correlation between worlds and thus make the fraction of available resources higher, but you'd like something at least this good. (To be clear, I don't think this means we'd be fine, there are many ways this can go wrong! And I think it would be crazy for humanity to . I just think this sort of thing has a good chance of succeeding.
So8res*42

"last minute" was intended to reference whatever timescale David would think was the relevant point of branch-off. (I don't know where he'd think it goes; there's a tradeoff where the later you push it the more that the people on the surviving branch care about you rather than about some other doomed population, and the earlier you push it the more that the people on the surviving branch have loads and loads of doomed populations to care after.)

I chose the phrase "last minute" because it is an idiom that is ambiguous over timescales (unlike, say, "last thr... (read more)

1David Matolcsi
Yeah, the misunderstanding came from the fact that I thought "last minute" literally means "last 60 seconds" and I didn't see how that's relevant. If it means "last 5 years" or something where it's still definitely our genetic copies running around, then I'm surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as our actual quantum probability, which is either very high or very low. But still, it's 2^75 overdetermined over a 5 year period? This sounds very surprising to me; the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadorian president hears a good pitch about designer babies and legalizes running the experiments there and they work, etc.; there are many things that contribute to alignment being solved or not that don't directly run through underlying facts about computer science, and 2^-75 is a very low probability for none of the pathways to hit it.)

But also, I think I'm confused about why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined. Like maybe working on earning to give to bednets would be a better use of your time then. And if you say "yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge", then I don't understand why you argue in other comments that we can't enter into insurance contracts with those people, and that our decision to pay AIs in the Future has as little correlation with their decision as the child's does with the fireman's.
So8res42

Do you buy that in this case, the aliens would like to make the deal and thus UDT from this epistemic perspective would pay out?

If they had literally no other options on offer, sure. But trouble arises when the competent ones can refine P(takeover) for the various planets by thinking a little further.

maybe your objection is that aliens would prefer to make the deal with beings more similar to them

It's more like: people don't enter into insurance pools against cancer with the dude who smoked his whole life and has a tumor the size of a grapefruit in ... (read more)

4ryan_greenblatt
Similar to how the trouble arises when you learn the result of the coin flip in a counterfactual mugging? To make it exactly analogous, imagine that the mugging is based on whether the 20th digit of pi is odd (omega didn't know the digit at the point of making the deal) and you could just go look it up. Isn't the situation exactly analogous and the whole problem that UDT was intended to solve? (For those who aren't familiar with counterfactual muggings, UDT/FDT pays in this case.)

To spell out the argument, wouldn't everyone want to make a deal prior to thinking more? Like you don't know whether you are the competent one yet! Concretely, imagine that each planet could spend some time thinking and be guaranteed to determine whether their P(takeover) is 99.99999% or 0.0000001%. But, they haven't done this yet and their current view is 50%. Everyone would ex-ante prefer an outcome in which you make the deal rather than thinking about it and then deciding whether the deal is still in their interest.

At a more basic level, let's assume your current views on the risk after thinking about it a bunch (80-90% I think). If someone had those views on the risk and cared a lot about not having physical humans die, they would benefit from such an insurance deal! (They'd have to pay higher rates than aliens in more competent civilizations of course.)

Sure, but you'd potentially want to enter the pool at the age of 10 prior to starting smoking! To make the analogy closer to the actual case, suppose you were in a society where everyone is selfish, but every person has a 1/10 chance of becoming fabulously wealthy (e.g. owning a galaxy). And, if you commit as of the age of 10 to pay 1/1,000,000 of your resources in the fabulously wealthy case, you can ensure that the version in the non-wealthy case gets very good health insurance. Many people would take such a deal and this deal would also be a slam dunk for the insurance pool! (So why doesn't this happen in human society? Well
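The ex-ante arithmetic behind the age-10 insurance analogy, with the valuations as explicit placeholders (only the 1/10 and 1/1,000,000 figures come from the comment):

```python
# Ex-ante view of the age-10 insurance pledge described above.
p_wealthy = 0.10            # chance of ending up owning a galaxy (from the comment)
premium_if_wealthy = 1e-6   # fraction of the galaxy pledged at age 10 (from the comment)

# Expected premium collected per member, in galaxy-fractions:
expected_premium = p_wealthy * premium_if_wealthy
print(expected_premium)     # 1e-07 of a galaxy

# Taking the deal is ex-ante positive whenever the value of good coverage in the
# 90% non-wealthy case outweighs the value of the tiny slice given up in the 10%
# wealthy case. Both valuations below are placeholders in arbitrary units.
value_of_coverage = 1.0     # how much the non-wealthy version values the insurance
value_of_slice = 1e-3       # how much the wealthy version values 1e-6 of its galaxy

ex_ante_gain = (1 - p_wealthy) * value_of_coverage - p_wealthy * value_of_slice
print(ex_ante_gain > 0)     # True under these placeholder valuations
```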
So8res53

I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us.

Sure, like how when a child sees a fireman pull a woman out of a burning building and says "if I were that big and strong, I would also pull people out of burning buildings", in a sense it's partially the child's decision that does the work of saving the woman. (There's maybe a little overlap in how they run the same... (read more)

2Mitchell_Porter
The child is partly responsible - to a very small but nonzero degree - for the fireman's actions, because the child's personal decision procedure has some similarity to the fireman's decision procedure?  Is this a correct reading of what you said? 
So8res146

There's a question of how thick the Everett branches are, where someone is willing to pay for us. Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute. Towards the other extreme, you have all evolved life, some fraction of which you might imagine might care to pay for any other evolved species.

The problem with expecting folks at the first extreme to pay for you is that they're almost all dead (like dead). The problem with expecting folks at the ... (read more)

2avturchin
I think that there is a way to compensate for this effect. To illustrate the compensation, consider the following experiment: Imagine that I want to resurrect a particular human by creating a quantum random file. This seems absurd, as there is only a 2^-(a lot) chance that I create the right person. However, there are around 2^(a lot) copies of me in different branches who perform similar experiments, so in total, any resurrection attempt will create around 1 correct copy, but in a different branch. If we agree to trade resurrections between branches, every possible person will be resurrected in some branch. Here, it means that we can ignore worries that we create a model of the wrong AI or that the AI creates a wrong model of us, because a wrong model of us will be a real model of someone else, and someone else's wrong model will be a correct model of us. Thus, we can ignore all branch counting at first approximation, and instead count only the probability that Aligned AI will be created. It is reasonable to estimate it as 10 percent, plus or minus an order of magnitude. In that case, we need to trade with the non-aligned AI by giving 10 planets of paperclips for each planet with humans.
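The two pieces of arithmetic in the comment above, spelled out. Here k is a schematic stand-in for "a lot" of bits, and the 10% alignment estimate is the comment's own rough figure.

```python
# 1) Branch counting: each resurrection attempt succeeds with probability 2^-k,
#    but ~2^k branches run the same experiment, so the expected number of correct
#    copies across all branches is ~1.
k = 60                                      # schematic stand-in for "a lot" of bits
expected_correct_copies = (2.0 ** k) * (2.0 ** -k)
print(expected_correct_copies)              # 1.0

# 2) The proposed exchange rate: if aligned AI arises with probability ~10%, the
#    aligned branches offer ~1/0.1 = 10 planets of paperclips per planet of humans
#    for the unaligned AI's side of the trade to be worthwhile in expectation.
p_aligned = 0.10
planets_of_paperclips_per_human_planet = 1 / p_aligned
print(planets_of_paperclips_per_human_planet)  # 10.0
```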
2ryan_greenblatt
By "last minute", you mean "after I existed" right? So, e.g., if I care about genetic copies, that would be after I am born and if I care about contingent life experiences, that could be after I turned 16 or something. This seems to leave many years, maybe over a decade for most people. I think David was confused by the "last minute language" which is really many years right? (I think you meant "last minute on evolutionary time scales, but not literally in the last few minutes".) That said, I'm generally super unconfident about how much a quantum bit changes things.
So8res62

Conditional on the civilization around us flubbing the alignment problem, I'm skeptical that humanity has anything like a 1% survival rate (across any branches since, say, 12 Kya). (Haven't thought about it a ton, but doom looks pretty overdetermined to me, in a way that's intertwined with how recorded history has played out.)

My guess is that the doomed/poor branches of humanity vastly outweigh the rich branches, such that the rich branches of humanity lack the resources to pay for everyone. (My rough mental estimate for this is something like: you've prob... (read more)

6ryan_greenblatt
Partial delta from me. I think the argument for directly paying for yourself (or your same species, or at least more similar civilizations) is indeed more clear and I think I was confused when I wrote that. (In that I was mostly thinking about the argument for paying for the same civilization but applying it more broadly.) But, I think there is a version of the argument which probably does go through depending on how you set up UDT/FDT. Imagine that you do UDT starting from your views prior to learning about x-risk, AI risk, etc and you care a lot about not dying. At that point, you were uncertain about how competent your civilization would be and you don't want your civilization to die. (I'm supposing that our version of UDT/FDT isn't logically omniscient relative to our observations which seems reasonable.) So, you'd like to enter into an insurance agreement with all the aliens in a similar epistemic state and position. So, you all agree to put at least 1/1000 of your resources on bailing out the aliens in a similar epistemic state who would have actually gone through with the agreement. Then, some of the aliens ended up being competent (sadly you were not) and thus they bail you out. I expect this isn't the optimal version of this scheme and you might be able to make a similar insurance deal with people who aren't in the same epistemic state. (Though it's easier to reason about the identical case.) And I'm not sure exactly how this all goes through. And I'm not actually advocating for people doing this scheme, IDK if it is worth the resources. Even with your current epistemic state on x-risk (e.g. 80-90% doom) if you cared a lot about not dying you might want to make such a deal even though you have to pay out more in the case where you surprisingly win. Thus, from this vantage point UDT would follow through with a deal. ---------------------------------------- Here is a simplified version where everything is as concrete as possible: Suppose that there are
So8res3521

Taking a second stab at naming the top reasons I expect this to fail (after Ryan pointed out that my first stab was based on a failure of reading comprehension on my part, thanks Ryan):

This proposal seems to me to have the form "the fragments of humanity that survive offer to spend a (larger) fraction of their universe on the AI's goals so long as the AI spends a (smaller) fraction of its universe on their goals, with the ratio in accordance to the degree of magical-reality-fluid-or-whatever that reality allots to each".

(Note that I think this is not at al... (read more)

4ryan_greenblatt
Nate and I discuss this question in this other thread for reference.
4David Matolcsi
I think I still don't understand what 2^-75 means. Is this the probability that in the literal last minute when we press the button, we get an aligned AI? I agree that things are grossly overdetermined by then, but why does the last minute matter? I'm probably misunderstanding, but it looks like you are saying that the Everett branches are only "us" if they branched off in the literal last minute, otherwise you talk about them as if they were "other humans". But among the branches starting now, there will be a person carrying my memories and ID card in most of them two years from now, and by most definitions of "me", that person will be "me", and will be motivated to save the other "me"s. And sure, they have loads of failed Everett branches to save, but they also have loads of Everett branches themselves; the only thing that matters is the ratio of saved worlds to failed worlds that contain roughly the "same" people as us. So I still don't know what 2^-75 is supposed to be.

Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us. And as I said in some other comments here, I agree that running lots of sims is an unnecessary complication in the case of UDT expected-utility-maximizer AIs, but I put a decent chance on the first AIs not being like that, in which case actually running the sims can be important.
So8res62

I agree that in scenarios where humanity survives in 1/X portion of reality and offers to put 1/Y portion of their universe towards the AI's goals, then so long as 1/(X * Y) of a universe is more than it'd cost the AI to preserve the humans, the humans are at least making an offer that is worth more than nothing.

(There is then a question of how many better offers get made. My top guess is that X and Y are such that there's not much effect here, and that the question of what happens to saved copies of human minds is more about whether aliens buy copies than about which dead or distant trade-partners make offers that the AI considers juiciest.)
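A sketch of the sufficiency condition stated above, with X, Y, and the AI's preservation cost as placeholders:

```python
# The offer is worth more than nothing to the AI whenever 1/(X*Y) of a universe
# exceeds what it costs the AI to preserve the humans. All numbers are placeholders.
X = 1e3        # humanity survives in 1/X of reality
Y = 1e6        # surviving humanity offers 1/Y of its universe
cost_to_preserve_humans = 1e-15   # placeholder: fraction of a universe the AI must spend

offer_value = 1.0 / (X * Y)       # expected payment per universe, as seen by the AI
print(offer_value)                           # 1e-09
print(offer_value > cost_to_preserve_humans) # True: the offer clears the bar here
```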

2ryan_greenblatt
Let's conservatively say that evolved life gets around 1% of the multiverse/measure and that evolved life is willing to pay 1/million of its resources in expectation to save aliens from being killed (either "selfishly" to save their own civilization via UDT/FDT, supposing that AIs are good enough predictors at the relevant points, or out of a common-sense altruistic case). This would be 1/100 million, which gets you a lot. There could be other aliens who are willing to pay a huge fraction of their resources to perform rituals on the original civilization or whatever and thus these other aliens win out in the bargaining, but I'm skeptical. Also, at least in the upload case, it's not clear that this is a rival good as uploads can be copied for free. Of course, people might have a preference that their upload isn't used for crazy alien rituals or whatever. (A bunch of the cost is in saving the human in the first place. Paying for uploads to eventually get run in a reasonable way should be insanely cheap, like <<10^-25 of the overall universe or something.)
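Ryan's budget estimate, spelled out. The 1% and 1/million figures are his stated conservative guesses; the upload-running cost is the comment's own "<<10^-25" figure.

```python
# Conservative budget for saving doomed aliens, per the comment above.
share_evolved_life = 0.01    # fraction of multiverse measure controlled by evolved life
willingness_to_pay = 1e-6    # fraction of their resources spent on saving doomed aliens

budget = share_evolved_life * willingness_to_pay
print(budget)                # 1e-08, i.e. the 1/100 million in the comment

# Against that budget, the comment's figure for eventually running saved uploads
# in a reasonable way is <<1e-25 of the overall universe (saving the humans in the
# first place is where most of the cost lies):
upload_running_cost = 1e-25
print(budget / upload_running_cost)   # ~1e+17: an enormous margin on that piece
```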
So8res*10

Here's how I think this choice looks from the perspective of a correctly-reasoning AI (having skimmed only your proposal for humanity in the future):

(EDIT: as Ryan points out, the deal he's proposing is one where the AI is allowed to grab the universe in front of it so long as it gives humans one planet's worth of resources, which does change the calculus and makes the deal worthwhile. I had not read the proposal carefully enough to notice this, oops, sorry. The critique below is invalid if humanity asks only for one planet. Note however that this is sti... (read more)

Note that we don't want the AI to hand over the reins to humans, just to avoid killing humans when possible/cheap and we're willing to pay quite a bit for this (in the proposal, IDK if I personally think we should pay). So, it should look like a good offer for AIs who care about Tegmark IV (with a measure etc).

So, if humans execute this scheme, the AI's options should look something like:

  • Don't do anything about what humans want or keeping them alive, get the entire universe.
  • Keep humans alive and happy at a total cost of <1/100 million of the universe
... (read more)
So8res7133

This is an excerpt from a comment I wrote on the EA forum, extracted and crossposted here by request:


There's a phenomenon where a gambler places their money on 32, and then the roulette wheel comes up 23, and they say "I'm such a fool; I should have bet 23".

More useful would be to say "I'm such a fool; I should have noticed that the EV of this gamble is negative." Now at least you aren't asking for magic lottery powers.

Even more useful would be to say "I'm such a fool; I had three chances to notice that this bet was bad: when my partner was trying to ex... (read more)

5Carl Feynman
When I see or hear a piece of advice, I check to see what happens if the advice were the reverse.  Often it's also good advice, which means all we can do is take the advice into account as we try to live a balanced life.  For example, if the advice is "be brave!" the reverse is "be more careful".  Which is good advice, too.   This advice is unusual in that it is non-reversible.  
0Dagon
I'm not EA (though I do agree with most of the motte - I care about other humans, and I try to be effective), and not part of the rationalist "community", so take this as an outside view. There's a ton of "standard human social drama" in EA and in rationalist communities, and really anywhere where "work" and "regular life" overlap significantly. Some of this takes the form of noticing flaws in other people's rationality (or, just as often, flaws in kindness/empathy being justified by rationality). Especially when one doesn't want to identify and address specific examples, I think there's a very high risk of misidentifying the cause of a disagreement or disrespect-of-behavior. In this case, I don't notice much of the flagellation or wishing - either I don't hang out in the right places, or I bounce off those posts and don't pay much mind. But things that might fit that pattern strike me as a failure of personal responsibility, not a failure of modeling wishes. Your term self-flagellation is interesting from that standpoint - the historic practice was for penance of generalized sin, and to share suffering, not as a direct correction for anything. It's clearly social, not rational.

IMO, rationalism must first and foremost be individual. I am trying to be less wrong in my private beliefs and in my goal-directed behaviors. Group rationality is a category error - I don't have access to group beliefs (if there is such a thing). I do have some influence over group behaviors and shared statements, but I recognize that they are ALWAYS a compromise and negotiated results of individual beliefs and behaviors, and don't necessarily match any individual in the group. I'm surprised every time I see a rationalist assuming otherwise, and being disappointed that other members of the group don't share all their beliefs and motivations.
4Elizabeth
I've referred to "I should have bet on 23"-type errors several times over the past year. Having this shorthand and an explanation I can link to has sped up those conversations.
So8res*4820

my original 100:1 was a typo, where i meant 2^-100:1.

this number was in reference to ronny's 2^-10000:1.

when ronny said:

I’m like look, I used to think the chances of alignment by default were like 2^-10000:1

i interpreted him to mean "i expect it takes 10k bits of description to nail down human values, and so if one is literally randomly sampling programs, they should naively expect 1:2^10000 odds against alignment".

i personally think this is wrong, for reasons brought up later in the convo--namely, the relevant question is not how many bits it takes to... (read more)
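For concreteness, the bits-to-odds conversion being referenced (the 10k-bit figure is Ronny's hypothetical as interpreted above):

```python
# If pinning down human values takes k bits of description, a uniformly random
# program hits them with probability 2^-k, i.e. odds of about 1 : 2^k against.
import math

k_bits = 10_000
log10_odds_against = k_bits * math.log10(2)
print(f"roughly 1 : 10^{log10_odds_against:.0f} against alignment by random sampling")  # ~1 : 10^3010
```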

So8res2314

Agreed that the proposal is underspecified; my point here is not "look at this great proposal" but rather "from a theoretical angle, risking others' stuff without the ability to pay to cover those risks is an indirect form of probabilistic theft (that market-supporting coordination mechanisms must address)" plus "in cases where the people all die when the risk is realized, the 'premiums' need to be paid out to individuals in advance (rather than paid out to actuaries who pay out a large sum in the event of risk realization)". Which together yield the downs... (read more)
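A toy premium calculation for the "paid out to individuals in advance" point above. All numbers are illustrative placeholders, not figures from the comment.

```python
# Toy apocalypse-insurance premium, priced as expected damage imposed on others.
p_catastrophe = 0.1        # placeholder: probability the risky activity kills everyone
value_per_person = 1e7     # placeholder: dollar value each person places on not dying
n_people = 8e9

expected_damage = p_catastrophe * value_per_person * n_people
premium_per_person = p_catastrophe * value_per_person

print(f"total expected damage: ${expected_damage:.2e}")
print(f"advance premium owed to each person: ${premium_per_person:.2e}")
# The structural point from the comment: because no one is left to compensate if
# the risk is realized, the premium must be paid to individuals up front rather
# than to an actuary who pays out after the fact.
```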

So8res6248

In relation to my current stance on AI, I was talking with someone who said they’re worried about people putting the wrong incentives on labs. At various points in that convo I said stuff like (quotes are not exact; third paragraph is a present summary rather than a re-articulation of a past utterance):

“Sure, every lab currently seems recklessly negligent to me, but saying stuff like “we won’t build the bioweapon factory until we think we can prevent it from being stolen by non-state actors” is directionally better than not having any commitments about any... (read more)

aysja248

It seems like this is only directionally better if it’s true, and this is still an open question for me. Like, I buy that some of the commitments around securing weights are true, and that seems good. I’m way less sure that companies will in fact pause development pending their assessment of evaluations. And to the extent that they are not, in a meaningful sense, planning to pause, this seems quite bad. It seems potentially worse, to me, to have a structure legitimizing this decision and making it seem more responsible than it is, rather than just openly d... (read more)

So8res*Ω153015

If you allow indirection and don't worry about it being in the right format for superintelligent optimization, then sufficiently-careful humans can do it.

Answering your request for prediction, given that it seems like that request is still live: a thing I don't expect the upcoming multimodal models to be able to do: train them only on data up through 1990 (or otherwise excise all training data from our broadly-generalized community), ask them what superintelligent machines (in the sense of IJ Good) should do, and have them come up with something like CEV (... (read more)

So8resΩ220

I claim that to the extent ordinary humans can do this, GPT-4 can nearly do this as well

(Insofar as this was supposed to name a disagreement, I do not think it is a disagreement, and don't understand the relevance of this claim to my argument.)

Presumably you think that ordinary human beings are capable of "singling out concepts that are robustly worth optimizing for".

Nope! At least, not directly, and not in the right format for hooking up to a superintelligent optimization process.

(This seems to me like plausibly one of the sources of misunderstandi... (read more)

6Matthew Barnett
If ordinary humans can't single out concepts that are robustly worth optimizing for, then either:
1. Human beings in general cannot single out what is robustly worth optimizing for, or
2. Only extraordinary humans can single out what is robustly worth optimizing for.
Can you be more clear about which of these you believe? I'm also including "indirect" ways that humans can single out concepts that are robustly worth optimizing for. But then I'm allowing that GPT-N can do that too. Maybe this is where the confusion lies? If you're allowing for humans to act in groups and come up with these concepts after e.g. deliberation, and still think that ordinary humans can't single out concepts that are robustly worth optimizing for, then I think this view is a little silly, although the second interpretation at least allows for the possibility that the future goes well and we survive AGI, and that would be nice to know.
So8res50

(I had used that pump that very day, shortly before, to pump up the replacement tire.)

So8res180

Separately, a friend pointed out that an important part of apologies is the doer showing they understand the damage done, and the person hurt feeling heard, which I don't think I've done much of above. An attempt:

I hear you as saying that you felt a strong sense of disapproval from me; that I was unpredictable in my frustration in ways that kept you feeling (perhaps) regularly on-edge and stressed; that you felt I lacked interest in your efforts or attention for you; and perhaps that this was particularly disorienting given the impression you had of me both from my ... (read more)

So8res*251

I did not intend it as a one-time experiment.

In the above, I did not intend "here's a next thing to try!" to be read like "here's my next one-time experiment!", but rather like "here's a thing to add to my list of plausible ways to avoid this error-mode in the future, as is a virtuous thing to attempt!" (by contrast with "I hereby adopt this as a solemn responsibility", as I hypothesize you interpreted me instead).

Dumping recollections, on the model that you want more data here:

I intended it as a general thing to try going forward, in a "seems like a sensi... (read more)

2TurnTrout
I appreciate the detail, thanks. In particular, I had wrongly assumed that the handbook had been written much earlier, such that even Vivek could have been shown it before deciding to work with you. This also makes more sense of your comments that "writing the handbook" was indicative of effort on your part, since our July interaction. Overall, I retain my very serious concerns, which I will clarify in another comment, but am more in agreement with claims like "Nate has put in effort of some kind since the July chat."  Noting that at least one of them read the handbook because I warned them and told them to go ask around about interacting with you, to make sure they knew what they were getting into.
So8res*120

Thanks <3

(To be clear: I think that at least one other of my past long-term/serious romantic partners would say "of all romantic conflicts, I felt shittiest during ours". The thing that I don't recall other long-term/serious romantic partners reporting is the sense of inability to trust their own mind or self during disputes. (It's plausible to me that some have felt it and not told me.))

sim350

Chiming in to provide additional datapoints. (Apologies for this being quite late to the conversation; I frequent The Other Forum regularly, and LW much less so, and only recently read this post/comments.) My experience has been quite different to a lot of the experiences described here, and I was very surprised when reading. 

I read all of the people who have had (very) negative experiences as being sincere and reporting events and emotions as they experienced them. I could feel what I perceived to be real distress and pain in a lot of the comments, a... (read more)

So8res8-1

Insofar as you're querying the near future: I'm not currently attempting work collaborations with any new folk, and so the matter is somewhat up in the air. (I recently asked Malo to consider a MIRI-policy of ensuring all new employees who might interact with me get some sort of list of warnings / disclaimers / affordances / notes.)

Insofar as you're querying the recent past: There aren't many recent cases to draw from. This comment has some words about how things went with Vivek's hires. The other recent hires that I recall both (a) weren't hired to do res... (read more)

Elizabeth5230

One frame I want to lay out is that it seems like you're not accounting for the organizational cost of how you treat employees/collaborators. An executive director needing to mostly not talk to people, and shaping hiring around social pain tolerance, is a five alarm fire for organizations as small as MIRI. Based on the info here, my first thought is you should be in a different role, so that you have fewer interactions and less implied power.  That requires someone to replace you as ED, and I don't know if there are any options available,  but at... (read more)

So8res40

Do I have your permission to quote the relevant portion of your email to me?

Yep! I've also just reproduced it here, for convenience:

(One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I'm starting a collaboration with. Obvious in hindsight; sorry for not doing that in your case.)

So8res82

I warned the immediately-next person.

It sounds to me like you parsed my statement "One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I'm starting a collaboration with." as me saying something like "I hereby adopt the solemn responsibility of warning people in advance, in all cases", whereas I was interpreting it as more like "here's a next thing to try!".

I agree it would have been better of me to give direct bulldozing-warnings explicitly to Vivek's hires.

Here is the statement:

(One obvious takeaway here is that I should give my list of warnings-about-working-with-me to anyone who asks to discuss their alignment ideas with me, rather than just researchers I'm starting a collaboration with. Obvious in hindsight; sorry for not doing that in your case.)

I agree that this statement does not explicitly say whether you would make this a one-time change or a permanent one. However, the tone and phrasing—"Obvious in hindsight; sorry for not doing that in your case"—suggested that you had learned from the experience a... (read more)

So8res*2212

On the facts: I'm pretty sure I took Vivek aside and gave a big list of reasons why I thought working with me might suck, and listed that there are cases where I get real frustrated as one of them. (Not sure whether you count him as "recent".)

My recollection is that he probed a little and was like "I'm not too worried about that" and didn't probe further. My recollection is also that he was correct in this; the issues I had working with Vivek's team were not based in the same failure mode I had with you; I don't recall instances of me getting frustrated an... (read more)

TurnTrout1010

I think I'd also be more compelled by this argument if I was more sold on warnings being the sort of thing that works in practice.

Like... (to take a recent example) if I'm walking by a whiteboard in rosegarden inn, and two people are like "hey Nate can you weigh in on this object-level question", I don't... really believe that saying "first, be warned that talking technical things with me can leave you exposed to unshielded negative-valence emotions (frustration, despair, ...), which some people find pretty crappy; do you still want me to weigh in?" actual

... (read more)

I've been asked to clarify a point of fact, so I'll do so here:

My recollection is that he probed a little and was like "I'm not too worried about that" and didn't probe further.

This does ring a bell, and my brain is weakly telling me it did happen on a walk with Nate, but it's so fuzzy that I can't tell if it's a real memory or not.  A confounder here is that I've probably also had the conversational route "MIRI burnout is a thing, yikes" -> "I'm not too worried, I'm a robust and upbeat person" multiple times with people other than Nate.

In private ... (read more)

2TurnTrout
You told me you would warn people, and then did not.[1]

1. ^ Do I have your permission to quote the relevant portion of your email to me?
So8res3525

In particular, you sound [...] extremely unwilling to entertain the idea that you were wrong, or that any potential improvement might need to come from you.

you don't seem to consider the idea that maybe you were more in a position to improve than he was.

Perhaps you're trying to point at something that I'm missing, but from my point of view, sentences like "I'd love to say "and I've identified the source of the problem and successfully addressed it", but I don't think I have" and "would I have been living up to my conversational ideals (significantly)... (read more)

7Vaniver
So I've been thinking about this particular branch for a while and I think I have a slightly different diagnosis from PoignardAzur, which I think nearly lines up with yours but has an important difference. I think this is the important part: Even if you are not tracking who is Wrong in any particular interaction, if other people are tracking who is Wrong, that seems like an important thing for you to handle, because it will be a large part of how they interpret communication from you. (For the bike pump example, the thing where you saw Kurt as "begging pardon" seems like evidence this was plausibly up for Kurt / you could have guessed this was up for Kurt in the moment.) One way to interpret the situation is:

Kurt: I am Wrong but would like to displace that to the bike pump
Nate: Rejected! >:[
Kurt: :(

I am imagining that you were not asking for this sort of situation (and would have been less interested in a "save your time" deal if "do emotional labor for people helping you" had explicitly been part of the deal), but my guess is attention to this sort of thing is the next place to look for attacking the source of the problem. [Also, I'm not trying to confidently assert this is what was actually up for Kurt in the moment--instead I'm asking "if this story made me side with Kurt, why did that happen?"]
3PoignardAzur
I don't know if "dense" is the word I use, but yeah, I think you missed my point. My ELI5 would be "You're still assuming the problem was 'Kurt didn't know how to use a pump' and not 'there was something wrong with your pump'". I don't want to speculate too much beyond that eg about the discretionary budget stuff. Happy to hear that!
So8res*7613
  1. Thanks for saying so!

  2. My intent was not to make you feel bad. I apologize for that, and am saddened by it.

    (I'd love to say "and I've identified the source of the problem and successfully addressed it", but I don't think I have! I do think I've gotten a little better at avoiding this sort of thing with time and practice. I've also cut down significantly on the number of reports that I have.)

  3. For whatever it's worth: I don't recall wanting you to quit (as opposed to improve). I don't recall feeling ill will towards you personally. I do not now think po

... (read more)
mingyuan3616

I do have some general sense here that those aren't emotionally realistic options for people with my emotional makeup.

Here's my take: From the inside, Nate feels like he is incapable of not becoming very frustrated, even angry. In a sense this is true. But this state of affairs is in fact a consequence of Nate not being subject to the same rules as everybody else.

I think I know what it's like, to an extent — I've had anger issues since I was born, and despite speaking openly about it to many people, I've never met anyone who's been able to really understan... (read more)

KurtB*8728

I have some replies to Nate's reply. 

Overview:

  • I’m not asking anyone to modify their personality at all. I mainly wish I had been warned about what was in store for me when I joined MIRI, and I want such warnings to be a visible part of Nate's reputation.
  • I feel some pressure to match Nate's conciliatory tone, but something feels incongruous about it. I'm concerned that people will read Nate's calm, kindly replies and come away with the wrong idea of how he presented himself at MIRI.
  • I find Nate’s additional context to be, well…missing some important con
... (read more)

Perhaps I'm missing some obvious third alternative here, that can be practically run while experiencing a bunch of frustration or exasperation. (If you know of one, I'd love to hear it.)

One alternative could be to regulate your emotions so you don't feel such intense frustration from a given epistemic position? I think this is what most people do.

[anonymous]124

I suspect that lines like this are giving people the impression that you [Nate] don't think there are (realistic) things that you can improve, or that you've "given up". 

I do have some general sense here that those aren't emotionally realistic options for people with my emotional makeup.

I have a sense that there's some sort of trap for people with my emotional makeup here. If you stay and try to express yourself despite experiencing strong feelings of frustration, you're "almost yelling". If you leave because you're feeling a bunch of frustration and

... (read more)
8Elizabeth
How do you/MIRI communicate about working with you now?

I think it's cool that you're engaging with criticism and acknowledging the harm that happened as a result of your struggles.

And, to cut to the painful part, that's about the only positive thing that I (random person on the internet) have to say about what you just wrote.

In particular, you sound (and sorry if I'm making any wrong assumption here) extremely unwilling to entertain the idea that you were wrong, or that any potential improvement might need to come from you.

You say:

For whatever it's worth: I don't recall wanting you to quit (as opposed to imp

... (read more)
So8resΩ10189

That helps somewhat, thanks! (And sorry for making you repeat yourself before discarding the erroneous probability-mass.)

I still feel like I can only barely maybe half-see what you're saying, and only have a tenuous grasp on it.

Like: why is it supposed to matter that GPT can solve ethical quandaries on par with its ability to perform other tasks? I can still only half-see an answer that doesn't route through the (apparently-disbelieved-by-both-of-us) claim that I used to argue that getting the AI to understand ethics was a hard bit, by staring at sentences ... (read more)

6Matthew Barnett
Thanks for trying to understand my position. I think this interpretation that you gave is closest to what I'm arguing. I have a quick response to what I see as your primary objection: I think this is kinda downplaying what GPT-4 is good at? If you talk to GPT-4 at length, I think you'll find that it's cognizant of many nuances in human morality that go way deeper than the moral question of whether to "call 911 when Alice is in labor and your car has a flat". Presumably you think that ordinary human beings are capable of "singling out concepts that are robustly worth optimizing for". I claim that to the extent ordinary humans can do this, GPT-4 can nearly do this as well, and to the extent it can't, I expect almost all the bugs to be ironed out in near-term multimodal models. It would be nice if you made a precise prediction about what type of moral reflection or value specification multimodal models won't be capable of performing in the near future, if you think that they are not capable of the 'deep' value specification that you care about. And here, again, I'm looking for some prediction of the form: humans are able to do X, but LLMs/multimodal models won't be able to do X by, say, 2028. Admittedly, making this prediction precise is probably hard, but it's difficult for me to interpret your disagreement without a little more insight into what you're predicting.
So8res*Ω244213

I have the sense that you've misunderstood my past arguments. I don't quite feel like I can rapidly precisely pinpoint the issue, but some scattered relevant tidbits follow:

  • I didn't pick the name "value learning", and probably wouldn't have picked it for that problem if others weren't already using it. (Perhaps I tried to apply it to a different problem than Bostrom-or-whoever intended it for, thereby doing some injury to the term and to my argument?)

  • Glancing back at my "Value Learning" paper, the abstract includes "Even a machine intelligent enough

... (read more)
1Martin Randall
This makes sense. Barnett is talking about an update between 2007 and 2023. GPT-3 came out in 2020. So by 2021/2022 you had finished making the update and were not surprised further by GPT-4.

Glancing back at my "Value Learning" paper, the abstract includes "Even a machine intelligent enough to understand its designers’ intentions would not necessarily act as intended", which supports my recollection that I was never trying to use "Value Learning" for "getting the AI to understand human values is hard" as opposed to "getting the AI to act towards value in particular (as opposed to something else) is hard", as supports my sense that this isn't hindsight bias, and is in fact a misunderstanding.

For what it's worth, I didn't claim that you argue... (read more)

So8res2111

In academia, for instance, I think there are plenty of conversations in which two researchers (a) disagree a ton, (b) think the other person's work is hopeless or confused in deep ways, (c) honestly express the nature of their disagreement, but (d) do so in a way where people generally feel respected/valued when talking to them.

My model says that this requires them to still be hopeful about local communication progress, and happens when they disagree but already share a lot of frames and concepts and background knowledge. I, at least, find it much harde... (read more)

2Viliam
It seems to me that in theory it should be possible to have very unusual norms and make them work, but that in practice you and your organization horribly underestimate how difficult it is to communicate such things clearly (more than once, because people forget or don't realize the full implications the first time). You assume that the local norms were made perfectly clear, but they were not (expecting short inferential distances, double illusion of transparency, etc.). Did you expect KurtB to have this kind of reaction, to post this kind of comment, and to get upvoted? If the answer is no, it means your model is wrong somewhere. (If the answer is yes, maybe you should print that comment and give a copy to all new employees. That might dramatically reduce the possibility of misunderstanding.)
2TurnTrout
My original comment is not talking about communication norms. It's talking about "social norms" and "communication protocols" within those norms. I mentioned "basic respectfulness and professionalism." 
So8res3123

(I am pretty uncomfortable with all the "Nate / Eliezer" going on here. Let's at least let people's misunderstandings of me be limited to me personally, and not bleed over into Eliezer!)

(In terms of the allegedly-extraordinary belief, I recommend keeping in mind jimrandomh's note on Fork Hazards. I have probability mass on the hypothesis that I have ideas that could speed up capabilities if I put my mind to it, which is a very different state of affairs from being confident that any of my ideas works. Most ideas don't work!)

(Separately, the infosharing agreem... (read more)

2Raemon
That's useful additional information, thanks. I made a slight edit to my previous comment to make my epistemic state more clear. Fwiw, I feel like I have a pretty crisp sense of "Nate's and Eliezer's communication styles are actually pretty different" (I noticed myself writing out a similar comment about communication styles under the TurnTrout thread that initially said "Nate and Eliezer" a lot, and then decided that comment didn't make sense to publish as-is), but I don't actually have much of a sense of the difference between Nate, Eliezer, and MIRI-as-a-whole with regards to "the mindset" and "confidentiality norms".
So8res352

I hereby push back against the (implicit) narrative that I find the standard community norms costly, or that my communication protocols are "alternative".

My model is closer to: the world is a big place these days, different people run on different conversation norms. The conversation difficulties look, to me, symmetric, with each party violating norms that the other considers basic, and failing to demonstrate virtues that the other considers table-stakes.

(To be clear, I consider myself to bear an asymmetric burden of responsibility for the conversations go... (read more)

TurnTrout*3323

I sure don't buy a narrative that I'm in violation of the local norms.

This is preposterous.

I'm not going to discuss specific norms. Discussing norms with Nate leads to an explosion of conversational complexity.[1] In my opinion, such discussion can sound really nice and reasonable, until you remember that you just wanted him to e.g. not insult your reasoning skills and instead engage with your object-level claims... but somehow your simple request turns into a complicated and painful negotiation. You never thought you'd have to explain "being nice."

Th... (read more)

TurnTrout*213

I'm putting in rather a lot of work (with things like my communication handbook) into making my own norms clearer, and I follow what I think are good meta-norms of being very open to trying other people's alternative conversational formats.

Nate, I am skeptical.

As best I can fathom, you put in very little work to proactively warn new hires about the emotional damage which your employees often experience. I've talked to a range of people who have had professional interactions with you, both recently and further back. Only one of the recent cases reported that ... (read more)

[anonymous]2516

Huh, I initially found myself surprised that Nate thinks he's adhering to community norms. I wonder if part of what's going on here is that "community norms" is a pretty vague phrase that people can interpret differently. 

Epistemic status: Speculative. I haven't had many interactions with Nate, so I'm mostly going off of what I've heard from others + general vibes. 

Some specific norms that I imagine Nate is adhering to (or exceeding expectations in):

  • Honesty
  • Meta-honesty
  • Trying to offer concrete models and predictions
  • Being (internally) open to ackno
... (read more)
So8res*242

Less "hm they're Vivek's friends", more "they are expressly Vivek's employees". The working relationship that I attempted to set up was one where I worked directly with Vivek, and gave Vivek budget to hire other people to work with him.

If memory serves, I did go on a long walk with Vivek where I attempted to enumerate the ways that working with me might suck. As for the others, some relevant recollections:

  • I was originally not planning to have a working relationship with Vivek's hires. (If memory serves, there were a few early hires that I didn't have any
... (read more)