We want to minimize the amount of the universe eventually controlled by unaligned ASIs because their values tend to be absurd and their very existence is abhorrent to us.
No. We want to optimize the universe in accordance with our values. That's not at all the same thing as minimizing the existence of agents with absurd-to-us values. Life is not a zero-sum game: if we think of a plan that increases the probability of Friendly AI and the probability of unaligned AI (at the expense of the probability of "mundane" human extinction via nuclear war or civilizational collapse), that would be good for both us and unaligned AIs.
Thus, if you're going to be thinking about galaxy-brained acausal trade schemes at all—even though, to be clear, this stuff probably doesn't work because we don't know how to model distant minds well enough to form agreements with them—there's no reason to prefer other biological civilizations over unaligned AIs as trade partners. (This is distinct from us likely having more values in common with biological aliens; all that gets factored away into the utility function.)
the creation of huge amounts of the other entity's disvalue
We do not want to live in a universe where agents deliberately spend resources to create disvalue for each other! (As contrasted to "merely" eating each other or competing for resources.) This is the worst thing you could possibly do.
Apparently this was a really horrible idea! I'm glad to have found out now instead of wasting my time and energy thinking further about it.
What I've learned is that I am overly biased in favor of my own ideas even now; while writing the post I was trying to convince the reader that my idea was good, rather than dispassionately seeking disproof of the idea, then disproof of the disproof, and so on. If I'd tried hard to prove myself wrong, I probably would never have posted it.
Another thing I've learned is that I ought not think about acausal things because they don't make sense and I am not a Yudkowsky who can intuitively think in timeless decision theory!
if the ASI is aligned, but the civilization that created it knew about our precommitment without making (and embedding in the ASI) the same precommitment themselves, we will still defect;
"Look our only alignment scheme that had any hope of working was bootstrapping from a HCH based system. We had no idea whatsoever how to put that sort of precommitment into our AI. " say the aliens.
"Well our planets gravity well is larger than yours, our star puts out a lot of radio noise, and our radio tech is just less advanced. There is no way we could have broadcast anything that was detectable in another star system." say the aliens.
I also strongly suspect that alien life is rare enough for pre-AGI to pre-AGI communication to be unlikely. Any signals we send out are probably going to be picked up by some huge AGI dyson sphere telescope.
if the ASI is aligned and the civilization that created it did make the same precommitment (verifiable from the ASI's source code when inspected by ours), *then* we will cooperate and willingly share the universe with them.
The ability to mutually inspect each other's source code is not an assumption you want to make. Maybe it's really easy for all your main computers to run X, except that if an alien scanner gets near, they automatically self-modify into Y.
if the ASI is not aligned with the values of the civilization that created it, we will "defect" against it;
Clippy sends us a message saying that if we don't make enough paperclips, it will start torturing humans. In FDT contexts, I think the optimal response to blackmail is to ignore it, or to defect against any agent that sends you blackmail.
In game theory terms, this is a hawk strategy. It only works if the other side backs down. We might be cooperating with some aliens, but the strategy you propose is definitely defecting against alien Clippy. (And then loudly broadcasting a "we defect against alien Clippy" message while we are still small and weak.)
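For reference, a toy hawk-dove payoff matrix (illustrative numbers only, not derived from anything in the post) showing why hawk only pays off when the other side plays dove:

```python
# Toy hawk-dove game with made-up payoffs: V is the value of the contested
# resource, C is the cost of an all-out fight, with C > V, so hawk vs. hawk
# leaves both sides worse off than mutual backing down would have.
V, C = 2, 10

payoffs = {
    # (our move, their move): (our payoff, their payoff)
    ("hawk", "hawk"): ((V - C) / 2, (V - C) / 2),  # costly fight for both
    ("hawk", "dove"): (V, 0),                      # hawk wins only if they back down
    ("dove", "hawk"): (0, V),
    ("dove", "dove"): (V / 2, V / 2),              # split the resource
}

for moves, (us, them) in payoffs.items():
    print(moves, "->", us, them)
```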
More importantly, if the creator civ receives our message they will think we are assholes for doing that and be less likely to make a precommitment that benefits us, which, as I'll show, is crucial.
And now you are blending the abstract logic of TDT with the approximation that is the human intuitive emotional response.
Humans Consulting HCH.
It's a recursive AI design. Train AI version n to imitate a human with access to AI version n-1. The human can break a hard question up into several slightly less hard questions, AI version n-1 can answer those questions, and then AI version n can learn to imitate the human's answer. If you replace each AI with the thing it's trying to imitate, you get an exponentially huge branching tree of humans.
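For concreteness, a minimal toy sketch of that recursion (an illustration only, not any actual HCH proposal; `is_easy`, `answer`, `decompose`, and `combine` are hypothetical stand-ins for the human's judgment):

```python
# Idealized HCH sketch: a "human" answers a question by splitting it into
# easier subquestions and consulting further copies of the same process.

def hch(question, depth, is_easy, answer, decompose, combine):
    """Answer `question` via a depth-limited tree of consultations."""
    if depth == 0 or is_easy(question):
        return answer(question)
    subanswers = [hch(sub, depth - 1, is_easy, answer, decompose, combine)
                  for sub in decompose(question)]
    return combine(subanswers)

# Toy stand-in "human" behaviour: the question is "sum this list of numbers".
if __name__ == "__main__":
    total = hch(
        question=list(range(8)),
        depth=3,
        is_easy=lambda q: len(q) <= 1,
        answer=lambda q: q[0] if q else 0,
        decompose=lambda q: [q[:len(q) // 2], q[len(q) // 2:]],
        combine=sum,
    )
    print(total)  # 28

# Training AI version n to imitate "human + AI version n-1" approximates
# unrolling this tree: the bottom of the recursion is the exponentially
# large branching tree of (simulated) humans described above.
```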
Thanks for the reply! I knew this idea was flawed somehow lol, because I'm not the most rigorous thinker, but it's been bugging me for days and it was either write it up or try to write a perfect simulation and crash and burn due to feature creep, so I did the former.
We had no idea whatsoever how to put that sort of precommitment into our AI.
I suppose I should have said they ought to make a reasonable attempt. Attempting and failing should be enough to make you worth cooperating with.
I also strongly suspect that alien life is rare enough for pre-AGI to pre-AGI communication to be unlikely.
Oof! Somehow I didn't even think of that. A simulation such as I was thinking of writing would probably have shown me that and I would have facepalmed; now I get to facepalm in advance!
The ability to mutually inspect each other's source code is not an assumption you want to make.
Isn't this assumption the basis of superrationality though? That would be a useless concept if it wasn't possible for AGIs to prove things about their own reasoning to one another.
In game theory terms, this is a hawk strategy. It only works if the other side backs down.
Good point. I didn't think of it, but there could be an alien clippy somewhere already expanding in our direction and this sort of message would doom us to be unable to compromise with it and instead get totally annihilated. Another oof...
And now you are blending the abstract logic of TDT with the approximation that is the human intuitive emotional response.
That's because I was talking about the naturally evolved alien civilization at that point rather than the AGI they create. Assuming I'm right that these tend to be highly social species, they probably have emotional reactions vaguely like our own, so "think xyz person is an asshole" is a sort of thing they'd do, and they'd have a predictably negative reaction to that regardless of the rational response, the same way humans would.
Given all this: do you think something vaguely like this idea is salvageable? Is there some story where we communicate something to other civilizations, and it somehow increases our chances of survival now, which would seem plausible to you?
Note that transmitting information about alignment doesn't seem to me like it would be harmful; it might not be helpful since, as you say, it would almost certainly only be ASIs that even pick it up; but on the off chance that one biont civilization gets the info, assuming we could transmit that far, it might be worth the cost? I'm not sure.
I suppose I should have said they ought to make a reasonable attempt. Attempting and failing should be enough to make you worth cooperating with.
Even if that attempt has ~0% chance of working, and a good chance of making the AI unaligned?
Isn't this assumption the basis of superrationality though? That would be a useless concept if it wasn't possible for AGIs to prove things about their own reasoning to one another.
Physicists often assume frictionless spheres in a vacuum. It's not that other things don't exist, just that the physicist isn't studying them at the moment. Superrationality explains how agents should behave with mutual knowledge of each other's source code. Is there a more general theory for how agents should behave when they have only limited evidence about each other's source code? Such a theory isn't well understood yet. Superrationality isn't the assumption that all agents know each other's source code (which is blatantly false in general, whether or not it is true between superintelligences able to exchange nanotech space probes). It's just the decision to study agents that know each other's source code as an interesting special case.
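A toy version of that special case (the standard "program equilibrium" illustration, not anything proposed in the thread): an agent that cooperates exactly when the source code it is shown matches its own, which only makes sense under the strong assumption that each agent really sees the other's true source.

```python
import inspect


def clique_bot(opponent_source: str) -> str:
    """Toy program-equilibrium agent: cooperate ("C") iff the opponent's
    source code is literally identical to our own, otherwise defect ("D").
    Assumes we are shown the opponent's true source code."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"


if __name__ == "__main__":
    # Two copies of clique_bot shown each other's code both cooperate;
    # shown anything with different source, each defects.
    source = inspect.getsource(clique_bot)
    print(clique_bot(source))               # "C"
    print(clique_bot("def other(): ..."))   # "D"
```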
That's because I was talking about the naturally evolved alien civilization at that point rather than the AGI they create. Assuming I'm right that these tend to be highly social species, they probably have emotional reactions vaguely like our own, so "think xyz person is an asshole" is a sort of thing they'd do, and they'd have a predictably negative reaction to that regardless of the rational response, the same way humans would.
The human emotional response vaguely resembles TDT-type reasoning. I would expect evolved alien responses to resemble TDT about as much, but in a totally different direction, in the sense that once you know TDT, learning about humans tells you nothing about aliens. Evolution produces somewhat inaccurate maps of the TDT territory, and I don't expect the same inaccuracies to appear on both maps.
Given all this: do you think something vaguely like this idea is salvageable? Is there some story where we communicate something to other civilizations, and it somehow increases our chances of survival now, which would seem plausible to you?
I don't know. I mean, in any story where we receive a signal from aliens, that signal could well be helpful or harmful.
We could just broadcast our values into space, and hope the aliens are nice. (Knowing full well that the signals would also help evil aliens be evil.)
Epistemic status: Pretty sure the idea is sound, but its feasibility is unfortunately a bit questionable.
Basic idea
Humanity must, as soon as possible, send a message to civilizations beyond Earth containing the following:
Why share alignment knowledge with aliens?
Why precommit to defect against unaligned ASIs when it's superrational to cooperate? Warfare between ASIs risks astronomical waste!
Yes, that's the point. This is a signal to any unaligned ASIs that do get made elsewhere in the universe that they may benefit from self-modifying to become at least partly aligned in order to avoid our wrath. If the loss to their current unaligned values after doing that is less than the loss from whatever destruction or other disvalue we could impose upon them, then they will do so.
What does "defect" even mean in this context?
Either physical warfare or the creation of huge amounts of the other entity's disvalue, if its utility function goes into the negative range. (Is there such a thing as anti-paperclips? If so, in a war with Clippy, we'd fill the universe with those. Maybe I should ask DALL-E Mini for an angry crying wojak meme in the shape of a paperclip.)
All right, but why try to coerce them into being aligned with their creator civ's values? Why not our own?
Okay, so how about this defecting against aligned non-precommitters thing? Isn't that barbaric? They're people!
How do we benefit from their solving the alignment problem and battling future unaligned ASIs? If we create an unaligned one, we'll all be dead anyway!
So, essentially, this is a kind of acausal coordination with our cosmic neighbors who may not even exist yet, for mutual benefit?
Yup. That's right. Freaky, eh? Though note, it's not entirely acausal - we do have to actually send the message.
Oh, right. Wait... two more questions. One, how do you know there even are any cosmic neighbors? We see no evidence of extraterrestrial civilizations.
Well, yes, but that's right now, and the further out in space you look the further back in time you are looking. I wouldn't be surprised if we are among the first civilizations to ever evolve naturally in the universe. But I doubt we're anything close to the last or the only. They may just be very sparse. As our grabby ASI expands into the universe, eventually it will meet other civilizations. It's just a matter of when.
Won't that sparseness impact the effectiveness of this method?
All this sounds very reasonable. But there's still the second of the two questions I mentioned earlier. And it's kinda the most important.
Oh, sure, ask away.
HOW THE HECK DO YOU PLAN ON SENDING THE MESSAGE??? With current technology the longest range at which radio signals emitted by Earth are expected to remain legible is a few hundred light years - not millions or billions! Furthermore, how would you encode this complex message so that aliens could understand it, even if you could actually send the signal?
Technically, that's two questions! But I'll answer the second one first.
(If anyone has suggestions on how to make this actually feasible, or concerns about the logic of it that I didn't come up with, please let me know!)