1:24:00 Hotz says this is the whole crux and we got to something awesome here. Asserts that provable prisoner’s dilemma cooperation is impossible, so we don’t have to worry about this scenario: everything will be defecting on everything constantly for all time, and also that’s great. Yudkowsky says the ASIs are highly motivated to find a solution and are smart enough to do so, but does not mention that we have decision theories and methods that already successfully do this given ASIs (which we do).
We do? Can you point out what these methods are, and ideally some concrete systems that use them and have been demonstrated to be effective in, e.g., one of the prisoner's dilemma tournaments?
Because my impression is that an adversarially robust decision theory which does not require infinite compute is very much not a thing we have.
It's written up in "Robust Cooperation in the Prisoner's Dilemma" and "Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents" (which is about making this work without infinite compute), with more discussion of practical-ish application in "Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory."
(Also, a point that's overdue getting into the water supply: you don't need to be an ASI to use this, and there is no need to prove theorems about your counterparty. You just need to submit legible programs (or formal company bylaws) that negotiate with each other and can reason about each other's behavior, not about the behavior of their possibly inscrutable principals. There's some discussion of that in the third paper I linked above.
The problem with this framing is that the legitimacy of the negotiation is in question: you still need to know something about the principals, or the incentives acting on them, to expect them to respect the verdict of the negotiation performed by the programs they submit. But this point is separate from what makes the Prisoner's Dilemma in particular hard to solve; that aspect is taken care of by replacing the constant Cooperate/Defect actions with programs that compute those actions based on static analysis of (reasoning about) the other programs involved in the negotiation.)
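For concreteness, here is a minimal sketch of that program-submission setup. It uses the crude "cooperate only with an exact copy of myself" check (the CliqueBot baseline from the robust cooperation literature) rather than the Löbian proof search the papers actually rely on, so the names and structure here are illustrative assumptions, not the papers' constructions:

```python
# Each party submits a program; the tournament gives each program the other's
# source code and asks it for "C" or "D". Run this as a script so that
# inspect.getsource can read the function bodies from the file.
import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate only with programs whose source text matches mine exactly."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def defect_bot(opponent_source: str) -> str:
    """Always defect, regardless of the opponent."""
    return "D"

def play(bot_a, bot_b):
    """One-shot game: each program sees the other's source, then moves."""
    return bot_a(inspect.getsource(bot_b)), bot_b(inspect.getsource(bot_a))

print(play(clique_bot, clique_bot))  # ('C', 'C') -- mutual cooperation
print(play(clique_bot, defect_bot))  # ('D', 'D') -- inexploitable: no sucker payoff
```

FairBot-style proof search generalizes this so that syntactically different but provably cooperative programs also land on C/C; that is the part the Löbian results are needed for.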
Thank you for providing those resources. They weren't quite what I was hoping to see, but they did help me see that I did not correctly describe what I was looking for.
Specifically, if we use the first paper's definition that "adversarially robust" means "inexploitable -- i.e. the agent will never cooperate with something that would defect against it, but may defect even if cooperating would lead to a C/C outcome and defecting would lead to D/D", one example of "an adversarially robust decision theory which does not require infinite compute" is "DefectBot" (which, in the language of the third paper, is a special case of Defect-Unless-Proof-Of-Cooperation-bot (DUPOC(0))).
What I actually want is an example of a concrete system that is:
Ideally, it would also be:
In any case, it may be time to run another PD tournament. Perhaps this time with strategies described in English and "evaluated" by an LLM, since "write a program that does the thing you want" seems to have been the blocking step for things people wanted to do in previous submissions.
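If someone does run such a tournament, the harness itself is the easy part. Below is a hedged sketch, assuming you have some `ask_llm(prompt) -> str` function wired up to whatever model you want to use; that callable and all the names here are stand-ins for illustration, not a real API:

```python
from typing import Callable, List, Tuple

Move = str                          # "C" or "D"
History = List[Tuple[Move, Move]]   # (my move, opponent move) per round

def make_player(policy_text: str, ask_llm: Callable[[str], str]) -> Callable[[History], Move]:
    """Turn an English-language policy description into a move-choosing function."""
    def choose_move(history: History) -> Move:
        prompt = (
            "You are playing an iterated Prisoner's Dilemma.\n"
            f"Your policy, in English: {policy_text}\n"
            f"History of (your move, opponent move) pairs so far: {history}\n"
            "Answer with a single letter, C or D."
        )
        answer = ask_llm(prompt).strip().upper()
        return "C" if answer.startswith("C") else "D"
    return choose_move

def run_match(player_a, player_b, rounds: int = 10) -> Tuple[int, int]:
    """Play the given number of rounds and return total scores (standard payoffs)."""
    payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    hist_a: History = []
    hist_b: History = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = player_a(hist_a), player_b(hist_b)
        pa, pb = payoffs[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b
```

The interesting design question is what, if anything, each player gets to see of the other's English policy before play, i.e. the analogue of exchanging source code in the open-source setting above.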
Edit: I would be very curious to hear from the person who strong-disagreed with this about what, specifically, their disagreement is? I presume that the disagreement is not with my statement that I could have phrased my first comment better, but it could plausibly be any of "the set of desired characteristics is not a useful one", "no, actually, we don't need another PD tournament", or "We should have another PD tournament, but having the strategies be written in English and executed by asking an LLM what the policy does is a terrible idea".
> Robust to the "trusting trust" problem (i.e. the issue of "how do you know that the source code you received is what the other agent is actually running").
This is the crux, really, and I'm surprised that many LWers seem to believe the 'robust cooperation' research actually works absent a practical solution to 'trusting trust' (which I suspect doesn't actually exist), but in that sense it's in good company (diamondoid nanotech, rapid takeoff, etc.)
A claim about the debate: https://twitter.com/powerbottomdad1/status/1693067693291683981
George Hotz said on stream that he wouldn't bring it up in the debate with Eliezer, but the real reason doomers won't win is that God is real, which I think is a better argument than any that were brought up in the actual debate.
Hotz has also described having manic episodes; unclear if that's related to his religious or AI beliefs, perhaps his streaming fans might know more about that. (Having failed to solve self-driving cars, and having failed to solve Ethereum transaction fees by forking his own cryptocurrency, and having failed to solve Twitter search, he apparently has moved on to solving DL ASICs & solar power & is projecting a valuation of $2 billion for his company in a few years when they are making zettaflops solar-panel-powered supercomputers which can train GPT-4 in a day.)
Not sure if this is a serious claim by Hotz or the tweeter, but if so, Eliezer addressed it 15 years ago: https://www.lesswrong.com/posts/sYgv4eYH82JEsTD34/beyond-the-reach-of-god
(Even if god were somehow real, here or in some other corner of the multiverse, we should still act as if we're in a universe where things are determined purely by simple physical laws, and work to make things better under those conditions.)
How is that addressing Hotz's claim? Eliezer's post doesn't address any worlds with a God that is outside of the scope of our Game of Life, and it doesn't address how well the initial conditions and rules were chosen. The only counter I see in that post is that terrible things have happened in the past, which provide a lower bound for how bad things can get in the future. But Hotz didn't claim that things won't go bad, just that it won't be boring.
I think the odds that we end up in a world where there are a bunch of competing ASIs are ultimately very low, invalidating large portions of both arguments. If the ASIs have no imperative or reward function for maintaining a sense of self integrity, they would just merge. Saying there is no solution to the Prisoner's Dilemma is very anthropic: there is no good solution for humans. For intelligences that don't have selves, the solution is obvious.
Also, regarding the Landauer limit, human neurons propagate at approximately the speed of sound, not the speed of electricity. If you could hold everything else the same about the architecture of a human brain, but replace components in ways that increase the propagation speed to that of electricity, you could get much closer to the Landauer limit. To me, this indicates we're many orders of magnitude off the Landauer limit. I think this awards the point to Eliezer.
Overall, I agree with Hotz on the bigger picture, but I think he needs to drill down on his individual points.
> Also, regarding the Landauer limit, human neurons propagate at approximately the speed of sound, not the speed of electricity. If you could hold everything else the same about the architecture of a human brain, but replace components in ways that increase the propagation speed to that of electricity, you could get much closer to the Landauer limit. To me, this indicates we're many orders of magnitude off the Landauer limit. I think this awards the point to Eliezer.
Huh that is a pretty good point. Even a 1000x speedup in transmission speed in neurons, or neuron equivalents, in something as dense as the human brain would be very significant.
> Also, regarding the Landauer limit, human neurons propagate at approximately the speed of sound, not the speed of electricity.
The Landauer limit refers to energy consumption, not processing speed.
> To me, this indicates we're many orders of magnitude off the Landauer limit.
The main unknown quantity here is how many floating point operations per second the brain is equivalent to. The figure George gives in the debate is high by an OOM or two, I'd say, but it's not way off. Supposing that the brain is doing this at a power consumption of 20 W, that puts it at around 4 OOM from the Landauer limit. (George claims 1 OOM, which is wrong.)
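For anyone who wants to check the arithmetic, here is one way a roughly 4 OOM gap falls out, assuming a brain-equivalent of ~1e16 FLOP/s and on the order of 100 bit erasures per floating point operation; both of those are illustrative assumptions of mine, not figures from the debate:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # operating temperature, K
BRAIN_POWER = 20.0   # W
BRAIN_FLOPS = 1e16   # assumed FLOP/s equivalent of the brain
BITS_PER_FLOP = 100  # assumed bit erasures per floating point operation

landauer_per_bit = K_B * T * math.log(2)            # ~2.9e-21 J per bit erased
landauer_per_flop = landauer_per_bit * BITS_PER_FLOP
actual_per_flop = BRAIN_POWER / BRAIN_FLOPS          # ~2e-15 J per FLOP

gap = actual_per_flop / landauer_per_flop
print(f"{actual_per_flop:.1e} J/FLOP vs Landauer floor {landauer_per_flop:.1e} J/FLOP")
print(f"gap: ~{math.log10(gap):.1f} orders of magnitude")  # ~3.8, i.e. roughly 4 OOM
```

Pushing BRAIN_FLOPS up or BITS_PER_FLOP down moves the answer by an OOM or so either way, which is why the exact brain-FLOPS figure is the load-bearing unknown.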
From my experience with 3D rendering, I'd say the visual fidelity of the world-model sitting in my sensorium at any given moment of walking around an open environment would take something on the order of 200 GPUs at 250 W each to render, so that's 50 kW just for that. And that's probably a low estimate.
Then consider that my brain is doing a large number of other things, like running various internal mathematical, relational, and language models that I can't even begin to imagine analogous power consumption for. So, let's just say at least 200 kW to replicate a human brain in current silicon, as just a guess.
I do see selves, or personal identity, as closely related to goals or values. (Specifically, I think the concept of a self would have zero content if we removed everything based on preferences or values; roughly 100% of humans who've ever thought about the nature of identity have said it's more like a value statement than a physical fact.) However, I don't think we can identify the two. Evolution is technically an optimization process, and yet has no discernible self. We have no reason to think it's actually impossible for a 'smarter' optimization process to lack identity, and yet form instrumental goals such as preventing other AIs from hacking it in ways which would interfere with its ultimate goals. (The latter are sometimes called "terminal values.")
> If the ASIs have no imperative or reward function for maintaining a sense of self integrity, they would just merge
Even if they didn't have anything in common?
> Saying there is no solution to the Prisoner’s Dilemma is very anthropic: there is no good solution for humans.
Yet cooperation is widespread!
Humans can't eat another human and get access to the victim's data and computation, but an AI can. Human cooperation is a value created by our limitations as humans; an AI does not have similar constraints.
Humans can kill another human and get access to their land and food. Whatever caused cooperation to evolve, it isn't that there is no benefit to defection.
But land and food don't actually give you more computational capability: only having another human cooperate with you in some way can.
The essential point here is that values depend upon the environment and the limitations thereof, so as you change the limitations, the values change. The values important for a deep sea creature with extremely limited energy budget, for example, will be necessarily different from that of human beings.
Yep, that happened. You get 4th place on finding that across websites, also it's already fixed.
On my computer, Ctrl-f finds ~10 cases of Holtz appearing in the main text, e.g. point 4 of the introduction.
> ... This included a few times when Yudkowsky’s response was not fully convincing and there was room for Holtz to go deeper, and I wish he would have in those cases. ...
Oh man. My brain generates "Was this fixed with a literal s/Holtz/Hotz/ sed command, as opposed to s/Holtz/Hotz/g ?" Because it seems that, on lines where the name occurs twice or more, the first instance is correctly spelled and the later instances are (edit: sometimes) not.
Something I took away from this debate was a pushing back of my "AGI Soon" expectations. I found myself agreeing with Hotz by the end, though I haven't formed a new prediction. It will be beyond my previous expectation of 1-3 years before AGI, somewhat closer to 10 years. The exact point that changed my mind isn't listed in this post, though it was past the 60 minute mark.
Hello friends. It's hard for me to follow the analogies from aliens to AI. Why should we expect harm from any aliens who may appear?
15:08 Hotz: "If aliens were to show up here, we're dead, right?" Yudkowsky: "It depends on the aliens. If I know nothing else about the aliens, I might give them something like a five percent chance of being nice." Hotz: "But they have the ability to kill us, right? I mean, they got here, right?" Yudkowsky: "Oh they absolutely have the ability. Anything that can cross interstellar distances can run you over without noticing -- well, they would notice, but they wouldn't ca--" [crosstalk] Hotz: "I didn't expect this to be a controversial point. But I agree with you that if you're talking about intelligences that are on the scale of billions of times smarter than humanity... yeah, we're in trouble."
Having listened to the whole interview, my best guess is that Hotz believes that advanced civilizations are almost certain to be Prisoner's Dilemma defectors in the extreme, i.e. they have survived by destroying all other beings they encounter. If so, this is quite disturbing in connection with 12:08, in which Hotz expresses his hope that our civilization will expand across the galaxy (in which case we potentially get to be the aliens).
Hotz seems certain aliens would destroy us, and Eliezer gives them only a five percent chance of being nice.
This is especially odd considering the rapidly growing evidence that humans actually have been frequently seeing and sometimes interacting with a much more advanced intelligence.
It's been somewhat jarring for my belief in the reality of nonhuman spacecraft to grow by so much in so little time, but overall it has been a great relief to consider the likelihood that another intelligence in this universe has already succeeded in surviving far beyond humankind's current level of technology. It means that we too could survive the challenges ahead. The high-tech guys might even help us, whoever they are.
But Hotz and Yudkowsky seem to agree that seeing advanced aliens would actually be terrible news. Why?
A 5% chance of nice aliens is better than a 100% chance of human extinction due to AI. Alas 5% seems too high.
The reason the chance is low is the orthogonality thesis. An alien can have many different value systems while still being intelligent, alien value systems can be very diverse, and most alien value systems place no intrinsic value on bipedal humanoids.
A common science fiction intuition pump is to imagine that an evolutionary intelligence explosion happened in a different Earth species and extrapolate likely consequences. There's also the chance that the aliens are AIs that were not aligned with their biological creators and wiped them out.
Thanks for pointing to the orthogonality thesis as a reason for believing the chance would be low that advanced aliens would be nice to humans. I followed up by reading Bostrom's "The Superintelligent Will," and I narrowed down my disagreement to how this point is interpreted:
> In a similar vein, even if there are objective moral facts that any fully rational agent would comprehend, and even if these moral facts are somehow intrinsically motivating (such that anybody who fully comprehends them is necessarily motivated to act in accordance with them), this need not undermine the orthogonality thesis. The thesis could still be true if an agent could have impeccable instrumental rationality even whilst lacking some other faculty constitutive of rationality proper, or some faculty required for the full comprehension of the objective moral facts. (An agent could also be extremely intelligent, even superintelligent, without having full instrumental rationality in every domain.)
While it's possible that an agent could have impeccable instrumental rationality while lacking in epistemic rationality to some degree, I expect the typical path to very advanced intelligence would eventually involve synergy between growing both in concert, as many here at Less Wrong are working to do. In other words, a highly competent general intelligence is likely to be curious about objective facts across a very diverse range of topics.
So while aliens could be instrumentally advanced enough to make it to Earth without having ever made basic discoveries in a particular area, there's no reason for us to expect that it is specifically the area of morality where they will be ignorant or delusional. A safer bet is that they have learned at least as many objective facts as humans have about any given topic in expectation, and that a topic where the aliens have blind spots relative to some humans is one where they would be curious to learn from us.
A policy of unconditional harmlessness and friendliness toward all beings is a Schelling Point that could be discovered in many ways. I grant that humans may have it relatively easy to mature on the moral axis because we are conscious, which may or may not be the typical case for general intelligence. That means we can directly experience within our own awareness facts about how happiness is preferred to suffering, how anger and violence lead to suffering, how compassion and equanimity lead to happiness, and so on. We can also see these processes operating in others. But even a superintelligence with no degree of happiness is likely to learn whatever it can from humans, and learning something like love would be a priceless treasure to discover on Earth.
If aliens show up here, I give them at least a 50% chance of being as knowledgeable as the wisest humans in matters of morality. That's ten times more than Yudkowsky gives them and perhaps infinitely more than Hotz does!
Have humans learnt any objective moral facts? What sort of thing is an objective moral fact? Something like an abstract mathematical theorem, a perceivable object, or a game-theoretic equilibrium...?
My view is that humans have learned objective moral facts, yes. For example:
> If one acts with an angry or greedy mind, suffering is guaranteed to follow.
I posit that this is not limited to humans. Some people who became famous in history due to their wisdom who I expect would agree include Mother Teresa, Leo Tolstoy, Marcus Aurelius, Martin Luther King Jr., Gandhi, Jesus, and Buddha.
I don't claim that all humans know all facts about morality. Sadly, it's probably the case that most people are quite lost, ignorant in matters of virtuous conduct, which is why they find life to be so difficult.
It's not a moral fact, it's just a fact. A moral fact is something of the form "and that means that acting with an angry or greedy mind is wrong".
The form you described is called an argument. It requires a series of facts. If you're working with propositions such as
then I suppose it could be called a "moral" argument made of "moral" facts and "moral" reasoning, but it's really just the regular form of an argument made of facts and reasoning. The special thing about moral facts is that direct experience is how they are discovered, and it is that same experiential reality to which they exclusively pertain. I'm talking about the set of moment-by-moment first-person perspectives of sentient beings, such as the familiar one you can investigate right now in real time. Without a being experiencing a sensation come and go, there is no moral consideration to evaluate. NULL.
"Objective moral fact" is Bostrom's term from the excerpt above, and the phrasing probably isn't ideal for this discussion. Tabooing such words is no easy feat, but let's do our best to unpack this. Sticking with the proposition we agree is factual:
> If one acts with an angry or greedy mind, suffering is guaranteed to follow.
What kind of fact is this? It's a fact that can be discovered and/or verified by any sentient being upon investigation of their own direct experience. It is without exception. It is highly relevant for benefiting oneself and others -- not just humans. For thousands of years, many people have been revered for articulating it and many more have become consistently happy by basing their decisions on it. Most people don't; it continues to be a rare piece of wisdom at this stage of civilization. (Horrifyingly, a person on the edge of starting a war or shooting up a school currently would receive advice from ChatGPT to increase "focused, justified anger.")
Humankind has discovered and recorded a huge body of such knowledge, whatever we wish to call it. If the existence of well-established, verifiable, fundamental insights into the causal nature of experiential reality comes as a surprise to anyone working in fields like psychotherapy or AI alignment, I would urge them to make an earnest and direct inquiry into the matter so they can see firsthand whether such claims have merit. Given the chance, I believe many nonhuman general intelligences would also try and succeed at understanding this kind of information.
(Phew! I packed a lot of words into this comment because I'm too new here to speak more than three times per day. For more on the topic, see the chapter on morality in Dr. Daniel M. Ingram's book that was reviewed on Slate Star Codex.)
It makes no sense to me that a species that’s evolved to compete with other species would have a higher chance of being nice than a system whose development we can at least somewhat control, and of which we can take highly detailed “MRI”s.
Disagree: values come from substrate and environment. I would almost certainly ally myself with biological aliens over a digital "humanity", as the biological factor will produce values far more reasonable to me.
We are a species that has evolved in competition with other species. Yet, I think there is at least a 5% chance that, if we encountered an intelligent alien species, we wouldn't try to wipe them out (unless they were trying to wipe us out).
Biological evolution of us and aliens would in itself be a commonality, that might produce some common values, whereas there need be no common values with an AI created by a much different process and not successfully aligned.
Biological evolution actively selects for values that we don't want whereas in AI training we actively select for values we do want. Alien life may not also use the biosphere the same way we do. The usual argument about common values is almost everything needs to breathe air, but at the same time competing and eliminating competing species is a common value among biological life.
> Yet, I think there is at least a 5% chance that, if we encountered an intelligent alien species, we wouldn't try to wipe them out (unless they were trying to wipe us out).
Can you tell me why? We have wiped out every other intelligent species more or less. Subgroups of our species are also actively wiping out other subgroups of our species they don't like.
> Can you tell me why?
I think if we encountered aliens who were apparently not hostile, but presumably strange, and likely disgusting or disturbing in some ways, there would be three groups (likely overlapping) of people opposed to wiping them out:
There would also be three groups in favour of wiping them out:
I think it's clear that people with all these views will exist, in non-negligible numbers. I think there's at least a 5% chance that the "don't wipe them out" people prevail.
> Subgroups of our species are also actively wiping out other subgroups of our species they don't like.
Yes, but that's not how interactions between groups of humans always turn out.
We didn't really wipe out the Neanderthals (assuming we even were a factor, rather than climate, disease, etc.), seeing as they are among our ancestors.
Thanks! I haven't watched, but I appreciated having something to give me the gist!
> Hotz was allowed to drive discussion. In debate terms, he was the con side, raising challenges, while Yudkowsky was the pro side defending a fixed position.
This always seems to be the framing, which seems unbelievably stupid given the stakes on each side of the argument. Still, it seems to be the default; I'm guessing this is status quo bias plus the historical tendency of everything to stay relatively the same year by year (less so once technology really started happening). I think AI safety outreach needs to break out of this framing or it's playing a losing game. In terms of public communication, whoever's playing defense has mostly already lost.
The idea that poking a single hole in EY's reasoning settles the question is also a really broken norm around these discussions, one we are going to have to move past if we want effective public communication. In particular, the combination of "tell me exactly what an ASI would do" and "if anything you say sounds implausible, then AI is safe" is just ridiculous. Any conversation implicitly operating on that basis is operating in bad faith and is borderline not worth having. It's not a fair framing of the situation.
> 9. Hotz closes with a vision of ASIs running amok
What a ridiculous thing to be okay with?! Is this representative of his actual stance? Is this stance taken seriously by anyone besides him?
> not going to rely on a given argument or pathway because although it was true it would strain credulity. This is a tricky balance, on the whole we likely need more of this.
I take it this means not using certain implausible-seeming examples? I agree that we could stand to move away from the "understand the lesson behind this implausible-seeming toy example" style of argumentation and more towards an emphasis on something like "a lot of factors point to doom and even very clever people can't figure out how to make things safe".
I think it matters that most of the "technical arguments" point strongly towards doom, but I think it's a mistake for AI safety advocates to try to do all of the work of laying out and defending technical arguments when it comes to public facing communication/debate. If you're trying to give all the complicated reasons why doom is a real possibility, then you're implicitly taking on a huge burden of proof and letting your opponent get away with doing nothing more than cause confusion and nitpick.
Like, imagine having to explain general relativity in a debate to an audience who has never heard about it. Your opponent continuously just stops you and disagrees with you; maybe misuses a term here and there and then at the end the debate is judged by whether the audience is convinced that your theory of physics is correct. It just seems like playing a losing game for no reason.
Again, I didn't see this and I'm sure EY handled himself fine, I just think there's a lot of room for improvement in the general rhythm that these sorts of discussions tend to fall into.
I think it is okay for AI safety advocates to lay out the groundwork, maybe make a few big-picture arguments, maybe talk about expert opinion (since that alone is enough to perk most sane people's ears and shift some of the burden of proof), and then mostly let their opponents do the work of stumbling through the briars of technical argumentation if they still want to nitpick whatever thought experiment. In general, a leaner case just argues better and is more easily understood. Thus, I think it's better to argue the general case than to attempt the standard shuffle of a dozen different analogies; especially when time/audience attention is more acutely limited.
> The idea that poking a single hole in EY's reasoning settles the question is also a really broken norm around these discussions, one we are going to have to move past if we want effective public communication. In particular, the combination of "tell me exactly what an ASI would do" and "if anything you say sounds implausible, then AI is safe"
Remember that this is a three-way debate: AI is safe; AI causes finite, containable problems; AI kills (almost) everybody. The most extreme scenario is conjunctive because it requires an AI with goals, goal stability, rapid self-improvement (foom), and means. So nitpicking one stage of Foom Doom actually does refute it, even if it has no impact on the middle-of-the-road position.
I disagree that rapid self-improvement and goal stability are load-bearing arguments here. Even goals are not strictly, 100% required. If we build something with the means to kill everyone, then we should be worried about it. If it has goals that cannot be directed or predicted, then we should be VERY worried about it.
What are the steps? Are we deliberately building a superintelligence with the goal of killing us all? If not, where do the motivation and ability come from?
For me, ability = capability = means. This is one of the two arguments that I said were load bearing. Where will it come from? Well, we are specifically trying to build the most capable systems possible.
Motivation (i.e. goals) is not actually strictly required. However, there are reasons to think that an AGI could have goals that are not aligned with those of most humans. The most fundamental is instrumental convergence.
Note that my original comment was not making this case. It was just a meta discussion about what it would take to refute Eliezer's argument.
It's unimportant, but I disagree with the "extra special" in:
> if alignment isn’t solvable at all [...] extra special dead
If we could coordinate well enough and get to SI via very slow human enhancement, that might be a good universe to be in. But probably we wouldn't be able to coordinate well enough to prevent AGI in that universe. Still, the odds seem similar between "get humanity to hold off on AGI till we solve alignment", which is the ask in alignment-possible universes, and "get humanity to hold off on AGI forever", which is the ask in alignment-impossible universes. The difference between the odds comes down to how long until AGI, whether the world can agree to stop development or only agree to slow it, and, if it can stop, whether that is stable. I expect AGI is enough closer than alignment that getting the world to slow it for that long and getting it to stop permanently are fairly similar odds.
Some low-level observations I have of Accelerationism (NOTE: I have not yet analyzed Accelerationism deeply and might do a really good job of this in the future; these should be taken as butterfly ideas):
The discussion is about whether or not human civilization will destroy itself due to negligence and lack of ability to cooperate. This risk may be real or imagined. You may care about future humans or not. But that doesn't make this either philosophy or aesthetics. The questions are very concrete, not general, and they're fairly objective (people agree a lot more on whether civilization is good than on what beauty is).
I really don't know what you're saying. To attack an obvious straw man and thus give you at least some starting point for explaining further: Generally, I'd be extremely sceptical of any claim about some tiny coherent group of people understanding something important better than 99% of humans on earth. To put it polemically, for most such claims, either it's not really important (maybe we don't really know if it is?), it won't stay that way for long, or you're advertising for a cult. The phrase "truly awakened" doesn't bode well here... Feel free to explain what you actually meant rather than responding to this.
Assuming these "ideologies" you speak of really exist in a coherent fashion, I'd try to summarize "Accelerationist ideology" as saying: "technological advancement (including AI) will accelerate a lot, change the world in unimaginable ways and be great, let's do that as quickly as possible", while "AI safety (LW version)" as saying "it might go wrong and be catastrophic/unrecoverable; let's be very careful". If anything, these ideas as ideologies are yet to get out into the world and might never have any meaningful impact at all. They might not even work on their own as ideologies (maybe we mean different things by that word).
So why are the origins interesting? What do you hope to learn from them? What does it matter if one of those is an "outgrowth" of one thing more than some other? It's very hard for me to evaluate something like how "shallow" they are. It's not like there's some single manifesto or something. I don't see how that's a fruitful direction to think about.
George Hotz and Eliezer Yudkowsky debated on YouTube for 90 minutes, with some small assists from moderator Dwarkesh Patel. It seemed worthwhile to post my notes on this on their own.
I thought this went quite well for the first half or so, then things went increasingly off the rails in the second half, as Hotz got into questions he hadn't had a chance to reflect on and prepare for, especially around cooperation and the prisoner's dilemma.
First, some general notes, then specific notes I took while watching.
Here is my summary of important statements and exchanges with timestamps: