I want to keep picking a fight about “will the AI care so little about humans that it just kills them all?” This is different from a broader sense of cosmopolitanism, and moreover I'm not objecting to the narrow claim "doesn't come for free." But it’s directly related to the actual emotional content of your parables and paragraphs, and it keeps coming up recently with you and Eliezer, and I think it’s an important way that this particular post looks wrong even if the literal claim is trivially true.
(Note: I believe that AI takeover has a ~50% probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that's likely to be a mistake even if it doesn't lead to billions of deaths.)
Humans care about the preferences of other agents they interact with (not much, just a little bit!), even when those agents are weak enough to be powerless. It’s not just that we have some preferences about the aesthetics of cows, which could be better optimized by having some highly optimized cow-shaped objects. It’s that we actually care (a little bit!) about the actual cows getting what they actually want, trying our best to understand the...
Eliezer has a longer explanation of his view here.
My understanding of his argument is: there are a lot of contingencies that reflect how and whether humans are kind. Because there are so many contingencies, it is somewhat unlikely that aliens would go down a similar route, and essentially impossible for ML. So maybe aliens have a 5% probability of being nice and ML systems have ~0% probability of being nice. I think this argument is just talking about why we shouldn't update too much from humans, and there is an important background assumption that kindness is super weird and so won't be produced very often by other processes, i.e. the only reason to think it might happen is that it happened in the single case we observed.
I find this pretty unconvincing. He lists like 10 things (humans need to trade favors, we're not smart enough to track favors and kinship explicitly, and we tend to be allied with nearby humans so want to be nice to those around us, we use empathy to model other humans, and we had religion and moral realism for contingent reasons, we weren't optimized too much once we were smart enough that our instrumental reasoning screens off kindness heuristics).
But no ar...
Short version: I don't buy that humans are "micro-pseudokind" in your sense; if you say "for just $5 you could have all the fish have their preferences satisfied" I might do it, but not if I could instead spend $5 on having the fish have their preferences satisfied in a way that ultimately leads to them ascending and learning the meaning of friendship, as is entangled with the rest of my values.
Meta:
Note: I believe that AI takeover has a ~50% probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that's likely to be a mistake even if it doesn't lead to billions of deaths.
So for starters, thanks for making acknowledgements about places we apparently agree, or otherwise attempting to demonstrate that you've heard my point before bringing up other points you want to argue about. (I think this makes arguments go better.) (I'll attempt some of that myself below.)
Secondly, note that it sounds to me like you took a diametric-opposite reading of some of my intended emotional content (which I acknowledge demonstrates flaws in my writing). For instance, I intended the sentence "At that very moment they hear the di...
I disagree with this but am happy your position is laid out. I'll just try to give my overall understanding and reply to two points.
Like Oliver, it seems like you are implying:
Humans may be nice to other creatures in some sense. But if the fish were to look at the future that we'd achieve for them using the 1/billionth of resources we spent on helping them, it would be as objectionable to them as "murder everyone" is to us.
I think that normal people being pseudokind in a common-sensical way would instead say:
If we are trying to help some creatures, but those creatures really dislike the proposed way we are "helping" them, then we should try a different tactic for helping them.
I think that some utilitarians (without reflection) plausibly would "help the humans" in a way that most humans consider as bad as being murdered. But I think this is an unusual feature of utilitarians, and most people would consult the beneficiaries, observe they don't want to be murdered, and so not murder them.
I think that saying "Helping someone in a way they like, sufficiently precisely to avoid things like murdering them, requires precisely the right form of caring---and that's super rare" is a really mi...
We're not talking about practically building minds right now, we are talking about humans.
We're not talking about "extrapolating volition" in general. We are talking about whether---in attempting to help a creature with preferences about as coherent as human preferences---you end up implementing an outcome that creature considers as bad as death.
For example, we are talking about what would happen if humans were trying to be kind to a weaker species that they had no reason to kill, that could nevertheless communicate clearly and had preferences about as coherent as human preferences (while being very alien).
And those creatures are having a conversation amongst themselves before the humans arrive wondering "Are the humans going to murder us all?" And one of them is saying "I don't know, they don't actually benefit from murdering us and they seem to care a tiny bit about being nice, maybe they'll just let us do our thing with 1/trillionth of the universe's resources?" while another is saying "They will definitely have strong opinions about what our society should look like and the kind of transformation they implement is about as bad by our lights as being murdered."
In practice attempts to respect someone's preferences often involve ideas like autonomy and self-determination and respect for their local preferences. I really don't think you have to go all the way to extrapolated volition in order to avoid killing everyone.
Is this a reasonable paraphrase of your argument?
Humans wound up caring at least a little about satisfying the preferences of other creatures, not in a "grant their local wishes even if that ruins them" sort of way but in some other intuitively-reasonable manner.
Humans are the only minds we've seen so far, and so having seen this once, maybe we start with a 50%-or-so chance that it will happen again.
You can then maybe drive this down a fair bit by arguing about how the content looks contingent on the particulars of how humans developed or whatever, and maybe that can drive you down to 10%, but it shouldn't be able to drive you down to 0.1%, especially not if we're talking only about incredibly weak preferences.
If so, one guess is that a bunch of disagreement lurks in this "intuitively-reasonable manner" business.
A possible locus of disagreement: it looks to me like, if you give humans power before you give them wisdom, it's pretty easy to wreck them while simply fulfilling their preferences. (Ex: lots of teens have dumbass philosophies, and might be dumb enough to permanently commit to them if given that power.)
More generally, I think that if mere-humans met very-alien minds wit...
More generally, I think that if mere-humans met very-alien minds with similarly-coherent preferences, and if the humans had the opportunity to magically fulfill certain alien preferences within some resource-budget, my guess is that the humans would have a pretty hard time offering power and wisdom in the right ways such that this overall went well for the aliens by their own lights (as extrapolated at the beginning), at least without some sort of volition-extrapolation.
Isn't the worst case scenario just leaving the aliens alone? If I'm worried I'm going to fuck up some alien's preferences, I'm just not going to give them any power or wisdom!
I guess you think we're likely to fuck up the alien's preferences by light of their reflection process, but not our reflection process. But this just recurs to the meta level. If I really do care about an alien's preferences (as it feels like I do), why can't I also care about their reflection process (which is just a meta preference)?
I feel like the meta level at which I no longer care about doing right by an alien is basically the meta level at which I stop caring about someone doing right by me. In fact, this is exactly how it seems mentally constructed: what I mean by "doing right by [person]" is "what that person would mean by 'doing right by me'". This seems like either something as simple as it naively looks, or sensitive to weird hyperparameters I'm not sure I care about anyway.
I sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to me to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don't expect Earthlings to think about validly.
Some more less-important meta, that is in part me writing out of frustration from how the last few exchanges have gone:
I'm not quite sure what argument you're trying to have here. Two explicit hypotheses follow, that I haven't managed to distinguish between yet.
Background context, for establishing common language etc.:
Hypothesis 1: Nate's trying to make a point about cosmopolitan values that Paul basically agrees with. But Paul thinks Nate's delivery gives a wrong impression about the tangentially-related question of pico-pseudokindness, probably because (on Paul's model) Nate's wrong about pico-pseudokindness, and Paul is taking the opportunity to argue about it.
Hypothesis 2: Nate's trying to make a point about cosmopolitan values that Paul basically disagrees with. Paul maybe agrees with all the literal words, but...
Hypothesis 1 is closer to the mark, though I'd highlight that it's actually fairly unclear what you mean by "cosmopolitan values" or exactly what claim you are making (and that ambiguity is hiding most of the substance of disagreements).
I'm raising the issue of pico-pseudokindness here because I perceive it as (i) an important undercurrent in this post, (ii) an important part of the actual disagreements you are trying to address. (I tried to flag this at the start.)
More broadly, I don't really think you are engaging productively with people who disagree with you. I suspect that if you showed this post to someone you perceive yourself to be arguing with, they would say that you seem not to understand the position---the words aren't really engaging with their view, and the stories aren't plausible on their models of the world but in ways that go beyond the literal claim in the post.
I think that would hold in particular for Robin Hanson or Rich Sutton. I don't think they are accessing a pre-theoretic intuition that you are discarding by premature theorizing. I think the better summary is that you don't understand their position very well or are choosing not to engage with the important parts of it. (Just as Robin doesn't seem to understand your position ~at all.)
I don't think the point about pico-pseudokindness is central for either Robin Hanson or Rich Sutton. I think it is more obviously relevant to a bunch of recent arguments Eliezer has gotten into on Twitter.
I think a closer summary is:
...Humans and AI systems probably want different things. From the human perspective, it would be better if the universe was determined by what the humans wanted. But we shouldn't be willing to pay huge costs, and shouldn't attempt to create a slave society where AI systems do humans' bidding forever, just to ensure that human values win out. After all, we really wouldn't want that outcome if our situations had been reversed. And indeed we are the beneficiary of similar values-turnover in the past, as our ancestors have been open (perhaps by necessity rather than choice) to values changes that they would sometimes prefer hadn't happened.
We can imagine really sterile outcomes, like replicators colonizing space with an identical pattern repeated endlessly, or AI systems that want to maximize the number of paperclips. And considering those outcomes can help undermine the cosmopolitan intuition that we should respect the AI we build. But in fact that intuition pump relies crucially on its wildly unrealistic premises, that the kind of thing brought about by AI systems will be sterile and uninteresting. If we instead treat "paperclip" as an analog for some crazy w
Might write a longer reply at some point, but the reason why I don't expect "kindness" in AIs (as you define it here) is that I don't expect "kindness" to be the kind of concept that is robust to cosmic levels of optimization pressure applied to it, and I expect will instead come apart when you apply various reflective principles and eliminate any status-quo bias, even if it exists in an AI mind (and I also think it is quite plausible that it is completely absent).
Like, different versions of kindness might or might not put almost all of their considerateness on all the different types of minds that could hypothetically exist, instead of the minds that currently exist right now. Indeed, I expect it's more likely than not that I myself will end up in that moral equilibrium, and won't be interested in extending any special consideration to systems that happened to have been alive in 2022, instead of the systems that could have been alive and seem cooler to me to extend consideration towards.
Another way to say the same thing is that if AI extends consideration towards something human-like, I expect that it will use some superstimuli-human-ideal as a reference point, which will be...
Is this a fair summary?
Humans might respect the preferences of weak agents right now, but if they thought about it for longer they'd pretty robustly just want to completely destroy the existing agents (including a hypothetical alien creator) and replace them with something better. No reason to honor that kind of arbitrary path dependence.
If so, it seems like you wouldn't be making an argument about AI or aliens at all, but rather an empirical claim about what would happen if humans were to think for a long time (and become more the people we wished to be and so on).
That seems like an important angle that my comment didn't address at all. I personally don't believe that humans would collectively stamp out 99% of their kindness to existing agents (in favor of utilitarian optimization) if you gave them enough time to reflect. That sounds like a longer discussion. I also think that if you expressed the argument in this form to a normal person they would be skeptical about the strong claims about human nature (and would be skeptical of doomer expertise on that topic), and so if this ends up being the crux it's worth being aware of where the conversation goes and my bottom line recommend...
I think some of the confusion here comes from my using "kind" to refer to "respecting the preferences of existing weak agents." I don't have a better handle, but could have just used a made-up word.
Yeah, sorry, I noticed the same thing a few minutes ago, that I was probably at least somewhat misled by the more standard meaning of kindness.
Tabooing "kindness" I am saying something like:
Yes, I don't think extrapolated current humans assign approximately any value to the exact preference of "respecting the preferences of existing weak agents" and I don't really believe that you would on-reflection endorse that preference either.
Separately (though relatedly), each word in that sentence sure feels like the kind of thing that I do not feel comfortable leaning on heavily as I optimize strongly against it, and that hides a ton of implicit assumptions, like 'agent' being a meaningful concept in the first place, or 'existing' or 'weak' or 'preferences', all of which I expect I would think are probably terribly confused concepts to use after I had understood the real concepts that carve reality more at its joints, and this means this sentence sounds deceptively simple or robu...
Yes, I don't think extrapolated current humans assign approximately any value to the exact preference of "respecting the preferences of existing weak agents" and I don't really believe that you would on-reflection endorse that preference either.
I am quite confident that I do, and it tends to infuriate my friends who get cranky that I feel a moral obligation to respect the artistic intent of bacterial genomes: all bacteria should go vegan, yet survive, and eat food equivalent to their previous.
If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.
In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.
And I don't think there's any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they're just starting to recursively self-improve.
Meta: I feel pretty annoyed by the phenomenon of which this current conversation is an instance, because when people keep saying things that I strongly disagree with which will be taken as representing a movement that I'm associated with, the high-integrity (and possibly also strategically optimal) thing to do is to publicly repudiate those claims*, which seems like a bad outcome for everyone. I model it as an epistemic prisoner's dilemma with the following squares:
D, D: doomers talk a lot about "everyone dies with >90% confidence", non-doomers publicly repudiate those arguments
C, D: doomer...
Meta: I feel pretty annoyed by the phenomenon of which this current conversation is an instance, because when people keep saying things that I strongly disagree with which will be taken as representing a movement that I'm associated with, the high-integrity (and possibly also strategically optimal) thing to do is to publicly repudiate those claims*, which seems like a bad outcome for everyone.
For what it's worth, I think you should just say that you disagree with it? I don't really understand why this would be a "bad outcome for everyone". Just list out the parts you agree on, and list the parts you disagree on. Coalitions should mostly be based on epistemological principles and ethical principles anyways, not object-level conclusions, so at least in my model of the world repudiating my statements if you disagree with them is exactly what I want my allies to do.
If you on the other hand think the kind of errors you are seeing are evidence about some kind of deeper epistemological problems, or ethical problems, such that you no longer want to be in an actual coalition with the relevant people (or think that the costs of being perceived to be in some trade-coalition with them wo...
When I say "repudiate" I mean a combination of publicly disagreeing + distancing. I presume you agree that this is suboptimal for both of us, and my comment above is an attempt to find a trade that avoids this suboptimal outcome.
Note that I'm fine to be in coalitions with people when I think their epistemologies have problems, as long as their strategies are not sensitively dependent on those problems. (E.g. presumably some of the signatories of the recent CAIS statement are theists, and I'm fine with that as long as they don't start making arguments that AI safety is important because of theism.) So my request is that you make your strategies less sensitively dependent on the parts of your epistemology that I have problems with (and I'm open to doing the same the other way around in exchange).
If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.
In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.
And I don't think there's any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they're just starting to recursively self-improve.
This feels like it somewhat misunderstands my point. I don't expect the reflection process I will go through to feel predictably horrifying from the inside. But I do expect the reflection process the AI will go through to feel horrifying to me (because the AI does not share all my metaethical assumptions, and preferences over reflection, and environmental circumstances, and principles by which I trade off values between different parts of me).
This feels like a pretty common experience. Many people in EA seem to quite deeply endorse various things like hedonic utilitarianism, in a way where the reflection process that led them to that opinion feels deeply horrifying to me. Of course it didn't feel deeply horrifying to them (or at least it didn't on the dimensions that were relevant to their process of meta-ethical reflection), otherwise they wouldn't have done it.
Paul, this is very thought provoking, and has caused me to update a little. But:
I loathe factory-farming, and I would spend a large fraction of my own resources to end it, if I could.
I believe that makes me unusually kind by human standards, and by your definition.
I like chickens, and I wish them well.
And yet I would not bat an eyelid at the thought of a future with no chickens in it.
I would not think that a perfect world could be improved by adding chickens.
And I would not trade a single happy human soul for an infinity of happy chickens.
I think that your single known example is not as benevolent as you think.
If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?
From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I'm very philosophically confused about how to think about all of this.)
As I said:
I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival, I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer to just use the humans for atoms.
I think it's totally plausible for the AI to care about what happens with humans in a way that conflicts with our own preferences. I just don't believe it's because AI doesn't care at all one way or the other (such that you should make predictions based on instrumental reasoning like "the AI will kill humans because it's the easiest way to avoid future conflict" or other relatively small considerations).
My objection is that the simplified message is wrong, not that it's too alarming. I think "misaligned AI has a 50% chance of killing everyone" is practically as alarming as "misaligned AI has a 95% chance of killing everyone," while being a much more reasonable best guess. I think being wrong is bad for a variety of reasons. It's unclear if you should ever be in the business of telling lies-told-to-children to adults, but you certainly shouldn't be doubling down on them in an argument.
I don't think misaligned AI drives the majority of s-risk (I'm not even sure that s-risk is higher conditioned on misaligned AI), so I'm not convinced that it's a super relevant communication consideration here. The future can be scary in plenty of ways other than misaligned AI, and it's worth discussing those as part of "how excited should we be for faster technological change."
I regret mentioning "lie-to-children" as it seems a distraction from my main point. (I was trying to introspect/explain why I didn't feel as motivated to express disagreement with the OP as you, not intending to advocate or endorse anyone going into "the business of telling lies-told-to-children to adults".)
My main point is that I think "misaligned AI has a 50% chance of killing everyone" isn't alarming enough, given what I think happens in the remaining 50% of worlds, versus what a typical person is likely to infer from this statement, especially after seeing your top-level comment where you talk about "kindness" at length. Can you try to engage more with this concern? (Apologies if you already did, and I missed your point instead.)
I think “misaligned AI has a 50% chance of killing everyone” is practically as alarming as “misaligned AI has a 95% chance of killing everyone,” while being a much more reasonable best guess.
(Addressing this since it seems like it might be relevant to my main point.) I find it very puzzling that you think “misaligned AI has a 50% chance of killing everyone” is practically as alarming as “misaligned AI has a 95% chance of killing everyone”. Intuitive...
It seems to me that many of my disagreements with others in this space come from them hearing me say "I want the AI to like vanilla ice cream, as I do", whereas I hear them say "the AI will automatically come to like the specific and narrow thing (broad cosmopolitan value) that I like".
At the moment I'm just trying to state my position, in the hopes that this helps us skip over the step where people think I'm arguing for carbon chauvinism.
I think posts like these would benefit a lot from even a little bit of context, such as:
In the absence of these, the post feels like it's setting up weak-men on an issue where I disagree with you, but in a way that's particularly hard to engage with, and in a way that will plausibly confuse readers who, e.g., think you speak for the alignment community as a whole.
My take: I don't disagree that it's probably not literally free, but I think it's hard to rule out a fairly wide range of possibilities for how cheap it is.
feels like it's setting up weak-men on an issue where I disagree with you, but in a way that's particularly hard to engage with
My best guess as to why it might feel like this is that you think I'm laying groundwork for some argument of the form "P(doom) is very high", which you want to nip in the bud, but are having trouble nipping in the bud here because I'm building a motte ("cosmopolitan values don't come free") that I'll later use to defend a bailey ("cosmopolitan values don't come cheap").
This misunderstands me (which is a separate claim from the claim "and you're definitely implying this").
The impetus for this post is all the cases where I argue "we need to align AI" and people retort with "But why do you want it to have our values instead of some other values? What makes the things that humans care about so great? Why are you so biased towards values that you personally can understand?". Where my guess is that many of those objections come from a place of buying into broad cosmopolitan value much more than any particular local human desire.
And all I'm trying to do here is say that I'm on board with buying into broad cosmopolitan value more than any particular local human ...
The big reason why humans are cosmopolitan might be that we evolved in multipolar environments, where helping others is instrumental. If so, just training AIs in multipolar environments that incentivize cooperation could be all it takes to get some amount of instrumental-made-terminal-by-optimization-failure cosmopolitanism.
My disagreement with this post is that I am a human-centric carbon[1] chauvinist. You write:
I'm saying something more like: we humans have selfish desires (like for vanilla ice cream), and we also have broad inclusive desires (like for everyone to have ice cream that they enjoy, and for alien minds to feel alien satisfaction at the fulfilment of their alien desires too). And it's important to get the AI on board with those values.
Why would my "selfish" desires be any less[2] important than my "broad inclusive" desires? Assuming even that it makes...
There's a kind of midgame / running around like chickens with our heads cut off vibe lately, like "you have to be logging hours in pytorch, you can't afford idle contemplation". Hanging out with EAs, scanning a few different twitter clusters about forecasting and threatmodeling, there's a looming sense that these issues are not being confronted at all and that the sophistication level is lower than it used to be (subject obviously to sampling biases or failure to factor in "community building" growth rate and other outreach activities into my prediction). ...
How sure are you that we're not going to end up building AGI with cognitive architectures that consist of multiple pseudo-agent specialists coordinating and competing in an evolutionary economic process that, at some point, constitutionalises, as an end goal, its own perpetuation, and the perpetuation of this multipolar character?
Because, that's not an implausible ontogeny, and if it is the simplest way to build AGI, then I think cosmopolitanism basically is free after all.
And ime advocates of cosmopolitanism-for-free often do distantly tacitly assume that...
I think another common source of disagreement is that people sometimes conflate a mind or system's ability to comprehend and understand some particular cosmopolitan, human-aligned values and goals, with the system itself actually sharing those values, or caring about them at all. Understanding a value and actually valuing it are different kinds of things, and this is true even if some component piece of the system has a deep, correct, fully grounded understanding of cosmopolitan values and goals, and is capable of generalizing them in the way that hu...
I'd like to see a debate between you, or someone who shares your views, and Hanson on this topic. Partly because I think revealing your cruxes w/ each other will clarify your models to us. And partly because I'm unsure if Hanson is right on the topic. He's probably wrong, but this is important to me. Even if I and those I care for die, will there be something left in this world that I value?
My summary of Hanson's views on this topic:
Hanson seems to think that any of our "descendants", if they spread to the stars, will be doing complex, valuable thing...
I think I'm with you on the kinds of value that I'd like to spread (a set of qualia, with some mix of variety and quantity being "better"). But I'm not sure I believe that this preference is qualitatively different from chocolate vs vanilla. It's a few degrees more meta, but by no means at any limit of the hierarchy.
I'd be stoked if we created AIs that are the sort of thing that can make the difference between an empty gallery, and a gallery with someone in it to appreciate the art (where a person to enjoy the gallery makes all the difference). And I'd be absolutely thrilled if we could make AIs that care as we do, about sentience and people everywhere, however alien they may be, and about them achieving their weird alien desires.
That's great! So, let's assume that we are just trying to encode this as a value (taking into account interests of sentient beings and ca...
Aside from any disagreements, there's something about the way the parables are written that I find irrationally bothersome and extremely hard to point at. I went through a number of iterations of attempts to get Claude+ to understand what direction I'd like the story to move in order to make the point in a less viscerally bothersome way, and this is the first attempt (the seventh or so) which I didn't find too silly to share; I added brackets around parts I still feel bothered by or that newly bother me, {} are things I might add:
...Here is a revised versio
But those values aren’t universally compelling, just because they’re broader or more inclusive. Those are still our values.
"But those values aren’t necessarily universally compelling, just because they’re broader or more inclusive. Those are still possibly only our values."
Note also that universality doesn't have to directly be a value at all: it can emerge from game-theoretic considerations.
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
I think (and I hope) that something like "maximize positive experiences of sentient entities" could actually be a convergent goal of any AI that is capable of reflecting on these questions. I don't think that humans just gravitate towards this kind of utility maximization because they evolved some degree of pro-sociality. Instead, something like this seems like it's the only thing inherently worth striving for, in the absence of any other set of values or goals.
The grabby aliens type scenario in the first parable seems like the biggest threat to the idea t...
I propose a goal of perpetuating interesting information, rather than goals of maximizing "fun" or "complexity". In my opinion, such a goal solves both problems: the complex but bleak and desolate future, and the fun-maximizing drug-haze or Matrix future. Of course, a rigorous technical definition of "interesting" must be developed. At the very least, "interesting" assumes there is an appreciating agent and continuous development.
Short version: if the future is filled with weird artificial and/or alien minds having their own sort of fun in weird ways that I might struggle to understand with my puny meat-brain, then I'd consider that a win. When I say that I expect AI to destroy everything we value, I'm not saying that the future is only bright if humans-in-particular are doing human-specific things. I'm saying that I expect AIs to make the future bleak and desolate, and lacking in fun or wonder of any sort[1].
Here's a parable for you:
Here's another parable for you:
There are many different sorts of futures that minds can want.
Ours are a very narrow and low-dimensional band, in that wide space.
When I say it's important to make the AIs care about valuable stuff, I don't mean it's important to make them like vanilla ice cream more than chocolate ice cream (as I do).
I'm saying something more like: we humans have selfish desires (like for vanilla ice cream), and we also have broad inclusive desires (like for everyone to have ice cream that they enjoy, and for alien minds to feel alien satisfaction at the fulfilment of their alien desires too). And it's important to get the AI on board with those values.
But those values aren't universally compelling, just because they're broader or more inclusive. Those are still our values.
The fact that we think fondly of the ant-queen and wish her to fulfill her desires, does not make her think fondly of us, nor wish us to fulfill our desires.
That great inclusive cosmopolitan dream is about others, but it's written in our hearts; it's not written in the stars. And if we want the AI to care about it too, then we need to figure out how to get it written into the AI's heart too.
It seems to me that many of my disagreements with others in this space come from them hearing me say "I want the AI to like vanilla ice cream, as I do", whereas I hear them say "the AI will automatically come to like the specific and narrow thing (broad cosmopolitan value) that I like".
As is often the case in my writings, I'm not going to spend a bunch of time arguing for my position.
At the moment I'm just trying to state my position, in the hopes that this helps us skip over the step where people think I'm arguing for carbon chauvinism.
(For more reading on why someone might hold this position, consider the metaethics sequence on LessWrong.)
I'd be stoked if we created AIs that are the sort of thing that can make the difference between an empty gallery, and a gallery with someone in it to appreciate the art (where a person to enjoy the gallery makes all the difference). And I'd be absolutely thrilled if we could make AIs that care as we do, about sentience and people everywhere, however alien they may be, and about them achieving their weird alien desires.
But I don't think we're on track for that.
And if you, too, have the vision of the grand pan-sentience cosmopolitan dream--as might cause you to think I'm a human-centric carbon chauvinist, if you misread me--then hear this: we value the same thing, and I believe it is wholly at risk.
[1] At least within the ~billion light-year sphere of influence that Earth-originated life seems pretty likely to have; maybe there are distant aliens and hopefully a bunch of aliens will do fun stuff with the parts of the universe under their influence, but it's still worth ensuring that the great resources at Earth's disposal go towards fun and love and beauty and wonder and so on, rather than towards bleak desolation.