All of astridain's Comments + Replies

Unless those people are selected for extreme levels of attachment to specific celestial bodies, as opposed to the function and benefit of those celestial bodies, I don’t see why those people would decide to not replace the sun with a better sun, and also get orders of magnitude richer by doing so.

Because hopefully those people will include, and (depending on population control) might indeed be overwhelmingly composed of, the current, pre-singularity population of Earth. I don't think a majority of currently-alive humans would ever agree to destroy the Sun,... (read more)

That might be a fault with my choice of example. (I am not in fact a master of etiquette.) But I'm sure examples can be supplied where "the polite thing to say" is a euphemism that you absolutely do expect the other person to understand. At a certain level of obviousness and ubiquity, they tend to shift into figures of speech. “Your loved one has passed on” instead of “your loved one is dead”, say.

And yes, that was a typo. Your way of expressing it might be considered an example of such unobtrusive politeness. My guess is that you said “I assume that... (read more)

-1qvalq
"Your loved one has passed on" I'm not sure I've ever used a euphemism (I don't know what a euphemism is). When should I?

Some of it might be actual-obfuscation if there are other people in the room, sure. But equally-intelligent equally-polite people are still expected to dance the dance even if they're alone. 

Your last paragraph gets at what I think is the main thing, which is basically just an attempt at kindness. You find a nicer, subtler way to phrase the truth in order to avoid shocking/triggering the other person. If both people involved were idealised Bayesian agents this would be unnecessary, but idealised Bayesian agents don't have emotions, or at any rate they... (read more)

But equally-intelligent equally-polite people are still expected to dance the dance even if they're alone

I think this could be considered to be a sort of "residue" of the sort of deception Zack is talking about. If you imagine agents with different levels of social savviness, the savviest ones might adopt a deceptively polite phrasing, until the less savvy ones catch on, and so on down the line until everybody can interpret the signal correctly. But now the signaling equilibrium has shifted, so all communication uses the polite phrasing even though no o... (read more)
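
To make that "residue" dynamic concrete, here is a toy simulation (entirely my own construction; the population size, learning probability, and update rule are arbitrary assumptions, not anything from the thread) of a euphemism spreading through the less savvy listeners until it becomes pure convention:

```python
# Toy model of the euphemism treadmill described above: a polite phrasing that
# only a savvy minority can decode spreads by exposure until everyone decodes
# it, at which point it hides nothing and is simply the new convention.
import random

random.seed(0)

N_LISTENERS = 100
LEARN_PROB = 0.5   # chance per round that an exposed listener figures out the euphemism
decoders = 10      # the initially savvy minority

generation = 0
while decoders < N_LISTENERS and generation < 500:
    generation += 1
    exposure = decoders / N_LISTENERS  # how often the polite phrasing is actually heard
    newly_informed = sum(
        1 for _ in range(N_LISTENERS - decoders)
        if random.random() < LEARN_PROB * exposure
    )
    decoders += newly_informed
    print(f"Generation {generation}: {decoders}/{N_LISTENERS} listeners decode the polite phrasing")

if decoders == N_LISTENERS:
    print("Equilibrium shifted: everyone uses and decodes the euphemism, so it fools no one.")
```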

I think this misses the extent to which a lot of “social grace” doesn't actually decrease the amount of information conveyed; it's purely aesthetic — it's about finding comparatively more pleasant ways to get the point across. You say — well, you say “I think she's a little out of your league” instead of saying “you're ugly”. But you expect the ugly man to recognise the script you're using, and grok that you're telling him he's ugly! The same actual, underlying information is conveyed!

The cliché with masters of etiquette is that they can fight subtle duels... (read more)

6gjm
I think "I think she's a little out of your league"[1] doesn't convey the same information as "you're ugly" would, because (1) it's relative and the possibly-ugly person might interpret it as "she's gorgeous" and (2) it's (in typical use, I think) broader than just physical appearance so it might be commenting on the two people's wittiness or something, not just on their appearance. [1] Parent actually says "you're a little out of her league" but I assume that's just a slip. It's not obvious to me how important this is to the difference in graciousness, but it feels to me as if saying that would be ruder if it did actually allow the person it was said to to infer "you're ugly" rather than merely "in some unspecified way(s) that may well have something to do with attractiveness, I rate her more highly than you". So in this case, at least, I think actual-obfuscation as well as pretend-obfuscation is involved.
2RamblinDash
  We do this so that the ugly guy can get the message without creating Common Knowledge of his ugliness.

What is the function of pretend-obfuscation, though? I don't think that the brainpower expenditure of encrypting conversations so that other people can decrypt them again is unnecessary at best; I think it's typically serving the specific function of using the same message to communicate to some audiences but not others, like an ambiguous bribe offer that corrupt officeholders know how to interpret, but third parties can't blow the whistle on.

In general, when you find yourself defending against an accusation of deception by saying, "But nobody was really f... (read more)

-1Said Achmiz
Amount of information conveyed to whom? More pleasant for whom? Obfuscation from whom? Without these things, your account is underspecified. And if you specify these things, you may find that your claim is radically altered thereby.

 My guess is mostly that the space is so wide that you don't even end up with AIs warping existing humans into unrecognizable states, but do in fact just end up with the people dead

Why? I see a lot of opportunities for s-risk or just generally suboptimal future in such options, but "we don't want to die, or at any rate we don't want to die out as a species" seems like an extremely simple, deeply-ingrained goal that almost any metric by which the AI judges our desires should be expected to pick up, assuming it's at all pseudokind. (In many cases, humans do a lot to protect endangered species even as we do diddly-squat to fulfill individual specimens' preferences!) 

It's about trade-offs. HPMOR/an equally cringey analogue will attract a certain sector of weird people into the community who can then be redirected towards A.I. stuff — but it will repel a majority of novices because it "taints" the A.I. stuff with cringiness by association.

This is a reasonable trade-off if:

  1. the kind of weird people who'll get into HPMOR are also the kind of weird people who'd be useful to A.I. safety;
  2. the normies were already likely to dismiss the A.I. stuff with or without the added load of cringe.

In the West, 1. is true because there's a... (read more)

If I was feeling persistently sad or hopeless and someone asked me for the quality of my mental health, and I had the energy to reply, I would reply ‘poor, thanks for asking.’

I wouldn't, not if I was in fact experiencing a rough enough patch of life that I rationally and correctly believed these feelings to be accurate. If I had been diagnosed with terminal cancer, for example, I would probably say that I was indeed sad and hopeless, but not that I had any mental health issues; indeed I'd be concerned with my mental health if I wasn't feeling that way. I f... (read more)

At a guess, focusing on transforming information from images and videos into text, rather than generating text qua text, ought to help — no? 

4Vladimir_Nesov
That's not reflection, just more initial training data. Reflection acts on the training data it already has; the point is to change the learning problem, by introducing an inductive bias that's not part of the low level learning algorithm, one that improves sample efficiency with respect to a loss that's also not part of low level learning.

LLMs are a very good solution to the wrong problem, and a so-so solution to the right problem. Changing the learning incentives might get better use out of the same training data for improving performance on the right problem. A language model retrained on generated text (which is one obvious form of implementing reflection) likely does worse as a language model of the original training data; it's only a better model of the original data with respect to some different metric of being a good model (such as being a good map of the actual world, whatever that means).

Machine learning doesn't know how to specify this different metric or turn it into a learning algorithm, but an amplification process that makes use of faculties an LLM captured from human use of language might manage to do this by generating appropriate text for low level learning.
2DragonGod
We could do auto captioning of movies and videos. Or we could just train multimodal simulators. We probably will (e.g. such models could be useful for generating videos from descriptions).
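
For what it's worth, here is a minimal sketch of the "language model retrained on generated text" form of reflection Vladimir_Nesov mentions, assuming a Hugging Face-style causal LM ("gpt2" and the prompt are just stand-ins) and a single illustrative gradient step; this is my own illustration of the shape of the loop, not anyone's actual training setup:

```python
# Minimal reflection-as-retraining sketch: (1) use the current model to generate
# text, (2) treat that generated text as new training data and take a gradient
# step on it. Model name, prompt, and hyperparameters are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever LLM is being "reflected"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 1. Amplification step: generate new text with the current model.
model.eval()
prompt = "Summarise what this model takes the actual world to be like:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=True)
generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)

# 2. Reflection step: fine-tune on the generated text. After this the model is
#    (slightly) a worse language model of its original corpus, but has been
#    nudged toward whatever the generation/filtering process rewarded.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
batch = tokenizer(generated_text, return_tensors="pt")
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```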

We maybe need an introduction to all the advance work done on nanotechnology for everyone who didn't grow up reading "Engines of Creation" as a twelve-year-old or "Nanosystems" as a twenty-year-old.

Ah. Yeah, that does sound like something LessWrong resources have been missing, then — and not just for my personal sake. Anecdotally, I've seen several why-I'm-an-AI-skeptic posts circulating on social media in which "EY makes crazy leaps of faith about nanotech" was a key reason for rejecting the overall AI-risk argument.

(As it stands, my objection to yo... (read more)

Hang on — how confident are you that this kind of nanotech is actually, physically possible? Why? In the past I've assumed that you used "nanotech" as a generic hypothetical example of technologies beyond our current understanding that an AGI could develop and use to alter the physical world very quickly. And it's a fair one as far as that goes; a general intelligence will very likely come up with at least one thing as good as these hypothetical nanobots. 

But as a specific, practical plan for what to do with a narrow AI, this just seems like it makes ... (read more)

We maybe need an introduction to all the advance work done on nanotechnology for everyone who didn't grow up reading "Engines of Creation" as a twelve-year-old or "Nanosystems" as a twenty-year-old.  We basically know it's possible; you can look at current biosystems and look at physics and do advance design work and get some pretty darned high confidence that you can make things with covalent-bonded molecules, instead of van-der-Waals folded proteins, that are to bacteria as airplanes to birds.

For what it's worth, I'm pretty sure the original author of this particular post happens to agree with me about this.

Slightly boggling at the idea that nuts and eggs aren't tasty? And I completely lose the plot at "condiments". Isn't the whole point of condiments that they are tasty? What sort of definition of "tasty" are you going with?

Yes, I agree. This is why I said "I don't think this is correct". But unless you specify this, I don't think a layperson would guess this.

Thank you! This is helpful. I'll start with the bit where I still disagree and/or am still confused, which is the future people. You write:

The reductio for caring more about future peoples' agency is in cases where you can just choose their preferences for them. If the main thing you care about is their ability to fulfil their preferences, then you can just make sure that only people with easily-satisfied preferences (like: the preference that grass is green) come into existence.

Sure. But also, if the main thing you care about is their ability to be happy,... (read more)

I like this breakdown! But I have one fairly big asterisk — so big, in fact, that I wonder if I'm misunderstanding you completely.

Care-morality mainly makes sense as an attitude towards agents who are much less capable than you - for example animals, future people, and people who aren’t able to effectively make decisions for themselves.

I'm not sure animals belong on that list, and I'm very sure that future people don't. I don't see why it should be more natural to care about future humans' happiness than about their preferences/agency (unless, of course, o... (read more)

4Richard_Ngo
I assume that you do think it makes sense to care about the welfare of animals and future people, and you're just questioning why we shouldn't care more about their agency?

The reductio for caring more about animals' agency is when they're in environments where they'll very obviously make bad decisions - e.g. there are lots of things which are poisonous and they don't know; there are lots of cars that would kill them, but they keep running onto the road anyway; etc. (The more general principle is that the preferences of dumb agents aren't necessarily well-defined from the perspective of smart agents, who can elicit very different preferences by changing the inputs slightly.)

The reductio for caring more about future peoples' agency is in cases where you can just choose their preferences for them. If the main thing you care about is their ability to fulfil their preferences, then you can just make sure that only people with easily-satisfied preferences (like: the preference that grass is green) come into existence.

The other issue I have with focusing primarily on agency is that, as we think about creatures which are increasingly different from humans, my intuitions about why I care about their agency start to fade away. If I think about a universe full of paperclip maximizers with very high agency... I'm just not feeling it. Whereas at least if it's a universe full of very happy paperclip maximizers, that feels more compelling.

(I do care somewhat about future peoples' agency; and I personally define welfare in a way which includes some component of agency, such that wireheading isn't maximum-welfare. But I don't think it should be the main thing.)

(Also, as I wrote this comment, I realized that the phrasing in the original sentence you quoted is infelicitous, and so will edit it now.)

The common-man's answer here would presumably be along the lines of "so we'll just make it illegal for an A.I. to control vast sums of money long before it gets to owning a trillion — maybe an A.I. can successfully pass off as an obscure investor when we're talking tens of thousands or even millions, but if a mysterious agent starts claiming ownership of a significant percentage of the world GDP, its non-humanity will be discovered and the appropriate authorities will declare its non-physical holdings void, or repossess them, or something else sensible".

To be clear, I don't think this is correct, but this is a step you would need to have an answer for.

5Viliam
Huh, why? The agent can pretend to be multiple agents, possibly thousands of them. It can also use fake human identities.

"Self-improvement" is one of those things which most humans can nod along to, but only because we're all assigning different meanings to it. Some people will read "self-improvement" and think self-help books, individual spiritual growth, etc.; some will think "transhumanist self-alteration of the mind and body"; some will think "improvement of the social structure of humanity even if individual humans remain basically the same"; etc. 

It looks like a non-controversial thing to include on the list, but that's basically an optical illusion. 

For thos... (read more)

1Shoshannah Tekofsky
The operationalization would indeed be the next step. I disagree the first step is meaningless without it, though. E.g. having some form of self-improvement in the goal set is important, as we want to do more than just survive as a species.

I agree the point as presented by OP is weak, but I think there is a stronger version of this argument to be made. I feel like there are a lot of world-states where A.I. is badly-aligned but non-murderous simply because it's not particularly useful to it to kill all humans.

Paperclip-machine is a specific kind of alignment failure; I don't think it's hard to generate utility functions orthogonal to human concerns that don't actually require the destruction of humanity to implement. 

The scenario I've been thinking the most about lately, is an A.I. that ... (read more)

I don't think those are contradictory? It can both be "there would be value drift" and "this might be quite bad, actually". Anyway, whatever the actual spirit of that bit in TWC, that doesn't change my question of wanting some clarity on whether the worse bits of Dath Ilan are intended in the same spirit.

Quite a good story. But I think at this point I would quite like Eliezer to make some sort of statement about to what degree he endorses Dath Ilan, ethically speaking.  As a fictional setting it's a great machine for fleshing out thought experiments, of course, but it seems downright dystopian in many ways. 

(I mean, the fact that they're cryopreserving everyone and have AGI under control means they're morally "preferable" to Earth, but that's sort of a cheat. For example, you could design an alt. history where the world is ruled by a victorious T... (read more)

3jimrandomh
A surprising number of people seem to have missed what the point of this was in Three Worlds Collide. It's not a prediction about future (human) societies. AFAICT it's there to remind people that changing values is actually bad: that when we talk about, for example, AIs getting random values instead of inheriting our human values, we should not think of this like we think of a foreign country's cultural quirks; we should think of this as terrifying and revolting. This is a misconception that a lot of people actually have, and TWC as a whole is aimed squarely at dispelling it.

I have a slightly different perspective on this — I don't know how common this is, but looking back on my feelings on Santa Claus as a young child, they had more to do with belief-in-belief than with an "actual" belief in an "actual" Santa. It was religious faith as I understand it; I wanted, vaguely, to be the sort of kid who believed in Santa Claus; I looked for evidence that Santa Claus was real, for theories of how he could be real even if magic wasn't. So the lesson it taught me when I stopped believing in the whole thing was more of an insight about what it was like inside religious people's heads.

Most fictional characters are optimised to make for entertaining stories, which is why "generalizing from fictional evidence" is usually a failure-mode. The HPMOR Harry and the Comet King were optimised by two rationalists as examples of rationalist heroes — and are active in allegorical situations engineered to say something that rationalists would find to be “of worth” about real world problems. 

They are appealing precisely because they encode assumptions about what a real-world, rationalist “hero” ought to be like. Or at least, that's the hope. So, th... (read more)

0Rob Bensinger
+1

We don't actually know the machine works more than once, do we? It creates "a" duplicate of you "when" you pull the lever. That doesn't necessarily imply that it outputs additional duplicates if you keep pulling the lever. Maybe it has a limited store of raw materials to make the duplicates from, who knows.

Besides, I was just munchkinning myself out of a situation where a sentient individual has to die (i.e. a version of myself). Creating an army up there may have its uses but does not relate to the solving of the initial problem. Unless we are proposing the army make a human ladder? Seems unpleasant.

But you make it sound as though these people are objectively “wrong”, as if they're *trying* to actually reduce animal suffering in the absolute but end up working on the human proxy because of a bias. That may be true of some, but surely not all. What ozymandias was, I believe, trying to express is that some of the people who'd reject your solutions consciously find them ethically unacceptable, not merely recoil from them because they'd *instinctively* be against their being used on humans.

8Shmi
Clearly I have not phrased it well in my post. See my reply to ozy. I am advocating self-honesty about your values, not a particular action.

Being a hopeless munchkin, I will note that the thought experiment has an obvious loophole: for the choice to truly be a choice, we would have to assume, somewhat arbitrarily, that using the duplication lever will disintegrate the machinery. Else, you could pull the lever to create a duplicate who'll deliver the message, and *then* the you at the bottom of the well could rip up the machinery and take their shot at climbing up.

9Stuart_Armstrong
Your feeble attempts at munchkinning are noted and scorned. The proper munchkin would pull the lever again and again, creating an army of yous...