This is a special post for quick takes by TsviBT. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

The Berkeley Genomics Project is fundraising for the next forty days and forty nights at Manifund: https://manifund.org/projects/human-intelligence-amplification--berkeley-genomics-project

6Eric Neyman
Probably don't update on this too much, but when I hear "Berkeley Genomics Project", it sounds to me like a project that's affiliated with UC Berkeley (which it seems like you guys are not). Might be worth keeping in mind, in that some people might be misled by the name.
2TsviBT
Ok, thanks for noting. Right, we're not affiliated--just located in Berkeley. (I'm not sure I believe people will commonly be misled thus, and, I mean, UC Berkeley doesn't own the city, but will keep an eye out.) (In theory I'm open to better names, though it's a bit late for that and also probably doesn't matter all that much. An early candidate in my head was "The Demeter Project" or something like that; I felt it wasn't transparent enough. Another sort of candidate was "Procreative Liberty Institute" or similar, though this is ambiguous with reproductive freedom (though there is real ideological overlap). Something like "Genomic Emancipation/Liberty org/project" could work. Someone suggested Berkeley Genomics Institute as sounding more "serious", and I agreed, except that BGI is already a genomics acronym.)
2Raemon
I also kinda thought this. I actually thought it sounded sufficiently academic that I didn't realize at first it was your org, instead of some other thing you were supporting.
1rahulxyz
I'm very dubious that we'll solve alignment in time, and it seems like my marginal dollar would do better in non-obvious causes for AI safety. So I'm very open to funding something like this in the hope we get an AI winter / regulatory pause etc. I don't know if you or anyone else has thought about this, but what is your take on whether this or WBE is more likely to get done successfully? WBE seems a lot more funding intensive, but it also seems easier to measure progress, and there are potentially fewer regulatory burdens?

I discuss this here: https://www.lesswrong.com/posts/jTiSWHKAtnyA723LE/overview-of-strong-human-intelligence-amplification-methods#Brain_emulation

You can see my comparisons of different methods in the tables at the top:

TsviBT

An important thing that the AGI alignment field never understood:

Reflective stability. Everyone thinks it's about, like, getting guarantees, or something. Or about rationality and optimality and decision theory, or something. Or about how we should understand ideal agency, or something.

But what I think people haven't understood is

  1. If a mind is highly capable, it has a source of knowledge.
  2. The source of knowledge involves deep change.
  3. Lots of deep change implies lots of strong forces (goal-pursuits) operating on everything.
  4. If there's lots of strong goal-pursuits operating on everything, nothing (properties, architectures, constraints, data formats, conceptual schemes, ...) sticks around unless it has to stick around.
  5. So if you want something to stick around (such as the property "this machine doesn't kill all humans") you have to know what sort of thing can stick around / what sort of context makes things stick around, even when there are strong goal-pursuits around, which is a specific thing to know because most things don't stick around.
  6. The elements that stick around and help determine the mind's goal-pursuits have to do so in a way that positively makes them stick around (refl
... (read more)
9Seth Herd
Agreed! I tried to say the same thing in The alignment stability problem. I think most people in prosaic alignment aren't thinking about this problem. Without this, they're working on aligning AI, but not on aligning AGI or ASI. It seems really likely on the current path that we'll soon have AGI that is reflective. In addition, it will do continuous learning, which introduces another route to goal change (e.g., learning that what people mean by "human" mostly applies to some types of artificial minds, too). The obvious route past this problem, that I think prosaic alignment often sort of assumes without being explicit about it, is that humans will remain in charge of how the AGI updates its goals and beliefs. They're banking on corrigible or instruction-following AGI. I think that's a viable approach, but we should be more explicit about it. Aligning AI probably helps with aligning AGI, but they're not the same thing, so we should try to get more sure that prosaic alignment really helps align a reflectively stable AGI.
2TsviBT
Thanks. (I think we have some ontological mismatches which hopefully we can discuss later.)
8Lorxus
Say more about point 2 there? Thinking about 5 and 6 though - I think I now maybe have a hopeworthy intuition worth sharing later.
3TsviBT
Say you have a Bayesian reasoner. It's got hypotheses; it's got priors on them; it's got data. So you watch it doing stuff. What happens? Lots of stuff changes, tide goes in, tide goes out, but it's still a Bayesian, can't explain that. The stuff changing is "not deep". There's something stable though: the architecture in the background that "makes it a Bayesian". The update rules, and the rest of the stuff (for example, whatever machinery takes a hypothesis and produces "predictions" which can be compared to the "predictions" from other hypotheses). And: it seems really stable? Like, even reflectively stable, if you insist? So does this solve stability? I would say, no. You might complain that the reason it doesn't solve stability is just that the thing doesn't have goal-pursuits. That's true but it's not the core problem. The same issue would show up if we for example looked at the classical agent architecture (utility function, counterfactual beliefs, argmaxxing actions). The problem is that the agency you can write down is not the true agency. "Deep change" is change that changes elements that you would have considered deep, core, fundamental, overarching... Change that doesn't fit neatly into the mind, change that isn't just another piece of data that updates some existing hypotheses. See https://tsvibt.blogspot.com/2023/01/endo-dia-para-and-ecto-systemic-novelty.html
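To make the Bayesian example concrete, here is a minimal toy sketch (my own illustration, not code from the post): the posterior, the "shallow" state, changes with every observation, while the update rule and the hypothesis interface (the part that "makes it a Bayesian") stay fixed.

```python
# Toy Bayesian updater (hypothetical illustration). The posterior changes
# constantly; the architecture in the background does not.
from typing import Callable, Dict

Hypothesis = Callable[[str], float]  # maps an observation to its likelihood

def update(posterior: Dict[str, float],
           hypotheses: Dict[str, Hypothesis],
           observation: str) -> Dict[str, float]:
    """One Bayes update: the shallow state (posterior) changes,
    but this rule itself (the thing that 'makes it a Bayesian') is fixed."""
    unnormalized = {name: posterior[name] * hypotheses[name](observation)
                    for name in posterior}
    total = sum(unnormalized.values()) or 1.0
    return {name: p / total for name, p in unnormalized.items()}

# Two toy hypotheses about a coin.
hypotheses = {
    "fair":   lambda obs: 0.5,
    "biased": lambda obs: 0.9 if obs == "H" else 0.1,
}
posterior = {"fair": 0.5, "biased": 0.5}

for obs in "HHTHH":
    posterior = update(posterior, hypotheses, obs)
print(posterior)  # beliefs moved; the update machinery did not
```

The point of the comment is that this kind of fixed-architecture stability is not yet the interesting kind: "deep change" is change to exactly this background machinery.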
1Lorxus
Not so - I'd just call it the trivial case and implore us to do better literally at all! Apart from that, thanks - I have a better sense of what you meant there. "Deep change" as in "no, actually, whatever you pointed to as the architecture of what's Really Going On... can't be that, not for certain, not forever."
3TsviBT
I'd go stronger than just "not for certain, not forever", and I'd worry you're not hearing my meaning (agree or not). I'd say in practice more like "pretty soon, with high likelihood, in a pretty deep / comprehensive / disruptive way". E.g. human culture isn't just another biotic species (you can make interesting analogies but it's really not the same).
1Lorxus
That's entirely possible. I've thought about this deeply for entire tens of minutes, after all. I think I might just be erring (habitually)  on the side of caution in qualities of state-changes I describe expecting to see from systems I don't fully understand. OTOH... I have a hard time believing that even (especially?) an extremely capable mind would find it worthwhile to repeatedly rebuild itself from the ground up, such that few of even the ?biggest?/most salient features of a mind stick around for long at all.
6TsviBT
I have no idea what goes on in the limit, and I would guess that what determines the ultimate effects (https://tsvibt.blogspot.com/2023/04/fundamental-question-what-determines.html) would become stable in some important senses. Here I'm mainly saying that the stuff we currently think of as being core architecture would be upturned. I mean it's complicated... like, all minds are absolutely subject to some constraints--there's some Bayesian constraint, like you can't "concentrate caring in worlds" in a way that correlates too much with "multiversally contingent" facts, compared to how much you've interacted with the world, or something... IDK what it would look like exactly, and if no one else knows then that's kinda my point. Like, there's

1. Some math about probabilities, which is just true--information-theoretic bounds and such. But: not clear precisely how this constrains minds in what ways.
2. Some rough-and-ready ways that minds are constrained in practice, such as obvious stuff about like you can't know what's in the cupboard without looking, you can't shove more than such and such amount of information through a wire, etc. These are true enough in practice, but also can be broken in terms of their relevant-in-practice implications (e.g. by "hypercompressing" images using generative AI; you didn't truly violate any law of probability but you did compress way beyond what would be expected in a mundane sense).
3. You can attempt to state more absolute constraints, but IDK how to do that. Naive attempts just don't work, e.g. "you can't gain information just by sitting there with your eyes closed" just isn't true in real life for any meaning of "information" that I know how to state other than a mathematical one (because for example you can gain "logical information", or because you can "unpack" information you already got (which is maybe "just" gaining logical information but I'm not sure, or rather I'm not sure how to really distinguish non/logical info), or
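A concrete toy version of the "hypercompressing" example in point 2 (my own sketch; a seeded PRNG stands in for a large shared generative model): if sender and receiver already share the generator, an "image" can cross the wire as a few bytes of seed. No information-theoretic law is violated, because the missing bits were prepaid in the shared model.

```python
# Minimal sketch (hypothetical stand-in: a seeded PRNG plays the role of a big
# shared generative model). The practical expectation about wire bandwidth is
# broken, but no law of probability is.
import random

def generate_image(seed: int, n_pixels: int = 1_000_000) -> list[int]:
    """Stand-in for a huge shared generative model."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(n_pixels)]

# Sender: instead of ~1 MB of pixels, transmit 4 bytes of seed.
seed = 42
message = seed.to_bytes(4, "big")

# Receiver: reconstructs the exact same "image" from the shared model + seed.
reconstructed = generate_image(int.from_bytes(message, "big"))
assert reconstructed == generate_image(42)
print(f"sent {len(message)} bytes, reconstructed {len(reconstructed)} 'pixels'")
```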
7Thomas Kwa
This argument does not seem clear enough to engage with or analyze, especially steps 2 and 3. I agree that concepts like reflective stability have been confusing, which is why it is important to develop them in a grounded way.
2TsviBT
Well, it's a quick take. My blog has more detailed explanations, though not organized around this particular point.
1Jan_Kulveit
That's why solving hierarchical agency is likely necessary for success
5TsviBT
We'd have to talk more / I'd have to read more of what you wrote, for me to give a non-surface-level / non-priors-based answer, but on priors (based on, say, a few dozen conversations related to multiple agency) I'd expect that whatever you mean by hierarchical agency is dodging the problem. It's just more homunculi. It could serve as a way in / as a centerpiece for other thoughts you're having that are more so approaching the problem, but the hierarchicalness of the agency probably isn't actually the relevant aspect. It's like if someone is trying to explain how a car goes and then they start talking about how, like, a car is made of four wheels, and each wheel has its own force that it applies to a separate part of the road in some specific position and direction and so we can think of a wheel as having inside of it, or at least being functionally equivalent to having inside of it, another smaller car (a thing that goes), and so a car is really an assembly of 4 cars. We're just... spinning our wheels lol. Just a guess though. (Just as a token to show that I'm not completely ungrounded here w.r.t. multi-agency stuff in general, but not saying this addresses specifically what you're referring to: https://tsvibt.blogspot.com/2023/09/the-cosmopolitan-leviathan-enthymeme.html)
3Jan_Kulveit
Agreed we would have to talk more. I think I mostly get the homunculi objection. Don't have time now to write an actual response, so here are some signposts:

- part of what you call agency is explained by roughly active inference style of reasoning
-- some type of "living" system is characterized by having boundaries between them and the environment (boundaries mostly in sense of separation of variables)
-- maintaining the boundary leads to need to model the environment
-- modelling the environment introduces a selection pressure toward approximating Bayes
- other critical ingredient is boundedness
-- in this universe, negentropy isn't free
-- this introduces fundamental tradeoff / selection pressure for any cognitive system: length isn't free, bitflips aren't free, etc.
(--- downstream of that is compression everywhere, abstractions)
-- empirically, the cost/returns function for scaling cognition usually hits diminishing returns, leading to minds where it's not effective to grow the single mind further
--- this leads to the basin of convergent evolution I call "specialize and trade"
-- empirically, for many cognitive systems, there is a general selection pressure toward modularity
--- I don't know what are all the reasons for that, but one relatively simple is 'wires are not free'; if wires are not free, you get colocation of computations like brain regions or industry hubs
--- other possibilities are selection pressures from CAP theorem, MVG, ...
(modularity also looks a bit like box-inverted specialize and trade)

So, in short, I think where I agree with the spirit of "If humans didn't have a fixed skull size, you wouldn't get civilization with specialized members", and my response is there seems to be extremely general selection pressure in this direction. If cells were able to just grow in size and it was efficient, you wouldn't get multicellulars. If code bases were able to just grow in size and it was efficient, I wouldn't get a myriad of packages on my laptop,
3niplav
It's a bit annoying to me that "it's just more homunculi" is both kind of powerful for reasoning about humans, but also evades understanding agentic things. I also find it tempting because it gives a cool theoretical foothold to work off, but I wonder whether the approach is hiding most of the complexity of understanding agency.
TsviBT

Are people fundamentally good? Are they practically good? If you make one person God-emperor of the lightcone, is the result something we'd like?

I just want to make a couple remarks.

  • Conjecture: Generally, on balance, over longer time scales good shards express themselves more than bad ones. Or rather, what we call good ones tend to be ones whose effects accumulate more.
  • Example: Nearly all people have a shard, quite deeply stuck through the core of their mind, which points at communing with others.
    • Communing means: speaking with; standing shoulder to shoulder with, looking at the same thing; understanding and being understood; lifting the same object that one alone couldn't lift.
    • The other has to be truly external and truly a peer. Being a truly external true peer means they have unboundedness, infinite creativity, self- and pair-reflectivity and hence diagonalizability / anti-inductiveness. They must also have a measure of authority over their future. So this shard (albeit subtly and perhaps defeasibly) points at non-perfect subjugation of all others, and democracy. (Would an immortalized Genghis Khan, having conquered everything, after 1000 years, continue to wish to see in th
... (read more)

This assumes that the initially-non-eudaimonic god-king(s) would choose to remain psychologically human for a vast amount of time, and keep the rest of humanity around for all that time. Instead of:

  • Self-modify into something that's basically an eldritch abomination from a human perspective, either deliberately or as part of a self-modification process gone wrong.
  • Make some minimal self-modifications to avoid value drift, precisely not to let the sort of stuff you're talking about happen.
  • Stick to behavioral patterns that would lead to never changing their mind/never value-drifting, either as an "accidental" emergent property of their behavior (the way normal humans can surround themselves in informational bubbles that only reinforce their pre-existing beliefs; the way normal human dictators end up surrounded by yes-men; but elevated to transcendence, and so robust enough to last for eons) or as an implicit preference they never tell their aligned ASI to satisfy, but which it infers and carefully ensures the satisfaction of.
  • Impose some totalitarian regime on the rest of humanity and forget about it, spending the rest of their time interacting only with each other/with tailor-built non
... (read more)

This assumes

Yes, that's a background assumption of the conjecture; I think making that assumption and exploring the consequences is helpful.

Self-modify into something that's basically an eldritch abomination from a human perspective, either deliberately or as part of a self-modification process gone wrong.

Right, totally, then all bets are off. The scenario is underspecified. My default imagination of "aligned" AGI is corrigible AGI. (In fact, I'm not even totally sure that it makes much sense to talk of aligned AGI that's not corrigible.) Part of corrigibility would be that if:

  • the human asks you to do X,
  • and X would have irreversible consequences,
  • and the human is not aware of / doesn't understand those consequences,
  • and the consequences would make the human unable to notice or correct the change,
  • and the human, if aware, would have really wanted to not do X or at least think about it a bunch more before doing it,

then you DEFINITELY don't just go ahead and do X lol!

In other words, a corrigible AGI is supposed to use its intelligence to possibilize self-alignment for the human.

Make some minimal self-modifications to avoid value drift, precisely not to let the sort of st

... (read more)
7Thane Ruthenis
Unless the human, on reflection, doesn't want some specific subset of their current values to be open to change / has meta-level preferences to freeze some object-level values. Which I think is common. (Source: I have meta-preferences to freeze some of my object-level values at "eudaimonia", and I take specific deliberate actions to avoid or refuse value-drift on that.)

Callousness. "We probably need to do something about the rest of humanity, probably shouldn't just wipe them all out, lemme draft some legislation, alright looks good, rubber-stamp it and let's move on". Tons of bureaucracies and people in power seem to act this way today, including decisions that impact the fates of millions. I don't know that Genghis Khan or Stalin wouldn't have. Some clinical psychopaths or philosophical extremists (e. g., the human successionists) certainly would.

Mm... First, I think "corrigibility to a human" is underdefined. A human is not, themselves, a coherent agent with a specific value/goal-slot to which an AI can be corrigible. Like, is it corrigible to a human's momentary impulses? Or to the command the human would give if they thought for five minutes? For five days? Or perhaps to the command they'd give if the AI taught them more wisdom? But then which procedure should the AI choose for teaching them more wisdom? The outcome is likely path-dependent on that: on the choice between curriculum A and curriculum B. And if so, what procedure should the AI use to decide what curriculum to use? Or should the AI perhaps basically ignore the human in front of them, and simply interpret them as a rough pointer to CEV? Well, that assumes the conclusion, and isn't really "corrigibility" at all, is it?

The underlying issue here is that "a human's values" are themselves underdefined. They're derived in a continual, path-dependent fashion, by an unstable process with lots of recursions and meta-level interference. There's no unique ground-true set of values which the AI should t
6TsviBT
How about for example: Not saying this is some sort of grand solution to corrigibility, but it's obviously better than the nonsense you listed. If a human were going to try to help me out, I'd want this, for example, more than the things you listed, and it doesn't seem especially incompatible with corrigible behavior.
6TsviBT
I mean, yes, but you wrote a lot of stuff after this that seems weird / missing the point, to me. A "corrigible AGI" should do at least as well as--really, much better than--you would do, if you had a huge team of researchers under you and your full time, 100,000x speed job is to do a really good job at "being corrigible, whatever that means" to the human in the driver's seat. (In the hypothetical you're on board with this for some reason.)
6TsviBT
I would guess fairly strongly that you're mistaken or confused about this, in a way that an AGI would understand and be able to explain to you. (An example of how that would be the case: the version of "eudaimonia" that would not horrify you, if you understood it very well, has to involve meta+open consciousness (of a rather human flavor).)
2Mateusz Bagiński
I'm curious to hear more about those specific deliberate actions.
2TsviBT
Your and my beliefs/questions don't feel like they're even much coming into contact with each other... Like, you (and also other people) just keep repeating "something bad could happen". And I'm like "yeah obviously something extremely bad could happen; maybe it's even likely, IDK; and more likely, something very bad at the beginning of the reign would happen (Genghis spends his first 200 years doing more killing and raping); but what I'm ASKING is, what happens then?". If you're saying then, ok, you can say that, but I want to understand why; and I have some reasons (as presented) for thinking otherwise.
2Thane Ruthenis
Your hypothesis is about the dynamics within human minds embedded in something like contemporary societies with lots of other diverse humans whom the rulers are forced to model for one reason or another. My point is that evil, rash, or unwise decisions at the very start of the process are likely, and that those decisions are likely to irrevocably break the conditions in which the dynamics you hypothesize are possible. Make the minds in charge no longer human in the relevant sense, or remove the need to interact with/model other humans, etc. In my view, it doesn't strongly bear on the final outcome-distribution whether the "humans tend to become nicer to other humans over time" hypothesis is correct, because "the god-kings remain humans hanging around all the other humans in a close-knit society for millennia" is itself a very rare class of outcomes.
2TsviBT
Absolutely not, no. Humans want to be around (some) other people, so the emperor will choose to be so. Humans want to be [many core aspects of humanness, not necessarily per se, but individually], so the emperor will choose to be so. Yes, the emperor could want these insufficiently for my argument to apply, as I've said earlier. But I'm not immediately recalling anyone (you or others) making any argument that, with high or even substantial probability, the emperor would not want these things sufficiently for my question, about the long-run of these things, to be relevant.
7Thane Ruthenis
Yes: some other people. The ideologically and morally aligned people, usually. Social/informational bubbles that screen away the rest of humanity, from which they only venture out if forced to (due to the need to earn money/control the populace, etc.). This problem seems to get worse as the ability to insulate yourself from others improves, as could be observed with modern internet-based informational bubbles or the surrounded-by-yes-men problem of dictators. ASI would make this problem transcendental: there would truly be no need to ever bother with the people outside your bubble again, they could be wiped out or their management outsourced to AIs. Past this point, you're likely never returning to bothering about them. Why would you, if you can instead generate entire worlds of the kinds of people/entities/experiences you prefer? It seems incredibly unlikely that human social instincts can only be satisfied – or even can be best satisfied – by other humans.
4TsviBT
You're 100% not understanding my argument, which is sorta fair because I didn't lay it out clearly, but I think you should be doing better anyway. Here's a sketch:

1. Humans want to be human-ish and be around human-ish entities.
2. So the emperor will be human-ish and be around human-ish entities for a long time. (Ok, to be clear, I mean a lot of developmental / experiential time--the thing that's relevant for thinking about how the emperor's way of being trends over time.)
3. When being human-ish and around human-ish entities, core human shards continue to work.
4. When core human shards continue to work, MAYBE this implies EVENTUALLY adopting beneficence (or something else like cosmopolitanism), and hence good outcomes.
5. Since the emperor will be human-ish and be around human-ish entities for a long time, IF 4 obtains, then good outcomes.

And then I give two IDEAS about 4 (communing->[universalist democracy], and [information increases]->understanding->caring).
4Thane Ruthenis
I don't know what's making you think I don't understand your argument. Also, I've never publicly stated that I'm opting into Crocker's Rules, so while I happen not to particularly mind the rudeness, your general policy on that seems out of line here. My argument is that the process you're hypothesizing would be sensitive to the exact way of being human-ish, the exact classes of human-ish entities around, and the exact circumstances in which the emperor has to be around them. As a plain and down-to-earth example, if a racist surrounds themselves with a hand-picked group of racist friends, do you expect them to eventually develop universal empathy, solely through interacting with said racist friends? Addressing your specific ideas: nobody in that group would ever need to commune with non-racists, nor have to bother learning more about non-racists. And empirically, such groups don't seem to undergo spontaneous deradicalizations.
2TsviBT
I expect they'd get bored with that.
2TsviBT
So what do you think happens when they are hanging out together, and they are in charge, and it has been 1,000 years or 1,000,000 years?
2Thane Ruthenis
One or both of:

* They keep each other radicalized forever as part of some transcendental social dynamic.
* They become increasingly non-human as time goes on, small incremental modifications and personality changes building on each other, until they're no longer human in the senses necessary for your hypothesis to apply.

I assume your counter-model involves them getting bored of each other and seeking diversity/new friends, or generating new worlds to explore/communicate with, with the generating processes not constrained to only generate racists, leading to the extremists interacting with non-extremists and eventually incrementally adopting non-extremist perspectives? If yes, this doesn't seem like the overdetermined way for things to go:

* The generating processes would likely be skewed towards only generating things the extremists would find palatable, meaning more people sharing their perspectives/not seriously challenging whatever deeply seated prejudices they have. They're there to have a good time, not have existential/moral crises.
* They may make any number of modifications to themselves to make them no longer human-y in the relevant sense. Including by simply letting human-standard self-modification algorithms run for 10^3-10^6 years, becoming superhumanly radicalized.
* They may address the "getting bored" part instead, periodically wiping their memories (including by standard human forgetting) or increasing each other's capacity to generate diverse interactions.
4TsviBT
Ok so they only generate racists and racially pure people. And they do their thing. But like, there's no other races around, so the racism part sorta falls by the wayside. They're still racially pure of course, but it's usually hard to tell that they're racist; sometimes they sit around and make jokes to feel superior over lesser races, but this is pretty hollow since they're not really engaged in any type of race relations. Their world isn't especially about all that, anymore. Now it's about... what? I don't know what to imagine here, but the only things I do know how to imagine involve unbounded structure (e.g. math, art, self-reflection, self-reprogramming). So, they're doing that stuff. For a very long time. And the race thing just is not a part of their world anymore. Or is it? I don't even know what to imagine there. Instead of having tastes about ethnicity, they develop tastes about questions in math, or literature. In other words, [the differences between people and groups that they care about] migrate from race to features of people that are involved in unbounded stuff. If the AGI has been keeping the racially impure in an enclosure all this time, at some point the racists might have a glance back, and say, wait, all the interesting stuff about people is also interesting about these people. Why not have them join us as well.
3Mateusz Bagiński
For the same reason that most people (if given the power to do so) wouldn't just replace their loved ones with their altered versions that are better along whatever dimensions the person judged them as deficient/imperfect.
2TsviBT
Yeah I mean this is perfectly plausible, it's just that even these cases are not obvious to me.
6Garrett Baker
If this were true, I’d expect much lower divorce rates. After all, who do you have the most information about other than your wife/husband? And many of these divorces are un-amicable, though I wasn’t quickly able to get particular numbers. [EDIT:] Though in either case, this indeed indicates a markedly decreasing level of love over long periods of time & greater mutual knowledge. See also the decrease in all objective measures of quality of life after divorce for both parties after long marriages.
4TsviBT
(I wrote my quick take quickly and therefore very elliptically, and therefore it would require extra charity / work on the reader's part (like, more time spent asking "huh? this makes no sense? ok what could he have meant, which would make this statement true?").) It's an interesting point, but I'm talking about time scales of, say, thousands of years or millions of years. So it's certainly not a claim that could be verified empirically by looking at any individual humans because there aren't yet any millenarians or megaannumarians. Possibly you could look at groups that have had a group consciousness for thousands of years, and see if pairs of them get friendlier to each other over time, though it's not really comparable (idk if there are really groups like that in continual contact and with enough stable collectivity; like, maybe the Jews and the Indians or something).
2Garrett Baker
If it’s not a conclusion which could be disproven empirically, then I don’t know how you came to it. I mean, I did ask myself about counter-arguments you could have with my objection, and came to basically your response. That is, something approximating “well they just don’t have enough information, and if they had way way more information then they’d love each other again” which I don’t find satisfying. Namely because I expect people in such situations get stuck in a negative-reinforcement cycle, where the things the other did which used to be fun lose their novelty over time as they get repetitive, which leads to the predicted reward of those interactions overshooting the actual reward, which in a TD learning sense is just as good (bad) as a negative reinforcement event. I don’t see why this would be fixed with more knowledge, and it indeed does seem likely to be exacerbated with more knowledge as more things the other does become less novel & more boring, and worse, fundamental implications of their nature as a person, rather than unfortunate accidents they can change easily. I also think intuitions in this area are likely misleading. It is definitely the case now that marginally more understanding of each other would help with coordination problems, since people love making up silly reasons to hate each other. I do also think this is anchoring too much on our current bandwidth limitations, and generalizing too far. Better coordination does not always imply more love.
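For readers who want the TD-learning claim spelled out, here is the standard temporal-difference error (textbook notation; my addition, not part of the comment): an outcome that merely falls short of what was predicted produces a negative error, the same direction of update as an explicit punishment.

```latex
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
% If the prediction V(s_t) overshoots the actual return r_{t+1} + \gamma V(s_{t+1}),
% then \delta_t < 0 and the learned value of the interaction is revised downward,
% just as it would be after an explicitly negative reward.
```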
4TsviBT
This does not sound like the sort of problem you'd just let yourself wallow in for 1000 years. And again, with regards to what is fixed by more information, I'm saying that capacity for love increases more. After 1000 years, both people would have gotten bored with themselves, and learned to do infinite play!
4TsviBT
Oh my god. Do you think when I said this, I meant "has no evidentiary entanglement with sense observations we can make"?
2TsviBT
Maybe there's a more basic reading comprehension fail: I said capacity to love increases more with more information, not that you magically start loving each other.
5Viliam
Maybe some people are, and some people are not? Not sure if we are talking about the same thing, but I think that there are many people who just "play it safe", and in a civilized society that generally means following the rules and avoiding unnecessary conflicts. The same people can behave differently if you give them power (even on a small scale, e.g. when they have children). But I think there are also people who try to do good even when the incentives point the other way round. And also people who can't resist hurting others even when that predictably gets them punished. Knowing more about people allows you to have a better model of them. So if you started with the assumption e.g. that people who don't seem sufficiently similar to you are bad, then knowing them better will improve your attitude towards them. On the other hand, if you started from some kind of Pollyanna perspective, knowing people better can make you disappointed and bitter. Finally, if you are a psychopath, knowing people better just gives you more efficient ways to exploit them.
2TsviBT
Right. Presumably, maybe. But I am interested in considering quite extreme versions of the claim. Maybe there's only 10,000 people who would, as emperor, make a world that is, after 1,000,000 years, net negative according to us. Maybe there's literally 0? I'm not even sure that there aren't literally 0, though quite plausibly someone else could know this confidently. (For example, someone could hypothetically have solid information suggesting that someone could remain truly delusionally and disorganizedly psychotic and violent to such an extent that they never get bored and never grow, while still being functional enough to give directions to an AI that specify world domination for 1,000,000 years.)
8Viliam
Sounds to me like wishful thinking. You basically assume that in 1 000 000 years people will get bored of doing the wrong thing, and start doing the right thing. My perspective is that "good" is a narrow target in the possibility space, and if someone already keeps missing it now, if we expand their possibility space by making them a God-emperor, the chance of converging to that narrow target only decreases. Basically, for your model to work, kindness would need to be the only attractor in the space of human (actually, post-human) psychology. A simple example of how things could go wrong is for Genghis Khan to set up an AI to keep everyone else in horrible conditions forever, and then (on purpose, or accidentally) wirehead himself. Another example is the God-emperor editing their own brain to remove all empathy, e.g. because they consider it a weakness at the moment. Once all empathy is uninstalled, there is no incentive to reinstall it. EDIT: I see that Thane Ruthenis already made this argument, and didn't convince you.
3TsviBT
No, I ask the question, and then I present a couple hypothesis-pieces. (Your stance here seems fairly though not terribly anti-thought AFAICT, so FYI I may stop engaging without further warning.) I'm seriously questioning whether it's a narrow target for humans. Curious to hear other attractors, but your proposals aren't really attractors. See my response here: https://www.lesswrong.com/posts/Ht4JZtxngKwuQ7cDC/tsvibt-s-shortform?commentId=jfAoxAaFxWoDy3yso Ah I see you saw Ruthenis's comment and edited your comment to say so, so I edited my response to your comment to say that I saw that you saw.
2Viliam
Well, if we assume that humans are fundamentally good / inevitably converging to kindness if given enough time... then, yeah, giving someone God-emperor powers is probably going to be good in long term. (If they don't accidentally make an irreparable mistake.) I just strongly disagree with this assumption.
3TsviBT
It's not an assumption, it's the question I'm asking and discussing.
2Viliam
Ah, then I believe the answer is "no". On the time scale of current human lifespan, I guess I could point out that some old people are unkind, or that some criminals keep re-offending a lot, so it doesn't seem like time automatically translates to more kindness. But an obvious objection is "well, maybe they need 200 years of time, or 1000", and I can't provide empirical evidence against that. So I am not sure how to settle this question. On average, people get less criminal as they get older, so that would point towards human kindness increasing in time. On the other hand, they also get less idealistic, on average, so maybe a simpler explanation is that as people get older, they get less active in general. (Also, some reduction in crime is caused by the criminals getting killed as a result of their lifestyle.) There is probably a significant impact of hormone levels, which means that we need to make an assumption about how the God-emperor would regulate their own hormones. For example, if he decides to keep a 25 years old human male body, maybe his propensity to violence will match the body? tl;dr - what kinds of arguments should even be used in this debate?
5TsviBT
Ok, now we have a reasonable question. I don't know, but I provided two argument-sketches that I think are of a potentially relevant type. At an abstract level, the answer would be "mathematico-conceptual reasoning", just like in all previous instances where there's a thing that has never happened before, and yet we reason somewhat successfully about it--of which there are plenty examples, if you think about it for a minute.
2Mateusz Bagiński
When I read Tsvi's OP, I was imagining something like a (trans-/post- but not too post-)human civilization where everybody by default has an unbounded lifespan and healthspan, possibly somewhat boosted intelligence and need for cognition / open intellectual curiosity. (In which case, "people tend to X as they get older", where X is something mostly due to things related to default human aging, doesn't apply.) Now start it as a modern-ish democracy or a cluster of (mostly) democracies, run for 1e4 to 1e6 years, and see what happens.
2Noosphere89
I basically don't buy the conjecture of humans being super-cooperative in the long run, or hatred decreasing and love increasing. To the extent that something like this is true, I expect it to be a weird industrial to information age relic that utterly shatters if AGI/ASI is developed, and this remains true even if the AGI is aligned to a human.
3TsviBT
So just don't make an AGI, instead do human intelligence amplification.
1Purplehermann
People love the idea (as opposed to reality) of other people quite often, and knowing the other better can allow for plenty of hate
2TsviBT
Seems true. I don't think this makes much contact with any of my claims. Maybe you're trying to address: To clarify the question (which I didn't do a good job of in the OP), the question is more about 1000 years or 1,000,000 years than 1 or 10 years.
TsviBT

It is still the case that some people don't sign up for cryonics simply because it takes work to figure out the process / financing. If you do sign up, it would therefore be a public service to write about the process.

4Joseph Miller
For those in Europe, Tomorrow Biostasis makes the process a lot easier and they have people who will talk you through step by step.
1sjadler
A plug for another post I’d be interested in: If anyone has actually evaluated the arguments for “What if your consciousness is ~tortured in simulation?” as a reason to not pursue cryo. Intuitively I don’t think this is super likely to happen, but various moral atrocities have and do happen, and that gives me a lot of pause, even though I know I’m exhibiting some status quo bias
5niplav
I tried to write a little bit about that here.
TsviBT

Protip: You can prevent itchy skin from being itchy for hours by running it under very hot water for 5-30 seconds. (Don't burn yourself; I use tap water with some cold water, and turn down the cold water until it seems really hot.)

4ryan_greenblatt
I think this works on the same principle as this device which heats up a patch that you press to your skin. I've also found that it works to heat up a spoon in near boiling water and press it to my skin for a few seconds.
4Nathan Helm-Burger
I recently bought a battery-powered tool that creates a brief pulse of heat in a small metal applicator, specifically designed for treating itchy mosquito bites. It works well! In the case of the mosquito bite, there is the additional aspect of denaturing the proteins left behind by the mosquito in order to cause them to be less allergenic.
TsviBT

(These are 100% unscientific, just uncritical subjective impressions for fun. CQ = cognitive capacity quotient, like generally good at thinky stuff)

  • Overeat a bit, like 10% more than is satisfying: -4 CQ points for a couple hours.
  • Overeat a lot, like >80% more than is satisfying: -9 CQ points for 20 hours.
  • Sleep deprived a little, like stay up really late but without sleep debt: +5 CQ points.
  • Sleep debt, like a couple days not enough sleep: -11 CQ points.
  • Major sleep debt, like several days not enough sleep: -20 CQ points.
  • Oversleep a lot, like 11 hours: +6 CQ points.
  • Ice cream (without having eaten ice cream in the past week): +5 CQ points.
  • Being outside: +4 CQ points.
  • Being in a car: -8 CQ points.
  • Walking in the hills: +9 CQ points.
  • Walking specifically up a steep hill: -5 CQ points.
  • Too many podcasts: -8 CQ points for an hour.
  • Background music: -6 to -2 CQ points.
  • Kinda too hot: -3 CQ points.
  • Kinda too cold: +2 CQ points.

(stimulants not listed because they tend to pull the features of CQ apart; less good at real thinking, more good at relatively rote thinking and doing stuff)
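Taking the (explicitly unscientific) numbers above at face value, here is a toy additive tally; the assumption that the modifiers simply add, and the example day, are mine, not the author's.

```python
# Toy additive tally of the modifiers listed above (numbers copied from the
# list; additive model and example day are my own assumption).
CQ_MODIFIERS = {
    "overeat a bit": -4,
    "sleep debt (a couple days)": -11,
    "ice cream (first in a week)": +5,
    "being outside": +4,
    "walking in the hills": +9,
    "background music": -4,   # midpoint of the stated -6 to -2 range
    "kinda too cold": +2,
}

def cq_delta(events: list[str]) -> int:
    """Sum the listed modifiers for whatever happened today."""
    return sum(CQ_MODIFIERS[e] for e in events)

print(cq_delta(["being outside", "walking in the hills", "background music"]))  # +9
```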

7Mateusz Bagiński
Are you sure about the sign here? I think I'm more prone to some kinds of creative/divergent thinking when I'm mildly to moderately sleep-deprived (at least sometimes in productive directions) but also worse in precise/formal/mathematical thinking about novel/unfamiliar stuff. So the features are pulled apart.
4TsviBT
No yeah that's my experience too, to some extent. But I would say that I can do good mathematical thinking there, including correctly truth-testing; just less good at algebra, and as you say less good at picking up an unfamiliar math concept.
6Steven Byrnes
I feel like I’ve really struggled to identify any controllable patterns in when I’m “good at thinky stuff”. Gross patterns are obvious—I’m reliably great in the morning, then my brain kinda peters out in the early afternoon, then pretty good again at night—but I can’t figure out how to intervene on that, except scheduling around it. I’m extremely sensitive to caffeine, and have a complicated routine (1 coffee every morning, plus in the afternoon I ramp up from zero each weekend to a full-size afternoon tea each Friday), but I’m pretty uncertain whether I’m actually getting anything out of that besides a mild headache every Saturday. I wonder whether it would be worth investing the time and energy into being more systematic to suss out patterns. But I think my patterns would be pretty subtle, whereas yours sound very obvious and immediate. Hmm, is there an easy and fast way to quantify “CQ”? (This pops into my head but seems time-consuming and testing the wrong thing.) …I’m not really sure where to start tbh. …I feel like what I want to measure is a 1-dimensional parameter extremely correlated with “ability to do things despite ugh fields”—presumably what I’ve called “innate drive to minimize voluntary attention control” being low a.k.a. “mental energy” being high. Ugh fields are where the parameter is most obvious to me but it also extends into thinking well about other topics that are not particularly aversive, at least for me, I think.
6Mateusz Bagiński
(BTW first you say "CQ" and then "GQ")
2TsviBT
Ohhh. Thanks. I wonder why I did that.
2Mateusz Bagiński
Major sleep debt? Probably either one of (or some combination of): (1) "g" is the next consonant after "c" in "cognitive"; (2) leakage from "g-factor"; (3) leakage from "general(ly good at thinking)"
2TsviBT
(1) was my guess. Another guess is that there's a magazine "GQ".
1[anonymous]
I notice the potential for combo between these two: (One can stay up till sleep deprived and oversleep on each sleep/wake cycle, though they'll end up with a non-24-hour schedule)
4TsviBT
Yep! Without cybernetic control (I mean, melatonin), I have a non-24-hour schedule, and I believe this contributes >10% of that.
1[anonymous]
Also, that might generalize to "minimizing sound is good", in which case I'd suggest trying these earplugs. Generalizing to sensory deprivation in general, an easy way to do that is to lie in bed with eyes closed and lights off (not to sleep). (I've found this helpful, but maybe it's a side effect of not being distracted by a computer)
3TsviBT
I quite dislike earplugs. Partly it's the discomfort, which maybe those can help with; but partly I just don't like being closed away from hearing what's around me. But maybe I'll try those, thanks (even though the last 5 earplugs were just uncomfortable contra promises). Yeah, I mean I think the music thing is mainly nondistraction. The quiet of night is great for thinking, which doesn't help the sleep situation.

Recommendation for gippities as research assistants: Treat them roughly like you'd treat RationalWiki, i.e. extremely shit at summaries / glosses / inferences, quite good at citing stuff and fairly good at finding random stuff, some of which is relevant.

4Richard_Kennaway
Works for me, I don't use either!
TsviBT

"The Future Loves You: How and Why We Should Abolish Death" by Dr Ariel Zeleznikow-Johnston is now available to buy. I haven't read it, but I expect it to be a definitive anti-deathist monograph. https://www.amazon.com/Future-Loves-You-Should-Abolish-ebook/dp/B0CW9KTX76

The description (copied from Amazon):


A brilliant young neuroscientist explains how to preserve our minds indefinitely, enabling future generations to choose to revive us

Just as surgeons once believed pain was good for their patients, some argue today that death brings meaning to life. But given humans rarely live beyond a century – even while certain whales can thrive for over two hundred years – it’s hard not to see our biological limits as profoundly unfair. No wonder then that most people nearing death wish they still had more time.

Yet, with ever-advancing science, will the ends of our lives always loom so close? For from ventilators to brain implants, modern medicine has been blurring what it means to die. In a lucid synthesis of current neuroscientific thinking, Zeleznikow-Johnston explains that death is no longer the loss of heartbeat or breath, but of personal identity – that the core of our identities is ou... (read more)

Discourse Wormholes.

In complex or contentious discussions, the central or top-level topic is often altered or replaced. We're all familiar from experience with this phenomenon. Topologically this is sort of like a wormhole:

Imagine two copies of ℝ³ minus the open unit ball, glued together along the unit spheres. Imagine enclosing the origin with a sphere of radius 2. This is a topological separation: The origin is separated from the rest of your space, the copy of ℝ³ that you're standing in. But, what's contained in the enclosure is an entire world just as large; therefore, the origin is not really contained, merely separated. One could walk through the enclosure, and pass through the unit ball boundary, and then proceed back out through the unit ball boundary into the other alternative copy of ℝ³.
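One way to write the gluing construction down (my notation; assuming the ambient space is ℝ³, as the talk of balls and spheres suggests):

```latex
X = \bigl( (\mathbb{R}^3 \setminus B_1(0)) \times \{1\} \;\sqcup\;
           (\mathbb{R}^3 \setminus B_1(0)) \times \{2\} \bigr) / \sim,
\qquad (x,1) \sim (x,2) \ \text{whenever } \|x\| = 1.
% B_1(0) is the open unit ball. The sphere of radius 2 in copy 1 separates the
% glued unit sphere from the rest of copy 1, but what it "encloses" is all of
% copy 2: a whole other world, hence "separated" rather than "contained".
```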

You come to a crux of the issue, or you come to a clash of discourse norms or background assumptions; and then you bloop, where now that is the primary motive or top-level criterion for the conversation.

This has pluses and minuses. You are finding out what the conversation really wanted to be, finding what you most care about here, finding out what the two of you most ought to fight about ... (read more)

2Mateusz Bagiński
A particularly annoying-to-me kind of discourse wormhole: 1. ^ Or even, eh, social pressures, etc.
6TsviBT
Mhm. Yeah that's annoying. Though in her probabilistic defense,

1. In fact her salience might have changed; she might not have noticed either; it might not even be a genuinely adversarial process (even subconsciously).
2. She might reasonably not know exactly what position she wants to defend, while still being able to partially and partially-truthfully defend it. For example, she might have a deep intuition that incest is morally wrong; and then give an argument against incest that's sort of true, like "there's power differences" or "it could make a diseased baby"; and then you argue / construct a hypothetical where those things aren't relevant; and then she switches to "no but like, the family environment in general has to be decision-theoretically protected from this sort of possibility in order to prevent pressures", and claims that's what she's been arguing all along. Where from your perspective, the topic was the claim "disease babies mean incest is bad", but from hers it was "something inchoate which I can't quite express yet means incest is bad". And her behavior can be cooperative, at least as described so far: she's working out her true rejection by trying out some rejections she knows how to voice.
3. Sometimes I'm talking to someone (e.g. about AGI timelines) and they'll start listing facts. And the facts don't seem immediately relevant, or like, I don't know how to take them on board or to respond because I don't know what argument is being made. And if I try to clarify what argument is being made, they just keep listing more facts--which disorients me; and I often imagine that they have a background assumption like "A implies B" and so they have started giving supporting facts to explain and give evidence for A. And then I'm confused because I don't know what B even is. So I try to ask; but what I get back is more stuff about A; and they are hoping that if they just say enough stuff and convince me of A, then of course since A implies B and B is
TsviBT

The standard way to measure compute is FLOPS. Besides other problems, this measure has two major flaws: First, no one cares exactly how many FLOPS you have; we want to know the order of magnitude without having to incant "ten high". Second, it sounds cute, even though it's going to kill us.

I propose an alternative: Digital Orders Of Magnitude (per Second), or DOOM(S).
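A minimal sketch, under the reading that a DOOM(S) figure is just the base-10 order of magnitude of a FLOP/s count (my interpretation of the proposal, not an established definition):

```python
# DOOM(S) as the base-10 order of magnitude of a FLOP/s figure
# (my reading of the proposal above).
import math

def dooms(flops: float) -> int:
    """Digital Orders Of Magnitude per Second."""
    return math.floor(math.log10(flops))

print(dooms(1.5e25))  # 25, i.e. "twenty-five DOOMS", no incanting "ten high"
```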

(Speculative) It seems like biotech VC is doing poorly, and this stems from the fact that it's a lot of work to discriminate good from bad prospects for the biology itself. (As opposed to, say, ability of a team to execute a business, or how much of a market there is, etc.) If this is true, have some people tried making a biotech VC firm that employs bio experts--like, say, PhD dropouts--to do deep background on startups?

Say a "deathist" is someone who says "death is net good (gives meaning to life, is natural and therefore good, allows change in society, etc.)" and a "lifeist" ("anti-deathist") is someone who says "death is net bad (life is good, people should not have to involuntarily die, I want me and my loved ones to live)". There are clearly people who go deathist -> lifeist, as that's most lifeists (if nothing else, as an older kid they would have uttered deathism, as the predominant ideology). One might also argue that young kids are naturally lifeist, and there... (read more)

6Mateusz Bagiński
This is not quite deathism but perhaps a transition in the direction of "my own death is kinda not as bad": and in a comment:
2Alexander Gietelink Oldenziel
Me.
2TsviBT
In what sense were you lifeist and now deathist? Why the change?
1mattmacdermott
Other (more compelling to me) reasons for being a "deathist":

* Eternity can seem kinda terrifying.
* In particular, death is insurance against the worst outcomes lasting forever. Things will always return to neutral eventually and stay there.
6TsviBT
A lifeist doesn't say "You must decide now to live literally forever no matter what happens."!
1mattmacdermott
Fine, but it still seems like a reason one could give for death being net good (which is your chief criterion for being a deathist). I do think it's a weaker reason than the second one. The following argument in defence of it is mainly for fun: I slightly have the feeling that it's like that decision theory problem where the devil offers you pieces of a poisoned apple one by one. First half, then a quarter, then an eighth, then a sixteenth... You'll be fine unless you eat the whole apple, in which case you'll be poisoned. Each time you're offered a piece it's rational to take it, but following that policy means you get poisoned. The analogy is that I consider living for eternity to be scary, and you say, "well, you can stop any time". True, but it's always going to be rational for me to live for one more year, and that way lies eternity.
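The limit behind the apple analogy, spelled out (a standard geometric series; my addition):

```latex
\sum_{n=1}^{\infty} 2^{-n} = \tfrac12 + \tfrac14 + \tfrac18 + \cdots = 1,
% so the policy "accept each individually harmless piece" converges to eating
% the whole (poisoned) apple, even though every finite partial sum stays below 1.
```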
2Mateusz Bagiński
The distinction you want is probably not rational/irrational but CDT/UDT or whatever. Also, well, it's also insurance against the best outcomes lasting forever (though you're probably going to reply that bad outcomes are more likely than good outcomes and/or that you care more about preventing bad outcomes than ensuring good outcomes)