Is there a better way of discovering strong arguments for a non-expert than asking for them publicly?
Also, it assumes there is a separate module for making predictions, one the agent cannot manipulate. I don't find that assumption very plausible.
Isn't this a blocker for any discussion of particular utility functions?
If a simple philosophical argument can cut the expected odds of AI doom by an order of magnitude, we might not change our current plans, but it suggests that we have a lot of confusion on the topic that further research might alleviate.
And more generally, "the world where we almost certainly get killed by ASI" and "the world where we have an 80% chance of getting killed by ASI" are different worlds, and, setting aside motives to lie for propaganda purposes, if we actually live in the latter we should not say we live in the former.
I don't think wireheading is "myopic" when it overlaps with self-maintenance. A classic example would be painkillers: they do ~nothing but make you "feel good now" (or at least less bad), but sometimes feeling less bad is necessary to function properly and achieve long-term value. I think gratitude journaling is also part of this overlap area. That said, I don't know many people's experiences with it, so maybe it's more prone to "abuse" than I expect.
A corrigible AI is one that cooperates with attempts to modify it to bring it more in line with what its creators/users want it to be. Some people think this is a promising direction for alignment research: if an AI could be guaranteed to be corrigible, then even if it ends up with wild/dangerous goals, we could in principle just modify it to not have those goals, and it wouldn't try to stop us.
"Alignment win condition," as far as I know, is a phrase I just made up. I mean it as something that, regardless of whether it "solves" alignment in a specifi...
I don't trust a hypothetical arbitrary superintelligence, but I agree that a superintelligence is too much power for any extant organization, which means that "corrigibility" is not an alignment win condition. An AI that resists being modified to do bad things (whatever that might mean on reflection) seems like a feature, not a bug.
Do you believe or allow for a distinction between value and ethics? Intuitively it feels like metaethics should take into account the Goodness of Reality principle, but I think my intuition comes from a belief that if there's some objective notion of Good, ethics collapses to "you should do whatever makes the world More Gooder," and I suppose that that's not strictly necessary.
I do draw a distinction between value and ethics. Although my current best guess is that decision theory does in some sense reduce ethics to a subset of value, I do think it's a subset worth distinguishing. For example, I still have a concept of evaluating how ethical someone is, based on how good they are at paying causal costs for larger acausal gains.
I think the Goodness of Reality principle is maybe a bit confusingly named, because it's not really a claim about the existence of some objective notion of Good that applies to reality per se, and is ...
The adulterer, the slave owner and the wartime rapist all have solid evolutionary reasons to engage in behaviors most of us might find immoral. I think their moral blind spots are likely not caused by trapped priors, like an exaggerated fear of dogs is.
I don't think the evopsych and trapped-prior views are incompatible. A selection pressure towards immoral behavior could select for genes/memes that tend to result in certain kinds of trapped prior.
I also suspect something along the lines of "Many (most?) great spiritual leaders were making a good-faith effort to understand the same ground truth with the same psychological equipment and got significantly farther than most normal people do." But in order for that to be plausible, you would need a reason why the almost-truths they found are so goddamn antimemetic that the most studied and followed people in history weren't able to make them stick. Some of the selection pressure surely comes down to social dynamics. I'd like to think that people who hav...
But in order for that to be plausible, you would need a reason why the almost-truths they found are so goddamn antimemetic that the most studied and followed people in history weren't able to make them stick.
A few thoughts:
That "so now what" doesn't sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.
Alright, based on your phrasing I had thought it was something you believed. I'm open to moral realism and I don't immediately see how phenomena being objectively bad would imply that physics is objectively bad.
Why does something causing something bad make that thing itself bad?
Then I'd like to see some explanation of why it doesn't have an answer, which would itself be adding back to normality.
I'm not saying it doesn't, I'm saying it's not obvious that it does. Normalcy requirements don't mean all our possibly-confused questions have answers; they just put restrictions on what those answers should look like. So, if the idea of successors-of-experience is meaningful at all, our normal intuition gives us desiderata like "chains of successorship are continuous across periods of consciousness" and "chains of successorship do not fork or mer...
[...]only autonomous (driven by internal will) actions, derived from duty to the moral law, can be considered moral.
[...] a belief in a thing has a totalising effect on the will of the subject.
What makes this totalizing effect distinct from the "duty to moral law" explicitly called for?
People tend to agree that one should care about the successor of one's subjective experience. The question is whether there will be one or not. And this is a question of fact.
But the question of "what, if anything, is the successor of your subjective experience" does not obviously have a single factual answer.
I can conceptualize a world where a soul always stays tied to the initial body, and as soon as the body is destroyed, the soul is destroyed as well.
If souls are real (and the Hard Problem boils down to "it's the souls, duh"), then a teleporter that doesn't rea...
I'm not convinced that there is a single "way" one should expect to wake up in the morning. If we're talking about things like observer-moments and exotic theories of identity, I don't think we can reliably communicate by analogy to mundane situations, since our intuitions might differ in subtle ways that don't matter in those situations.
For instance, should I believe that I will wake up because that will lead me to make decisions that lead to world-states I prefer, or should I expect to wake up because it is true that I will probably wake up? If the latte...
What does it mean to "should expect" something, if your identity is transmitted across multiple universes with different ground truths?
I think the key to approaches like this is to eschew pre-existing, complex concepts like "human flourishing" and look for a definition of Good Things that is actually amenable to constructing an agent that Does Good Things. There's no guarantee that this would lead anywhere; it relies on some weak form of moral realism. But an AGI that follows some morality-you-largely-agree-with by its very structure is a lot more appealing to me than an AGI that dutifully maximizes the morality-you-punched-into-its-utility-function-at-bootup, appealing enough that I think it's worth wading into moral philosophy to see if the idea pans out.
Should it make a difference? Same iterative computation.
Not necessarily; a lot of information is discarded when you're only looking at the paper/verbal output. As an extreme example, if the emulated brain had been instructed (or had the memory of being instructed) to say the number of characters written on the paper and nothing else, the computational properties of the system as a whole would be much simpler than those of the emulation.
I might be missing the point. I agree with you that an architecture that predicts tokens isn't necessarily non-conscious. ...
I don't think that in the example you give, you're making a token-predicting transformer out of a human emulation; you're making a token-predicting transformer out of a virtual system with a human emulation as a component. In the system, the words "what's your earliest memory?" appearing on the paper are going to trigger all sorts of interesting (emulated) neural mechanisms that eventually lead to a verbal response, but the token predictor doesn't necessarily need to emulate any of that. In fact, if the emulation is deterministic, it can just memorize what...
The number of poor people is much larger than the number of billionaires, but the number of poor people who THINK they're billionaires probably isn't that much larger. Good point about needing to forget the technique, though.
Is this an independent reinvention of the law of attraction? There doesn't seem to be anything special about "stop having a disease by forgetting about it" compared to the general "be in a universe by adopting a mental state compatible with that universe." That said, becoming completely convinced I'm a billionaire seems more psychologically involved than forgetting I have some disease, and the ratio of universes where I'm a billionaire versus I've deluded myself into thinking I'm a billionaire seems less favorable as well.
Anyway, this doesn't seem like a g...
What does it mean when one "should anticipate" something? At least in my mind, it points strongly to a certain intuition, but the idea behind that intuition feels confused. "Should" in order to achieve a certain end? To meet some criterion? To boost a term in your utility function?
I think the confusion here might be important, because replacing "should anticipate" with a less ambiguous "should" seems to make the problem easier to reason about, and supports your point.
For instance, suppose that you're going to get your brain copied next week. After you get ...
I read this paragraph as saying ~the same thing as the original post in a different tone.