Programmer.
2b/2c. I think I would say that we should want a tyranny of the present to the extent that it is in our values upon reflection. If, for example, Rome still existed and had taken over the world, their CEV should depend on their ethics and population. I think it would still be a very good utopia, but it may also have things we dislike.
Other considerations, like nearby Everett branches... well, they don't exist in this branch? I would endorse game-theoretic cooperation with them, but I'm skeptical of any more automatic cooperation than what we already have. That is, this sort of fairness is part of our values, and CEV (if not adversarially hacked) should represent it already?
I don't think this would end up in a tyranny anything like the usual form of the word if we're actually implementing CEV. We have values for people being able to change and adjust over time, and so those are in the CEV.
There may very well be limits to how far we want humanity to change in general, but that's perfectly allowed to be in our values. Like, as a specific example, some have said that they think global status games will be vastly important in the far future and thus a zero-sum resource. I find it decently likely that an AGI implementing CEV would discourage such, because humans wouldn't endorse it on reflection, even if it is a plausible default outcome.
Like, essentially my view is: optimize our branch's humanity's values as hard as possible; this contains desires for other people's values to be satisfied, and thus they're represented. Other forms of fairness toward things we aren't completely fans of can be bargained for (locally, or acausally between branches/whatever).
So that's my argument against the tyranny and Everett branches part. I'm less skeptical of considering whether to include the recently dead, but I also don't have a great theory of how to weight them. Those about to be born wouldn't have a notable effect on CEV, I'd believe.
The option you suggest in #3 is nice, though I think it runs some risks of being dominated or notably influenced by "humans in other very odd branches", and so we're outweighed by them despite them not locally existing. I think it is less that you want a human predicate, and more of a "human who has values compatible with this local branch". This is part of why I advocate just bargaining between branches: if the humans in an AGI-made New Rome want us to instantiate their constructed friendly/restricted AGI Gods locally to proselytize, they can trade for it rather than that faction being automatically divvied out a star by our AGI's CEV.
"Human who has values compatible with this local branch" feels weak as a definition, arbitrary, but I'm not sure we can do better than that. I imagine we'd even have weightings, because we likely legitimately value babies in special ways that don't entail maxing out reward centers or boosting them to megaminds soon after birth; we have preferences about that. Then of course there are minds that are sort of humanish, which is why you'd have a weighting.
(This is kinda rambly, but I do think a lot of this can be avoided with just plain CEV because I think most people on reflection would end up with "reevaluate whether the deal was fair with reflection and then adjust the deal and reference class based on that".)
I agree Grothendieck is fascinating, but I mostly just see him as interesting in different ways than von Neumann. von Neumann is often focused on because the areas he worked in are relevant to LessWrong's focuses, or (for the cloning posts) because his skills and polymath capabilities would help with alignment.
I define rationality as "more in line with your overall values". There are problems here, because people do profess social values that they don't really hold (in some sense), but roughly it is what they would reflect on and come up with.
Someone could value the short-term more than the long-term, but I think that most don't. I'm unsure if this is a side-effect of Christianity-influenced morality or just a strong tendency of human thought.
Locally optimal is probably the correct framing, but it is still irrational relative to whatever idealized values the individual would have. Just like how a hacky approximation of a chess engine is irrational relative to Stockfish—they can both roughly be considered to have the same goal; one just has various heuristics and short-term thinking that hamper it. These heuristics can be essential, since they run with less processing power, but in the human mind they can be trained and tuned.
Though I do agree that smoking isn't always irrational: I would say smoking is irrational for the supermajority of human minds, however. The social negativity around smoking may be what influences them primarily, but I'd consider that just another fragment of being irrational—more than 90% of them would have a value for their health, but they are varying levels of poor at weighing the costs, and the social negativity response is easier for the mind to emulate. Especially since they might see people walking around them while they're out taking a cigarette. (Of course, the social approval is some part of a real value too, though people have preferences about which social values they give in to.)
An important question here is "what is the point of being 'more real'?". Does having a higher measure give you a better acausal bargaining position? Do you terminally value more realness? Less vulnerable to catastrophes? Wanting to make sure your values are optimized harder?
I consider these, except for the terminal sense, to be rather weak as far as motivations go.
Acausal Bargaining: Imagine a bunch of nearby universes with instances of 'you'. They all have variations, some very similar, others with directions that seem a bit strange to the others. Still identifiably 'you' by a human notion of identity. Some of them became researchers, others investors, a few artists, writers, and a handful of CEOs.
You can model these as being variations on some shared utility function: U_i = V + v_i, where V is shared and v_i is the individual utility function. Some of them are more social, others cynical, and so on. A believable amount of human variation that won't necessarily converge to the same utility function on reflection (but quite close).
For a human, losing memories so that you are more real is akin to each branch chopping off the v_i. They lose memories of a wonderful party which changed their opinion of them, they no longer remember the horrors of a war, and so on.
Everyone may do the simple ask of losing all their minor memories, which has no effect on the utility function, but if you want more bargaining power, do you continue? The hope is that this would make your coalition easier to locate, more visible to "logical sight". That this increased bargaining power would thus ensure that, at the least, your important shared values are optimized harder than they could be if you were a disparate group of branches.
I think this is sometimes correct, but often not.
From a simple computationalist perspective, increasing the measure of the 'overall you' is of little matter. The part that bargains, your rough algorithm and your utility function, is already shared: V is shared among all your instances already; some of you just have considerations that pull in other directions (the v_i).
This is the same core idea as the FDT explanation of why people should vote: despite not being clones of you, there is a group of people that share similar reasoning with you. Getting rid of your memories in the voting case does not help you!
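The decomposition from a few paragraphs up can be put in a toy sketch. This is only an illustration of the argument, assuming the U_i = V + v_i reading; all outcome names and weights below are made up:

```python
# Each branch i has utility U_i(x) = V(x) + v_i(x): a shared component V
# plus an idiosyncratic component v_i from divergent memories/experiences.

def make_branch_utility(shared, individual):
    """Build U_i = V + v_i as a function over outcomes."""
    return lambda outcome: shared(outcome) + individual(outcome)

# Shared values: every branch weights "flourishing" heavily.
V = lambda o: {"flourishing": 10}.get(o, 0)

# Individual quirks: the CEO branch likes status games, the cynic dislikes them.
individual_parts = [
    lambda o: 2 if o == "status_games" else 0,   # CEO branch
    lambda o: 0,                                  # artist branch
    lambda o: -1 if o == "status_games" else 0,   # cynic branch
]

branches = [make_branch_utility(V, vi) for vi in individual_parts]

# "Forgetting" chops off v_i, collapsing every branch onto V alone...
forgotten = [make_branch_utility(V, lambda o: 0) for _ in individual_parts]

# ...but the shared component V was identical across branches all along,
# so V-level coordination never required the forgetting step.
assert all(b("flourishing") == 10 for b in branches + forgotten)
```

The point of the sketch: forgetting changes the v_i terms, but the part that does the coordinating (V) was already common to every branch.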
For the Acausal Bargaining case, there is presumably some value in being simpler. But that means it is more likely that you should bargain 'nearby' to present a computationally cheaper value function 'far away'. So, similar to forgetting, you appear as if you have some shared utility function, but without actually forgetting—and thus you remain able to optimize for v_i in your local universe. As well, the bargained utility function presented far away (less logical sight to your cluster of universes) is unlikely to be the same as V.
So, overall, my argument would be that forgetting does give you more realness. If at 7:59AM a large chunk of universes decide to replace part of their algorithm with a specific coordinated one (like removing a memory), then that algorithm is instantiated across more universes. But, from a decision-theoretic perspective, I don't think that matters too much? You already share the important decision-theoretic parts, even if the whole algorithm is not shared.
From a human perspective we may care about this as a value of wanting to 'exist more' in some sense. I think this is a reasonable enough value to have, but that it is often satisfied by considering that sharing decision methods and 99.99% of personality is enough.
My main question about whether this is useful beyond a terminal value for existing more concerns quantum immortality—about which I am more uncertain.
Beliefs and predictions that influence wants may be false or miscalibrated, but the feeling itself, the want itself, just is what it is, the same way sensations of hunger or heat just are what they are.
I think this may be part of the disconnect between me and the article. I often view the short jolt preferences (that you get from seeing an ice-cream shop) as heuristics, as effectively predictions paired with some simpler preference for "sweet things that make me feel all homey and nice". These heuristics can be trained to know how to weigh the costs, though I agree just having a "that's irrational" / "that's dumb" is a poor approach to it. Other preferences, like "I prefer these people to be happy" are not short-jolts but rather thought about and endorsed values that would take quite a bit more to shift—but are also significantly influenced by beliefs too.
Other values like "I enjoy this aesthetic" seem more central to your argument than short-jolts or considered values.
This is why you could view a smoker's preference for another cigarette as irrational: the 'core want' is just a simple preference for the general feel of smoking a cigarette, but the short-jolt preference has the added prediction of "and this will be good to do". But that added prediction is false and inconsistent with everything they know. The usual statement of "you would regret this in the future". Unfortunately, the short-jolt preference often has enough strength to get past the other preferences, which is why you want to downweight it.
So, I agree that there are various preferences whose presence is disentangled from whether you're rational or not, but I also think most preferences are quite entangled with predictions about reality.
“inconsistent preferences” only makes sense if you presume you’re a monolithic entity, or believe your "parts" need to all be in full agreement all the time… which I think very badly misunderstands how human brains work.
I agree that humans can't manage this, but it does still make sense for a non-monolithic entity—you'd take an inconsistency as a sign that there's a problem, which is what people tend to do, even if it can't be fixed.
Finally, the speed at which you communicate vibing means you're communicating almost purely from System 1, expressing your actual felt beliefs. It makes deception both of yourself and others much harder. It's much more likely to reveal your true colors. This allows it to act as a values screening mechanism as well.
I'm personally skeptical of this. I've found I'm far more likely to lie than I'd endorse when vibing. Saying "sure I'd be happy to join you on X event" when it is clear with some thought that I'd end up disliking it. Or exaggerating stories because it fits with the vibe.
I view System-1 as less concerned with truth here, it is the one that is more likely to produce a fake-argument in response to a suggested problem. More likely to play social games regardless of if they make sense.
I agree that it is easy to automatically lump the two concepts together.
I think another important part of this is that there are limited methods for most consumers to coordinate against companies to lower their prices. There's shopping elsewhere, leaving a bad review, or moral outrage. The last may have a chance of blowing up socially, such as becoming a boycott (but boycotts are often considered ineffective), or it may encourage the government to step in. In our current environment, the government often operates as the coordination method to punish companies for behaving in ways that people don't want. In a much more libertarian society we would want this replaced with other methods, so that consumers can make it harder to put themselves in a prisoner's dilemma or stag hunt against each other.
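The prisoner's-dilemma/stag-hunt framing can be made concrete with a toy payoff table. The payoffs here are illustrative assumptions, not from the original:

```python
# Two consumers decide whether to join a boycott ("coordinate") or shop
# as usual ("defect"). Hypothetical payoffs: a successful joint boycott
# lowers prices, while a lone boycotter just bears the cost.
payoffs = {
    ("coordinate", "coordinate"): (3, 3),  # boycott works, prices fall
    ("coordinate", "defect"):     (0, 2),  # lone boycotter pays the cost
    ("defect",     "coordinate"): (2, 0),
    ("defect",     "defect"):     (2, 2),  # status quo prices
}

def best_response(their_action):
    """My payoff-maximizing action given the other consumer's action."""
    return max(["coordinate", "defect"],
               key=lambda a: payoffs[(a, their_action)][0])

# Without a coordination mechanism, defecting is the safe reply to an
# uncertain partner, even though joint coordination is strictly better.
assert best_response("defect") == "defect"
assert best_response("coordinate") == "coordinate"
```

This is the sense in which consumers are in a stag hunt against each other: anything that gives them assurance about each other's play (the "common organizations for more mild coordination" mentioned next) moves the equilibrium from mutual defection to mutual coordination.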
If we had common organizations for more mild coordination than the state interfering, then I believe this would improve the default mentality because there would be more options.
It has also led to many shifts in power between groups based on how well they exploit reality. From hunter-gatherers to agriculture, to grand armies spreading an empire, to ideologies changing the fates of entire countries, and to economic & nuclear super-powers making complex treaties.
This reply is perhaps a bit too long, oops.
Having a body that does things is part of your values and is easily described in them. I don't see deontology or virtue ethics as giving any more fundamentally adequate solution to this (beyond the trivial 'define a deontological rule about ...', or 'it is virtuous to do interesting things yourself', but why not just do that with consequentialism?).
My attempt at interpreting what you mean is that you're drawing a distinction between morality about world-states vs. morality about process, internal details, experiencing it, 'yourself'. To give them names: "global" values (you just want them Done) & "indexical"/"local" values (preferences about your experiences, what you do, etc.). Global would be reducing suffering, avoiding heat death, and whatnot. Local would be that you want to learn physics from the ground up and try to figure out XYZ interesting problem as a challenge by yourself, that you would like to write a book rather than having an AI do it for you, and so on.
I would say that, yes, for Global you should/would have an amorphous blob that doesn't necessarily care about the process. That's your (possibly non-sentient) AGI designing a utopia while you run around doing interesting Local things. Yet I don't see why you think only Global is naturally described in consequentialism.
I intrinsically value having solved hard problems—or rather, I value feeling like I've solved hard problems, which is part of overall self-respect, and I also value realness to varying degrees. That I've actually done the thing, rather than taken a cocktail of exotic chemicals. We could frame this in a deontological & virtue ethics sense: I have a rule about realness, I want my experiences to be real. / I find it virtuous to solve hard problems, even if in a post-singularity world.
But do I really have a rule about realness? Uh, sort-of? I'd be fine to play a simulation where I forget about the AGI world and am in some fake-scifi game world and solve hard problems. In reality, my value has a lot more edge-cases that will be explored than many deontological rules prefer. My real value isn't really a rule, it is just sometimes easy to describe it that way. Similar to how "do not lie" or "do not kill" is usually not a true rule.
Like, we could describe my actual value here as a rule, but that seems actually more alien to the human mind. My actual value for realness is some complicated function of many aspects of my life, preferences, current mood to some degree, second-order preferences, and so on. Describing that as a rule is extremely reductive.
And 'realness' is not adequately described as a complete virtue either. I don't always prefer realness: if playing a first-person shooter game, I prefer that my enemies are not experiencing realistic levels of pain! So there are intricate trade-offs here as I continue to examine my own values.
Another aspect I'm objecting to mentally when I try to apply those stances is that there's two ways of interpreting deontology & virtue ethics that I think are common on LW. You can treat them as actual philosophical alternatives to consequentialism, like following the rule "do not lie". Or you can treat them as essentially fancy words for deontology=>"strong prior for this rule being generally correct and also a good coordination point" and virtue ethics=>"acting according to a good Virtue consistently as a coordination scheme/culture modification scheme and/or because you also think that Virtue is itself a Good".
Like, there's a difference between talking about something using the language commonly associated with deontology and actually practicing deontology. I think conflating the two is unfortunate.
The overarching argument here is that consequentialism properly captures a human's values, and that you can use the basic language of "I keep my word" (deontology-flavored) or "I enjoy solving hard problems because they are good to solve" (virtue-ethics-flavored) without actually operating within those moral theories. You would have the ability to unfold these into consequentialist statements of whatever form you prefer.
In your reply to cubefox, "respect this person's wishes" is not a deontological rule. Well, it could be, but I expect your actual values don't fulfill that. Just because your native internal language suggestively calls it that, doesn't mean you should shoehorn it into the category of rule!
"play with this toy" still strikes me as natively a heuristic/approximation to the goal of "do things I enjoy". The interlinking parts of my brain that decided to bring that forward are good at their job, but also dumb, because they don't do any higher-order thinking. I follow that heuristic only because I expect to enjoy it—the heuristic providing that information. If I had another part of my consideration that pushed me towards considering whether that is a good plan, I might realize that I haven't actually enjoyed playing with a teddy bear in years, despite still feeling nostalgia for it. I'm not sure I see the gap between consequentialism and this. I don't have the brain capacity to consider every impulse I get, but I do want agents other than AIXI to count as consequentialists.
I think there's a space in there for a theory of minds, but I expect it would be more mechanistic or descriptive rather than a moral theory. Ala shard theory.
Or, alternatively, even if you don't buy my view that the majority of my heuristics can be cast as approximations of consequentialist propositions, then deontology/virtue ethics are not natural theories either by your descriptions. They miss a lot of complexity even within their usual remit.
I'm confused, why does that make the term no longer useful? There's still a large distinction between companies focusing on developing AGI (OpenAI, Anthropic, etc.) vs those focusing on more 'mundane' advancements (Stability, Black Forest, the majority of ML research results). Though I do disagree that it was only used to distinguish them from narrow AI. Perhaps that was what it was originally, but it quickly turned into the roughly "general intelligence like a smart human" approximate meaning we have today.
I agree 'AGI' has become an increasingly vague term, but that's because it is a useful distinction and so certain groups use it to hype. I don't think abandoning a term because it is getting weakened is a great idea.
We should talk more about specific cognitive capabilities, but that isn't stopped by using the term AGI; it is stopped by not having people analyze whether X is an important capability for risk, or a capability for stopping risk.