All of MinusGix's Comments + Replies

I'm confused, why does that make the term no longer useful? There's still a large distinction between companies focusing on developing AGI (OpenAI, Anthropic, etc.) vs those focusing on more 'mundane' advancements (Stability, Black Forest, the majority of ML research results). Though I do disagree that it was only used to distinguish them from narrow AI. Perhaps that was what it was originally, but it quickly shifted into the rough "general intelligence like a smart human" meaning we have today.
I agree 'AGI' has become an increasingly vague t... (read more)

2Kaj_Sotala
Do my two other comments [1, 2] clarify that?
MinusGix1-2

2b/2c. I think I would say that we should want a tyranny of the present to the extent that is in our values upon reflection. If, for example, Rome still existed and took over the world, their CEV should depend on their ethics and population. I think it would still be a very good utopia, but it may also have things we dislike.
Other considerations, like nearby Everett branches... well they don't exist in this branch? I would endorse game theoretical cooperation with them, but I'm skeptical of any more automatic cooperation than what we already have. That is... (read more)

I agree Grothendieck is fascinating, but I mostly just see him as interesting in different ways than von Neumann. von Neumann is often focused on because the subjects he was skilled at are relevant to LessWrong's focuses, or (for the cloning posts) because those skills and his polymath capabilities would help with alignment.

3Embee
Yes, they seem to represent two completely different types of extreme intelligence, which is very interesting. I also agree that vN's ideas are more relevant for the community.

I define rationality as "more in line with your overall values". There are problems here, because people do profess social values that they don't really hold (in some sense), but roughly it is what they would reflect on and come up with.
Someone could value the short-term more than the long-term, but I think that most don't. I'm unsure if this is a side-effect of Christianity-influenced morality or just a strong tendency of human thought.

Locally optimal is probably the correct framing, but that it is irrational relative to whatever idealized values the ind... (read more)

Answer by MinusGix30

An important question here is "what is the point of being 'more real'?". Does having a higher measure give you a better acausal bargaining position? Do you terminally value more realness? Less vulnerable to catastrophes? Wanting to make sure your values are optimized harder?

I consider these, except for the terminal sense, to be rather weak as far as motivations go.

Acausal Bargaining: Imagine a bunch of nearby universes with instances of 'you'. They all have variations, some very similar, others with directions that seem a bit strange to the others. Still i... (read more)

MinusGix134

Beliefs and predictions that influence wants may be false or miscalibrated, but the feeling itself, the want itself, just is what it is, the same way sensations of hunger or heat just are what they are.

I think this may be part of the disconnect between me and the article. I often view the short jolt preferences (that you get from seeing an ice-cream shop) as heuristics, as effectively predictions paired with some simpler preference for "sweet things that make me feel all homey and nice". These heuristics can be trained to know how to weigh the costs, th... (read more)

9Kaj_Sotala
Commenting on a relatively isolated point in what you wrote; none of this affects your core point about preferences being entangled with predictions (actually it relies on it).

I think that the short-jolt preference's prediction is actually often correct; it's just over a shorter time horizon. The short-term preference predicts that "if I take this smoke, then I will feel better" and it is correct. The long-term preference predicts that "I will later regret taking this smoke," and it is also correct. Neither preference is irrational, they're just optimizing over different goals and timescales.

Now it would certainly be tempting to define rationality as something like "only taking actions that you endorse in the long term", but I'd be cautious of that. Some long-term preferences are genuinely that, but many of them are also optimizing for something looking good socially, while failing to model any of the genuine benefits of the socially-unpopular short-term actions. For example, smoking a cigarette often gives smokers a temporary feeling of being in control, and if they are going out to smoke together with others, a break and some social connection. It is certainly valid to look at those benefits and judge that they are still not worth the long-term costs... but frequently the "long-term" preference may be based on something like "smoking is bad and uncool and I shouldn't do it and I should never say that there could be a valid reason to do it, for otherwise everyone will scold me". Then by maintaining both the short-term preference (which continues the smoking habit) and the long-term preference (which might make socially-visible attempts to stop smoking), the person may be getting the benefit from smoking while also avoiding some of the social costs of continuing.

This is obviously not to say that the costs of smoking would only be social. Of course there are genuine health reasons as well. But I think that quite a few people who care about "health" actually car

Finally, the speed at which you communicate vibing means you're communicating almost purely from System 1, expressing your actual felt beliefs. It makes deception both of yourself and others much harder. It's much more likely to reveal your true colors. This allows it to act as a values screening mechanism as well.

I'm personally skeptical of this. I've found I'm far more likely to lie than I'd endorse when vibing. Saying "sure I'd be happy to join you on X event" when it is clear with some thought that I'd end up disliking it. Or exaggerating stories bec... (read more)

2Matt Goldenberg
Oh yes, if you're going on people's words, it's obviously not much better, but the whole point of vibing is that it's not about the words.  Your aesthetics, vibes, the things you care about will be communicated non-verbally.

I agree that it is easy to automatically lump the two concepts together.

I think another important part of this is that there are limited methods for most consumers to coordinate against companies to lower their prices. There's shopping elsewhere, leaving a bad review, or moral outrage. The last may have a chance of blowing up socially, such as becoming a boycott (but boycotts are often considered ineffective), or it may encourage the government to step in. In our current environment, the government often operates as the coordination method to punish compan... (read more)

5Noosphere89
This sounds very much like the phenomenon described in From Personal to Prison Gangs: Enforcing Prosocial Behavior. The main reason regulation/getting the government to step in has become more and more common is basically that at scales larger than 150-300 people, we lose the ability to iterate games. In the absence of acausal/logical/algorithmic decision theories like FDT and UDT, that means the optimal outcome is to defect, so you can no longer assume cooperation/small sacrifices from people in general. Coordination in the modern world is a very taut constraint, so any solution has very high value. (This also has a tie-in to decision theory: at the large scale, CDT predominates, but at the very small scale, something like FDT is incentivized through kin selection, though this is only relevant for 4-50 person scales at most. The big reason algorithmic decision theories aren't used by people very often is that the original algorithmic decision theories, like UDT, basically required logical omniscience, which people obviously don't have; and even the more practical algorithmic decision theories require both access to someone's source code and the ability to simulate another agent either perfectly or at least very, very well, which we again don't have.) This link is very helpful to illustrate the general phenomenon: https://www.lesswrong.com/posts/sYt3ZCrBq2QAf3rak/from-personal-to-prison-gangs-enforcing-prosocial-behavior

It has also led to many shifts in power between groups based on how well they exploit reality. From hunter-gatherers to agriculture, to grand armies spreading an empire, to ideologies changing the fates of entire countries, and to economic & nuclear super-powers making complex treaties.

This reply is perhaps a bit too long, oops.


Having a body that does things is part of your values and is easily described in them. I don't see deontology or virtue ethics as giving any more fundamentally adequate solution to this (beyond the trivial 'define a deontological rule about ...', or 'it is virtuous to do interesting things yourself', but why not just do that with consequentialism?).
My attempt at interpreting what you mean is that you're drawing a distinction between morality about world-states vs. morality about process, internal details, exper... (read more)

3cousin_it
No problem about long reply, I think your arguments are good and give me a lot to think about. I just thought of another possible classification: "zeroth-order consequentialist" (care about doing the action but not because of consequences), "first-order consequentialist" (care about consequences), "second-order consequentialist" (care about someone else being able to choose what to do). I guess you're right that all of these can be translated into first-order. But by the same token, everything can be translated to zeroth-order. And the translation from second to first feels about as iffy as the translation from first to zeroth. So this still feels fuzzy to me, I'm not sure what is right.
MinusGix3-2

I think there's two parts of the argument here:

  • Issues of expressing our values in a consequentialist form
  • Whether or not consequentialism is the ideal method for humans

The first I consider not a major problem. Mountain climbing is not what you can put into the slot to maximize, but you do put happiness/interest/variety/realness/etc. into that slot. This then falls back into questions of "what are our values". Consequentialism provides an easy answer here: mountain climbing is preferable along important axes to sitting inside today. This isn't always en... (read more)

2cousin_it
Here's maybe another point of view on this: consequentialism fundamentally talks about receiving stuff from the universe. An hour climbing a mountain, an hour swimming in the sea, or hey, an hour in the joy experience machine. The endpoint of consequentialism is a sort of amoeba that doesn't really need a body to overcome obstacles, or a brain to solve problems, all it needs to do is receive and enjoy. To the extent I want life to be also about doing something or being someone, that might be a more natural fit for alternatives to consequentialism - deontology and virtue ethics.

If I value a thing at one period of life and turn away from it later, I have not discovered something about my values. My values have changed. In the case of the teenager we call this process “maturing”. Wine maturing in a barrel is not becoming what it always was, but simply becoming, according to how the winemaker conducts the process.

Your values change according to the process of reflection - the grapes mature into wine through fun chemical reactions.
From what you wrote, it feels like you are mostly considering your 'first-order values'. However, yo... (read more)

2habryka
You're welcome!

Is there a way to get an article's raw or original content?
My goal is mostly to put articles in some area (ex: singular learning theory) into a tool like Google's NotebookLM to then ask quick questions about.
Google's own conversion of HTML to text works fine for most content, excepting math. A division such as p(w|D_n) = p(D_n|w)φ(w) / p(D_n) may turn into "p ( w | D n ) = p ( D n | w ) φ ( w ) p ( D n )", silently dropping the division bar and becoming incorrect.

I can always just grab the article's HTML content (or use the GraphQL api for that), but HTMLified MathJax notation is very, uh, verbose. I could probably do some massaging o... (read more)

4habryka
Yeah, you can grab any post in Markdown or in the raw HTML that was used to generate it using the markdown and ckEditorMarkup fields in the API:

{ post(input: {selector: {_id: "jvewFE9hvQfrxeiBc"}}) { result { contents { ckEditorMarkup } } } }

Just paste this into the editor at lesswrong.com/graphiql (adjusting the "id" for the post id, which is the alphanumerical string in the URL after /posts/), and you can get the raw content for any post.
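For completeness, the query above can be sent programmatically. This is a sketch only, using the endpoint and field names from the comment above; I haven't verified rate limits or whether the public endpoint requires any headers beyond these:

```python
import json
import urllib.request

def build_post_query(post_id: str) -> str:
    """Build the GraphQL query from the comment above for a post's markup."""
    return (
        '{ post(input: {selector: {_id: "%s"}}) '
        '{ result { contents { ckEditorMarkup } } } }' % post_id
    )

def fetch_post_markup(post_id: str) -> str:
    """POST the query to the public GraphQL endpoint and return the markup."""
    payload = json.dumps({"query": build_post_query(post_id)}).encode()
    req = urllib.request.Request(
        "https://www.lesswrong.com/graphql",
        data=payload,
        headers={"Content-Type": "application/json", "User-Agent": "markup-fetch"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["data"]["post"]["result"]["contents"]["ckEditorMarkup"]

# Example (the post id is the alphanumeric string after /posts/ in the URL):
# print(fetch_post_markup("jvewFE9hvQfrxeiBc"))
```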

I'd be interested in an article looking at whether the FDA is better at regulating food safety. I do expect food is an easier area, because erring on the side of caution doesn't really lose you much — most food products have close substitutes. If there's some low but not extremely low risk of a chemical in a food being bad for you, then the FDA can more easily deny approval without significant consequences: Medicine has more outsized effects if you are slow to approve usage.

Yet, perhaps this has led to reduced variety in food choices? I notice less generic... (read more)

I see this as occurring with various pieces of Infrabayesianism, like Diffractor's UDT posts. They're dense enough mathematically (hitting the target) that they are challenging to read... and then also challenging to discuss. There are fewer comments even from the people who read the entire post, because they don't feel competent enough to make useful commentary (with some truth behind that feeling); the silence also makes further commenting harder. At least that's what I've noticed in myself, even though I enjoy & upvote those posts.

Less attentio... (read more)

1lemonhope
What was the distillation idea from a year ago?

I draw the opposite conclusion from this: the fact that the decision theory posts seem to work on the basis of a computationalist theory of identity makes me think worse of the decision-theory posts.

Why? If I try to guess, I'd point at not often considering indexicality as a consideration, merely thinking of agents as having a single utility function, which simplifies coordination. (But still, a lot of decision theory doesn't need to take indexicality into account...)
I see the decision theory posts as less as giving new intuitions, and more breaking old ones... (read more)

2notfnofn
I enjoyed reading your comment, but just wanted to point out that a quantum algorithm can be implemented by a classical computer, just with a possibly exponential slowdown. The thing that breaks down is that any O(f(n)) algorithm on any classical computer is at worst O(f(n)^2) on a Turing machine; for quantum algorithms on quantum computers with f(n) runtime, the same decision problem can be decided in (I think) O(2^{f(n)}) runtime on a Turing machine.

the lack of argumentation or discussion of this particular assumption throughout the history of the site means it's highly questionable to say that assuming it is "reasonable enough"

While personal identity has mostly not received a single overarching post arguing all the details, the various possible points of contention have been discussed to varying degrees. Thou Art Physics, which focuses on getting the idea that you are made up of physics into your head; Identity Isn't in Specific Atoms, which tries to dissolve the common intuit... (read more)

4[anonymous]
I appreciate you linking these posts (which I have read and almost entirely agree with), but what they are doing (as you mentioned) is arguing against dualism, or in favor of physicalism, or against the view that classical (non-QM) entities like atoms have their own identity and are changed when copied (in a manner that can influence the fundamental identity of a being like a human).

What there has been a lack of discussion of is "having already accepted physicalism (and reductionism etc), why expect computationalism to be the correct theory?" None of those posts argue directly for computationalism; you can say they argue indirectly for it (and thus provide Bayesian evidence in its favor) by arguing against common opposing views, but I have already been convinced that those views are wrong. And, as I have written before, physicalism-without-computationalism seems much more faithful to the core of physicalism (and to the reasons that convinced me of it in the first place) than computationalism does.

One man’s modus ponens is another man’s modus tollens. I agree that the LW-style decision theory posting encourages this type of thinking, and you seem to infer that the high-quality reasoning in the decision theory posts implies that they give good intuitions about the philosophy of identity. I draw the opposite conclusion from this: the fact that the decision theory posts seem to work on the basis of a computationalist theory of identity makes me think worse of the decision-theory posts.

Can you link to some of these? I do not recall seeing anything like this here.

What is "the computation"? Can we try to taboo that word? My comment to Seth Herd is relevant here ("The basic concept of computation at issue here is a feature of the map you could use to approximate reality (i.e., the territory). It is merely part of a mathematical model that, as I've described in response to Ruby earlier, represents a very lossy compression of the underlying physical substrate. [...] So when

it fits with that definition

Ah, I rewrote my comment a few times and lost what I was referencing. I originally was referencing the geometric meaning (as an alternate to your statistical definition), two vectors at a right angle from each other.

But the statistical understanding works from what I can tell? You have your initial space with extreme uncertainty, and the orthogonality thesis simply states that (intelligence, goals) are not related — you can pair some intelligence with any goal. They are independent of each other at this most basic level. This... (read more)

I'm skeptical of the naming being bad; it fits with that definition and the common understanding of the word. The Orthogonality Thesis is saying that the two qualities, intelligence and goals/values, are not necessarily related, which may seem trivial nowadays, but there used to be plenty of people going "if the AI becomes smart, even if it is weird, it will be moral towards humans!" through reasoning of the form "smart -> not dumb goals like paperclips". There's structure imposed on what minds actually get created, based on what architectures, what humans train the AI on, etc. Just as two vectors can be orthogonal in R^2 while the actual points you plot in the space are correlated.
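That last point can be made concrete with a toy sketch (my own illustration, not from the thread; the "intelligence"/"goal-niceness" axes and the sampling model are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two orthogonal axes in R^2: call them 'intelligence' and 'goal-niceness'.
e_intel = np.array([1.0, 0.0])
e_goal = np.array([0.0, 1.0])
assert np.dot(e_intel, e_goal) == 0.0  # the axes themselves are orthogonal

# But the minds that actually get built may cluster: suppose (purely for
# illustration) that training practices make smarter systems somewhat more
# likely to have nicer goals.
intel = rng.normal(size=1000)
goal = 0.8 * intel + 0.6 * rng.normal(size=1000)  # correlated samples

corr = np.corrcoef(intel, goal)[0, 1]
assert corr > 0.5  # the plotted points are correlated despite orthogonal axes
```

The axes being orthogonal says nothing about how the actual data points are distributed, which is the distinction the thesis's name rests on.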

1[anonymous]
With what definition? The one most applicable here, dealing with random variables (relative to our subjective uncertainty), says "random variables that are independent". Independence implies uncorrelation, even if the converse doesn't hold. This is totally false as a matter of math if you use the most common definition of orthogonality in this context. I do agree that what you are saying could be correct if you do not think of orthogonality that way and instead simply look at it in terms of the entries of the vectors, but then you enter the realm of trying to capture the "goals" and "beliefs" as specific Euclidean vectors, and I think that isn't the best idea for generating intuition because one of the points of the Orthogonality thesis seems to be to instead abstract away from the specific representation you choose for intelligence and values (which can bias you one way or another) and to instead focus on the broad, somewhat-informal conclusion.

I agree, though I haven't seen many proposing that. But also see So8res' Decision theory does not imply that we get to have nice things, though this is coming from the opposite direction (with the start being about people invalidly assuming too much out of LDT cooperation).

Though for our morals, I do think there's an active question of which pieces we feel better replacing with the more formal understanding, because there isn't a sharp distinction between our utility function and our decision theory. Some values trump others when given better tools. Though ... (read more)

Suffering is already on most reader's minds, as it is the central advocating reason behind euthanasia — and for good reason. I agree that policies which cause or ignore suffering, when they could very well avoid such with more work, are unfortunately common. However, those are often not utilitarian policies; and similarly many objections to various implementations of utilitarianism and even classic "do what seems the obviously right action" are that they ignore significant second-order effects. Policies that don't quantify what unfortunate incentives they ... (read more)

2romeostevensit
Huge numbers of people are forced to resort to illegal methods of suicide creating legal, emotional, financial, and logistics problems for their loved ones, on top of the additional grief to the suicidee personally.

This doesn't engage with the significant downsides of such a policy that Zvi mentions. There are definite questions about the costs/benefits of allowing euthanasia, even though we wish to allow it, especially when we as a society are young in our ability to handle it. Glossing the situation as if the only significant feature were 'torturing people' ignores:

  • the very significant costs of people dying, which is compounded by the question of what equilibrium the mental/social availability of euthanasia is like
  • the typical LessWrong beliefs about how good technology will get in
... (read more)
4romeostevensit
I agree my gloss on it is not a substantive engagement, but rather a reminder of what I consider a crucial consideration. Policies that elide horrific suffering are the norm. Part of the point is that suffering, being not available to external quantification, must be left up to the individual whenever possible. Many objections to utilitarianism involve its frequent attempts to obviate subjective effects when this isn't appropriate.

Any opinions on how it compares to Fun Theory? (Though that's less about all of utopia, it is still a significant part)

3mesaoptimizer
If you haven't read CEV, I strongly recommend doing so. It resolved some of my confusions about utopia that were unresolved even after reading the Fun Theory sequence. Specifically, I had an aversion to the idea of being in a utopia because "what's the point, you'll have everything you want". The concrete pictures that Eliezer gestures at in the CEV document do engage with this confusion, and gesture at the idea that we can have a utopia where the AI does not simply make things easy for us, but perhaps just puts guardrails onto our reality, such that we don't die, for example, but we do have the option to struggle to do things by ourselves. Yes, the Fun Theory sequence tries to communicate this point, but it didn't make sense to me until I could conceive of an ASI singleton that could actually simply not help us.

I think that is part of it, but a lot of the problem is just humans being bad at coordination. Like the government doing regulations. If we had an idealized free market society, then the way to get your views across would 'just' be to sign up for a filter (etc.) that down-weights buying from said company based on your views. Then they have more of an incentive to alter their behavior. But it is hard to manage that. There's a lot of friction to doing anything like that, much of it natural. Thus government serves as our essential way to coordinate on importa... (read more)

I definitely agree that it doesn't give reason to support a human-like algorithm, I was focusing in on the part about adding numbers reliably.

I believe a significant chunk of the issue with numbers is that the tokenization is bad (not per-digit), which is the same underlying cause for being bad at spelling. So then the model has to memorize from limited examples what actual digits make up the number. The xVal paper encodes the numbers as literal numbers, which helps. Also Teaching Arithmetic to Small Transformers which I forget somewhat, but one of the things they do is per-digit tokenization and reversing the order (because that works better with forward generation). (I don't know if anyone has... (read more)
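As a toy sketch of the per-digit, reversed-order encoding idea (my own illustration of the general trick, not the exact scheme from either paper):

```python
def encode_number(n: int) -> list[str]:
    """Tokenize an integer per digit, least-significant digit first."""
    return list(str(n))[::-1]

def decode_number(tokens: list[str]) -> int:
    """Invert the per-digit reversed encoding."""
    return int("".join(reversed(tokens)))

# 345 + 678 = 1023. With reversed per-digit tokens, an autoregressive model
# can emit the least-significant digit first, matching how carrying works:
# 5+8 -> '3' (carry 1), 4+7+1 -> '2' (carry 1), 3+6+1 -> '0' (carry 1), '1'.
assert encode_number(345) == ["5", "4", "3"]
assert encode_number(1023) == ["3", "2", "0", "1"]
assert decode_number(encode_number(1023)) == 1023
```

The point of the reversal is just that generation order then matches the order in which carries propagate, instead of forcing the model to predict the most significant digit before it has "done" the addition.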

2Charlie Steiner
They can certainly use answer text as a scratchpad (even nonfunctional text that gives more space for hidden activations to flow). But they don't without explicit training. Actually, maybe they do: maybe RLHF incentivizes a verbose style to give more room for thought. But I think even "thinking step by step," there are still plenty of issues. Tokenization is definitely a contributor. But that doesn't really support the notion that there's an underlying human-like cognitive algorithm behind human-like text output. The point is that the way it adds numbers is very inhuman, despite producing human-like output on the most common/easy cases.

Yes, in principle you can get information on scheming likelihood if you get such an AI (that is also weak enough that it can't just scheme its way out of your testing apparatus). I do think making the threat credible is hard if we loosely extrapolate costs out: burning a trained up model is not cheap. The cost depends on how high you think prices for training/inference will fall in the future, and how big/advanced a model you're thinking of. Though I do think you can get deceptiveness out of weaker models than that, though they're also going to be less cap... (read more)

2Vasco Grilo
Thanks!

https://www.mikescher.com/blog/29/Project_Lawful_ebook is I believe the current best one, after a quick search on the Eliezerfic discord.

Minor: the link for Zvi's immoral mazes has an extra 'm' at the start of the part of the path ('zvi/mimmoral_mazes/')

1mesaoptimizer
Fixed, thanks!

Because it serves as a good example, simply put. It gets the idea across clearly, even if there are certainly complexities in comparing evolution to the output of an SGD-trained neural network.
It predicts learning correlates of the reward signal that break apart outside of the typical environment.

When you look at the actual process for how we actually start to like ice-cream -- namely, we eat it, and then we get a reward, and that's why we like it -- then the world looks a lot less hostile, and misalignment a lot less likely.

Yes, t... (read more)

Is this a prediction that a cyclic learning rate -- that goes up and down -- will work out better than a decreasing one? If so, that seems false, as far as I know.

https://www.youtube.com/watch?v=GM6XPEQbkS4 (talk) / https://arxiv.org/abs/2307.06324 prove faster convergence with a periodic learning rate, though on a specific 'nicer' space than reality, and (I believe from what I remember) comparing to a good bound with a constant stepsize of 1. So it may be one of those papers that applies in theory but not often in practice, but I think it is somewhat indicative.

I agree with others to a large degree about the framing/tone/specific words not being great, though I agree with a lot of the post itself. But really, that's what this whole post is about: that dressing up your words and taking partial in-the-middle positions can harm the environment of discussion. Saying what you truly believe then lets you argue down from that, rather than doing the arguing down against yourself - and implicitly against all the other people who hold a similar ideal belief as you. I've noticed similar facets of what the post gestures at,... (read more)

Along with what Raemon said, though I expect us to probably grow far beyond any Earth species eventually, if we're characterizing evolution as having a reasonable utility function then I think there's the issue of other possibilities that would be more preferable.
Like, evolution would-if-it-could choose humans to be far more focused on reproducing, and we would expect that if we didn't put in counter-effort that our partially-learned approximations ('sex enjoyable', 'having family is good', etc.) would get increasingly tuned for the common environments.

Si... (read more)

I assume what you're going for with your conflation of the two decisions is this, though you aren't entirely clear on what you mean:

  • Some agent starts with some (potentially broken in various manners, like bad heuristics or unable to consider certain impacts) decision theory, because there's no magical apriori decision algorithm
  • So the agent is using that DT to decide how to make better decisions that get more of what it wants
  • CDT would modify into Son-of-CDT typically at this step
  • The agent is deciding whether it should use FDT.
  • It is 'good enough' that
... (read more)

If your original agent is replacing themselves as a threat to FDT, because they want FDT to pay up, then FDT rightly ignores it. Thus the original agent, which just wants paperclips or whatever, has no reason to threaten FDT.

If we postulate a different scenario where your original agent literally terminally values messing over FDT, then FDT would pay up (if FDT actually believes it isn't a threat). Similarly, if part of your values has you valuing turning metal into paperclips and I value metal being anything-but-paperclips, I/FDT would pay you to avoid tu... (read more)

2lsusr
My deontological terminal value isn't to causally win. It's for FDT agents to acausally lose. Either I win, or the FDT agents abandon FDT. (Which proves that FDT is an exploitable decision theory.) There's a Daoist answer: Don't legibly and universally precommit to a decision theory. But the exploit I'm trying to point to is simpler than Daoist decision theory. Here it is: Functional decision theory conflates two decisions: 1. Use FDT. 2. Determine a strategy via FDT. I'm blackmailing contingent on decision 1 and not on decision 2. I'm not doing this because I need to win. I'm doing it because I can. Because it puts FDT agents in a hilarious lose-lose situation. The thing FDT disciples don't understand is that I'm happy to take the scenario where FDT agents don't cave to blackmail. Because of this, FDT demands that FDT agents cave to my blackmail.

Utility functions are shift/scale invariant.

If you have U(A) = 1 and U(B) = 2, then if we shift by some constant c to get a new utility function U'(A) = 1 + c and U'(B) = 2 + c, we can still get the same result.
If we look at the expected utility, then we get:
Certainty of A: EU = 1 + c

50% chance of B, 50% chance of nothing: EU = 0.5 × (2 + c) + 0.5 × (0 + c) = 1 + c

  • (so you are indifferent between certainty of A and a 50% chance of B either way)

I think this might be where you got confused? Now the expected values are differe... (read more)
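The shift-invariance point can be checked numerically. This is a sketch with assumed illustrative utilities (U(A) = 1, U(B) = 2, U(nothing) = 0); the key detail is that the shift constant applies to every outcome, including "nothing":

```python
def expected_utility(lottery, utility):
    """lottery: list of (probability, outcome) pairs."""
    return sum(p * utility[outcome] for p, outcome in lottery)

# Assumed utilities for illustration.
U = {"A": 1.0, "B": 2.0, "nothing": 0.0}
c = 5.0  # arbitrary shift constant
U_shifted = {outcome: u + c for outcome, u in U.items()}

certain_A = [(1.0, "A")]
half_B = [(0.5, "B"), (0.5, "nothing")]

# Before the shift both lotteries have EU = 1; after it both have EU = 1 + c.
# The comparison (indifference) is preserved by the shift.
for u in (U, U_shifted):
    assert expected_utility(certain_A, u) == expected_utility(half_B, u)
```

If you forget to shift the "nothing" outcome, the two expected utilities come apart, which is the usual source of confusion here.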

4MSRayne
Holy heck I have been enlightened. And by contemplating nothingness too! Thanks for the clarification, it all makes sense now.

Just 3 with a dash of 1?
I don't understand the specific appeal of complete reproductive freedom. It is desirable to have that freedom, in the same way it is desirable to be allowed to do whatever I feel like doing. However, that more general heading of arbitrary freedom has the answer of 'you do have to draw lines somewhere'. In a good future, I'm not allowed to harm a person (nonconsensually), and I can't requisition all matter in the available universe for my personal projects without ~enough of the population endorsing it, and I can't reproduce / const... (read more)


I primarily mentioned it because I think people base their 'what is the S-risk outcome' picture on basically antialigned AGI. The post has 'AI hell' in the title, compares extreme suffering against extreme bliss, and calls s-risks more important than alignment (which I think makes sense to a reasonable degree if antialigned s-risk is likely or if a sizable portion of weaker dystopias are likely, but which I don't think makes sense given that I consider antialigned outcomes very unlikely and weak dystopias also overall not likely). The extrema argument is why... (read more)

2MinusGix
I'm also not sure that I consider the astronomical suffering outcome (by how it's described in the paper) to be bad by itself. If you have (an absurd number of people) and they have some amount of suffering (ex: it shakes out that humans prefer some degree of negative-reinforcement as possible outcomes, so it remains) then that can be more suffering in terms of magnitude, but has the benefits of being more diffuse (people aren't broken by a short-term large amount of suffering) and with less individual extremes of suffering.

Obviously it would be bad to have a world that has astronomical suffering that is then concentrated on a large number of people, but that's why I think a naive application of 'astronomical suffering' is incorrect: it ignores diffuse experiences, relative experiences (like, if we have 50% of people with notably bad suffering today, then your large future civilization with only 0.01% of people with notably bad suffering can still swamp that number, though the article mentions this I believe), and more minor suffering adding up over long periods of time.

(I think some of this comes from talking about things in terms of suffering versus happiness rather than negative utility versus positive utility? Where zero is defined as 'universe filled with things we don't care about'. Like, you can have astronomical suffering that isn't much negative utility because it is diffuse / lower in a relative sense / less extreme, but 'everyone is having a terrible time in this dystopia' has astronomical suffering and high negative utility.)
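The "swamping" comparison can be made concrete with illustrative numbers (the comment gives none, so every figure below is hypothetical):

```python
# Hypothetical populations: today vs. a large future civilization.
present_pop = 8e9             # roughly today's population
present_bad_fraction = 0.5    # "50% of people with notably bad suffering today"
future_pop = 1e20             # an astronomically large future civilization
future_bad_fraction = 1e-4    # "only 0.01% ... with notably bad suffering"

present_sufferers = present_pop * present_bad_fraction   # 4e9
future_sufferers = future_pop * future_bad_fraction      # 1e16

# Astronomical in absolute magnitude...
assert future_sufferers > present_sufferers
# ...while being a far smaller share of everyone alive (the diffuse/relative point).
assert future_bad_fraction < present_bad_fraction
```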

If I imagine trading extreme suffering for extreme bliss personally, I end up with ratios of 1 to 300 million – e.g., that I would accept a second of extreme suffering for ten years of extreme bliss. The ratio is highly unstable as I vary the scenarios, but the point is that I disvalue suffering many orders of magnitude more than I value bliss.
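A quick sanity check on the quoted ratio, using only the numbers in the quote itself:

```python
# One second of extreme suffering traded against ten years of extreme bliss.
seconds_per_year = 365.25 * 24 * 60 * 60      # Julian year: 31,557,600 s
bliss_seconds = 10 * seconds_per_year         # ten years, in seconds
ratio = bliss_seconds / 1                     # per single second of suffering

print(round(ratio))  # 315576000, about 3.2e8: roughly "1 to 300 million"
```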

I also disvalue suffering significantly more than I value happiness (I think bliss is the wrong term to use here), but not to that level. My gut feeling wants to dispute those numbers as being practical, but I'l... (read more)

Answer by MinusGix84

In the language of Superintelligent AI is necessary for an amazing future but far from sufficient, I expect that the majority of possible s-risks are weak dystopias rather than strong dystopias. We're unlikely to succeed at alignment enough and then signflip it (like, I expect strong dystopia to be dominated by 'we succeed at alignment to an extreme degree' ^ 'our architecture is not resistant to signflips' ^ 'somehow the sign flips'). So, I think literal worst-case Hell and the immediate surrounding possibilities are negligible.
I expect that the extrema ... (read more)

1Dawn Drescher
Thanks for linking that interesting post! (Haven’t finished it yet though.) Your claim is a weak one though, right? Only that you don’t expect the entire lightcone of the future to be filled with worst-case hell, or less than 95% of it? There are a bunch of different definitions of s-risk, but what I’m worried about definitely starts at a much smaller-scale level. Going by the definitions in that paper (p. 3 or 391), maybe the “astronomical suffering outcome” or the “net suffering outcome.”
1MinusGix
I also disvalue suffering significantly more than I value happiness (I think bliss is the wrong term to use here), but not to that level. My gut feeling wants to dispute those numbers as being practical, but I'll just take them as gesturing at the comparative feeling.

An idea that I've seen once, but not sure where, is: you can probably improve the amount of happiness you experience in a utopia by a large amount. Not through wireheading, which at least for me is undesirable, but 'simply' redesigning the human mind in a less hedonic-treadmill manner (while also not just cutting out boredom). I think the usual way of visualizing extreme dystopias as possible-futures has the issue that it is easy to compare them to the current state of humanity rather than an actual strong utopia. I expect that there's a good amount of mind redesign work, in the vein of some of the mind-design posts in Fun Theory but ramped up to superintelligence design+consideration capabilities, that would vastly increase the amount of possible happiness/Fun and make the tradeoff more balanced.

I find it plausible that suffering is just easier to cause and more impactful even relative to strong-utopia-level enhanced-minds, but I believe this does change the calculus significantly. I might not take a 50/50 coin for strong dystopia/strong utopia, but I'd maybe take a 10/90 coin. Thankfully we aren't in that scenario, and have better odds.

https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1#How_likely_are_extremely_good_and_extremely_bad_outcomes_

That said, I do think there’s more overlap (in expectation) between minds produced by processes similar to biological evolution, than between evolved minds and (unaligned) ML-style minds. I expect more aliens to care about at least some things that we vaguely recognize, even if the correspondence is never exact.
On my models, it’s entirely possible that there just turns out to be ~no overl... (read more)

The Principles of Deep Learning Theory uses renormalization group flow in its analysis of deep learning, though it is applied at a 'lower level' than an AI's capabilities.

One minor thing I've noticed when thinking on interpretability is that of in-distribution versus out-of-distribution versus - what I call - out-of-representation data. I would assume this has been observed elsewhere, but I haven't seen it mentioned before.
In-distribution could be considered inputs with the same 'structure' as what you trained the neural network on; out-of-distribution is exotic inputs, like an adversarially noisy image of a panda, or a picture of a building fed to an animal-recognizer NN.
Out-of-representation would be when you have a neural ... (read more)
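One way to make the three categories concrete. Since the comment is cut off, the reading of "out-of-representation" below (inputs the encoding itself was never meant to carry) is my guess, and the model, names, and thresholds are all invented toys rather than a real network:

```python
import math

# Toy stand-in for an image model whose inputs are 8x8 grayscale images
# with pixel values encoded in [0, 1]. All names/thresholds are illustrative.

def categorize(image):
    """Rough input taxonomy: representation check, then distribution check."""
    pixels = [p for row in image for p in row]
    # Out-of-representation: values the input encoding was never meant to carry.
    if any(not math.isfinite(p) or p < 0.0 or p > 1.0 for p in pixels):
        return "out-of-representation"
    # Out-of-distribution (toy proxy): high pixel variance, unlike training data.
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    if variance > 0.16:
        return "out-of-distribution"
    return "in-distribution"

typical = [[0.5] * 8 for _ in range(8)]            # bland, training-like image
checkerboard = [[float((r + c) % 2) for c in range(8)] for r in range(8)]  # "exotic"
impossible = [[-3.0] * 8 for _ in range(8)]        # pixels outside [0, 1]

print(categorize(typical))       # in-distribution
print(categorize(checkerboard))  # out-of-distribution
print(categorize(impossible))    # out-of-representation
```

The point of the third category is that such inputs are not merely rare under the training distribution; they are impossible under the input encoding, so the network's behavior on them is unconstrained in a different way.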

MinusGix*220

I initially wrote a long comment discussing the post, but I rewrote it as a list-based version that tries to more efficiently parcel up the different objections/agreements/cruxes.
This list ended up basically just as long, but I feel it is better structured than my original intended comment.

(Section 1): How fast can humans develop novel technologies

  • I believe you assume too much about the necessary time based on specific human discoveries.
    • Some of your backing evidence just didn't have the right pressure at the time to go further (ex: submarines) which m
... (read more)

While human moral values are subjective, there is a sufficiently large shared core that you can target when aligning an AI. As well, values held by a majority (ex: caring for other humans, enjoying certain fun things) are essentially shared, and values held by smaller groups can also be catered to.

If humans were sampled from the entire space of possible values, then yes, we (maybe) couldn't build an AI aligned to humanity; but we only take up a relatively small region of that space and have a lot of shared values.

The AI problem is easier in some ways (and significantly harder in others) because we're not taking an existing system and trying to align it. We want to design the system (and/or systems that produce that system, aka optimization) to be aligned in the first place. This can be done through formal work to provide guarantees, lots of code, and lots of testing.

However, doing that for some arbitrary agent or even just a human isn't really a focus of most alignment research. A human has the issue that they're already misaligned (in a sense), and there are many ... (read more)

3nim
Thank you for clarifying! This highlights an assumption about AI so fundamental that I wasn't previously fully aware that I had it.

As you say, there's a big difference between what to do if we discover AI, vs if we create it. While I think that we as a species are likely to create something that meets our definition of strong AI sooner or later, I consider it vanishingly unlikely that any specific individual or group who goes out trying to create it will actually succeed. So for most of us, especially myself, I figure that on an individual level it'll be much more like discovering an AI that somebody else created (possibly by accident) than actually creating the thing.

It's intuitively obvious why alignment work on creating AI doesn't apply to extant systems. But if the best that the people who care most about it can do is work on created AI without yet applying any breakthroughs to the prospect of a discovered AI (where we can't count on knowing how it works, ethically create and then destroy a bunch of instances of it, etc)... I think I am beginning to see where we get the meme of how one begins to think hard about these topics and shortly afterward spends a while being extremely frightened.