2b/2c. I think I would say that we should want a tyranny of the present to the extent that is in our values upon reflection. If, for example, Rome still existed and took over the world, their CEV should depend on their ethics and population. I think it would still be a very good utopia, but it may also have things we dislike.
Other considerations, like nearby Everett branches... well they don't exist in this branch? I would endorse game theoretical cooperation with them, but I'm skeptical of any more automatic cooperation than what we already have. That is...
I agree Grothendieck is fascinating, but I mostly just see him as interesting in different ways than von Neumann. von Neumann is often focused on because the areas he was skilled in are relevant to LessWrong's focuses, or (for the cloning posts) because his skills and polymath capabilities would help with alignment.
I define rationality as "more in line with your overall values". There are problems here, because people do profess social values that they don't really hold (in some sense), but roughly it is what they would reflect on and come up with.
Someone could value the short-term more than the long-term, but I think that most don't. I'm unsure if this is a side-effect of Christianity-influenced morality or just a strong tendency of human thought.
Locally optimal is probably the correct framing, but it is irrational relative to whatever idealized values the ind...
An important question here is "what is the point of being 'more real'?". Does having a higher measure give you a better acausal bargaining position? Do you terminally value more realness? Less vulnerable to catastrophes? Wanting to make sure your values are optimized harder?
I consider these, except for the terminal sense, to be rather weak as far as motivations go.
Acausal Bargaining: Imagine a bunch of nearby universes with instances of 'you'. They all have variations, some very similar, others with directions that seem a bit strange to the others. Still i...
Beliefs and predictions that influence wants may be false or miscalibrated, but the feeling itself, the want itself, just is what it is, the same way sensations of hunger or heat just are what they are.
I think this may be part of the disconnect between me and the article. I often view the short jolt preferences (that you get from seeing an ice-cream shop) as heuristics, as effectively predictions paired with some simpler preference for "sweet things that make me feel all homey and nice". These heuristics can be trained to know how to weigh the costs, th...
Finally, the speed at which you communicate vibing means you're communicating almost purely from System 1, expressing your actual felt beliefs. It makes deception both of yourself and others much harder. It's much more likely to reveal your true colors. This allows it to act as a values screening mechanism as well.
I'm personally skeptical of this. I've found I'm far more likely to lie than I'd endorse when vibing. Saying "sure I'd be happy to join you on X event" when it is clear with some thought that I'd end up disliking it. Or exaggerating stories bec...
I agree that it is easy to automatically lump the two concepts together.
I think another important part of this is that there are limited methods for most consumers to coordinate against companies to lower their prices. There's shopping elsewhere, leaving a bad review, or moral outrage. The last may have a chance of blowing up socially, such as becoming a boycott (but boycotts are often considered ineffective), or it may encourage the government to step in. In our current environment, the government often operates as the coordination method to punish compan...
It has also led to many shifts in power between groups based on how well they exploit reality. From hunter-gatherers to agriculture, to grand armies spreading an empire, to ideologies changing the fates of entire countries, and to economic & nuclear super-powers making complex treaties.
This reply is perhaps a bit too long, oops.
Having a body that does things is part of your values and is easily described in them. I don't see deontology or virtue ethics as giving any more fundamentally adequate solution to this (beyond the trivial 'define a deontological rule about ...', or 'it is virtuous to do interesting things yourself', but why not just do that with consequentialism?).
My attempt at interpreting what you mean is that you're drawing a distinction between morality about world-states vs. morality about process, internal details, exper...
I think there's two parts of the argument here:
The first I consider not a major problem. Mountain climbing is not what you can put into the slot to maximize, but you do put happiness/interest/variety/realness/etc. into that slot. This then falls back into questions of "what are our values". Consequentialism provides an easy answer here: mountain climbing is preferable along important axes to sitting inside today. This isn't always en...
If I value a thing at one period of life and turn away from it later, I have not discovered something about my values. My values have changed. In the case of the teenager we call this process “maturing”. Wine maturing in a barrel is not becoming what it always was, but simply becoming, according to how the winemaker conducts the process.
Your values change according to the process of reflection - the grapes mature into wine through fun chemical reactions.
From what you wrote, it feels like you are mostly considering your 'first-order values'. However, yo...
Thank you!
Is there a way to get an article's raw or original content?
My goal is mostly to put articles in some area (ex: singular learning theory) into a tool like Google's NotebookLM to then ask quick questions about.
Google's own conversion of HTML to text works fine for most content, excepting math. A division like $p(w \mid D_n) = \frac{p(D_n \mid w)\,\varphi(w)}{p(D_n)}$ may turn into "p ( w | D n ) = p ( D n | w ) φ ( w ) p ( D n )", dropping the division and becoming incorrect.
I can always just grab the article's HTML content (or use the GraphQL api for that), but HTMLified MathJax notation is very, uh, verbose. I could probably do some massaging o...
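For what it's worth, a minimal sketch of the kind of massaging I have in mind, assuming the page keeps the original TeX in MathJax v2-style `<script type="math/tex">` nodes next to the rendered spans (if the HTML is rendered some other way, e.g. MathJax 3 / CommonHTML without those nodes, this won't apply):

```python
# Sketch: swap MathJax v2 rendered math back to $-delimited TeX before
# converting the page to plain text. Assumes `pip install beautifulsoup4`.
from bs4 import BeautifulSoup

def replace_mathjax_with_tex(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # MathJax v2 keeps the source TeX in <script type="math/tex"> (and
    # "math/tex; mode=display") nodes next to the rendered preview spans.
    for script in soup.find_all("script"):
        stype = script.get("type", "")
        if stype.startswith("math/tex"):
            delim = "$$" if "mode=display" in stype else "$"
            script.replace_with(f"{delim}{script.string or ''}{delim}")
    # Drop the rendered MathJax spans so the math isn't duplicated as word soup.
    for span in soup.select(".MathJax, .MathJax_Preview, .MathJax_Display"):
        span.decompose()
    return soup.get_text()
```

After that, whatever HTML-to-text conversion you run (or NotebookLM itself) sees ordinary $-delimited TeX instead of the flattened fraction.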
I'd be interested in an article looking at whether the FDA is better at regulating food safety. I do expect food is an easier area, because erring on the side of caution doesn't really lose you much — most food products have close substitutes. If there's some low but not extremely low risk of a chemical in a food being bad for you, then the FDA can more easily deny approval without significant consequences; medicine has more outsized effects if you are slow to approve usage.
Yet, perhaps this has led to reduced variety in food choices? I notice less generic...
I see this as occurring with various pieces of Infrabayesianism, like Diffractor's UDT posts. They're dense enough mathematically (hitting the target) that they're challenging to read... and then also challenging to discuss. There are fewer comments even from the people who read the entire post, because they don't feel competent enough to make useful commentary (with some truth behind that feeling), and the silence further makes commenting harder. At least that's what I've noticed in myself, even though I enjoy & upvote those posts.
Less attentio...
I draw the opposite conclusion from this: the fact that the decision theory posts seem to work on the basis of a computationalist theory of identity makes me think worse of the decision-theory posts.
Why? If I try to guess, I'd point at not often considering indexicality, merely thinking of it as having a single utility function, which simplifies coordination. (But still, a lot of decision theory doesn't need to take indexicality into account...)
I see the decision theory posts less as giving new intuitions and more as breaking old ones...
the lack of argumentation or discussion of this particular assumption throughout the history of the site means it's highly questionable to say that assuming it is "reasonable enough"
While discussion on personal identity has mostly not received a single overarching post focusing solely on arguing all the details, it has been discussed, to varying degrees, at the possible points of contention. Thou Art Physics, which focuses on getting the idea that you are made up of physics into your head; Identity Isn't in Specific Atoms, which tries to dissolve the common intuit...
it fits with that definition
Ah, I rewrote my comment a few times and lost what I was referencing. I originally was referencing the geometric meaning (as an alternate to your statistical definition), two vectors at a right angle from each other.
But the statistical understanding works from what I can tell? You have your initial space with extreme uncertainty, and the orthogonality thesis simply states that (intelligence, goals) are not related — you can pair some intelligence with any goal. They are independent of each other at this most basic level. This...
I'm skeptical of the naming being bad; it fits with that definition and the common understanding of the word. The Orthogonality Thesis is saying that the two qualities, intelligence and goals/values, are not necessarily related, which may seem trivial nowadays, but there used to be plenty of people going "if the AI becomes smart, even if it is weird, it will be moral towards humans!" through reasoning of the form "smart -> not dumb goals like paperclips". There's still structure imposed on what minds actually get created, based on what architectures exist, what humans train the AI on, etc., just as two vectors can be orthogonal in R^2 while the actual points you plot in the space are correlated.
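As a minimal sketch of that last point (made-up numbers, nothing more): the two axes of the space are orthogonal directions, yet a particular mind-generating process can still produce points whose coordinates are correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# The two axes of the space: orthogonal basis directions.
intelligence_axis = np.array([1.0, 0.0])
niceness_axis = np.array([0.0, 1.0])
print(np.dot(intelligence_axis, niceness_axis))  # 0.0 -> the directions are orthogonal

# A toy "mind-generating process" that happens to couple the two coordinates
# (a stand-in for architectures, training data, and so on sharing structure).
latent = rng.normal(size=10_000)
intelligence = latent + 0.3 * rng.normal(size=10_000)
niceness = 0.8 * latent + 0.3 * rng.normal(size=10_000)

# The sampled points are strongly correlated even though the axes are orthogonal.
print(np.corrcoef(intelligence, niceness)[0, 1])  # roughly 0.9
```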
I agree, though I haven't seen many proposing that. But also see So8res' Decision theory does not imply that we get to have nice things, though this is coming from the opposite direction (with the start being about people invalidly assuming too much out of LDT cooperation).
Though for our morals, I do think there's an active question of which pieces we feel better replacing with the more formal understanding, because there isn't a sharp distinction between our utility function and our decision theory. Some values trump others when given better tools. Though ...
Suffering is already on most reader's minds, as it is the central advocating reason behind euthanasia — and for good reason. I agree that policies which cause or ignore suffering, when they could very well avoid such with more work, are unfortunately common. However, those are often not utilitarian policies; and similarly many objections to various implementations of utilitarianism and even classic "do what seems the obviously right action" are that they ignore significant second-order effects. Policies that don't quantify what unfortunate incentives they ...
This doesn't engage with the significant downsides of such a policy that Zvi mentions. There are definite questions about the costs/benefits of allowing euthanasia, even though we wish to allow it, especially when we as a society are young in our ability to handle it. Glossing the only significant feature as 'torturing people' ignores:
Any opinions on how it compares to Fun Theory? (Though that's less about all of utopia, it is still a significant part)
I think that is part of it, but a lot of the problem is just humans being bad at coordination. Like the government doing regulations. If we had an idealized free market society, then the way to get your views across would 'just' be to sign up for a filter (etc.) that down-weights buying from said company based on your views. Then they have more of an incentive to alter their behavior. But it is hard to manage that. There's a lot of friction to doing anything like that, much of it natural. Thus government serves as our essential way to coordinate on importa...
I definitely agree that it doesn't give reason to support a human-like algorithm, I was focusing in on the part about adding numbers reliably.
I believe a significant chunk of the issue with numbers is that the tokenization is bad (not per-digit), which is the same underlying cause for being bad at spelling. So then the model has to memorize from limited examples what actual digits make up the number. The xVal paper encodes the numbers as literal numbers, which helps. Also see Teaching Arithmetic to Small Transformers, which I remember only somewhat, but one of the things they do is per-digit tokenization and reversing the order (because that works better with forward generation). (I don't know if anyone has...
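As a rough sketch of the kind of data formatting I mean (my own illustration, not the paper's exact setup): give each digit its own token by space-separating, and reverse the answer's digits so generation proceeds lowest digit first, matching how carrying works.

```python
def format_addition_example(a: int, b: int) -> str:
    """Format an addition problem with per-digit 'tokens' and a reversed answer."""
    digits = lambda n: " ".join(str(n))                      # 123 -> "1 2 3"
    reversed_digits = lambda n: " ".join(reversed(str(n)))   # 579 -> "9 7 5"
    return f"{digits(a)} + {digits(b)} = {reversed_digits(a + b)}"

print(format_addition_example(123, 456))  # -> "1 2 3 + 4 5 6 = 9 7 5"
```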
Yes, in principle you can get information on scheming likelihood if you get such an AI (that is also weak enough that it can't just scheme its way out of your testing apparatus). I do think making the threat credible is hard if we loosely extrapolate costs out: burning a trained up model is not cheap. The cost depends on how high you think prices for training/inference will fall in the future, and how big/advanced a model you're thinking of. Though I do think you can get deceptiveness out of weaker models than that, though they're also going to be less cap...
https://www.mikescher.com/blog/29/Project_Lawful_ebook is I believe the current best one, after a quick search on the Eliezerfic discord.
Minor: the link for Zvi's immoral mazes has an extra 'm' at the start of the part of the path ('zvi/mimmoral_mazes/')
Because it serves as a good example, simply put. It clearly gets across the idea of what it means, even if there are certainly complexities in comparing evolution to the output of an SGD-trained neural network.
It predicts learning correlates of the reward signal that break apart outside of the typical environment.
When you look at the actual process for how we actually start to like ice-cream -- namely, we eat it, and then we get a reward, and that's why we like it -- then the world looks a lot less hostile, and misalignment a lot less likely.
Yes, t...
Is this a prediction that a cyclic learning rate -- that goes up and down -- will work out better than a decreasing one? If so, that seems false, as far as I know.
https://www.youtube.com/watch?v=GM6XPEQbkS4 (talk) / https://arxiv.org/abs/2307.06324 prove faster convergence with a periodic learning rate, though on a specific 'nicer' space than reality, and (I believe, from what I remember) they're comparing against a good bound for a constant stepsize of 1. So it may be one of those papers that applies in theory but not often in practice, but I think it is somewhat indicative.
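For concreteness, a minimal sketch (my own illustration, not the schedule from that paper) of a triangular cyclic learning rate that ramps up and back down each cycle:

```python
def cyclic_lr(step: int, base_lr: float = 0.001, max_lr: float = 0.01,
              half_cycle: int = 100) -> float:
    """Triangular cyclic schedule: linearly ramp base_lr -> max_lr -> base_lr."""
    pos = step % (2 * half_cycle)   # position within the current cycle
    frac = pos / half_cycle         # 0..2 across the cycle
    if frac > 1.0:
        frac = 2.0 - frac           # descending half of the triangle
    return base_lr + frac * (max_lr - base_lr)

# The learning rate goes up and down instead of monotonically decreasing.
print([round(cyclic_lr(s), 4) for s in (0, 50, 100, 150, 200)])
# -> [0.001, 0.0055, 0.01, 0.0055, 0.001]
```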
I agree with others to a large degree about the framing/tone/specific words not being great, though I agree with a lot of the post itself. But really, that's what this whole post is about: that dressing up your words and stating partial, in-the-middle positions can harm the environment of discussion, and that saying what you truly believe then lets you argue down from that, rather than doing the arguing down against yourself - and implicitly against all the other people who hold a similar ideal belief as you. I've noticed similar facets of what the post gestures at,...
Along with what Raemon said (though I expect us to probably grow far beyond any Earth species eventually): if we're characterizing evolution as having a reasonable utility function, then I think there's the issue of other possibilities that would be more preferable.
Like, evolution would-if-it-could choose humans to be far more focused on reproducing, and we would expect that, if we didn't put in counter-effort, our partially-learned approximations ('sex enjoyable', 'having family is good', etc.) would get increasingly tuned for the common environments.
Si...
I assume what you're going for with your conflation of the two decisions is this, though you aren't entirely clear on what you mean:
If your original agent is replacing themselves as a threat to FDT, because they want FDT to pay up, then FDT rightly ignores it. Thus the original agent, which just wants paperclips or whatever, has no reason to threaten FDT.
If we postulate a different scenario where your original agent literally terminally values messing over FDT, then FDT would pay up (if FDT actually believes it isn't a threat). Similarly, if part of your values has you valuing turning metal into paperclips and I value metal being anything-but-paperclips, I/FDT would pay you to avoid tu...
Utility functions are shift/scale invariant.
If you have $U(A)$ and $U(B)$, then if we shift by some constant $c$ to get a new utility function $U'(x) = U(x) + c$, we can still get the same result.
If we look at the expected utility, then we get:
Certainty of $A$: $\mathbb{E}[U] = U(A)$, which under $U'$ becomes $U(A) + c$.
50% chance of $B$, 50% chance of nothing: $\mathbb{E}[U] = 0.5\,U(B) + 0.5\,U(\text{nothing})$, which under $U'$ becomes $0.5\,(U(B)+c) + 0.5\,(U(\text{nothing})+c) = 0.5\,U(B) + 0.5\,U(\text{nothing}) + c$; both options gain the same $+c$, so the comparison is unchanged.
I think this might be where you got confused? Now the expected values are differe...
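A tiny sketch with made-up utility numbers: the raw expected values change under the shift, but which option wins does not.

```python
def expected_utility(lottery, utility):
    """lottery: list of (probability, outcome) pairs."""
    return sum(p * utility[outcome] for p, outcome in lottery)

utility = {"A": 1.0, "B": 3.0, "nothing": 0.0}
shifted = {k: v + 10.0 for k, v in utility.items()}  # U'(x) = U(x) + 10

certain_a = [(1.0, "A")]
gamble_b = [(0.5, "B"), (0.5, "nothing")]

for u in (utility, shifted):
    ea = expected_utility(certain_a, u)
    eb = expected_utility(gamble_b, u)
    # The numbers differ between the two utility functions,
    # but the preferred option (the gamble, here) stays the same.
    print(ea, eb, "prefer gamble" if eb > ea else "prefer certainty")
# -> 1.0 1.5 prefer gamble
# -> 11.0 11.5 prefer gamble
```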
Just 3 with a dash of 1?
I don't understand the specific appeal of complete reproductive freedom. It is desirable to have that freedom, in the same way it is desirable to be allowed to do whatever I feel like doing. However, that more general heading of arbitrary freedom has the answer of 'you do have to draw lines somewhere'. In a good future, I'm not allowed to harm a person (nonconsensually), and I can't requisition all matter in the available universe for my personal projects without ~enough of the population endorsing it, and I can't reproduce / const...
I'm also not sure that I consider the astronomical suffering outcome (by how it's described in the paper) to be bad by itself.
If you have (absurd amount of people) and they have some amount of suffering (ex: it shakes out that humans prefer some degree of negative-reinforcement as possible outcomes, so it remains), then that can be more suffering in terms of magnitude, but it has the benefits of being more diffuse (people aren't broken by a short-term large amount of suffering) and of having fewer individual extremes of suffering.
Obviously it would be bad to have a wor...
I primarily mentioned it because I think people base their 'what is the S-risk outcome' on basically antialigned AGI. The post has 'AI hell' in the title, uses comparisons between extreme suffering and extreme bliss, and calls s-risks more important than alignment (which I think makes sense to a reasonable degree if antialigned s-risk is likely or a sizable portion of weaker dystopias are likely, but which I don't think makes sense given that antialigned is very unlikely and that I consider weak dystopias to also be overall not likely). The extrema argument is why...
If I imagine trading extreme suffering for extreme bliss personally, I end up with ratios of 1 to 300 million – e.g., that I would accept a second of extreme suffering for ten years of extreme bliss. The ratio is highly unstable as I vary the scenarios, but the point is that I disvalue suffering many orders of magnitude more than I value bliss.
I also disvalue suffering significantly more than I value happiness (I think bliss is the wrong term to use here), but not to that level. My gut feeling wants to dispute those numbers as being practical, but I'l...
In the language of Superintelligent AI is necessary for an amazing future but far from sufficient, I expect that the majority of possible s-risks are weak dystopias rather than strong dystopias. We're unlikely to succeed at alignment enough and then signflip it (like, I expect strong dystopia to be dominated by 'we succeed at alignment to an extreme degree' ^ 'our architecture is not resistant to signflips' ^ 'somehow the sign flips'). So, I think literal worst-case Hell and the immediately surrounding possibilities are negligible.
I expect that the extrema ...
...That said, I do think there’s more overlap (in expectation) between minds produced by processes similar to biological evolution, than between evolved minds and (unaligned) ML-style minds. I expect more aliens to care about at least some things that we vaguely recognize, even if the correspondence is never exact.
On my models, it’s entirely possible that there just turns out to be ~no overl
The Principles of Deep Learning Theory uses renormalization group flow in its analysis of deep learning, though it is applied at a 'lower level' than an AI's capabilities.
One minor thing I've noticed when thinking on interpretability is the distinction between in-distribution, out-of-distribution, and what I call out-of-representation data. I would assume this has been observed elsewhere, but I haven't seen it mentioned before.
In-distribution could be considered inputs with the same 'structure' as what you trained the neural network on; out-of-distribution is exotic inputs, like an adversarially noisy image of a panda or a picture of a building for an animal-recognizer NN.
Out-of-representation would be when you have a neural ...
I initially wrote a long comment discussing the post, but I rewrote it as a list-based version that tries to more efficiently parcel up the different objections/agreements/cruxes.
This list ended up basically just as long, but I feel it is better structured than my original intended comment.
(Section 1): How fast can humans develop novel technologies
While human moral values are subjective, there is a sufficiently large shared core that you can target when aligning an AI. As well, values held by a majority (ex: caring for other humans, enjoying certain fun things) are essentially shared. Values that are held by smaller groups can also be catered to.
If humans were sampled from the entire space of possible values, then yes we (maybe) couldn't build an AI aligned to humanity, but we only take up a relatively small space and have a lot of shared values.
The AI problem is easier in some ways (and significantly harder in others) because we're not taking an existing system and trying to align it. We want to design the system (and/or systems that produce that system, aka optimization) to be aligned in the first place. This can be done through formal work to provide guarantees, lots of code, and lots of testing.
However, doing that for some arbitrary agent or even just a human isn't really a focus of most alignment research. A human has the issue that they're already misaligned (in a sense), and there are many ...
I'm confused, why does that make the term no longer useful? There's still a large distinction between companies focusing on developing AGI (OpenAI, Anthropic, etc.) vs those focusing on more 'mundane' advancements (Stability, Black Forest, the majority of ML research results). Though I do disagree that it was only used to distinguish them from narrow AI. Perhaps that was what it was originally, but it quickly turned into the roughly "general intelligence like a smart human" approximate meaning we have today.
I agree 'AGI' has become an increasingly vague t...