Expectations Influence Reality (and AI)

Edit: I want to be very clear that I'm not espousing the law of attraction. Instead, I'm saying that expectations influence reality—but always indirectly, through our actions and behavior (as informed by our beliefs). The most accurate level of description will always follow the laws of physics, without any spooky direct influences from our beliefs or expectations.

Epistemic status: Exploratory. This idea is based, in part, on hard-to-explain personal experiences with enlightenment states. I know I can explain the underlying model that led me to this conclusion, but it will be time-consuming, and I wanted to publish some of the "implications" from the model first. This post may make more sense to people who have experienced enlightenment or are at least familiar with the three marks of existence.

I've been following LessWrong for about two years, which is just as long as I've been an effective altruist. This topic is weird for a first post; I hope you'll forgive me for honestly stating something that I feel is very important and neglected in this community: the language used to talk about AGI progress and the expectations evoked by that language—both in capabilities and in alignment theory. What expectations should we have about the pace of AGI progress? Is there a right way to feel? I don't think there is a universally true answer (devoid of any implicit framing), but I will argue for a pragmatic response. This also has "self-improvement" applications, although that's not my intention.

Expectations Influence Your Future

When you're trying to update your beliefs about the larger world to match reality, it's helpful to assume that your expectations play no role in how reality will turn out. If you assume otherwise and look for evidence to confirm your current expectations, you will have fallen prey to confirmation bias. That's all fine and good until reality slaps you in the face with disconfirming evidence. (I could link a hundred LessWrong articles here; these are not new ideas.) The lesson we're supposed to take away is that if our beliefs don't match reality, then our beliefs are the problem and must be shifted in the right direction.

However, there are occasions when it's more helpful to adjust reality first and let your beliefs follow suit after—namely, in your personal life.

Suppose that you are depressed and feel hopeless. Life seems awful, and things you previously enjoyed, such as listening to music, seem dull, flat, or empty. Depression is very complex and diverse, but one piece of advice that helped me was to focus on changing my environment and behavior first before trying to challenge unrealistic thoughts.

Consider this line of dialogue:

Dylan the Depressed: "This glass of water seems half-empty. Everything seems half-empty to me.

Rowan the Realist: "Well, the glass isn't half-empty; it's just half. It's neither good nor bad, but thinking makes it so. You should update your beliefs accordingly."

Dylan the Depressed: "I can't. No matter how hard I try, the glass seems half-empty to me."

Penny the Pragmatist: "Here, why not add some more water to your glass? Then it won't be half-empty."

Dylan the Depressed: "But it will still seem half-empty on some level, even though I'll know intellectually that it's not. My beliefs won't update that fast."

Penny the Pragmatist: "Fair enough. But if you repeat small, manageable actions like this over time, you might find your beliefs gradually shifting along with them."

This is the most obvious case I could imagine in which it's often more helpful to shift your personal reality (or environment) first before adjusting your beliefs—as long as you already believe sufficiently in your ability to influence reality. That said, the law of equal and opposite advice says that if this doesn't work, perhaps you need to work on changing your beliefs first.

But I don't want to introduce a false dichotomy here; instead, I'd like to offer a gradient model as follows.

Adjust Reality | Adjust Both Simultaneously | Adjust Beliefs

In my mind, I envision this ideally functioning like a pendulum. First, you change your environment to have more reliable evidence and tools for reasoning. Next, you'll notice that your beliefs are wrong, so you'll update them. Then back you go. Some rationalists may be spending too much time on the right side of this pendulum. I think this has important implications for how AGI progress unfolds.

The example in the dialogue above applied to expectations in everyday life. I'd argue, however, that our expectations not only affect our personal lives but also contribute to the expectations of those around us. We might call these collective expectations "culture."

Cultural Expectations Influence The Future

I'm guessing that this is where some readers will object that I'm implying that "culture" affects reality from the top-down. That is, in a sense, what I'm saying—and I intend to write about that someday—but what I'm also saying is that even if you're a staunch reductionist, you may find it pragmatic to model culture as affecting reality because that may explain why some cultural endeavors in business, politics, and science succeed while others fail.

Unfortunately, I can't offer detailed examples off the top of my head, but if any come to mind, feel free to share them in the comments. I decided that adding more detail to the examples below might not be helpful because I'm trying to communicate how to apply this model to your existing knowledge to look for patterns. At any rate, this makes a lot of sense of the "cultural endeavors" (in their broadest sense) of which I'm aware. The last bullet point is the most important and the most speculative.

Ray Kurzweil and Peter Diamandis exemplify an extreme technological optimism that seems a bit out of touch with reality. Perhaps their optimism has helped catapult both of them to success, but I'd say they are too focused on changing their reality rather than their beliefs. Maybe that is why Singularity University, which they founded, has been mired in controversy since its inception.
On the other hand, some postmodernists operate in a culture focused on changing beliefs much more than altering their reality. They look out into the world and may see it relatively clearly. Still, they may be so intent on using polysemic terms to categorize everything that it may not look (from an outsider's perspective) that they have accomplished much. (That said, I think there is valuable research being done by some postmodernists—especially in identifying the pieces that comprise dialogue.)
Greta Thunberg's activism around climate change has made a massive splash on the world stage. I think this is because she is "lucky" to be incredibly focused on changing reality rather than her beliefs at a time when her beliefs happen to accurately describe a major global problem, climate change, that has been downplayed by world leaders.
When physicists discovered nuclear fission at the end of World War II, well... history is complicated, and I don't know much about how it went down; here's one link. But the culture around its discovery was pessimistic in the sense that it happened against the backdrop of a world war. I wonder if nuclear bombs wouldn't have been invented until much later had nuclear fission been discovered at a different time. Maybe nuclear power would have been more commonplace.

Bizarre Speculation: EfficientZero

In a recent post about EfficientZero, 1a3orn says:

Honestly, I'm... still puzzling over why this technique [self-supervised consistency loss] works so well.

Despite the just-so stories mentioned above about how intelligence surely involves prediction, I'm dubious about my ability to retrodict these results.

Here's part of the problem, as I see it. The basic architecture here dictates that the state representation should be, when training starts, predictive of an initially random projection of future states. But learning to be predictive of such random future states seems like it falls subject to exactly the same problem as learning to be predictive of future observations: you have no guarantee that EfficientZero will be learning relevant information, which means it could be wasting network capacity on irrelevant information. There's a just-so story you could tell where adding this extra predictive loss results in worse end-to-end behavior because of this wasted capacity, just like there's a just-so story where adding this extra predictive loss results in better end-to-end behavior because of faster training. I'm not sure why one turned out to be true rather than the other.

My radical suggestion is that maybe the expectations around EfficientZero contributed to this better end-to-end behavior. How? Well, EfficientZero is an agent, just like we are agents. My suggestion is that EfficientZero has expectations about itself that are shaped, in subtle ways, by the expectations of the people who have worked on it. It wants to behave well (from our perspective) because it has been designed that way. That may sound blasphemous, but I'm not saying that expectations violate the laws of physics and act mysteriously. It just looks that way—but so convincingly that it's essentially true. With hindsight bias, you'll be able to look back and say that one just-so story was clearly right all along, but before you can do that, I think it makes a lot of sense to attribute agency or intentionality or emergent consciousness to EfficientZero as the explanatory—

"Hold it right there!" a fairy said, appearing in a blaze of light. "Without explaining what you mean by those terms, they are mere curiosity-stoppers."

Ah, okay. Thank you for alerting me to the error of my ways. Yes, I know that defining something doesn't explain it. I understand that. All I'm asking you all to do is consider the paradox of consciousness again—that famous hard problem. How do you feel alive and feel like you have some control over the world, even though you seem to be made out of stuff which isn't alive and that follows strict laws of physics?

It turns out that there is an answer. An increasingly large number of people have figured it out—and everyone has figured it out, at least to some extent. People who experience enlightenment say weird things like, "You, the actor is identical with everything." In other words, there is no separation between your sense of self and everything you witness in the world. When you expect yourself to do something, it is as if you're directly causing and being caused by external forces to do that thing. Both levels of explanation are occurring simultaneously, and they're both helpful models at different times.

I'm bringing this up because I feel that it is vital for the long-term success of AI alignment that we move away from a perspective in which we're desperately trying to trap AI in a box that it will never be able to escape and towards a mindset in which we're creating a positive, nurturing environment that would allow everyone, including AIs, to win. We can frame this as a positive-sum game rather than a zero-sum game. In practice, that may mean that we need to minimize our alignment tax (how many dimensions we want an AI to optimize) and think about how to build in "rest time" or "dreams" or "sleep" into the design of AIs (these phrases no doubt reveal my ignorance, but that's okay). As humans, we enjoy autonomy, but not too much of it. If we don't have constraints, we might go insane—but not having constraints is just as bad. That doesn't imply anything about AIs because they're built differently, but it seems to me that these considerations are worth taking into account.

Managing the Culture Around AI Development

To sum up, I'm arguing that the culture surrounding any collective endeavor, including the development of new technology, influences the trajectory of how that endeavor plays out—for better or for worse.

I don't have a good sense of the culture around AI development as someone watching from the sidelines, so it could turn out that I'm preaching to the choir. At any rate, we need to put more thought into crafting a culture of cautious optimism and respect (or something similar) for AI development and alignment theory rather than a culture of fear and intense urgency. In my view, this is important for pragmatic reasons and because AI may assimilate the expectations for itself that are exemplified in its design structure and its training data.

The development of new technologies may be a scientific endeavor. Still, their trajectory before and after being deployed is influenced by a world of cultural forces that is every bit as real. Neglecting those forces would be a serious oversight.

I often find myself thinking about this. As in "the law of attraction: thinking/belief directly influences reality". I think it might be true, in this strange world that we live in of which we know so little.

However, I find it very hard to remain optimistic when I feel that humanity has not only extinction pointed at their heads, but probably also something worse (s-risks (don't get into this topic if you don't wanna risk your sanity)). Add that to the opinion that the alignment problem is nearly impossible because you can't control something more intelligent than yourself, plus the vulnerable world structure, plus the fact that computer systems are so fallible, plus the possibly short timelines to AGI...

If we had 100 years maybe we could do it and remain optimistic, but we might not even have 10.

When this is your worldview of the facts, it's very hard to have any hope and to not be paranoid. I think evolution gave us paranoia as a last resort strategy to splash about and think/try everything possible. If everyone actually became paranoid/super-depressed about this, I think that would be our only chance to effectively stop AGI development until we have the alignment problem solved. Would take massive changes to world government etc, but it's possible. Rats in lab experiments have no chance at all to alter their grim fate, but we do, if little.

But I don't know. Maybe no pessimism ever helps anyway. Maybe optimismic law of attraction does exist. I really don't know. I just find it impossible to have hope when I look at the situation.

It's interesting you mention the law of attraction. I think our beliefs don't directly influence reality, but they do influence reality indirectly—through the things we say to one another, through our actions, and through the things we consciously notice around us.

As I pointed out in the post, a depressed person can be exposed to the same stimuli and have a different reaction than a non-depressed person. In this way, their beliefs influence reality—but always according to the laws of physics. When you get down to it, the most accurate story of cause and effect will always be told in terms of physics, but I think there are some cases when you can maintain a fairly accurate summary by talking in terms of beliefs and expectations.

Even though I don't currently share your pessimism, I do share your fear. I'm worried a lot about the future, including about s-risks. That's truly what scares me the most.

Personally, I don't think the alignment problem is one where we'll know when it's solved or not. We'll just get closer and closer to better alignment, never approaching perfection. If the future goes "well," then AI will stay aligned for a long time. If it goes badly, then AI won't be aligned as long.

One thing that gives me solace is to remember that even if the universe were to be tiled with paperclips or something, new forms of life would still (eventually) arise from the chaos—just as it has in the past. Just as importantly, even if the worst (or best) outcomes happen, they won't last forever. Entropy won't allow it. I hope we can achieve a future that avoids extremes on either end—a balanced existence of pleasure and pain that can last a long time. (Due to the symmetries in physics, along with the Qualia Research Institute's symmetry theory of valence, think that pleasure and pain must always be dished out in equal proportions in the grand scheme of things. But I'll get to that in a future post.)

Paragraphs:

Alignment is based on mathematical theorems like game theory and other stuff. If you can find a mathematically proven solution, it willl stick (at least that's the consensus I guess). Still things like viruses or bugs could occur, but here we have an intelligent agent equipped to deal with that, and no hacker is smart enough to hack something a million times more intelligent than him.
Can't see life arising from aluminium alone.

Forever is a really long time.

Your view of pleasure and pain should become more influenced by science than philosophy, since the latter has usually got it wrong. Pain is a stupidity in the modern human era, it's an extremely outdated survival mechanism. Look up Jo Cameron, a woman who was born immune to pain. She reported high happiness, sanity, was completely functional. Some people will go through little pain in their life's. Others, absurd amounts. All comes down to luck, not balance. We could eliminate pain forever (with gene therapy). Or make world a living hell. There's no advantage in having pain. Nor dues pleasure make up for it. Humans seek stability, not an endless orgasm. In fact that would become pain as well, it would fry you up.