


I really like this post for being an appropriately nuanced look at both how to strive for honesty and also reasons not to.

I've seen some takes on the topic that seem to arise either from a deep discomfort with ever having to deal with anything at all misleading, or from a straight-out desire to minimize social effort and use honesty as an excuse to blurt out whatever is on your mind. This post is very much not that, and rather goes into detail on how being deeply honest actually requires putting in more work to do it well.

No one (to my knowledge?) highlighted that the future might well go as follows:
“There’ll be gradual progress on increasingly helpful AI tools. Companies will roll these out for profit and connect them to the internet. There’ll be discussions about how these systems will eventually become dangerous, and safety-concerned groups might even set up testing protocols (“safety evals”). Still, it’ll be challenging to build regulatory or political mechanisms around these safety protocols so that, when they sound the alarm at a specific lab that the systems are becoming seriously dangerous, this will successfully trigger a slowdown and change the model release culture from ‘release by default’ to one where new models are air-gapped and where

Hmm, I feel like I always had something like this as one of my default scenarios. Though it would of course have been missing some key details such as the bit about model release culture, since that requires the concept of widely applicable pre-trained models that are released the way they are today. 

E.g. Sotala & Yampolskiy 2015 and Sotala 2018 both discussed there being financial incentives to deploy increasingly sophisticated narrow-AI systems until they finally crossed the point of becoming AGI.

S&Y 2015:

Ever since the Industrial Revolution, society has become increasingly automated. Brynjolfsson [60] argue that the current high unemployment rate in the United States is partially due to rapid advances in information technology, which has made it possible to replace human workers with computers faster than human workers can be trained in jobs that computers cannot yet perform. Vending machines are replacing shop attendants, automated discovery programs which locate relevant legal documents are replacing lawyers and legal aides, and automated virtual assistants are replacing customer service representatives.

Labor is becoming automated for reasons of cost, efficiency and quality. Once a machine becomes capable of performing a task as well as (or almost as well as) a human, the cost of purchasing and maintaining it may be less than the cost of having a salaried human perform the same task. In many cases, machines are also capable of doing the same job faster, for longer periods and with fewer errors. In addition to replacing workers entirely, machines may also take over aspects of jobs that were once the sole domain of highly trained professionals, making the job easier to perform by less-skilled employees [298].

If workers can be affordably replaced by developing more sophisticated AI, there is a strong economic incentive to do so. This is already happening with narrow AI, which often requires major modifications or even a complete redesign in order to be adapted for new tasks. ‘A roadmap for US robotics’ [154] calls for major investments into automation, citing the potential for considerable improvements in the fields of manufacturing, logistics, health care and services.

Similarly, the US Air Force Chief Scientistʼs [78] ‘Technology horizons’ report mentions ‘increased use of autonomy and autonomous systems’ as a key area of research to focus on in the next decade, and also notes that reducing the need for manpower provides the greatest potential for cutting costs. In 2000, the US Congress instructed the armed forces to have one third of their deep strike force aircraft be unmanned by 2010, and one third of their ground combat vehicles be unmanned by 2015 [4].

To the extent that an AGI could learn to do many kinds of tasks—or even any kind of task—without needing an extensive re-engineering effort, the AGI could make the replacement of humans by machines much cheaper and more profitable. As more tasks become automated, the bottlenecks for further automation will require adaptability and flexibility that narrow-AI systems are incapable of. These will then make up an increasing portion of the economy, further strengthening the incentive to develop AGI. Increasingly sophisticated AI may eventually lead to AGI, possibly within the next several decades [39, 200].

Eventually it will make economic sense to automate all or nearly all jobs [130, 136, 289].

And with regard to the difficulty of regulating them, S&Y 2015 mentioned that:

... there is no clear way to define what counts as dangerous AGI. Goertzel [115] point out that there is no clear division between narrow AI and AGI and attempts to establish such criteria have failed. They argue that since AGI has a nebulous definition, obvious wide-ranging economic benefits and potentially significant penetration into multiple industry sectors, it is unlikely to be regulated due to speculative long-term risks.

and in the context of discussing AI boxing and oracles, argued that both AI boxing and Oracle AI are likely to be of limited (though possibly still some) value, since there's an incentive to just keep deploying all AI in the real world as soon as it's developed:

Oracles are likely to be released. As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it to an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous.

Current narrow-AI technology includes HFT algorithms, which make trading decisions within fractions of a second, far too fast to keep humans in the loop. HFT seeks to make a very short-term profit, but even traders looking for a longer-term investment benefit from being faster than their competitors. Market prices are also very effective at incorporating various sources of knowledge [135]. As a consequence, a trading algorithmʼs performance might be improved both by making it faster and by making it more capable of integrating various sources of knowledge. Most advances toward general AGI will likely be quickly taken advantage of in the financial markets, with little opportunity for a human to vet all the decisions. Oracle AIs are unlikely to remain as pure oracles for long.

Similarly, Wallach [283] discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are ‘on the loop’ rather than ‘in the loop’. In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robotʼs actions and interfere if something goes wrong.

Human Rights Watch [90] reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computerʼs plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.

In general, any broad domain involving high stakes, adversarial decision making and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection and warfare could plausibly make use of all the intelligence they can get. If oneʼs opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race where one might have little choice but to give increasing amounts of control to AI/AGI systems.

I also have a distinct memory of writing comments saying something like "why does anyone bother with 'the AI could escape the box' type arguments, when the fact that financial incentives would make the release of those AIs inevitable anyway makes the whole argument irrelevant", but I don't remember whether it was on LW, FB or Twitter, and none of those platforms has a good way of searching my old comments. But at least Sotala 2018 had an explicit graph showing the whole AI boxing thing as just one way by which the AI could escape, one that was irrelevant if the AI was released otherwise.

Nice post! I like the ladder metaphor.

For events, one saving grace is that many people actively dislike events getting too large and having too many people, and start to long for the smaller, cozier version at that point. So instead of the bigger event competing with the smaller one and drawing people away from it, it might actually work the other way around, with the smaller event being the one that "steals" people from the bigger one.

Previous LW discussion about the Inner Ring: [1, 2].

Good question! I would find it plausible that it would have changed, except maybe if the people you'd call would be in their fifties or older.

Based on the link, it seems you follow the Theravada tradition. 

For what it's worth, I don't really follow any one tradition, though Culadasa does indeed have a Theravada background.

Yeah, some Buddhist traditions do make those claims. The teachers and practitioners who I'm the most familiar with and trust the most tend to reject those models, sometimes quite strongly (e.g. Daniel Ingram here). Also near the end of his life, Culadasa came to think that even though it might at one point have seemed like he had predominantly positive emotions in the way that some schools suggested, in reality he had just been repressing them with harmful consequences.

Culadasa: As a result of my practice, I had reached a point where emotions would arise but they really had no power over me, but I could choose to allow those emotions to express themselves if they served a purpose. Well, it’s sort of a downweighting of emotions – negative emotions were strongly downweighted, and positive emotions were not downweighted at all. So this was the place I was coming from as a meditation teacher. I just never really experienced anger; when something would cause some anger to arise, I’d notice it and let go of it, and, you know, it wasn’t there. Negative emotions in general were just not part of my life anymore. So it was a process of getting in touch with a lot of these emotions that, you know, I hadn’t been making space for because I saw them as unhealthy, unhelpful, so on and so forth.

Michael: So, in essence, you had bypassed them.

Culadasa: Yes, it’s a bypassing. I think it’s a very common bypassing, too, when somebody reaches this particular stage on the path. I mean, this is a bit of a digression, but I think it maybe helps to put the whole thing into perspective, the rest of our conversation into perspective…

Michael: Please digress.

Culadasa: Okay. So this is a stage at which the sense of being a separate self completely disappears. I mean, prior to that, at stream entry, you know, there’s no more attachment to the ego, the ego becomes transparent, but you still have this feeling that I’m a separate self; it still produces craving; you have to work through that in the next path, and so on and so forth. But this is a stage where that very primitive, that very primal sense of being a separate self falls away. Now, what I know about this from a neuroscience point of view is that there’s a part of the brainstem which was the earliest concentration of neurons that was brain-like in the evolution of brains, and there are nuclei there that were responsible for maintaining homeostasis of the body, and they still do that today. One of their major purposes is to regulate homeostasis in the body, blood pressure, heart rate, oxygenation of the blood, you name it, just every aspect of internal bodily maintenance. With the subsequent development of the emotional brain, the structures that are referred to as the limbic system, evolution provided a way to guide animals’ behaviors on the basis of emotions and so these same nuclei then created ascending fibers into this limbic system, from the brainstem into these new neural structures that constituted the emotional brain.

Michael: So this very old structure that regulated the body linked up with the new emotional structures.

Culadasa: Right. It linked up with it, and the result was a sense of self. Okay? You can see the enormous value of this to an animal, to an organism. A sense of self. My goodness. So now these emotions can operate in a way that serves to improve the survival, reproduction, everything else of this self, right? Great evolutionary advance. So now we have organisms with a sense of self. Then the further evolution of cerebral cortex, all of these other higher structures, then that same sense of self became integrated into that as well. So there we have the typical human being with this very strong, very primal sense that “I am me. I am a separate self.” We can create all kinds of mental constructs around this, but even cats and dogs and deer and mice and lizards and things like that have this sense of self. We elaborate an ego on top of it. So there’s these two aspects to self in a human being. One is the ego self, the mental construct that’s been built around this more primal sense of self. So this is a stage at which that primal sense of self disappears and what usually seems to happen is, at the same time, there is a temporary disappearance of all emotions. I think that we’ll probably eventually find out that the neural mechanism by which we bring about this shift, that these two things are linked, because the sense of self is – its passageway to the higher brain centers, which constitute the field of conscious awareness that we live in and all of the unconscious drives that we’re responding to, the limbic system, the emotional brain, is the link.

Michael: Yes.

Culadasa: So something happens that interrupts that link. The emotions come back online, but they come back online in a different way from that point. So instead of being overcome by fear, anger, lust, joy, whatever, these things arise and they’re something that you can either let go of or not. [laughs] That’s the place where I was.

Michael: They seem very ephemeral…

Culadasa: Yes, right. They’re very ephemeral, and very easy to deal with, and there is a tendency for other people to see you as less emotional and truly you are because you’ve downregulated a lot of more negative emotions. But you’re by no means nonemotional; you’re still human, you still have the full gamut of human emotions available to you. But you do get out of the habit of giving much leeway to certain kinds of emotions. And the work that I was doing with Doug pushed me in the direction of, “Let’s go ahead and let’s experience some of those emotions. Let’s see what it feels like to experience the dukkha of wanting things to be different than the way they are.” So that’s what we did. And I started getting in touch with these emotions and their relationship to my current life situation where I wasn’t fulfilling my greatest aspirations because I was doing a lot of things that – stuff that had to be done, but that I had no interest in, but I had to do it and that’s what occupied my time.

I'm guessing that something similar is what's actually happening in a lot of the schools claiming complete elimination of all negative feelings. Insight practices can be used in ways that end up bypassing or suppressing a lot of one's emotions, but the negative feelings are still having effects on the person; they just go unnoticed.

If you think about it, you can't be sad and not mind it. You can't be angry but not mind it. 

This disagrees with my experience, and with the experience of several other people I know.


The biggest question on my mind right now is, what does your friend think of this post now that you've written it? 

Agree. This connects to why I think that the standard argument for evolutionary misalignment is wrong: it's meaningless to say that evolution has failed to align humans with inclusive fitness, because fitness is not any one constant thing. Rather, what evolution can do is to align humans with drives that in specific circumstances promote fitness. And if we look at how well the drives we've actually been given generalize, we find that they have largely continued to generalize quite well, implying that while there's likely to still be a left turn, it may very well be much milder than is commonly implied.
