Disclaimer: This is valuable for understanding the AI industry and the landscape that it is a part of, and for surviving the decade. It is NOT an EA cause area, and should not distract from the overriding priority of AGI.

TL;DR

Part 1 (4 minutes): 
Modern social media platforms, and people with backdoor access (e.g. botnets), use AI not just to manipulate public opinion, but also to induce and measure a wide variety of behaviors, including altering emotions and values/drives. There are overriding incentives to wire both the platform and its users to induce akrasia and power-seeking behavior and to make complex thought feel repulsive, not just to keep people using the platform, but also to improve each user's predictability and data quality, as this keeps them and their environment more similar to most of the millions of people that the platform has already collected data on.

Part 2 (4 minutes): 
Social media users can also be deliberately used by powerful people to attack the AI safety community, offering a wide variety of creative attacks for attackers to select based on their preferences. As it currently stands, we would be caught with our pants down and they would get away with it, which incentivizes attacks.

Part 3 (6 minutes): 
Various commentary on the implications of these systems, and our failure to spot this until it was too late.

 

Part 1: Optimizing for mediocrity

  1. Social media probably maximizes the propensity to return to daily habits, akratic ones in particular, as well as akratic tendencies and mindsets in general. These correlate strongly with measurably increased social media use and reduced quit rates, as social media use is itself akratic.
    1. Altruistic, intelligent, and agentic tendencies probably induce causality cascades just on their own, as they introduce unusual behavior and tear down Schelling fences. Large sample sizes of human behavior data are powerful, but individual human minds are still complex (especially in elite groups), and data quality and predictiveness increase as more variables are controlled (a minimal sketch after this list illustrates the point), temporarily making the mind more similar to the millions of other minds that data has already been collected on. At least, that is, while the mind is in the social media news feed environment, which is the best opportunity for data collection because it gives every user a highly similar experience in a controlled environment.
    2. This also yanks people in the opposite direction of what CFAR was trying to do.
  2. Social media systems attempt to maximize the user's drive to prioritize social reality over objective reality, as social reality drives people to care more about what happens on social media, measurably increasing social media use and reducing quit rates, whereas a stronger focus on objective reality makes AI safety people more likely to optimize their lives, which measurably reduces social media use and increases quit rates.
    1. Maximally feeding the drive to accumulate and defend perceived social status, vastly in excess of the natural drive that the user would otherwise have experienced.
    2. There is also a mental health death spiral from being strongly pulled between objective reality and social reality, as in Valentine's Here's the exit post, or the original quokka thread on Twitter.
  3. There are also the long-term effects of instant gratification, a topic which I know is not new here, and which is groan-inducing due to the NYT op-ed clowns that keep vomiting all over it, but it's still worth noting that instant gratification makes people weak in the same way that dath ilan makes people strong.
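
As a toy illustration of the variable-control point above: the sketch below (Python, entirely synthetic data and made-up names, no claim about any real platform's pipeline) models observed behavior as a stable per-user trait plus fluctuating session context, and shows that squeezing the context toward a common baseline makes even a naive predictor measurably more accurate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users = 100_000

# Synthetic model: each observed behavior score mixes a stable per-user trait
# with a fluctuating per-session context (mood, environment, train of thought).
trait = rng.normal(0.0, 1.0, n_users)

def observe(context_sd):
    """One session's behavior score per user: stable trait + context noise."""
    return trait + rng.normal(0.0, context_sd, n_users)

def prediction_error(context_sd):
    """Predict each user's next session from their previous one; return RMS error."""
    previous, nxt = observe(context_sd), observe(context_sd)
    return float(np.sqrt(np.mean((previous - nxt) ** 2)))

# A "controlled" environment pushes every session toward the same baseline state,
# shrinking the context spread, so the same naive predictor does better.
print("uncontrolled context:", round(prediction_error(1.0), 3))
print("controlled context:  ", round(prediction_error(0.2), 3))
```

That is the claim in miniature: prediction gets easier not because the model gets smarter, but because the person's state is made more uniform.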

The risk here is not prolonged exposure to mediocrity; it is prolonged exposure to intense optimization power, with mediocrity being what is optimized for, since controlling for variables increases data quality and predictive power, which in turn is necessary to maximize use and minimize quit rates.

I think some of these are probably false positives, or were fixed by a random dev, or had a small effect size, but I also doubt all of them are. If there is 1) hill climbing based on massive sample sizes of human behavior data, and 2) continuous experimentation and optimization to measurably increase use time and reduce quit rates/risk, then EA and AI safety people have probably been twisted out of shape in some of the ways described here.

Any one of these optimizations would be quite capable of twisting much of the AI safety community out of shape, as tailored environments (derived from continuous automated experimentation on large-n systems) create life experiences optimized to fit the human mind like a glove fits a hand. At the scale at which this optimization takes place, comparing millions or even billions of minds to each other to find webs of correlations and predict future behavior such as belief formation, effects should be assumed to be intense by default.
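
To make the "comparing minds at scale" mechanism concrete, here is a minimal sketch, under the assumption that behavior logs can be flattened into numeric feature vectors (the pool, features, and outcome below are all invented for illustration), of predicting one user's behavior from the most similar users in a large logged pool:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pool, n_features = 200_000, 32   # hypothetical pool of logged users

# Each row stands in for a user's behavior log compressed into numbers
# (watch times, reaction patterns, scroll cadence, and so on; all invented here).
pool_features = rng.normal(size=(n_pool, n_features)).astype(np.float32)
# Whether each logged user later showed some target behavior (e.g. quitting).
pool_outcome = pool_features[:, 0] + 0.5 * pool_features[:, 1] > 0

def predict_outcome(user_vector, k=200):
    """Estimate a new user's outcome from the k most behaviorally similar logged users."""
    distances = np.linalg.norm(pool_features - user_vector, axis=1)
    nearest = np.argpartition(distances, k)[:k]
    return pool_outcome[nearest].mean()   # fraction of lookalikes who did it

new_user = rng.normal(size=n_features).astype(np.float32)
print(f"estimated probability of target behavior: {predict_outcome(new_user):.2f}")
```

Nothing about the individual needs to be deeply understood; the prediction falls out of having enough lookalikes on file.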

Although algorithmic progress (from various ML applications, not LLMs) theoretically gives tech companies more leeway to detect and reduce these effects wherever they appear, they have little incentive to do so. Market competition means that they might successfully coordinate to steer systems and people away from highly visible harms like sleep deprivation, but less visible harms (like elite thinking being rendered mediocre) are harder to coordinate the industry around, due to the need to interfere with the algorithms that maximize use, the difficulty of measurement, and the small sample sizes for strange effects on elite groups.

Furthermore, the system is too large and automated to manage. A user could receive 100 or more instances of manipulation per hour, with most instances calculated by automated systems on the same day they are deployed, and with each instance taking the form of a unique video or post (or combination of videos or posts) that deploys a concept or combination of concepts in strategic ways. Unlike with LLMs, these concepts and combinations are evaluated entirely by measurable effect, such as reducing quit rates or causing frequent negative reactions to a targeted concept, rather than by any understanding of the content of the message. Only the results are measured, and there is no limit to how complex or galaxy-brained the causal mechanism can be, affecting the deep structures of the brain from all sorts of angles, so long as it steers users' behavior in the intended direction, e.g. reducing quit rates.
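
A minimal sketch of this outcome-only optimization, under the assumption that it behaves like an epsilon-greedy bandit over interchangeable content variants (all names and "quit probabilities" below are invented): the selector never inspects what a variant says or why it works; it only tracks which variants keep sessions going.

```python
import random

random.seed(0)

# Hypothetical content variants; the optimizer never looks inside them.
variants = [f"variant_{i}" for i in range(20)]
true_quit_prob = {v: random.uniform(0.05, 0.40) for v in variants}  # unknown to the optimizer

shown = {v: 0 for v in variants}
stayed = {v: 0 for v in variants}

def retention(v):
    """Observed fraction of impressions after which the user kept scrolling."""
    return stayed[v] / shown[v] if shown[v] else 0.0

def pick_variant(epsilon=0.1):
    """Mostly exploit the variant with the best observed retention; sometimes explore."""
    if random.random() < epsilon:
        return random.choice(variants)
    return max(variants, key=retention)

for _ in range(50_000):                      # simulated impressions
    v = pick_variant()
    shown[v] += 1
    if random.random() > true_quit_prob[v]:  # the user stayed after seeing this item
        stayed[v] += 1

best = max(variants, key=retention)
print("converged on", best, "with observed retention", round(retention(best), 3))
```

Nothing in the loop ever represents the causal mechanism; whatever happens to reduce quitting simply gets shown more.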

This is why social media use, in addition to vastly increasing the attack surface of the AI safety community in a changing world, is also likely to impede or even totally thwart self-improvement. Inducing mediocrity controls for variables in a complex system (the human mind), which increases data quality and behavior-prediction success rates, but inducing mediocrity also substantially reduces altruism effectiveness.

Therefore, even in a world where deliberate adversarial use of social media was somehow guaranteed to not happen by the people running or influencing the platforms, which is not the world we live in, social media users would still not be able to evaluate whether social media use is appropriate for the AI safety community.

 

Part 2: Deliberate use by hostile actors

Of course, if anyone wanted to, they could crank up any of these dynamics (or all of them at once), alternate between ramping the entire community up and down around major events like the FTX collapse, or even target specific kinds of people. Intelligence is not distributed evenly among humans, but most people tend to think they're clever and pat themselves on the back for thinking of a way to make something they did look like an accident or like natural causes. Making it look like someone else did it is a particular favorite; it is also one of the candidate explanations for why human intelligence evolved to sometimes succeed at observing objective reality instead of just social reality, as these capabilities let people eliminate less-savvy rivals, increasing the probability that they themselves become the tribe's chief and subsequently maximize their own offspring. This situation becomes worse when these "natural causes" are already prevalent on their own.

Most or all of these dynamics, and many others not listed here, can be modulated as variables based on what maximizes visible disruption. Notably, this includes making individuals in the community increasingly fixated on low-trust behaviors, such as obsession with power games or zero-sum social status competitions, which causes distrust death spirals.

Precise steering is unnecessary if you can adjust social graphs and/or turn people into sociopaths who see all of AI safety as their personal sacrificial lamb (as sociopaths do). Nihilism is a behavior that is fairly easy to measure, and therefore easy to induce.

This also includes clown attacks: unpersuasive, bumbling critics of social media cause people to perceive criticism of social media as low-status, which measurably reduces quit rates (regardless of whether the devs are aware of the exact causal dynamic at all). It's probably not very hard to trigger social media arguments at strategic times and places, because there are billions of case studies of tension building up and releasing as a person scrolls through a news feed (each accompanied by plenty of scrolling biodata).

Social media platforms face intense market incentives to set up automatic optimization that makes people feel safe, as users only use the platform if they think they are safe. The result is rather intense and universal optimization to combine posts in ways that generate a wide variety of feelings which increase the probability that a diverse variety of users all end up feeling safe while on the platform. This includes, but is not limited to, unintentionally persuading them to feel unsafe when they are off the platform, e.g. increasing the visibility of content that makes them worry they will accidentally say something racist IRL and lose all of their social status if they are not routinely using the platform and staying up-to-date on the latest things that recently became racist to say.

Artificial manufacturing of Ugh Fields around targeted concepts is trivial, although reliably high success rates are much harder, especially for extraordinary individuals and groups (although even for harder targets, there are so many angles of attack that eventually something will stick in a highly measurable way).

News feeds are capable of maximizing the feeling of relief and satisfaction while simultaneously stimulating cognition in ways that drain people less perceivably than stress as it is conventionally understood, similar to how Valentine managed to notice that he felt drained a few hours after drinking coffee. Social media platforms are allowed to mislead users into falsely believing that stress is being relieved, and this would happen by default if a false belief of stress relief maximizes use while the best and most useful data happens to be produced by stimulating parts of the brain that are easily spent over the course of the day, or that are useful for other things such as creative thinking. Or maybe that combination simply happens to be what minimizes quit rates, e.g. "feeling engaged".

Attacks (including inducing or exacerbating periods of depression, akrasia, or Malthusian nihilism such as the post-FTX environment) can even be ramped up and down just to see what happens; or for no reason at all, just to trip our sensors or to see whether we notice. That is how large the power asymmetry in the current situation is.

Exposing yourself to this degree of optimization pressure, in an environment this hostile and extractive, is just not a reasonable decision. These systems are overwhelmingly stacked toward capabilities to observe and cause changes in human behavior, including beliefs and values.

Data collected and stored on AI safety-focused individuals now can also be used against individuals, orgs, and communities in the near future, as AI safety becomes more prominent, as algorithmic progress improves (again, ML, not LLMs), and as power changes hands to people who are potentially less squeamish.

 

Part 3: Implications and Commentary

 

Decoupling that’s actually hard

This is a great opportunity to practice decoupling where it's actually hard. The whole distinction between high decouplers and low decouplers revolves around a dynamic where people have a hard time decoupling on matters they actually care about. AGI is an issue that people correctly predict matters a ton (the enemy's gate is down), whereas religious worship and afterlives are issues that people incorrectly predict matter a ton.

The variation within a genetically diverse species like humans implies that plenty of people will have a hard time taking AI safety seriously in the first place, while plenty of others will have a hard time decoupling AI safety from the contemporary use of AI for manipulation. Clearing both hurdles is necessary to intuitively and correctly see that the contemporary use of AI for manipulation is instrumentally valuable to understand (for understanding AI race dynamics and the community's attack surface) while simultaneously not being valuable enough to compete with AI safety as an EA cause area. People like Gary Marcus did not even pass the first hurdle, and people who can't pass both hurdles (even after I've pointed them out) are not the intended audience here.

A big problem is that there’s just a ton of people who don’t take AI safety seriously, or failed to take it seriously than much smaller issues, and over time learned to be vague about this whenever they’re in public, because whenever they reveal that they don’t take AI safety seriously they get RLHF’d into not doing that by the people in the room who try to explain why that’s clearly wrong. These RLHF’d people are basically impostors and they’re unambiguously the ones at fault here, but it sure does make it hard to write and get upvoted about topics directly decision relevant for the AGI situation but aren’t themselves about AGI at all. 

After understanding mass surveillance, extant human manipulation and research, the tech industry, and AI geopolitics as well as possible, you are ready to understand the domain that AI safety takes place in, and have solid world models to use on issues that will actually matter in the end, like AGI, or community building/health/attack surface.

Inscrutable and rapidly advancing capabilities within SOTA LLMs might make it difficult for people to mentally disentangle modern user data-based manipulation from AGI manipulation (e.g. the probably hopeless task of AI boxing), but it would be nice if people could at least aspire to rise to the challenge of coherently disentangling important concepts.

 

Inner vs Outer community threat model

The focus on internal threats within the AI safety community is associated with high social status, whereas the focus on external threats facing the AI safety community is associated with low status. 

This is because the inner threat model implies the ability to detect and eliminate threats within the community, which in turn implies the ability to defeat other community members while avoiding being defeated by them, which implies the capability to ascend.

Meanwhile, an outer threat model is low status, as it signals a largely gearless model of the AI safety community, similar to Cixin Liu's idealistic, conflict-free depiction of international affairs in his novel The Three-Body Problem.

In reality, there is something like a Mexican standoff between various AI safety orgs and individuals, as it’s difficult to know whether to share or hoard game-changing strategic information, especially since you don’t know what other game-changing strategic information will be discoverable downstream of any piece of strategic information that is shared by you or your org. 

After getting to know someone well, you can be confident that they look friendly and sane, but you can’t be confident that they’re not a strategic defector who is aware of the value of spending years looking friendly and sane, and even given that they are friendly and sane in the present, it is hard to be confident that they won’t turn unfriendly or insane years in the future.

Needless to say, there are in fact external threats, and the AI safety community could easily be destroyed or twisted by external actors choosing to use technology that already exists and has existed for a long time.

The dynamics described in Geeks, Mops, and Sociopaths, where sociopaths inescapably infiltrate communities, exploit Goodhart's law to pose as geeks, and eliminate the geeks without the geeks even knowing what happened, are trivial to artificially manufacture (even in elite groups) for people who control or influence a major social media platform. This is because sociopaths are trivial to artificially manufacture: as argued earlier in this post, nihilism is trivial to induce if someone can modulate the variables and maximize for it (though only to the extent that it is measurable).

You don’t even need to model the social graph and turn people in strategic locations (although this is also possible with sufficient social graph research, possibly requiring sensor data), because sociopaths automatically orient themselves even if spawned in random locations, like scattering a drone swarm, and then the drones autonomously hone in on targets at key choke points.

This is only one facet of the AI safety community's attack surface; there are many others. Even one org that becomes compromised or influenced becomes a threat to all the others. Giving attackers wiggle room to enter the space and sow divisions between people, launch attacks from perfect positions, act as a third party that turns two inconvenient orgs against each other, or outright steer the thinking of entire orgs, all of this rewards attackers and incentivizes further attacks.

I know that some people are going to see this and immediately salivate over the opportunity to cast spells to try to steer rival orgs off a cliff, but that's not how it works in this particular situation. You can either keep giving hackers more information and power over the entire space, or you can try to stop the bleeding. This is easily provable:

Drone armies

These systems can create and mass-produce the perfect corporate employee: maximizing the proportion of software engineers who are docile, and detecting and drawing a social graph of the employees who aren't, based on unusual social media behavior and possibly sensor data as well. These capabilities have likely already been noticed and pursued.

This, along with lie detectors that actually work, also represents some of the most promising opportunities for intelligence agencies to minimize Snowden risk among their officers and keyboard warriors, totally unhinging intelligence agencies from any possibility of accountability, e.g. poisoning and torturing non-ineffective dissidents (ineffective dissidents, in contrast, will likely be embraced due to their prevalence and due to democratic tradition), people with a known probability of becoming non-ineffective dissidents (e.g. ~20%), or, depending on scale, even the relatively small number of people who are too unpredictable for systems to be confident of a low probability of their becoming non-ineffective dissidents (e.g. the few people who cannot be established to have a >99% chance of lifelong docility).

There is no historical precedent of intelligence agencies existing without these constraints, although there is plenty of historical precedent of widespread utilization of plausible deniability, as well as intelligence officials being optimistic, not pessimistic, about their ability to succeed at getting away with various large-scale crimes.

 

The universe is allowed to do this to you

I think this also demonstrates just how difficult it is to get things right in the real world, in a world that has been changing as fast as this one. Did you have an important conversation, which you shouldn't have had with a smartphone nearby, while a smartphone was nearby (or possibly even directly touching your body, picking up the biodata it emits)? Your model of reality wasn't good enough, and now you and everyone associated with you is NGMI. Oops! You also get no feedback, because everything feels normal even though it isn't, and therefore you are likely to keep having conversations near smartphones that you shouldn't have near smartphones. Survival of the fittest is one of the most foundational, fundamental, and universal laws of reality for life (and agents, a subcategory of life).

If you encounter someone who surprises you by putting a syringe into one of your veins, it’s important to note that they might not push the plunger and deploy the contents of the syringe! 

Even a psychopath has tons of incentives to not do that, unlike with social media, where influence is carefully sculpted to be undetectable, and where the companies have potentially insurmountable incentives to addict/push the plunger. 

Needless to say, the correct decision is to immediately move to protect your brain/endocrine system, whose weak point is your bloodstream. The correct decision is not to turn around and shout down the people behind you who are saying that you shouldn't let strangers put syringes in your veins.

Comments (11)
gull (3mo):

Have you read Cixin Liu's Dark Forest, the sequel to Three Body Problem? The situation on the ground might be several iterations more complicated than you're predicting.

Strong upvoted! That's the way to think about this.

I read The Three-Body Problem, but not the rest yet (you've guessed my password; I'll go buy a copy).

My understanding of the situation here on the real, not-fake Earth is that having the social graph be this visible and manipulable by invisible hackers does not improve the situation.

I tried clean and quiet solutions and they straight-up did not work at all. Social reality is a mean mother fucker, especially when it is self-reinforcing, so it's not surprising to see somewhat messy solutions become necessary.

I think I was correct to spend several years (since early 2020) trying various clean and quiet solutions, and watching them not work, until I started to get a sense of why they might not be working.

Of course, maybe the later stages of my failures were just one more person falling through the cracks of the post-FTX Malthusian environment, twisting EA and AI safety culture out of shape. This made it difficult for a lot of people to process information about X-risk, even in cases like mine where the price tag was exactly $0.

I could have waited longer and made more tries, but that would have meant sitting quietly through more years of slow takeoff with the situation probably not being fixed.

Could transparency/openness of information be a major factor?

I've noticed that video games become much worse as a result of visibility of data. With wikis, built-in search, automatic markets, and other such things, metas (as in meta-gaming) start to form quickly. The optimal strategies become rather easy to find, and people start exploiting them as a matter of course.

Another example is dating. Compare modern dating apps to the 1980s. Dating used to be much less algorithmic: you didn't run people through a red-flag checklist, you just spent time with them and evaluated how enjoyable that was.

I think the closed-information trait is extremely valuable as it can actually defeat Moloch. Or more accurately, the world seems to be descending into an unfavorable Nash equilibrium as a result of optimal strategies being visible.

By the way, the closed-information vs open-information duality can be compared to ribbonfarm's Warrens vs. Plazas view of social spaces (not sure if you know about that article)

gull (3mo):

So you read Three Body Problem but not Dark Forest. Now that I think about it, that actually goes quite a long way to put the rest into context. I'm going to go read about conflict/mistake theory and see if I can get into a better headspace to make sense of this.

gjm (3mo):

"Regression to the mean" is clearly an important notion in this post, what with being in the title and all, but you never actually say what you mean by it. Clearly not the statistical phenomenon of that name, as such.

(My commenting only on this should not be taken to imply that I find the rest of the post reasonable; I think it's grossly over-alarmist and like many of Trevor's posts treats wild speculation about the capabilities and intentions of intelligence agencies etc. as if it were established fact. But I don't think it likely that arguing about that will be productive.)

Yes, it means inducing conformity. It means making people more similar to the average person while they are in the controlled environment. 

That is currently the best way to improve data quality when you are analyzing something as complicated as a person. Even if you somehow were able to get secure copies of all the sensor data from all the sensors from all the smartphones, with the current technology level you're still better off controlling for variables wherever possible, including within the user's mind. 

For example, the trance state people go into when they use social media. Theoretically, you get more information from smarter people when they are thoughtful, but with modern systems it's best to keep their thoughts simple so you can compare their behavior to the simpler people that make up the vast majority of the data (and make them lose track of time until 1-2 hours pass, around when the system makes them feel like leaving the platform, which is obviously a trivial task).

EDIT: this was a pretty helpful thing to point out; I replaced every instance of the phrase "regression to the mean" with "mediocrity" or "inducing mediocrity".

gjm (3mo):

Let us suppose that social media apps and sites are, as you imply, in the business of trying to build sophisticated models of their users' mental structures. (I am not convinced they are -- I think what they're after is much simpler -- but I could be wrong, they might be doing that in the future even if not now, and I'm happy to stipulate it for the moment.)

If so, I suggest that they're not doing that just in order to predict what the users will do while they're in the app / on the site. They want to be able to tell advertisers "_this_ user is likely to end up buying your product", or (in a more paranoid version of things) to be able to tell intelligence agencies "_this_ user is likely to engage in terrorism in the next six months".

So inducing "mediocrity" is of limited value if they can only make their users more mediocre while they are in the app / on the site. In fact, it may be actively counterproductive. If you want to observe someone while they're on TikTok and use those observations to predict what they will do when they're not on TikTok, then putting them into an atypical-for-them mental state that makes them less different from other people while on TikTok seems like the exact opposite of what you want to do.

I don't know of any good reason to think it at all likely that social media apps/sites have the ability to render people substantially more "mediocre" permanently, so as to make their actions when not in the app / on the site more predictable.

If the above is correct, then perhaps we should expect social media apps and sites to be actively trying not to induce mediocrity in their users.

Of course it might not be correct. I don't actually know what changes in users' mental states are most helpful to social media providers' attempts to model said users, in terms of maximizing profit or whatever other things they actually care about. Are you claiming that you do? Because this seems like a difficult and subtle question involving highly nontrivial questions of psychology, of what can actually be done by social media apps and sites, of the details of their goals, etc., and I see no reason for either of us to be confident that you know those things. And yet you are happy to declare with what seems like utter confidence that of course social media apps and sites will be trying to induce mediocrity in order to make users more predictable. How do you know?

Yes, this is a sensible response; have you seen Tristan Harris's Social Dilemma documentary? It's a great introduction to some of the core concepts but not everything. 

Modelling users' behavior is not possible with normal data science or for normal firms with normal data security, but it is something that very large and semi-sovereign firms like the Big 5 tech companies would have a hard time not doing, given such large and diverse sample sizes. Modelling of minds sufficient to predict people based on other people is far less deep, and is largely a side effect of comparing people to other people with sufficiently large sample sizes. The dynamic is described in this passage I've cited previously.

Generally, inducing mediocrity while on the site is a high priority, but it's mainly about numbness and suppressing higher thought, e.g. the kinds referenced in Critch's takeaways on CFAR and the sequences. They want the reactions to content to emerge from your true self, but they don't want any of the other stuff that comes from higher thinking or self-awareness.

You're correct that an extremely atypical mental state on the platform would damage the data (I notice this makes me puzzled about "doomscrolling"); however, what they're aiming for is a typical state for all users (plus whatever keeps them akratic while off the platform), and for elite groups like the AI safety community, the typical state for the average user is quite a downgrade.

Advertising was big last decade, but with modern systems stable growth is a priority, and maximizing ad purchases would harm users in a visible way, so finding the sweet spot is easy if you just don't put much effort into ad matching (plus, users noticing that the advertising is predictive creeps them out, the same issue as making people use the platform for 3-4 hours a day). Acquiring and retaining large numbers of users is far harder and far more important, now that systems are advanced enough to compete more against each other (less predictable) than against the user's free time (more predictable, especially now that so much user data has been collected during scandals, but all kinds of things could still happen).

On the intelligence agency side, the big players are probably more interested in public sentiment about Ukraine, NATO, elections/democracy, covid etc by now, rather than causing and preventing domestic terrorism (I might be wrong about that though).

Happy to talk or debate further tomorrow.

[-]gjm3mo61

Once again you are making a ton of confident statements and offering no actual evidence. "is a high priority", "they want", "they don't want", "what they're aiming for is", etc. So far as I can see you don't in fact know any of this, and I don't think you should state things as fact that you don't have solid evidence for.

trevor (2mo):

They want data. They strongly prefer data on elites (and useful/relevant for analyzing and understanding elite behavior) over data on commoners. 

We are not commoners.

These aren't controversial statements, and if they are, they shouldn't be.

Whenever someone uses "they," I get nervous.