I predict at 95% that similar types of automated manipulation strategies as these were deployed by US, Russia, or Chinese companies or agencies to steer people’s thinking on Ukraine War and/or Covid-related topics
Does stuff like the twitter files count? Because that was already confirmed, it's at 100%.
Also commenting on the same section:
Wouldn't the US government or its agencies do more against Tiktok, if they were sufficiently aware of its possibilities to steer people's thinking?
I haven't really looked into the twitter files, or the right-wing narratives of FBI/Biden suppression of right-wing views (I do know that Musk and the Right are separate and the overlap isn't necessarily his fault, e.g. criticism of the CDC and Ukraine War ended up consigned to the realm of right-wing clowns regardless of the wishes of the critics).
AFAIK the twitter files came nowhere near to confirming the level of manipulation technology that I describe here, mostly focusing on covert informal government operatives de-facto facilitating censorship in plausibly deniable ways. The reason I put a number as extreme as 95% is that weird scenarios during 2020-22 still count, so long as they describe intensely powerful use of AI and statistical analytics for targeted manipulation of humans at around the level of power I described here.
The whole point is that I'm arguing that existing systems are already powerful and dangerous, it's not a far-off future thing or even 4 years away. If it did end up being ONLY the dumb censorship described in the twitter files and the Right, then that would falsify my model.
Why this is valuable
In the face of unclear AGI timelines, thinking about SOTA human behavior manipulation technology is not intrinsically valuable (in fact, all along I’ve repeatedly asserted that it has a serious threat of distracting people from AGI, which will kill everyone, and should not be researched as a new X-risk).
However, it is probably instrumentally valuable for understanding AI governance, geopolitics, and race dynamics, and because the AI safety community is disproportionately at risk of being targeted due to entering an arena with heavy hitters.
One of the main ways that people transmit knowledge and intuitive understanding is through intuitive examples e.g. use cases. Examples of manipulation strategies are critical to develop an intuitive understanding of what's probably already out there; however, although it's easy to predict whether governments are working on autonomous manipulation in general, which is more than sufficient to indicate that the AI safety community should take precautions to minimizing the attack surface; it's still much harder to trace the specific outlines of particular autonomous manipulation strategy that has likely already been discovered and exploited.
The clown attack example was low-hanging fruit for me to discover and write about; its use for autonomous manipulation was both unusually powerful, obvious, and inevitable. Lots of people found that example extremely helpful for getting oriented towards the situation with human thought control. Other examples of automated manipulation are harder to be confident were 1) easy to discover and 2) existing institutions were incentivized to deploy and test; but they will still be helpful for demonstrating what kinds of things are probably out there.
This model is falsifiable; I predict at 95% that similar types of automated manipulation strategies as these were deployed by US, Russia, or Chinese companies or agencies to steer people’s thinking on Ukraine War and/or Covid-related topics. I lose a large amount of bayes points if that didn’t happen e.g. governments weren’t sufficiently aware of these capabilities due to uniform incompetence, or the engineering problems are too difficult e.g. due to spaghetti towers or data poisoning/data security persistently thwarting the large tech firms instead of just the smaller ones). They should have had these capabilities ready by then, and they should have already used them for geopolitical matters that incentivized information warfare.
List of Concrete Examples
1. Measurement-based thought and behavior steering
The current paradigm is already predisposed to deploy intensely optimized manipulation strategies, which are bizarrely insightful in counterintuitive ways, and randomly become superhuman.
For example, this tweet:
This combination of words induces the reader to:
This doesn’t need to be anywhere close to deliberate, nor does this even require AI or 2020s technology. An algorithm from the late 2000s can simply discover that exposure to this post correlated with people more frequently returning to the social media platform on a daily basis, or that it was unusually effective and increasing habitual use among demographics that are normally resistant to habitual use, such as psychology researchers or cybersecurity experts.
Although galaxy-brained manipulation strategies can wind up labeled and quantified, or even as gears in the models of modern ML, superhuman manipulation can also be discovered and exploited by simpler systems that are simply optimizing for whatever maximizes a single metric, such as causing humans to choose to use your platform more.
This can become even more complicated when simple algorithms try to adversarially steal user time from other platforms running their own algorithms, and more complicated still when both sides of the adversarial environment are running modern ML instead of late-2000s algorithms.
Targeting AI safety hypothetical example:
Attackers trying to slow/mitigate the rapid growth of the AI safety community optimize social media algorithms or botnets to minimize people’s motivation to contribute to AI safety in a measurable way. The algorithm optimizing for this has already determined that Ayn Rand-like posts convincing people to not care about others, resentment towards elite community members, or emphasizing the personal benefits of goodharting social status, causes people to reduce engagement in altruistic communities in general. But in the case of AI safety, it is actually manipulating them to become nihilistic, not just less motivated, substantially increasing the prevalence of intelligent bad actors in the AI safety community.
Ukraine hypothetical example:
American intelligence agencies seek to wield their capability to steer public opinion in Western countries in order to contribute to the war effort in Ukraine, by reducing draft dodging among Ukrainian men and increasing morale among Ukrainian soldiers. The algorithm finds that anti-Putin posts/word combinations, rather than pro-Ukrainian posts or anti-Russian posts, are the most effective at reducing draft dodging among Ukrainian men. However, this is interpreted by Russia as the West openly orchestrating regime change in Russia, and retaliates with a mirror strategy.
2. Mapping human internals with causal analysis
Modern ML seems likely to already be able to map the internals of the human mind by creating causal graphs on beliefs and belief formation. According to Zack M Davis’s post on optimized propaganda (involving a set of 3 beliefs/attitudes, A, B, and C):
Social media provides an environment that not only controls for variables, but also subjects people to an extremely wide variety of topics in charismatic or creative/artistic/poetic ways, such as the thriving cultural ecosystem described by Jan Kuviet which harnesses large amounts of creative talent towards integrating intense stimuli into targeted concepts.
By measuring the exact pace that people scroll past each post with their thumb and mouse wheel, and converting that data into linear curves which are optimal for ML, large tech companies and intelligence agencies would by-default have more than enough data to compare and contrast a wide variety of thoughts and reflections from large numbers of different types of people on a wide variety of topics, and acquire a critical mass of correlations sufficient to run causal analysis and predictive analytics on the human thought process.
Psychology research within tech companies and intelligence agencies could even outpace the declining university-based 20th century psychology research paradigm, with orders of magnitude fewer researchers. This is due to vastly superior research and experimentation capabilities offered by the 21st century paradigm, which would accelerating effectiveness and efficiency of hypothesis generation and testing by orders of magnitude.
Ukraine hypothetical example:
Using causal graphs to understand what kinds of social media and foreign news articles predisposes Ukrainian men towards draft compliance vs draft dodging, running predictive analytics based on troop defections in order to research morale and make it more consistent under extreme conditions such as long periods of sleep deprivation/interruption and constant fighting. They also produce more effective and insightful training manuals to distribute to officers throughout the Ukrainian military. American intelligence agencies can also more effectively map the expansion and contraction of antiwar sentiment among economic, political, cultural, and technocratic elites throughout the West (including tracking the downstream effects of Russian influence operations with greater nuance).
Targeting AI safety hypothetical example:
Top Amazon engineers and executives develop a superior understanding of intimidation and bluffing by contracting with consultants at top legal firms, and test/refine the consultants’s theories of the human mind against their quantitative models of the human mind (including causal graphs) made from social media user data from a negotiated exchange of unpoisoned data between Amazon and Facebook. When Anthropic grabs their attention, Amazon executives combine their general-purpose intimidation/bluffing research with analysis of the social media behavior data from Anthropic employees, gaining substantial knowledge needed to form strategies to bluff and intimidate Anthropic’s leadership and rank-and-file into accelerating AI, entirely via in-person verbal conversations due to a superior understanding of human psychology. For example, understanding which aspects and concepts in AI safety are taken more seriously than others, and uncovering sensitive topics by labeling posts that frequently caused unusual social media scrolling behavior. They can gain a gears-level understanding of the community dynamics within AI safety, developing a vastly stronger local understanding of the AI safety community than from general subcultural models like Geeks, Mops, and Sociopaths. As a result, they have many degrees of freedom to find and exploit divisions, and even generate new rifts in the AI safety community.
3. Sensor data
When you have sample sizes of billions of hours of human behavior data and sensor data, millisecond differences in reactions from different kinds of people (e.g. facial microexpressions, millisecond differences at scrolling past posts covering different concepts, heart rate changes after covering different concepts, eyetracking differences after eyes passing over specific concepts, touchscreen data, etc) transform from being imperceptible noise to becoming the foundation of webs of correlations mapping the human mind.
Eyetracking is likely the most valuable user data ML layer for predictive analytics and sentiment analysis and influence technologies in general, since the eyetracking layer is only two sets of coordinates that map to the exact position that each eye is centered on the screen at each millisecond (one for each eye, since millisecond-differences in the movement of each eye might also correlate with valuable information about a person’s thought process). This compact data allows deep learning to “see”, with millisecond-precision, exactly how quickly one’s eyes and mind linger on each word and sentence. Notably, sample sizes of millions of these coordinates might be so intimately related to the human thought process that value of eyetracking data might exceed the value of all other facial muscles combined (facial muscles, the originator of all facial expressions and emotional microexpression, might also be compactly reducible via computer vision as there are fewer than 100 muscles near the face and most of them have a very bad signal to noise ratio, but not nearly as efficiently as eyetracking).
By comparing people to other people and predicting traits and future behavior, multi-armed bandit algorithms can predict whether a specific research experiment or manipulation strategy is worth the risk of undertaking at all in the first place; resulting in large numbers of success cases with a low detection rate (as detection would likely yield a highly measurable response, particularly with substantial sensor exposure). A large part of predictive analytics is finding which people behave similarly to which other people on certain topics, possibly even mapping behavior caused by similar genes e.g. the pineapple pizza-enjoying gene might be physically attached to multiple genes that each cause a specific type of split second emotional reaction under various circumstances, allowing microexpression data to map genes that predict behavior potentially more effectively than sequencing a person’s actual genome itself. The combination of eyetracking data with LLMs, on the other hand, can potentially map knowledge or memories well enough to compare people on that front.
The social media user data-based paradigm is around as optimal for researching, steering, and reinforcing interest/attention, as it is for researching and steering impressions/vibes.
Ukraine hypothetical example:
During the early months while the Ukraine war still could have gone either way, Western intelligence agencies are unwilling to risk conventional hacks to disrupt the lives of antiwar bloggers in the US and UK and Germany, for obvious blowback reasons. However, they are less constrained from hacking the social media news feeds of antiwar bloggers to optimize for making their facial microexpressions become similar to people with similar genes suffering from severe depression or akrasia/motivation problems, dramatically reducing. Furthermore, they find which kinds of comments cause facial microexpressions indicating motivation reduction in that particular antiwar blogger, and increase the collision rate with the types of people who tend to make those kinds of comments, while increasing the collision rate with friendlier and more insightful readers when the antiwar bloggers write on topics unrelated to Ukraine, steering the basic structure of democracy itself away from criticizing the West’s proxy wars without any need for older social media steering tech which has already become controversial (e.g. bots, “shadowbanning”, etc) by efficiently inducing people to choose other paths and make them think it was their own idea.
Targeting AI Safety hypothetical example: Daniel Kokotajlo finds out and writes Persuasion Tools, a stellar post on AI persuasion capabilities and that makes a wide variety of serious pre-paradigmatic efforts at tackling the downstream consequences of AI persuasion capabilities on humanity, and he then somehow forgets about basically all of it by the time he writes What 2026 looks like 10 months later. Lc, author of What an actually pessimistic containment strategy looks like, finds out and cannot be fooled in the same way by any existing manipulation strategy, so the multi-armed bandit algorithm instead persuades him to tank his reputation by posting publicly about dangerous self-harm diets. Adam Long writes a successful post depicting AI Safety and AI Influence as feuding enemy camps, and that's actually an accurate description of what the factional environment ended up being; AI Influence has, in fact, become the ideological enemy camp to AI safety.
As the AI safety community moves up in the AI policy and foreign policy arena, and attracts more and more attention from intelligence agencies around the world, the security vulnerabilities continue to remain unpatched, and the house of cards builds taller and taller.
Conclusion/solutions
The solution for the AI safety community is to minimize the attack surface in the most cost efficient ways. Webcams must be covered up when not in use. Social media is provably unsalvageable as a leisure activity and must be stopped, regardless of the resulting wide variety of compulsions to return to routine use.