It seems like if capabilities are escalating like that, it's important to know how long ago it started. I don't think the order-of-magnitude-every-4-years would last (compute bottleneck maybe?), but I see what you're getting at, with the loss of hope for agency and stable groups happening on a function that potentially went bad a while ago.
Having forecasts about state-backed internet influence during the Arab Spring and other post-2008 conflicts seems like it would be important for estimating how long ago the government interest started, since that was close to the Deep Learning revolution. Does anyone have good numbers for these?
I agree with all of this, except 4 years is a lot of time lately and the empirical record from here to 2015 (in 2015 I don't think there was much capabilities at all aside from bots drowning out voices and things like clown attacks, Yann Lecun wrote a great post on how these systems are hard to engineer in practice) suggests that the rate will continue without big increases in investment, especially since compute production alone can generates a large portion of that OOM every 4 years, and a fixed proportion of compute will be going into psych research, and there's also stuff like edge computing/5G.
I don't have good models on the Arab Spring other than that authoritarian states face pretty strong incentives to blame all kinds of domestic problems on foreign influence ops from the West. It's a pretty bad equilibria since influence ops actually do come out of the West.
Sorry, but you're overthinking what's required. Simply being able to reliably use existing techniques is more than enough to hack the minds of large groups of people, no complex new research needed.
Here is a concrete example.
First, if you want someone's attention, just make them feel listened to. ELIZA could already successfully do this in the 1970s, ChatGPT is better. The result is what therapists call transference, and causes the person to wish to please the AI.
Now the AI can use the same basic toolkit mastered by demagogues throughout history. Use simple and repetitive language to hit emotional buttons over and over again. Try to get followers to form a social group. Switch positions every so often. Those that pay insufficient attention will have the painful experience of being attacked by their friends, and it forces everyone to pay more attention.
All of this is known and effective. What AI brings is that it can use individualized techniques, at scale, to suck people into many target groups. And once they are in those groups, it can use the demagogue's techniques to erase differences and get them aligned into ever bigger groups.
The result is that, as Sam Altman predicted, LLMs will prove superhumanly persuasive. They can beat the demagogues at their own game by seeding the mass persuasion techniques by individual attention at scale.
Do you think that this isn't going to happen? Social media accidentally did a lot of this at scale. Now it is just a question of weaponizing something like TikTok.
Yes, like the example of "Clown Attacks" isn't at all novel or limited to AI, it's old stuff. And it's not even true that you can't be resistant to them, though these days going actively against peer pressure in these things isn't very fashionable. That said, the biggest risk of LLMs right now is indeed IMO how well they can enact certain forms of propaganda and sentiment analysis en masse. No longer can I say "the government wouldn't literally read ALL your emails to figure out what you think, you're not worth the work it would take": now it might, because the cost has dramatically dropped.
That said, the biggest risk of LLMs right now is indeed IMO how well they can enact certain forms of propaganda and sentiment analysis en masse.
I agree with the contents of this comment in general, but not the idea that propaganda generation is the greatest risk. Lots of people know about that already, and I'd argue that the risk to democracy via persuading the masses isn't very tractable, whereas the risk to the AI safety community via manipulating elites in random ways with automated high-level psychology research, is very tractable (minimize sensor exposure).
My point wasn't that it would be a very new capability in general, but it could be deployed at a scale and cost impossible before. Armies of extremely smart and believable bots flooding social media all around. The "huh, everyone else except me thinks X, maybe they do have a point/my own belief is hopeless" gregariousness effect is real, and has often been used already, but this allows bad actors to take it to a whole new level. This could also be deployed as you say against AI safety itself, but not exclusively.
I think that this is pretty appropriately steel manned, given that there are tons of people here who do ML daily way better than I ever could in my best moments, but they're clueless about the industry/economics/natsec side and that's a quick fix on my part. If I'm underestimating the severity of the situation then they'd know.
I think that this is more like zero days in the human brain; like performing a magic trick that prevents an entire room full of string theorists from thinking about something that's important to them e.g. lab leak hypothesis. Making people talk about specific concepts in their sleep. Lie detectors that only need a voice recording from a distance. Seeing which concepts scare people and which concepts don't based on small changes in their heart rate. etc.
My intuition says that engineering problems like these are hard and all sorts of problems crop up, not just Goodhart's law. Yann Lecun wrote a great post on this:
1. Building something that works in the real world is harder than what most armchair AI safety folks think.
2. there is a natural tendency to exaggerate the potential risks of your own work, because it makes you feel powerful and important.
This seems pretty in-line with some pretty fundamental engineering principles e.g. spaghetti code systems. I think that a big part of it is that, unlike in China, western bigtech companies and intelligence agencies are strangled from small workforces of psychology researchers/data labellers/hypothesis generators due to Snowden risk. Although discovering the human internal causality interpretability dynamic was a big update in the other direction, it being really easy to get away with doing a ton of stuff to people with very little data and older AI systems.
I disagree that people who do ML daily would be in a good position to judge the risks here. The key issue is not the capabilities of AI, but rather the level of vulnerability of the brain. Since they don't study that, they can't judge it.
It is like how scientists proved to be terrible at unmasking charlatans like Uri Geller. Nature doesn't actively try to fool us, charlatans do. The people with actual relevant expertise were people who studied how people can be fooled. Which meant magicians like James Randi. Similarly, to judge this risk, I think you should look at how dictators, cult leaders, and MLM companies operate.
A century ago Benito Mussolini figured out how to use mass media to control the minds of a mass audience. He used this to generate a mass following, and become dictator of Italy.. The same vulnerabilities exploited the same way have become a staple for demagogues and would-be dictators ever since. But human brains haven't been updated. And so Donald Trump has managed to use the same basic rootkit to amass about 70 million devoted followers. As we near the end of 2023, he still has a chance of successfully overthrowing our democracy if he can avoid jail.
Your thinking about zero days is a demonstration of how thinking in terms of computers can mislead you. What matters for an attack is the availability of vulnerable potential victims. In computers there is a correlation between novelty and availability. Before anyone knows about a vulnerability, everyone is available for your attack. Then it is discovered, a patch is created, and availability goes down as people update. But humans don't simply upgrade to brain 2.1.8 to fix the vulnerabilities found in brain 2.1.7. People can be brainwashed today by the same techniques that the CIA was studying when they funded the Reverend Sun Moon back in the 1960s.
You do make an excellent point about the difficulty of building something that can work at scale in the real world. Which is why I focused my scenario on techniques that have worked, repeatedly, at scale. We know that they can work, because they have worked. We see it in operation whenever we study the propaganda techniques used by dictators like Putin.
Given these examples, the question stops being an abstract, "Can AI find vulnerabilities by which we can be exploited?" It then switches to, "Is AI capable of executing effectrive variants on the strategies that dictators, cult leaders and MLM founders already have shown works at scale against human minds?"
I think that the answer is a pretty clear yes. Properly directed, ChatGPT should be more than capable of doing this. We then have the hallmark of a promising technology, we know that nothing fundamentally new is required. It is just a question of execution.
My thinking about this (and other people like Tristan Harris who can't think about superintelligence) is that the big difference is that persuasion, as a science, is getting amplified by orders of magnitude greater than the 20th century.
As a result, the AI safety community is at risk of getting blindsided by manipulation strategies that we're vulnerable to because we don't recognize them.
I don't imagine CFAR's founders as being particularly vulnerable to clown attacks, for example, but they also would also fail to notice clown attacks being repeatedly tested against them; so it stands to reason that today's AI would be able to locate something that would both work on them AND prevent them from noticing, if it had enough social media scrolling data to find novel strategies based on results.
I'm less interested in the mass psychology stuff from the 2020s because a lot of that was meant to target elites who influenced more people downstream, and elites are now harder to fool than in the 20th century; and also, if democracy dies, then it dies, and it's up to us to not die with it. One of the big issues with AI targeting people based on bayes-predicted genes is that it can find one-shot strategies, including selecting 20th century tactics with the best odds of success.
This is why I think that psychology is critical, especially for interpreting causal data, but also we shouldn't expect things to be too similar to the 20th century because it's a new dimension, and the 21st century is OOD anyway (OOD is similar to the butterfly effect, changes cause a cascade of other changes).
With all due respect, I see no evidence that elites are harder to fool now than they were in the past. For concrete examples, look at the ones who flipped to Trump over several years. The Corruption of Lindsey Graham gives an especially clear portrayal about how one elite went from condemning Trump to becoming a die-hard supporter.
I dislike a lot about Mr. Graham. But there is no question that he was smart and well aware of how authoritarians gain power. He saw the risk posed by Trump very clearly. However he knew himself to be smart, and thought he could ride the tiger. Instead, his mind got eaten.
Moving on, I believe that you are underestimating the mass psychology stuff. Remember, I'm suggesting it as a floor to what could already be done. New capabilities and discoveries allow us to do more. But what should already be possible is scary enough.
However that is a big topic. I went into it in AI as Super-Demagogue which you will hopefully find interesting.
I think that it's generally really hard to get a good sense of what's going on when it comes to politicians, because so much of what they do is intended to make a persona believable and disguise the fact that most of the policymaking happens elsewhere.
Moving on, I believe that you are underestimating the mass psychology stuff. Remember, I'm suggesting it as a floor to what could already be done. New capabilities and discoveries allow us to do more. But what should already be possible is scary enough.
That's right, the whole point of the situation is that everything I've suggested is largely just a floor for what's possible, and in order to know the current state of the limitations, you need to have the actual data sets + watch the innovation as it happens. Hence why the minimum precautions are so important.
I went into it in AI as Super-Demagogue which you will hopefully find interesting.
I'll read this tomorrow or the day after, this research area has tons of low-hanging fruit and few people looking into it.
Hi Trevor! I appreciate this thread of related ideas that you have been developing about intelligence agencies, AI-augmented persuasion techniques, social media, etc.
That said, I think this post is too conspiratorial in assuming that some combination of social media companies / national governments understand how to actually deploy effective persuasion techniques in a puppetmaster-like way which is way beyond everyone else. I think that the current situation is more like "we are living in an anarchic world influenced by an out-of-control memetic marketplace of ideas being influenced by many different actors of varying levels of sophistication, none of whom have amazing next-level gameboard-flipping dominance". Some scattered thoughts on this theme:
Anyways, I guess my overall point is that it just doesn't seem true that the CIA, or Facebook, or China, or anyone else, currently has access to amazing next-gen persuasion tech. So IMO you are thinking about this in the wrong way, with too much of a conspiratorial / Tom Clancy vibe. But the reason I wrote such a long comment is because I think you should keep exploring these general topics, since I agree with you about most of the other assumptions you are making!
Thanks for putting the work into this attempt to falsify my model! I read all of it, but I will put work into keeping my response concise and helpful.
If the CIA (or other entities affiliated with the US government, including tech companies being pressured by the government) is so good at persuasion ops, why are there so many political movements that seem to go against the CIA's interests?
If google & facebook & etc are so good at manipulating me, why do their efforts at influence often still seem so clumsy?
There is a pretty low bar for running SGD on secure user data, and social media news feed data/algorithms/combinations of posts, in order to see what causes people to stay versus leave. That means that their systems, by default, throw outrageously clumsy failed influence attempts at you, because those tend to cause users to feel safe and return; when in reality they are not safe and shouldn't return. Meanwhile, influencing people in ways that they notice will make them leave and get RLHF'd out quickly. This is the most probable outcome, they probably couldn't prevent algorithms from doing this if they tried due to goodharts law.
Where is the evidence of super-persuasion techniques being used by other countries, or in geopolitical situations?
The Ukraine War and Lab Leak Hypothesis are both things that I would have predicted the US to commit substantial capabilities to, including SOTA thought/behavior prediction/steering. It's hard to predict what they will do in the future because they are often trying to fool or outmaneuver foreign intelligence agencies and high-IQ elites with much better information than me; the US might suddenly decide to fully commit to accusing China of originally leaking COVID, or powerful US elites might rally against proxy wars and the trend towards a second Cold War. If pro-terrorism sentiment became popular in the US, then that could just be the Russians succeeding, not the NSA failing. This is why I focus on the math of how human internal interpretability naturally translates into industry change and cyberattacks on the brain. I would love to get better at this.
The main issue here is that history is being written and that the AI safety community needs to avoid the fate of the CCC. I'm arguing that the math and the state of the industry indicates that human brains are extremely predisposed to being hacked, that governments would totally do this, and that the AI safety community is unusually likely to be targeted.
Have there been any significant leaks that indicate the USA is focused on persuasion tech and has seen significant successes with it?
Snowden was in 2013 back when modern ML was new, and if Moscow wanted to leak this tech to the world, they would succeed, and we wouldn't be talking about it early on Lesswrong. But it's basic math, and the issue is that people aren't panicking, or they panic but eventually go back to using social media and then they feel fine (SGD does this automatically in order to retain users). Tristan Harris is worried about this and "leaks" the fundamental math constantly, possibly helped design/push standards that made the tech less outrageously harmful e.g. using social media 5 hours a day.
IMO you are thinking about this in the wrong way, with too much of a conspiratorial / Tom Clancy vibe.
The problem here is that most books and documentaries about intelligence agencies are entertainment products that need to optimize for intrigue or thrill in order to reduce the risk of failing to recoup the production budget. Meanwhile, most people with experience at intelligence agencies worked at the low-competence low-trust bottom-70% where the turnover is much higher. Taking a step back, intelligence agencies seem to be Bureaucracies that Conquer, and should be better at intimidating politicians and normal-bureaucrats than politician and normal-bureaucrats are at intimidating them.
- Social media in particular seems like a powerful lever to influence culture (see Slate Star Codex & Revolt of the Public)
If they had any sense, they'd try to kill moloch by building dath ilan. They probably won't, because they've probably been turned deeply nihilistic by fighting so many information wars. Most people get turned into moloch worshippers from far less e.g. business school.
- Totally separately from all the above stuff about persuasion, the shadowier parts of governments (military & intelligence-agency bureaucracies) seem very important to think about when we are trying to think ahead about the future of AI technology and human civilization.
I'm thinking not so much, if they weren't massively investing in intensely powerful brain hacking technology and burying everyone underneath sensors, then there wouldn't be much intersection at all. They could focus on what they're good at, stopping China, and we could focus on what we're good at, saving the world from unaligned AI. Sadly, the happy-go-lucky outcome where people can hack your brain, but don't, doesn't seem likely under the current global affairs.
(this comment is kind of a "i didn't have time to write you a short letter so I wrote you a long one" situation)
re: Infowar between great powers -- the view that China+Russia+USA invest a lot of efforts into infowar, but mostly "defensively" / mostly trying to shape domestic opinion, makes sense. (After all, it must be easier to control the domestic media/information lansdscape!) I would tend to expect that doing domestically-focused infowar stuff at a massive scale would be harder for the USA to pull off (wouldn't it be leaked? wouldn't it be illegal somehow, or at least something that public opinion would consider a huge scandal?), but on the other hand I'd expect the USA to have superior infowar technology (subtler, more effective, etc). And logically it might also be harder to percieve effects of USA infowar techniques, since I live in the USA, immersed in its culture.
Still, my overall view is that, although the great powers certainly expend substantial effort trying to shape culture, and have some success, they don't appear to have any next-gen technology qualitatively different and superior to the rhetorical techniques deployed by ordinary successful politicians like Trump, social movements like EA or wokeism, advertising / PR agencies, media companies like the New York Times, etc. (In the way that, eg, engineering marvels like the SR-72 Blackbird were generations ahead of competitors' capabilities.) So I think the overall cultural landscape is mostly anarchic -- lots of different powers are trying to exert their own influence and none of them can really control or predict cultural changes in detail.
re: Social media companies' RL algorithms are powerful but also "they probably couldn't prevent algorithms from doing this if they tried due to goodharts law". -- Yeah, I guess my take on this is that the overt attempts at propaganda (aimed at placating the NYT) seem very weak and clumsy. Meanwhile the underlying RL techniques seem potentially powerful, but poorly understood or not very steerable, since social media companies seem to be mostly optimizing for engagement (and not even always succeeding at that; here we are talking on LessWrong instead of tweeting / tiktoking), rather than deploying clever infowar superweapons. If they have such power, why couldn't left-leaning sillicon valley prevent the election of Trump using subtle social-media-RL trickery?
(Although I admit that the reaction to the 2016 election could certainly be interpreted as sillicon valley suddenly realizing, "Holy shit, we should definitely try to develop social media infowar superweapons so we can maybe prevent this NEXT TIME." But then the 2020 election was very close -- not what I'd have expected if info-superweapons were working well!)
With Twitter in particular, we've had such a transparent look at its operations during the handover to Elon Musk, and it just seems like both sides of that transaction have been pretty amateurish and lacked any kind of deep understanding of how to influence culture. The whole fight seems to have been about where to tug one giant lever called "how harshly do we moderate the tweets of leftists vs rightists". This lever is indeed influential on twitter culture, and thus culture generally -- but the level of sophistication here just seems pathetic.
Tiktok is maybe the one case where I'd be sympathetic to the idea that maybe a lot of what appears to be random insane trends/beliefs fueled by SGD algorithms and internet social dynamics, is actually the result of fairly fine-grained cultural influence by Chinese interests. I don't think Tiktok is very world-changing right now (as we'd expect, it's targeting the craziest and lowest-IQ people first), but it's at least kinda world-changing, and maybe it's the first warning sign of what will soon be a much bigger threat? (I don't know much about the details of Tiktok the company, or the culture of its users, so it's hard for me to judge how much fine-grained control China might or might not be exerting.)
Unrelated -- I love the kind of sci-fi concept of "people panic but eventually go back to using social media and then they feel fine (SGD does this automatically in order to retain users)". But of course I think that the vast majority of users are in the "aren't panicking" / never-think-about-this-at-all category, and there are so few people in the "panic" category (panic specifically over subtle persuasion manipulation tech that isn't just trying to maximize engagement but instead achieve some specific ideological outcome, I mean) that there would be no impact on the social-media algorithms. I think it is plausible that other effects like "try not to look SO clickbaity that users recognize the addictiveness and leave" do probably show up in algorithms via SGD.
More random thoughts about infowar campaigns that the USA might historically have wanted to infowar about:
"Human brains are extremely predisposed to being hacked, governments would totally do this, and the AI safety community is unusually likely to be targeted."
-- yup, fully agree that the AI safety community faces a lot of peril navigating the whims of culture and trying to win battles in a bunch of diverse high-stakes environments (influencing superpower governments, huge corporations, etc) where they are up against a variety of elite actors with some very strong motivations. And that there is peril both in the difficulty of navigating the "conventional" human-persuasion-transformed social landscape of today's world (already super-complex and difficult) and the potentially AI-persuasion-transformed world of tomorrow. I would note though, that these battles will (mostly?) play out in pretty elite spaces, wheras I'd expect the power of AI information superweapons to have the most powerful impact on the mass public. So, I'd expect to have at least some warning in the form of seeing the world go crazy (in a way that seems different from and greater than today's anarchic internet-social-dynamics-driven craziness), before I myself went crazy. (Unless there is an AI-infowar-superweapon-specific hard-takeoff where we suddenly get very powerful persuasion tech but still don't get the full ASI singularity??)
re: Dath Ilan -- this really deserves a whole separate comment, but basically I am also a big fan of the concept of Dath Ilan, and I would love to hear your thoughts on how you would go about trying to "build Dath Ilan" IRL.
"In times of change, learners inherit the earth, while the learned find themselves beautifully equipped to deal with a world that no longer exists"
—Eric Hoffer
Summary:
We're already in the timeline where the research and manipulation of the human thought process is widespread; SOTA psychological research systems require massive amounts of human behavior data, which in turn requires massive numbers of unsuspecting test subjects (users) in order to automate the process of analyzing and exploiting human targets. This therefore must happen covertly, and both the US and China have a strong track record of doing things like this. This outcome is a strong attractor state since anyone with enough data can do it, and it naturally follows that powerful organizations would deny others access e.g. via data poisoning. Most people are already being persuaded that this is harmless, even though it is obviously ludicrously dangerous. Therefore, we are probably already in a hazardously transformative world and must take standard precautions immediately.
This should not distract people from AI safety. This is valuable because the AI safety community must survive. This problem connects to the AI safety community in the following way:
State survival and war power ==> already depends on information warfare capabilities.
Information warfare capabilities ==> already depends on SOTA psychological research systems.
SOTA psychological research systems ==> already improves and scales mainly from AI capabilities research, diminishing returns on everything else.[1]
AI capabilities research ==> already under siege from the AI safety community.
Therefore, the reason why this might be such a big concern is:
State survival and war power ==> their toes potentially already being stepped on by the AI safety community?
Although it's also important to note that people with access to SOTA psychological research systems are probably super good at intimidation and bluffing, it's also the case that the AI safety community needs to get a better handle on the situation if we are in the bad timeline; and the math indicates that we are already well past that point.
The Fundamental Problem
If there were intelligent aliens, made of bundles of tentacles or crystals or plants that think incredibly slowly, their minds would also have discoverable exploits/zero days, because any mind that evolved naturally would probably be like the human brain, a kludge of spaghetti code that is operating outside of its intended environment.
They would probably not even begin to scratch the surface of finding and labeling those exploits, until, like human civilization today, they began surrounding thousands or millions of their kind with sensors that could record behavior several hours a day and find webs of correlations.
In the case of humans, the use of social media as a controlled environment for automated AI-powered experimentation appears to be what created that critical mass of human behavior data. Current 2020s capabilities for psychological research and manipulation vastly exceed the 20th century academic psychology paradigm.
The 20th century academic psychology paradigm still dominates our cultural impression of what it means to research the human mind; but when the effectiveness of psychological research and manipulation starts increasing by an order of magnitude every 4 years, it becomes time to stop mentally living in a world that was stabilized by the fact that manipulation attempts generally failed.
The capabilities of social media to steer human outcomes are not advancing in isolation, they are parallel to a broad acceleration in the understanding and exploitation of the human mind, which itself is a byproduct of accelerating AI capabilities research.
By comparing people to other people and predicting traits and future behavior, multi-armed bandit algorithms can predict whether a specific research experiment or manipulation strategy is worth the risk of undertaking at all in the first place; resulting in large numbers of success cases a low detection rate (as detection would likely yield a highly measurable response, particularly with substantial sensor exposure).
When you have sample sizes of billions of hours of human behavior data and sensor data, millisecond differences in reactions from different kinds of people (e.g. facial microexpressions, millisecond differences at scrolling past posts covering different concepts, heart rate changes after covering different concepts, eyetracking differences after eyes passing over specific concepts, touchscreen data, etc) transform from being imperceptible noise to becoming the foundation of webs of correlations mapping the human mind.
Unfortunately, the movement of scrolling past a piece of information on a social media news feed with a mouse wheel or touch screen will generate at least one curve since the finger accelerates and decelerates each time. Trillions of those curves are outputted each day by billions of people. These curves are linear algebra, the perfect shape to plug into ML.
Social media’s individualized targeting uses deep learning and massive samples of human behavioral data to procedurally generate an experience that fits human mind like a glove, in ways we don't fully understand, but allow hackers incredible leeway to find ways to optimize for steering people’s thinking or behavior in measurable directions, insofar as those directions are measurable. This is something that AI can easily automate. I originally thought that going deeper into human thought reading/interpretability was impossible, but I was wrong; the human race is already well past that point as well, due to causality inference.
Most people see social media influence as something that happens to people lower on the food chain, but this is no longer true; there is at least one technique, Clown Attacks, that generally works on all humans who aren't already aware of it, regardless of intelligence or commitment to truthseeking; and that particular technique became discoverable with systems vastly weaker than the ones that exist today. I don't know what manipulation strategies the current systems can find, but I can predict with great confidence that they're well above the clown attack.
They would notice and pursue these capabilities
If the big 5 tech companies (Facebook, Google, Microsoft, Amazon, and Apple) notice ways to steer people towards buying specific products, or to feel a wide variety of compulsions to avoid quitting the platform, and to prevent/counteract other platforms from running multi-armed bandit algorithms to automatically exploit strategies (e.g. combinations of posts) to plunder their users, then you can naturally assume that they've noticed their capabilities to steer people in a wide variety of other directions as well.
The problem is that major governments and militaries are overwhelmingly incentivized and well-positioned to exploit those capabilities for offensive and defensive information warfare; if American companies abstain from manipulation capabilities, and Chinese companies naturally don't, then American intelligence agencies worry about a gap in information warfare capabilities and push the issue.
Government/military agencies are bottlenecked by competence, which is difficult to measure due to a lack of transparency at higher levels and high employee turnover at lower levels, but revolving-door employment easily allows them to source flexible talent from the talent pools of the big 5 tech companies; this practice is endangered by information warfare itself, further driving interest in information warfare superiority.
Access to tech company talent pools also determine the capability of intelligence agencies to use OS exploits and chip firmware exploits needed to access the sensors in the devices of almost any American, not just the majority of Americans that leave various sensor permissions on, which allows even greater access to the psychological research needed to compromise critical elites such as the AI safety community.
The capabilities of these systems to run deep bayesian analysis on targets dramatically improves with more sensor data, particularly video feed on the face and eyes; this is not necessary to run interpretability on the human brain, but it increases these capabilities by yet more orders of magnitude (particularly for lie detection and conversation topic aversion).
This can't go on
There isn’t much point in having a utility function in the first place if hackers can change it at any time. There might be parts that are resistant to change, but it’s easy to overestimate yourself on this; especially if the effectiveness of SOTA influence systems have been increasing by an order of magnitude every 4 years. Your brain has internal conflicts, and they have human internal causality interpretability systems. You get constant access to the surface, but they get occasional access to the deep.
The multi-armed bandit algorithm will keep trying until it finds something that works. The human brain is a kludge of spaghetti code, so there’s probably something somewhere.
The human brain has exploits, and the capability and cost of social media platforms to use massive amounts of human behavior data to find complex social engineering techniques is a profoundly technical matter, you can’t get a handle on this with intuition or pre-2010s historical precedent.
Thus, you should assume that your utility function and values are at risk of being hacked at an unknown time, and should therefore be assigned a discount rate to account for the risk over the course of several years.
Slow takeoff over the course of the next 10 years alone guarantees that this discount rate is too high, in reality, for people in the AI safety community to continue to go on believing that it is something like zero.
I think that approaching zero is a reasonable target, but not with the current state of affairs where people don’t even bother to cover up their webcams, have important and sensitive conversations about the fate of the earth in rooms with smartphones, sleep in the same room or even the same bed as a smartphone, and use social media for nearly an hour a day (scrolling past nearly a thousand posts).
The discount rate in this environment cannot be considered “reasonably” close to zero if the attack surface is this massive; and the world is changing this quickly. We are gradually becoming intractably compromised by slow takeoff before the party even starts.
If people have anything they value at all, and the AI safety community almost certainly does have that, then the current AI safety paradigm of zero effort is wildly inappropriate, it is basically total submission to invisible hackers. The human brain has low-hanging exploits and the structures of power have already been set in motion, so we must take the standard precautions immediately.
The AI safety community must survive the storm.
e.g. Facebook could hire more psychologists to label data and correlations, but has greater risk of one of them breaking their NDA and trying to leak Facebook's manipulation capabilities to the press like Snowden and Fishback. Stronger AI means more ways to make users label their own data.