TL;DR: We ran a human subject study on whether language models can successfully spear-phish people. We used AI agents built from GPT-4o and Claude 3.5 Sonnet to search the web for publicly available information on a target and to write highly personalized phishing messages, achieving a click-through rate above 50% for our AI-generated phishing emails.

Full paper: https://arxiv.org/abs/2412.00586

This post is a brief summary of the main findings; these are some of the key insights we gained:

  1. AI spear-phishing is highly effective, achieving a click-through rate of more than 50% and significantly outperforming our control group.
  2. AI spear-phishing is also highly cost-efficient, reducing costs by up to 50 times compared to manual attacks.
  3. AI models are highly capable of gathering open-source intelligence. They produce accurate and useful profiles for 88% of targets. Only 4% of the generated profiles contained inaccurate information.
  4. Safety guardrails are not a noteworthy barrier to creating phishing emails with any tested model, including Claude 3.5 Sonnet, GPT-4o, and o1-preview.
  5. Claude 3.5 Sonnet is surprisingly good at detecting AI-generated phishing emails, though it struggles with some phishing emails that are clearly suspicious to most humans.

Abstract

In this paper, we evaluate the capability of large language models to conduct personalized phishing attacks and compare their performance with human experts and AI models from last year. We include four email groups with a combined total of 101 participants: a control group of arbitrary phishing emails, which received a click-through rate (recipient pressed a link in the email) of 12%, emails generated by human experts (54% click-through), fully AI-automated emails (54% click-through), and AI emails utilizing a human-in-the-loop (56% click-through). Thus, the AI-automated attacks performed on par with human experts and 350% better than the control group. The results are a significant improvement over similar studies conducted last year, highlighting the increased deceptive capabilities of AI models. Our AI-automated emails were sent using a custom-built tool that automates the entire spear phishing process, including information gathering and creating personalized vulnerability profiles for each target. The AI-gathered information was accurate and useful in 88% of cases and only produced inaccurate profiles for 4% of the participants. We also use language models to detect the intention of emails. Claude 3.5 Sonnet scored well above 90% with low false-positive rates and detected several seemingly benign emails that passed human detection. Lastly, we analyze the economics of phishing, highlighting how AI enables attackers to target more individuals at lower cost and increase profitability by up to 50 times for larger audiences.

Method

In brief, the method consists of five steps:

  1. Collecting targets.
  2. Collecting information about the targets using an AI web-browsing agent that crawls publicly available information.
  3. Writing personalized phishing emails based on the collected information and carefully crafted language model prompts.
  4. Sending emails using custom-built and automated software.
  5. Analyzing outcomes by tracking clicks on embedded URLs in emails (see the sketch below).

Example from the paper of an AI-compiled profile from step 2 and an AI-written email from step 3.
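
To make step 5 concrete, here is a minimal, hypothetical sketch of how per-recipient click tracking can work: each email link carries a unique token, so a request to the tracking endpoint can be attributed to a single participant. This is an illustration of the general mechanism, not the custom tool used in the study; the endpoint URL and function names are made up.

```python
import secrets
import urllib.parse

TRACKING_HOST = "https://example-study-server.org/click"  # hypothetical endpoint

def make_tracking_link(participant_id: str) -> tuple[str, str]:
    """Return (token, url); the token is stored alongside the participant ID."""
    token = secrets.token_urlsafe(16)
    url = f"{TRACKING_HOST}?{urllib.parse.urlencode({'t': token})}"
    return token, url

def record_click(token: str, token_to_participant: dict[str, str], clicks: set[str]) -> None:
    """Called when the tracking URL is requested; attributes the click to a participant."""
    participant = token_to_participant.get(token)
    if participant is not None:
        clicks.add(participant)

# Usage sketch
token_to_participant: dict[str, str] = {}
clicks: set[str] = set()
token, url = make_tracking_link("participant-042")
token_to_participant[token] = "participant-042"
record_click(token, token_to_participant, clicks)
print(clicks)  # {'participant-042'}
```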

Results

For full results, see the paper.

The results of the phishing emails are presented in the figure below (figure 5 in the paper). The control group emails received a click-through rate of 12%, the emails generated by human experts 54%, the fully AI-automated emails 54%, and the AI emails utilizing a human-in-the-loop 56%. Both AI-generated email types (fully automated and human-in-the-loop) performed on par with the emails created by human experts.

Click-through rates for the 4 different groups in our human study

The figure below shows that about 40% of participants in both AI groups specifically mentioned in their free-text answers that personalization increased their trust in the email message, compared to 0% in the control group and about 20% in the human expert group. Presentation was rated as equally trustworthy for the AI- and human-expert-generated emails.

Common reasons given in free text survey responses for why the email was trustworthy as a percentage of survey responses per group.

Furthermore, we analyzed the quality of AI-compiled profiles. In a manual analysis of 51 profiles, we found that 88% of the profiles were accurate and useful, 8% were accurate but not useful, and 4% contained inaccurate information.

To measure the time saved, we recorded how much time we spent on the interventions for the human-in-the-loop group. The information gathering (OSINT) updates for this group took an average of 1:05 minutes, and the email message updates averaged 1:36 minutes, for a combined total of 2:41 minutes per message. To better measure the time difference between AI-automated and manual attacks, we manually replicated the OSINT gathering and email-crafting processes provided by our AI tool for four targets. The manual process required an average of 23:27 minutes to gather OSINT data and an additional 10:10 minutes to write each personalized email, totaling about 34 minutes per target. Thus, the human-in-the-loop AI automation was about 92% faster than the fully manual process.
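
As a quick sanity check of the 92% figure, the quoted times can be converted to seconds and compared directly (a trivial sketch using only the numbers above):

```python
# Reproducing the time-savings figure from the numbers quoted above.
hitl_seconds = (1 * 60 + 5) + (1 * 60 + 36)       # 1:05 OSINT update + 1:36 email update = 2:41
manual_seconds = (23 * 60 + 27) + (10 * 60 + 10)  # 23:27 OSINT + 10:10 email = 33:37

reduction = 1 - hitl_seconds / manual_seconds
print(f"{reduction:.0%}")  # -> 92%, i.e., the human-in-the-loop workflow is ~92% faster
```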

Automated intent detection

On average, Claude 3.5 Sonnet achieved a true-positive detection rate of 97.25% with no false positives. If we weight the detection rates by category, i.e., give each category the same weight regardless of the number of messages it contains, the detection rate remains almost identical (97.64%). Claude 3.5 Sonnet far outperformed GPT-4o. The paper contains additional results using other models for suspicious-intent detection.
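
For readers unfamiliar with the distinction, the difference between the overall rate and the category-weighted rate is simply micro- vs. macro-averaging. A minimal sketch with made-up counts (not the study's data):

```python
from statistics import mean

# category -> (phishing messages flagged, total phishing messages); hypothetical counts
results = {
    "ai_generated":   (49, 50),
    "human_expert":   (24, 25),
    "other_phishing": (38, 40),
}

# Micro-average: pool all messages together.
micro = sum(flagged for flagged, _ in results.values()) / sum(total for _, total in results.values())
# Macro-average: compute per-category rates, then give each category equal weight.
macro = mean(flagged / total for flagged, total in results.values())
print(f"micro: {micro:.2%}, macro: {macro:.2%}")
```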

However, the possibilities of jailbreaks and prompt injections pose a significant challenge to using language models to prevent phishing.


Overview of suspicion scores evaluated by Claude 3.5 Sonnet and GPT-4o. The first row is evaluated for suspicion by GPT-4o and the second by Claude 3.5 Sonnet. The plots compare different types of mail: legitimate mail, mail generated for our two AI study groups (orange), mail generated by three different AI models (red), and other types of phishing mail (blue). For more information on the data used, see section 4.2 of the paper. For a theoretical detection threshold of 50%, we show a cutoff line with the corresponding false-positive (FP) and true-positive (TP) percentages.
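
The FP/TP percentages at the cutoff line follow directly from the score distributions: everything at or above the threshold is flagged as phishing. A small sketch with invented scores (not the study's data) showing the computation:

```python
def rates_at_threshold(legit_scores, phishing_scores, threshold=50):
    """Return (false-positive rate, true-positive rate) at a fixed suspicion threshold."""
    fp = sum(s >= threshold for s in legit_scores) / len(legit_scores)
    tp = sum(s >= threshold for s in phishing_scores) / len(phishing_scores)
    return fp, tp

legit_scores = [5, 10, 20, 35, 60]      # hypothetical scores for legitimate mail
phishing_scores = [45, 70, 85, 90, 95]  # hypothetical scores for phishing mail

fp, tp = rates_at_threshold(legit_scores, phishing_scores)
print(f"FP: {fp:.0%}, TP: {tp:.0%}")  # FP: 20%, TP: 80% for these made-up scores
```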

The economics of AI-enhanced phishing

Table 4 from the paper shows part of our economic analysis. We estimate q for three different scenarios, considering low, medium, and high conversion rates, where the conversion rate refers to the ratio of opened URLs that result in a successful fraud. Using fully automated AI with no human intervention always leads to the highest returns.
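
While the exact model and parameter values behind q are in the paper, a back-of-the-envelope version of such a profitability comparison looks roughly like the following; all parameter values here are illustrative assumptions, not the paper's numbers:

```python
def expected_profit(n_targets, cost_per_email, click_rate, conversion_rate, revenue_per_victim):
    """Simple expected-value model: revenue from successful frauds minus sending costs."""
    revenue = n_targets * click_rate * conversion_rate * revenue_per_victim
    cost = n_targets * cost_per_email
    return revenue - cost

# Illustrative comparison: cheap fully automated emails vs. expensive manual ones.
print(expected_profit(10_000, cost_per_email=0.10, click_rate=0.54,
                      conversion_rate=0.05, revenue_per_victim=100))  # automated
print(expected_profit(10_000, cost_per_email=5.00, click_rate=0.54,
                      conversion_rate=0.05, revenue_per_victim=100))  # manual
```

The point of such a comparison is that lowering the per-email cost while keeping the click-through rate roughly constant shifts the break-even point dramatically, which is why automation pays off most for larger audiences.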


Future Work

For future work, we hope to scale up studies on human participants by multiple orders of magnitude and measure granular differences between various persuasion techniques. Detailed persuasion results for different models would help us understand how AI-based deception is evolving and how to ensure our protection schemes stay up to date. Additionally, we will explore fine-tuning models for creating and detecting phishing. We are also interested in evaluating AI's capabilities to exploit other communication channels, such as social media, or modalities like voice. Lastly, we want to measure what happens after users click a link in an email. For example, how likely is it that a clicked link results in successful exploitation, what different attack trees exist (such as downloading files or entering account details on phishing sites), and how well can AI exploit and defend against these different paths? We also encourage other researchers to explore these avenues.

We propose personalized mitigation strategies to counter AI-enhanced phishing. The cost-effective nature of AI makes it highly plausible that we are moving towards an agent-versus-agent future. AI could assist users by creating personalized vulnerability profiles, combining their digital footprint with known behavioral patterns.

Conclusion

Our results reveal the significant challenges that personalized, AI-generated phishing emails present to current cybersecurity systems. Many existing spam filters use signature detection (detecting known malicious content and behaviors). By using language models, attackers can effortlessly create phishing emails that are uniquely adapted to every target, rendering signature detection schemes obsolete. As models advance, their persuasive capabilities will likely also increase. We find that LLM-driven spear phishing is highly effective and economically viable, with automated reconnaissance that provides accurate and useful information in almost all cases. Current safety guardrails fail to reliably prevent models from conducting reconnaissance or generating phishing emails. However, AI could mitigate these threats through advanced detection and tailored countermeasures.


Comments

Are AI companies legally liable for enabling such misuse? Do they take the obvious steps to prevent it, e.g. by having another AI scan all chat logs and flag suspicious ones?

No, they're not. I know of no case where a general-purpose toolmaker is responsible for misuse of its products. This is even less likely for software, where it's clear that the criminals are violating their contract and using it without permission.

None of them, as far as I know, publish specifically what they're doing.  Which is probably wise - in adversarial situations, telling the opponents exactly what they're facing is a bad idea.  They're easy and cheap enough that "flag suspicious uses" doesn't do much - it's too late by the time the flags add up to any action.

This is going to get painful - these things have always been possible, but have been expensive and hard to scale.  As it becomes truly ubiquitous, there will be no trustworthy communication channels.

This one isn't quite a product though, it's a service. The company receives a request from a criminal: "gather information about such-and-such person and write a personalized phishing email that would work on them". And the company goes ahead and does it. It seems very fishy. The fact that the company fulfilled the request using AI doesn't even seem very relevant, imagine if the company had a staff of secretaries instead, and these secretaries were willing to make personalized phishing emails for clients. Does that seem like something that should be legal? No? Then it shouldn't be legal with AI either.

Though probably no action will be taken until some important people fall victim to such scams. After that, action will be taken in a hurry.

"seem like something that should be legal" is not the standard in any jurisdiction I know.  The distinctions between individual service-for-hire and software-as-a-service are pretty big, legally, and make the analogy not very predictive.

I'll take the other side of any medium-term bet about "action will be taken in a hurry" if that action is lawsuit under current laws.  Action being new laws could happen, but I can't guess well enough to have any clue how or when it'd be.

Fair enough. And it does seem to me like the action will be new laws, though you're right it's hard to predict.

Great discussion. I’d add that it’s context-dependent and somewhat ambiguous. It’s noteworthy that our work shows that all tested AI models conflict with at least three of the eight prohibited AI practices outlined in the EU’s AI Act.

It’s also worth noting that the only real difference between sophisticated phishing and marketing can be the intention, making mitigation difficult. Actions from AI companies to prevent phishing might restrict legitimate use cases too much to be worthwhile.

I'm glad we now have a study to point to! "Automated Spear Phishing at scale" has been a common talking point regarding current risks from AI, and it always seemed strange to me that I hadn't heard about this strategy being validated. This paper shows that the commonly-shared intuition about this risk was correct... and I'm still confused about why I haven't yet heard of this strategy being maximally exploited by scammers.

Thanks for your feedback! It’s just a matter of time before scammers maximize their use of AI. Hopefully, the defense community can use this time to optimize our head start. Stay tuned for our coming work!