Palisade is hiring Research Engineers

Charlie Rogers-Smith; Jeffrey Ladish

Palisade is looking to hire Research Engineers. We are a small team consisting of Jeffrey Ladish (Executive Director), Charlie Rogers-Smith (Chief of Staff), and Kyle Scott (part-time Treasurer & Operations). In joining Palisade, you would be a founding member of the team, and would have substantial influence over our strategic direction. Applications are rolling, and you can fill out our short (~10-20 minutes) application form here.

Palisade’s mission

We research dangerous AI capabilities to better understand misuse risks from current systems, and how advances in hacking, deception, and persuasion will affect the risk of catastrophic AI outcomes. We create concrete demonstrations of dangerous capabilities to advise policy makers and the public on AI risks.

We are working closely with government agencies, policy think tanks, and media organizations to inform relevant decision makers. For example, our BadLlama work demonstrated that it is possible to effectively undo Llama 2-Chat 70B’s safety fine-tuning for less than $200. That work was used to confront Mark Zuckerberg in the first of Chuck Schumer’s Insight Forums, was cited by Senator Hassan in a Senate hearing on threats to national security, and was used to advise the UK AI Safety Institute.

While our BadLlama work focused on validating a well-known argument—that you can fine-tune away safety—we plan to research emerging dangerous capabilities in both open source and API-gated models, in the following areas:

Automated hacking. Current AI systems can already automate parts of the cyber kill chain. We’ve demonstrated that GPT-4 can leverage known vulnerabilities to achieve remote code execution on unpatched Windows 7 machines. We plan to explore how AI systems could conduct reconnaissance, compromise target systems, and use information from compromised systems to pivot laterally through corporate networks or carry out social engineering attacks.
Spear phishing and deception. Preliminary research suggests that LLMs can be effectively used to phish targets. We’re currently exploring how well AI systems can scrape personal information and leverage it to craft scalable spear-phishing campaigns. We also plan to study how well conversational AI systems could build rapport with targets to convince them to reveal information or take actions contrary to their interests.
Scalable disinformation. Researchers have begun to explore how LLMs can be used to create targeted disinformation campaigns at scale. We’ve demonstrated to policymakers how a combination of text, voice, and image generation models can be used to create a fake reputation-smearing campaign against a target journalist. We plan to study the cost, scalability, and effectiveness of AI-disinformation systems.

We are looking for

People who excel at:

Working with language models. We’re looking for somebody who is or could quickly become very skilled at working with frontier language models. This includes supervised fine-tuning, using reward models/functions (RLHF/RLAIF), building scaffolding (e.g. in the style of AutoGPT), and prompt engineering / jailbreaking.
Software engineering. Alongside working with LMs, much of the work you do will benefit from a strong foundation in software engineering, for example when designing APIs, working with training data, or doing front-end development. Strong SWE experience will also help you get up to speed with LMs, hacking, or new areas we want to pivot to.
Technical communication. This means writing papers, blog posts, and internal documents, and speaking with the team and external collaborators about your research.

While it’s advantageous to excel at all three of these skills, we will strongly consider people who are either great at working with language models or at software engineering, while being able to communicate their work well.

Competencies that are nice to have:

Hacking. One of our focus areas is developing offensive hacking capabilities using frontier models. Hacking experience is a big bonus, and we are willing to create a hacking-specific role for a suitable candidate.
Public communication. Enthusiasm for making your work understandable and compelling to think tanks, policy makers, journalists, and the wider public.

Salary

We’re offering between 150,000 and 250,000 USD, depending on your skill and experience, plus healthcare.

Location and visas

This role is based at our Berkeley office, and we’d expect you to work from the office at least 25% of the time. We can sponsor US visas, but there may be a delay of a couple of months while we’re setting up infrastructure.

Our application process

Applications for the two research engineer positions are rolling. Here’s our process:

Application form (~10-20 minutes) (link)
Technical pre-screen (via CodeSignal’s Industry Coding Framework)
Work test (paid, ~2-8h)
Interview (culture)
Work trial (ideally 1-3 weeks). We realize that this is a big ask, and we’re open to adapting it to you and your needs.

Please reach out to charlie@palisaderesearch.org if you have any questions. We look forward to hearing from you :)
