This is the first post I've seen on here about ChatGPT plugins. I'm trying to wrap my head around the implications and would like to see more discussion about it.
Maybe it's a way for OpenAI to build a moat?
New post from Zvi on ChatGPT Plugins: https://www.lesswrong.com/posts/DcfPmgBk63cajiiqs/gpt-4-plugs-in
The repo https://github.com/greshake/llm-security proposes that there are big security issues with LLMs having a plugins API:
We demonstrate potentially brutal consequences of giving LLMs like ChatGPT interfaces to other applications. We propose newly enabled attack vectors and techniques and provide demonstrations of each in this repository:
- Remote control of chat LLMs
- Leaking/exfiltrating user data
- Persistent compromise across sessions
- Spread injections to other LLMs
- Compromising LLMs with tiny multi-stage payloads
- Automated Social Engineering
- Targeting code completion engines
Based on our findings:
- Prompt injections can be as powerful as arbitrary code execution
- Indirect prompt injections are a new, much more powerful way of delivering injections.
Jesus fucking Christ. The ones they already gave it access to include Wolfram Alpha (so it can now do math), executing code (?!?), accessing internet search data, accessing your email and Todo list and drive, interacting with financial services (Klarna), ordering products, interpreting and generating images... This is absolutely crazy. It doesn't even have to get more sentient, malicious or agentic for this to be insanely risky. It still hallucinates 15 % of the time. I have seen no stats for moral misjudgements or destabilising, but would guess similar ballpark. What it could fuck up due to sheer incompetence alone. They are making it too integrated to shut down if risks become impossible to deny (on purpose?). The security vulnerabilities for private data and finances are bananas. The existential risk is... I don't even. I'm in shock. Like, boxing an AI was already a risky proposal due to it escaping or being let out. But just dumping it out of the box from the start? What the flying fuck? I figured we would fuck up, and that setting things up so a malicious and clever AI could not beat us would likely fail, but just handing everything over on a platter? I'm one of the most optimistic and hopeful people in this community, and with one of the friendliest and most appreciative outlooks on ChatGPT and OpenAI. But this is insane. And keep in mind... It only needs permission, that is it. People have given it shoddy plugin code, and chatGPT fucking debugged it with them. Like, wasn't given a working version, but updated the version it was given to make it with. It codes reasonably well, and can read the internet, and conceal plugin use from the user. This is grotesquely overpowered.
This is going to be a short post. OpenAI recently announced ChatGPT Plugins which will allow any developer to connect ChatGPT to external APIs and potentially real-world actions. Even if GPT-4 is not yet capable of destroying civilization, we are entering a world where LLMs are heavily integrated into everything we build, and take actions on their own. Any outage will be economically unacceptable, and there is no way back. It could take only the next few iterations of the language model, and the right feedback loop through some combination of plugins, to yield a catastrophe-capable agent.
I hope us rationalists still have an emotional part, so let me appeal to it. Do not give up. This post isn't meant to discourage. I don't know what we are going to do in this new world, but I'm ready for the fight.