This will probably make the already-bad computer security/infosec situation significantly worse. In general, people pasting snippets they don't understand is bad news; but at least in the case of StackOverflow, there are votes and comments, which can catch the most egregious stuff. Copilot has no such safety net. Consider the headline snippet on Copilot's landing page:
```typescript
// Determine whether the sentiment of text is positive
// Use a web service
async function isPositive(text: string): Promise<boolean> {
  const response = await fetch(`http://text-processing.com/api/sentiment/`, {
    method: "POST",
    body: `text=${text}`,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
    },
  });
  const json = await response.json();
  return json.label === "pos";
}
```
Everything below the function signature was AI-generated. And this code has TWO security vulnerabilities! First, it uses an http URL rather than https. Second, the input string is interpolated into the body without escaping, so characters like `&` let it set fields other than the one intended. In this specific case, with the text-processing.com sentiment analysis API demo, there's not much to do with that (the only documented field other than `text` is `language`), but if this is representative, then we are probably in for some bad times.
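For comparison, here's one way to write the same function without those two holes. This is my own sketch, not Copilot output (and note that, as a commenter points out below, the site's https certificate is apparently expired, so it may not work as-is):

```typescript
// A safer rewrite of the Copilot demo snippet (my own sketch, not
// Copilot output): use https, and build the body with URLSearchParams,
// which percent-encodes `&`, `=`, and newlines so user input cannot
// inject extra form fields.
async function isPositiveSafer(text: string): Promise<boolean> {
  const response = await fetch("https://text-processing.com/api/sentiment/", {
    method: "POST",
    body: new URLSearchParams({ text }).toString(), // "text=..." with escaping
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
    },
  });
  const json = await response.json();
  return json.label === "pos";
}
```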
(Disclaimer: I work at OpenAI, and I worked on the models/research behind Copilot. You should probably model me as a biased party.)
This will probably make the already-bad computer security/infosec situation significantly worse.
I'll take the other side to that bet (the null hypothesis), provided the "significantly" unpacks to something reasonable. I'll possibly even pay to hire the contractors to run the experiment.
I think a lot of people make claims about new tech having a significant impact that end up falling flat. A new browser will revolutionize this or that; a new web programming library will make apps significantly easier to build; etc. etc.
I think a good case in point is TypeScript. JavaScript is the most common language on the internet. TypeScript adds static typing (and all sorts of other guarantees) and has been around for a while. However, I would not say that TypeScript has significantly impacted the security/infosec situation.
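For instance, both of the holes in the snippet at the top of this thread live inside plain strings, which is all a type checker sees, so static typing has no purchase on them. A quick illustration (mine, not from the original example):

```typescript
// The type checker is equally happy with the vulnerable and safe versions:
// an http URL and an unescaped form body are both just `string`s.
const url: string = "http://text-processing.com/api/sentiment/"; // http, not https: type-checks fine
const userText = "great&language=fr"; // the `&` would inject a second form field
const body: string = `text=${userText}`; // also type-checks fine
```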
My prediction is that Copilot will not significantly affect the computer security/infosec situation.
It's worth separating out that this line of research -- in particular training large la...
Are you saying that (i) few people will use copilot, or (ii) many people will use copilot but it will have little effect on their outputs or (iii) many people will use copilot and it will boost their productivity a lot but will have little effect on infosec? Your examples sound more like supporting i or ii than supporting iii, but maybe I'm misinterpreting.
I think all of those points are evidence that updates me in the direction of the null hypothesis, but I don't think any of them is true to the exclusion of the others.
I think a moderate number of people will use copilot. Cost, privacy concerns, and the need for an internet connection will all limit this.
I think copilot will have a moderate effect on users' outputs. I think it's the best new programming tool I've used in the past year, but I'm not sure I'd trade it for, e.g., interactive debugging (my reference example of a very useful programming tool).
I think copilot will have no significant differential effect on infosec, at least at first. Just as the null hypothesis for a language model is that it produces average language, the null hypothesis for a code model is that it produces average code ("average" here meaning it neither improves nor worsens the infosec situation that jim is pointing to).
In general these lead me to putting a lot of weight on 'no significant impact' in aggregate, though I think it is difficult for anything to have a significant impact on the state of computer security.
(Some examples that come to mind: the Snowden leaks (almost definitely), Let's Encrypt (maybe), HTTPS Everywhere (maybe), domain authentication (maybe).)
Summary of the debate
1. jim originally said that copilot produces code with vulnerabilities which, if used extensively, could generate loads of vulnerabilities, giving more opportunities for exploits overall. jim says it will worsen infosec "significantly".
2. alex responds that since the model tries to reproduce the code it was trained on, it will (by definition) produce average-level code (with an average level of vulnerability), so it won't change the situation "significantly": the % of vulnerabilities per line of code produced (in the world) won't change much.
3. vanessa asks whether the absence of change from copilot results from a) lack of use, b) lack of change in the speed/vulnerability of code production from using it (i.e. used as some fun help, but without strong influence on the safety of the code, as people would still be rigorous), or c) a change in speed/productivity, but not in the % of vulnerabilities.
4. alex answers that it does indeed make users more productive and helps him a lot, but that this doesn't affect overall infosec in terms of % of vulnerabilities (same argument as 2). He nuances his claim a bit, saying that a) it would moderately affect outputs, b) some things like cost will limit how much it affe...
(I can't speak to any details of Copilot or Codex and don't know much about computer security; this is me speaking as an outside alignment researcher.)
A first pass of improving the situation would be to fine-tune the model for quality, with particular attention to security (both using higher-quality demonstrations and eventually RL fine-tuning). This is an interesting domain from an alignment perspective because (i) I think it's reasonable to aim at narrowly superhuman performance even with existing models, (ii) security is a domain where you care a lot about rare failures and could aim for error rates close to 0 for many kinds of severe failures.
My gut feeling is that the security vulnerabilities that Copilot introduces here are fairly commonly done by your average JS programmer without Copilot; I see unescaped `body` strings like this constantly. I'm not exactly sure what this means for your prediction.
On the one hand, it's wrong code. On the other hand, there's a lot of unknowns right now.
I do think that the quality of code output by ML systems will increase much more rapidly than the quality of code output by human engineers. It's just not that hard to do things like generating millions of completions and fine-tuning the model to avoid all the common security problems, whereas the analogous education project with human engineers is quite challenging.
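As a concrete sketch of the "generate and filter" half of that: sample many completions per prompt, keep only the ones a security linter passes, and fine-tune on the survivors. `sampleCompletion` and `lintForVulnerabilities` below are hypothetical stand-ins, not real Codex or Copilot APIs, and real RL fine-tuning would be considerably more involved:

```typescript
// Hypothetical stand-ins for a model-sampling API and a security linter;
// neither is a real Codex/Copilot API.
declare function sampleCompletion(prompt: string): Promise<string>;
declare function lintForVulnerabilities(code: string): Promise<string[]>;

// Build a fine-tuning dataset containing only completions that the
// security linter does not flag.
async function buildSecurityFilteredDataset(
  prompts: string[],
  samplesPerPrompt: number
): Promise<{ prompt: string; completion: string }[]> {
  const dataset: { prompt: string; completion: string }[] = [];
  for (const prompt of prompts) {
    for (let i = 0; i < samplesPerPrompt; i++) {
      const completion = await sampleCompletion(prompt);
      const findings = await lintForVulnerabilities(completion);
      if (findings.length === 0) {
        dataset.push({ prompt, completion });
      }
    }
  }
  return dataset; // then fine-tune the model on this filtered set
}
```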
If you go to the site the URL points to, you'll see that the docs give the exact address used in this example. I tried sending a request over https and it works, but the certificate is expired. My guess is that this site hasn't been getting much attention from its administrators.
I don't know how the service determines what code to write. If you were telling me what you wanted to do, you might also need to tell me what sort of validation you would need. Are you placing this text in a database? Are you directly returning the text to ...
One of the major, surprising consequences of this is that it's likely to become infeasible to develop software in secret. If AI tools like this confer a large advantage, and the tools are exclusively cloud hosted, then maintaining the confidentiality of a code base will mean forgoing these tools, and if the impact of these tools is large enough, then forgoing them may not be feasible.
On the plus side, this will put malware developers at something of a disadvantage. It might increase the feasibility of preventing people from running dangerous AI experiments (which sure seem like they're getting closer by the day). On the minus side, this creates an additional path to internet takeover, and increases centralization of power in general.
I wonder how much closed-source software is hosted on non-public GH repos? GH Enterprise exists, and seems widely used.
One of the major, surprising consequences of this is that it's likely to become infeasible to develop software in secret.
Or maybe developers will find good ways of hacking value out of cloud hosted AI systems while keeping their actual code base secret. e.g. maybe you have a parallel code base that is developed in public, and a way of translating code-gen outputs there into something useful for your actual secret code base.
It entirely depends on how good it is.
I gave Kite a real go for a month and kept hoping it would get better. It didn't and I stopped using it because it hindered more than it helped.
(Also, I bet the folks at Kite are not happy right now!)
My prediction is that the main impact is to make it easier for people to throw together quick MVPs and prototypes. It might also make it easier for people to jump into new languages or frameworks.
I predict it won't impact mainstream corporate programming much. The dirty secret of most tech companies is that programmers don't actually spend that much time programming. If I only spend 5 hours per week writing code, cutting that time down to 4 hours while potentially reducing code quality isn't a trade anyone will really want to make.
I expect its impact will mostly be in helping junior devs get around the "blank page" problem quickly. I don't see it having much impact beyond that, not with GPT-3 as the backend.
If Copilot improves programming significantly enough, it might be a huge blow to makers of other IDEs and text editors unless GitHub provides an API for others to use. I don't expect re-implementing Copilot-esque prediction is in the wheelhouse of places like JetBrains or any of the open-source editors.
According to the FAQ they're focusing on VS Code which comes from the same parent company as Github (Microsoft).
I've been using GitHub's official Copilot plugin for PyCharm for a couple of weeks. Nice to see that they reached outside of their corporate parent to make this.
The plugin is a bit buggy, but very usable. Copilot is a good assistant.
I expect that it will help with the tedious parts of programming that require you to know details about specific libraries. It will make the parts of programming that are "google -> stackoverflow -> find correct answer -> copy-paste into code" much faster, but I expect it to struggle on truly novel stuff, on scientific programming, and on anything research-level. If this thing can auto-complete a new finite-element PDE solver in a way that isn't useless, I will be both impressed and scared.
I don't really understand why code generation is the focus. I would expect that code review is an area where automated processes that aren't fully reliable would generate more value.
Agreed, but code generation is a more natural fit for a GPT-style language model. GPT-3 and Codex use massive training sets; I would guess that the corpus of human code reviews is not nearly so big.
I would think that code generation has a much greater appeal to people / is more likely to go viral than code review tools. The latter surely is useful and I'm certain it will be added relatively soon to github/gitlab/bitbucket etc., but if OpenAI wanted to start out building more hype about their product in the world, then generating code makes more sense (similar to how art generating AIs are everywhere now, but very few people would care about art critique AIs).
https://copilot.github.com/
https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/
Will Copilot or similar systems become ubiquitous in the next few years? Will they increase the speed of software development or AI research? Will they change the skills necessary for software development?
Is this the first big commercial application of the techniques that produced GPT-3?
For anyone who's used Copilot, what was your experience like?