https://copilot.github.com/

https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/

GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. OpenAI Codex has broad knowledge of how people use code and is significantly more capable than GPT-3 in code generation, in part, because it was trained on a data set that includes a much larger concentration of public source code.

Will Copilot or similar systems become ubiquitous in the next few years? Will they increase the speed of software development or AI research? Will they change the skills necessary for software development?

Is this the first big commercial application of the techniques that produced GPT-3?

For anyone who's used Copilot, what was your experience like?

New Answer
New Comment


7 Answers sorted by

This will probably make the already-bad computer security/infosec situation significantly worse. In general, people pasting snippets they don't understand is bad news; but at least in the case of StackOverflow, there's voting and comments, which can catch the most egregious stuff. On the other hand, consider the headline snippet on the Copilot's landing page:

// Determine whether the sentiment of text is positive
// Use a web service
async function isPositive(text: string): Promise<boolean> {
  const response = await fetch(`http://text-processing.com/api/sentiment/`, {
    method: "POST",
    body: `text=${text}`,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
    },
  });
  const json = await response.json();
  return json.label === "pos";
}

Everything below the function prototype was AI generated. And, this code has TWO security vulnerabilities! First, it's using an http URL, rather than https. And second, if the input string has newlines, it can put values into fields other than the ones intended. In this specific case, with the text-processing.com sentiment analysis API demo, there's not much to do with that (the only documented field available other than text is language), but if this is representative, then we are probably in for some bad times.

(Disclaimer: I work at OpenAI, and I worked on the models/research behind copilot.  You should probably model me as a biased party)

This will probably make the already-bad computer security/infosec situation significantly worse.

I'll take the other side to that bet (the null hypothesis), provided the "significantly" unpacks to something reasonable.  I'll possibly even pay to hire the contractors to run the experiment.

I think a lot of people make a lot of claims about new tech that will have a significant impact that end up falling flat.  A new browser will revolutionize this or that; a new website programming library will make apps significantly easier, etc etc.

I think a good case in point is TypeScript.  JavaScript is the most common language on the internet.  TypeScript adds strong typing (and all sorts of other strong guarantees) and has been around for a while.  However I would not say that TypeScript has significantly impacted the security/infosec situation.

I think my prediction is that Copilot does not significantly affect the computer security/infosec situation.

It's worth separating out that this line of research -- in particular training large la... (read more)

Are you saying that (i) few people will use copilot, or (ii) many people will use copilot but it will have little effect on their outputs or (iii) many people will use copilot and it will boost their productivity a lot but will have little effect on infosec? Your examples sound more like supporting i or ii than supporting iii, but maybe I'm misinterpreting.

I think all of those points are evidence that updates me in the direction of the null hypothesis, but I don't think any of them is true to the exclusion of the others.

I think a moderate amount of people will use copilot.  Cost, privacy, and internet connection will factor to limit this.

I think copilot will have a moderate affect on users outputs.  I think it's the best new programming tool I've used in the past year, but I'm not sure I'd trade it for, e.g. interactive debugging (reference example of a very useful programming tool)

I think copilot will have no significant differential effect on infosec, at least at first.  The same way I think the null hypothesis should be a language model produces average language, I think the null hypothesis is a code model produces average code (average here meaning it doesn't improve or worsen the infosec situation that jim is pointing to).

In general these lead me to putting a lot of weight on 'no significant impact' in aggregate, though I think it is difficult for anything to have a significant impact on the state of computer security.

(Some examples come to mind: Snowden leaks (almost definitely), Let'sEncrypt (maybe), HTTPSEverywhere (maybe), Domain Authentication (maybe)) 

Summary of the debate

1. jim originally said that copilot produces code with vulnerability, which, if used extensively, could generate loads of vulnerabilities, giving more opportunities for exploits overall. jim mentions it worsening "significantly" infosec

2. alex responds that given that the model tries to produce the code it was trained on, it will (by def.) produce average level code (with average level of vulnerability), so it won't change the situation "significantly" as the % of vulnerabilities per line of code produced (in the world) won't change much

3. vanessa asks if the absence of change from copilot results from a) lack of use b) lack of change in speed/vulnerability code production from using (ie. used as some fun help but without strong influence on the safety on the code as people would still be rigorous) c) change in speed/productivity, but not in the % of vulnerability

4. alex answers that indeed it makes users more productive and it helps him a lot, but that doesn't affect overall infosec in terms of % of vulnerability (same argument as 2). He nuances his claim a bit saying that a) it would moderatly affect outputs b) some stuff like cost will limit how much it affe... (read more)

3Taran
This part I think is not quite right.  The counterfactual jim gives for Copilot isn't manual programming, it's StackOverflow.  The argument is then: right now StackOverflow has better methods for promoting secure code than Copilot does, so Copilot will make the security situation worse insofar as it displaces SO.
5Taran
This is my prediction too, but there are two strands to the argument that I think are worth teasing apart: First, how many people will use Copilot?  The base rate for infosec impact of innovations is very low, because most innovations are taken up slowly or not at all.  Typescript is typical: most people who could use Typescript use Javascript instead (see for example the TIOBE rankings), so even if Typescript prevents all security problems it can't impact the overall security situation much.  Garbage collection is another classic example: it was in production systems in the late 60s, but didn't become mainstream until the 90s with the rise of Java and Perl.  There was a span of 20+ years where GC didn't much affect the use-after-free landscape, even though GC prevents 100% of use-after-free bugs. (counterpoint: StackOverflow was also an innovation, it was taken up right away, and Copilot is more like StackOverflow than it is like a traditional technical innovation.  I don't really buy this because Copilot seems it'll be much harder to get started with even once it's out of beta) Second, are users of Copilot more or less likely to write security bugs?  Here my prediction points the other way: Copilot does generate security bugs, and users are unusually unlikely to catch them because they'll tend to use it in domains they're unfamiliar with.  Somewhat more weakly I think it'll be worse than the counterfactual where they don't have Copilot and have to use something else, for the reasons jimrandomh lists. I'm curious whether you see the breakdown the same way, and if so, how you see the impact of Copilot conditional on its being widely adopted.

(I can't speak to any details of Copilot or Codex and don't know much about computer security; this is me speaking as an outside alignment researcher.)

A first pass of improving the situation would be to fine-tune the model for quality, with particular attention to security (both using higher-quality demonstrations and eventually RL fine-tuning). This is an interesting domain from an alignment perspective because (i) I think it's reasonable to aim at narrowly superhuman performance even with existing models, (ii) security is a domain where you care a lot about rare failures and could aim for error rates close to 0 for many kinds of severe failures.

My gut feeling is that the security vulnerabilities that Copilot introduces here are fairly commonly done by your average JS programmer without Copilot.

  1. Non-https is the URL that text-processing.com tells you to use in its examples. And as johnfuller notes, text-processing.com has a bad https certificate.
  2. I encounter the incorrect usage of body constantly.

I'm not exactly sure what this means for your prediction.

On the one hand, it's wrong code.  On the other hand, there's a lot of unknowns right now. 

  • Will people who would've otherwise written the r
... (read more)

I do think that the quality of code output by ML systems will increase much more rapidly than the quality of code output by human engineers. It's just not that hard to do things like generating millions of completions and fine-tuning the model to avoid all the common security problems, whereas the analogous education project with human engineers is quite challenging.

If you go to the site, which the URL points to, you'll see that the docs give you the address which was used in your example. I tried sending a request to https and it works, but the certificate is expired. My guess is that this site hasn't been getting much attention from the administrators. 

 I don't know how the service determines what code to write. If you were telling me what you wanted to do, you might also need to tell me what sort of validation you would need. Are you placing this text in a database? Are you directly returning the text to ... (read more)

One of the major, surprising consequences of this is that it's likely to become infeasible to develop software in secret. If AI tools like this confer a large advantage, and the tools are exclusively cloud hosted, then maintaining the confidentiality of a code base will mean forgoing these tools, and if the impact of these tools is large enough, then forgoing them may not be feasible.

On the plus side, this will put malware developers at something of a disadvantage. It might increase the feasibility of preventing people from running dangerous AI experiments (which sure seem like they're getting closer by the day). On the minus side, this creates an additional path to internet takeover, and increases centralization of power in general.

surely private installations of the facility will be sold to trade-secret-protecting teams

I wonder how much closed-source software is hosted on non-public GH repos? GH Enterprise exists, and seems widely-used.

One of the major, surprising consequences of this is that it's likely to become infeasible to develop software in secret. 


Or maybe developers will find good ways of hacking value out of cloud hosted AI systems while keeping their actual code base secret.  e.g. maybe you have a parallel code base that is developed in public, and a way of translating code-gen outputs there into something useful for your actual secret code base.

 

It entirely depends on how good it is.

I gave Kite a real go for a month and kept hoping it would get better.  It didn't and I stopped using it because it hindered more than it helped.

 

(Also, I bet the folks at Kite are not happy right now!)

My prediction is that the main impact is to make it easier for people to throw together quick MVPs and prototypes. It might also make it easier for people to jump into new languages or frameworks.

I predict it won't impact mainstream corporate programming much. The dirty secret of most tech companies is that programmers don't actually spend that much time programming. If I only spend 5 hours per week writing code, cutting that time down to 4 hours while potentially reducing code quality isn't a trade anyone will really want to make.

I expect its impact will mostly be in helping junior devs get around the "blank page" problem quickly. I don't see it having much impact beyond that, not with GPT-3 being the backend

If Copilot improves programming significantly-enough it might be a huge blow to makers of other IDE's and text editors unless they provide an API for others to use. I don't expect re-implementing Copilot-esque prediction is in the wheelhouse of places like Jetbrains or any of the open source editors. 

According to the FAQ they're focusing on VS Code which comes from the same parent company as Github (Microsoft).

I've been using GitHub's official Copilot plugin for PyCharm for a couple of weeks. Nice to see that they reached outside of their corporate parent to make this.

The plugin is a bit buggy, but very usable.  Copilot is a good assistant.

I expect that it will help with the tedious parts of programming that require you to know details about specific libraries. It will make those parts of programming that are "google -> stackoverflow -> find correct answer -> copy-paste into code" much fater, but I expect it to struggle on truly novel stuff, on scientific programming and anything research-level. If this thing can auto-complete a new finite-element PDE solver in a way that isn't useless, I will be both impressed and scared.

3 comments, sorted by Click to highlight new comments since:

I don't really understand why code generation is the focus. I would expect that code review is an area where automated process that aren't fully reliable would generate more value. 

Agreed, but code generation is a more natural fit for a GPT-style language model. GPT-3 and Codex use massive training sets; I would guess that the corpus of human code reviews is not nearly so big.

I would think that code generation has a much greater appeal to people / is more likely to go viral than code review tools. The latter surely is useful and I'm certain it will be added relatively soon to github/gitlab/bitbucket etc., but if OpenAI wanted to start out building more hype about their product in the world, then generating code makes more sense (similar to how art generating AIs are everywhere now, but very few people would care about art critique AIs).