Most of my programmer friends believe that Language Models trained on code will not affect their day job anytime soon. In this post, I make the case that 1) code generation is already useful (assuming minimal prompt-engineering skills), and 2) even if you do not believe 1), code generation will increase programmers' throughput well before it fully automates them.
Language Models trained on Code do not bring us closer to Full Code Automation
This misconception comes from thinking linearly instead of exponentially. Language models are already good enough at generating code to make the very engineers building such models slightly more productive, for instance when dealing with a new API. In other words, the returns (i.e. the improvements to the algorithm) from investing more resources in code generation directly help (through better developer tools) to create a better code-generating algorithm.

Code generation does not automate the part of my workday where I think hard
- It still accelerates "glue code" or "API work", which makes up a substantial fraction of large codebases (see the sketch after this list).
- Besides, only a privileged subset of engineers gets to think about the big picture every day.
- Plus, hard thinking is mostly required at the start, when designing the architecture.
- And thinking seldom happens in a silo; it instead requires many iterations through code.
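To make "API work" concrete, here is the kind of completion I have in mind; this is a made-up illustration rather than actual Copilot or Codex output, and the function name and endpoint are just a common example:

```python
import requests


def fetch_open_issues(repo, token):
    """Return the open issues of a GitHub repository as a list of dicts."""
    # Boilerplate a code model will happily fill in from the docstring alone.
    response = requests.get(
        f"https://api.github.com/repos/{repo}/issues",
        headers={"Authorization": f"token {token}"},
        params={"state": "open"},
    )
    response.raise_for_status()
    return response.json()
```

Nothing here requires hard thinking, but someone has to type it, and it is exactly the part the model already does well.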
I asked a model to generate code for my problem, but it doesn't seem to be able to solve it
More often than not, the issue is not with the model: try another prompt. (Example)
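For instance (illustrative prompts, not a guaranteed recipe), spelling out the input type, the ordering requirement, and the return value in the docstring often makes the difference:

```python
# A prompt that often fails: too vague for the model to guess the intent.
def dedupe(items):
    """Remove duplicates."""


# A reworded prompt that tends to work better: it states the input type,
# the ordering requirement, and the return value explicitly.
def dedupe_keep_order(items):
    """Return a new list with duplicate items removed,
    keeping the first occurrence of each item and preserving order."""
```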
The output is outdated code from average programmers
Code quality (length, variable naming, taste) is prompt- and hyperparameter-dependent. Generally, language models reuse the variable names from your prompt, and you can rename those yourself.
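For instance (an illustrative sketch, not actual model output), if the prompt uses hurried names, the completion will typically stick to them, and a quick manual rename restores readability:

```python
# Prompt with hurried names -- a completion tends to reuse them:
def f(xs, t):
    """Return the items of xs whose score exceeds t."""
    return [x for x in xs if x.score > t]


# The same logic after renaming by hand:
def filter_by_score(candidates, threshold):
    """Return the candidates whose score exceeds the threshold."""
    return [c for c in candidates if c.score > threshold]
```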
Only developers who repeat the same tasks will be automated so it will not affect me
You might still see productivity gains from learning how to use a more advanced version.
My job does not involve solving simple coding tests from docstrings
You should be able to separate your code into smaller functions and write docstrings for them, as in the sketch below.
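For example (the names and log format are illustrative), small, docstring-friendly units are exactly what a model can complete one at a time:

```python
def parse_log_line(line):
    """Parse a single 'timestamp level message' log line into a dict."""
    timestamp, level, message = line.split(" ", 2)
    return {"timestamp": timestamp, "level": level, "message": message}


def count_errors(lines):
    """Count how many log lines have level 'ERROR'."""
    return sum(1 for line in lines if parse_log_line(line)["level"] == "ERROR")
```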
Codex cannot solve my problem since it only has access to a limited training set
GitHub Copilot stores your data. Supposedly, the same applies to the Codex beta.
Current Language Models still make silly mistakes
If the mistake is silly, then fixing it is trivial.
Anyway, it is error-prone, so it cannot be used for critical software
It generates fewer errors than I do when writing code for the first time.
I would strongly suggest applying for GitHub Copilot or OpenAI Codex access to check for yourself, rather than relying on cherry-picked examples on the internet (good or bad). Indeed, if you search online, you might run into outdated reviews in which the highlighted errors turn out to work now. If you cannot wait for beta access, I recommend asking a friend for a demo (I'm happy to showcase it to anyone), trying genji python, or reading this up-to-date review.
More generally, programmers should seriously consider learning prompt engineering to avoid being left behind, and, I believe, any forecast about AI progress should account for this shorter feedback loop between deep learning models and programmer productivity.
Thinking about it more, there's another, more serious restriction, at least for now: Codex can't write code that depends on the rest of your codebase. Consider the following lightly-anonymized code from a real-world codebase I contribute to:
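In outline, it is a method along these lines (the method name and docstring below are placeholders; the attribute names are the ones discussed next):

```python
def latent_variables(self):
    """Return the random variables that are neither observed nor marginalized."""
    # The body depends on attributes defined elsewhere in the class.
    return [rv for rv in set(self.components)
            if rv not in self.observed and rv not in self.marginalized]
```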
I don't think Codex could write that code from just that function signature and that docstring, because a human couldn't do it: they wouldn't know how to find the names of the observed and marginalized random variables, and they wouldn't know that self.components exists, or that it has to be explicitly converted into a set. And if the human didn't know what random variables are, or what marginalization is, they'd have an even tougher time.
One can imagine a prompt that might elicit this implementation, something like "this is a Python function that returns the elements of set(self.components) that do not appear in self.observed or self.marginalized", but then we are not really writing docstrings anymore; we are writing programs in a novel language with a stochastic compiler.
This should be fixable with larger context windows, I think. If the prompt for a method could include the whole class definition aside from that method, Codex could at least in principle use other class methods and variables in sensible ways. But this will have to wait on the practical realization of O(n) or O(n log n) context window scaling.
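As a rough sketch of what that could look like (prompt_for_method is a hypothetical helper, not part of any Codex or Copilot API):

```python
import inspect


def prompt_for_method(cls, method_name, docstring):
    """Build a completion prompt: the full source of the class, followed by a
    stub of the method we want generated. (Hypothetical helper, not a real API.)"""
    class_source = inspect.getsource(cls)
    stub = f'    def {method_name}(self):\n        """{docstring}"""\n'
    return class_source + "\n" + stub
```

The catch is that a whole class definition can easily blow past today's context budgets, which is exactly the scaling issue above.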
Thanks for the natural-language stochastic compiler explanation; it makes a lot of sense. I broadly get a sense of what you mean by "context window", since people have been mentioning it quite a lot when talking about GPT-3. As for whether it makes sense to write docstrings for trivial things, I think that only applies to the Codex demo examples where people write docstrings and get results; for most of my use cases, where it gets really interesting is when it auto-completes 1) while I'm writing, and 2) when I'm done writing and it guesses the next line ...