TL;DR It doesn't matter that ChatGPT can generate boilerplate that almost works, and it doesn't matter that some hypothetical model in the (near) future could feasibly write code that reliably works and is a little more interesting than boilerplate. The concern that deep learning AIs could replace programmers is based on a misunderstanding of what a programmer's job actually is.


All this talk I’m seeing about AI being close to replacing programmers indicates there’s a significant gap between what people think programming is like and what programming is actually like. I get the sense that most people who don’t work in tech think that programming is like sitting down in front of a computer, saying to yourself, “alrighty, let’s make an app,” and expertly busting out code until you have a fresh app. It’s more like getting onboarded into an organization that has hundreds of thousands of lines of archaic, institutional code, and being tasked with finding and fixing the 1-10 lines that are somehow causing the most urgent bug, and then doing this over and over.

There’s a reason we tend to call it “software development” or “software engineering” instead of “programming.” The actual “programming” (the code composition, the fingers-on-the-keys) is a very small part of the job. Most of the job is code maintenance (doing things like fixing bugs) and technology integration (doing things like connecting a UI framework to an API that provides data). Yes, there is composition of novel functionality involved; it’s just a comparatively small and easy part of the work. Most software work—and the hardest software work—is maintaining big existing institutional software. And even when you are creating new things, the state of the software universe is such that almost all of the work has already been done for you, and all you have to do is the technology integration: people have already created the libraries and frameworks and resources that implement basically any fundamental thing you could ever need to do, and there’s no reason to re-implement it, so the code that you write mostly just connects those libraries and frameworks and resources together.

I spent several hours last weekend doing exactly this. I had an idea for an app, I knew what API provided the data I needed, and I knew what frontend framework I wanted to use (real-world software expertise has a lot more to do with knowledge of different frameworks and APIs and their use cases than it does with writing slick code), so I set up the project and started working on it. The work was about 80% reading the API’s documentation, 18% configuring my API keys and downloading the example project and things like that, and 2% writing code to hook everything up.
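
To give a concrete (and purely hypothetical) picture of what that last 2% of hook-up code amounts to, here is a sketch in Python using the requests library; the endpoint URL, the API key, and the render_list call are placeholders standing in for whatever API and frontend framework a real project would use, not the actual pieces of the project described above.

import requests

API_KEY = "..."  # configured earlier, following the provider's docs

def fetch_items():
    # One call to the data API; the hard part was reading the docs to learn
    # which endpoint and parameters to use, not writing these lines.
    response = requests.get(
        "https://api.example.com/v1/items",  # placeholder endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["items"]

# Hand the data to whatever the frontend framework expects (placeholder call):
# render_list(fetch_items())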

From over here in the Real World of programming, the question of whether AI (I’m talking about all AI short of superhuman artificial general intelligence [in which case, all bets are off], i.e., any feasible deep-learning model) will replace or even just meaningfully displace programmers does not cause concern. Language model AIs like ChatGPT might be close to pretty good automatic code composition, but it just doesn’t matter.


In the same way that ChatGPT can write banal prose, it can write banal code—what we call “boilerplate” (brainless code that is just sort of necessary to get things to work; boilerplate is often abstracted away into a package so that instead of having to write it you only have to call a function)—that almost works.
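
To make “boilerplate” concrete, the few lines below are roughly what it takes to stand up a bare Flask web server (Flask comes up again further down); this is a generic sketch, not code from any particular project.

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # The one line anyone actually cares about; everything else is ceremony.
    return "Hello, world"

if __name__ == "__main__":
    app.run(debug=True)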

You know what else can produce boilerplate that almost works? Stack Overflow, the programming Q&A site beloved by programmers all over the world. ChatGPT is like a glorified Stack Overflow, and it’s really not even that good, because it’s not so good at identifying issues in code, which Stack Overflow can do for you in minutes. But anyway, maybe we really are close to a glorified Stack Overflow. Maybe in just a few more years an AI will be able to respond to complex problem definitions, identify issues in code, and write code that works. But it still wouldn’t matter.

There’s a commandment you often hear when you’re getting started in programming: “Thou shalt not copy-paste code from Stack Overflow.” It’s a good rule, because copy-pasting code gets you in trouble—if you don’t understand what’s in there, then when something breaks, you won’t be able to fix it. And something will break.

In real-life programming, there’s this enormous emphasis on readability and comprehension. It’s super important that we all write code in such a way that we’re able to understand what it’s doing. Because eventually, someone is going to be tasked with fixing a bug caused by your code (it may well be you), and if they can’t figure out what’s going on with that code, they won’t be able to fix it. Again, real-life programming is much more about code maintenance—there is a lot more work to do on existing software than there are fresh new apps to make—and code maintenance is mostly about reading code. It is said that programming is 80% reading code and 20% writing code, but it would be more accurate to say that programming is 80% reading code, 15% editing code, and 5% writing code (and remember that the whole 100% of programming is still only a small part of the software development job).

If you started using ChatGPT or any other AI to write your code, you would be committing the same sin as someone copy-pasting code from Stack Overflow. Because when it comes time to do maintenance, someone will ask: “What’s this code doing?” or “Why did you do it this way?” or “How can we make it do this instead of that?” or “How can we add on another module right here?” and you will only be able to answer: “I don’t know, the AI wrote it.”

But will a future language model be something more than a glorified Stack Overflow? It’s only a matter of time before AIs are able to compose not just boilerplate but really novel code, you say. I say, code-writing AIs might become more like interns at best. Deep learning AIs will never really be able to move beyond boilerplate, because deep learning models are only capable of picking up on recurring patterns and relationships (even if they are very complex patterns and relationships), and the patterns in the code are the boilerplate. The rest of the code is specific to each project. Maybe, someday, there will be an AI sophisticated enough to write things like unit tests—little functions that you write to automatically verify that your real functions produce the expected results. They’re not quite boilerplate, because you do have to understand what the function you’re testing is supposed to do, but they are pretty brainless. I can imagine it’s possible that a deep learning model would be able to analyze a function definition and produce a unit test. But:

The big salaries fund the software maintenance and the tech integration, the work that requires intimacy with the org’s codebase, knowledge of frameworks and resources, and raw Experience with software. It’s the interns who typically work on the code composition tasks (the comparatively easy stuff), like writing unit tests, which is one of the few places where orgs still routinely need new code written. So yes, maybe some future language models will displace software engineering interns, or will write unit tests for us. Hurrah if so. Writing unit tests is incredibly boring.
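
For readers who have never written one, here is a minimal sketch of what a unit test looks like, using Python’s built-in unittest module; both the slugify function and its tests are made-up examples, not code from any real codebase.

import unittest

def slugify(title: str) -> str:
    # Lowercase the title and replace its spaces with hyphens.
    return title.strip().lower().replace(" ", "-")

class TestSlugify(unittest.TestCase):
    def test_basic_title(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(slugify("  Hello World  "), "hello-world")

if __name__ == "__main__":
    unittest.main()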

Or here’s another angle on it: in real-life programming work, when that work does happen to be code composition, the boilerplate is not the bottleneck. If you’re an experienced programmer, these things are not hard, and they are not time consuming. Why would you even bother going to the OpenAI website and typing in a prompt to get the boilerplate (maybe only after some finagling of the prompt to get the output right) to copy into your code editor? Just write it out yourself real quick. This stuff is the easy stuff. The bottleneck is the documentation-reading, the data-interpretation, the code-comprehension. The bottleneck is always the problem at hand, it’s never the boilerplate, it’s never the patterns. The expertise of the programmer isn’t in knowing how to set up a Flask server, or anything else you could get ChatGPT to write, or even in being able to write basic functionality that ChatGPT can’t write—it’s in knowing how to deal with the specificities and intricacies of the problem you’re working on.


Nonetheless, ChatGPT’s code-writing capabilities are impressive. It’s a little shocking that a computer can write almost usable code (isn’t this supposed to be the thing that we’re worried about? A computer program writing its own code until it achieves super-human intelligence?), and future language models can only get better. We can at least imagine some future language model that accepts prompts in English and does amazing things—maybe it could analyze the codebase and identify bugs and write fixes that keep existing conventions, maybe it could somehow ingest messy real-world data and figure out how to write an interface on top of it, maybe it could have built-in knowledge of frameworks and resources, etc. Even if these things are extremely difficult and a long way off, we can at least imagine them, and there is no reason to believe they would be impossible.

Alas, it still doesn’t matter. Even this hypothetical super code-writing AI would not meaningfully displace programmers.

If you’re using a language model to write code, what you’re doing is using English as a programming language. It’s the exact same job, just a different representation: you’re using ChatGPT to compile English into Python the way gcc compiles C++ into a binary. So you’re choosing English as your programming language over Python, C++, and all the other programming languages. And English is a terrible programming language.

Because that’s another thing that people are confused about when it comes to programming, and it’s something that even programmers don’t all recognize. The code isn’t for the computer—it’s for you. It’s for humans. If the coding language were meant for the computer, we would all be writing in pure binary instead of these abstracted and symbolic languages. The Python/C++/whatever code isn’t some obstacle that we are trying to overcome. The code is the interface that we designed to be able to program the computer. It’s what we need. It’s objective, explicit, unambiguous, (relatively) static, internally consistent, and robust. English has none of these properties—it’s subjective and ambiguous, its meaning is often implicit, it’s always changing, it contradicts itself, and its structure does not hold up to analysis.

Consider, for instance, the simple existence of the term “prompt engineering,” which describes the practice of iteratively fine-tuning the prompts you submit to ChatGPT (or whatever) to get your desired output. Prompt engineering is partly a matter of identifying what the model responds to, and largely a matter of guesswork. It means treating the natural language of the prompt as a formal language, manipulating symbols that lose their human meaning and intuitive structure like some sort of abstract association game. You’re working with the worst programming language imaginable.

There are properties of programming languages that make it harder to write code, for instance the demand that you declare what a variable’s type is, or that you adhere to a very rigid syntax. But it’s exactly these properties (which you are circumventing by using a language model) that make the programming language useful. The creators of C++ didn’t include these properties to be malicious or to make it harder to write code; they did it because it makes a programmer’s job easier. These properties make it possible to know what the program is going to do, or they make errors identifiable earlier on, or they prevent side effects. All of the exasperating demands and specificities of C++ or any other programming language are what make it a good programming language. It’s how we specify large, sophisticated, complicated software such that it’s unlikely to break, it will do exactly what we want it to, and we can come back to it, read it, and figure out what’s going on.

(Programmers reading this know that Python, which is over 30 years old, already constitutes a shift away from more arcane-looking programming languages like C++ toward more human-looking text, and that shift comes at a price. When you work in C++, you’re much more likely to identify errors before you even run the program; with Python, errors are liable to appear at any time during a run, which can make them much harder to identify and more time consuming to fix. Some programmers hate Python for this reason. The demands of some projects and domains simply rule out a language like Python, which is no doubt easier to learn and write and read than C++.)
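
Here is a minimal, hypothetical illustration of that trade-off in Python: nothing checks the types before the program runs, so the bug on the rarely-taken branch passes silently until a negative balance actually reaches it, whereas the statically typed equivalent in a language like C++ would be rejected before the program ever started.

def describe_balance(balance):
    if balance >= 0:
        return "In credit: " + str(balance)
    else:
        # Bug: concatenating a str with an int raises a TypeError, but only
        # when this branch is actually executed at runtime.
        return "Overdrawn by " + abs(balance)

print(describe_balance(100))  # works fine; the bug goes unnoticed
print(describe_balance(-25))  # TypeError: can only concatenate str (not "int") to str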

It might seem like it would be convenient to generate code from natural language prompts, or it might even seem like a really good thing—now everyone can program computers! just tell the computer what you want to do, in plain English!—but unfortunately, it’s not. Using a code-generating AI instead of a programming language would simply mean that your job is figuring out how to use natural language to specify software instead of a programming language, and that wouldn’t be an improvement. Trying to specify a piece of software in English would be a proper nightmare. I can say so with confidence because this is the first part of every software project—you do your best to describe it in natural language first so that everyone is on the same page and you have a good idea of what you’re going to do. Inevitably, the natural language specification falls short (and this is a serious understatement). There are all these considerations, technical details, compatibilities, versions, integrations, real-world data, and so many other things you have to worry about. This is the job of a programmer, not so much the programming.

Software is hard. Computers are difficult, finicky, alien things. Programming languages are our most promising source of power over them. I imagine a world where, instead of hiring programmers, managers simply tell AIs what they want in plain English, then pat themselves on the back for saving so much on payroll; now the manager is the programmer, and he’s writing code in English: and I laugh to myself heartily.


Thank you for reading. This is my first time sharing here, so I would love to hear your thoughts. If you liked this, consider paying a visit to Orbis Tertius.

8 comments

I think you're totally spot on about ChatGPT and near term LLMs. The technology is still super far away from anything that could actually replace a programmer because of all of the complexities involved.

Where I think you go wrong is in looking at longer-term future AIs. As a black box, at work I take in instructions on Slack (text), look at the existing code and documentation (text), and produce merge requests, documentation, and requests for more detailed requirements (text). Nothing there requires some essentially human element - the AI just needs to be good at guessing what requirements the product team and customers want and then asking questions and running tests to further divine how the product should work. If specifying a piece of software in English is a nightmare, then your boss's job is already a nightmare, since that's what they do. The key is that they can give a specification, answer questions about the specification, and review implementations of that specification along the way, and those are all things that an AI could do.

I'm already an intelligence that takes in English specifications and produces code, and there's no fundamental reason that my intelligence can't be replaced by an artificial one. 

Thanks for writing this up! I have been in software development for longer than most people here have been around, and you are absolutely right: over the last several decades the majority of the work shifted from writing new code to figuring out how best to connect the pieces of the puzzle that is a hodgepodge of APIs, idiosyncratic implementations, poorly documented messages and undocumented gotchas. There are usually a million ways to accomplish the same task, too, all different in terms of performance, scalability, costs and applicability, and knowing which to pick is what differentiates a senior SWE from a beginner.

That said, I expect the whole paradigm to shift in short order. Even now you can tell ChatGPT something like

Evaluate this pseudocode: 
BinaryTreeInPreorderForm = {1 2 4 5 3} 
BinaryTreeInPostOrderForm = PreorderToPostOrder(BinaryTreeInPreorderForm) 
print BinaryTreeInPostOrderForm

and it will generate working Python code, and then execute it!

You can also ask it to generate unit tests for the PreorderToPostOrder() function, and it will do a passable job. You can further ask it to add a unit test for a specific test case, and it will do that, too. You can even go deeper and ask it to figure out what test cases might be missing, and then add them. You can request a performance estimate in big-O notation. It will often make mistakes and hallucinate answers, but it is very close to being at the level of a junior SWE at this point, and in many regards much better. Also, much cheaper. It can automate a lot of mundane work for you, and find answers to questions that would otherwise be hard or time-consuming to look up online. It also excels at documenting your design and code, something human programmers suck at and hate doing. It can also look for common pitfalls and test for them.

What I am getting at is, while LLMs are unlikely to replace a senior SWE... at least in 2023, they will eat away at the bottom end: interns and junior programmers. The team leads can instruct the LLM the same way they instruct their team, and increasingly more usable and reliable applications will be popping up. At some point soon, the whole idea of a "high-level programming language" will go the way of "Assembler". You will talk to LLM in English, it will do the rest. One can call it "prompt hacking", but it is really a new high-level language that is much more human-friendly.

In your example:

The work was about 80% reading the API’s documentation, 18% configuring my API keys and downloading the example project and things like that, and 2% writing code to hook everything up.

most of the work is actually mundane and automatable, such as extracting information from the API docs, following the standard protocol for API key configuration and creating the glue between APIs. The real work is the remaining 1%: instructing your LLMule to do the heavy lifting.

Using a code-generating AI instead of a programming language would simply mean that your job is figuring out how to use natural language to specify software instead of a programming language, and that wouldn’t be an improvement.

Well, my contention is that it would be a vast improvement.

Basically, the state of the field I expect to see is that the repositories would not consist of C/Python/Java code, but of LLM instructions. Moreover, the LLMs can read these instructions and optimize them, too! Not yet, probably not this year, but soon enough.

Software is hard. Computers are difficult, finicky, alien things. Programming languages are our most promising source of power over them. I imagine a world where, instead of hiring programmers, managers simply tell AIs what they want in plain English, then pat themselves on the back for saving so much on payroll; now the manager is the programmer, and he’s writing code in English: and I laugh to myself heartily.

Well, yes to the first two, for sure. No to the rest. AI bots, not programming languages "are our most promising source of power over them". And where you "laugh hysterically", I nod and feel that this time cannot come soon enough. Having started my career handcrafting Assembler and FORTRAN, I would be most gratified to see these monsters and their descendants go the way of the dinosaurs that they are. I might be wrong, and maybe there are some severe obstacles there, but I would give more than even odds that the jobs currently performed by interns, junior and intermediate SWE, QA department, and a big chunk of IT support will fade away in the next 5 years or so.

At some point soon, the whole idea of a "high-level programming language" will go the way of "Assembler". You will talk to LLM in English, it will do the rest. One can call it "prompt hacking", but it is really a new high-level language that is much more human-friendly.

When will it replace the Night Watch?

A person who can debug a device driver or a distributed system is a person who can be trusted in a Hobbesian nightmare of breathtaking scope; a systems programmer has seen the terrors of the world and understood the intrinsic horror of existence.

— James Mickens

ETA: H/t Eliezer for the link, which he cited in a related context.

Hah, a great link! To be fair, we will not rid ourselves of monsters, we will replace them with different monsters, which may or may not eventually finish us off.

The assumption that ChatGPT can't do more than just write code is already wrong today. It's decent at telling you about various packages that might solve your problem and giving you pros and cons for each of them.

Given the way ChatGPT works in particular, it's bad at reading through existing code and finding a bug. But as work goes on to move from a "one prompt, one answer" model to giving an agent instructions and having it take multiple actions in succession, it will be able to read through the code to search for the bug.

OpenAI already had WebGPT, an agent that could go out and read the web to find sources for a good answer that isn't just hallucinated. At the moment it's quite unclear what a model that can freely browse the documentation and existing code and write new code will be able to do.

I don't know how widespread this problem is, but I often find myself unable to just write code even if it is really just boilerplate. I have a perfect vision in my head of what my code should do and how, but I can't translate it into lines of code on the screen; I literally need to take a piece of paper, write down the whole algorithm in natural language, and then transform it into a program line by line, because otherwise I get weirdly stuck. I think that for me ChatGPT should be really useful.

I disagree with English (in principle at least) being inadequate for software specification.

For any commercial software, the specification basically is just "make profit for this company". The rest is implementation detail.

(Obviously this is an absurd example, but it illustrates how you can express abstractions in English that you can't in C++.)

I don't think the comparison of giving an LLM instructions and expecting correct code to be output is fair. You are vastly overestimating the competence of human programmers: when was the last time you wrote perfectly correct code on the very first try?

Giving the LLM the ability to run its code and modify it until it thinks it's right would be a much fairer comparison. And if, as you say, writing unit tests is easy for an LLM, wouldn't that just make this trial-and-error loop trivial? You can just bang the LLM against the problem until the unit tests pass.
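
As a rough sketch of that loop (a hypothetical outline, not working tooling): ask_llm below is a stand-in for whatever code-generating model you have access to, and the tests are ordinary unit tests run with pytest, which is assumed to be installed.

import subprocess

def ask_llm(prompt: str) -> str:
    """Hypothetical call to a code-generating model; returns source code."""
    raise NotImplementedError

def tests_pass() -> bool:
    # Run the project's unit tests; exit code 0 means they all passed.
    return subprocess.run(["python", "-m", "pytest"]).returncode == 0

def bang_until_tests_pass(spec: str, max_attempts: int = 10) -> bool:
    feedback = ""
    for _ in range(max_attempts):
        code = ask_llm(spec + feedback)
        with open("solution.py", "w") as f:
            f.write(code)
        if tests_pass():
            return True
        feedback = "\n\nThe previous attempt failed the unit tests; try again."
    return False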

(And this process obviously won't produce bug-free code, but humans don't do that in the first place either.)