The Real Coding Revolution: Why Data Beats Natural Language

Data is laser-focused; natural language is directional

It's tempting to believe that explaining something to a computer in plain English (or any other natural language) will make it do exactly what you want. Many have promised as much over the past year; Nvidia CEO Jensen Huang, for example, has argued that natural language will replace coding. And although I wholeheartedly agree that the age of traditional programming is fading, I don't think English is the new programming language.


Let's use a thought experiment to see why. Imagine not just an LLM or a multi-agent system like ChatDev, but AGI (Artificial General Intelligence) itself. It is not merely human-smart; it is beyond our greatest expectations. Let's call it Ust (Unreasonably Smart Thing). On the other side is you, a very smart human. You want to build a piece of software, something trivial, say a simple ToDo app. You tell Ust, "Build me a ToDo app that is available to me via the Web." Ust, being what it is, builds it flawlessly. This is the best ToDo app the world has ever seen, duh!

But is it exactly what you wanted? The only way to know is to actually interact with it, or in other words, to test it. Why? Because your words are not precise: any natural language is vague. It's not that Ust can't understand you or that you can't explain yourself. It's that the language itself is not precise enough. The medium is not fit to transfer an exact idea from one mind to another.

You might say, "Hold on, I can clarify it further." You can, and each clarification narrows down the possibilities, but it never makes them exact. You can say, "I want a ToDo app that has a list of tasks, and I can add, remove, and mark them as done." This is better, but still not exact. What do you mean by "mark them as done"? Do you want a checkbox, a button, or a swipe? Do you want to be able to undo it? Do you want to see the history of changes? And so on.

You will always need to test it to ensure the software meets your needs. This is why natural language will never replace programming.

It is not a question of bigger models, it is a question of communication medium!

Data is the new programming language

Now imagine that instead of giving Ust a vague description of what you want, you give it tests, or simply examples of the inputs and outputs of the system it needs to build. You can say, "I want a ToDo app such that, when I send it a list of tasks containing this data, it returns the list with a new task containing that data added." You describe precisely the shape of the data that goes into and comes out of the system. You might supplement it with some natural language to help Ust understand the context, but the core of the communication is the data.
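To make "describe the shape of the data that goes in and out" concrete, here is a minimal sketch, assuming a hypothetical `Task` shape and a hypothetical `addTask` operation (neither is prescribed by the article). The specification is just a list of input/output examples; `satisfies` mechanically checks any candidate implementation against them. Comparison via `JSON.stringify` is a simplification that assumes both sides build objects with the same key order.

```typescript
// Hypothetical task shape; the article does not prescribe one.
interface Task {
  id: number;
  title: string;
  done: boolean;
}

// One example: the data that goes in and the data that must come out.
interface Example {
  input: { tasks: Task[]; newTitle: string };
  expected: Task[];
}

// The specification itself is nothing but data.
const addTaskExamples: Example[] = [
  {
    input: { tasks: [], newTitle: "Buy milk" },
    expected: [{ id: 1, title: "Buy milk", done: false }],
  },
  {
    input: {
      tasks: [{ id: 1, title: "Buy milk", done: true }],
      newTitle: "Call mom",
    },
    expected: [
      { id: 1, title: "Buy milk", done: true },
      { id: 2, title: "Call mom", done: false },
    ],
  },
];

// A hypothetical candidate implementation (what Ust might produce):
// append a new, not-done task whose id is one more than the current maximum.
function addTask(tasks: Task[], newTitle: string): Task[] {
  const nextId = tasks.reduce((max, t) => Math.max(max, t.id), 0) + 1;
  return [...tasks, { id: nextId, title: newTitle, done: false }];
}

// True if the candidate produces the expected output for every example.
// JSON.stringify comparison assumes identical key order on both sides.
function satisfies(
  impl: (tasks: Task[], newTitle: string) => Task[],
  examples: Example[],
): boolean {
  return examples.every(
    (ex) =>
      JSON.stringify(impl(ex.input.tasks, ex.input.newTitle)) ===
      JSON.stringify(ex.expected),
  );
}

console.log(satisfies(addTask, addTaskExamples)); // true
```

Note that nothing here says *how* `addTask` must work, only what must come out for given data; a different implementation that yields the same outputs would pass just as well.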

Now Ust, again being what it is, guarantees that your requirements will be satisfied exactly. Only this time you do not need to test the result, because your requirements are exact. They are not ambiguous or vague; they are precise.

If I say: here is a function doStuff(a: number, b: number): number, and here is what it does:

  • doStuff(1, 2) => 3
  • doStuff(2, 3) => 5
  • doStuff(-10, 4) => -6

I never explained in natural language what the function does, but you know exactly what it does.
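The three examples above can be written down directly as executable data. A minimal sketch follows; the `Case` type and `failingCases` helper are illustrative names, and `a + b` is simply the reading a person (or Ust) would infer from the examples, not something the text states.

```typescript
type Case = { args: [number, number]; expected: number };

// The specification from the text, expressed purely as data.
const doStuffCases: Case[] = [
  { args: [1, 2], expected: 3 },
  { args: [2, 3], expected: 5 },
  { args: [-10, 4], expected: -6 },
];

// The implementation inferred from the examples.
function doStuff(a: number, b: number): number {
  return a + b;
}

// Return the cases an implementation gets wrong (an empty array means all pass).
function failingCases(
  fn: (a: number, b: number) => number,
  cases: Case[],
): Case[] {
  return cases.filter(({ args, expected }) => fn(...args) !== expected);
}

console.log(failingCases(doStuff, doStuffCases).length); // 0
```

The same data doubles as a test suite: plugging in a wrong guess such as multiplication makes all three cases fail.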

Data is a human language too

You use data to communicate clearly all the time. Just recall a dialogue like this one:

  • "I want a drawer because I do not have enough space for my stuff."
  • "What stuff?"
  • "My office supplies."
  • "Like a stapler, paper, pens?"
  • "No, like, I use these binders that are kinda this big, and I have, like, 10 of them."

There are two things to notice here. First, the conversation starts by setting a general direction; this is the natural language part. Second, as soon as the other person asks for more details, you switch to data, and not just any data: you almost always give examples. We love examples because they are precise, clear, and unambiguous. They communicate exactly what we want.

I argue that examples are the tests: the use cases that define the behavior and disambiguate the requirements.

Data vs natural language

I realize that "data" in principle suffers from the same basic problem as natural language. It is, in fact, a language too. I also realize that natural language can be structured so rigorously that it effectively becomes an algorithm; after all, I am using natural language to make this very argument. My main point is that natural language, as we ordinarily use it, is ambiguous.

Using natural language to describe system behavior is like pointing with a flashlight: it shows the general direction, but not the exact spot. Using data is like pointing with a laser pointer: far more precise, though still not perfect.
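Why "still not perfect"? A finite set of examples never pins down a single function. The sketch below reuses the doStuff examples and shows a second, deliberately contrived function (a hypothetical illustration, not anything the article proposes) that agrees with all three examples yet differs from addition elsewhere.

```typescript
type Case = { args: [number, number]; expected: number };

// The doStuff examples, as data.
const doStuffCases: Case[] = [
  { args: [1, 2], expected: 3 },
  { args: [2, 3], expected: 5 },
  { args: [-10, 4], expected: -6 },
];

// The "obvious" reading of the examples: addition.
const add = (a: number, b: number): number => a + b;

// A contrived alternative: the extra term is zero when a is 1, 2 or -10,
// so it matches every example above but not addition in general.
const contrived = (a: number, b: number): number =>
  a + b + (a - 1) * (a - 2) * (a + 10);

// True if a function reproduces every example exactly.
const passesAll = (fn: (a: number, b: number) => number): boolean =>
  doStuffCases.every(({ args, expected }) => fn(...args) === expected);

console.log(passesAll(add), passesAll(contrived)); // true true
console.log(add(0, 0), contrived(0, 0)); // 0 20
```

Adding one more example, such as `{ args: [0, 0], expected: 0 }`, rules out this particular alternative. Each example narrows the beam further, which is the sense in which data is a laser rather than a flashlight.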


Natural language is not the new programming language; data is. Data is the language that is precise, clear, and far less ambiguous.

I am not arguing that natural language has no place in software engineering; the opposite is true. It has been, and will remain, the main way to communicate the general direction, the context, the constraints, and the goals. But it is not the way to communicate exact requirements. For that, you need data.