That real-world snippet example doesn't look very readable. Did the formatting fail? I don't see any line breaks, and the # characters seem misplaced for comments.
The traditional solution to this problem is templating.
And here I thought it would be colors, so you can tell which code is on level A of abstraction (green) and which is on level B (red), at a glance.
If I'm coding in Go, I want to use Go and I want to use its power fully. I don't want to use some crippled version of it that's used only inside of templates.
That's what makes Lisp macros awesome. You write them in Lisp.
Also, since you like Python, have you seen the Mako language? It's less restricted than Jinja2 and can take pretty much arbitrary Python.
But when rendering web pages (the primary use for Jinja2), you want to keep as much complexity out of your templates as possible, so you can test your logic more easily. A restricted DSL enforces that.
"Blocks of code" is a leaky, untyped abstraction. It's better to have something like a DOM API for the target language, where each node in the parse tree is represented as a typed object with properties and children. (For example, if you're generating C, an #include directive would be represented as a typed object that's part of a larger object.) Then you can use the same API to generate, analyze, or transform code. It looks like overkill in toy examples, but can be impressively concise in longer examples.
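To make the typed-tree idea concrete, here is a minimal sketch that uses Python's standard `ast` module, with Python itself standing in as the target language (the comment above talks about C, for which no such standard module is assumed here). The same tree is built node by node, inspected, and then rendered back to source text:

```python
import ast

# Build `print('Hello, world!')` node by node instead of as a string.
call = ast.Call(
    func=ast.Name(id="print", ctx=ast.Load()),
    args=[ast.Constant(value="Hello, world!")],
    keywords=[],
)
module = ast.Module(body=[ast.Expr(value=call)], type_ignores=[])
ast.fix_missing_locations(module)  # fill in line/column info

# The same tree can be analyzed...
names = [n.id for n in ast.walk(module) if isinstance(n, ast.Name)]

# ...and turned back into source text (requires Python 3.9+).
source = ast.unparse(module)
print(source)
```

The cost is visible even here: a one-line program takes several lines of node construction, which is the "overkill in toy examples" trade-off mentioned above.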
Long ago, I worked on a Java IDE that effectively worked this way. Our captive compiler did some basic error correction on the source (so it kept working even while typing), then generated an annotated abstract syntax tree (https://en.wikipedia.org/wiki/Abstract_syntax_tree), which the visual elements could read to generate their displays. The cool aspect of code generation is that our tools could also update the tree, and the code generator would insert/delete/refactor the code to match.
Writing programs that generate programs is hard. The programmer has to think at two levels of abstraction at once. She has to follow the logic of the generator. At the same time she can't lose focus on the logic of the generated code. And the two don't even have to be written in the same language!
That's a hard enough feat even when the tools aren't putting obstacles in your way. But, unfortunately, that's exactly what they are doing.
Consider this Python program that outputs the classic C "Hello, world!" program:
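A minimal sketch of what such a generator might look like. The function name `hello_c` is an assumption made for illustration, and the listing reconstructs the idea rather than reproducing any particular original code:

```python
def hello_c():
    # Every quote, backslash and newline of the C program must be escaped
    # by hand, and C's "\n" has to be written as "\\n".
    return "#include <stdio.h>\n\nint main(void) {\n" + \
           "    printf(\"Hello, world!\\n\");\n    return 0;\n}\n"

print(hello_c(), end="")
```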
Ugly, you say? Yes, it's ugly. But it's just generating the simplest possible program! If we wanted to generate something truly complex it would become doubleplusugly.
But ugliness aside, the problem is that the code is unreadable.
Reading code is generally harder than writing code. Reading code with two parallel levels of abstraction is much harder still. Add some atrocious formatting, sprinkle in copious amounts of escape sequences, and even the best programmer won't be able to understand what's going on.
The traditional solution to this problem is templating.
The idea is that the generated program is like a form, a pre-printed template with a few blank slots to fill in:
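The form-with-slots idea can be sketched with `string.Template` from the Python standard library, used here as a stand-in for a real templating engine. The slot names `greeting` and `name` and the helper `fill` are assumptions made for illustration:

```python
from string import Template

# The "form": fixed text with two named blank slots, $greeting and $name.
FORM = Template('printf("$greeting, $name!\\n");')

def fill(greeting, name):
    # substitute() raises KeyError if a slot is left unfilled.
    return FORM.substitute(greeting=greeting, name=name)

print(fill("Hello", "world"))
```

Note that the escape sequence for C's newline is still there, even in this tiny template.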
And here's how it works with, say, Jinja2:
Well, it's not much better. Weird formatting and escape sequences remain. However, given that the template is now a single string we can load it from a file instead of using a string literal. The content of the file would look much better:
The downside is that the template and the generator now live in two different files which makes the logic harder to follow.
By the way, I am not picking on Jinja2 here. All the code generation tools I've looked at work in basically the same manner. Here's, for example, what Golang's text/template package looks like:
If code generation were really like filling in tax forms, the templating approach would be good enough.
But it's not.
Even the simple Go example above requires some extra logic. Specifically, it uses different text depending on whether the person in question attended the wedding or not.
But it gets worse.
Imagine you want to generate the following report:
That's no longer filling in empty slots. The template has to loop through the list of employees and generate a line for each of them.
Again Jinja2:
As can be seen, moving to more complex generated text means adding more special constructs (if-then-else, for-in etc.) into the template until we end up with a full, Turing-complete code generation DSL.
There are two things I don't like about the templating approach to code generation:
First, I don't want to learn a new DSL. If I'm coding in Go, I want to use Go and I want to use its power fully. I don't want to use some crippled version of it that's used only inside of templates.
Second, I don't want to use large templates in the first place. The pieces I want to fill into the template are often too complex to be generated inside of the template and therefore I have to precompute them, put them in an array and then use text/template to render the array. That in turn tears the generation logic, which is a single conceptual thing, into two pieces: The precomputation and the template rendering.
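The split can be sketched as follows, again with the standard library's `string.Template` standing in for a templating engine. The employee data and the report layout are made up for illustration:

```python
from string import Template

# Hypothetical data, invented for this sketch.
employees = [("Alice", 52000), ("Bob", 48000)]

# Piece 1: precomputation in the host language.
rows = "\n".join(f"{name:<10}{salary:>8}" for name, salary in employees)
total = sum(salary for _, salary in employees)

# Piece 2: the template, which only ever sees the finished strings.
REPORT = Template("Employees:\n$rows\nTotal: $total")
report = REPORT.substitute(rows=rows, total=total)
print(report)
```

To understand how a single row ends up in the report, the reader has to look at both pieces, even though they implement one conceptual step.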
And I am not even speaking of complex cases when I want to fill in a slot in a template not by a simple string but rather by a full template of its own.
Given the considerations above, I created a small Python library in 2017 to do code generation in a different way. The idea was to treat "a block of code" as a primitive type, not unlike string or integer. The user would then use the language (the real language, not a limited version thereof) to manipulate those blocks of code, add them together and eventually generate the entire program out of them.
As can be seen, it's pure Python. No template-specific DSL, no nothing. Instead, there's a new primitive type called "tile" that's really just a rectangular area of text. Tile literals support tile interpolation (the @{} stuff), but that's not much different from the existing Python string interpolation (f-strings) and can't really be claimed to be a separate language within a language. The user is free to manipulate the tiles in any way that she sees fit.
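The "rectangular area of text" idea can be illustrated with a toy class. This is a sketch under my own assumptions, not the library's actual API: the class name `Tile`, the `lit` constructor and the choice of `|` for vertical and `+` for horizontal joining are all hypothetical, and interpolation is left out entirely.

```python
import textwrap

class Tile:
    """A rectangular block of text (hypothetical sketch)."""

    def __init__(self, lines):
        self.lines = list(lines)

    @classmethod
    def lit(cls, text):
        # Strip common indentation so a literal can be indented
        # naturally in the generator's source code.
        return cls(textwrap.dedent(text).strip("\n").split("\n"))

    def __or__(self, other):
        # Vertical join: put `other` below `self`.
        return Tile(self.lines + other.lines)

    def __add__(self, other):
        # Horizontal join: put `other` to the right of `self`, padding
        # so that every line of `other` starts in the same column.
        width = max(map(len, self.lines), default=0)
        height = max(len(self.lines), len(other.lines))
        left = self.lines + [""] * (height - len(self.lines))
        right = other.lines + [""] * (height - len(other.lines))
        return Tile((l.ljust(width) + r).rstrip() for l, r in zip(left, right))

    def __str__(self):
        return "\n".join(self.lines)

body = Tile.lit("""
    puts("Hello,");
    puts("world!");
    """)
func = Tile.lit("int main(void) {") | (Tile(["    "]) + body) | Tile.lit("}")
print(func)
```

Because the horizontal join treats its operands as rectangles, prefixing a multi-line tile with four spaces indents every line of it, which is what keeps both the generator source and the generated code properly indented.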
Unfortunately, since 2017, I haven't had a chance to use the library in anger.
Until last week, that is.
Doing so smoothed out the API and added some convenience features. It also bloated the code of the library from 35 LoC to 74 LoC (!)
The code I have written generates some man pages and C header files.
Let me paste a snippet here so that you get a feel for what real-world code written with tiles looks like:
Generally speaking, I am happy with the experiment.
I would like to highlight the following facts:
1. There's no DSL in the code whatsoever. Not even a simple one.
2. The source code is nicely indented.
3. The generated code is nicely indented.
4. There isn't a single escape sequence in the code.
On the other hand, there are some minor warts.
For example, when you want to join multiple strings in Python, you can do it this way:
When you want to join multiple tiles though you need an extra pair of parentheses:
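One plausible reason for the extra parentheses is operator precedence, assuming tile literals are built with an operator applied to a prefix object (as the `t%""` form suggests). The classes below are a hypothetical stand-in written for this sketch, not the library's implementation:

```python
class Tile:
    def __init__(self, text):
        self.text = text

    def join(self, tiles):
        return Tile(self.text.join(t.text for t in tiles))

class TileFactory:
    # `t / "text"` builds a tile, mimicking a literal prefix (hypothetical).
    def __truediv__(self, text):
        return Tile(text)

t = TileFactory()

# Plain strings need no parentheses.
plain = ", ".join(["red", "green", "blue"])

# Attribute access binds tighter than `/`, so without parentheses
# `t/", ".join(parts)` would mean `t / (", ".join(parts))`, i.e.
# str.join applied to a list of tiles, which raises TypeError.
parts = [t/"red", t/"green", t/"blue"]
joined = (t/", ").join(parts)
print(joined.text)
```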
When you want to put one tile below another and separate them by a blank line, you do it as follows:
The empty line literal (t%"") looks too much like a vim command.
However, both of these problems are artifacts of hacking the tile support on top of the existing Python language rather than having the tile primitive type supported by the language itself. If there were native support for tiles, the code would look like this:
Feb 21st, 2019
by
martin_sustrik