Preamble
In my last post, I discussed some basic logical connectives. Before moving on to more advanced topics, I want to discuss formal syntax. I think many people will find this post dry and boring, but I think it's necessary to cover at some point.
Up to this point, I've been assuming an understanding which bears dwelling on. I've previously stated that a judgment is a series of mental tokens which one may declare. While I've technically adhered to that, it's obvious that I'm assuming more structure. Furthermore, I've made use of variable binding and substitution, which have not been properly accounted for.
This short post is intended to specify the nature of syntax as used in this series. It will mostly be a recounting of abstract syntax and binding as presented in the first chapter of Practical Foundations for Programming Languages, but recast in terms of the account of meaning explanations given earlier.
Abstract Syntax Trees
Any given form of logic specifies various sorts of phrases which may be combined to form judgments. We'll distinguish between two different types of syntax, an informal "surface" syntax which is human readable, and a formal abstract syntax which is more well-defined. We'll be concerning ourselves with this abstract syntax in this post, treating the surface syntax as a mere readable short-hand for the abstract syntax.
In defining abstract syntax, we concern ourselves primarily with the rules by which we may combine phrases with each other. Since we already have an understanding of meaning explanation under our belt, we may use this to express the rules for correct syntax formation. At first, our phrases will be in the form of trees, called abstract syntax trees, or ASTs.
ASTs are made up of two main kinds of things, variables and operators. The variables may be placed at the leaves of an AST, while the operators make up the nodes. For the sake of restricting how syntax may be combined, we give every operator a sort. This sort will dictate the role of an operator by restricting how it may be combined with other operators.
Each operator comes equipped with a list of sorts, s1, ... sn, which tell us what sorts of ASTs can be plugged into the operator. In addition, each operator has an intrinsic sort s. When ASTs of the proper sorts are plugged into this operator, we obtain an AST of sort s. We denote the full arity s(s1,...sn).
Variables will be the most significant part of our discussion of abstract syntax. Briefly, a variable is a single token which is intended to be used as a marker for a place where a more concrete expression might be placed. As a result, expressions containing variables make up schemes of statements which may be specialized, via substitution, into more specific statements. For example, in basic algebra, we may form polynomials such as x² + 2x + 1, which can be specialized by substitution of, say, 7 for x to obtain 7² + (2×7) + 1.
Abstract syntax trees are classified by sorts that divide ASTs into syntactic categories. As an example, the various true, prop, :, etc. operators will be placed into a different sort than λ, ∨, →, etc. We would not have tried stating, for example, true true. Beyond not meaning anything, we will make this a syntactically incorrect statement.
Variables in abstract syntax trees range over sorts in the sense that only ASTs of the specified sort of the variable can be plugged into that variable. But the core idea carries over from school mathematics, namely that a variable is an unknown, or a place-holder, whose meaning is given by substitution.
As an example, consider a language of arithmetic expressions built from numbers, addition, and multiplication. The abstract syntax of such a language consists of two sorts, N (standing for "number") and Exp (standing for "expression"), generated by the following:
∙ An operator 0 of arity N().
∙ An operator S of arity N(N).
∙ An operator num of arity Exp(N).
∙ Operators plus and times, both of arity Exp(Exp,Exp).
As an example, the expression 2+(3×x), which involves a variable, x, would be represented by the AST plus(num(S(S(0)));times(num(S(S(S(0))));x)) of sort Exp, under the assumption that x is also of sort Exp. For the sake of readability, all numbers will simply be written as numerals from now on. Because, say, num(4) is an AST of sort Exp, we may plug it in for x in the above AST to obtain plus(num(2);times(num(3);num(4))), which is written in informal, surface syntax as 2+(3×4). We may, of course, plug in any more complex ASTs of sort Exp for x instead.
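To make this concrete, here's a minimal sketch of the two sorts as Haskell datatypes (Haskell is just my choice of illustration here, and the constructor names are my own, not anything fixed by the theory);

data N = Zero | Succ N                -- the sort N: the operators 0 and S
  deriving (Show, Eq)

data Exp                              -- the sort Exp
  = Var String                        -- a variable of sort Exp
  | Num N                             -- num, of arity Exp(N)
  | Plus Exp Exp                      -- plus, of arity Exp(Exp,Exp)
  | Times Exp Exp                     -- times, of arity Exp(Exp,Exp)
  deriving (Show, Eq)

-- The AST for 2 + (3 × x), i.e. plus(num(S(S(0))); times(num(S(S(S(0)))); x))
twoPlusThreeX :: Exp
twoPlusThreeX =
  Plus (Num (Succ (Succ Zero)))
       (Times (Num (Succ (Succ (Succ Zero)))) (Var "x"))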
We may give a meaning explanation for ASTs as follows;
We may declare x to be an AST of sort s when we've declared x to be a variable of sort s.
We may declare θ(a1;a2;...an) to be an AST of sort s when we've declared θ to be an operator of arity s(s1,s2,...sn) and have declared, for each i, ai to be an AST of sort si.
To fully explicate the system above, we have a meaning explanation like;
We may declare Exp to be a sort.
We may declare N to be a sort.
We may declare 0 an operator of arity N()
...
and this will fully specify our syntax.
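Read computationally, the meaning explanation above amounts to a recursive check. Here's a minimal Haskell sketch of that check, assuming we represent the declared operators as a table of arities; all the names (Sort, Arity, Signature, sortOf) are hypothetical, my own choices;

import qualified Data.Map as Map

type Sort = String                         -- e.g. "N", "Exp", "Judgment"
data Arity = Arity [Sort] Sort             -- s(s1,...,sn): argument sorts and result sort

data AST = Var String | Op String [AST]    -- variables at the leaves, operators at the nodes

type Signature = Map.Map String Arity      -- the declared operators and their arities
type Context   = Map.Map String Sort       -- the declared variables and their sorts

-- sortOf returns Just s exactly when we may declare the AST to be of sort s.
sortOf :: Signature -> Context -> AST -> Maybe Sort
sortOf _   ctx (Var x)      = Map.lookup x ctx
sortOf sig ctx (Op op args) = do
  Arity argSorts s <- Map.lookup op sig
  actual <- mapM (sortOf sig ctx) args
  if actual == argSorts then Just s else Nothing

-- The arithmetic signature given above.
arith :: Signature
arith = Map.fromList
  [ ("0",     Arity []            "N")
  , ("S",     Arity ["N"]         "N")
  , ("num",   Arity ["N"]         "Exp")
  , ("plus",  Arity ["Exp","Exp"] "Exp")
  , ("times", Arity ["Exp","Exp"] "Exp") ]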
For all the theories we've discussed so far, the exact abstract syntax has been left implicit. But we may employ the following explanation;
We may declare Judgment to be a sort.
We may declare Proposition to be a sort.
We may declare hypothetical to be an operator of arity Judgment(Judgment, Judgment).
...
We may declare true an operator of arity Judgment(Proposition).
We may declare prop an operator of arity Judgment(Proposition).
We may declare ∧ an operator of arity Proposition(Proposition, Proposition)
We may declare ⊤ an operator of arity Proposition()
...
From these rules we can declare that
hypothetical(prop(∧(A;⊤));true(A))
is a judgment. Not one we can necessarily declare, but a well-formed judgment nonetheless. We typically write this as;
A ∧ ⊤ prop ⊢ A true
which is our informal, surface syntax.
It's worth noting at this point that definitions like these have a standard notation in the form of a BNF grammar. The above would typically be written more succinctly as;
Judgment, J ::=
true(P)
prop(P)
hypothetical(J; J)
...
Proposition, P ::=
⊤
∨(P;P)
∧(P;P)
...
We won't be spelling out grammars very much in this series, but, when need be, this syntax will be used. Consider it a short-hand for the kind of meaning explanation given before.
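Read as datatypes, that grammar corresponds to something like the following sketch, with constructor names of my own choosing and only the productions listed above filled in;

data Proposition
  = Top                                -- ⊤
  | Or  Proposition Proposition        -- ∨(P;P)
  | And Proposition Proposition        -- ∧(P;P)
  | PVar String                        -- a variable of sort Proposition
  deriving (Show, Eq)

data Judgment
  = IsTrue Proposition                 -- true(P)
  | IsProp Proposition                 -- prop(P)
  | Hypothetical Judgment Judgment     -- hypothetical(J;J)
  deriving (Show, Eq)

-- hypothetical(prop(∧(A;⊤)); true(A)) from above
exampleJudgment :: Judgment
exampleJudgment = Hypothetical (IsProp (And (PVar "A") Top)) (IsTrue (PVar "A"))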
The meaning explanation for ASTs provides justification for a principle of reasoning called structural induction. Suppose that we wish to prove that some predicate holds of all ASTs of a given sort. To show this it is enough to consider all the ways in which it can be generated, and show that the predicate holds in each case under the assumption that it holds for its constituent ASTs (if any). So, in the case of the sort Exp just described, we must show that
∙ The predicate holds for any variable x of sort Exp.
∙ The predicate holds for any number, num(n).
∙ Assuming that the predicate holds for a1 and a2, prove that it holds for plus(a1; a2) and times(a1; a2).
Because these cases exhaust all formation possibilities, we are assured that the predicate holds for any AST of sort Exp. More generally, we'd need to prove that;
∙ Ps(x) holds for any variable x of sort s.
∙ Given an operator θ of arity s(s1,...sn), assuming we have Ps1(a1), ... Psn(an), we must prove that Ps(θ(a1;...an)).
This allows us to prove Ps for all ASTs of sort s. We will use this technique to establish the well-behaviour of substitution over ASTs.
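The same case analysis appears whenever we define a function by structural recursion: the function below covers every AST of sort Exp because its clauses exhaust the ways an Exp can be formed. (This reuses the hypothetical Haskell rendering of Exp from before.)

data N   = Zero | Succ N
data Exp = Var String | Num N | Plus Exp Exp | Times Exp Exp

-- The variables appearing in an Exp: one clause per way of forming an Exp.
vars :: Exp -> [String]
vars (Var x)     = [x]
vars (Num _)     = []
vars (Plus  a b) = vars a ++ vars b
vars (Times a b) = vars a ++ vars b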
In general, when a grammar is defined we'll be reasoning about ASTs relative to some set of variables. Each AST has a set of variables appearing within it. We say that x is fresh in our AST if x does not appear in it; otherwise we say that x is free in our AST.
Given a sort s, a substitution a[b:=c], where a is an AST of sort s, b is a variable of sort s′, and c is an AST of sort s′, is defined as;
b[b:=c]↓c
x[b:=c]↓x when x is a variable distinct from b
θ(a1;a2;...an)[b:=c]↓θ(a1[b:=c];a2[b:=c];...an[b:=c])
By well-behaviour, I mean that, given a fixed AST, a substitution will always result in a unique new AST. To be more precise, we want to prove that
Given an AST a whose variables are among X, x and an AST b whose variables are among X, there is a unique AST c for which a[x:=b]↓c.
To prove this, we consider the inductive cases for ASTs in general;
For the base case, we must consider what happens to a lone variable. That is, if we have a variable y among X, x, then we have y[x:=b]↓b in the case that y is x (in which case b is our unique c), and y[x:=b]↓y otherwise (in which case y is our unique c).
For the inductive case, we must consider an operator of the form θ(a1;a2;...an). Our inductive hypothesis is that there's a unique ci for each ai such that ai[x:=b]↓ci. In this case, we have
θ(a1;a2;...an)[x:=b]↓θ(a1[x:=b];a2[x:=b];...an[x:=b])
by the definition of substitution, and
θ(a1[x:=b];a2[x:=b];...an[x:=b])↓θ(c1;c2;...cn)
by our inductive hypothesis. This means our unique c is θ(c1;c2;...cn).
That exhausts all formation possibilities. This establishes that substitution is well-behaved over ASTs; it will never get stuck.
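As a Haskell sketch over the same hypothetical Exp type, the substitution just defined looks like this; the function is total, which is the computational content of the uniqueness argument above;

data N   = Zero | Succ N
data Exp = Var String | Num N | Plus Exp Exp | Times Exp Exp

-- subst c x a computes a[x := c]: replace every occurrence of the variable x in a with c.
subst :: Exp -> String -> Exp -> Exp
subst c x (Var y) | y == x    = c
                  | otherwise = Var y
subst _ _ (Num n)     = Num n
subst c x (Plus  a b) = Plus  (subst c x a) (subst c x b)
subst c x (Times a b) = Times (subst c x a) (subst c x b)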
Abstract Binding Trees
Abstract binding trees, or ABTs, enrich ASTs with the ability to manage variables. The primary difference between an AST and an ABT is the ability to bind variables, allowing subtrees of an ABT to possess free variables which are not themselves free in the tree itself. The subtree where a bound variable is considered free is the scope of that variable. This is useful for allowing an ABT to talk about indexed expressions and expression schemes. A primary consequence of this is the arbitrary nature of any particular choice of variable within a binding.
A common example of variable binding in ordinary mathematics is in integration. In the expression ∫f(x)dx, we have an integral binding the variable x for use within the expression f(x). The only free variable of f(x) is x, while there are no free variables in ∫f(x)dx, assuming f is an operator. By the nature of this binding, the choice of x could have been otherwise without changing the expression. So we could rewrite the integral as ∫f(y)dy. We would consider the two ABTs underlying these expressions to be identical. This identification is known as α-equivalence. Binding has a specific scope. For example, in the expression x∫f(x)dx, the first x is a variable which isn't bound by anything, while the second is the bound x of the integral. As just stated, this will be considered identical to x∫f(y)dy. Furthermore, we may consider the expression ∫∫f(x,y)dxdy to be semantically different from ∫∫f(x,y)dydx as the order of bound variables is different.
Just like an AST, an ABT is made up of variables and operators, each having an associated arity. In the case of ABTs, it's also possible for an operator to bind variables of particular sorts within an argument. An operator of sort s with an argument of sort s1 which binds variables of sorts b1, b2, ... bn will have its arity denoted s(b1.b2.⋯.bn.s1). Assuming that an integral binds something of sort Exp, we may denote its arity as Exp(Exp.Exp), and we may write the ABT of ∫f(x)dx as ∫(x.f(x)). For the sake of readability, we will write b1.b2.⋯.bn.s1 as →b1.s1.
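As a data structure, the only change from ASTs is that each argument of an operator now carries the list of variables it binds. A minimal hypothetical Haskell sketch;

-- An argument to an operator is a list of bound variables together with a sub-ABT.
data ABT
  = V String                        -- a variable
  | Op String [([String], ABT)]     -- an operator applied to arguments of the form x1.x2.….xn.a
  deriving (Show, Eq)

-- ∫(x.f(x)), where f is taken to be an operator of arity Exp(Exp)
integralExample :: ABT
integralExample = Op "∫" [(["x"], Op "f" [([], V "x")])]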
Since we may rename variables within a binding without changing the ABT, we define a notion of variable replacement. Given a list of variables, →b, we may define a replacement of these variables as a list of pairs of variables, stating new and distinct names for the previous list. We denote such a replacement by ρ. This will be a function which, given a variable x in →b, will return a fresh variable ρ(x). We denote the application of such a replacement to an expression a with ^ρ(a).
Note that all variables within a renaming must be fresh relative to each other to avoid binding conflicts. For example, we can't rename the variables in ∫∫f(x,y)dydx to ∫∫f(z,z)dzdz since that changes the ABT, so ρ≡{x↔z,y↔z} is not a valid renaming, but ρ≡{x↔z,y↔w} is.
We may fully explain the meaning of ABTs as;
We may declare x to be an ABT of sort s when we've declared x to be a variable of sort s.
We may declare θ(→b1.a1;...→bn.an) to be an ABT of sort s when we've declared θ to be an operator of arity s(→b1.s1,...→bn.sn) and, for each i and all replacements ρi for →bi, we have declared ^ρi(ai) to be an ABT of sort si.
That renaming is significant. When we have an operator with sub-expressions that make use of bound variables, we don't want it to be a well-formed ABT under specific binding names, but rather, we want it to be well-formed under all binding names.
A version of structural induction called structural induction modulo fresh renaming holds for ABTs. The only difference is that we must establish our predicate for an ABT modulo renaming. To prove Ps for all ABTs, we must prove that;
∙ Ps(x) holds for any variable x of sort s.
∙ Given an operator θ of arity s(→b1.s1,...→bn.sn), assuming we have, for all i and all renamings ρi of →bi, a proof of Psi(^ρi(ai)), we must prove that Ps(θ(→b1.a1;...→bn.an)).
The second condition ensures that the inductive hypothesis holds for all fresh choices of bound variable names, and not just the ones actually given in the ABT.
We can now extend our grammars with rules that allow bound variables. Thus far, the most significant usages of variable binding are as part of generalized judgments and lambda expressions. Example grammar rules would look like;
We may declare generalprop an operator of arity Judgment(Proposition . Judgment).
We may declare generalterm an operator of arity Judgment(Term . Judgment).
We may declare λ an operator of arity Term(Term . Term).
Where Term is a sort in the grammar of type theory. Before now there was likely some confusion over exactly what general judgments were meant to bind. As it turns out, there is a separate general judgment for each sort of thing we may want to bind, all having similar meaning explanations. Without a discussion of syntax, this ambiguity may never have been noticed. Going forward, we will use operators which don't have the kind of over-loading that general judgments possess.
It will be useful to give a precise account of free occurrence. We can define a judgment, denoted freex(a), indicating when a variable occurs freely within an ABT. That is, when a variable appears within an ABT, but isn't bound. We give the following meaning explanation;
We may declare freex(x).
We may declare freex(θ(→b1.a1;...→bn.an)) when there is some i such that, for every renaming ρi of →bi, we may declare freex(^ρi(ai)).
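In code, the effect of quantifying over renamings can be had more directly by skipping any argument whose binders include the variable in question. A sketch over the same hypothetical ABT type;

data ABT = V String | Op String [([String], ABT)]

-- isFree x a corresponds to the judgment freex(a).
isFree :: String -> ABT -> Bool
isFree x (V y)       = x == y
isFree x (Op _ args) = any freeInArg args
  where
    freeInArg (binders, body)
      | x `elem` binders = False     -- x is rebound here, so its occurrences below aren't free
      | otherwise        = isFree x body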
Some care is necessary to characterize substitution over ABTs. The most obvious issue is substitution for a variable whose name is already bound. Consider the case where x is bound at the top of the ABT, as in ∫(x.a): then x does not occur free within ∫(x.a), and it is unchanged by substitution. For example, ∫(x.a)[x:=b]↓∫(x.a): since there are no free occurrences of x in x.a, we can't declare freex(∫(x.a)). A different, perhaps more common, issue is free variable capture during substitution. For example, provided that x is distinct from y, ∫(y.plus(x;y))[x:=y] is not ∫(y.plus(y;y)), which confuses two different variables named y. Instead, it's simply undefined. It is, however, defined on a renaming of y. Capture can always be avoided by first renaming the bound variables. If we rename the bound variable y to y′ to obtain ∫(y′.plus(x;y′)), then ∫(y′.plus(x;y′))[x:=y] is defined, and reduces to ∫(y′.plus(y;y′)).
I've stated that ABTs which are α-equivalent should be considered as identical. The relation a=αb of α-equivalence means that a and b are identical up to the choice of bound variable names. We may give α-equivalence the following meaning explanation;
We may declare x=αx.
We may declare θ(→b1.a1;...→bn.an)=αθ(→b′1.a′1;...→b′n.a′n) when we have declared, for every i and every pair of renamings ρi of →bi and ρ′i of →b′i which rename to the same fresh variables, that ^ρi(ai)=α^ρ′i(a′i).
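A sketch of an α-equivalence check over the same hypothetical ABT type: instead of quantifying over renamings, it walks the two trees together while remembering which bound variables correspond to which;

import Data.Tuple (swap)

data ABT = V String | Op String [([String], ABT)]

alphaEq :: ABT -> ABT -> Bool
alphaEq = go []
  where
    -- env pairs each bound variable of the left tree with its counterpart on the
    -- right, innermost binding first.
    go env (V x) (V y) =
      case (lookup x env, lookup y (map swap env)) of
        (Just y', Just x') -> y == y' && x == x'   -- both bound: they must correspond
        (Nothing, Nothing) -> x == y               -- both free: they must be the same variable
        _                  -> False
    go env (Op f args) (Op g args')
      | f == g && length args == length args' = and (zipWith (goArg env) args args')
      | otherwise                             = False
    go _ _ _ = False
    goArg env (bs, a) (bs', a')
      | length bs == length bs' = go (zip bs bs' ++ env) a a'
      | otherwise               = False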
When we take this identification seriously, we can define substitution unproblematically;
x[x:=c]↓c
y[x:=c]↓y when x and y are different variables
If we cannot, for any b in any →bi, declare freeb(c), then θ(→b1.a1;...→bn.an)[x:=c]↓θ(→b1.a′1;...→bn.a′n) where a′i is simply ai if x is in →bi and a′i is ai[x:=c] otherwise.
The conditions in that last rule prevent the problems discussed above. The condition on the b's prevents naive variable capture, and can always be fulfilled by first replacing all bound variables in our ABT with fresh ones. The condition that a′i is just ai when x appears in →bi prevents us from substituting for occurrences of x which aren't free.
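"Rename the bound variables first" becomes, in code, a freshening step performed whenever we pass under a binder. A sketch over the same hypothetical ABT type (the helpers freeVars, freshen, and rename are my own names);

import Data.List (nub)

data ABT = V String | Op String [([String], ABT)]
  deriving Show

-- The variables occurring free in an ABT.
freeVars :: ABT -> [String]
freeVars (V x)       = [x]
freeVars (Op _ args) = nub (concatMap (\(bs, a) -> filter (`notElem` bs) (freeVars a)) args)

-- Pick a variant of x that avoids the given names.
freshen :: [String] -> String -> String
freshen avoid x = head [x' | x' <- x : [x ++ show i | i <- [1 :: Int ..]], x' `notElem` avoid]

-- Rename free occurrences of old to new.
rename :: String -> String -> ABT -> ABT
rename old new = subst (V new) old

-- subst c x a computes a[x := c], freshening binders so that capture cannot occur.
subst :: ABT -> String -> ABT -> ABT
subst c x (V y) | y == x    = c
                | otherwise = V y
subst c x (Op f args) = Op f (map goArg args)
  where
    avoid = x : freeVars c
    goArg (bs, a)
      | x `elem` bs = (bs, a)                  -- x is rebound here: nothing to substitute below
      | otherwise   = let (bs', a') = foldr step ([], a) bs
                      in  (bs', subst c x a')
    step b (done, body) =
      let b' = freshen (avoid ++ filter (/= b) (freeVars body) ++ done) b
      in  (b' : done, rename b b' body)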
Substitution is essentially performed on α-equivalence classes of ABTs rather than on an ABT itself. We can choose a capture-avoiding representative for a given class of α-equivalent ABTs and, any time we perform a substitution, we first canonicalize the binding names of the ABTs. This fixes all mentioned technical problems with substitution. There are several ways to do this in practice, for example by using de Bruijn indices.
de Bruijn indices
If you're anything like me, much of the discussion of ABTs may not sit right with you. Something about considering ABTs up to swapping variables seems to be a bit of a hack, something which could easily lead to incorrect reasoning without one even taking notice. I mentioned that what should really be done is a process of canonicalization where a canonical representative of an ABT is chosen, and that de Bruijn indices could do this job. The point of this section is to describe how, exactly, this works.
A de Bruijn index is simply a numeral signifying a bound variable. 0 is the most recently bound, 1 the second most recent, 2 the third most recent, etc. If we wanted to, for sake of clarity, we could mark these numerals so that we don't confuse them with actual numbers. I'll use an over-line: 0̄, 1̄, 2̄, etc.
For a de Bruijn indexed ABT, we don't explicitly bind variables. Instead, at each node we look up the arity of the operator to know which arguments bind variables. For example, instead of writing λr.case(r;e.(r,e);o.(r,o)) we'd write λ(case(0̄;(1̄,0̄);(1̄,0̄))). Notice that r becomes 0̄ outside of the subsequent bindings, and becomes 1̄ inside of them. After further bindings, the most recently bound variable becomes the second, third, etc. most recently bound variable.
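As a data structure this is pleasantly simple. Here's a hypothetical Haskell sketch of a small term language with λ, pairing, and a two-branch case, enough for the examples in this section;

data Term
  = Idx Int                 -- a de Bruijn index: 0 is the most recently bound variable
  | Lam Term                -- λ binds one variable in its body
  | Pair Term Term          -- (a, b) binds nothing
  | Case Term Term Term     -- case(r; e.l; o.r'): binds one variable in each branch
  deriving (Show, Eq)

-- λr.case(r; e.(r,e); o.(r,o)) becomes λ(case(0̄; (1̄,0̄); (1̄,0̄)))
caseExample :: Term
caseExample = Lam (Case (Idx 0) (Pair (Idx 1) (Idx 0)) (Pair (Idx 1) (Idx 0)))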
Our notion of free variable has to be amended slightly. In the expression λ((0̄,1̄)), 1̄ is free while 0̄ is bound. We define the following, where freeₙ(x) judges that n̄ is free in x;
We may declare freeₙ(n̄).
We may declare freeₙ(θ(x0;x1;...xm)) when we've declared θ to be of arity s(→b0.s0,→b1.s1,...→bm.sm) and, for some i, we've declared freeₙ₊ₖ(xi), where k=|→bi|.
Where |→bi| is the length of the list →bi. In the real world, freeₙ(x) would be implemented on a computer as a function that returns a boolean. As an example, we may declare free₀(λ((0̄,1̄))) since we know that λ has arity Exp(Exp.Exp) and we may declare free₀₊₁((0̄,1̄)). We may declare that since we know that the pairing operator has arity Exp(Exp,Exp), and we may declare (in this case for i=1) free₁(1̄).
In the course of that validation, the index we were testing for freeness changed its denotation as we entered variable bindings. This is where much of the confusion over de Bruijn indices arises.
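That bumping of the index is easiest to see in code. A sketch over the same hypothetical Term type;

data Term = Idx Int | Lam Term | Pair Term Term | Case Term Term Term

-- isFree n t corresponds to the judgment freeₙ(t).
isFree :: Int -> Term -> Bool
isFree n (Idx m)      = n == m
isFree n (Lam b)      = isFree (n + 1) b                -- λ binds one variable
isFree n (Pair a b)   = isFree n a || isFree n b        -- pairing binds nothing
isFree n (Case s l r) = isFree n s || isFree (n + 1) l || isFree (n + 1) r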
Dealing with substitution is a rather subtle business, but once all the details are clarified, the algorithm is rather simple. Consider this naive evaluation of a substitution;
λ((λ(1̄))[0̄:=(λ(0̄),0̄)])
↓ λ(λ(1̄[0̄:=(λ(0̄),0̄)]))
↓ λ(λ(1̄))
Hmm... is that right? Let's consider what this is doing when the variables are given explicit names. So λ((λ(1̄))[0̄:=(λ(0̄),0̄)]) should be λx.((λy.x)[x:=(λz.z,x)]). Well, clearly we did something wrong: our final evaluation shouldn't be λx.λy.x. We did not take into account subsequent bindings. When entering a binding, the variable we're substituting for goes from being the nth bound variable to the (n+1)th bound variable. Let's try again;
λ((λ(1̄))[0̄:=(λ(0̄),0̄)])
↓ λ(λ(1̄[1̄:=(λ(0̄),0̄)]))
↓ λ(λ((λ(0̄),0̄)))
Now, is that right? Our final evaluation was λx.λy.(λz.z,y). Well, that's not right. The second part of our pair started out being bound as x but ended up being bound as y. Clearly, we also need to increment the variables in what we're substituting whenever we enter into a binding.
λ((λ(1̄))[0̄:=(λ(0̄),0̄)])
↓ λ(λ(1̄[1̄:=(λ(1̄),1̄)]))
↓ λ(λ((λ(1̄),1̄)))
So, our final evaluation is now λx.λy.(λz.y,x). Wait, that's still not right. The first part of our pair had a variable bound as z, but this became a y after incrementing. This is because that variable wasn't free in the ABT (λ(0̄),0̄). So we see what we must do: only modify free variables during substitution.
λ((λ(1̄))[0̄:=(λ(0̄),0̄)])
↓ λ(λ(1̄[1̄:=(λ(0̄),1̄)]))
↓ λ(λ((λ(0̄),1̄)))
So our final evaluation is λx.λy.(λz.z,x), which is correct. The modification that we need to perform on our substituting expression is called quotation, and is a function defined like so;
quote(i, n, m̄) ↓ (i+m)‾ when m ≥ n.
quote(i, n, m̄) ↓ m̄ when m < n.
quote(i, n, θ(x0;...xm)) ↓ θ(quote(i, n+k0, x0);...quote(i, n+km, xm)) when θ has arity s(→b0.s0,...→bm.sm), where kj=|→bj|.
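A sketch of quote over the same hypothetical Term type: indices at or above the cutoff n are free at that point and get shifted by i, while indices below it are bound and left alone;

data Term = Idx Int | Lam Term | Pair Term Term | Case Term Term Term
  deriving (Show, Eq)

-- quote i n t implements quote(i, n, t).
quote :: Int -> Int -> Term -> Term
quote i n (Idx m) | m >= n    = Idx (i + m)
                  | otherwise = Idx m
quote i n (Lam b)      = Lam (quote i (n + 1) b)
quote i n (Pair a b)   = Pair (quote i n a) (quote i n b)
quote i n (Case s l r) = Case (quote i n s) (quote i (n + 1) l) (quote i (n + 1) r)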
Using this quotation function, we can define substitution as follows;
x̄[x:=e] ↓ e.
x̄[y:=e] ↓ x̄ when x ≠ y.
θ(x0;...xm)[n:=e] ↓ θ(x0[n+k0:=quote(k0, 0, e)];...xm[n+km:=quote(km, 0, e)]) when θ has arity s(→b0.s0,...→bm.sm), where kj=|→bj|.
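Putting the two together over the same hypothetical Term type, and checking it against the worked example above;

data Term = Idx Int | Lam Term | Pair Term Term | Case Term Term Term
  deriving (Show, Eq)

quote :: Int -> Int -> Term -> Term
quote i n (Idx m) | m >= n    = Idx (i + m)
                  | otherwise = Idx m
quote i n (Lam b)      = Lam (quote i (n + 1) b)
quote i n (Pair a b)   = Pair (quote i n a) (quote i n b)
quote i n (Case s l r) = Case (quote i n s) (quote i (n + 1) l) (quote i (n + 1) r)

-- subst e n t computes t[n̄ := e], quoting e as we pass under binders.
subst :: Term -> Int -> Term -> Term
subst e n (Idx m) | m == n    = e
                  | otherwise = Idx m
subst e n (Lam b)      = Lam (subst (quote 1 0 e) (n + 1) b)
subst e n (Pair a b)   = Pair (subst e n a) (subst e n b)
subst e n (Case s l r) = Case (subst e n s)
                              (subst (quote 1 0 e) (n + 1) l)
                              (subst (quote 1 0 e) (n + 1) r)

-- The worked example: λ((λ(1̄))[0̄ := (λ(0̄), 0̄)])
workedExample :: Term
workedExample = Lam (subst (Pair (Lam (Idx 0)) (Idx 0)) 0 (Lam (Idx 1)))
-- workedExample == Lam (Lam (Pair (Lam (Idx 0)) (Idx 1))), i.e. λx.λy.(λz.z, x)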
This works fine, but there's an extra hitch to working with de Bruijn indices, and that's dealing with the removal of variable bindings. Take the following reduction;
ap(λx.y;z)↓y[x:=z]
Notice that the x binding is being removed. That means that any subsequent binding in y will no longer be the nth binding, but will now be the (n-1)th binding. This means we need to actually decrement all the free variables in y by quoting with a -1.
ap(λ(y); z) ↓ quote(-1, 0, y[0̄:=z])
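A sketch of that reduction step over the same hypothetical Term type, with an App constructor standing in for ap (my addition). One wrinkle: if the argument z may itself contain free indices, it is first quoted up by one so that the final decrement returns them to their original values; for a closed z this agrees with the rule exactly;

data Term = Idx Int | Lam Term | Pair Term Term | App Term Term
  deriving (Show, Eq)

quote :: Int -> Int -> Term -> Term
quote i n (Idx m) | m >= n    = Idx (i + m)
                  | otherwise = Idx m
quote i n (Lam b)    = Lam (quote i (n + 1) b)
quote i n (Pair a b) = Pair (quote i n a) (quote i n b)
quote i n (App f a)  = App (quote i n f) (quote i n a)

subst :: Term -> Int -> Term -> Term
subst e n (Idx m) | m == n    = e
                  | otherwise = Idx m
subst e n (Lam b)    = Lam (subst (quote 1 0 e) (n + 1) b)
subst e n (Pair a b) = Pair (subst e n a) (subst e n b)
subst e n (App f a)  = App (subst e n f) (subst e n a)

-- One step of ap(λ(y); z) ↓ quote(-1, 0, y[0̄ := z]): the λ binding disappears,
-- so the free indices that came from the body drop by one.
betaStep :: Term -> Maybe Term
betaStep (App (Lam body) arg) =
  Just (quote (-1) 0 (subst (quote 1 0 arg) 0 body))
betaStep _ = Nothing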
Similarly, we have the meaning explanation for the general judgment;
We may declare generalterm(x.J) when we may declare J[x:=E] for any expression E.
This must be modified when using de Bruijn indices. For example;
We may declare generalterm(J) when we may declare quote(-1, 0, J[0̄:=E]) for any expression E.
All this may seem rather complicated. A common quip goes "de Bruijn syntax is a Cylon detector". Despite that, de Bruijn indices are probably the simplest method for handling the canonicalization of ABTs, and they are common in actual computer implementations, such as Agda's. That being said, we will not make mention of de Bruijn indices after this section. I think it's important to know about them, at least to have peace of mind. Using them allows one to fix all possible ambiguities in variable bindings, and after a while they do become more intuitive, but they hardly assist readability.