I'll focus on the gears-level elaboration of why all those computational mechanisms are equivalent. In short: If you want to actually get anything done, Turing machines suck! They're painful to use, and that makes it hard to get insight by experimenting with them. Lambda calculus / combinator calculi are way better for actually seeing why this general equivalence is probably true, and then afterwards you can link that back to TMs. (But, if at all possible, don't start with them!)
Sure, in some sense Turing machines are "natural" or "obvious": If you start from finite state machines and add a stack, you get a push-down automaton. Do it again, and you essentially have a Turing machine. (Each stack is one end of the infinite tape.) There's more bookkeeping involved to properly handle "underflows" (when you reach the bottom of a stack and need to extend it) yadda yadda, so every "logical state" would require a whole group of states to actually implement, but that's a boring mechanical transformation. So just slap another layer of abstraction on it and don't actually work with 2-stack PDAs, and you're golden, right? Right…?
Well… actually the same problem repeats again and again. TMs are just extremely low-level and not amenable to abstraction. By default, you get two tape symbols. If you need more, you can emulate those, which will require multiple "implementation states" to implement a single "logical state", so you add another layer of abstraction… Same with multiple tapes or whatever else you need. But you're still stuck in ugly implementation details, because you need to move the head left and right to skip across stuff, and you can't really abstract that away. (Likewise, chaining TMs or using one as a "subroutine" from another isn't easily possible and involves lots of manual "rewiring".) So except for trivial examples, actually working with Turing machines is extremely painful. It's not surprising that it's really hard to see why they should be able to run all effectively computable functions.
Enter lambda calculus. It's basically just functions, on a purely syntactic level. You have binders or (anonymous) function definitions – the eponymous lambdas. You have variables, and you have nested application of variables and terms (just like you're used to from normal functions). There's exactly one rule: If a function is applied to an argument, you substitute that argument into the body wherever the placeholder variable was used. So e.g. (λx.((f x) y)(g x)) z → ((f z) y)(g z) by substituting z for x. (Actual use of lambda calculus needs a few more rules: You have to make sure that substitution doesn't capture or alias variables and change the meaning, but that's a relatively minor (if tricky) bookkeeping problem that does abstract away fairly cleanly.)
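Here's a minimal sketch of this in Haskell, just to make the single rule concrete. (The names Var, Lam, App, subst, step are my own choices, and the substitution is the naive version that ignores the capture problem mentioned above.)

```haskell
-- Untyped lambda terms and one step of beta reduction (naive sketch).
data Term
  = Var String          -- a variable
  | Lam String Term     -- λx. body
  | App Term Term       -- application
  deriving (Show, Eq)

-- Substitute s for free occurrences of x (naively: no capture avoidance).
subst :: String -> Term -> Term -> Term
subst x s (Var y)      = if x == y then s else Var y
subst x s (Lam y body)
  | x == y             = Lam y body                  -- x is shadowed here
  | otherwise          = Lam y (subst x s body)
subst x s (App f a)    = App (subst x s f) (subst x s a)

-- The one rule: wherever a lambda meets its argument, substitute.
step :: Term -> Maybe Term
step (App (Lam x body) arg) = Just (subst x arg body)
step (App f a) = case step f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> step a
step (Lam x body) = Lam x <$> step body
step (Var _) = Nothing

-- The example from the text: (λx.((f x) y)(g x)) z  →  ((f z) y)(g z)
example :: Maybe Term
example = step (App (Lam "x" (App (App (App (Var "f") (Var "x")) (Var "y"))
                                  (App (Var "g") (Var "x"))))
                    (Var "z"))
```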
Now, on top of that, you can quickly build a comfortable working environment. Want to actually name things? let x := y in t is just ((λx.t) y). (The binding and its value ending up at opposite ends of the expression is a bit ugly, but you don't even need "actual" extensions, it's just syntax sugar.) Data can also be emulated quite comfortably, if you understand one key idea: Data is only useful because you eventually do something with it. If all you have is abstraction, just abstract out the "actually doing" part with a placeholder in the most general scheme that you could possibly need. (Generally an induction scheme or "fold".) [Note: I'll mostly use actual names instead of one-letter variables going forward, and may drop extra parentheses: f x y z = (((f x) y) z)]
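For comparison, the same sugar spelled out in Haskell (the names here are made up for illustration):

```haskell
-- "let r := 2.0 in pi * r * r" is just a lambda applied to the named value.
area :: Double
area = (\r -> pi * r * r) 2.0    -- behaves exactly like:  let r = 2.0 in pi * r * r
```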
- In a way, Booleans are binary decisions. true := λtt.λff.tt, false := λtt.λff.ff. (Now your if c then x else y just becomes c x y … or you add some syntax sugar.)
- In a way, natural numbers are iteration. zero := λze.λsu.ze, succ := λn.λze.λsu.su (n ze su). (If you look closely, that's just the induction principle. And all that the number "does" is: (succ (succ (succ zero)) x f) = f (f (f x)).)
- Lists are just natural numbers "with ornaments", a value dangling on every node. nil := λni.λco.ni, cons := λx.λxs.λni.λco.co x (xs ni co). The usual list functions are easy too: map := λf.λxs.(xs nil (λx.λrest.cons (f x) rest)), fold := λxs.xs (yep, the list – as represented – is the fold).
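Here are all three encodings written directly in Haskell, as a sketch. (The newtype wrappers and the names CBool, CNat, CList, etc. are my own; the wrappers only exist to keep the rank-2 types manageable.)

```haskell
{-# LANGUAGE RankNTypes #-}

newtype CBool   = CBool { ifC   :: forall r. r -> r -> r }              -- λtt.λff.…
newtype CNat    = CNat  { iterC :: forall r. r -> (r -> r) -> r }       -- λze.λsu.…
newtype CList a = CList { foldC :: forall r. r -> (a -> r -> r) -> r }  -- λni.λco.…

true, false :: CBool
true  = CBool (\tt _ff -> tt)
false = CBool (\_tt ff -> ff)
-- "if c then x else y" is just:  ifC c x y

zero :: CNat
zero = CNat (\ze _su -> ze)

succC :: CNat -> CNat
succC n = CNat (\ze su -> su (iterC n ze su))
-- iterC (succC (succC (succC zero))) x f  ==  f (f (f x))

nilC :: CList a
nilC = CList (\ni _co -> ni)

consC :: a -> CList a -> CList a
consC x xs = CList (\ni co -> co x (foldC xs ni co))

-- map, exactly as above: fold the list, rebuilding it with f applied at each cons.
mapC :: (a -> b) -> CList a -> CList b
mapC f xs = foldC xs nilC (\x rest -> consC (f x) rest)

-- The fold is the list itself; to inspect a result, fold it back into a normal list:
-- ghci> foldC (mapC (+1) (consC 1 (consC 2 nilC))) [] (:)
-- [2,3]
```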
And so on… so I think you'll believe me that you basically have a full functional programming language, and that lambda calculus is equivalent to all those more usual computational substrates. (…and maybe Turing machines too?)
Still… even if lambda calculus is extremely simple in some ways, it's also disturbingly complex in others. Substitution everywhere inside a term is "spooky action at a distance"; a priori it's quite plausible that this does a lot of "heavy lifting". So: Let's turn to combinator calculi. Let's keep the term structure and application of things, but get rid of all (object-level) variables and binders. Instead, we define three substitution rules (using variables only at the meta-level):
I x = x
K v x = v (or, fully parenthesized: (K v) x = v)
S a b x = (a x) (b x) (or, again more precisely: ((S a) b) x = (a x) (b x))
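These are ordinary Haskell functions, if you want to play with them. (The lowercase names are mine; the standard-library equivalents are id, const, and, for S, the function instance of (<*>).)

```haskell
-- The three rules as plain Haskell functions.
i :: a -> a
i x = x

k :: a -> b -> a
k v _x = v

s :: (x -> a -> b) -> (x -> a) -> x -> b
s a b x = (a x) (b x)

-- Sanity check for the remark below:  s k k x = (k x) (k x) = x,  so s k k behaves like i.
```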
(Strictly speaking, we don't even need I, as SKKx = (Kx)(Kx) = x, so if we say I := SKK that's equivalent.) The interesting thing about these rules is that all lambda-terms can be translated into SKI-terms. (To get rid of a binder / λ, you recursively transform the body: Pretend the intended argument x is already applied to the body – how do you get it where it needs to go? If you have an application, you prefix it with S so that the argument is passed down into both branches. If a branch doesn't use the argument (that includes other variables), prefix it with K so it remains unchanged. Otherwise, it is the bound variable itself that should be substituted, so put an I and you get the value. Done.) Worst case, this will cause an exponential(!) increase of the term size, but it'll get the job done. After that, there's no more "looking inside terms" or dealing with variables, just extremely simple rewrite rules. To revisit the examples above:
true := λtt.λff.tt → λtt.(K tt) → S(KK)I, false := λtt.λff.ff → λtt.(I) → KI
zero := λze.λsu.ze → S(KK)I, succ := λn.λze.λsu.su(n ze su) → λn.λze.SI(S(K(n ze))I) → λn.S(K(SI))(S(S(KS)(S(KK)(S(KK)(S(K n)I)))(KI)) → (S(K(S(K(SI))))(S(S(KS)(S(K(S(KS)))(S(K(S(KK)))(S(S(KS)(S(KK)I))(KI)))))(K(KI)))) […except I think I messed up somewhere.]
(And yes, I did that by hand… so I hope you'll accept that I stop there and don't translate the other examples.)
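Since this translation is exactly the kind of thing that's tedious and error-prone by hand, here is a sketch of it in Haskell. (The data types and names are my own, and this is the completely naive version, without the usual optimizations that keep the output small.)

```haskell
data L = LVar String | LLam String L | LApp L L            -- lambda terms
data C = S | K | I | CVar String | CApp C C                -- combinator terms
  deriving Show

-- Translate a lambda term by abstracting away each binder, innermost first.
compile :: L -> C
compile (LVar v)   = CVar v
compile (LApp f a) = CApp (compile f) (compile a)
compile (LLam x b) = abstract x (compile b)

-- abstract x t: a combinator term that, applied to an argument, behaves
-- like t with x bound to that argument.
abstract :: String -> C -> C
abstract x t
  | not (uses x t)    = CApp K t                                       -- branch ignores x: K
abstract _ (CVar _)   = I                                              -- uses x and is a variable, so it *is* x
abstract x (CApp f a) = CApp (CApp S (abstract x f)) (abstract x a)    -- pass x down into both branches
abstract _ t          = t                                              -- bare S/K/I never use x (caught above)

uses :: String -> C -> Bool
uses x (CVar v)   = v == x
uses x (CApp f a) = uses x f || uses x a
uses _ _          = False

-- ghci> compile (LLam "tt" (LLam "ff" (LVar "tt")))
-- CApp (CApp S (CApp K K)) I          -- i.e. S(KK)I, matching `true` above
```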
So actually you don't need lambda calculus (or variables, for that matter), just these two rules (remember, I is just sugar for SKK). You can compile your Haskell to this SKI-stuff.
Take a moment to process that.
Now I hope it's also pretty obvious that you can implement a Turing machine in Haskell? Same for imperative programming languages – look at e.g. the Whitespace interpreter (Haskell version in the archive, Idris version on Github). So at this point I think you have a pretty good intuition as to why all existing computational mechanisms can be reduced down to SKI-calculus.
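In case it isn't obvious, here is a minimal sketch of a Turing machine interpreter in Haskell. (All the names and the representation – the tape as two stacks plus the symbol under the head, exactly the 2-stack view from above – are my own choices.)

```haskell
import qualified Data.Map as M

data Move = MoveL | MoveR deriving (Eq, Show)

-- Transition table: (state, symbol under head) -> (new state, symbol to write, move).
type Rules st sym = M.Map (st, sym) (st, sym, Move)

data Tape sym = Tape [sym] sym [sym]   -- left half (nearest cell first), head, right half
  deriving Show

-- One step: look up the rule, write, move, extending the tape with blanks as needed.
step :: (Ord st, Ord sym)
     => sym -> Rules st sym -> (st, Tape sym) -> Maybe (st, Tape sym)
step blank rules (st, Tape ls x rs) = do
  (st', x', mv) <- M.lookup (st, x) rules
  pure $ case (mv, ls, rs) of
    (MoveL, l:ls', _)  -> (st', Tape ls' l (x' : rs))
    (MoveL, [],    _)  -> (st', Tape [] blank (x' : rs))
    (MoveR, _, r:rs')  -> (st', Tape (x' : ls) r rs')
    (MoveR, _, [])     -> (st', Tape (x' : ls) blank [])

-- Run until no transition applies, i.e. the machine halts.
run :: (Ord st, Ord sym)
    => sym -> Rules st sym -> (st, Tape sym) -> (st, Tape sym)
run blank rules cfg = maybe cfg (run blank rules) (step blank rules cfg)
```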
As for the other direction… with a basic TM model, dealing with all those dynamically resizing parts would be extremely painful, but if you use an extended Turing machine that has lots of different symbols and also more than one tape, it's actually not that hard? Detecting the locations where a rule applies (the redexes) isn't that hard if you use one tape as scratch space for counting nested parentheses. Doing the "reductions" (which can actually greatly expand the space needed, because S duplicates its argument) can also happen on a separate tape, and then afterwards you copy the result back onto the "main tape", after adjusting the placement of the unchanged stuff on the side. (I hope that's sufficiently clear? This is basically using multiple tapes as a fixed number of temporary helper variables.)
And that's it. Now you can run your Turing machine in your Haskell on your Turing machine, or whatever… Hope you're having fun!
I think I got something from what you said, that in a certain sense we don't care about state with a Turing machine. It just occurred to me that the only reason we can use these more convenient designs in a processor is that its memory is limited. (Not sure if this is true? There is probably a really awkward way you could encode addresses, even if your memory were infinite? You'd probably do some tree-like thing, and figuring out where you actually want to put stuff would take some time.)