I am curious how to construct the cheapest addition for d digits.
Intuitively, "cheapest" refers to using as few gates as possible, exploiting some mathematical properties of binary addition, and perhaps reusing some intermediate results. Though I am not sure exactly how to compare the "costs" of gates of different dimensions. (This probably requires some insight into how the hardware of actual computers is built. The mathematical "cost" of gates should approximate the actual costs of producing and operating the hardware. I mean, the ultimate benefit of this exercise would be to design an actual computer that is cheaper or faster or more energy efficient.)
Then the same question for multiplication. Then, for building the entire computer.
Even very simply stated problems in boolean circuit optimization turn out to be deeply connected with extremely hard mathematical and computational problems, and in some cases even provably undecidable problems.
I don't hold out very much hope of ever being able to find and prove solutions to most of these questions, even assuming superintelligence.
On the other hand, there are no doubt lots of practical improvements, even very substantial ones, to be made.
Logic circuits are traditionally modelled as a directed acyclic graph. Each node can take a value of 0 or 1. The nodes can have 0, 1, or 2 inputs. 0-input nodes are the inputs for the whole system. 1-input nodes are always NOT gates, which output 0 when given input 1 and vice versa. 2-input nodes are either AND or OR gates, which work as you would expect. Some of the nodes are designated output nodes.
Let's generalise. We can define an $(m,n)$ gate as a function $\{0,1\}^m \to \{0,1\}^n$. In this case a NOT gate is a $(1,1)$ gate; both AND and OR gates are $(2,1)$ gates. Consider all of the possible $(m,n)$ gates: there are $2^m$ possible inputs, and each one of them has one of $2^n$ possible outputs. This means there are $(2^n)^{2^m} = 2^{n \times 2^m}$ possible $(m,n)$ gates. Keeping track of the number of gates is important, so we will thoroughly convince ourselves of this.
For $m=0$, $n=1$ there are $2^{1 \times 2^0} = 2$ possible gates. These correspond to gates with no inputs and one output, so the two gates are a node which always outputs 0 and a node which always outputs 1. If we allow $n$ to vary, we have $2^n$ possible $(0,n)$ gates. This makes sense, as a $(0,n)$ gate just consists of a node which always outputs the same string of $n$ bits. An $(m,n)$ gate can then be considered as a lookup table with $2^m$ locations, each containing $n$ bits of information. In general there are $n \times 2^m$ bits of information required to specify an $(m,n)$ gate.
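To convince ourselves concretely, here is a minimal Python sketch (my own illustration; the function name is made up for this post) that treats an $(m,n)$ gate as exactly such a lookup table and checks the $2^{n \times 2^m}$ count by brute force for small sizes:

```python
from itertools import product

def all_gates(m, n):
    # An (m, n) gate is a lookup table: one output number B in [0, 2**n)
    # for each of the 2**m possible input numbers A.
    return list(product(range(2**n), repeat=2**m))

# Brute-force check of the 2**(n * 2**m) counting argument.
for m, n in [(0, 1), (1, 1), (2, 1), (2, 2)]:
    assert len(all_gates(m, n)) == 2**(n * 2**m)
```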
Example 1
Now let's consider all possible $(2,1)$ gates. There are $2^{1 \times 2^2} = 16$ of them:
$0_{10}=0000_2$  $1_{10}=0001_2$  $2_{10}=0010_2$  $3_{10}=0011_2$
$4_{10}=0100_2$  $5_{10}=0101_2$  $6_{10}=0110_2$  $7_{10}=0111_2$
$8_{10}=1000_2$  $9_{10}=1001_2$  $10_{10}=1010_2$  $11_{10}=1011_2$
$12_{10}=1100_2$  $13_{10}=1101_2$  $14_{10}=1110_2$  $15_{10}=1111_2$
So what are the numbers labelling each gate?
Consider an input in $\{0,1\}^m$ as a tuple of bits $a_0, \dots, a_{m-1}$. We can define the input number $A$ uniquely using $A = \sum_{i=0}^{m-1} a_i 2^i$. In binary this is just the concatenation $a_{m-1} \| \dots \| a_0$. This means there are $2^m$ possible inputs. We will index these with $k \in \{0, \dots, 2^m - 1\}$.
We then take $b_j$ as the $j$th output bit when the input $A$ is fed into the gate. We define $B$ similarly to $A$ above as $B = \sum_{j=0}^{n-1} b_j 2^j$. We then let $B_k$ be the output number corresponding to the input $A_k$.
Each $B$ has $n$ binary digits, so we can define $C = \sum_{k=0}^{2^m-1} B_k 2^{A_k \times n}$. This is equivalent to concatenating $B_{2^m-1} \| \dots \| B_0$ in binary, which also makes clear that $C$ has $n \times 2^m$ binary digits, so the maximum value of $C$ is $2^{n \times 2^m} - 1$.
Each $C$ thus uniquely defines an $(m,n)$ gate, therefore the three values $(m,n;C)$ with $C \in \{0, \dots, 2^{n \times 2^m} - 1\}$ define a function $\{0,1\}^m \to \{0,1\}^n$. This is a slight generalisation of the notion of a boolean function, as boolean functions are generally defined as functions $\{0,1\}^n \to \{0,1\}$. We will call $C$ the identifier of the gate.
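As a sketch of how the encoding works in practice, here is some illustrative Python (my own; the names gate_id and gate_output are made up for this post):

```python
def gate_id(f, m, n):
    # Identifier C of the (m, n) gate computed by f, where f maps an input
    # number A in [0, 2**m) to an output number B in [0, 2**n).
    return sum(f(A) << (A * n) for A in range(2**m))

def gate_output(C, n, A):
    # Decode: the output B_A is the block of n bits of C starting at A*n.
    return (C >> (A * n)) & ((1 << n) - 1)

# XOR as a (2,1) gate outputs 0, 1, 1, 0 on inputs 0, 1, 2, 3,
# so its identifier is 0110 in binary, i.e. 6.
xor = lambda A: (A & 1) ^ (A >> 1)
assert gate_id(xor, 2, 1) == 6
assert [gate_output(6, 1, A) for A in range(4)] == [0, 1, 1, 0]
```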
Let's look at the above table more closely. We see some familiar gates: 1 is NOR, 6 is XOR, 7 is NAND, 8 is AND, 9 is XNOR, and 14 is OR. In fact for any gate with identifier $C$, the gate $15 - C$ is the result of applying a NOT to the output.
We also have some input-asymmetric gates. 2, 4, 11, and 13 are all of the form ($a_0$ [GATE] NOT $a_1$) where [GATE] is an AND, NOR, OR, or NAND gate respectively.
The remaining gates are also of note. 0 and 15 are both constant. 3, 5, 10, and 12 each depend on only one of the inputs. These gates could be constructed from a circuit containing a single $(0,1)$ or $(1,1)$ gate. They illustrate one way in which gates can be trivial: they can be independent of one or more of the inputs.
There are other ways gates can be trivial, but these rely on having multiple output nodes. The first is duplicate output nodes: imagine the set of $(2,2)$ gates for which $b_0 = b_1$ for all inputs. There are 16 of them, each corresponding to a $(2,1)$ gate.
Another is separability. We will define an $(m,n)$ gate as separable if we can split the $m$ inputs into two or more subsets, each of which affects only its own disjoint subset of the $n$ outputs. A case of this would be the $(2,2)$ gate where $b_0 = a_0$ and $b_1 = a_1$. Another example would be the $(2,2)$ gate where $b_0 = \text{NOT } a_1$ and $b_1 = a_0$.
Gates can also be equivalent to each other with regard to permuting the inputs. We see this with the pairs 2 and 4, and 11 and 13, above.
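These classifications are mechanical, so they can be checked with a short Python sketch (again my own illustration): it tests which inputs each of the 16 $(2,1)$ gates actually depends on, and computes each gate's partner under swapping the inputs.

```python
def out(C, A):
    # Output bit of the (2,1) gate with identifier C on input number A.
    return (C >> A) & 1

for C in range(16):
    depends_a0 = any(out(C, A) != out(C, A ^ 1) for A in range(4))
    depends_a1 = any(out(C, A) != out(C, A ^ 2) for A in range(4))
    # Identifier of the gate obtained by swapping the two input bits.
    swapped = sum(out(C, ((A & 1) << 1) | (A >> 1)) << A for A in range(4))
    print(f"gate {C:2}: uses a0={depends_a0}, uses a1={depends_a1}, swap -> {swapped}")
```

Running this confirms the claims above: 0 and 15 use neither input, 3, 5, 10, and 12 use exactly one, and swapping the inputs exchanges 2 with 4 and 11 with 13 (it also exchanges the single-input gates 3 with 5 and 10 with 12, and fixes everything else).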
Example 2
A half-adder is a $(2,2)$ gate. We can work through and find its identifier. First we need a table of inputs and outputs; we will take the carry term first (the high output bit) and the sum second (the low output bit).
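| $a_1$ | $a_0$ | carry | sum |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |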
The binary representation of $C$ can be easily read off as 10 01 01 00. $10010100_2 = 148_{10}$, therefore the half-adder is identified as $(2,2;148)$.
A full-adder is a (3,2) gate:
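| $a_2$ | $a_1$ | $a_0$ | carry | sum |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |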
The binary number $C$ is here read off as 11 10 10 01 10 01 01 00. $1110100110010100_2 = 59796_{10}$. So the full adder can be expressed as $(3,2;59796)$. $C$ gets big fast as gates gain inputs.
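Both identifiers can be checked mechanically with the encoding from earlier (a minimal Python sketch of my own; the output number is the carry bit followed by the sum bit, i.e. just the arithmetic sum of the input bits):

```python
def gate_id(f, m, n):
    # C = sum over inputs A of the output B_A, shifted into digits A*n and up.
    return sum(f(A) << (A * n) for A in range(2**m))

# Half-adder: (carry, sum) as one number is the sum of the two input bits.
assert gate_id(lambda A: (A & 1) + (A >> 1), 2, 2) == 148

# Full-adder: (carry, sum) as one number is the count of set input bits.
assert gate_id(lambda A: bin(A).count("1"), 3, 2) == 59796
```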
Constructing Gates from Simpler Gates
We might want to construct larger gates from simpler ones. This is how computers are built, and in fact we can build a half-adder from just two "standard" $(2,1)$ logic gates, an AND and an XOR. A full-adder requires 5 "standard" $(2,1)$ logic gates. It may be possible to build a full-adder with even fewer $(2,1)$ gates if we are allowed to use all 16 of them, but I have not checked and I do not know a good way to check yet.
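For concreteness, the 5-gate construction I mean is the textbook one: two XORs compute the sum, and two ANDs plus an OR compute the carry. A quick Python sketch (my own) verifies that this network really is the gate $(3,2;59796)$:

```python
def full_adder(a, b, c_in):
    s1 = a ^ b                      # XOR 1
    s = s1 ^ c_in                   # XOR 2: the sum bit
    c_out = (a & b) | (s1 & c_in)   # AND, AND, OR: the carry bit
    return c_out, s

# Recompute the identifier from the network and compare.
C = 0
for A in range(8):
    c_out, s = full_adder(A & 1, (A >> 1) & 1, A >> 2)
    C += (2 * c_out + s) << (2 * A)
assert C == 59796
```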
We can use some tools from information theory to lower-bound how many smaller gates we ought to need in general to construct a bigger one. We will consider the space complexity as the amount of information required to specify a gate of a certain size, and compare this to the space complexity of a network of $g$ $(2,1)$ gates.
The space complexity of a gate is $\log_2(m) + \log_2(n) + n \times 2^m$. This grows as $\Theta(n \times 2^m)$.
When specifying a gate network, we need to specify the gates themselves, which grows linearly in $g$: $\Theta(g)$. We must also consider the information required to specify the connectivity of the network, which is a directed acyclic graph with $g + m$ vertices and $2g$ edges. The space complexity thus grows as that of a sparse directed graph, which is $\Theta(|V| + |E|)$; here this ends up being $\Theta(g + m)$. So if networks of $g$ standard gates are to be able to specify arbitrary $(m,n)$ gates, we should generically expect to need $\Theta(g + m) = \Theta(n \times 2^m)$, i.e. a number of gates exponential in the number of inputs.
Binary Addition
But this is not always the case: consider a gate which adds together two $d$-digit binary numbers. This will be a $(2d, d+1)$ gate. But to construct this from 1 half-adder and $d-1$ full-adders requires only $g = 2 + 5 \times (d-1) = 5d - 3$ separate standard logic gates. Our value of $g$ here does not satisfy $\Theta(g + m) = \Theta(n \times 2^m)$, as $\Theta(d) \neq \Theta(d \times 2^{2d})$. Something is clearly up here. Addition appears to be some sort of special process which is unusually simple.
In this framework, this appears to us as addition being much easier than expected to construct as a network. Being easier than expected to construct as a network also makes it faster to compute, since the number of gates corresponds to the number of operations needed to evaluate the process.
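To tie this together, here is a small Python sketch (my own) of the ripple-carry construction, counting standard gates as it goes, to check both the $5d - 3$ gate count and that the network really computes addition:

```python
def ripple_carry_add(x, y, d):
    # Add two d-bit numbers with 1 half-adder and d-1 full-adders,
    # built entirely from standard (2,1) gates.
    gates, carry, result = 0, 0, 0
    for i in range(d):
        a, b = (x >> i) & 1, (y >> i) & 1
        if i == 0:
            s, carry = a ^ b, a & b                        # half-adder: 2 gates
            gates += 2
        else:
            s1 = a ^ b
            s, carry = s1 ^ carry, (a & b) | (s1 & carry)  # full-adder: 5 gates
            gates += 5
        result |= s << i
    assert gates == 5 * d - 3
    return result | (carry << d)    # d+1 output bits: a (2d, d+1) gate overall

# Exhaustive check for d = 4.
assert all(ripple_carry_add(x, y, 4) == x + y for x in range(16) for y in range(16))
```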