ACT-R theory predicts that the "strength" of declarative memory in human brains decays as ∑i 1/√ti, where ti is the amount of time since the ith exposure to the fact. The power law here seems strange – surely things should either decay discretely or exponentially. My first reaction when I heard this was to think very confusedly about reaction rates between neurotransmitters, and my second reaction was to look for the empirical evidence.[1] As it turns out, the power law relationship was empirically established in the late 90s.[2] Ready for another shock? This matches closely with an empirical observation about information decay in financial markets! Our goal in this post is to give a theoretical explanation of this coincidence.
The motivating idea behind our argument is that impact decay is caused by the market literally forgetting what happened. This is cute but naive[3] – we shouldn't hope to use neuroscience to explain something essentially mathematical. Instead, we'll leverage this analogy to find a semi-rigorous derivation in the appropriate direction.
Epistemic status: the ideas are probably not new, but I haven't seen them collected in one place before. There's some math, and it's a bit sketchy.
This post grew out of conversations with LW user Kirby Sikes, which in turn were inspired by this comment of Terry Stewart. We'll take a somewhat meandering path to the main result, walking the same path we used to discover it.
Thanks again to Kirby Sikes for helpful comments on a draft.
Conventions
We emphasize phrases in bold and define phrases in italics.
Refer back here if you lose track of notation: T is the "final" time, t is an intermediate time, and t0 is the "initial" time. N is a number of shares or contracts. stdev is short for standard deviation. K is a "strike price" and e is an "edge", both in units of dollars. X is a random variable. u is a unit vector.
We're forced to lean on some knowledge of probability theory. There's a tiny bit of calculus – it can be safely black-boxed if you don't understand it.
Background: options pricing
An options contract gives the owner the right (but not the obligation) to buy[4] a unit of stock at a specific strike price on a specific expiration date. It's called an "option" because it lets you choose whether to do something (exercise the option) or not (let the option expire). It is generally obvious to the owner whether it is good or bad to exercise. They can always choose to let the option expire when exercising is bad, so the option has positive value.
How much value? Consider the price of the underlying stock on the expiration date as a random variable – call its distribution the terminal price distribution. If it's Gaussian (with the strike at its center), then it's not hard to check[5] that the expected value of the option on the expiration date is proportional to the standard deviation (which we'll abbreviate as stdev) of the terminal price distribution. With some significantly harder math, you can see that the stdev rule is correct under pretty mild assumptions about stock prices.
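If you'd rather check the stdev rule numerically than do the integral, here is a minimal Python sketch; the Gaussian terminal distribution, the strike sitting at its mean, and the specific σ values are illustrative assumptions rather than a model of any real market.

```python
import numpy as np

rng = np.random.default_rng(0)

def option_value_at_expiry(sigma, strike=0.0, n_samples=1_000_000):
    """Monte Carlo estimate of E[max(S_T - K, 0)] for a Gaussian
    terminal price centered at the strike with stdev sigma."""
    terminal_prices = rng.normal(loc=strike, scale=sigma, size=n_samples)
    payoffs = np.maximum(terminal_prices - strike, 0.0)
    return payoffs.mean()

# Doubling sigma should roughly double the option's expected value.
for sigma in [1.0, 2.0, 4.0]:
    print(sigma, option_value_at_expiry(sigma))
```

The printed values hover around 0.4·σ (the constant is 1/√(2π)); only the proportionality to σ matters for the argument.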
Price impact from options pricing
Price impact refers to the phenomenon that buying a stock makes its price go up. If you don't have special information about what the stock is worth, the market shouldn't care about your purchase, so this impact should decay with time.[6] In this section, we'll reason about how fast that decay should be.
Suppose there is a market maker willing to sell N units of stock at a price of K+e.[7] This implies that the current "fair" price of the stock is K. If a customer buys N units of stock, the new fair price will be K+e, so e can be interpreted as the price impact of buying N units of stock. Our goal is to derive a relationship between e and N.
Thankfully, pricing e is very similar to pricing an option, and the latter is well understood.[8] We'll do this by constructing a synthetic option (or "imaginary option", if you prefer) whose price should be equal to e,[9] and then we'll compute its price (up to a constant factor).
Anyone in the world has the option to buy N units of stock from the market maker for K+e dollars. This is very similar to everybody collectively owning an options contract with a strike price of K+e. Roughly speaking, the market maker expects to make a profit proportional to e times N when they execute a trade.[10] So they're effectively selling N options for e dollars each, and the appropriate value for e is the price of one synthetic option.
We want to price some options contract, but actually we haven't specified which one – we still need the expiration time. After selling to the customer, the market maker will buy the stock back at a constant rate, giving an effective expiration time proportional to N.[11] Great! Recall from the first section that the price is proportional to the stdev of the terminal price distribution. The variance is the square of the stdev, and the variance grows linearly with time.[12]
Putting it all together: we found a synthetic option with e equal to its price, which is proportional to some kind of standard deviation, which is proportional to √N. Whew!
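To keep the chain of proportionalities in one place, here it is as a single line in LaTeX (suppressing all constants, and using the assumption from above that the effective expiration time scales with N):

$$
e \;\propto\; \operatorname{stdev}(\text{terminal price}) \;\propto\; \sqrt{\text{expiration time}} \;\propto\; \sqrt{N}.
$$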
Instantaneous impact from cumulative impact
Now we know what happens immediately after we buy a big block of stock, but we want to know what happens long after we buy 1 share.
(Content warning: calculus)
We assume that the total price impact is equal to the sum of the impacts of each individual trade – hopefully this assumption seems mild to the reader. Now let's imagine buying one share every second for an hour. The price impact at the end will be proportional to √N, and it will also be equal to ∑t=0…T (impact from purchase at time t). We can approximate this sum as an integral, and then the decay rate will be the time derivative of the total impact at the end of the hour: recall the total impact is √N. Since we're buying at a constant rate, the time derivative is the same as the derivative with respect to N. Taking that derivative and applying the size-time equivalence again, we get 1/√T! 🥳
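For readers who want that step spelled out, here is a one-line sketch in LaTeX, writing f(τ) for the impact remaining τ seconds after a single purchase and assuming (as above) that impacts simply add and that we buy at a unit rate, so N = T:

$$
\text{cumulative impact}(T) \;=\; \int_0^T f(\tau)\,d\tau \;\propto\; \sqrt{T}
\qquad\Longrightarrow\qquad
f(T) \;=\; \frac{d}{dT}\int_0^T f(\tau)\,d\tau \;\propto\; \frac{1}{\sqrt{T}}.
$$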
Price impact from random walks
That derivation was fun, but it seems really unclear how to apply it to brains. Let's take inspiration from the "forgetting" idea and ask the following question:
Given only the information of the current price at time T, can we distinguish the world where we bought one share at time 0 from the counterfactual world where we did not?
Let's replace "current price" with "market maker's position". (I promise this substitution is reasonable, in the sense that using it helps you design trading algorithms that make money, but I won't take the time to justify it here.) Recall that we're interested in the case where the customer has no special information. Let's model their buy/sell choices as independent coin flips and write XT for the market maker's position at time T.
In the counterfactual where the customer does not buy at time 0, XT is a sum of T independent coin flips, so its distribution is approximately Gaussian with mean 0 and stdev √T. In the counterfactual where the customer does buy at time 0, XT is 1 plus a sum of T−1 independent coin flips, so its distribution is approximately Gaussian with mean 1 and stdev √(T−1).
These distributions are very similar! So they should be quite hard to tell apart. In fact, any procedure for distinguishing them will have a success probability that scales like the total variation distance between the distributions. You can compute this with a sort of nasty integral... whose value turns out to scale with 1/√T!
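Here is a minimal Python sketch of that scaling; to dodge the nasty integral it uses the closed form for the total variation distance between two equal-variance Gaussians, and it approximates both stdevs as √T (the particular T values are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

def tv_distance(T):
    """Total variation distance between N(0, T) and N(1, T).
    The two densities cross at 1/2, so the distance is
    Phi(1/(2*sqrt(T))) - Phi(-1/(2*sqrt(T)))."""
    half_gap = 0.5 / np.sqrt(T)
    return norm.cdf(half_gap) - norm.cdf(-half_gap)

# The second column tracks the first: the distinguishing probability decays like 1/sqrt(T).
for T in [100, 400, 1600, 6400]:
    print(T, tv_distance(T), 1 / np.sqrt(2 * np.pi * T))
```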
Putting it all together
Dear reader, it's time to pull the rug... we'll model the brain with a neural net. The parameter space is ℝ^D for some huge D, and our "brain" moves around that space by stochastic gradient descent. Since stochastic gradient descent is kind of random, we'll model it as a high-dimensional random walk. ("Brownian motion" is a better term, but we don't care.)
Say the artificial brain is in state X0 at time t0. In one counterfactual world, it hears the utterance "sassy shrimps jumble the pool"[13], which causes X1 = X0 + u. In the other (completely different!) counterfactual world, it hears some random nonsense that causes X1 = X0 + noise. Again we'll ask the question "how well can we distinguish between the XT appearing in these two worlds?" The answer is just the same as before: the best you can do is project onto the u axis, and then we're almost exactly in the one-dimensional random walk case (with approximately Gaussian step sizes instead of coin flips).
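Here is a small Python simulation of that projection argument; the Gaussian step noise, the dimension, the horizon, and the random unit vector u are all illustrative assumptions standing in for whatever SGD actually does.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T, n_trials = 50, 400, 2000          # dimension, horizon, trial count: arbitrary choices
u = rng.normal(size=D)
u /= np.linalg.norm(u)                  # the fixed unit "utterance direction"

proj_with, proj_without = [], []
for _ in range(n_trials):
    steps = rng.normal(size=(T, D))     # stand-in for SGD noise, one step per time unit
    drift = steps.sum(axis=0)           # X_T - X_0 in the world without the utterance
    proj_without.append(drift @ u)
    proj_with.append((drift + u) @ u)   # world where the utterance added +u at time 0

# The projections onto u are roughly Gaussian with stdev sqrt(T) and means 0 vs 1,
# which is exactly the one-dimensional case: distinguishability decays like 1/sqrt(T).
print(np.mean(proj_without), np.mean(proj_with), np.std(proj_without))
```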
Now we've explained the rate of brain drain with an option concoction. Hopefully you've enjoyed this at least √(time to read / time to write) times as much as I did.
My third reaction was to think about this economic argument linking exponential discounting to hyperbolic discounting, but that's a story for another day.
Here "strength" is the inverse of recall time. The experiment is then approximately what you expect: get a bunch of undergrads, bring them into a room periodically, (I think on the scale of months, but memory of the paper is fuzzy) ask them to repeat some facts on a randomly varying schedule, and do some regressions.
The market makers have an entry in a database table! They don't have to forget if they don't want to! I'm not exactly a proponent of the EMH, but financial markets aren't that inefficient...
A call option lets the owner buy while a put option lets the owner sell. We will only need to think about call options for our argument, but we could have thought about puts instead.
The following integral gives the expected value up to a constant factor: ∫₀^∞ (x/σ) e^(−x²/σ²) dx
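Evaluating it (substitute u = x²/σ², so x dx = (σ²/2) du) confirms the proportionality:

$$
\int_0^\infty \frac{x}{\sigma}\, e^{-x^2/\sigma^2}\, dx \;=\; \frac{\sigma}{2} \;\propto\; \sigma.
$$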
The point is that if you do have a strong opinion, you will buy a lot. If you've bought a lot recently, you're somewhat likely to buy more. Your old buys get less scary as time passes, since you're more likely to have had a weaker opinion.
e stands for edge – the amount of money they're trying to make from the customer.
Well, it's still possible to make a lot of money by understanding it better than others! It's well understood in the same sense that gravity and quantum mechanics are: we have models that are very useful, we know about some serious flaws, and there are large rewards available to anyone who can fix them.
This technique is fundamental – the options pricing formulae that we skip over here are derived by coming up with dynamic trading strategies whose expected profits match the expected profits of options.
For simplicity, suppose that they have magical powers that let them slowly buy stock for price K. As a model of real market makers in real financial markets, this is not too insane.
Maybe 1/2 or 2/3 depending on how you want to count things, but we only need to know up to a constant factor.
Whenever X can be decomposed as a sum ∑i Xi with the pairwise correlations of the Xi being 0, we have that the variance of X is the sum of the variances of the Xi. In this case, the Xi are returns over disjoint intervals. The zero correlation condition is the same as positing that you can't make money by trading off of a simple linear model.
Mild rearrangement of a draw from https://www.useapassphrase.com/