Note: this post leans heavily on metaphors and examples from computer programming, but I've tried to write it so it's accessible to a determined person with no programming background.
To summarize some info from computer processor design at very high density: there are a variety of ways to manufacture the memory used in modern computers. As a rule, the faster a kind of memory is to read from and write to, the more expensive it is per byte. So modern computers have a hierarchical memory structure: a very small amount of memory that's very fast to do computation with ("the registers"), a larger amount of memory that's a bit slower to do computation with, an even larger amount of memory that's even slower to do computation with, and so on. The two layers immediately below the registers (the L1 cache and the L2 cache) are typically abstracted away from even the assembly language programmer. They store data that's been accessed recently from the level below them ("main memory"). The processor does a lookup in the caches when accessing data; if the data isn't already in a cache, that's called a "cache miss", and the data gets loaded into the cache before it's accessed.
(Please correct me in the comments if I got any of that wrong; it's based on years-old memories of an undergrad computer science course.)
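To make the lookup-and-miss mechanics concrete, here's a minimal sketch in Python. The class names, capacities, and eviction rule are my own illustration of the idea, not how real hardware works:

```python
# A toy two-level cache hierarchy. Real caches are hardware with
# fixed-size lines and smarter eviction policies; this sketch only
# models the lookup order: check the fast level first, fall through
# to the slower level on a miss.

class CacheLevel:
    def __init__(self, name, capacity, backing):
        self.name = name
        self.capacity = capacity
        self.data = {}          # address -> value
        self.backing = backing  # next, slower level down

    def read(self, address):
        if address in self.data:
            print(f"{self.name}: hit for {hex(address)}")
            return self.data[address]
        print(f"{self.name}: MISS for {hex(address)}")
        value = self.backing.read(address)    # slow path
        if len(self.data) >= self.capacity:
            self.data.pop(next(iter(self.data)))  # naive eviction
        self.data[address] = value            # load into this level
        return value

class MainMemory:
    def __init__(self, contents):
        self.contents = contents

    def read(self, address):
        return self.contents[address]

memory = MainMemory({0x10: "hello", 0x20: "world"})
l2 = CacheLevel("L2", capacity=4, backing=memory)
l1 = CacheLevel("L1", capacity=2, backing=l2)

l1.read(0x10)  # misses in L1 and L2, loads from main memory
l1.read(0x10)  # hits in L1 this time
```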
Lately I've found it useful to think of my own memory in the same way. I've got working memory (7±2 items?), consisting of the things I'm thinking about at this very moment. I've got short-term memory and long-term memory. And if I can't find something after trying to think of it for a while, I'll look it up (frequently on Google). Cache miss for the lose.
What are some implications of thinking about memory this way?
Register limitations and chunking
When programming, I've noticed that sometimes I'll encounter a problem that's too big to fit in my working memory (WM) all at once. In the spirit of getting stronger, I'm typically tempted to attack the problem head on, but I find that my brain just flits around the details of the problem instead of actually making progress on it. So lately I've been toying with breaking off a piece of the problem that can be cleanly modularized and fits fully in my working memory, then solving that piece on its own. (Feynman: "What's the smallest nontrivial example?") You could turn this idea around and define a good software architecture as one that consists of modular components that can each be made to fit completely into one's working memory while reading the code.
As you write or read code modules, you'll come to understand them better and you'll be able to compress or "chunk" them so they take up less space in your working memory. This is why top-down programming doesn't always work that well. You're trying to fit the entire design in your working memory, but because you don't have a good understanding of the components yet (since you haven't written them), you aren't dealing with chunks but pseudochunks. This is true for concepts in general: it takes all of a beginner's WM to comprehend a for loop, but in a master's WM a for loop can be but one piece in a larger puzzle.
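As a toy illustration of WM-sized modules (my own example, with hypothetical names): instead of one function that parses, filters, and summarizes a log file in a single sweep, each piece can be understood, and then chunked, on its own:

```python
# Hypothetical example: summarizing a log file. Each helper is small
# enough to hold in working memory by itself; once understood, it
# becomes a single "chunk" when reading summarize().

def parse_line(line):
    """One chunk: turn 'LEVEL message' into a (level, message) pair."""
    level, _, message = line.partition(" ")
    return level, message

def is_error(entry):
    """One chunk: the filtering rule, isolated from parsing."""
    level, _ = entry
    return level == "ERROR"

def summarize(lines):
    """Composed of chunks; reads at one level of abstraction."""
    entries = [parse_line(line) for line in lines]
    errors = [entry for entry in entries if is_error(entry)]
    return f"{len(errors)} errors out of {len(entries)} lines"

print(summarize(["INFO started", "ERROR disk full", "INFO done"]))
```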
Swapping
One thing to observe: you don't get alerted when memory at the top of your mental hierarchy gets overwritten. We've all had the experience of having an idea in the shower and forgetting it by the time we get out. Similarly, if you're working on a delicate mental task (programming, math, etc.) and you get interrupted, you'll lose mental state related to the problem you're working on.
If you're having difficulty focusing, this can easily make a delicate mental task, like a complicated math problem, much less fun and productive. Instead of actually making progress on the task, your mind drifts away from it, and when you redirect your attention, you find that information related to the problem has been swapped out of your working memory or short-term memory and must be reloaded. If you're distracted frequently enough, or you're otherwise lacking mental stamina, you may find that you spend the majority of your time context switching instead of making progress on your problem.
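To put illustrative numbers on that (my own made-up figures, not data): if reloading mental state after an interruption takes ten minutes and interruptions arrive every fifteen, two-thirds of every cycle goes to swapping state back in:

```python
# Illustrative numbers only: how much of each work cycle is real
# progress vs. reloading context after an interruption.

reload_minutes = 10    # time to swap the problem back into WM
interval_minutes = 15  # time between interruptions

productive_fraction = (interval_minutes - reload_minutes) / interval_minutes
print(f"{productive_fraction:.0%} of each cycle is actual progress")  # 33%
```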
Adding an additional external cache level
Anecdotally, adding an additional brain cache level between long-term memory and Google seems like a pretty big win for personal productivity. My digital notebook (since writing that post, I've started using nvALT) has turned out to be one of my biggest productivity wins; it's ballooned to over 700K words, and a decent portion of it consists of copy-pasted snippets capturing the best information from Google searches I've done. A co-worker wrote a tool that allows him to quickly look up how to use software libraries and reports that he's continued to find it very useful years after making it.
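I don't know the details of my co-worker's tool, but as a hypothetical sketch of this kind of external cache level, even a small script that searches a plain-text notes file and prints matching snippets with some surrounding context would cover much of the use case:

```python
# Hypothetical sketch of a personal snippet cache: search a
# plain-text notes file for a keyword and print each match with a
# couple of lines of context. File name and format are my invention.

import sys

def lookup(notes_path, keyword, context=2):
    with open(notes_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for i, line in enumerate(lines):
        if keyword.lower() in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            print("\n".join(lines[start:end]))
            print("-" * 40)

if __name__ == "__main__":
    # Usage: python lookup.py notes.txt "regex lookahead"
    lookup(sys.argv[1], sys.argv[2])
```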
Text is the most obvious example of an exobrain memory device, but here's a more interesting example: if you're cleaning a messy room, you probably don't develop a detailed plan in your head of where all of your stuff will be placed when you finish cleaning. Instead, you incrementally organize things into related piles, then decide what to do with the piles, using the organization of the items in your room as a kind of external memory aid that lets you do a mental task you wouldn't be able to do entirely in your head.
Would it be accurate to say that you're "not intelligent enough" to organize your room in your head without the use of any external memory aids? It doesn't really fit with the colloquial use of "intelligence", does it? But in the same way computers are frequently RAM-limited, I suspect that humans are also frequently RAM-limited, even on mental tasks we frequently associate with "intelligence". For example, if you're reading a physics textbook and you notice that you're getting confused, you could write down a question that would resolve your confusion, then rewrite the question to be as precise as possible, then list hypotheses that would answer your question along with reasons to believe/disbelieve each hypothesis. By writing things down, you'd be able to devote all of your working memory to the details of a particular aspect of your confusion without losing track of the rest of it.
Short-term memory is working memory. "Short-term memory" is a distinction no longer used by cognitive psychologists.
Really, you have highly activated long-term memory (working memory), less activated memory (things you've recently thought about), and even less activated memory. Level of activation, and graph distance from activated nodes, is what determines probability and speed of recall.
This is basic cognitive psychology; I don't know of any good textbooks on the subject because the classes I took never used textbooks, but with some scholarship (authors I recommend are Baddeley & Hitch, Atkinson & Shiffrin, and later Engle) you can verify this.
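As a toy sketch of the activation picture (my own illustration of a spreading-activation-style model, not code from any of these authors; the graph, decay factor, and numbers are made up):

```python
# Toy spreading-activation model: activation decays with graph
# distance from a recently used concept, so nearby nodes are faster
# and more likely to be recalled.

GRAPH = {
    "for loop": ["iteration", "index"],
    "iteration": ["recursion", "for loop"],
    "recursion": ["iteration", "stack"],
    "index": ["for loop"],
    "stack": ["recursion"],
}

def activations(source, decay=0.5):
    """Breadth-first spread: each hop away halves the activation."""
    levels = {source: 1.0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in GRAPH[node]:
                if neighbor not in levels:
                    levels[neighbor] = levels[node] * decay
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return levels

# After thinking about "for loop", "iteration" (one hop, 0.5) is
# easier to recall than "stack" (three hops, 0.125).
print(activations("for loop"))
```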
Notice that this is true at the micro and macro levels of processing. You can use an API in a day and be familiar with it while still losing track of things at the end of the day. You can use an API for a month and be reasonably fluent in it a month later.
nvALT looks like an incredibly valuable tool; I use a simple wiki for this, but feel like the wiki should sit further out in my cache hierarchy, storing more organized and structured content rather than quick notes. Thanks for pointing it out.
However, human memory is functionally infinite: the process is bound by encoding time rather than by any notion of "space". As such, you should definitely invest in creating a set of Anki decks. Anything you want to quickly remember forever should be in an Anki deck. nvALT and related systems should only store relationships: things you can't easily fit into Anki decks and want to be able to compute over.
You can make things even easier to remember by making them more proximal to things you've overlearned and will never forget; for example, learn functional programming and express everything in terms of functional programming. If you want to learn a new API or framework, phrase it in terms of functional programming. This is just one example, but with some thought you can extend the practice.
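For instance (my own illustration, with hypothetical data): an unfamiliar API pattern that loops, tests, and accumulates can be re-encoded as the overlearned filter-then-map vocabulary:

```python
# Illustrative only: re-encoding an imperative pattern in an
# overlearned functional vocabulary. If filter/map is burned into
# long-term memory, the second version is two chunks, not four lines.

records = [{"name": "a", "size": 3}, {"name": "b", "size": 12}]

# Imperative version: four lines of bookkeeping to hold in WM.
big_names = []
for record in records:
    if record["size"] > 10:
        big_names.append(record["name"])

# Functional version: "filter by size, then map to name".
big_names = [r["name"] for r in filter(lambda r: r["size"] > 10, records)]

print(big_names)  # ['b']
```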
Thanks for the cognitive psychology info!
I've found that memorizing info in Anki takes significantly longer than writing it in my digital notebook. In his Anki guide, gwern writes:
...