To repeat the intuitive idea: an abstract model throws away or ignores information from the concrete model, but in such a way that we can still make reliable predictions about some aspects of the underlying system.
That's not quite my understanding of what an abstraction is. What you have described is basic modeling. An abstraction is a model that works well in a large class of disparate domains, like polymorphism in programming. The idea of addition is such an abstraction: it works equally well for numbers, strings, sheep, etc. What you call a natural abstraction is closer to my intuitive understanding of the concept. I do not subscribe to your assertion that this is a "property of the territory". Sheep are not like bit strings. Anyhow, ideas are in the mind, and different sets of ideas can be useful for predicting different sets of observations of seemingly unrelated parts of the territory. Also, good luck with your research, whatever definition of abstraction you use, as long as it is useful to you.
Worth noting that 'abstraction' has different meanings in different disciplines. For example, Wikipedia has separate articles for abstraction in computer science, abstraction in mathematics, abstraction in linguistics, and abstraction in sociology.
Does abstraction also need to make answering your queries computationally easier?
I could throw away unnecessary information, encrypt what remains, and provide the key only as the solution to an NP-hard problem.
Is this still an abstraction?
Good question. In these posts, I generally ignored computational constraints - i.e. effectively assumed infinite compute. I expect that one could substitute a computationally limited version of probability theory (like logical induction, for instance) and get a qualitatively similar notion of abstraction in which sufficiently computationally-"scrambled" info is no longer an abstraction, even if the scrambling is reversible in principle.
I think that there are some abstractions that aren't predictively useful, but are still useful in deciding your actions.
Suppose I and my friend both have the goal of maximising the number of DNA strings whose MD5 hash is prime.
I call sequences with this property "ana" and those without this property "kata". Saying that "the DNA over there is ana" does tell me something about the world; there is an experiment that I can do to determine whether this is true or false, namely sequencing it and taking the hash. The concept of "ana" isn't useful in a world where no agents care about it and no detectors have been built. If your utility function cares about the difference, it is a useful concept. If someone has connected an ana detector to the trigger of something important, then it's a useful concept. If you're a crime scene investigator, and all you know about the perpetrator's DNA is that it's ana, then finding out whether Joe Blogs has ana DNA could be important. The concept of ana is useful. If you know the perpetrator's entire genome, the concept stops being useful.
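For concreteness, here is a minimal sketch of the "ana" test in Python; the function name `is_ana` and the use of `sympy.isprime` are illustrative choices, not anything canonical:

```python
import hashlib
from sympy import isprime  # any primality test would do

def is_ana(dna: str) -> bool:
    """True if the MD5 hash of the DNA string, read as an integer, is prime."""
    digest = hashlib.md5(dna.encode("ascii")).hexdigest()
    return isprime(int(digest, 16))

print(is_ana("ACGTACGT"))  # well-defined and checkable, just not predictively useful
```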
A general abstraction is consistent with several, but not all, universe states. There are many different universe states in which the gas has a pressure of 37 Pa, but also many where it doesn't. So all abstractions are subsets of possible universe states. Usually, we use subsets that are suitable for reasoning about in some way.
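One way to picture "an abstraction is a subset of universe states" is a toy sketch like the following; the three-particle "universe" and the stand-in for the pressure statement are made up purely for illustration:

```python
from itertools import product

# Toy "universe": three particles, each with a discrete speed 0..3.
microstates = list(product(range(4), repeat=3))

# Crude stand-in for a macroscopic statement like "the gas has a pressure of 37 Pa".
def satisfies_abstraction(state):
    return sum(state) == 6

# Viewed extensionally, the abstraction is just the subset of consistent states.
consistent = [s for s in microstates if satisfies_abstraction(s)]
print(f"{len(consistent)} of {len(microstates)} microstates are consistent")
```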
Suppose you were literally omniscient, knowing every detail of the universe, but you had to give humans a 1Tb summary. Unable to include all the info you might want, you can only include a summary of the important points: you are now engaged in lossy compression.
Sensor data is also an abstraction: for instance, you might have temperature and pressure sensors. Cameras record roughly how many photons hit them without tracking every one. So real-world agents are translating one lossy approximation of the world into another, without ever being able to express the whole thing explicitly.
How you do lossy compression depends on what you want. Music is compressed in a way that is specific to defects in human ears. Abstractions are much the same.
How you do lossy compression depends on what you want.
I think this is technically true, but less important than it seems at first glance. Natural abstractions are a thing, which means there's instrumental convergence in abstractions - some compressed information is relevant to a far wider variety of objectives than other compressed information. Representing DNA sequences as strings of four different symbols is a natural abstraction, and it's useful for a very wide variety of goals; MD5 hashes of those strings are useful only for a relatively narrow set of goals.
Somewhat more formally... any given territory has some Kolmogorov complexity, a maximally-compressed lossless map. That's a property of the territory alone, independent of any goal. But it's still relevant to goal-specific lossy compression - it will very often be useful for lossy models to re-use the compression methods relevant to lossless compression.
For instance, maybe we have an ASCII text file which contains only alphanumeric and punctuation characters. We can losslessly compress that file using e.g. Huffman coding, which uses fewer bits for the characters which appear more often. Now we decide to move on to lossy encoding - but we can still use the compressed character representation found by Huffman, assuming the lossy method doesn't change the distribution of characters too much.
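To make the lossless step concrete, here is a small Huffman-coding sketch in Python; the helper name and example string are just illustrative:

```python
import heapq
from collections import Counter

def huffman_code(text: str) -> dict:
    """Build a prefix code that assigns fewer bits to more frequent characters."""
    # Heap entries: (frequency, tiebreaker, {char: bitstring-so-far}).
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prepend a bit distinguishing the two merged subtrees.
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

text = "the table is a table, not a heap of atoms"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)
print(f"{bits} bits with Huffman vs {8 * len(text)} bits of plain ASCII")
```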
An abstraction like "object permanence" would be useful for a very wide variety of goals, maybe even for any real-world goal. An abstraction like "golgi apparatus" is useful for some goals but not others. "Lossless" is not an option in practice: our world is too rich, you can just keep digging deeper into any phenomenon until you run out of time and memory ... I'm sure that a 50,000 page book could theoretically be written about earwax, and it would still leave out details which for some goals would be critical. :-)
Abstraction means assigning a symbol to reference a set of other symbols. It saves time and memory: time by allowing retrieval of data based on a set of rules, memory by shrinking the size of the reference.
For example: the words 'natural' and 'artificial': we sort things into one of these labels based on whether or not they were made by a human. A 'natural' thing could be 'physical' or 'biological'. An 'artificial' thing could be 'theory' or 'implementation'. If I don't need to distinguish between physical and biological things, instead of referring to them directly, I can use the more abstract reference of 'natural' things, saving space and time in my statement.
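As a toy sketch of "a symbol referencing a set of other symbols", using the labels from the example above (the dictionary representation is just one obvious choice):

```python
# Each abstract label expands to the set of more concrete labels it covers.
expansions = {
    "natural": {"physical", "biological"},
    "artificial": {"theory", "implementation"},
}

def expand(label: str) -> set:
    """Resolve a label to the concrete things it refers to."""
    return expansions.get(label, {label})

# One short reference stands in for a whole set, saving space in the statement.
print(expand("natural"))         # {'physical', 'biological'}
print(expand("implementation"))  # already concrete: {'implementation'}
```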
The challenge with natural language abstraction is agreeing on definitions. Many would define the terms in the above example differently. The more we can agree on definitions of terms, the better we can reason about their subsets.
In a logically valid system of abstraction, any symbol can be related to every other symbol: either by a common parent reference, or by using one to refer to the other.
Let's start with a few examples (borrowed from here) to illustrate what we're talking about:
The general pattern: there’s some ground-level “concrete” model (or territory), and an abstract model (or map). The abstract model throws away or ignores information from the concrete model, but in such a way that we can still make reliable predictions about some aspects of the underlying system.
Notice that the predictions of the abstract models, in most of these examples, are not perfectly accurate. We're not dealing with the sort of "abstraction" we see in e.g. programming or algebra, where everything is exact. There are going to be probabilities involved.
In the language of embedded world-models, we're talking about multi-level models: models which contain both a notion of "table", and of all the pieces from which the table is built, and of all the atoms from which the pieces are built. We want to be able to use predictions from one level at other levels (e.g. predict bulk material properties from microscopic structure, or predict from material properties whether it's safe to sit on the table), and we want to move between levels consistently.
Formalization: Starting Point
To repeat the intuitive idea: an abstract model throws away or ignores information from the concrete model, but in such a way that we can still make reliable predictions about some aspects of the underlying system.
So to formalize abstraction, we first need some way to specify which "aspects of the underlying system" we wish to predict, and what form the predictions take. The obvious starting point for predictions is probability distributions. Given that our predictions are probability distributions, the natural way to specify which aspects of the system we care about is via a set of events or logic statements for which we calculate probabilities. We'll be agnostic about the exact types for now, and just call these "queries".
That leads to a rough construction. We start with some concrete model MC and a set of queries Q. From these, we construct a minimal abstract model MA by keeping exactly the information relevant to the queries, and throwing away all other information. By the minimal map theorems, we can represent MA directly by the full set of probabilities P[Q|MC]; MA and P[Q|MC] contain exactly the same information. Of course, in practical examples, the probabilities P[Q|MC] will usually have some more compact representation, and MA will usually contain some extraneous information as well.
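A toy version of this construction, assuming a small discrete concrete model (the die-roll distribution and the two queries below are invented purely for illustration):

```python
from fractions import Fraction

# Concrete model M_C: a uniform distribution over the six faces of a die.
M_C = {face: Fraction(1, 6) for face in range(1, 7)}

# Queries Q: the events we actually care about predicting.
queries = {
    "even": lambda face: face % 2 == 0,
    "at least 5": lambda face: face >= 5,
}

# Minimal abstract model M_A: just the probabilities P[Q|M_C], nothing else.
M_A = {
    name: sum(p for face, p in M_C.items() if event(face))
    for name, event in queries.items()
}
print(M_A)  # {'even': Fraction(1, 2), 'at least 5': Fraction(1, 3)}
```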
To illustrate a bit, let's identify the concrete model, class of queries, and abstract model for a few of the examples from earlier.
Already with the second two examples there seems to be some "cheating" going on in the model definition: we just define the query class as all the events/logic statements whose probabilities change based on the information in the map. But if we can do that, then anything can be an "abstract map" of any "concrete territory", with the queries Q taken to be the events/statements about the territory which the map actually has some information about - not a very useful definition!
Natural Abstractions
Intuitively, it seems like there exist "natural abstractions" - large sets of queries on a given territory which all require roughly the same information. Statistical mechanics is a good source of examples - from some macroscopic initial conditions, we can compute whatever queries we want about any macroscopic measurements later on. Note that such natural abstractions are a property of the territory - it's the concrete-level model which determines what large classes of queries can be answered with relatively little information.
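A rough numerical sketch of that stat-mech intuition: many macroscopic queries depend on the microstate only through a small summary (here, the mean squared speed standing in for temperature). The quantities and units below are toy choices, not a real physical model:

```python
import random

random.seed(0)
# Concrete model: the individual speeds of many gas particles (the microstate).
speeds = [random.gauss(0, 1) for _ in range(100_000)]

# The abstraction keeps only a small summary of the microstate.
mean_sq_speed = sum(v * v for v in speeds) / len(speeds)

# Many different macroscopic queries can be answered from that summary alone,
# e.g. (in toy units) "temperature" and "pressure at fixed volume".
temperature = mean_sq_speed                    # up to constants
pressure = len(speeds) * mean_sq_speed / 1.0   # n * T / V in toy units
print(temperature, pressure)
```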
For now, I'm interested primarily in abstraction of causal dags - i.e. cases in which both the concrete and abstract models are causal dags, and there is some reasonable correspondence between counterfactuals in the two. In this case, the set of queries should include counterfactuals, i.e. do() operations in Pearl's language. (This does require updating definitions/notation a bit, since our queries are no longer purely events, but it's a straightforward-if-tedious patch.) That's the main subject I'm researching in the short term: what are the abstractions which support large classes of causal counterfactuals? Expect more posts on the topic soon.
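To gesture at what such counterfactual queries look like, here is a toy three-node causal DAG in Python with one interventional query computed by graph surgery; the variables and probabilities are invented for illustration, not taken from the posts:

```python
from itertools import product

# Toy causal DAG: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass, all binary.
P_rain = {1: 0.3, 0: 0.7}
P_sprinkler_given_rain = {1: {1: 0.1, 0: 0.9}, 0: {1: 0.5, 0: 0.5}}

def p_wet(rain, sprinkler):
    return 0.99 if (rain or sprinkler) else 0.05

def prob_wet(do_sprinkler=None):
    """P(WetGrass=1), optionally under the intervention do(Sprinkler = s)."""
    total = 0.0
    for rain, sprinkler in product((0, 1), repeat=2):
        p = P_rain[rain]
        if do_sprinkler is None:
            p *= P_sprinkler_given_rain[rain][sprinkler]    # observational
        else:
            p *= 1.0 if sprinkler == do_sprinkler else 0.0  # graph surgery
        total += p * p_wet(rain, sprinkler)
    return total

print(prob_wet())                # observational P(wet)
print(prob_wet(do_sprinkler=1))  # P(wet | do(sprinkler = on))
```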