(...) the term technical is a red flag for me, as it is many times used not for the routine business of implementing ideas but for the parts, ideas and all, which are just hard to understand and many times contain the main novelties.
- Saharon Shelah
As a true-born Dutchman I endorse Crocker's rules.
For most of my writing see my shortforms (new shortform, old shortform)
Twitter: @FellowHominid
Personal website: https://sites.google.com/view/afdago/home
I am not sure what 'it' refers to in 'it is bad'.
People read more into this shortform than I intended. It is not a cryptic reaction, criticism, or reply to/of another post.
I don't know what you mean by intelligent [pejorative] but it sounds sarcastic.
To be clear, the low predictive efficiency is not a dig at archeology. It seems I have triggered something here.
Whether a question/domain has low or high (marginal) predictive efficiency is not a value judgement, just an observation.
I don't dispute these facts.
A lot of discussion of intelligence considers it as a scalar value that measures a general capability to solve a wide range of tasks. In this conception of intelligence it is primarily a question of having a 'good Map'. This is a simplistic picture since it's missing the intrinsic limits imposed on prediction by the Territory. Not all tasks or domains have the same marginal returns to intelligence - these can vary wildly.
Let me tell you about a 'predictive efficiency' framework that I find compelling & deep and that will hopefully put some mathematical flesh on these intuitions. I initially learned about these ideas in the context of Computational Mechanics, but I realized that the underlying ideas are much more general.
Let $X$ be a predictor variable that we'd like to use to predict a target variable $Y$ under a joint distribution $p(X, Y)$. For instance $X$ could be the context window and $Y$ could be the next hundred tokens, or $X$ could be the past market data and $Y$ the future market data.
In any prediction task there are three fundamental and independently varying quantities that you need to think about: the irreducible uncertainty $h = H(Y \mid X)$ (the noise: the uncertainty about $Y$ that remains even when $X$ is fully known), the reducible uncertainty or predictive information $E = I(X;Y)$ (how much of the uncertainty about $Y$ can in principle be removed by knowing $X$), and the forecasting complexity $C$ (how much information about $X$ must be stored to actually achieve that prediction).
For the third quantity, let us introduce the notion of causal states, or minimal sufficient statistics. We define an equivalence relation on $X$ by declaring $x \sim x'$ whenever $p(Y \mid X = x) = p(Y \mid X = x')$.
The resulting equivalence classes, denoted $\varepsilon(x)$, yield a minimal sufficient statistic for predicting $Y$. This construction is 'minimal' because it groups together all those $x$ that lead to the same predictive distribution $p(Y \mid X = x)$, and it is 'sufficient' because, given the equivalence class $\varepsilon(x)$, no further refinement of $X$ can improve our prediction of $Y$.
From this, we define the forecasting complexity (or statistical complexity) as
$$C = H(\varepsilon(X)),$$
which measures the amount of information (the cost in bits) needed to specify the causal state of $X$. Finally, the predictive efficiency is defined by the ratio
$$\eta = \frac{E}{C} = \frac{I(X;Y)}{H(\varepsilon(X))},$$
which tells us how much of the complexity actually contributes to reducing uncertainty in $Y$. In many real-world domains, even if substantial information is stored (high $C$), the gain in predictability ($E$) might be modest. This situation is often encountered in fields where, despite high skill ceilings (i.e. very high forecasting complexity), the net effect of additional expertise is limited because the predictive information is a small fraction of the complexity.
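To make this concrete, here is a minimal sketch of how one could compute these quantities for a small, finite joint distribution. The function name and the rounding-based grouping of conditional distributions are my own illustration (not a standard library API); it is a toy implementation of the definitions above, not something you would run on a realistic domain.

```python
import numpy as np
from collections import defaultdict

def predictive_quantities(p_xy):
    """For a finite joint distribution p(x, y) given as a 2D array
    (rows = values of X, columns = values of Y), compute:
      h = H(Y|X)    irreducible uncertainty (noise)
      E = I(X;Y)    reducible uncertainty / predictive information
      C = H(eps(X)) forecasting complexity of the causal states
    and the predictive efficiency E / C."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    # h = H(Y|X): expected remaining uncertainty about Y once X is known
    h = sum(p_x[i] * entropy(p_xy[i] / p_x[i])
            for i in range(len(p_x)) if p_x[i] > 0)
    # E = I(X;Y) = H(Y) - H(Y|X): the part of H(Y) that X can remove
    E = entropy(p_y) - h

    # Causal states: lump together all x with the same conditional p(y|x).
    # (Rounding is a crude stand-in for the exact equivalence relation.)
    state_prob = defaultdict(float)
    for i in range(len(p_x)):
        if p_x[i] > 0:
            state_prob[tuple(np.round(p_xy[i] / p_x[i], 10))] += p_x[i]
    C = entropy(np.array(list(state_prob.values())))

    return h, E, C, (E / C if C > 0 else float("nan"))
```

In principle the example below could be pushed through this same function by enumerating all $2^{100}$ coin-flip strings, but that is of course infeasible; exploiting the causal-state structure directly, as in the second sketch, is much cheaper.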
Example of low efficiency.
Let $X$ be the outcome of 100 independent fair coin flips, so each flip contributes one bit and $H(X) = 100$ bits in total.
Define $Y$ as a single coin flip whose bias is determined by the proportion of heads in $X$. That is, if $X$ has $k$ heads then
$$p(Y = 1 \mid X) = \frac{k}{100}.$$
In this example, a vast amount of internal structural information (the cost $C$ to specify the causal state) is required to extract just a tiny bit of predictability. In practical terms, this means that even if one possesses great expertise (analogous to having high forecasting complexity $C$), the net benefit is modest because the inherent predictive efficiency $E/C$ is low. Such scenarios are common in fields like archaeology or long-term political forecasting, where obtaining a single predictive bit of information may demand enormous expertise, data, and computational resources.
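If we actually compute the numbers for this toy example (a quick sketch, assuming scipy is available), the causal state of $X$ is just the number of heads $k$, and the efficiency comes out tiny:

```python
import numpy as np
from scipy.stats import binom

# 100 fair coin flips. The causal state of X is just the number of heads k,
# since p(Y=1 | X) = k/100 depends on X only through k.
n = 100
p_k = binom.pmf(np.arange(n + 1), n, 0.5)   # distribution over causal states

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else float(-q*np.log2(q) - (1-q)*np.log2(1-q))

C = entropy(p_k)                                       # forecasting complexity
H_Y_given_X = sum(p_k[k] * binary_entropy(k / n) for k in range(n + 1))
E = 1.0 - H_Y_given_X                                  # H(Y) = 1 bit, Y is fair on average
print(C, E, E / C)   # roughly 4.4 bits, 0.007 bits, and an efficiency well below 1%
```

So you have to track several bits of structure to gain less than a hundredth of a bit of predictability.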
Please read carefully what I wrote - I am talking about energy consumption worldwide, not electricity consumption in the EU. Electricity in the EU accounts for only a small percentage of carbon emissions.
See
As you can see, solar energy is still a tiny percentage of total energy sources. I don't think it is an accident that the EU electricity-mix graph keeps being cited in this discussion: it is a much more rose-colored proxy.
Energy and electricity are often conflated in discussions around climate change, perhaps not coincidentally because the latter seems much more tractable to generate renewably than total energy production.
No, it is not confused. Please read precisely what I wrote: I said total energy production worldwide, not electricity production in the European Union.
As you can see, solar is still a tiny percentage of energy consumption. That is not to say that things will not change - I certainly hope so! I give it significant probability. But if we are to be honest with ourselves, then it currently remains to be seen whether solar energy will prove to be the solution.
Moreover, even in the case that solar energy does take over and 'solve' climate change, that still does not prove the thesis - namely that solar energy solving climate change would be primarily the result of deliberate policy rather than of market forces / ceteris-paribus technological development.
I somewhat agree but
And finally, there is a difference between skill ceilings for domains with high versus low predictive efficiency. In the latter, more intelligence will still yield returns, but rapidly diminishing ones.
(See my other comment for more details on predictive efficiency)
One aspect I didn't speak about that may be relevant here is the distinction between
irreducible uncertainty h (noise, entropy)
reducible uncertainty E ('excess entropy')
and forecasting complexity C ('statistical complexity').
All three can independently vary in general.
Domains can be more or less noisy (more entropy h), both inherently and because of limited observations.
Some domains allow for a lot of prediction (there is a lot of reducible uncertainty E) while others allow for only limited prediction (e.g. political forecasting over longer time horizons).
And that prediction can be very costly to obtain (high forecasting complexity C). Archeology is a good example: to predict one bit about the far past correctly might require an enormous amount of expertise, data and information. In other words, it's really about the ratio between the reducible uncertainty and the forecasting complexity: E/C.
Some fields have a very high skill ceiling, but because of a low E/C ratio the net effect of more intelligence is modest. Some domains aren't predictable at all, i.e. E is low. Other domains have a more favorable E/C ratio and C is high. This is typically a domain where there is a high skill ceiling and the leverage effect of additional intelligence is very large.
[For a more precise mathematical toy model of h, E, C take a look at computational mechanics]
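In the notation of the framework in my other comment, one common convention writes these as
$$h = H(Y \mid X), \qquad E = I(X;Y) = H(Y) - H(Y \mid X), \qquad C = H(\varepsilon(X)),$$
with predictive efficiency $E/C$. In computational mechanics proper, $h$ corresponds to the entropy rate, $E$ to the excess entropy, and $C$ to the statistical complexity of the causal states.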
From Marginal Revolution.
What does this crowd think? These effects are surprisingly small. Do we believe these effects? Anecdotally, the effect of LLMs on my own workflow and that of colleagues has been enormous. How can this be squared with the supposedly tiny labor market effect?
Are we that selected of a demographic?