Alexander Gietelink Oldenziel

(...) the term technical is a red flag for me, as it is many times used not for the routine business of implementing ideas but for the parts, ideas and all, which are just hard to understand and many times contain the main novelties.
                                                                                                           - Saharon Shelah

 

As a true-born Dutchman I endorse  Crocker's rules.

For most of my writing see my shortforms (new shortform, old shortform)

Twitter: @FellowHominid

Personal website: https://sites.google.com/view/afdago/home

Sequences

Singular Learning Theory

Comments

Large Language Models, Small Labor Market Effects?

We examine the labor market effects of AI chatbots using two large-scale adoption surveys (late 2023 and 2024) covering 11 exposed occupations (25,000 workers, 7,000 workplaces), linked to matched employer-employee data in Denmark. AI chatbots are now widespread—most employers encourage their use, many deploy in-house models, and training initiatives are common. These firm-led investments boost adoption, narrow demographic gaps in take-up, enhance workplace utility, and create new job tasks. Yet, despite substantial investments, economic impacts remain minimal. Using difference-in-differences and employer policies as quasi-experimental variation, we estimate precise zeros: AI chatbots have had no significant impact on earnings or recorded hours in any occupation, with confidence intervals ruling out effects larger than 1%. Modest productivity gains (average time savings of 2.8%), combined with weak wage pass-through, help explain these limited labor market effects. Our findings challenge narratives of imminent labor market transformation due to Generative AI.


From Marginal Revolution.

What does this crowd think? These effects are surprisingly small. Do we believe them? Anecdotally, the effect of LLMs has been enormous for my own workflow and that of my colleagues. How can this be squared with the supposedly tiny labor market effects?

Are we that selected of a demographic?

I am not sure what 'it' refers to in 'it is bad'.

People read more into this shortform than I intended. It is not a cryptic reaction, criticism, or reply to/of another post.

I don't know what you mean by intelligent [pejorative], but it sounds sarcastic.

 To be clear, the low predictive efficiency is not a dig at archeology. It seems I have triggered something here. 

Whether a question/domain has low or high (marginal) predictive efficiency is not a value judgement, just an observation.

The Marginal Returns of Intelligence

A lot of discussion of intelligence treats it as a scalar value that measures a general capability to solve a wide range of tasks. In this conception, intelligence is primarily a question of having a 'good Map'. This is a simplistic picture, since it misses the intrinsic limits imposed on prediction by the Territory. Not all tasks or domains have the same marginal returns to intelligence - these can vary wildly.

Let me tell you about a 'predictive efficiency' framework that I find compelling & deep, and that will hopefully put some mathematical flesh on these intuitions. I initially learned about these ideas in the context of Computational Mechanics, but I realized that the underlying ideas are much more general.


Let X be a predictor variable that we'd like to use to predict a target variable Y under a joint distribution P(X, Y). For instance, X could be the context window and Y could be the next hundred tokens, or X could be the past market data and Y the future market data.

In any prediction task there are three fundamental and independently varying quantities that you need to think of:

  •  The conditional entropy H(Y | X) is the irreducible uncertainty: the intrinsic noise that remains even when X is known.
  •  The mutual information I(X; Y) quantifies the reducible uncertainty: the amount of predictable information that X contains about Y.

For the third quantity, let us introduce the notion of causal states or minimal sufficient statistics. We define an equivalence relation on X by declaring

x ~ x'   if and only if   P(Y | X = x) = P(Y | X = x').

The resulting equivalence classes, denoted ε(X), yield a minimal sufficient statistic for predicting Y. This construction is 'minimal' because it groups together all those x that lead to the same predictive distribution P(Y | X = x), and it is 'sufficient' because, given the equivalence class ε(x), no further refinement of X can improve our prediction of Y.

From this, we define the forecasting complexity (or statistical complexity) as

C = H(ε(X)),

which measures the amount of information (the cost in bits) needed to specify the causal state of X. Finally, the predictive efficiency is defined by the ratio

η = I(X; Y) / C,

which tells us how much of the complexity actually contributes to reducing uncertainty in Y. In many real-world domains, even if substantial information is stored (high C), the gain in predictability I(X; Y) might be modest. This situation is often encountered in fields where, despite high skill ceilings (i.e. very high forecasting complexity), the net effect of additional expertise is limited because the predictive information is a small fraction of the complexity.
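To make this concrete, here is a minimal sketch (my own illustration, not taken from any particular library; it assumes the joint distribution is given as a finite table and that numpy is available) of how one could compute these quantities:

```python
import numpy as np

def predictive_efficiency(p_xy, tol=1e-9):
    """Return (H(Y|X), I(X;Y), C, eta) for a joint distribution table p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1)          # marginal over the predictor X
    p_y = p_xy.sum(axis=0)          # marginal over the target Y

    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # Irreducible uncertainty: H(Y|X) = H(X,Y) - H(X)
    H_Y_given_X = H(p_xy.ravel()) - H(p_x)
    # Reducible (predictable) information: I(X;Y) = H(Y) - H(Y|X)
    I_XY = H(p_y) - H_Y_given_X

    # Causal states: group x's whose predictive distributions P(Y | X = x)
    # agree (up to tol), then take the entropy of the induced partition.
    cond = np.divide(p_xy, p_x[:, None],
                     out=np.zeros_like(p_xy), where=p_x[:, None] > 0)
    states = {}                      # causal state -> total probability
    for x in range(len(p_x)):
        if p_x[x] == 0:
            continue
        key = tuple(np.round(cond[x] / tol).astype(np.int64))
        states[key] = states.get(key, 0.0) + p_x[x]
    C = H(np.array(list(states.values())))      # forecasting complexity
    eta = I_XY / C if C > 0 else float("nan")   # predictive efficiency
    return H_Y_given_X, I_XY, C, eta
```

For example, on a uniform 2x2 table (X independent of Y) this returns zero mutual information and zero forecasting complexity (and hence an undefined efficiency), since all x collapse into a single causal state.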

Example of low efficiency. 

Let X = (X_1, ..., X_100) be the outcome of 100 independent fair coin flips, so each X_i carries 1 bit and H(X) = 100 bits.

Define Y as a single coin flip whose bias is determined by the proportion of heads in X. That is, if X has k heads then:

P(Y = heads | X) = k / 100.

  •  Total information in Y, H(Y):
     When averaged over all possible X, the mean bias is 0.5, so that Y is marginally a fair coin. Hence, H(Y) = 1 bit.

  •  Conditional entropy, or irreducible uncertainty, H(Y | X):
     Given X, the outcome Y is drawn from a Bernoulli distribution whose entropy depends on the number of heads in X. For typical X (around 50 heads), H(Y | X = x) ≈ 1 bit; however, averaging over all X yields a slightly lower value. Numerically, one finds H(Y | X) ≈ 0.993 bits.

  •  Predictable information I(X; Y):
     With the above numbers, the mutual information is I(X; Y) = H(Y) - H(Y | X) ≈ 0.007 bits.

  •  Forecasting complexity C:
     The causal state construction groups together all sequences with the same number of heads k. Since k ∈ {0, 1, ..., 100}, there are 101 equivalence classes. The entropy of these classes is given by the entropy of the binomial distribution Bin(100, 1/2). Using a Gaussian approximation, C ≈ (1/2) log2(2πe · 25) ≈ 4.4 bits.

  •  Predictive efficiency η:
     η = I(X; Y) / C ≈ 0.007 / 4.4 ≈ 0.0016, i.e. less than a fifth of a percent.

 

In this example, a vast amount of internal structural information (the cost C to specify the causal state) is required to extract just a tiny bit of predictability. In practical terms, this means that even if one possesses great expertise (analogous to having high forecasting complexity C), the net benefit is modest because the inherent predictive efficiency I(X; Y) / C is low. Such scenarios are common in fields like archaeology or long-term political forecasting, where obtaining a single predictive bit of information may demand enormous expertise, data, and computational resources.
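If you want to check these numbers, here is a short script (my own sketch, assuming numpy and scipy are available) that computes them exactly rather than via the Gaussian approximation:

```python
import numpy as np
from scipy.stats import binom

n = 100
k = np.arange(n + 1)
p_k = binom.pmf(k, n, 0.5)   # distribution over causal states (number of heads)

def H_bern(p):
    """Binary entropy in bits, defined to be 0 at p = 0 or p = 1."""
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return np.nan_to_num(h)

H_Y = 1.0                                   # Y is marginally a fair coin
H_Y_given_X = np.sum(p_k * H_bern(k / n))   # ≈ 0.993 bits
I_XY = H_Y - H_Y_given_X                    # ≈ 0.007 bits
C = -np.sum(p_k * np.log2(p_k))             # entropy of Bin(100, 1/2) ≈ 4.37 bits
print(H_Y_given_X, I_XY, C, I_XY / C)       # efficiency ≈ 0.0016
```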

Please read carefully what I wrote - I am talking about energy consumption worldwide, not electricity consumption in the EU. Electricity in the EU accounts for only a small percentage of carbon emissions.

See

[chart: worldwide energy consumption by source]

As you can see, solar energy is still a tiny percentage of total energy sources. I don't think it is an accident that the EU electricity-mix graph has been cited in this discussion: it is a proxy that paints a much more rose-colored picture.

Energy and electricity are often conflated in discussions around climate change, perhaps not coincidentally because the latter seems much more tractable to generate renewably than total energy production. 

No, it is not confused. Please read precisely what I wrote: I said total energy production worldwide, not electricity production in the European Union.

 

[chart: worldwide energy consumption by source]

As you can see, solar is still a tiny percentage of energy consumption. That is not to say that things will not change - I certainly hope so! I give it significant probability. But if we are to be honest with ourselves, then it remains to be seen whether solar energy will prove to be the solution.

Moreover, even in the case that solar energy does take over and 'solve' climate change, that still does not prove the thesis - namely, that solar energy solving climate change would be mainly the result of deliberate policy rather than of market forces / ceteris-paribus technological development.

I somewhat agree, but:

  1. The correlation is not THAT strong
  2. The correlation differs by field

And finally, there is a difference between skill ceilings for domains with high versus low predictive efficiency. In the latter, more intelligence will still yield returns, but rapidly diminishing ones.

(See my other comment for more details on predictive efficiency.)

One aspect I didn't speak about that may be relevant here is the distinction between:

  •  irreducible uncertainty h (noise, entropy),

  •  reducible uncertainty E ('excess entropy'), and

  •  forecasting complexity C ('statistical complexity').

All three can independently vary in general.

Domains can be more or less noisy (more entropy h) - both inherently and because of limited observations.

Some domains allow for a lot of prediction (there is a lot of reducible uncertainty E), while others allow for only limited prediction (e.g. political forecasting over longer time horizons).

And said prediction can be very costly to carry out (high forecasting complexity C). Archeology is a good example: to correctly predict one bit about the far past might require an enormous amount of expertise, data and information. In other words, it's really about the ratio between the reducible uncertainty and the forecasting complexity: E/C.

Some fields have a very high skill ceiling, but because of a low E/C ratio the net effect of more intelligence is modest. Some domains aren't predictable at all, i.e. E is low. Other domains have a more favorable E/C ratio and a high C. This is typically a domain where there is a high skill ceiling and the leverage effect of additional intelligence is very large.

[For a more precise mathematical toy model of h, E, C, take a look at computational mechanics.]
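Concretely, in the notation of the predictive efficiency shortform above (same definitions, just relabelled):

h = H(Y | X)   (irreducible uncertainty, noise)
E = I(X; Y)    (reducible uncertainty, 'excess entropy')
C = H(ε(X))    (forecasting complexity of the causal states)
predictive efficiency = E / C.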
