Director of AI research at ALTER, where I lead a group working on the learning-theoretic agenda for AI alignment. I'm also supported by the LTFF. See also LinkedIn.
E-mail: {first name}@alter.org.il
P(GPT-5 Release)
What is the probability that OpenAI will release GPT-5 before the end of 2025? "Release" means that a random member of the public can use it, possibly paid.
Does this require a product called specifically "GPT-5"? What if they release e.g. "OpenAI o2" instead, and there is never anything called GPT-5?
Number of Current Partners
(for example, 0 if you are single, 1 if you are in a monogamous relationship, higher numbers for polyamorous relationships)
This is a confusing phrasing. If you have 1 partner, it doesn't mean your relationship is monogamous. A monogamous relation is one in which there is a mutually agreed understanding that romantic or sexual interaction with other people is forbidden. Without this, your relationship is not monogamous. For example:
None of the above is a monogamous relationship!
I've been thinking along very similar lines for a while (my inside name for this is "mask theory of the mind": consciousness is a "mask"). But my personal conclusion is very different. While self-deception is a valid strategy in many circumstances, I think that it's too costly when trying to solve an extremely difficult high-stakes problem (e.g. stopping the AI apocalypse). Hence, I went in the other direction: trying to self-deceive as little as possible, and instead be self-honest about my[1] real motivations, even if they are "bad PR". In practice, this means never making excuses to myself such as "I wanted to do A, but I didn't have the willpower so I did B instead", but rather owning the fact that I wanted to do B and thinking about how to integrate this into a coherent long-term plan for my life.
My solution to "hostile telepaths" is dividing other people into ~3 categories:
Moreover, having an extremely difficult high-stakes problem is not just a strong reason to self-deceive less, it's also a strong reason to become more truth-oriented as a community. This means that people with such a common cause should strive to put each other at least in category 2 above, tentatively moving towards 3 (with the caveat of watching out for bad actors trying to exploit that).
While making sure to use the word "I" to refer to the elephant/unconscious-self and not to the mask/conscious-self.
Two thoughts about the role of quining in IBP:
I just read Daniel Boettger's "Triple Tragedy And Thankful Theory". There he argues that the thrival vs. survival dichotomy (or at least its implications on communication) can be understood as time-efficiency vs. space-efficiency in algorithms. However, it seems to me that a better parallel is bandwidth-efficiency vs. latency-efficiency in communication protocols. Thrival-oriented systems want to be as efficient as possible in the long-term, so they optimize for bandwidth: enabling the transmission of as much information as possible over any given long period of time. On the other hand, survival-oriented systems want to be responsive to urgent interrupts which leads to optimizing for latency: reducing the time it takes between a piece of information appearing on one end of the channel and that piece of information becoming known on the other end.
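The bandwidth vs. latency trade-off can be made concrete with the usual first-order model of a channel, in which delivering a message takes a fixed latency plus the message size divided by the bandwidth. The sketch below is only an illustration of that model: the two channels, their names, and all the numbers are made-up assumptions, not anything taken from Boettger's text.

```python
def transfer_time(size_bits, latency_s, bandwidth_bps):
    """Seconds until a message of size_bits is fully known at the receiver."""
    return latency_s + size_bits / bandwidth_bps


# Two hypothetical channels (all numbers are invented for illustration).
# "bulk" is tuned for long-run throughput, "urgent" for fast response.
BULK = {"latency_s": 0.5, "bandwidth_bps": 1e9}
URGENT = {"latency_s": 0.001, "bandwidth_bps": 1e6}


def faster_channel(size_bits):
    """Name of the channel that delivers a message of this size sooner."""
    t_bulk = transfer_time(size_bits, **BULK)
    t_urgent = transfer_time(size_bits, **URGENT)
    return "urgent" if t_urgent < t_bulk else "bulk"


if __name__ == "__main__":
    for size in (1e3, 1e6, 1e10):
        print(f"{size:.0e} bits -> {faster_channel(size)}")
```

Under these assumed numbers, a short interrupt-like message arrives sooner on the low-latency channel, while a large payload arrives sooner on the high-bandwidth one, which is the asymmetry the parallel relies on.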
I believe that all or most of the claims here are true, but I haven't written all the proofs in detail, so take it with a grain of salt.
Ambidistributions are mathematical objects that simultaneously generalize infradistributions and ultradistributions. They are useful for representing how much power an agent has over a particular system: which degrees of freedom it can control, which degrees of freedom obey a known probability distribution and which are completely unpredictable.
Definition 1: Let be a compact Polish space. A (crisp) ambidistribution on is a function s.t.
Conditions 1+3 imply that is 1-Lipschitz. We could introduce non-crisp ambidistributions by dropping conditions 2 and/or 3 (and e.g. requiring 1-Lipschitz instead), but we will stick to crisp ambidistributions in this post.
The space of all ambidistributions on will be denoted .[1] Obviously, (where stands for (crisp) infradistributions), and likewise for ultradistributions.
Example 1: Consider compact Polish spaces and a continuous mapping . We can then define by
That is, is the value of the zero-sum two-player game with strategy spaces and and utility function .
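For intuition about "the value of a zero-sum two-player game", here is a minimal sketch of the simplest finite case: a matrix game in which the maximizer has two pure strategies. This is an assumption-laden toy, not the construction of Example 1: finite strategy sets stand in for the compact Polish spaces, an integer payoff table stands in for the utility function, and the function name `value_2xn` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def value_2xn(payoff):
    """Value of a zero-sum matrix game where the maximizer has 2 rows.

    payoff[i][j] is the maximizer's (integer) utility when the maximizer
    plays row i and the minimizer plays column j. The maximizer mixes the
    rows with probabilities (p, 1 - p). Returns (value, optimal p).
    """
    top, bottom = payoff
    n = len(top)

    def guaranteed(p):
        # Worst case over the minimizer's pure strategies. This suffices
        # because the expected payoff is linear in the minimizer's mixture.
        return min(p * top[j] + (1 - p) * bottom[j] for j in range(n))

    # Each column gives a line in p. The lower envelope of these lines is
    # concave and piecewise linear, so its maximum over [0, 1] is attained
    # at an endpoint or at a crossing point of two lines.
    candidates = [Fraction(0), Fraction(1)]
    for j, k in combinations(range(n), 2):
        slope_j = top[j] - bottom[j]
        slope_k = top[k] - bottom[k]
        if slope_j != slope_k:
            p = Fraction(bottom[k] - bottom[j], slope_j - slope_k)
            if 0 <= p <= 1:
                candidates.append(p)

    best_p = max(candidates, key=guaranteed)
    return guaranteed(best_p), best_p


if __name__ == "__main__":
    matching_pennies = [[1, -1], [-1, 1]]
    print(value_2xn(matching_pennies))
```

For matching pennies the maximizer can guarantee nothing better than 0, and does so by mixing the two rows equally. By the minimax theorem the same number is obtained if the minimizer moves first, which is what makes "the value" well defined in the finite case.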
Notice that in Example 1 can be regarded as a Cartesian frame: this seems like a natural connection to explore further.
Example 2: Let and be finite sets representing actions and observations respectively, and be an infra-Bayesian law. Then, we can define by
In fact, this is a faithful representation: can be recovered from .
Example 3: Consider an infra-MDP with finite state set , initial state and transition infrakernel . We can then define the "ambikernel" by
Thus, every infra-MDP induces an "ambichain". Moreover:
Claim 1: is a monad. In particular, ambikernels can be composed.
This allows us to define
This object is the infra-Bayesian analogue of the convex polytope of accessible state occupancy measures in an MDP.
Claim 2: The following limit always exists:
Definition 3: Let be a convex space and . We say that occludes when for any , we have
Here, stands for convex hull.
We denote this relation . The reason we call this "occlusion" is apparent for the case.
Here are some properties of occlusion:
Notice that occlusion has similar algebraic properties to logical entailment, if we think of as " is a weaker proposition than ".
Definition 4: Let be a compact Polish space. A cramble set[2] over is s.t.
Question: If instead of condition 3, we only consider binary occlusion (i.e. require ), do we get the same concept?
Given a cramble set , its Legendre-Fenchel dual ambidistribution is
Claim 3: Legendre-Fenchel duality is a bijection between cramble sets and ambidistributions.
The space is equipped with the obvious partial order: when for all . This makes into a distributive lattice, with
This is in contrast to which is a non-distributive lattice.
The bottom and top elements are given by
Ambidistributions are closed under pointwise suprema and infima, and hence is complete and satisfies both infinite distributive laws, making it a complete Heyting and co-Heyting algebra.
is also a De Morgan algebra with the involution
For , is not a Boolean algebra: and for any we have .
One application of this partial order is formalizing the "no traps" condition for infra-MDPs:
Definition 2: A finite infra-MDP is quasicommunicating when for any
Claim 4: The set of quasicommunicating finite infra-MDPs (or even infra-RDPs) is learnable.
Going to the cramble set representation, iff .
is just , whereas is the "occlusion hull" of and .
The bottom and the top cramble sets are
Here, is the top element of (corresponding to the credal set ).
The De Morgan involution is
Definition 5: Given compact Polish spaces and a continuous mapping , we define the pushforward by
When is surjective, there are both a left adjoint and a right adjoint to , yielding two pullback operators :
Given and we can define the semidirect product by
There are probably more natural products, but I'll stop here for now.
Definition 6: The polytopic ambidistributions are the (incomplete) sublattice of generated by .
Some conjectures about this:
One reason to doubt chaos theory’s usefulness is that we don’t need fancy theories to tell us something is impossible. Impossibility tends to make itself obvious.
This claim seems really weird to me. Why do you think that's true? A lot of things we have accomplished with technology today might have seemed impossible to someone from 1700. On the other hand, you could have thought that e.g. a perpetuum mobile, or superluminal motion, or deciding whether a graph is 3-colorable in worst-case polynomial time, or transmitting information at a rate higher than the Shannon-Hartley limit is possible if you didn't know the relevant theory.
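To make the last example concrete: the Shannon-Hartley theorem bounds the capacity of a band-limited channel with additive white Gaussian noise by C = B log2(1 + S/N), where B is the bandwidth in hertz and S/N is the linear signal-to-noise ratio. Nothing about a noisy wire makes this ceiling "obvious" without the theory. A minimal sketch follows; the function name and the example numbers are illustrative assumptions.

```python
import math


def shannon_hartley_capacity(bandwidth_hz, snr_linear):
    """Upper bound, in bits per second, on reliable communication over an
    AWGN channel with the given bandwidth and linear (not dB) S/N ratio."""
    return bandwidth_hz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    # Hypothetical channel: 3 kHz of bandwidth, S/N of 1023 (about 30 dB).
    print(shannon_hartley_capacity(3000, 1023))
```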
Here's a sketch of an AIT toy-model theorem stating that, in complex environments without traps, applying selection pressure reliably produces learning agents. I view it as an example of Wentworth's "selection theorem" concept.
Consider any environment of infinite Kolmogorov complexity (i.e. uncomputable). Fix a computable reward function
Suppose that there exists a policy of finite Kolmogorov complexity (i.e. computable) that's optimal for in the slow discount limit. That is,
Then, cannot be the only environment with this property. Otherwise, this property could be used to define using a finite number of bits, which is impossible[1]. Since requires infinitely many more bits to specify than and , there have to be infinitely many environments with the same property[2]. Therefore, is a reinforcement learning algorithm for some infinite class of hypotheses.
Moreover, there are natural examples of as above. For instance, let's construct as an infinite sequence of finite communicating infra-RDP refinements that converges to an unambiguous (i.e. "not infra") environment. Since each refinement involves some arbitrary choice, "most" such have infinite Kolmogorov complexity. In this case, exists: it can be any learning algorithm for finite communicating infra-RDPs with an arbitrary number of states.
Besides making this a rigorous theorem, there are many additional questions for further investigation:
Probably, making this argument rigorous requires replacing the limit with a particular regret bound. I ignore this for the sake of simplifying the core idea.
There is probably something more precise that can be said about how "large" this family of environments is. For example, maybe it must be uncountable.
Can you explain what your definition of "accuracy" is? (the 87.7% figure)
Does it correspond to some proper scoring rule?
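To spell out why the question matters: a scoring rule is strictly proper when reporting your true probability is the unique way to optimize your expected score. The sketch below contrasts the Brier score, which is strictly proper, with thresholded 0/1 accuracy, which is not. The definition of "accuracy" used here (predict the event iff the reported probability exceeds 0.5) is an assumption made for illustration, not necessarily the one behind the 87.7% figure, and all function names are hypothetical.

```python
import math


def expected_brier_loss(p_true, q_report):
    """Expected squared error of reporting q for an event of probability p."""
    return p_true * (1 - q_report) ** 2 + (1 - p_true) * q_report ** 2


def expected_accuracy(p_true, q_report):
    """Expected 0/1 accuracy when predicting the event iff q > 0.5."""
    return p_true if q_report > 0.5 else 1 - p_true


def best_reports(score, p_true, maximize, grid_size=101):
    """All reports on a uniform grid over [0, 1] that optimize the score."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    values = [score(p_true, q) for q in grid]
    target = max(values) if maximize else min(values)
    return [q for q, v in zip(grid, values) if math.isclose(v, target)]


if __name__ == "__main__":
    p = 0.7
    print(best_reports(expected_brier_loss, p, maximize=False))
    print(len(best_reports(expected_accuracy, p, maximize=True)))
```

With a true probability of 0.7, the Brier loss is minimized only by reporting 0.7, whereas every report above 0.5 achieves the same expected accuracy. So a headline accuracy number cannot, on its own, distinguish calibrated forecasts from overconfident ones.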
I feel that this post would benefit from having the math spelled out. How is inserting a trader a way to do feedback? Can you phrase classical RL like this?