The current crop of AI systems appears to have world models of varying levels of detail, but we cannot easily understand these world models, as they are mostly giant floating-point arrays. If we knew how to interpret individual parts of the AIs’ world models, we would be able to specify...
A desirable property of an AI’s world model is that you, as its programmer, have an idea of what’s going on inside. It would be good if you could point to a part of the world model and say, “This here encodes the concept of a strawberry; here is how this...
You want to get to your sandwich: Well, that’s easy. Apparently we are in some kind of grid world, which is presented to us in the form of a lattice graph, where each vertex represents a specific world state, and the edges tell us how we can traverse the world...
When you look up the color temperature of daylight, most sources will say 6500K, but if you buy an LED with that color temperature, it will not look like the sun in the sky. It will seem bluer (or, less yellow-y). Yet, 6500K is arguably the correct number. What is...
This is the final part of my introduction to dependent type theory. The big theme of this article is equality, though that may not be immediately obvious. Let’s start by finally discussing function extensionality and propositional extensionality. What’s the deal with extensionality? Whenever we define concepts, there are basically two...
This is the fourth entry in my series on type theory. We will introduce a lot of notation that is reminiscent of set theory, to make everyone feel more at home, and then we’ll discuss the axiom of choice. Subtypes Our next topic is “subsets” within types. I mentioned in...
In June, I received a grant from the LTFF for a 6-month period of self-study aimed at mastering the necessary background for AI alignment research. The following is advice I would give to people who are attempting something similar. I have tried to keep it short. Basic advice You’ll naturally...