Part-I (The Sin of Greed)
On 30 November 2022, OpenAI released ChatGPT. According to Sam Altman, it was meant to be a demo[1] showing the progress in language models. By 4 December, in just 5 days, it had gained 1 million users; for comparison, it took Instagram 75 days,...
In classical RL, we have an agent with a set of states (S), a set of actions (A), and some reward function (R); the aim is to find the optimal policy (π) that maximizes the following. This is the cumulative reward we get by sampling actions using our...
In this post, I will cover Anthropic's work on monosemanticity[1]. I start with a brief introduction to the motivation and methodology, then move on to my ablation experiments, where I train a sparse autoencoder on "gelu-2l"[2] and its quantized versions to see what insights I can gain.
Introduction
The holy grail...
In this post, I will cover Jesse Hoogland's work on Singular Learning Theory. The post is meant mostly as a dummies' guide and therefore doesn't add anything meaningfully new to his work. To help the reader along, I mark the math difficulty of each section,...
In this post, I cover Joscha Bach's views on consciousness, how it relates to intelligence, and what role it can play in getting us closer to AGI. The post is divided into three parts: first, I try to cover why Joscha is interested in understanding consciousness; next, I go over...
This post discusses Joe Carlsmith’s views on how to approach AI risk as a problem of interspecies interaction, and how humans can use this framing to navigate future AI development better. The essay is divided into three parts. First, I give my understanding of Carlsmith's views; then I build upon some of...