Singular Learning Theory for Dummies
In this post, I will cover Jesse Hoogland's work on Singular Learning Theory. The post is mostly meant as a dummies' guide, and therefore won't add anything meaningfully new to his work. As a helpful guide, I mark the math difficulty of each section: some of it is trivial (I), some of it can be followed with enough effort (II), and some of it is out of our (my) scope, so we simply take it on faith (III).

Background

> Statistical learning theory is lying to you: "overparametrized" models actually aren't overparametrized, and generalization is not just a question of broad basins

Singular Learning Theory (SLT) tries to explain how neural networks generalize. One motivation for this is that the previous understanding of neural network generalization isn't quite correct. So what is the previous theory, and why is it wrong?

Broad Basins

Below are excerpts from "Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes"[1] that summarize the logic behind the broad-basins theory of generalization.

According to SLT there are two issues here. First, estimating the volume of a basin in the loss landscape by approximating it via the Hessian isn't accurate: the Gaussian (Laplace) approximation behind that estimate assumes the Hessian is non-degenerate, which fails for neural networks whose loss landscapes have flat, singular directions. Second, the reason models generalize has more to do with their symmetries than with the fact that initialization drops the networks into a good "broad basin" region.

Starting with SLT

A lot of the basic math for SLT is already covered in the original blog post. Here I will try to go over the small questions one might have (at least I had) to get a clearer picture.

What we know so far? (Math difficulty I)

[Screenshot from the blog]

1. We define the truth, the model, and the prior. The model is just the likelihood. These are standard Bayesian terms.
2. Based on the above, we know how to formalize the posterior and the model evidence, p(w|D_n) and p(D_n), in terms of the likelihood and prior. Usually we stop at the posterior: we are happy to get p(w|D_n) for the model. But here we are also interested in the model evidence p(D_n).
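To make the notation concrete, here is the standard Bayesian setup in the notation typically used in SLT (I write the prior as φ(w) and the truth as q(x); the exact symbols are my choice and may differ slightly from the blog post). The data D_n = {x_1, ..., x_n} are drawn i.i.d. from the truth q(x), and p(x|w) is the model:

$$
p(w \mid D_n) \;=\; \frac{\varphi(w)\,\prod_{i=1}^{n} p(x_i \mid w)}{p(D_n)},
\qquad
p(D_n) \;=\; \int \varphi(w)\,\prod_{i=1}^{n} p(x_i \mid w)\, dw .
$$

Note that the evidence p(D_n) integrates the likelihood against the prior over all of weight space, which is why the geometry of the loss landscape around degenerate parameters ends up mattering.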
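As a sanity check on these definitions, here is a minimal numerical sketch for a toy one-parameter model, computing the posterior and the evidence by brute-force integration on a grid. The toy model (a unit-variance Gaussian with unknown mean and a Gaussian prior) and all names are my own illustrative choices, not taken from the blog post.

```python
import numpy as np

# Toy setup: truth q(x) = N(x; 0.5, 1), model p(x | w) = N(x; w, 1), prior phi(w) = N(w; 0, 1).
# We compute the posterior p(w | D_n) and the model evidence p(D_n) by brute force on a grid.
rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=20)          # D_n = {x_1, ..., x_n}, n = 20

w_grid = np.linspace(-5.0, 5.0, 2001)
dw = w_grid[1] - w_grid[0]

# log p(D_n | w) = sum_i log N(x_i; w, 1), evaluated for every w on the grid
log_lik = (-0.5 * np.sum((data[None, :] - w_grid[:, None]) ** 2, axis=1)
           - 0.5 * len(data) * np.log(2 * np.pi))
log_prior = -0.5 * w_grid ** 2 - 0.5 * np.log(2 * np.pi)

joint = np.exp(log_lik + log_prior)           # p(D_n | w) * phi(w)
evidence = np.sum(joint) * dw                 # p(D_n) = integral of p(D_n | w) phi(w) dw
posterior = joint / evidence                  # p(w | D_n), a density over w

print("model evidence p(D_n):", evidence)
print("posterior mean of w  :", np.sum(w_grid * posterior) * dw)
```

For a one-dimensional toy model the grid integral is trivial; for real networks this integral over high-dimensional weight space is intractable, which is where the asymptotic machinery of SLT comes in.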