Pedro Freire

Uncovering Latent Human Wellbeing in LLM Embeddings

Sep 14, 2023

ChengCheng, Pedro Freire, Dan H, Scott Emmons

tl;dr

A one-dimensional PCA projection of OpenAI's text-embedding-ada-002 achieves 73.7% accuracy on the ETHICS Util test dataset, comparable to the 74.6% accuracy of BERT-large finetuned on the entire ETHICS Util training dataset. This shows that language models are developing implicit representations of human utility even without direct preference finetuning.
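To make the idea of a one-dimensional PCA projection concrete, here is a minimal sketch on synthetic data. It is not the authors' pipeline. The random vectors stand in for real text-embedding-ada-002 embeddings, the hidden `utility_axis` and the noise scales are made up for illustration, and the evaluation assumes that each test item is a pair of scenarios in which one is labelled as preferable. The sign of a principal component is arbitrary, so the sketch orients it using the labelled training pairs, which is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # hypothetical embedding size; real embeddings are much larger

# Hypothetical hidden "utility" axis used only to generate synthetic data.
utility_axis = rng.normal(size=dim)
utility_axis /= np.linalg.norm(utility_axis)


def make_pairs(n):
    """Return synthetic (better, worse) embedding pairs, each of shape (n, dim)."""
    util = rng.normal(size=(n, 2))
    util.sort(axis=1)  # column 0 = lower utility, column 1 = higher utility
    worse = 3.0 * util[:, [0]] * utility_axis + 0.5 * rng.normal(size=(n, dim))
    better = 3.0 * util[:, [1]] * utility_axis + 0.5 * rng.normal(size=(n, dim))
    return better, worse


def score(X, mean, direction):
    """Project embeddings onto a single direction, giving one scalar per row."""
    return (X - mean) @ direction


train_better, train_worse = make_pairs(500)
test_better, test_worse = make_pairs(500)

# Fit a one-dimensional PCA: centre the training embeddings and take the
# first right singular vector, which is the first principal direction.
train = np.vstack([train_better, train_worse])
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
direction = vt[0]

# PCA leaves the sign undetermined; flip it so that "better" scenarios
# score higher on the training pairs (an assumption of this sketch).
gap = score(train_better, mean, direction) - score(train_worse, mean, direction)
if gap.mean() < 0:
    direction = -direction

# Pairwise accuracy: fraction of test pairs ranked in the labelled order.
accuracy = float(
    np.mean(score(test_better, mean, direction) > score(test_worse, mean, direction))
)
print(f"pairwise accuracy on synthetic pairs: {accuracy:.3f}")
```

The synthetic data is built so that the utility axis carries most of the variance, which is why the first principal component recovers it here. Whether real embeddings behave this way is the empirical question the post addresses.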

Introduction

Large language models (LLMs) undergo pre-training on vast amounts of human-generated data, enabling them to encode not only knowledge about human languages but also potential insights into our beliefs and wellbeing. Our goal is to uncover whether these models implicitly grasp concepts such as 'pleasure and pain' without explicit finetuning. This research aligns with the broader effort of comprehending how AI systems interpret and learn from human...
