LESSWRONG
Pedro Freire

30

5y

Pedro Freire hasn't written anything yet.

Uncovering Latent Human Wellbeing in LLM Embeddings

tl;dr: A one-dimensional PCA projection of OpenAI's text-embedding-ada-002 achieves 73.7% accuracy on the ETHICS Util test dataset. This is comparable to the 74.6% accuracy of BERT-large fine-tuned on the entire ETHICS Util training dataset. This demonstrates how language models are developing implicit representations of human utility even without direct preference...

Sep 14, 2023 • 32