This is my first post on LessWrong. For now I'll just be linkposting content on epistemics and alignment while I get more familiar with the culture.
tl;dr:
We attempt to automatically infer a person's beliefs from their writing in three different ways. Initial results on Twitter data suggest that embeddings and language models are particularly promising approaches.
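As a rough illustration of what the embedding-based approach might look like, the sketch below scores how strongly a tweet aligns with a candidate belief statement via cosine similarity of their embeddings. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real sentence-embedding model (e.g. one from the sentence-transformers library), and the tweet/belief strings are made up.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag-of-words counts. A real pipeline would
    # replace this with a learned sentence-embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def belief_score(tweet: str, belief: str) -> float:
    # Higher score = tweet is closer to the belief statement in embedding space.
    return cosine(embed(tweet), embed(belief))


tweet = "AI alignment is the most important problem of our time"
print(belief_score(tweet, "AI alignment is important"))
print(belief_score(tweet, "climate change is important"))
```

With a real embedding model the same comparison captures paraphrases rather than just shared words, but the ranking logic (score each candidate belief against the text, keep the closest) stays the same.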
What do you think the results would look like if you used a language model to automatically filter for direct-opinion tweets and to do automatic negation?
Cool to hear you tried it!