All of yc's Comments + Replies

yc1-1

Agreed on both points. I have to say, without reading the news occasionally, I would not realize how much I don't know about what is going on in the world. It does help me stay informed, and deep, concrete stories in particular help me understand the full picture. Many times I also need to do a bit of digging for more information to avoid bias, though this issue is smaller if one sticks to news sources of better quality. The choice of news sources very likely matters.

The OP had a strong assumption that whatever is reported in the n...

yc40

Terminology question - does adversarial filtering mean the same thing as decontamination? 

5Lukas_Gloor
In order to submit a question to the benchmark, people had to run it against the listed LLMs; the question would only advance to the next stage once the LLMs used for this testing got it wrong. 
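To make the mechanics concrete, here is a minimal sketch of that filtering gate, assuming the benchmark checks each submission against every listed model; `ask_llm` and the model names are hypothetical placeholders, not a real API.

```python
# Hypothetical sketch of the adversarial-filtering gate described above:
# a submitted question advances only if every listed LLM answers it wrong.
# `ask_llm` and the model names are placeholders, not a real API.

def ask_llm(model: str, question: str) -> str:
    """Placeholder: query `model` with `question` and return its answer."""
    raise NotImplementedError("wire up a real LLM API here")

def passes_adversarial_filter(question: str, correct_answer: str,
                              models: list[str]) -> bool:
    """Advance the question only if all test models get it wrong."""
    return all(ask_llm(m, question).strip() != correct_answer.strip()
               for m in models)

# Usage: only questions that fool every model reach the next review stage.
# if passes_adversarial_filter(q, a, ["model-a", "model-b"]):
#     advance_to_next_stage(q)
```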
yc30

Could this be made a report-based system? If a user reports potential spam, the submission process would ask for their reasons and for consent to look over the messages between the reporter and the alleged spammer; if multiple people report the same account, it becomes obvious that the account is spamming via DM. (A rough sketch of this threshold rule follows below.)

Edit: just saw a previous comment on this too.
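A minimal sketch of that threshold rule, with the names and cutoff as illustrative assumptions rather than a real moderation API:

```python
# Sketch of the report-based flagging idea above (all names hypothetical):
# collect reports per account, record the reporter's reason and consent to
# review the DM thread, and flag accounts reported by several distinct users.

from collections import defaultdict

REPORT_THRESHOLD = 3  # assumed cutoff; a real system would tune this

reports: dict[str, set[str]] = defaultdict(set)  # accused -> reporter ids
review_queue: list[tuple[str, str, str]] = []    # (reporter, accused, reason)

def report_spam(reporter: str, accused: str, reason: str,
                consent_to_review: bool) -> bool:
    """Record one report; return True once the account should be flagged."""
    reports[accused].add(reporter)  # a set dedupes repeat reports per user
    if consent_to_review:
        review_queue.append((reporter, accused, reason))
    return len(reports[accused]) >= REPORT_THRESHOLD
```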

yc10

Thanks, I was thinking more of the latter (human irrationality), but found your first part interesting as well. I understand irrationality has been studied in psychology and economics, and I was wondering about modeling irrationality in particular, for 1-2 players but also for a group of agents. For example, there are arguments that the choices of a group of irrational agents could be rational, depending on group structure etc. For individual irrationality and continued group irrationality, I think we would need to estimate the level (and prevalence) of irrationality in some way that captures unconscious preferences or incomplete information. How best to combine these? Maybe it would just be more data-driven.
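For a concrete version of the "irrational individuals, rational group" argument, one classic formalization is the Condorcet jury theorem; here is a quick simulation sketch with arbitrary illustrative parameters:

```python
# If each agent independently picks the right option with probability just
# over 0.5, a majority vote of many such agents is right far more often
# than any single agent. All parameters here are illustrative.

import random

def majority_is_correct(n_agents: int, p_correct: float) -> bool:
    votes = sum(random.random() < p_correct for _ in range(n_agents))
    return votes > n_agents / 2

def accuracy(n_agents: int, p_correct: float, trials: int = 10_000) -> float:
    return sum(majority_is_correct(n_agents, p_correct)
               for _ in range(trials)) / trials

print(accuracy(1, 0.55))     # ~0.55: a single mildly-noisy agent
print(accuracy(101, 0.55))   # ~0.84: the group does much better
print(accuracy(1001, 0.55))  # ~1.00: near-certain with a large group
```

Note the result leans on the agents' errors being independent, which is one concrete sense in which "group structure" matters.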

5gilch
That seems to be getting into Game Theory territory. One can model agents (players) with different strategies, even suboptimal ones. A lot of the insight from Game Theory isn't just about how to play a better strategy, but how changing the rules affects the game.
yc10

I am not sure it needs to be conditional on whether the event is unusual, or on whether it would happen again in a forward-looking sense. Could you explain why the restriction is there? Especially regarding "We do not call any behavior or emotional pattern 'trauma' if it is obviously adaptive."

yc10

How do we best model an irrational world rationally? I would assume we would at least need to understand how irrationality works?

4gilch
Not sure I understand what you mean by that. The Universe seems to follow relatively simple deterministic laws. That doesn't mean you can use quantum field theory to predict the weather. But chaotic systems can be modeled as statistical ensembles. Temperature is a meaningful measurement even if we can't calculate the motion of all the individual gas molecules. If you're referring to human irrationality in particular, we can study cognitive bias, which is how human reasoning diverges from that of idealized agents in certain systematic ways. This is a topic of interest at both the individual level of psychology, and at the level of statistical ensembles in economics.
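As a toy illustration of the ensemble point (arbitrary units, not a physics simulation):

```python
# Individual molecule velocities are effectively unpredictable, but the
# ensemble average (temperature is proportional to mean kinetic energy)
# is stable once the sample is large. Constants here are arbitrary.

import random

def mean_kinetic_energy(n_molecules: int, sigma: float = 1.0) -> float:
    # One Gaussian velocity component per molecule, as in an ideal gas.
    return sum(random.gauss(0, sigma) ** 2 / 2
               for _ in range(n_molecules)) / n_molecules

print(mean_kinetic_energy(10))         # noisy: varies a lot run to run
print(mean_kinetic_energy(1_000_000))  # stable: ~0.5 on every run
```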
yc10

Sharing an interesting report on the state of AI: https://www.stateof.ai/

It covers multiple aspects of the current state of AI and is reasonably good on the technical side.

yc11

Just saw that the OP replied in another comment that he is offering advice.

[This comment is no longer endorsed by its author]
yc*30

It’s probably less about the whole internet and more about the RLHF guidelines (I imagine the human reviewers receive a guideline based on the advice of the LLM-training company’s policy, legal, and safety experts). I don’t disagree, though, that it could present a relatively more objective view on some topics than a particular individual would (depending on the definition of bias).

yc10

Yeah for sure! 

For PII - A relatively recent survey paper: https://arxiv.org/pdf/2403.05156

For bias/fairness - survey paper: https://arxiv.org/pdf/2309.00770 

This is probably far from complete, but I think the references in these survey papers and in the Staab et al. paper should include some additional good ones as well.

1eggsyntax
Thanks! I've seen some of the PII/memorization work, but I think that problem is distinct from what I'm trying to address here; what I was most interested in is what the model can infer about someone who doesn't appear in the training data at all. In practice it can be hard to distinguish those cases, but conceptually I see them as pretty distinct.

The demographics link ('Privacy Risks of General-Purpose Language Models') is interesting and I hadn't seen it, thanks! It seems mostly pretty different from what I'm trying to look at, in that they're looking at questions about models' ability to reconstruct text sequences (including eg genome sequences), whereas I'm looking at questions about what the model can infer about users/authors.

Bias/fairness work is interesting and related, but aiming in a somewhat different direction -- I'm not interested in inference of demographic characteristics primarily because they can have bias consequences (although certainly it's valuable to try to prevent bias!). For me they're primarily a relatively easy-to-measure proxy for broader questions about what the model is able to infer about users from their text. In the long run I'm much more interested in what the model can infer about users' beliefs, because that's what enables the model to be deceptive or manipulative.

I've focused here on differences between the work you linked and what I'm aiming toward, but those are still all helpful references, and I appreciate you providing them!
yc10

This is a relatively common topic in responsible AI; glad to see the reference to Staab et al., 2023! For PII (Personally Identifiable Information), RLHF is typically the go-to method for getting models to refuse such prompts, but since those refusals are easy to undo, effort has also been put into cleaning PII out of the pretraining data. Demographic inference seems to be bias-related as well.
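As an illustration of that data-cleaning step, a minimal sketch of regex-based PII scrubbing; the patterns here are deliberately simplistic assumptions, and production pipelines use far stronger detectors:

```python
import re

# Illustrative-only patterns; real PII detection needs much more coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub_pii("Reach me at jane.doe@example.com or 555-867-5309."))
# -> "Reach me at [EMAIL] or [PHONE]."
```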

1eggsyntax
If there are other papers on the topic you'd recommend, I'd love to get links or citations.
yc10

No worries; thanks!

yc21

Examples of right-leaning projects that got rejected by him due to his political affiliation, and whether these examples are AI safety related.

2habryka
I don't currently know of any public examples and feel weird publicly disclosing details about organizations that I privately heard about. If more people are interested I can try to dig up some more concrete details (but can't make any promises on things I'll end up being able to share).
yc*40

Out of curiosity - “it's because Dustin is very active in the democratic party and doesn't want to be affiliated with anything that is right-coded” Are these projects related to AI safety, or more general? And what are some examples?

2habryka
I am not sure I am understanding your question. Are you asking about examples of left-leaning projects that Dustin is involved in, or right-leaning projects that cannot get funding? On the left, Dustin is one of the biggest donors to the Democratic Party (with Asana donating $45M and him donating $24M to Joe Biden in 2020).
yc10

1. Maybe it would be different for everyone. It might be hard to have a standard formula for finding obsessions. Sometimes they may come naturally through life events/observations/experiences. If no such experience exists yet, or one seems to be interested in multiple things, I have received advice to try different things and see what you like (I agree with it). Now that I think about it, it would also be fun to survey people and ask them how they got their passion/came to do what they do (and to derive some standard formula/common elements if possible)!

2. I t...

yc10

https://arxiv.org/pdf/1803.10122 I have a similar question and found this paper. One thing I am not sure of is whether this is still the same (or a close enough) concept that people currently talk about, or whether this is where the concept originated.

https://www.sciencedirect.com/science/article/pii/S0893608022001150 This paper seems to suggest something at least about multimodal perception in a reinforcement learning/agent type of setup.

yc30

“A direction: asking if and how humans are stably aligned.” I think this is a great direction, and the next step seems to be breaking out what humans are aligned to - the examples here seem to mention some internal value alignment, but I wonder whether it would also mean alignment to an external value system.