Comments

yc · 10

Thanks, I was thinking more of the latter (human irrationality), but found the first part of your answer interesting as well. I understand that irrationality has been studied in psychology and economics; what I was wondering about was the modeling of irrationality specifically, for one or two players but also for a group of agents. For example, there are arguments that a group of irrational agents can still produce a rational group choice, depending on the group's structure. For individual irrationality, and for group irrationality that persists, I think we would need to estimate the level (and prevalence) of irrationality in a way that captures unconscious preferences or incomplete information. How best to combine these? Maybe it would just be more data-driven.
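
As a toy illustration of that group-structure argument (my own sketch, not something from the thread): under Condorcet-jury-theorem assumptions, independent agents who are each only slightly better than chance produce a majority vote that is almost always right, and the effect reverses if each agent is slightly worse than chance.

```python
import random

def majority_accuracy(n_agents: int, p_correct: float, n_trials: int = 10_000) -> float:
    """Fraction of trials in which a simple majority of independent
    agents, each right with probability p_correct, picks the better option."""
    wins = 0
    for _ in range(n_trials):
        votes = sum(random.random() < p_correct for _ in range(n_agents))
        if votes > n_agents / 2:
            wins += 1
    return wins / n_trials

# Individuals at 55% accuracy aggregate into a near-certain group;
# set p_correct below 0.5 and the group instead becomes reliably wrong.
for n in (1, 11, 101, 1001):
    print(n, round(majority_accuracy(n, 0.55), 3))
```

With 101 such agents the majority is right roughly 84% of the time, and with 1001 it is nearly always right, even though each agent is barely better than a coin flip; independence and the voting structure do all the work here.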

yc · 10

I am not sure it needs to be conditional on whether the event is unusual, or on whether it would happen again in a forward-looking sense. Could you explain why that restriction is there? Especially regarding <We do not call any behavior or emotional pattern ‘trauma’ if it is obviously adaptive.>

yc · 10

How do we best model an irrational world rationally? I would assume we would at least need to understand how irrationality works.

yc · 10

Sharing an interesting report on the state of AI: https://www.stateof.ai/

It covers multiple aspects of the current state of AI and is reasonably good on the technical side.

yc · 11

Just saw that the OP replied in another comment that he is offering advice.

[This comment is no longer endorsed by its author]

yc · 30

It’s probably based less on the whole internet and more on the RLHF guidelines (I imagine the human reviewers receive guidelines informed by the LLM-training company’s policy, legal, and safety experts). I don’t disagree, though, that it could present a relatively more objective view on some topics than a particular individual would (depending on the definition of bias).

yc · 10

Yeah for sure! 

For PII - A relatively recent survey paper: https://arxiv.org/pdf/2403.05156

For bias/fairness - survey paper: https://arxiv.org/pdf/2309.00770 

This is probably far from complete, but the references in these survey papers, and in the Staab et al. paper, should include some additional good ones as well.

yc · 10

This is a relatively common topic in responsible AI; glad to see the reference to Staab et al., 2023! For PII (Personally Identifiable Information), RLHF is typically the go-to method for getting models to refuse such prompts, but since those refusals are easy to undo, effort has also gone into scrubbing PII from the pretraining data. Demographic inference seems bias-related as well.
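
For concreteness, here is a minimal sketch of what that pretraining-data scrubbing can look like (my own illustration, not a description of any production system; real pipelines use trained NER models and far broader rule sets than two regexes):

```python
import re

# Deliberately simplified, illustrative patterns.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d{1,2}[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")

def scrub_pii(text: str) -> str:
    """Replace obvious PII spans with placeholder tokens before the
    text enters a pretraining corpus."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(scrub_pii("Reach me at jane.doe@example.com or +1 415-555-0123."))
# Reach me at [EMAIL] or [PHONE].
```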

yc · 10

No worries; thanks!

yc · 21

Examples of right-leaning projects that he rejected because of his political affiliation, and whether those examples are related to AI safety.
