All of Julius's Comments + Replies

Julius

I originally had an LLM generate them for me, and then I checked them with other LLMs to make sure the answers were right and that they weren't ambiguous. All of the questions are here: https://github.com/jss367/calibration_trivia/tree/main/public/questions
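
For anyone curious, the cross-checking step can be roughly illustrated like this (a minimal sketch, not the actual pipeline I used; the model name, prompt, and question format below are just placeholder assumptions):

```python
# Sketch: cross-check LLM-generated trivia answers with a second model.
# Assumes the OpenAI Python client is installed and OPENAI_API_KEY is set;
# the question format here is hypothetical, not the repo's real schema.
from openai import OpenAI

client = OpenAI()

questions = [
    {"question": "What is the capital of Australia?", "answer": "Canberra"},
]

def second_opinion(q: dict, model: str = "gpt-4o-mini") -> str:
    """Ask a different model to answer the question independently."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Answer concisely: {q['question']}"}],
    )
    return resp.choices[0].message.content.strip()

for q in questions:
    independent_answer = second_opinion(q)
    # Flag questions where the checker model disagrees, so a human can
    # review them for wrong or ambiguous answers.
    if q["answer"].lower() not in independent_answer.lower():
        print(f"Review needed: {q['question']!r} -> {independent_answer!r}")
```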

Answer by Julius

Another place that's doing something similar is clearerthinking.org

Answer by Julius

I like this idea and have wanted to do something similar, especially something that we could do at a meetup. For what it's worth, I made a calibration trivia site to help with calibration. The San Diego group has played it a couple times during meetups. Feel free to copy anything from it. https://calibrationtrivia.com/
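
If it helps to see what "calibration" means concretely here, below is a small sketch of the kind of scoring involved (an illustration only, not the site's actual code, and the numbers are made up): you group answers by the confidence you reported and compare that confidence with how often you were actually right.

```python
from collections import defaultdict

# Each entry: (stated confidence that your answer was right, whether it was right).
responses = [
    (0.9, True), (0.9, True), (0.9, False),
    (0.7, True), (0.7, False),
    (0.5, False), (0.5, True),
]

# Group by stated confidence and compare with the observed hit rate.
buckets = defaultdict(list)
for confidence, correct in responses:
    buckets[confidence].append(correct)

for confidence in sorted(buckets):
    outcomes = buckets[confidence]
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"Said {confidence:.0%} confident -> right {hit_rate:.0%} of the time")

# Brier score: mean squared gap between confidence and outcome (lower is better).
brier = sum((c - int(r)) ** 2 for c, r in responses) / len(responses)
print(f"Brier score: {brier:.3f}")
```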

Gregory
Thanks! I have seen a similar tool before and enjoyed it quite a bit. I'd love to know where you source the trivia data, especially if it is available for open use. It could also be interesting to add some functionality tailored to meetups.
Julius

Thanks for the explanation and links. That makes sense.

Julius

The most important takeaway from this essay is that the (prominent) counting arguments for “deceptively aligned” or “scheming” AI provide ~0 evidence that pretraining + RLHF will eventually become intrinsically unsafe. That is, that even if we don't train AIs to achieve goals, they will be "deceptively aligned" anyways.


I'm trying to understand what you mean in light of what seems like evidence of deceptive alignment that we've seen from GPT-4. Two examples that come to mind are the instance of GPT-4 using TaskRabbit to get around a CAPTCHA that ARC found a...

[This comment is no longer endorsed by its author]

You misunderstand what "deceptive alignment" refers to. This is a very common misunderstanding: I've seen several other people make the same mistake, and I have also been confused about it in the past. Here are some writings that clarify this:

https://www.lesswrong.com/posts/dEER2W3goTsopt48i/olli-jaerviniemi-s-shortform?commentId=zWyjJ8PhfLmB4ajr5

https://www.lesswrong.com/posts/a392MCzsGXAZP5KaS/deceptive-ai-deceptively-aligned-ai

https://www.lesswrong.com/posts/a392MCzsGXAZP5KaS/deceptive-ai-deceptively-aligned-ai?commentId=ij9wghDCxjXpad8Rf

(The terminolog...

Julius

What's the mechanism for change, then? I assume you would agree that many technological changes, such as the Internet, have required overcoming a lot of status quo bias. If we had leaned more into status quo bias, would these things have come much later? That seems like a significant downside to me.

 

Also, I don't think the status quo is necessarily adapted to us. For example, the status quo is to have checkout aisles filled with candy.  We also have very high rates of obesity. That doesn't seem well-adapted.

1Amadeus Pagel
Status quo bias is a tendency to be skeptical of change, not an outright rejection. I don't see any reason to assume that this tendency is badly calibrated. I don't think the internet had to overcome that much resistance. At least in the US, early legislation like Section 230 was supportive.  There are also technologies where more skepticism would have been appropriate, like leaded gasoline, and arguably even cars.
Julius

Hello everyone,

Unfortunately, I'm not able to host the meetup at the current time. If there's anyone else willing to host, could you let me know? If not, I'll move the meetup to the following month (16 Oct.), when I'll be able to host again. Sorry to have to miss this one - I was really looking forward to meeting everyone.

CitizenTen
Hey! Connor here. I'd be willing to host if you really can't. Just send me all the "host" information I should know about and things such as that. Mainly just private message me and we'll go from there (either through LessWrong or connordpitts@gmail.com). Cheers!