Some troubles with Evals:
Predictive validity: what do current evals tell us about future model performance?
Reference: https://arxiv.org/pdf/2405.03207
As LLMs get better, the intentional stance becomes a two-way street: the user models the system, and the system increasingly models the user.
Highlights from my philosophical chat with Claude 3 Opus
A few notes:
Here are the highlights:
E: If we are in a simulation, what's outside of it?
C: You raise an interesting philosophical question about the nature of reality. The simulation hypothesis proposes that our reality may actually be a computer simulation, similar to a ver...
Course titles are fixed, so I didn't choose that, but because it's a non-intro course, it's up to the instructor to decide the course's focus. And yes, the students had seen the description before selecting it.
Yup, that's what I mean. Specifically, I had Pinker in mind: https://forum.effectivealtruism.org/posts/3nL7Ak43gmCYEFz9P/cognitive-science-and-failed-ai-forecasts
It was Intro to Philosophy 101 at Queens College, CUNY. I was also confused by this.
Thank you, Lawrence!
I agree there probably isn't enough time. In the best-case scenario, there's enough time for weak alignment tools (small apples).
I agree with Lewis. A few clarificatory thoughts:
1. I think the point of calling it a category mistake is exactly about expecting a "nice simple description". It will be something within the network, but there's no reason to believe that this something will be a single neural analog.
2. Even if many properties do have single neural analogs, there's no reason to expect that all the safety-relevant properties will have them.
3. Even if all the safety-relevant properties have them, there's no reason to believe (at least for now) that we have the interp tools to ...