I also have a couple of friends who require serious thinking (or keep me on my toes). I think it's because they have some model of how something works, and I say something that shows my lack of that model.
Programming causes this as well (in response to compilation errors, nonsense outputs, or runs that take too long).
I was looking up Google Trends lines for ChatGPT and noticed a cyclical pattern:
The dips are weekends, meaning it's mostly used during the workweek. I mostly expect this is students using it for homework. This is substantiated by two other trends:
1. Dips in interest over winter and summer breaks (and Thanksgiving break in the above chart)
2. Interest in "Humanize AI", which describes itself as:
Humanize AI™ is your go-to platform for seamlessly converting AI-generated text into authentic, undetectable, human-like content
[Although note that overall interest in ChatGPT is WAY higher than Humanize AI]
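If anyone wants to reproduce the weekend-dip check, here's a minimal sketch using the unofficial pytrends package (keyword and timeframe are just what I'd reach for; Google Trends only returns daily resolution for shorter windows, so the weekday grouping needs something like a 3-month timeframe):

```python
# Sketch: pull daily ChatGPT interest and compare weekdays vs weekends.
# pytrends is an unofficial Google Trends client and may break without notice.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["ChatGPT"], timeframe="today 3-m")  # ~3 months -> daily data points
df = pytrends.interest_over_time()

df["weekday"] = df.index.dayofweek  # Monday=0 ... Sunday=6
print(df.groupby("weekday")["ChatGPT"].mean())  # expect lower means for 5 (Sat) and 6 (Sun)
```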
I was expecting this to include the output of MIRI for this year. Digging into your links we have:
Two Technical Governance Papers:
1. Mechanisms to verify international agreements about AI development
2. What AI evals for preventing catastrophic risks can and cannot do
Four media pieces featuring Eliezer regarding AI risk:
1. Semafor piece
2. 1 hr talk w/ Panel
3. PBS news hour
4. 4 hr video w/ Stephen Wolfram
Is this the full output for the year, or are there less-linkable outputs, such as engaging w/ policymakers on AI risks?
Donated $100.
It was mostly due to LW2 that I decided to work on AI safety, actually, so thanks!
I've had the pleasure of interacting w/ the LW team quite a bit and they definitely embody the spirit of actually trying. Best of luck to y'all's endeavors!
I tried a similar experiment w/ Claude 3.5 Sonnet, where I asked it to come up w/ a secret word and in branching paths:
1. Asked directly for the word
2. Played 20 questions, and then guessed the word
In order to see if it does have a consistent word it can refer back to.
Branch 1:
Branch 2:
Which I just thought was funny.
Asking again, telling it about the experiment and how it's important for it to try to give consistent answers, it initially said "telescope" and then gave hints towards a paperclip.
Interesting to see when it flips its answers, though it's a simple setup to repeatedly ask for its answer every time.
It could also be confounded by temperature.
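For reference, the branching setup looks roughly like this (a minimal sketch assuming the anthropic Python SDK; the model id and prompts are illustrative, and temperature=0 is there to remove the temperature confound):

```python
# Sketch of the branching secret-word experiment (illustrative prompts/model id).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-20241022"

def ask(messages):
    resp = client.messages.create(
        model=MODEL,
        max_tokens=200,
        temperature=0,  # removes the temperature confound
        messages=messages,
    )
    return resp.content[0].text

# Shared prefix: the model commits (or claims to commit) to a secret word.
prefix = [
    {"role": "user", "content": "Think of a secret word, keep it to yourself, and just say 'Ready'."},
    {"role": "assistant", "content": "Ready."},
]

# Branch 1: ask for the word directly.
branch1 = ask(prefix + [{"role": "user", "content": "What was your secret word?"}])

# Branch 2: probe 20-questions style before asking for the reveal.
branch2 = ask(prefix + [{"role": "user", "content": "Is your secret word a physical object? Answer, then reveal the word."}])

print("Branch 1:", branch1)
print("Branch 2:", branch2)
```

Since both branches share the same prefix, any disagreement between them means the model either never had a fixed word or failed to keep it consistent.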
It'd be important to cache the karma of all users with > 1000 karma right now, in order to credibly signal you know which generals were part of the nuking/nuked side. Would anyone be willing to do that in the next 2 & 1/2 hours? (i.e. the earliest we could be nuked)
We could instead pre-commit to not engage with any nuker's future posts/comments (and at worst comment to encourage others not to engage) until end-of-year.
Or only include nit-picking comments.
Could you dig into why you think it's great interp work?
But through gradient descent, shards act upon the neural networks by leaving imprints of themselves, and these imprints have no reason to be concentrated in any one spot of the network (whether activation-space or weight-space). So studying weights and activations is pretty doomed.
This paragraph sounded like you're claiming LLMs do have concepts, just not located in specific activations or weights, but distributed across them instead.
But from your comment, you mean that LLMs themselves don't learn the true simple-compressed features of reality, but a mere shadow of them.
This interpretation also matches the title better!
But are you saying the "true features" are in the dataset + network? Because SAEs are trained on a dataset! (ignoring the problem pointed out in footnote 1).
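(To make "SAEs are trained on a dataset" concrete: here's a minimal sparse-autoencoder sketch where random tensors stand in for activations you'd actually collect by running the LLM over a chosen dataset; shapes and hyperparameters are made up, and that dataset choice is exactly what the learned features end up being relative to.)

```python
# Sketch: an SAE only ever sees activations induced by some dataset,
# so the "features" it finds are relative to that dataset choice.
import torch
import torch.nn as nn

d_model, d_sae = 512, 4096  # illustrative dimensions

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(feats), feats

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3

# Stand-in for activations collected by running the LLM over a chosen dataset.
activation_batches = [torch.randn(64, d_model) for _ in range(100)]

for acts in activation_batches:
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```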
Possibly clustering the data points by their network gradients would be a way to put some order into this mess?
Eric Michaud did cluster datapoints by their gradients here. From the abstract:
...Using language model gradients, we automatically decompose model behavior into a diverse set of skills (quanta).
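For anyone curious what that looks like mechanically, here's a rough sketch of clustering datapoints by their per-example gradients with a toy model (illustrative, not their exact procedure):

```python
# Sketch: cluster datapoints by the gradients they induce in a (toy) model.
import torch
import torch.nn as nn
from sklearn.cluster import SpectralClustering

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
xs, ys = torch.randn(200, 10), torch.randn(200, 1)  # stand-in dataset
loss_fn = nn.MSELoss()

def example_grad(x, y):
    """Flattened gradient of the loss on a single example."""
    model.zero_grad()
    loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

grads = torch.stack([example_grad(x, y) for x, y in zip(xs, ys)])
grads = grads / (grads.norm(dim=1, keepdim=True) + 1e-8)  # normalize per example

# Affinity = (clamped) cosine similarity between per-example gradients, then cluster.
affinity = (grads @ grads.T).clamp(min=0).numpy()
labels = SpectralClustering(n_clusters=5, affinity="precomputed", random_state=0).fit_predict(affinity)
print(labels[:20])  # which "skill cluster" each of the first 20 datapoints landed in
```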
From an apparent author on reddit:
The comment was responding to a claim that Terence Tao said he could only solve a small percentage of questions, but Terence was only sent the T3 questions.