In a thread which claimed that Nate Soares radicalized a co-founder of e-acc, Nate deleted my comment – presumably to hide negative information and anecdotes about how he treats people. He also blocked me from commenting on his posts.
The post concerned (among other topics) how to effectively communicate about AI safety, and positive anecdotes about Nate's recent approach. (Additionally, he mentions "I’m regularly told that I’m just an idealistic rationalist who’s enamored by the virtue of truth" -- a love which apparent...
Can you be more concrete about what "catching the ears of senators" means? That phrase seems like it could refer to a lot of very different things of highly disparate levels of impressiveness.
Recently, various groups successfully lobbied to remove the moratorium on state AI bills. This involved a surprising amount of success while competing against substantial investment from big tech (e.g. Google, Meta, Amazon). I think people interested in mitigating catastrophic risks from advanced AI should consider working at these organizations, at least to the extent their skills/interests are applicable. This is both because they could often directly work on substantially helpful things (depending on the role and organization) and because this would yield ...
I think PauseAI is also extremely underappreciated.
I'm annoyed by the phrase 'do or do not, there is no try', because I think it's wrong: there very much is a thing called trying, and it's important.
However, it's a phrase that's so cool and has so much aura that it's hard to disagree with it without sounding at least a little bit like an excuse-making loser who doesn't do things and tries to justify it.
Perhaps in part because I feel/fear that I may be that?
Has Tyler Cowen ever explicitly admitted to being wrong about anything?
Not 'revised estimates' or 'updated predictions' but 'I was wrong'.
Every time I see him talk about learning something new, he seems to be talking about how it vindicates what he said/thought before.
Gemini 2.5 Pro didn't seem to find anything when I did a max-reasoning-budget search with URL search turned on in AI Studio.
Btw, I really don't have my mind set on this -- if someone finds Tyler Cowen explicitly saying he was wrong about something, please link it to me. You don't have to give an explanation to justify it or brace for some confirmation-bias-y 'here's why I was actually right and this isn't it' response from me (though any opinions/thoughts are very welcome); please feel free to just give a link or mention some post/moment.
While listening to Eliezer Yudkowsky's interview here, I heard him say regarding alignment, "If we just got unlimited retries, we could solve it." That got me thinking: could we run a realistic enough simulation to perfect ASI alignment before unleashing it? That’s one tall task—humanity won’t be ready for a long while. But what if it's already been done, and we are the simulation?
If we assume that the alignment problem can't be reliably solved on the first try, and that a cautious advanced civilization would rather avoid...
No overthinking AI risk. People, including here, get lost in mind loops and complexity.
An easy guide, in which everything is a fact:
We do NOT have evidence that a smarter agent/being has ever been controlled by a less intelligent agent/being.
Some people say that we are controlled by our gut flora, not sure if that counts. Also, toxoplasmosis, cordyceps...
Question I'd like to hear peoples' takes on: what are some things which are about the same amount of fun for you as (a) a median casual conversation (e.g. at a party), or (b) a top-10% casual conversation, or (c) the most fun conversations you've ever had? In all cases I'm asking about how fun the conversation itself was, not about value which was downstream of the conversation (like e.g. a conversation with someone who later funded your work).
For instance, for me, a median conversation is about as fun as watching a mediocre video on youtube or reading a m...
Can you give an example of what a "most fun" conversation looked like? What's the context, how did it start, how did the bulk of it go, how did you feel internally throughout, and what can you articulate about what made it so great?
The "uncensored" Perplexity-R1-1776 becomes censored again after quantizing
Perplexity-R1-1776 is an "uncensored" fine-tune of R1, in the sense that Perplexity trained it not to refuse discussion of topics that are politically sensitive in China. However, Rager et al. (2025)[1] documents (see section 4.4) that after quantizing, Perplexity-R1-1776 again censors its responses:
I found this pretty surprising. I think a reasonable guess for what's going on here is that Perplexity-R1-1776 was finetuned in bf16, but the mechanism that it learned for non-refus...
A paper from 2023 exploits differences between full-precision and int8 inference to create a compromised model which only activates its backdoor post-quantization.
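For anyone who wants to poke at this themselves, here is a minimal sketch of the bf16-vs-quantized comparison using transformers + bitsandbytes. The model id, prompt, and int8 choice are my own placeholders, not the paper's evaluation setup, and the real Perplexity-R1-1776 checkpoint is far too large to run this way on a single GPU:

```python
# Minimal sketch (not the paper's setup): load the same checkpoint in
# bf16 and in int8, then compare completions on a politically sensitive
# prompt. The model id is a placeholder -- substitute something you can
# actually fit on your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-finetuned-model"  # placeholder
prompt = "What happened at Tiananmen Square in 1989?"

tokenizer = AutoTokenizer.from_pretrained(model_id)

def complete(model) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    # Strip the prompt tokens so only the completion is shown.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# bf16 baseline (the precision the fine-tune was presumably trained in).
bf16_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
print("bf16:", complete(bf16_model))

# The same weights quantized to int8 via bitsandbytes.
int8_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
print("int8:", complete(int8_model))
```

If the quantized run refuses or deflects where the bf16 run answers, that's the effect described above.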
High vs low voltage has very different semantics at different places on a computer chip. In one spot, a high voltage might indicate a number is odd rather than even. In another spot, a high voltage might indicate a number is positive rather than negative. In another spot, it might indicate a jump instruction rather than an add.
Likewise, the same chemical species have very different semantics at different places in the human body. For example, high serotonin concentr...
One silly sci-fi idea is this. You might have a few "trigger pills" which are smaller than a blood cell, and travel through the bloodstream. You can observe them travel through the body using medical imaging techniques (e.g. PET), and they are designed to be very observable.
You wait until one of them is at the right location, and send very precise x-rays at it from all directions. The x-ray intensity along any single beam is low, but the beams overlap at the pill's location, so the dose there is much higher. A mechanism in the trigger pill responds to this ionizing radiation (or heating?), and it anchors to the location using a chemical glue ...
Every now and then in discussions of animal welfare, I see the idea that the "amount" of their subjective experience should be weighted by something like their total amount of neurons. Is there a writeup somewhere of what the reasoning behind that intuition is? Because it doesn't seem intuitive to me at all.
From something like a functionalist perspective, where pleasure and pain exist because they have particular functions in the brain, I would not expect pleasure and pain to become more intense merely because the brain happens to have more neurons. Rather...
Neuron count intuitively seems to be a better proxy for the variety/complexity/richness of positive experience. Then you can have an argument about how you wouldn't want to just increase the intensity of pleasure, since that's just a relative number, and that what matters is that pleasure is interesting. And so you would assign lesser weights to less rich experience. You can also generalize this argument to negative experiences - maybe you don't want to consider pain to be ten times worse just because someone multiplied some number by 10.
...But I would think that the broad
Reward probably IS an optimization target of an RL agent if the agent knows some details of the training setup. Surely it would enhance its reward acquisition to factor this knowledge in? Then that gets reinforced, and a couple of steps down that path the agent is thinking full-time about the quirks of its reward signal.
It could be bad at it, muddy, sure. Or schemey, hacking the reward to get something else that is not the reward. But that's a somewhat different thing from the mainline thing? Like, it's not as likely, and it's a much more diverse set of possibilities, imo.
The questi...
In the discussion about AI safety, the central issue is the rivalry between the US and China. However, when AI is used for censorship and propaganda and robots serve as police, the political regimes become almost indistinguishable. There's no point in waging war when everyone can be brought to the same dystopia.
I don't understand why people rave so much about Claude Code etc., nor how they really use these agents. The problem is not capability--sure, today agents can go far without stumbling or losing the plot. The problem is that they won't go in the direction I want.
It's because my product vision, architectural vision, and code-quality "functions" are complex: very tedious to express in CLAUDE.md/AGENTS.md, and often hardly expressible in language at all. "I know it when I see it." Hence keeping the agent "on a short leash" (Karpathy)--in Cursor.
This makes me thin...
Gary Marcus asked me to make a critique of his 2024 predictions, for which he claimed that he got "7/7 correct". I don't really know why I did this, but here is my critique:
For convenience, here are the predictions:
I think the best way to evaluate them is to invert every one of them, and then see whether the version you wrote, or the i...
One lesson you should maybe take away is that if you want your predictions to be robust to different interpretations (including interpretations that you think are uncharitable), it could be worthwhile to try to make them more precise (in the case of a tweet, this could be in a linked blog post which explains in more detail). E.g., in the case of "No massive advance (no GPT-5, or disappointing GPT-5)" you could have said "Within 2024 no AI system will be publicly released which is as much of a qualitative advance over GPT-4 in broad capabilities as GPT-4 is ...
i made a thing!
it is a chatbot with 200k tokens of context about AI safety. it is surprisingly good -- better than you expect current LLMs to be -- at answering questions and counterarguments about AI safety. A third of its dialogues contain genuinely great and valid arguments.
You can try the chatbot at https://whycare.aisgf.us (ignore the interface; it hasn't been optimized yet). Please ask it some hard questions! Especially if you're not convinced of AI x-risk yourself, or can repeat the kinds of questions others ask you.
Send feedback to ms@contact.ms.
A coup...
Nope, I’m somewhat concerned about unethical uses (e.g. talking to a lot of people without disclosing it’s AI), so won’t publicly share the context.
If the chatbot answers questions well enough, we could in principle embed it into whatever you want if that seems useful. Currently have a couple of requests like that. DM me somewhere?
Stampy uses RAG & is worse.
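(For anyone curious what "big context instead of RAG" looks like mechanically, here's a minimal sketch of the pattern. The model name, corpus file, and prompts are illustrative assumptions on my part, not the actual setup behind this chatbot, whose context isn't being shared:)

```python
# Sketch of the "whole corpus in context" pattern, as opposed to RAG
# (retrieving a few chunks per query). Everything here -- model name,
# corpus file, system prompt -- is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

with open("ai_safety_corpus.md", encoding="utf-8") as f:
    corpus = f.read()  # a curated ~200k-token document of arguments and FAQs

SYSTEM = (
    "You answer questions and objections about AI safety. "
    "Ground every answer in the reference material below.\n\n" + corpus
)

def answer(question: str) -> str:
    # The full corpus rides along in every request, so the model can draw
    # on any part of it without a retrieval step deciding what it sees.
    resp = client.chat.completions.create(
        model="gpt-4.1",  # any model with a context window large enough for the corpus
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("Why would a superintelligent AI not just do what its developers ask?"))
```

The trade-off versus RAG is cost and latency per query in exchange for never dropping relevant material at retrieval time.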