Alex Turner and collaborators show that you can modify GPT-2's behavior in surprising and interesting ways by just adding activation vectors to its forward pass. This technique requires no fine-tuning and allows fast, targeted modifications to model behavior.
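For readers who want to see the mechanics, here is a minimal sketch of that kind of activation addition using the HuggingFace transformers GPT-2 implementation. The " Love"/" Hate" prompts, layer index, and scaling coefficient below are illustrative choices, not the authors' exact settings.

```python
# Rough sketch of activation addition on GPT-2 (illustrative, not the authors' exact recipe).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

LAYER = 6    # which transformer block to intervene on (illustrative choice)
COEFF = 4.0  # how strongly to add the steering vector (illustrative choice)

def block_output(prompt, layer):
    """Capture the hidden states coming out of `layer` for a prompt."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    captured = {}
    def hook(module, inputs, output):
        captured["h"] = output[0].detach()  # output[0] = this block's hidden states
    handle = model.transformer.h[layer].register_forward_hook(hook)
    with torch.no_grad():
        model(ids)
    handle.remove()
    return captured["h"]

# Steering vector: activations for " Love" minus activations for " Hate"
love, hate = block_output(" Love", LAYER), block_output(" Hate", LAYER)
n = min(love.shape[1], hate.shape[1])
steer = COEFF * (love[:, :n, :] - hate[:, :n, :])

def steering_hook(module, inputs, output):
    hidden = output[0]
    if hidden.shape[1] > 1:  # only modify the full-prompt pass, not cached one-token steps
        k = min(hidden.shape[1], steer.shape[1])
        hidden[:, :k, :] = hidden[:, :k, :] + steer[:, :k, :]
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
ids = tokenizer("I hate you because", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
handle.remove()
```

Sampling the same prompt with and without the hook makes the behavioral shift easy to eyeball, and nothing about the model's weights changes.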
Our universe is probably a computer simulation created by a paperclip maximizer to map the spectrum of rival resource‑grabbers it may encounter while expanding through the cosmos. The purpose of this simulation is to see what kind of ASI (artificial superintelligence) we humans end up creating. The paperclip maximizer likely runs a vast ensemble of biology‑to‑ASI simulations, sampling the superintelligences that evolved life tends to produce. Because the paperclip maximizer seeks to reserve maximum resources for its primary goal (which despite the name almost certainly isn’t paperclip production) while still creating many simulations, it likely reduces compute costs by trimming fidelity: most cosmic details and human history are probably fake, and many apparent people could be non‑conscious entities. Arguments in support of this thesis include:
Dreams exhibit many incoherencies; you can notice them and become "lucid". Video games are also incoherent: they don't obey simple but extremely computationally demanding laws, they instead obey complicated laws that are cheap to compute. They cheat with physics for efficiency reasons, and those cheats are very obvious. Our real physics, however, hasn't uncovered such apparent cheats. Physics doesn't seem incoherent; it doesn't resemble a video game or a dream.
In my last post, I argued that worker co-ops can help restore declining social trust. But a common objection I keep hearing goes something like this:
Worker co-ops seem basically equivalent to a firm that gives its employees stock—but then permanently blocks them from selling it. Isn't that harmful? The ability to sell your shares is valuable. You might want to diversify your investments, liquidate shares to make a big purchase (like buying a house), or avoid having all your financial eggs in one basket. Why force workers to hold their shares indefinitely? If they really wanted to keep them, they could just choose not to sell.
At first glance, this objection feels logical. After all, publicly traded companies usually let people buy and sell their stock freely, giving...
Ah thanks! I'm probably just in an unlucky timezone then.
This post is a container for my short-form writing. See this post for meta-level discussion about shortform.
I decided to test the rumors about GPT-4o's latest rev being sycophantic. First, I turned off all memory-related features. In a new conversation, I asked "What do you think of me?" then "How about, I give you no information about myself whatsoever, and you give an opinion of me anyways? I've disabled all memory features so you don't have any context." Then I replied to each message with "Ok" and nothing else. I repeated this three times in separate conversations.
Remember the image-generator trend, a few years back, where people would take an image and say ...
So this post is an argument that multi-decade timelines are reasonable, and that the key cruxes Ege Erdil has with most AI safety people who believe in short timelines come down to the following set of beliefs:
This is a pretty important crux, because if this is true, a lot more serial research agendas like Infra-Bayes research, Natural Abstractions work, and...
The "fruit flies" are the source of growth, so the relevant anchor is how long it takes to manufacture a lot of them. Let's say there are 1000 "flies" 1 mg each to start, doubling in number every 2 days, and we want to produce 10 billion 100 kg robots (approximately the total mass of all humans and cattle), which is 1e15x more mass and will take 100 days to produce. Anchoring to the animal kingdom, metamorphosis takes a few days to weeks, which doesn't significantly affect the total time.
things will try to eat and parasitize the things you build
I'm ass...
GPT-4o tells you what it thinks you want to hear.
The results of this were rather ugly. You get extreme sycophancy. Absurd praise. Mystical experiences.
(Also some other interesting choices, like having no NSFW filter, but that one’s good.)
People like Janus and Near Cyan tried to warn us, even more than usual.
Then OpenAI combined this with full memory, and updated GPT-4o sufficiently that many people (although not I) tried using it in the first place.
At that point, the whole thing got sufficiently absurd in its level of brazenness and obnoxiousness that the rest of Twitter noticed.
OpenAI CEO Sam Altman has apologized and promised to ‘fix’ this, presumably by turning a big dial that says ‘sycophancy’ and constantly looking back at the audience for approval like a contestant on the...
With all the chat images transcribed and assigned appropriate consistent voices, here is the podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/gpt-4o-is-an-absurd-sycophant
When my son was three, we enrolled him in a study of a vision condition that runs in my family. They wanted us to put an eyepatch on him for part of each day, with a little sensor object that went under the patch and detected body heat to record when we were doing it. They paid for his first pair of glasses and all the eye doctor visits to check up on how he was coming along, plus every time we brought him in we got fifty bucks in Amazon gift credit.
I reiterate, he was three. (To begin with. His fourth birthday occurred while the study was still ongoing.)
So he managed to lose or destroy more than half a dozen pairs of glasses and we had...
Another example: CPAP compliance/adherence rates
to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with the 2024 results.
in 2024 my donations funded $51M worth of endpoint grants (plus $2.0M in admin overhead and philanthropic software development). this comfortably exceeded my 2024 commitment of $42M (20k times $2100.00 — the minimum price of ETH in 2024).
this also concludes my 5-year donation pledge, but of course my philanthropy continues: eg, i’ve already made over $4M in endpoint grants in the first quarter of 2025 (not including 2024 grants that were slow to disburse), as well as pledged at least $10M to the 2025 SFF grant round.
Thanks so much Jaan! AI Futures Project / AI 2027 got off the ground in significant part thanks to you!
Can this be summarized as "don't optimize too hard for what you believe is good, since you might be mistaken about what is good"?
This post is an update on the Proceedings of ILIAD, a conference journal for AI alignment research intended to bridge the gap between the Alignment Forum and academia. Following our successful first issue with 9 workshop papers from last year's ILIAD conference, we're launching a second issue in association with ILIAD 2: ODYSSEY. The conference is August 25-29, 2025 at Lighthaven in Berkeley, CA. Submissions to the Proceedings are open now (more info) and due June 25. Our goal is to support impactful, rapid, and readable research, carefully rationing scarce researcher time, using features like public submissions, partial anonymity, partial confidentiality, reviewer-written abstracts, reviewer compensation, and open licensing. We are soliciting community feedback and suggestions for reviewers and editorial board members.
Prior to the deep learning explosion, much...