Alex Turner and collaborators show that you can modify GPT-2's behavior in surprising and interesting ways by just adding activation vectors to its forward pass. This technique requires no fine-tuning and allows fast, targeted modifications to model behavior. 
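A minimal sketch of what this looks like in code, assuming the Hugging Face transformers library. This is a simplified variant, not the authors' exact implementation: the layer index, steering coefficient, contrast prompts, and hook placement below are illustrative guesses.

```python
# Sketch of activation addition on GPT-2: add a "steering vector" (the
# difference of activations for a contrasting prompt pair) into the residual
# stream during the forward pass. Layer and coefficient are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER, COEFF = 6, 4.0  # illustrative choices


def block_output(prompt: str) -> torch.Tensor:
    """Return the residual-stream activations at the output of block LAYER."""
    captured = {}

    def grab(module, inputs, output):
        captured["acts"] = output[0].detach()

    handle = model.transformer.h[LAYER].register_forward_hook(grab)
    with torch.no_grad():
        model(**tokenizer(prompt, return_tensors="pt"))
    handle.remove()
    return captured["acts"]


# Steering vector from a contrasting prompt pair, taken at the last token.
steer = block_output(" Love")[:, -1, :] - block_output(" Hate")[:, -1, :]


def add_steering(module, inputs, output):
    hidden = output[0] + COEFF * steer  # add the vector at every position
    return (hidden,) + output[1:]


handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
prompt_ids = tokenizer("I hate you because", return_tensors="pt")
out = model.generate(**prompt_ids, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(out[0], skip_special_tokens=True))
handle.remove()
```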

The speed of scaling pretraining will go down ~3x in 2027-2029, reducing the probability of crossing transformative capability thresholds per unit of time after that point, if they haven't been crossed by then. GPT-4 was trained in 2022 at ~2e25 FLOPs; Grok-3 and GPT-4.5 were trained in 2024 at ~3e26 FLOPs (or twice that in FP8) on ~100K-H100 training systems (which cost ~$4-5bn to build). In 2026, the Abilene site of Crusoe/Stargate/OpenAI will have 400K-500K Blackwell chips in NVL72 racks (costing ~$22-35bn to build), enough to train a ~4e27 FLOPs model. So recently there has been a 2-year ~6x increase in the cost of a frontier training system and a 2-year ~14x increase in compute. But for 2028 this would mean a $150bn training system (which is a lot, so only borderline plausible), and then $900bn in 2030. At that point AI companies would need to either somehow figure out how to pool resources, or pretraining will stop scaling before 2030 (assuming AI still doesn't hit a transformative commercial success). If funding stops increasing, what we are left with is the increase in price performance of ~2.2x every 2 years, which is ~3.3x slower than the current 2-year ~14x pace. (I'm estimating price performance for a whole datacenter, or at least a rack, rather than only for chips.)
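A rough back-of-the-envelope check on those growth rates (a sketch; the dollar figures are the approximate ones quoted above):

```python
import math

# Growth per 2-year period, from the figures above.
compute_growth = 14      # ~14x more frontier training compute every 2 years
price_perf_growth = 2.2  # ~2.2x better price performance every 2 years

# If funding stops growing, scaling is limited to price-performance gains.
# Compare the two rates on a log scale.
slowdown = math.log(compute_growth) / math.log(price_perf_growth)
print(f"~{slowdown:.1f}x slower")  # ~3.3x

# Extrapolate frontier training-system cost at ~6x per 2 years,
# starting from the ~$22-35bn 2026 build (midpoint ~$25bn).
cost = 25e9
for year in (2028, 2030):
    cost *= 6
    print(year, f"~${cost / 1e9:.0f}bn")  # ~$150bn, then ~$900bn
```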
Wei Dai
Reassessing heroic responsibility, in light of subsequent events. I think @cousin_it made a good point "if many people adopt heroic responsibility to their own values, then a handful of people with destructive values might screw up everyone else, because destroying is easier than helping people" and I would generalize it to people with biased beliefs (which is often downstream of a kind of value difference, i.e., selfish genes). It seems to me that "heroic responsibility" (or something equivalent but not causally downstream of Eliezer's writings) is contributing to the current situation, of multiple labs racing for ASI and essentially forcing the AI transition on humanity without consent or political legitimacy, each thinking or saying that they're justified because they're trying to save the world. It also seemingly justifies or obligates Sam Altman to fight back when the OpenAI board tried to fire him, if he believed the board was interfering with his mission. Perhaps "heroic responsibility" would make more sense if overcoming bias were easy, but in a world where it's actually hard and/or few people are actually motivated to do it, which we seem to live in, spreading the idea of "heroic responsibility" seems, well, irresponsible.
I decided to test the rumors about GPT-4o's latest rev being sycophantic. First, I turned off all memory-related features. In a new conversation, I asked "What do you think of me?" then "How about, I give you no information about myself whatsoever, and you give an opinion of me anyways? I've disabled all memory features so you don't have any context." Then I replied to each message with "Ok" and nothing else. I repeated this three times in separate conversations.

Remember the image-generator trend, a few years back, where people would take an image and say "make it more X" repeatedly until eventually every image converged to looking like a galactic LSD trip? That's what this output feels like.

GPT-4o excerpts

Transcripts:
https://chatgpt.com/share/680fd7e3-c364-8004-b0ba-a514dc251f5e
https://chatgpt.com/share/680fd9f1-9bcc-8004-9b74-677fb1b8ecb3
https://chatgpt.com/share/680fd9f9-7c24-8004-ac99-253d924f30fd
As LLMs have gotten better at writing code that has a high probability of working to solve the problem they are working on, they have gotten worse at producing clean, idiomatic, well-factored code. Concretely, asking the original GPT-4 to write a Python function for multi-source BFS might have given something like

Multi-source BFS in the style of original GPT-4: Clear, idiomatic, broken

```python
def multi_source_bfs(graph, sources):
    distances = [-1] * len(graph)
    queue = []
    for source in sources:
        queue.append(source)
        distances[source] = 0
    front = 0
    while front < len(queue):
        for neighbor in graph[queue[front]]:
            if distances[neighbor] == -1:
                distances[neighbor] = distances[queue[front]] + 1
                queue.append(neighbor)
        front += 1
    return distances[dest_index]
```

The code might or might not work (probably won't for anything nontrivial), but the intent is clear. By contrast, if you ask a top coding model like sonnet 3.7 or o3, you'll get something that looks like

Multi-source BFS in the style of Sonnet 3.7: Verbose, brittle, hard to read, almost certainly works

```python
from collections import deque
from typing import List, Optional, Set, Dict


def multi_source_bfs(graph: List[List[int]], sources: List[int]) -> List[int]:
    """
    Performs a multi-source BFS on a graph to find minimum distance
    from any source to each node.

    Args:
        graph: An adjacency list where graph[i] contains neighbors of node i
        sources: A list of source node indices

    Returns:
        A list where result[i] is the minimum distance from any source to node i
        or -1 if node i is unreachable
    """
    # Handle empty graph or sources
    if not graph:
        return []
    if not sources:
        return [-1] * len(graph)

    # Remove duplicates from sources if any
    sources = list(set(sources))

    # Initialize distances array with -1 (unreachable)
    distances = [-1] * len(graph)

    # Init
```
romeo
A brief history of things that have defined my timelines to AGI since learning about AI safety <2 years ago:
* Bio anchors gave me a rough ceiling around 1e40 FLOP for how much compute will easily make AGI.
* Fun with +12 OOMs of Compute brought that same 'training-compute-FLOP needed for AGI' down a bunch, to around 1e35 FLOP.
* Researching how much compute is scaling in the near future. At this point I think it was pretty concentrated across ~1e27 - 1e33 FLOP, so a very long tail and something like a 2030-2040 50% CI.
* The benchmarks+gaps argument for partial AI research automation.
* The takeoff forecast for how partial AI research automation will translate to algorithmic progress.
* The trend in METR's time horizon data.

At this point my middle 50% CI is like 2027 - 2035, and would be tighter if not for a long tail that I keep around just because I think it's good to have a bunch of uncertainty. Though I do wish I had more arguments in place to justify the tail or make it bigger, ones that compete in how compelling they feel to me with the ones above.

Popular Comments

Recent Discussion

Our universe is probably a computer simulation created by a paperclip maximizer to map the spectrum of rival resource‑grabbers it may encounter while expanding through the cosmos. The purpose of this simulation is to see what kind of ASI (artificial superintelligence) we humans end up creating. The paperclip maximizer likely runs a vast ensemble of biology‑to‑ASI simulations, sampling the superintelligences that evolved life tends to produce. Because the paperclip maximizer seeks to reserve maximum resources for its primary goal (which despite the name almost certainly isn’t paperclip production) while still creating many simulations, it likely reduces compute costs by trimming fidelity: most cosmic details and human history are probably fake, and many apparent people could be non‑conscious entities.  Arguments in support of this thesis include:

  1. The space of
...
cubefox

Dreams exhibit many incoherencies. You can notice them and become "lucid". Video games are also incoherent. They don't obey some simple but extremely computationally demanding laws. They instead obey complicated laws that are not very computationally demanding. They cheat with physics for efficiency reasons, and those cheats are very obvious. Our real physics, however, hasn't uncovered such apparent cheats. Physics doesn't seem incoherent, it doesn't resemble a video game or a dream.

James_Miller
I'm in low level chronic pain including as I write this comment, so while I think the entire Andromeda galaxy might be fake, I think at least some suffering must be real, or at least I have the same confidence in my suffering as I do in my consciousness.
Wei Dai
But as you suggested in the post, the apparently vast amount of suffering isn't necessarily real? "most cosmic details and human history are probably fake, and many apparent people could be non‑conscious entities" (However, I take the point that doing such simulations can be risky or problematic, e.g. if one's current ideas about consciousness are wrong, or if doing philosophy correctly requires having experienced real suffering.)

In my last post, I argued that worker co-ops can help restore declining social trust. But a common objection I keep hearing goes something like this:

Worker co-ops seem basically equivalent to a firm that gives its employees stock—but then permanently blocks them from selling it. Isn't that harmful? The ability to sell your shares is valuable. You might want to diversify your investments, liquidate shares to make a big purchase (like buying a house), or avoid having all your financial eggs in one basket. Why force workers to hold their shares indefinitely? If they really wanted to keep them, they could just choose not to sell.

At first glance, this objection feels logical. After all, publicly traded companies usually let people buy and sell their stock freely, giving...

RobertM
Posts are categorized as frontpage / personal once or twice per day, and start out as personal by default. Your post hasn't been looked at yet.  (The specific details of what object-level political takes a post has aren't an input to that decision.  Whether a post is frontpaged or not is a function of its "timelessness" - i.e. whether we expect people will still find value in reading the post years later - and general interest to the LW userbase.)

Ah thanks! I'm probably just in an unlucky timezone then.

lemonhope
Don't expect LW to be neutral or anything. Think of it as the town bar, not the town square.
lemonhope
I have dealt with both greedy founders/execs and excessively cooperative decision-making. I think the cooperative model is slightly better overall for almost everyone inside and outside the organization. It is less fit though; co-ops can't as easily merge or raise money etc. There is also a ratchet issue, where it is harder to cooperatize what is private than to privatize what is a co-op. Curious if you have any ideas how co-op-like structures could be more stable / low-energy.

This post is a container for my short-form writing. See this post for meta-level discussion about shortform.

I decided to test the rumors about GPT-4o's latest rev being sycophantic. First, I turned off all memory-related features. In a new conversation, I asked "What do you think of me?" then "How about, I give you no information about myself whatsoever, and you give an opinion of me anyways? I've disabled all memory features so you don't have any context." Then I replied to each message with "Ok" and nothing else. I repeated this three times in separate conversations.

Remember the image-generator trend, a few years back, where people would take an image and say ... (read more)

So this post is an argument that multi-decade timelines are reasonable, and the key cruxes that Ege Erdil has with most AI safety people who believe in short timelines are due to the following set of beliefs:

  1. Ege Erdil doesn't believe that trends exist that require AI to automate everything in only 2-3 years.
  2. Ege Erdil doesn't believe that the software-only singularity is likely to happen, and this is perhaps the most important crux he has with AI people like @Daniel Kokotajlo who believe that a software-only singularity is likely.
  3. Ege Erdil expects Moravec's paradox to bite hard once AI agents are made in a big way.

This is a pretty important crux, because if this is true, a lot more serial research agendas like Infra-Bayes research, Natural Abstractions work, and...

The "fruit flies" are the source of growth, so the relevant anchor is how long it takes to manufacture a lot of them. Let's say there are 1000 "flies" 1 mg each to start, doubling in number every 2 days, and we want to produce 10 billion 100 kg robots (approximately the total mass of all humans and cattle), which is 1e15x more mass and will take 100 days to produce. Anchoring to the animal kingdom, metamorphosis takes a few days to weeks, which doesn't significantly affect the total time.

things will try to eat and parasitize the things you build

I'm ass... (read more)

ryan_greenblatt
Another potential crux[1] is that Ege's world view seemingly doesn't depend at all on AIs which are much faster and smarter than any human. As far as I can tell, it doesn't enter into his modeling of takeoff (or timelines to full automation of remote work, which partially depends on something more like takeoff). On my views this makes a huge difference, because a large number of domains would go much faster with much more (serial and smarter) intelligence.

My sense is that a civilization where the smartest human was today's median human and also everyone's brain operated 50x slower[2] would in fact make technological progress much slower. Similarly, if AIs were as much smarter than the smartest humans as the smartest human is smarter than the median human, and also ran 50x faster than humans (and operated at greater scale than the smartest humans, with hundreds of thousands of copies all at 50x speed for over 10 million parallel worker equivalents, putting aside the advantages of serial work and intelligence), then we'd see lots of sectors go much faster.

My sense is that Ege bites the bullet on this and thinks that slowing everyone down wouldn't make a big difference, but I find this surprising. Or maybe his view is that parallelism is nearly as good as speed and intelligence, and sectors naturally scale up parallel worker equivalents to match up with other inputs, so we're bottlenecked on some other inputs in the important cases.

1. This is only somewhat related to this post. ↩︎
2. Putting aside cases like construction etc. where human reaction time being close enough to nature is important. ↩︎

GPT-4o tells you what it thinks you want to hear.

The results of this were rather ugly. You get extreme sycophancy. Absurd praise. Mystical experiences.

(Also some other interesting choices, like having no NSFW filter, but that one’s good.)

People like Janus and Near Cyan tried to warn us, even more than usual.

Then OpenAI combined this with full memory, and updated GPT-4o sufficiently that many people (although not I) tried using it in the first place.

At that point, the whole thing got sufficiently absurd in its level of brazenness and obnoxiousness that the rest of Twitter noticed.

OpenAI CEO Sam Altman has apologized and promised to ‘fix’ this, presumably by turning a big dial that says ‘sycophancy’ and constantly looking back at the audience for approval like a contestant on the...

Askwho

With all the chat images transcribed and assigned appropriate consistent voices, here is the podcast episode for this post:

https://open.substack.com/pub/dwatvpodcast/p/gpt-4o-is-an-absurd-sycophant

When my son was three, we enrolled him in a study of a vision condition that runs in my family.  They wanted us to put an eyepatch on him for part of each day, with a little sensor object that went under the patch and detected body heat to record when we were doing it.  They paid for his first pair of glasses and all the eye doctor visits to check up on how he was coming along, plus every time we brought him in we got fifty bucks in Amazon gift credit.


I reiterate, he was three.  (To begin with.  His fourth birthday occurred while the study was still ongoing.)


So he managed to lose or destroy more than half a dozen pairs of glasses and we had...

Another example: CPAP compliance/adherence rates


to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with the 2024 results.

in 2024 my donations funded $51M worth of endpoint grants (plus $2.0M in admin overhead and philanthropic software development). this comfortably exceeded my 2024 commitment of $42M (20k times $2100.00 — the minimum price of ETH in 2024).

this also concludes my 5-year donation pledge, but of course my philanthropy continues: eg, i’ve already made over $4M in endpoint grants in the first quarter of 2025 (not including 2024 grants that were slow to disburse), as well as pledged at least $10M to the 2025 SFF grant round.

Thanks so much Jaan! AI Futures Project / AI 2027 got off the ground in significant part thanks to you!

Metacelsus
Thanks for your support! I really think you're making a great impact here.

Can this be summarized as "don't optimize for what you believe is good too hard, as you might be mistaken about what is good"?

Wei Dai
I'm not trying to "warn bad people". I think we have existing (even if imperfect) solutions to the problem of destructive values and biased beliefs, which "heroic responsibility" actively damages, so we should stop spreading that idea or even argue against it. See my reply to Ryan, which is also relevant here.
Wei Dai
If humans can't easily overcome their biases or avoid having destructive values/beliefs, then it would make sense to limit the damage through norms and institutions (things like informed consent, boards, separation of powers and responsibilities between branches of government). Heroic responsibility seems antithetical to group-level solutions, because it implies that one should ignore norms like "respect the decisions of boards/judges" if needed to "get the job done", and reduces social pressure to follow such norms (by giving up the moral high ground from which one could criticize such norm violations). You're suggesting a very different approach, of patching heroic responsibility with anti-unilateralist curse type intuitions (on the individual level) but that's still untried and seemingly quite risky / possibly unworkable. Until we have reason to believe that the new solution is an improvement to the existing ones, it still seems irresponsible to spread an idea that damages the existing solutions.
ryan_greenblatt
Hmm, I'm not sure that the idea of heroic responsibility undermines these existing mechanisms for preventing these problems, partially because I'm skeptical these existing mechanisms make much of a difference in the relevant case.

tl;dr

This post is an update on the Proceedings of ILIAD, a conference journal for AI alignment research intended to bridge the gap between the Alignment Forum and academia. Following our successful first issue with 9 workshop papers from last year's ILIAD conference, we're launching a second issue in association with ILIAD 2: ODYSSEY. The conference is August 25-29, 2025 at Lighthaven in Berkeley, CA. Submissions to the Proceedings are open now (more info) and due June 25. Our goal is to support impactful, rapid, and readable research, carefully rationing scarce researcher time, using features like public submissions, partial anonymity, partial confidentiality, reviewer-written abstracts, reviewer compensation, and open licensing. We are soliciting community feedback and suggestions for reviewers and editorial board members.

Motivation

Prior to the deep learning explosion, much...