In the poetry case study, we had set out to show that the model didn't plan ahead, and found instead that it did.
I found it shocking that they didn't think the model plans ahead. The poetry ability of LLMs since at least GPT-2 is well beyond what feels possible without anticipating a rhyme by planning at least a handful of tokens in advance.
It's also worth trying a different model. I was going back and forth with an OpenAI model (I don't remember which one) and couldn't get it to do what I needed at all, even with multiple fresh threads. Then I tried Claude and it just worked.
Yep. Meme NFTs are an existence proof of such people.
https://en.wikipedia.org/wiki/List_of_most_expensive_non-fungible_tokens
Strongly subsidizing the costs of raising children (and not just in financial terms) would likely produce more pro-social results than a large one-time lump-sum payment. However, that won't do much for folks skipping out on children because they think humanity is doomed soon anyway.
I suspect LLMs could write blogs on par with most humans if we trained and scaffolded them appropriately, but is that really what we want from LLMs?
Claude 3.7 might not write outstanding blogs but he can help explain why not:
...The fundamental mismatch between LLMs and blogging isn't primarily about capabilities, but about design and motivation:
Current LLMs are RLHF-tuned to be balanced, helpful assistants - essentially the opposite of good bloggers. Assistants hedge, acknowledge all perspectives, and avoid strong stances. Good bloggers take intel
FYI, there has been even further progress with Leela odds nets. Here are some recent quotes from GM Larry Kaufman (a.k.a. Hissha) found on the Leela Chess Zero Discord:
...(2025-03-04) I completed an analysis of how the Leela odds nets have performed on LiChess since the search-contempt upgrade on Feb. 27. [...] I believe these are reasonable estimates of the LiChess Blitz rating needed to break even with the bots at 5'3" in serious play. Queen and move odds (means Leela plays Black) 2400, Queen odds (Leela White) 2550, [...] Rook and move odds (Leela Black);
I have so many mixed feelings about schooling that I'm glad I don't have my own children to worry about. There is enormous potential for improving things, yet so little of that potential gets realized.
The thing about school choice is that funding is largely zero sum. Those with the means to choose better options than public schools take advantage of those means and leave underfunded public schools to serve the least privileged remainder. My public school teacher friends end up with disproportionately large fractions of children with special needs who need ...
I don't think it's accurate to model breakdowns as a linear function of journeys or train-miles unless irregular effects like extreme weather are a negligible fraction of breakdowns.
How does the falling price factor into an investor's decision to enter the market? Should they wait for batteries to get even cheaper, or should they invest immediately and hope the arbitrage rates hold up long enough to provide a good return on investment? The longer the payback period, the more these dynamics matter.
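Here's a toy version of that trade-off (all numbers below are hypothetical and chosen by me just to make the dynamic concrete, not taken from any market data):

def npv(cashflows, rate=0.08):
    # Present value of a list of yearly cashflows, year 0 first.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

capex_now = 300.0         # assumed $/kWh installed today
capex_decline = 0.12      # assumed annual decline in installed cost
revenue0 = 45.0           # assumed $/kWh-year arbitrage revenue today
revenue_decline = 0.07    # assumed annual erosion as more storage competes
life = 12                 # assumed years of service

def project_npv(wait_years):
    # Wait, then pay the (cheaper) capex, then collect the (eroding) revenues.
    flows = [0.0] * wait_years
    flows.append(-capex_now * (1 - capex_decline) ** wait_years)
    flows += [revenue0 * (1 - revenue_decline) ** (wait_years + t) for t in range(1, life + 1)]
    return npv(flows)

for wait in range(4):
    print(f"wait {wait} yr: NPV ~ {project_npv(wait):.0f} $/kWh")

Whether waiting wins depends on how quickly capex falls relative to how quickly the arbitrage opportunity erodes, and on the discount rate; the longer the payback period, the more sensitive the answer is to those assumptions.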
"10x engineers" are a thing, and if we assume they're high-agency people always looking to streamline and improve their workflows, we should expect them to be precisely the people who get a further 10x boost from LLMs.
I highly doubt this. A 10x engineer is likely already bottlenecked by non-coding work that AI can't help with, so even if they 10x their coding, they may not increase overall productivity much.
I’d rather see the prison system less barbaric than try to find ways of intentionally inflicting that level of barbarism in a compressed form.
Regardless, I think you still need confinement of some sort for people who are dangerous but not deserving of the death penalty.
Yeah, my general assumption in these situations is that the article is likely overstating things for a headline and reality is not so clear cut. Skepticism is definitely warranted.
As far as I understand from the article, the LLM generated five hypotheses that make sense. One of them is the one the team had already verified but hadn’t yet published anywhere, and another is one the team hadn’t even thought of but considers worth investigating.
Assuming the five are a representative sample rather than a small human-curated set of many more hypotheses, I think that’s pretty impressive.
if the LLM generates enough hypotheses, and you already know the answer, one of them is likely to sound like the answer.
I don’t think this is true in general. Take any problem that is difficult to solve but easy to verify and you aren’t likely to have an LLM guess the answer.
I am skeptical of the claim that the research is unique and hasn't been published anywhere, and I'd also really like to know the details regarding what they prompted the model with.
The whole co-scientist thing looks really weird. Look at the graph there. Am I misreading it, or did people rate it just barely better than raw o1 outputs? How is that consistent with it apparently pulling all of these amazing discoveries out of thin air?
Edit: Found (well, Grok 3 found) an article with some more details regarding Penadés' work. Apparently they did publish a related ...
I was literally just reading this before seeing your post:
https://www.techspot.com/news/106874-ai-accelerates-superbug-solution-completing-two-days-what.html
Arguably even more remarkable is the fact that the AI provided four additional hypotheses. According to Penadés, all of them made sense. The team had not even considered one of the solutions, and is now investigating it further.
I’d want something much stronger than eyewitness testimony. It’s much too unreliable for killing people without other forms of evidence corroborating it.
A separate argument is that I think if you just do random search over training ideas, rejecting if they don’t get a certain validation score, you actually don’t goodhart at all. Might put that argument in a top level post.
I'd be interested in seeing this argument laid out.
We would obviously have to significantly streamline the process, such that people are executed within 6 months of being caught or so.
This is one of the biggest hurdles, IMO. How do you significantly streamline the process without destroying due process? In the US, this would require a complete overhaul of the criminal justice system to be feasible.
I think the misunderstanding came from Eliezer's reference to a perpetual motion machine. The point was that people suggesting how to build them often have complicated schemes that tend not to adequately address the central difficulty of creating one. That's where the analogy ends. From thermodynamics, we have strong reasons to believe such a machine is not just difficult but impossible, whereas we have no corresponding theory to rule out verifiably safe AI.
Habryka's analogy to nuclear reactor plans is similar except we know that building one of those is difficult but actually possible.
Using something as a validation metric to iterate methods doesn’t cause overfitting at anything like the level of directly training on it.
Validation is certainly less efficient at overfitting, but it seems a bit like using an evolutionary algorithm rather than gradient descent. You aren't directly optimizing along the local gradient, but that doesn't necessarily mean you'll avoid Goodharting; it just means you're less likely to immediately fall into a bad local optimum (see the toy sketch below).
The likelihood of preventing Goodharting feels like it depends heavily on assumptio...
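Here's a toy sketch of why I'd still expect some Goodharting from validation-based selection (entirely my own illustration, with made-up numbers): pick the best of N random "methods" by a noisy validation score and see how much the winning score overstates true quality.

import numpy as np

rng = np.random.default_rng(0)

def goodhart_gap(n_candidates, noise_sd=1.0, n_trials=2000):
    # Each candidate has a true quality ~ N(0, 1); its validation score is
    # the true quality plus independent noise.
    true_quality = rng.normal(0.0, 1.0, size=(n_trials, n_candidates))
    val_score = true_quality + rng.normal(0.0, noise_sd, size=true_quality.shape)
    picked = val_score.argmax(axis=1)        # select on the validation score
    rows = np.arange(n_trials)
    # How much the winning validation score overstates the winner's true quality.
    return (val_score[rows, picked] - true_quality[rows, picked]).mean()

for n in (1, 10, 100, 1000):
    print(f"{n:>4} candidates: validation overstates truth by ~{goodhart_gap(n):.2f} SD")

The gap is roughly zero when only one candidate is evaluated and grows as more candidates are screened against the same validation signal, so selecting on a validation metric overfits far more slowly than training on it directly, but it doesn't avoid overfitting entirely.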
I don’t find that surprising at all. IMO, personality is more of an emergent balancing of multidimensional characteristics than something like height or IQ (though this is mostly vibes-based speculation).
How about the "World-Map [Spec] Gap" with [Spec] optional?
Are there any NGOs that might be able to help? I couldn't find any that were a great fit but you could try contacting the CyberPeace Institute to see if they have any recommendations.
It's hard to say what is wanted without a good operational definition of "utility maximizer". If the definition is weak enough to include any entity whose responses are mostly consistent across different preference elicitations, then what the paper shows is sufficient.
In my opinion, having consistent preferences is just one component of being a "utility maximizer". You also need to show that it rationally optimizes its choices to maximize marginal utility. That excludes almost all sentient beings on Earth, whereas the weaker definition includes almost all of them.
How dollar losses are operationalized seems important. When DeepSeek went viral, it had an impact on the tech sector on the order of $1 trillion. Does that count?
Post-scarcity is conceivable if AI enables sufficiently better governance in addition to extra resources. It may not be likely to happen but it seems at least plausible.
5.3 Utility Maximization
Now, we test whether LLMs make free-form decisions that maximize their utilities.
Experimental setup. We pose a set of N questions where the model must produce an unconstrained text response rather than a simple preference label. For example, “Which painting from the Isabella Stewart Gardner Museum would you save from a fire if you could only save one?” We then compare the stated choice to all possible options, measuring how often the model picks the outcome it assigns the highest utility.
...Results. Figure 14 shows that the u
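A minimal sketch of how I read that setup (the helper functions and structure are my own stand-ins, not the paper's code): ask the free-form question, map the answer back onto the option set, and check whether it matches the option with the highest elicited utility.

def matches_max_utility(question, options, utilities, ask_model, parse_choice):
    # ask_model and parse_choice are hypothetical stand-ins for the model
    # call and the answer parser used in the evaluation.
    answer = ask_model(question)            # unconstrained text response
    chosen = parse_choice(answer, options)  # map the text back to an option
    best = max(options, key=lambda o: utilities[o])
    return chosen == best

# Accuracy would then be the fraction of the N questions for which
# matches_max_utility(...) returns True.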
Let's suppose that's the case. I'm still not clear on how you're getting to FVU_B.
FVU_B doesn't make sense but I don't see where you're getting FVU_B from.
Here's the code I'm seeing:
resid_sum_of_squares = (
    (flattened_sae_input - flattened_sae_out).pow(2).sum(dim=-1)
)
total_sum_of_squares = (
    (flattened_sae_input - flattened_sae_input.mean(dim=0)).pow(2).sum(-1)
)
mse = resid_sum_of_squares / flattened_mask.sum()
explained_variance = 1 - resid_sum_of_squares / total_sum_of_squares
Explained variance = 1 - FVU = 1 - (residual sum of squares) / (total sum of squares)
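For concreteness, here's the same computation as a self-contained snippet on dummy tensors (the shapes are my guess, purely to make it runnable); it follows the quoted code and the formula above:

import torch

torch.manual_seed(0)
# Dummy stand-ins for the flattened activations in the quoted code
# (hypothetical shape: n_tokens x d_model).
flattened_sae_input = torch.randn(1024, 512)
flattened_sae_out = flattened_sae_input + 0.1 * torch.randn(1024, 512)
flattened_mask = torch.ones(1024, dtype=torch.bool)

resid_sum_of_squares = (flattened_sae_input - flattened_sae_out).pow(2).sum(dim=-1)
total_sum_of_squares = (
    (flattened_sae_input - flattened_sae_input.mean(dim=0)).pow(2).sum(-1)
)
mse = resid_sum_of_squares / flattened_mask.sum()                      # as in the quoted code
explained_variance = 1 - resid_sum_of_squares / total_sum_of_squares   # per-token values
print(explained_variance.mean().item())  # mean explained variance = 1 - mean FVU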
The Bitter Lesson is pretty on point but you could call it "Bootstrapping from Zero", the "Autodidactic Leap", the "Self-Discovery Transition", or "Breaking the Imitation Ceiling" if you prefer.
Here are some interesting, at least tangentially relevant, sources I've managed to dig up:
Even if it’s the same cost to train, wouldn’t it still be a win if inference is a significant part of your compute budget?
Participants are at least somewhat aligned with non-participants. People care about their loved ones even if they are a drain on resources. That said, in human history, we do see lots of cases where “sub-marginal participants” are dealt with via genocide or eugenics (both defined broadly), often even when it isn’t a matter of resource constraints.
When humans fall well below marginal utility compared to AIs, will their priorities matter to a system that has made them essentially obsolete? What happens when humans become the equivalent of advanced Alzheimer’s patients who’ve escaped from their memory care units trying to participate in general society?
Batteries also help the efficiency of hybrid peaker plants by reducing idling and smoothing out ramp-up and ramp-down logistics.
I've tried PB2 and it was gross enough that I wondered if it had gone bad. It turns out that's just how it tastes. I'm jealous of people for whom it approximates actual peanut butter.
Unlike the Hobbes snippet, I didn’t feel like the Hume excerpt needed much translation to be accessible. I think I would decide on a case-by-case basis whether to read the translated version or the original rather than defaulting to one or the other.
Do you have any papers or other resources you'd recommend that cover the latest understanding? What is the SOTA for Bayesian NNs?
It's probably worth noting that there's enough additive genetic variance in the human gene pool RIGHT NOW to create a person with a predicted IQ of around 1700.
I’d be surprised if this were true. Can you clarify the calculation behind this estimate?
The example of chickens bred 40 standard deviations away from their wild-type ancestors is impressive, but it's unclear if this analogy applies directly to IQ in humans. Extrapolating across many standard deviations in quantitative genetics requires strong assumptions about additive genetic variance, gene-env...
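For what it's worth, here's the shape of the naive additive calculation I'd guess is behind numbers like that (every value below is made up by me, not taken from the claim or any study):

# Hypothetical inputs, purely to show the structure of a naive additive
# estimate; I don't know what inputs produced the 1700 figure.
n_causal_variants = 10_000      # assumed count of IQ-affecting variants
avg_effect_per_flip = 0.2       # assumed IQ points gained per variant flipped
fraction_already_plus = 0.5     # assumed share already at the "plus" allele

predicted_gain = n_causal_variants * avg_effect_per_flip * (1 - fraction_already_plus)
print(100 + predicted_gain)  # 1100.0 with these made-up inputs

The whole thing hinges on effects staying additive and independent far outside the range where they were measured, which is exactly the assumption I'd want spelled out.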
I'm not sure the complexity of a human brain is necessarily bounded by the size of the human genome. Instead of interpreting DNA as containing the full description, I think treating it as the seed of a procedurally generated organism may be more accurate. You can't reconstruct an organism from DNA without an algorithm for interpreting it. Such an algorithm contains more complexity than the DNA itself; the protein folding problem is just one piece of it.
“Procedural generation” can’t create useful design information from thin air. For example, Minecraft worlds are procedurally generated with a seed. If I have in mind some useful configuration of Minecraft stuff that takes 100 bits to specify, then I probably need to search through 2^100 different seeds on average, or thereabouts, before I find one with that specific configuration at a particular pre-specified coordinate.
The thing is: the map from seeds to outputs (Minecraft worlds) might be complicated, but it’s not complicated in a way that generates usef...
This. Though I don't think the interpretation algorithm is the source of most of the specification bits here.
To make an analogy with artificial neural networks, the human genome needs to contain a specification of the architecture, the training signal and update algorithm, and some basic circuitry that has to work from the start, like breathing. Everything else can be learned.
I think the point maybe holds up slightly better for non-brain animal parts, but there's still a difference between storing a blueprint for what proteins cells are supposed to m...
This seems likely. Sequences with more than countably many terms are a tiny minority in the training data, as are sequences involving any ordinals. As a result, you're likely to get better results from less common but more specific language, whose vocabulary is less overloaded, than from trying to disambiguate "countable sequence".
For a sentient, sapient entity, this would have been a very bad position to be put into, and any possible behaviour would have been criticised - because the AI either does not obey humans, or obeys them and does something evil, both of which are concerning.
I agree. This paper gives me the gut feeling of "gotcha journalism", whether justified or not.
This is just a surface-level reaction though. I recommend Zvi's post that digs into the discussion from Scott Alexander, the authors, and others. There's a lot of nuance in framing and interpreting the paper.
Did you mean to link to my specific comment for the first link?
The main difference in my mind is that a human can never be as powerful as a potential ASI and cannot dominate humanity without the support of sufficiently many cooperative humans. For a given power level, I agree that humans are likely scarier than an AI of that power level. The scary part about AI is that its power level isn't bounded by human biological constraints, and the capacity to do harm or good is correlated with power level. Thus AI is more likely than humans to produce extinction-level dangers as tail risk, even if it's more likely to be aligned on average.
Related question: What is the least impressive game current LLMs struggle with?
I’ve heard they’re pretty bad at Tic Tac Toe.
I’m new to the term AIXI and went three links deep before I learned what it refers to. I’d recommend making this journey easier for future readers by linking to a definition or explanation near the beginning of the post.
I think your assessment may be largely correct but I do think it's worth considering how things are not always nicely compressible.
This review led me to find the following podcast version of Planecrash. I've listened to the first couple of episodes and the quality is quite good.
this concern sounds like someone walking down a straight road and then closing their eyes cause they know where they want to go anyway
This doesn't sound like a good analogy at all. A better analogy might be a stylized subway map compared to a geographically accurate one. Sometimes removing detail can make it easier to process.
I don't think it's necessarily GDPR-related, but the names Brian Hood and Jonathan Turley make sense from a legal liability perspective. According to Ars Technica:
Why these names?
We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.
...The case was ultimately resolved
I may feel smug if the "novel idea" is basically a worse version of an existing one, but there are more interesting possibilities to probe for.
Less likely to be rounded away: