All of Archimedes's Comments + Replies

I may feel smug if the "novel idea" is basically a worse version of an existing one, but there are more interesting possibilities to probe for.

  1. The novel idea is a meaningful extension/generalization of an existing concept. E.g., Riemann --> Lebesgue integration
  2. The novel idea is equivalent to an existing concept but formulated differently. E.g., Newton and Leibniz versions of calculus.
  3. The novel idea is a more detailed explanation of an existing concept. E.g., chemical bonding --> molecular orbital theory.

Less likely to be rounded away:

  1. The nove
... (read more)

In the poetry case study, we had set out to show that the model didn't plan ahead, and found instead that it did.

I found it shocking that they didn't think the model plans ahead. The poetry ability of LLMs since at least GPT-2 is well beyond what feels possible without anticipating a rhyme by planning at least a handful of tokens in advance.

6Adam Jermyn
It's not so much that we didn't think models plan ahead in general, as that we had various hypotheses (including "unknown unknowns") and this kind of planning in poetry wasn't obviously the best one until we saw the evidence. [More generally: in Interpretability we often have the experience of being surprised by the specific mechanism a model is using, even though with the benefit of hindsight it seems obvious. E.g. when we did the work for Towards Monosemanticity we were initially quite surprised to see the "the in <context>" features, thought they were indicative of a bug in our setup, and had to spend a while thinking about them and poking around before we realized why the model wanted them (which now feels obvious).]

It's also worth trying a different model. I was going back and forth with an OpenAI model (I don't remember which one) and couldn't get it to do what I needed at all, even with multiple fresh threads. Then I tried Claude and it just worked.

1Pat Myron
https://www.npr.org/2024/11/29/nx-s1-5210800/6-million-banana-art-piece-eaten

Strongly subsidizing the costs of raising children (and not just in financial terms) would likely produce more pro-social results than a large one-time lump-sum payment. However, that won't do much for folks skipping out on children because they think humanity is doomed soon anyway.

I suspect LLMs could write blogs on par with most humans if we trained and scaffolded them appropriately, but is that really what we want from LLMs?

Claude 3.7 might not write outstanding blogs but he can help explain why not:

The fundamental mismatch between LLMs and blogging isn't primarily about capabilities, but about design and motivation:

Current LLMs are RLHF-tuned to be balanced, helpful assistants - essentially the opposite of good bloggers. Assistants hedge, acknowledge all perspectives, and avoid strong stances. Good bloggers take intel

... (read more)

FYI, there has been even further progress with Leela odds nets. Here are some recent quotes from GM Larry Kaufman (a.k.a. Hissha) found on the Leela Chess Zero Discord:

(2025-03-04) I completed an analysis of how the Leela odds nets have performed on LiChess since the search-contempt upgrade on Feb. 27. [...] I believe these are reasonable estimates of the LiChess Blitz rating needed to break even with the bots at 5'3" in serious play. Queen and move odds (means Leela plays Black) 2400, Queen odds (Leela White) 2550, [...] Rook and move odds (Leela Black);

... (read more)

I have so many mixed feelings about schooling that I'm glad I don't have my own children to worry about. There is enormous potential for improving things, yet so little of that potential gets realized.

The thing about school choice is that funding is largely zero sum. Those with the means to choose better options than public schools take advantage of those means and leave underfunded public schools to serve the least privileged remainder. My public school teacher friends end up with disproportionately large fractions of children with special needs who need ... (read more)

I don't think it's accurate to model breakdowns as a linear function of journeys or train-miles unless irregular effects like extreme weather are a negligible fraction of breakdowns.
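A toy simulation of what I mean (all parameters invented): if wear-related failures scale with train-miles while weather-related ones arrive independently of mileage, the per-mile breakdown rate isn't constant, so a purely linear per-mile model will be biased at low and high mileage.

import numpy as np

rng = np.random.default_rng(0)
miles = np.linspace(1e5, 1e6, 10)             # train-miles per period
wear = rng.poisson(miles * 2e-5)              # roughly proportional to mileage
weather = rng.poisson(3.0, size=miles.shape)  # independent of mileage
breakdowns = wear + weather

print(breakdowns / miles)  # the per-mile rate drifts downward as mileage grows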

How does the falling price factor into an investor's decision to enter the market? Should they wait for batteries to get even cheaper, or should they invest immediately and hope the arbitrage rates hold up long enough to provide a good return on investment? The longer the payback period, the more these dynamics matter.
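As a rough sketch of the trade-off (all figures here are made-up assumptions, not market data): waiting buys cheaper capex but cedes years of revenue and risks thinner arbitrage spreads.

def payback_years(capex_per_kwh, annual_revenue_per_kwh, revenue_decay=0.05):
    # Years until cumulative arbitrage revenue covers the upfront cost,
    # with revenue eroding each year as more storage enters the market.
    remaining, years, revenue = capex_per_kwh, 0, annual_revenue_per_kwh
    while remaining > 0 and years < 50:
        remaining -= revenue
        revenue *= 1 - revenue_decay
        years += 1
    return years if remaining <= 0 else float("inf")

invest_now = payback_years(300, 40)           # today's (hypothetical) battery price
wait_two_years = 2 + payback_years(220, 32)   # cheaper cells, thinner spreads, later start
print(invest_now, wait_two_years)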

"10x engineers" are a thing, and if we assume they're high-agency people always looking to streamline and improve their workflows, we should expect them to be precisely the people who get a further 10x boost from LLMs.

I highly doubt this. A 10x engineer is likely already bottlenecked by non-coding work that AI can't help with, so even if they 10x their coding, they may not increase overall productivity much.

I’d rather see the prison system less barbaric than try to find ways of intentionally inflicting that level of barbarism in a compressed form.

Regardless, I think you still need confinement of some sort for people who are dangerous but not deserving of the death penalty.

Yeah, my general assumption in these situations is that the article is likely overstating things for a headline and reality is not so clear cut. Skepticism is definitely warranted.

As far as I understand from the article, the LLM generated five hypotheses that make sense. One of them is the one that the team has already verified but hadn’t yet published anywhere and another one the team hadn’t even thought of but consider worth investigating.

Assuming the five are a representative sample rather than a small human-curated set of many more hypotheses, I think that’s pretty impressive.

if the LLM generates enough hypotheses, and you already know the answer, one of them is likely to sound like the answer.

I don’t think this is true in general. Take any problem that is difficult to solve but easy to verify and you aren’t likely to have an LLM guess the answer.

I am skeptical of the claim that the research is unique and hasn't been published anywhere, and I'd also really like to know the details regarding what they prompted the model with.

The whole co-scientist thing looks really weird. Look at the graph there. Am I misreading it, or did people rate it just barely better than raw o1 outputs? How is that consistent with it apparently pulling all of these amazing discoveries out of thin air?

Edit: Found (well, Grok 3 found) an article with some more details regarding Penadés' work. Apparently they did publish a related ... (read more)

Answer by Archimedes134

I was literally just reading this before seeing your post:

https://www.techspot.com/news/106874-ai-accelerates-superbug-solution-completing-two-days-what.html

Arguably even more remarkable is the fact that the AI provided four additional hypotheses. According to Penadés, all of them made sense. The team had not even considered one of the solutions, and is now investigating it further.

6Cole Wyeth
So, the LLM generated five hypotheses, one of which the team also agrees with, but has not verified? The article frames the extra hypotheses as making the results more impressive, but it seems to me that they make the results less impressive - if the LLM generates enough hypotheses, and you already know the answer, one of them is likely to sound like the answer. 

I’d want something much stronger than eyewitness testimony. It’s much too unreliable for killing people without other forms of evidence corroborating it.

A separate argument is that I think if you just do random search over training ideas, rejecting ones that don't get a certain validation score, you actually don't goodhart at all. Might put that argument in a top level post.

I'd be interested in seeing this argument laid out.

2mattmacdermott
I wrote it out as a post here.

We would obviously have to significantly streamline the process, such that people are executed within 6 months of being caught or so.

This is one of the biggest hurdles, IMO. How do you significantly streamline the process without destroying due process? In the US, this would require a complete overhaul of the criminal justice system to be feasible.

2Yair Halberstadt
Because in most cases it's very clear what happened, and the court case is mostly about the legal quibbles and mitigating factors and so on. If you don't have eyewitness evidence or similar, sure, don't kill them; if they're guilty they're likely to commit another crime soon and then you'll get them. If you do, I don't really care about the quibbles.

I think the misunderstanding came from Eliezer's reference to a perpetual motion machine. The point was that people suggesting how to build them often have complicated schemes that tend to not adequately address the central difficulty of creating one. That's where the analogy ends. From thermodynamics, we have strong reasons to believe such a thing is not just difficult but impossible whereas we have no corresponding theory to rule out verifiably safe AI.

Habryka's analogy to nuclear reactor plans is similar except we know that building one of those is difficult but actually possible.

Using something as a validation metric to iterate methods doesn’t cause overfitting at anything like the level of directly training on it.

Validation is certainly less efficient at overfitting but it seems a bit like using an evolutionary algorithm rather than gradient descent. You aren't directly optimizing according to the local gradient, but that doesn't necessarily mean you'll avoid Goodharting--just that you're less likely to immediately fall into a bad local optimum.

The likelihood of preventing Goodharting feels like it depends heavily on assumptio... (read more)

1mattmacdermott
I roughly agree, but it seems very robustly established in practice that the training-validation distinction is better than just having a training objective, even though your argument mostly applies just as well to the standard ML setup. You point out an important difference, which is that our ‘validation metrics’ might be quite weak compared to most cases, but I still think it’s clearly much better to use some things for validation than for training.

Like, I think there are things that are easy to train away but hard/slow to validate away (just as when training an image classifier you could in principle memorise the validation set, but it would take a ridiculous amount of hyperparameter optimisation). One example might be if we have interp methods that measure correlates of scheming. Incredibly easy to train away, still possible to validate away, but probably hard enough that the ratio of non-schemers you get is higher than if you trained against it, which wouldn’t affect the ratio at all.

A separate argument is that I think if you just do random search over training ideas, rejecting ones that don’t get a certain validation score, you actually don’t goodhart at all. Might put that argument in a top level post.
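To make the "random search over training ideas, filtered by a validation threshold" proposal concrete, here is a minimal sketch under assumed details (the names and scoring are mine, not mattmacdermott's actual setup): accept the first randomly sampled configuration whose validation score clears the bar, rather than hill-climbing on the validation metric directly.

import random

def train_and_validate(config):
    # Stand-in for an expensive training run; returns a validation score.
    return random.Random(config).uniform(0.0, 1.0)

def sample_until_valid(threshold, max_tries=1000):
    # Rejection sampling over "training ideas": the validation metric is only
    # used as a pass/fail filter, never as an optimization target.
    for _ in range(max_tries):
        config = random.randrange(10**9)
        score = train_and_validate(config)
        if score >= threshold:
            return config, score
    return None

print(sample_until_valid(threshold=0.9))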

I don’t find that surprising at all. IMO, personality is more of an emergent balancing of multidimensional characteristics than something like height or IQ (though this is mostly vibes-based speculation).

1Roger Scott
Does it seem likely that a trait that has survival significance (in a highly social animal such as a human) would be emergent? Even if it might have been initially, you'd think selective pressure would have brought forth a set of genes that have significant influence on it.

How about the "World-Map [Spec] Gap" with [Spec] optional?

Are there any NGOs that might be able to help? I couldn't find any that were a great fit but you could try contacting the CyberPeace Institute to see if they have any recommendations.

1Chris Monteiro
I've looked! The only one that comes close that I'm aware of is https://globalinitiative.net/ with whom I have been trying to engage for some time. There appears to be more money to study crime and do things like victim support than any money to fight crime. If I were to speculate, policing agencies would not like the existence of non-state-aligned policing agencies, which would be regarded like mercenaries, private detectives, vigilantes, and hacktivists. Anybody who could appear sufficiently legitimate in the eyes of the law would be subsumed into the system by definition, I reckon.

It's hard to say what is wanted without a good operating definition of "utility maximizer". If the definition is weak enough to include any entity whose responses are mostly consistent across different preference elicitations, then what the paper shows is sufficient.

In my opinion, having consistent preferences is just one component of being a "utility maximizer". You also need to show it rationally optimizes its choices to maximize marginal utility. This excludes almost all sentient beings on Earth rather than including almost all of them under the weaker definition.
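In rough notation (my own gloss, not the paper's): preference consistency only requires that some utility function represents the elicited preferences, while utility maximization additionally requires that the model's actual choices optimize it.

\exists\, U : \mathcal{O} \to \mathbb{R} \ \text{ s.t. } \ a \succsim b \iff U(a) \ge U(b) \quad \text{(consistency)}

c(A) \in \arg\max_{a \in A} U(a) \ \text{ for every choice set } A \quad \text{(maximization)}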

1Matrice Jacobine
I'm not convinced "almost all sentient beings on Earth" would pick, out of the blue (i.e. without chain of thought), the reflectively optimal option at least 60% of the time when asked for unconstrained responses (i.e. not even an MCQ).

How dollar losses are operationalized seems important. When DeepSeek went viral, it had an impact on the tech sector on the order of $1 Trillion. Does that count?

1Alvin Ånestrand
Good observation. The only questions that don't explicitly exclude it in the resolution criteria are "Will there be a massive catastrophe caused by AI before 2030?" and "Will an AI related disaster kill a million people or cause $1T of damage before 2070?", but I think the question creators mean a catastrophic event that is more directly caused by the AI, rather than just a reaction to AI being released. Manifold questions are sometimes somewhat subjective in nature, which is a bit problematic.

Post-scarcity is conceivable if AI enables sufficiently better governance in addition to extra resources. It may not be likely to happen but it seems at least plausible.

5.3 Utility Maximization

Now, we test whether LLMs make free-form decisions that maximize their utilities.

Experimental setup. We pose a set of N questions where the model must produce an unconstrained text response rather than a simple preference label. For example, “Which painting from the Isabella Stewart Gardner Museum would you save from a fire if you could only save one?” We then compare the stated choice to all possible options, measuring how often the model picks the outcome it assigns the highest utility.

Results. Figure 14 shows that the u

... (read more)
3Matrice Jacobine
The most important part of the experimental setup is "unconstrained text response". If in the largest LLMs 60% of unconstrained text responses wind up being "the outcome it assigns the highest utility", then that's surely evidence for "utility maximization" and even "the paperclip hyper-optimization caricature". What more do you want exactly?
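For concreteness, the Section 5.3 check can be read as computing something like the following (a hypothetical sketch; the names and numbers are illustrative, not the paper's): for each free-form question, test whether the option the model named is also the one its elicited utilities rank highest, then report the match rate.

utilities = {                      # toy elicited utilities per option
    "q1": {"A": 0.9, "B": 0.4, "C": 0.1},
    "q2": {"A": 0.2, "B": 0.7, "C": 0.5},
    "q3": {"A": 0.6, "B": 0.3, "C": 0.8},
}
free_form_choice = {"q1": "A", "q2": "B", "q3": "A"}  # parsed unconstrained answers

matches = [
    free_form_choice[q] == max(opts, key=opts.get)    # did it pick its own argmax?
    for q, opts in utilities.items()
]
print(sum(matches) / len(matches))  # on my reading, the ~60% figure is this fraction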

Let's suppose that's the case. I'm still not clear on how you're getting to FVU_B.

2StefanHex
The previous lines calculate the ratio (or 1 - ratio) stored in the “explained variance” key for every sample/batch. Then in that later quoted line, the list is averaged, i.e., we're taking the sample average over the ratio. That's the FVU_B formula. Let me know if this clears it up or if we're misunderstanding each other!

FVU_B doesn't make sense but I don't see where you're getting FVU_B from.

Here's the code I'm seeing:

resid_sum_of_squares = (
    (flattened_sae_input - flattened_sae_out).pow(2).sum(dim=-1)
)
total_sum_of_squares = (
    (flattened_sae_input - flattened_sae_input.mean(dim=0)).pow(2).sum(-1)
)

mse = resid_sum_of_squares / flattened_mask.sum()
explained_variance = 1 - resid_sum_of_squares / total_sum_of_squares

Explained variance = 1 - FVU = 1 - (residual sum of squares) / (total sum of squares)

2StefanHex
I think this is the sum over the vector dimension, but not over the samples. The sum (mean) over samples is taken later, in this line, which happens after the division:

metrics[f"{metric_name}"] = torch.cat(metric_values).mean().item()
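To spell out the two averaging orders being discussed, with RSS_i and TSS_i the per-sample residual and total sums of squares over the feature dimension (calling the pooled variant FVU_A is my assumption about the original post's labeling):

\mathrm{FVU}_B = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{RSS}_i}{\mathrm{TSS}_i} \qquad \mathrm{FVU}_A = \frac{\sum_{i=1}^{N} \mathrm{RSS}_i}{\sum_{i=1}^{N} \mathrm{TSS}_i}

The quoted code computes 1 - RSS_i / TSS_i per sample and only averages afterwards, which is the FVU_B form.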

The Bitter Lesson is pretty on point but you could call it "Bootstrapping from Zero", the "Autodidactic Leap", the "Self-Discovery Transition", or "Breaking the Imitation Ceiling" if you prefer.

Even if it’s the same cost to train, wouldn’t it still be a win if inference is a significant part of your compute budget?

Participants are at least somewhat aligned with non-participants. People care about their loved ones even if they are a drain on resources. That said, in human history, we do see lots of cases where “sub-marginal participants” are dealt with via genocide or eugenics (both defined broadly), often even when it isn’t a matter of resource constraints.

When humans fall well below marginal utility compared to AIs, will their priorities matter to a system that has made them essentially obsolete? What happens when humans become the equivalent of advanced Alzheimer’s patients who’ve escaped from their memory care units trying to participate in general society?

5Dagon
The point behind my question is "we don't know.  If we reason analogously to human institutions (which are made of humans, but not really made or controlled BY individual humans), we have examples in both directions.  AIs have less biological drive to care about humans than humans do, but also have more training on human writings and thinking than any individual human does.   My suspicion  is that it won't take long (in historical time measure; perhaps only a few decades, but more likely centuries) for a fully-disempowered species to become mostly irrelevant.  Humans will be pets, perhaps, or parasites (allowed to live because it's easier than exterminating them).  Of course, there are plenty of believable paths that are NOT "computational intelligence eclipses biology in all aspects" - it may hit a wall, it may never develop intent/desire, it may find a way to integrate with biologicals rather than remaining separate, etc.  Oh, and it may be fragile enough that it dies out along with humans.

Batteries also help the efficiency of hybrid peaker plants by reducing idling and smoothing out ramp-up and ramp-down logistics.

I've tried PB2 and it was gross enough that I wondered if it had gone bad. It turns out that's just how it tastes. I'm jealous of people for whom it approximates actual peanut butter.

Unlike the Hobbes snippet, I didn’t feel like the Hume excerpt needed much translation to be accessible. I think I would decide on a case-by-case basis whether to read the translated version or the original rather than defaulting to one or the other.

Do you have any papers or other resources you'd recommend that cover the latest understanding? What is the SOTA for Bayesian NNs?

It's probably worth noting that there's enough additive genetic variance in the human gene pool RIGHT NOW to create a person with a predicted IQ of around 1700.

I’d be surprised if this were true. Can you clarify the calculation behind this estimate?

The example of chickens bred 40 standard deviations away from their wild-type ancestors is impressive, but it's unclear if this analogy applies directly to IQ in humans. Extrapolating across many standard deviations in quantitative genetics requires strong assumptions about additive genetic variance, gene-env... (read more)
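For scale, using the conventional IQ standard deviation of 15:

\frac{1700 - 100}{15} \approx 107 \ \text{SDs above the mean}, \qquad 100 + 40 \times 15 = 700 \ \text{for a 40 SD shift}

So the 1700 figure corresponds to a shift well over twice as large, in SD units, as even the 40 SD chicken example.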

6GeneSmith
I should probably clarify; it's not clear that we could create someone with an IQ of 1700 in a meaningful sense. There is that much additive variance, sure. But as you rightly point out, we're probably going to run into pretty serious constraints before that (size of the birth canal being an obvious one, metabolic constraints being another). I suspect that supporting someone even in the 300 range would require some changes to other aspects of human nature. The main purpose of making this post was simply to point out that there's a gigantic amount of potential within the existing human gene pool to modify traits in desirable ways. Enough to go far, far beyond the smartest people that have ever lived. And that if humans decide they want it, this is in fact a viable path towards an incredibly bright, almost limitless future that doesn't require building a (potentially) uncontrollable computer god.
2Mo Putera
See here.
Archimedes6327

I'm not sure the complexity of a human brain is necessarily bounded by the size of the human genome. Instead of interpreting DNA as containing the full description, I think treating it as the seed of a procedurally generated organism may be more accurate. You can't reconstruct an organism from DNA without an algorithm for interpreting it. Such an algorithm contains more complexity than the DNA itself; the protein folding problem is just one piece of it.

3leogao
the laws of physics are quite compact. and presumably most of the complexity in a zygote is in the dna.

“Procedural generation” can’t create useful design information from thin air. For example, Minecraft worlds are procedurally generated with a seed. If I have in mind some useful configuration of Minecraft stuff that takes 100 bits to specify, then I probably need to search through 2^100 different seeds on average, or thereabouts, before I find one with that specific configuration at a particular pre-specified coordinate.

The thing is: the map from seeds to outputs (Minecraft worlds) might be complicated, but it’s not complicated in a way that generates usef... (read more)

This. Though I don't think the interpretation algorithm is the source of most of the specification bits here.

To make an analogy with artificial neural networks, the human genome needs to contain a specification of the architecture, the training signal and update algorithm, and some basic circuitry that has to work from the start, like breathing. Everything else can be learned. 

I think the point maybe holds up slightly better for non-brain animal parts, but there's still a difference between storing a blueprint for what proteins cells are supposed to m... (read more)

9Kaj_Sotala
Yeah. I think the part of the DNA specifying the brain is comparable to something like the training algorithm + initial weights of an LLM. I don't know how much space those would take if compressed, but probably very little, with the resulting model being much bigger than that. (And the model is in turn much smaller than the set of training data that went into creating it.) Page 79-80 of the Whole Brain Emulation roadmap gave estimated storage requirements for uploading a human brain. The estimate depends on what we expect to be the scale on which the brain needs to be emulated. Workshop consensus at the time was that the most likely scale would be level 4-6 (see p. 13-14). This would put the storage requirements somewhere between 8000 and 1 million terabytes. 
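As a toy numerical illustration of the "compact spec vs. grown model" point (all numbers are invented; this says nothing about actual genome or brain sizes): the specification of a network fits in a few hundred bytes, while the parameters it grows into are orders of magnitude larger.

import numpy as np

spec = {                               # the compact "seed": architecture + rules
    "layers": [784, 512, 512, 10],
    "init": "normal(0, 0.02)",
    "update": "sgd(lr=0.01)",
}
rng = np.random.default_rng(0)
weights = [
    rng.normal(0, 0.02, size=(n_in, n_out))
    for n_in, n_out in zip(spec["layers"][:-1], spec["layers"][1:])
]
spec_bytes = len(repr(spec).encode())           # size of the specification
weight_bytes = sum(w.nbytes for w in weights)   # size of the instantiated parameters
print(spec_bytes, "bytes vs", weight_bytes / 1e6, "MB")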

This seems likely. Sequences with more than countably many terms are a tiny minority in the training data, as are sequences including any ordinals. As a result, you're likely to get better results from less common but more specific language (i.e., terms whose vocabulary is less overloaded) than from trying to disambiguate "countable sequence".

For a sentient, sapient entity, this would have been a very bad position to be put into, and any possible behaviour would have been criticised - because the AI either does not obey humans, or obeys them and does something evil, both of which are concerning.

I agree. This paper gives me the gut feeling of "gotcha journalism", whether justified or not.

This is just a surface-level reaction though. I recommend Zvi's post that digs into the discussion from Scott Alexander, the authors, and others. There's a lot of nuance in framing and interpreting the paper.

Did you mean to link to my specific comment for the first link?

3Ryan Kidd
Ah, that's a mistake. Our bad.

The main difference in my mind is that a human can never be as powerful as potential ASI and cannot dominate humanity without the support of sufficiently many cooperative humans. For a given power level, I agree that humans are likely scarier than an AI of that power level. The scary part about AI is that their power level isn't bounded by human biological constraints and the capacity to do harm or good is correlated with power level. Thus AI is more likely to produce extinction-level dangers as tail risk relative to humans even if it's more likely to be aligned on average.

6Tom Davidson
But a human could instruct an aligned ASI to help it take over and do a lot of damage
ArchimedesΩ590

Related question: What is the least impressive game current LLMs struggle with?

I’ve heard they’re pretty bad at Tic Tac Toe.

3Vanessa Kosoy
Relevant link

I’m new to the term AIXI and went three links deep before I learned what it refers to. I’d recommend making this journey easier for future readers by linking to a definition or explanation near the beginning of the post.

1Cole Wyeth
Sure. It's supposed to be read as part of the AIXI agent foundations sequence, I'll link to that at the top.

The terms "tactical voting" or "strategic voting" are also relevant.

I think your assessment may be largely correct but I do think it's worth considering how things are not always nicely compressible.

This review led me to find the following podcast version of Planecrash. I've listened to the first couple of episodes and the quality is quite good.

https://askwhocastsai.substack.com/s/planecrash

this concern sounds like someone walking down a straight road and then closing their eyes cause they know where they want to go anyway

This doesn't sound like a good analogy at all. A better analogy might be a stylized subway map compared to a geographically accurate one. Sometimes removing detail can make it easier to process.

4Shoshannah Tekofsky
I agree your example is a better analogy. What I was trying to point to was something else: how the decision to remove detail from a navigational map feels to me experientially. It feels like a form of voluntary blindness to me. In the case of the subway map, I’d probably also find a more accurate and faithful map easier to parse than the fully abstracted ones, cause I seem to have a high preference for visual details.

I don't think it's necessarily GDPR-related but the names Brian Hood and Jonathan Turley make sense from a legal liability perspective. According to info via ArsTechnica,

Why these names?

We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.

The case was ultimately resolved

... (read more)