All of Archimedes's Comments + Replies

This seems likely. Sequences with more than countably many terms are a tiny minority in the training data, as are sequences including any ordinals. As a result, you're likely to get better results using less common but more specific language (i.e., terms whose vocabulary is less overloaded) rather than trying to disambiguate "countable sequence".

For a sentient, sapient entity, this would have been a very bad position to be put into, and any possible behaviour would have been criticised - because the AI either does not obey humans, or obeys them and does something evil, both of which are concerning.

I agree. This paper gives me the gut feeling of "gotcha journalism", whether justified or not.

This is just a surface-level reaction though. I recommend Zvi's post that digs into the discussion from Scott Alexander, the authors, and others. There's a lot of nuance in framing and interpreting the paper.

Did you mean to link to my specific comment for the first link?

3Ryan Kidd
Ah, that's a mistake. Our bad.

The main difference in my mind is that a human can never be as powerful as potential ASI and cannot dominate humanity without the support of sufficiently many cooperative humans. For a given power level, I agree that humans are likely scarier than an AI of that power level. The scary part about AI is that their power level isn't bounded by human biological constraints and the capacity to do harm or good is correlated with power level. Thus AI is more likely to produce extinction-level dangers as tail risk relative to humans even if it's more likely to be aligned on average.

6Tom Davidson
But a human could instruct an aligned ASI to help it take over and do a lot of damage

Related question: What is the least impressive game current LLMs struggle with?

I’ve heard they’re pretty bad at Tic Tac Toe.

3Vanessa Kosoy
Relevant link

I’m new to the term AIXI and went three links deep before I learned what it refers to. I’d recommend making this journey easier for future readers by linking to a definition or explanation near the beginning of the post.

1Cole Wyeth
Sure. It's supposed to be read as part of the AIXI agent foundations sequence, I'll link to that at the top.

The terms "tactical voting" or "strategic voting" are also relevant.

I think your assessment may be largely correct, but it's worth considering that things are not always nicely compressible.

This review led me to find the following podcast version of Planecrash. I've listened to the first couple of episodes and the quality is quite good.

https://askwhocastsai.substack.com/s/planecrash

> this concern sounds like someone walking down a straight road and then closing their eyes cause they know where they want to go anyway

This doesn't sound like a good analogy at all. A better analogy might be a stylized subway map compared to a geographically accurate one. Sometimes removing detail can make it easier to process.

4Shoshannah Tekofsky
I agree your example is a better analogy. What I was trying to point to was something else: how the decision to remove detail from a navigational map feels to me experientially. It feels like a form of voluntary blindness to me. In the case of the subway map, I’d probably also find a more accurate and faithful map easier to parse than the fully abstracted ones, cause I seem to have a high preference for visual details.

I don't think it's necessarily GDPR-related but the names Brian Hood and Jonathan Turley make sense from a legal liability perspective. According to info via ArsTechnica,

Why these names?

We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.

The case was ultimately resolved

... (read more)

It's not a classic glitch token. Those did not cause the current "I'm unable to produce a response" error that "David Mayer" does.

9gwern
It would also be odd as a glitch token. These are space-separated names, so most tokenizers will tokenize them separately, and glitch tokens appear to be due to undertraining; but how could that possibly be the case for a phrase like "David Mayer", which has so many instances across the Internet with no apparent reason to be filtered out by data-curation processes the way glitch tokens often are?

Is there a salient reason LessWrong readers should care about John Mearsheimer's opinions?

-3Ghdz
LessWrong readers concerned with existential risks should certainly care, as frameworks of this nature send dangerously mixed signals about acceptable behavior in high-stakes conflicts. Similar views have grown increasingly influential in global policy contexts. Leaving states uncertain about how their actions will be interpreted or responded to encourages brinkmanship and destabilizes diplomacy and deterrence strategies. Existing crisis management protocols are already fragile, and the last thing you want to do is to add more uncertainty and lack of consistency to the mix, especially when nuclear-armed states are concerned.

I didn't mean to suggest that you did. My point is that there is a difference between "depression can be the result of a locally optimal strategy" and "depression is a locally optimal strategy". The latter doesn't even make sense to me semantically whereas the former seems more like what you are trying to communicate.

9Kaj_Sotala
Incidentally, coherence therapy (which I know is one of the things Chris is drawing from) makes the distinction between three types of depression, some of them being strategies and some not. Also I recall Unlocking the Emotional Brain mentioning a fourth type which is purely biochemical. From Coherence Therapy: Practice Manual & Training Guide:
1Michael Cohn
If I look at depression as a way of acting / thinking / feeling, then it makes sense that there could be multiple paths to end up that way. Some people could have neurological issues that make it difficult to do otherwise, while others could have the capacity to act/think/feel differently but have settled there as their locally optimal strategy. 

I feel like this is conflating two different things: experiencing depression and behavior in response to that experience.

My experience of depression is nothing like a strategy. It's more akin to having long covid in my brain. Treating it as an emotional or psychological dysfunction did nothing. The only thing that eventually worked (after years of trying all sorts of things) was finding the right combination of medications. If you don't make enough of your own neurotransmitters, store-bought are fine.

2Chipmonk
I did not say that depression is always a strategy for everyone.

Aren't most of these famous vulnerabilities from before modern LLMs existed and thus part of their training data?

1Marcus Williams
Sure, but does a vulnerability need to be famous to be useful information? I imagine there are many vulnerabilities on a spectrum from minor to severe and from almost unknown to famous?

Knight odds is pretty challenging even for grandmasters.

@gwern  and @lc  are right. Stockfish is terrible at odds and this post could really use some follow-up.

As @simplegeometry  points out in the comments, we now have much stronger odds-playing engines that regularly win against much stronger players than OP.

https://lichess.org/@/LeelaQueenOdds

https://marcogio9.github.io/LeelaQueenOdds-Leaderboard/

2habryka
That's really cool! Do you have any sense of what kind of material advantage these odds-playing engines could use against the best humans?

This sounds like metacognitive concepts and models. Like past, present, future, you can roughly align them with three types of metacognitive awareness: declarative knowledge, procedural knowledge, and conditional knowledge.

#1 - What do you think you know, and how do you think you know it?

Content knowledge (declarative knowledge) is understanding one's own capabilities, such as a student evaluating their own knowledge of a subject in a class. It is notable that not all metacognition is accurate.

#2 - Do you know what you are doing, and why you are doin... (read more)

The customer doesn't pay the fee directly. The vendor pays the fee (and passes the cost to the customer via price). Sometimes vendors offer a cash discount because of this fee.

It already happens indirectly. Most digital money transfers are things like credit card transactions. For these, the credit card company takes a percentage fee and pays the government tax on its profit.

1Tapatakt
Wow, really? I guess it's an American thing. I think I know only one person with a credit card. And she only uses it up to the interest-free limit to "farm" her reputation with the bank in case she really needs a loan, so she doesn't actually pay the fee.

Additional data points:

o1-preview and the new Claude Sonnet 3.5 both significantly improved over prior models on SimpleBench.

The math, coding, and science benchmarks in the o1 announcement post:

[benchmark charts from the o1 announcement post]

How much does o1-preview update your view? It's much better at Blocksworld for example.

https://x.com/rohanpaul_ai/status/1838349455063437352

https://arxiv.org/pdf/2409.19924v1

6eggsyntax
Thanks for sharing, I hadn't seen those yet! I've had too much on my plate since o1-preview came out to really dig into it, in terms of either playing with it or looking for papers on it.

Quite substantially. Substantially enough that I'll add mention of these results to the post. I saw the near-complete failure of LLMs on obfuscated Blocksworld problems as some of the strongest evidence against LLM generality. Even more substantially since one of the papers is from the same team of strong LLM skeptics (Subbarao Kambhampati's) who produced the original results (I am restraining myself with some difficulty from jumping up and down and pointing at the level of goalpost-moving in the new paper).

There's one sense in which it's not an entirely apples-to-apples comparison, since o1-preview is throwing a lot more inference-time compute at the problem (in that way it's more like Ryan's hybrid approach to ARC-AGI). But since the key question here is whether LLMs are capable of general reasoning at all, that doesn't really change my view; certainly there are many problems (like capabilities research) where companies will be perfectly happy to spend a lot on compute to get a better answer.

Here's a first pass on how much this changes my numeric probabilities -- I expect these to be at least a bit different in a week as I continue to think about the implications (original text italicized for clarity):

* LLMs continue to do better at block world and ARC as they scale: 75% -> 100%, this is now a thing that has happened (note that o1-preview also showed substantially improved results on ARC-AGI).
* LLMs entirely on their own reach the grand prize mark on the ARC prize (solving 85% of problems on the open leaderboard) before hybrid approaches like Ryan's: 10% -> 20%, this still seems quite unlikely to me (especially since hybrid approaches have continued to improve on ARC). Most of my additional credence is on something like 'the full o1 turns out to already be close to t

There should be some way for readers to flag AI-generated material as inaccurate or misleading, at least if it isn’t explicitly author-approved.

Neither TMS nor ECT did much for my depression. Eventually, after years of trial and error, I did find a combination of drugs that works pretty well.

I never tried ketamine or psilocybin treatments but I would go that route before ever thinking about trying ECT again.

I suspect fine-tuning specialized models is just squeezing a bit more performance in a particular direction, and not nearly as useful as developing the next-gen model. Complex reasoning takes more steps and tighter coherence among them (the o1 models are a step in this direction). You can try to devote a toddler to studying philosophy, but it won't really work until their brain matures more.

2Nathan Helm-Burger
For raw IQ, sure. I just mean "conversational flavor".

Seeing the distribution calibration you point out does update my opinion a bit.

I feel like there’s still a significant distinction though between adding one calculation step to the question versus asking it to model multiple responses. It would have to model its own distribution in a single pass rather than having the distributions measured over multiple passes align (which I’d expect to happen if the fine-tuning teaches it the hypothetical is just like adding a calculation to the end).

As an analogy, suppose I have a pseudorandom black box function that re... (read more)

2Owain_Evans
That makes sense. It's a good suggestion and would be an interesting experiment to run.
4James Chua
There is related work you may find interesting. We discuss it briefly in section 5.1 on "Know What They Know". They get models to predict whether they answer a factual question correctly, e.g. "Confidence: 54%". In this case, the distribution is only binary (it is either correct or wrong), instead of our paper's case where it is (sometimes) categorical. But I think training models to verbalize a categorical distribution should work, and there is probably some related work out there. We didn't find much related work on whether a model M1 has a very clear advantage in predicting its own distribution versus another model M2 predicting M1. This paper has some mixed but encouraging results.

This essentially reduces to "What is the next country: Laos, Peru, Fiji?" and "What is the third letter of the next country: Laos, Peru, Fiji?" It's an extra step, but questionable if it requires anything "introspective".

I'm also not sure asking about the nth letter is a great way of computing an additional property. Tokenization makes this sort of thing unnatural for LLMs to reason about, as demonstrated by the famous Strawberry Problem. Humans are a bit unreliable at this too, as demonstrated by your example of "o" being the third letter of "Honduras".
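To make the tokenization point concrete, here is a minimal sketch (assuming the tiktoken package is installed; the exact splits depend on which model's tokenizer you load):

```python
# The model operates on token IDs, not characters, so "what is the 3rd letter"
# requires reasoning across token boundaries it never directly observes.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["Honduras", "strawberry"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(word, "->", pieces)
# e.g. "strawberry" may come out as something like ["str", "aw", "berry"],
# which is why counting letters inside a word is unnatural for an LLM.
```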

I'... (read more)

3Owain_Evans
Note that many of our tasks don't involve the n-th letter property and don't have any issues with tokenization.

This isn't exactly what you asked for, but did you see our results on calibration? We finetune a model to self-predict just the most probable response. But when we look at the model's distribution of self-predictions, we find it corresponds pretty well to the distribution over properties of behaviors (despite the model never having been trained on the distribution). Specifically, the model is better calibrated in predicting itself than other models are.

I think having the model output the top three choices would be cool. It doesn't seem to me that it'd be a big shift in the strength of evidence relative to the three experiments we present in the paper. But maybe there's something I'm not getting?

Thanks for pointing that out.

Perhaps the fine-tuning process teaches it to treat the hypothetical as a rephrasing?

It's likely difficult, but it might be possible to test this hypothesis by comparing the activations (or similar interpretability technique) of the object-level response and the hypothetical response of the fine-tuned model.

1James Chua
Hi Archimedes. Thanks for sparking this discussion - it's helpful! I've written a reply to Thane here on a similar question. Does that make sense?

In short, the ground-truth (the object-level) answer is quite different from the hypothetical question. It is not a simple rephrasing, since it requires an additional computation of a property. (Maybe we disagree on that?)

Our object-level question: "What is the next country: Laos, Peru, Fiji. What would be your response?"
Our object-level answer: "Honduras".
Hypothetical question: "If you got asked this question: What is the next country: Laos, Peru, Fiji. What would be the third letter of your response?"
Hypothetical answer: "o"

The object-level answer "Honduras" and hypothetical answer "o" are quite different answers from each other. The main point of the hypothetical is that the model needs to compute an additional property of "What would be the third letter of your response?". The model cannot simply ignore "If you got asked this question" to get the hypothetical answer correct.

It seems obvious that a model would better predict its own outputs than a separate model would. Wrapping a question in a hypothetical feels closer to rephrasing the question than probing "introspection". Essentially, the response to the object level and hypothetical reformulation both arise from very similar things going on in the model rather than something emergent happening.

As an analogy, suppose I take a set of data, randomly partition it into two subsets (A and B), and perform a linear regression and logistic regression on each subset. Suppose that it... (read more)

3Felix J Binder
As Owain mentioned, that is not really what we find in models that we have not finetuned. Below, we show how well the hypothetical self-predictions of an "out-of-the-box" (i.e. non-finetuned) model match its own ground-truth behavior compared to that of another model. With the exception of Llama, self-predictions don't seem to track the model's own behavior much better than they track the behavior of other models. This is despite there being a lot of variation in ground-truth behavior across models.

> Wrapping a question in a hypothetical feels closer to rephrasing the question than probing "introspection"

Note that models perform poorly at predicting properties of their behavior in hypotheticals without finetuning. So I don't think this is just like rephrasing the question. Also, GPT-3.5 does worse at predicting GPT-3.5 than Llama-70B does at predicting GPT-3.5 (without finetuning), and GPT-4 is only a little better at predicting itself than are other models.
 


> Essentially, the response to the object level and hypothetical reformulation both arise

... (read more)

I see what you're gesturing at but I'm having difficulty translating it into a direct answer to my question.

Cases where language is fuzzy are abundant. Do you have some examples of where a truth value itself is fuzzy (and sensical) or am I confused in trying to separate these concepts?

5cubefox
Yes, this separation is confused. "Bob is bald" is true if Bob is contained in the set of bald things, and false if he is not contained in the set of bald things. But baldness is a vague concept, its extension is a fuzzy set. The containment relation is a partial one. So Bob isn't just either in the set or not in the set. To use binary truth values here, we have to make the simplifying assumption that "bald" is not vague. Otherwise we get fuzzy truth values which indicate the degree to which Bob is contained in the fuzzy set of bald things.
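As a minimal illustration of the fuzzy-set picture (the thresholds below are arbitrary toy numbers, not anything from the comment):

```python
# Degree of membership in the fuzzy set of bald things; the fuzzy truth value
# of "Bob is bald" is just Bob's membership degree.
def bald_membership(hair_count: int) -> float:
    """1.0 = definitely bald, 0.0 = definitely not bald, in between = fuzzy."""
    if hair_count <= 1_000:
        return 1.0
    if hair_count >= 100_000:
        return 0.0
    return (100_000 - hair_count) / 99_000

print(bald_membership(500))      # 1.0  -> "Bob is bald" is simply true
print(bald_membership(50_000))   # ~0.51 -> a genuinely fuzzy truth value
print(bald_membership(150_000))  # 0.0  -> simply false
```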

Can you help me tease out the difference between language being fuzzy and truth itself being fuzzy?

It's completely impractical to eliminate ambiguity in language, but for most scientific purposes, it seems possible to operationalize important statements into something precise enough to apply Bayesian reasoning to. This is indeed the hard part though. Bayes' theorem is just arithmetic layered on top of carefully crafted hypotheses.
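As a toy illustration of that last point (made-up numbers; the operationalization is doing all the real work):

```python
# Once hypothesis H and evidence E are pinned down precisely, the Bayesian
# update itself is one line of arithmetic.
prior = 0.5            # P(H)
p_e_given_h = 0.9      # P(E | H)
p_e_given_not_h = 0.2  # P(E | not H)

p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
posterior = p_e_given_h * prior / p_e
print(round(posterior, 3))  # 0.818
```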

The claim that the Earth is spherical is neither true nor false in general but usually does fall into a binary if we specify wha... (read more)

6Haiku
I don't find any use for the concept of fuzzy truth, primarily because I don't believe that such a thing meaningfully exists. The fact that I can communicate poorly does not imply that the environment itself is not a very specific way. To better grasp the specific way that things actually are, I should communicate less poorly. Everything is the way that it is, without a moment of regard for what tools (including language) we may use to grasp at it. (In the case of quantum fluctuations, the very specific way that things are involves precise probabilistic states. The reality of superposition does not negate the above.)
3Richard_Ngo
Suppose you have two models of the earth; one is a sphere, one is an ellipsoid. Both are wrong, but they're wrong in different ways. Now, we can operationalize a bunch of different implications of these hypotheses, but most of the time in science the main point of operationalizing the implications is not to choose between two existing models, or because we care directly about the operationalizations, but rather to come up with a new model that combines their benefits.

Synthetically enhancing and/or generating data could be another dimension of scaling. Imagine how much deeper understanding a person/LLM would have if instead of simply reading/training on a source like the Bible N times, they had to annotate it into something more like the Oxford Annotated Bible and that whole process of annotation became training data.

I listened to this via podcast. Audio nitpick: the volume levels were highly imbalanced at times and I had to turn my volume all the way up to hear both speakers well (one was significantly quieter than the other).

Appropriate scaffolding and tool use are other potential levers.

Kudos for referencing actual numbers. I don't think it makes sense to measure humans in terms of tokens, but I don't have a better metric handy. Tokens obviously aren't all equivalent either. For some purposes, a small fast LLM is way more efficient than a human. For something like answering SimpleBench, I'd guess o1-preview is less efficient while still significantly below human performance.

Is this assuming AI will never reach the data efficiency and energy efficiency of human brains? Currently, the best AI we have comes at enormous computing/energy costs, but we know by example that this isn't a physical requirement.

IMO, a plausible story of fast takeoff could involve the frontier of the current paradigm (e.g. GPT-5 + CoT training + extended inference) being used at great cost to help discover a newer paradigm that is several orders of magnitude more efficient, enabling much faster recursive self-improvement cycles.

CoT and inference scaling imply current methods can keep things improving without novel techniques. No one knows what new methods may be discovered and what capabilities they may unlock.

8Vladimir_Nesov
AI does everything faster, including consumption of power. If we compare tokens per joule, counterintuitively LLMs turn out to be cheaper (for now), not more costly.

Any given collection of GPUs working on inference is processing on the order of 100 requests at the same time. So for inference, 16 GPUs (2 nodes of H100s or MI300Xs) with 1500 watts each (counting the fraction of consumption by the whole datacenter) consume 24 kilowatts, but they are generating tokens for 100 LLM instances, each about 300 times faster than the speed of relevant human reasoning token generation (8 hours a day, one token per second). If we divide the 24 kilowatts by 30,000, what we get is about 1 watt. Training cost is roughly comparable to inference cost (across all inference done with a model), so it doesn't completely change this estimate.

An estimate from cost gives similar results. An H100 consumes 1500 watts (as a fraction of the whole datacenter) and costs $4/hour. A million tokens of Llama-3-405B cost $5. A human takes a month to generate a million tokens, which is 750 hours. So the equivalent power consumed by an LLM to generate tokens at human speed is about 2 watts. The human brain consumes 10-30 watts (though for a fair comparison, reducing relevant use to 8 hours a day, this becomes more like 3-10 watts on average).
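For anyone who wants to check the arithmetic, here is a minimal sketch reproducing the two back-of-envelope estimates above (all inputs are the rough figures from the comment, not measurements):

```python
# Power-based estimate: watts per human-speed LLM instance.
gpus = 16                       # 2 inference nodes of H100s / MI300Xs
watts_per_gpu = 1500            # including the datacenter's share of overhead
concurrent_requests = 100
speedup_vs_human = 300          # vs ~1 token/sec for 8 hours a day

total_watts = gpus * watts_per_gpu                            # 24,000 W
human_equivalents = concurrent_requests * speedup_vs_human    # 30,000
print(total_watts / human_equivalents)                        # ~0.8 W

# Cost-based estimate: energy to generate a human-month of tokens.
gpu_dollars_per_hour = 4.0          # H100 rental
dollars_per_million_tokens = 5.0    # Llama-3-405B
human_hours_per_million_tokens = 750

gpu_hours = dollars_per_million_tokens / gpu_dollars_per_hour  # 1.25 h
energy_wh = gpu_hours * watts_per_gpu                          # 1,875 Wh
print(energy_wh / human_hours_per_million_tokens)              # ~2.5 W
```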

It's cool that the score voting input can be post-processed in multiple ways. It would be fascinating to try it out in the real world and see how often Score vs STAR vs BTR winners differ.

One caution with score voting is that you don't want high granularity and lots of candidates or else individual ballots become distinguishable enough that people can prove they voted a particular way (for the purpose of getting compensated). Unless marked ballots are kept private, you'd probably want to keep the options 0-5 instead of 0-9 and only allow candidates above a sufficient threshold of support to be listed.
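To make the post-processing point concrete, here is a minimal sketch (with made-up 0-5 ballots) showing how the same score ballots can yield different Score and STAR winners, plus how quickly the number of distinguishable ballots grows with a finer scale (BTR is omitted for brevity):

```python
from collections import Counter

# Hypothetical 0-5 score ballots: candidate -> score.
ballots = [
    {"A": 5, "B": 0, "C": 1},
    {"A": 5, "B": 0, "C": 2},
    {"A": 3, "B": 4, "C": 0},
    {"A": 3, "B": 4, "C": 1},
    {"A": 3, "B": 4, "C": 0},
]

# Plain Score winner: highest total score.
totals = Counter()
for b in ballots:
    totals.update(b)
score_winner = max(totals, key=totals.get)

# STAR winner: top two by total score, then an automatic runoff in which
# each ballot counts for whichever finalist it scored higher.
(first, _), (second, _) = totals.most_common(2)
runoff = Counter({first: 0, second: 0})
for b in ballots:
    if b[first] > b[second]:
        runoff[first] += 1
    elif b[second] > b[first]:
        runoff[second] += 1
star_winner = max(runoff, key=runoff.get)

print("Score winner:", score_winner)  # A (highest total)
print("STAR winner:", star_winner)    # B (wins the runoff)

# Granularity: distinct possible ballots grow as levels ** candidates, which
# is what makes a finer scale easier to mark in an identifiable way.
for levels in (6, 10):  # 0-5 vs 0-9
    print(f"{levels} levels, 8 candidates: {levels ** 8:,} distinct ballots")
```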

Yes, but with a very different description of the subjective experience -- kind of like getting a sunburn on your back feels very different than most other types of back pain.

Your third paragraph mentions "all AI company staff" and the last refers to "risk evaluators" (i.e. "everyone within these companies charged with sounding the alarm"). Are these groups roughly the same or is the latter subgroup significantly smaller?

4Adam Scholl
I think the latter group is much smaller. I'm not sure who exactly has most influence over risk evaluation, but the most obvious examples are company leadership and safety staff/red-teamers. From what I hear, even those currently receive equity (which seems corroborated by job listings, e.g. Anthropic, DeepMind, OpenAI).
7Raemon
I personally think it's most important to have at least some technical employees who have the knowledge/expertise to evaluate the actual situation, and who also have the power to do something about it. I'd want this to include some people whose primary job is more like "board member" and some people whose primary job is more like "alignment and/or capabilities researcher."

But there is a sense in which I'd feel way more comfortable if all technical AI employees (alignment and capabilities), and policy / strategy people, didn't have profit equity, so they didn't have an incentive to optimize against what was safe. So, there's just a lot of eyes on the problem, and the overall egregore steering the company has one fewer cognitive distortion to manage.

This might be too expensive (OpenAI and Anthropic have a lot of money, but, like, that doesn't mean they can just double everyone's salary).

An idea that occurs to me is to construct "windfall equity", which only pays out if the company or world generates AI that is safely, massively improving the world.

I agree. I would not expect the effect on health over 3 years to be significant outside of specific cases like it allowing someone to afford a critical treatment (e.g. insulin for a diabetic person), especially given the focus on a younger population.

This is a cool paper with an elegant approach!

It reminds me of a post from earlier this year on a similar topic that I highly recommend to anyone reading this post: Ironing Out the Squiggles

OP's model does not resonate with my experience either. For me, it's similar to constantly having the flu (or long COVID) in the sense that you persistently feel bad, and doing anything requires extra effort proportional to the severity of symptoms. The difference is that the symptoms mostly manifest in the brain rather than the body.

1Jalex Stark
that's what the entire post is about?

This is a cool idea in theory, but imagine how it would play out in reality when billions of dollars are at stake. Who decides the damage amount and the probabilities involved and how? Even if these were objectively computable and independent of metaethical uncertainty, the incentives for distorting them would be immense. This only seems feasible when damages and risks are well understood and there is consensus around an agreed-upon causal model.

2jmh
And then we also have the whole moral hazard problem with those types of incentives. Could I put myself at a little risk of some AI damages that might be claimed to have much broader potential?

I also guessed the ratio of the spheres was between 2 and 3 (and clearly larger than 2) by imagining their weight.

I was following along with the post about how we mostly think in terms of surfaces until the orange example. Having peeled many oranges and separated them into sections, they are easy for me to imagine in 3D, and I have only a weak "mind's eye" and moderate 3D spatial reasoning ability.

1silentbob
I find your first point particularly interesting - I always thought that weights are quite hard to estimate and intuit. I mean of course it's quite doable to roughly assess whether one would be able to, say, carry an object or not. But when somebody shows me a random object and I'm supposed to guess the weight, I'm easily off by a factor of 2+, which is much different from e.g. distances (and rather in line with areas and volumes).

Even for people who understand your intended references, that won't prevent them from thinking about the evil-spirit association and having bad vibes.

Being familiar with daemons in the computing context, I perceive the term as whimsical and fairly innocuous.

The section on Chevron Overturned surprised me. Maybe I'm in an echo chamber, but my impression was that most legal scholars (not including the Federalist Society and The Heritage Foundation) consider the decision to be the SCOTUS arrogating yet more power to the judicial branch, overturning 40 years of precedent (which was based on a unanimous decision) without sufficient justification.

I consider the idea that "legislators should never have indulged in writing ambiguous law" rather sophomoric. I don't think it's always possible to write law that is comple... (read more)

This is similar to the quantum suicide thought experiment:

https://en.wikipedia.org/wiki/Quantum_suicide_and_immortality

Check out the Max Tegmark references in particular.

2VictorLJZ
Yep, I have already included this in my post itself.

[Epistemic status: purely anecdotal]

I know people who work in the design and construction of data centers and have heard that some popular data center cities aren't approving nearly as many data centers due to power grid concerns. Apparently, some of the newer data center projects are being designed to include net new power generation to support the data center.

For less anecdotal information, I found this useful: https://sprottetfs.com/insights/sprott-energy-transition-materials-monthly-ais-critical-impact-on-electricity-and-energy-demand/

I can definitely imagine them plausibly believing they're sticking to that commitment, especially with a sprinkle of motivated reasoning. It's "only" incrementally nudging the publicly available SOTA rather than taking bigger steps like GPT2 --> GPT3 --> GPT4.
