All of Nathan Helm-Burger's Comments + Replies

My personal take is that projects where the funder is actively excited about them and understands the work and wants frequent reports tend to get stuff done faster... And considering the circumstances, faster seems good. So I'd recommend supporting something you find interesting and inspiring, and then keep on top of it.

In terms of groups which have their eyes on a variety of unusual and underfunded projects, I recommend both the Foresight Institute and AE Studio.

In terms of specific individuals/projects that are doing novel and interesting things, which ... (read more)

Oh, for sure mammals have emotions much like ours. Fruit flies and shrimp? Not so much. Wrong architecture, missing key pieces.

5AnthonyC
Fair enough.  I do believe it's plausible that feelings, like pain and hunger, may be old and fundamental enough to exist across phyla.  I'm much less inclined to assume emotions are so widely shared, but I wish I could be more sure either way.

I call this phenomenon a "moral illusion". You are engaging empathy circuits on behalf of an imagined other who doesn't exist. Category error. The only unhappiness is in the imaginer, not in the anthropomorphized object. I think this is likely what's going on with the shrimp welfare people also. Maybe shrimp feel something, but I doubt very much that they feel anything like what the worried people project onto them. It's a thorny problem to be sure, since those empathy circuits are pretty important for helping humans not be cruel to other humans.

5AnthonyC
Mostly agreed. I have no idea how to evaluate this for most animals, but I would be very surprised if other mammals did not have subjective experiences analogous to our own for at least some feelings and emotions.
Answer by Nathan Helm-Burger*40

Update: Claude Code with s3.7 has been a significant step up for me. Previously, s3.6 was giving me about a 1.5x speedup, and s3.5 more like 1.2x. CC+s3.7 is solidly over 2x, with periods of more than that when working on easy, well-represented tasks in areas I don't know well myself (e.g. Node.js).

Here's someone who seems to be getting a lot more out of Claude Code though: xjdr

i have upgraded to 4 claude code sessions working in parallel in a single tmux session, each on their own feature branch and then another tmux window with yet another claude in charge of mergi

... (read more)

This is a big deal. I keep bringing this up, and people keep saying, "Well, if that's the case, then everything is hopeless. I can't even begin to imagine how to handle a situation like that."

I do not find this an adequate response. Defeatism is not the answer here.

1Haiku
The answer is to fight as hard as humanly possible right now to get the governments of the world to shut down all frontier AI development immediately. For two years, I have heard no other plan within an order of magnitude of this in terms of viability. I still expect to die by default, but we won't get lucky without a lot of work. CPR only works 10% of the time, but it works 0% of the time when you don't do it.

If what the bad actor is trying to do with the AI is just get a clear set of instructions for a dangerous weapon, and a bit of help debugging lab errors... that costs only a trivial amount of inference compute.

5Vladimir_Nesov
In the paper, not letting weaker actors get access to frontier models and too much compute is the focus of the Nonproliferation chapter. The framing in the paper suggests that in certain respects open weights models don't make nearly as much of a difference. This is useful for distinguishing between various problems that open weights models can cause, as opposed to equally associating all possible problems with them.

Finally got some time to try this. I made a few changes (with my own Claude Code), and now it's working great! Thanks!

This seems quite technologically feasible now, and I expect the outcome would mostly depend on the quality and care that went into the specific implementation. I am even more confident that if the bot's comments get further tuning via feedback, so that initial flaws get corrected, then the bot would quickly (after a few hundred such feedback rounds) get 'good enough' to pass most people's bars for inclusion.
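To make the feedback-tuning step concrete, here is a tiny sketch of what the loop could look like (the file path and fields are hypothetical placeholders of mine; the labelled examples would then feed into fine-tuning or prompt revision for the comment bot):

```python
# Hypothetical sketch: log reader feedback on bot comments so that, after a
# few hundred labelled examples, the comment model can be tuned further.
import json

FEEDBACK_LOG = "bot_comment_feedback.jsonl"  # made-up path

def record_feedback(comment_id: str, comment_text: str, verdict: str, note: str = ""):
    """verdict: 'good' or 'bad'; note: optional explanation from the reader."""
    with open(FEEDBACK_LOG, "a") as f:
        f.write(json.dumps({"id": comment_id, "text": comment_text,
                            "verdict": verdict, "note": note}) + "\n")

def build_tuning_dataset(path: str = FEEDBACK_LOG) -> list[dict]:
    """Collect the labelled examples for fine-tuning or few-shot guidance."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```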

8Thane Ruthenis
We should have empirical evidence about this, actually, since the LW team has been experimenting with a "virtual comments" feature. @Raemon, the EDT issue aside, were the comments any good if you forgot they're written by an LLM? Can you share a few (preferably a lot) examples?

Yes, I was in basically exactly this mindset a year ago. Since then, my hope for a sane controlled transition with humanity's hand on the tiller has been slipping. I now place more hope in a vision with less top-down "yang" (ala Carlsmith) control, and more "green"/"yin". Decentralized contracts, many players bargaining for win-win solutions, a diverse landscape of players messily stumbling forward with conflicting agendas. What if we can have a messy world and make do with well-designed contracts with peer-to-peer enforcement mechanisms? Not a free-for-al... (read more)

I feel the point by Kromem on Xitter really strikes home here.

While I do see benefits of having AIs value humanity, I also worry about this. It feels very nearby trying to create a new caste of people who want what's best for the upper castes with no concern for themselves. This seems like a much trickier philosophical position to support than wanting what's best for Society (including all people, both biological and digital). Even if you and your current employer are being careful to not create any AI that have the necessary qualities of experience such t... (read more)

5evhub
Actually, I'd be inclined to agree with Janus that current AIs probably do already have moral worth—in fact I'd guess more so than most non-human animals—and furthermore I think building AIs with moral worth is good and something we should be aiming for. I also agree that it would be better for AIs to care about all sentient beings—biological/digital/etc.—and that it would probably be bad if we ended up locked into a long-term equilibrium with some sentient beings as a permanent underclass to others. Perhaps the main place where I disagree is that I don't think this is a particularly high-stakes issue right now: if humanity can stay in control in the short-term, and avoid locking anything in, then we can deal with these sorts of long-term questions about how to best organize society post-singularity once the current acute risk period has passed.

Balrog eval has Nethack. I want to see an LLM try to beat that.

Mine is still early 2027. My timeline is unchanged by the weak showing from GPT-4.5, because my timelines were already assuming that scaling would plateau. I was also already taking RL post-training and reasoning into account. This is what I was pointing at with my Manifold Markets about post-training fine-tuning plus scaffolding resulting in a substantial capability jump. My expectation of short timelines is that just something of approximately the current capability of existing SotA models (plus reasoning and research and scaffolds and agentic iterative ... (read more)

I don't think the idea of Superwisdom / Moral RSI requires Moral Realism. Personally, I am a big fan of research being put into a Superwisdom Agenda, but I don't believe in Moral Realism. In fact, I'd be against a project which had (in my view, harmful and incorrect) assumptions about Moral Realism as a core part of its aims.

So I think you should ask yourself whether this is necessarily part of the Superwisdom Agenda, or if you could envision the agenda being at least agnostic about Moral Realism.

4cubefox
Note that Yudkowsky wasn't agnostic about it either, see his theory of moral realism here.
3welfvh
Thanks! Yes, there's lots of convergence between methods, something Joe Carlsmith also wrote about. What do you see as the strongest arguments against Moral Realism?

I mean, suicide seems much more likely to me given the circumstances... but I also wouldn't describe this as compelling evidence. Like, if he had been killed and there wasn't a fight, him being drunk makes sense as a way for someone planning to kill him to have rendered him helpless. Similarly, wouldn't a cold-blooded killer be expected to be wearing gloves and to place Suchir's hand on the gun before shooting him?

6Campbell Hutcheson
I agree that a skilled murderer could try to make the death look like suicide, but each of the places where the murderer would need to make the death look like a suicide would add an additional failure point with a greater chance of producing some inconsistency. On Suchir being drunk, according to his parents, he came back from a birthday trip to LA with his friends on Friday. So, this might explain why he was drunk. We don't know exactly when he got back though / whether he was drunk when he got back / whether he got drunk afterwards.

Nice to see my team's work (Tice 2024) getting used!

4Cam
Great to see it come full circle. For the sake of nostalgia, here's the original thread that jump started the project.

Not always true. Sometimes the locks are 'real' but deliberately chosen to be easy to pick, and the magician practices picking that particular lock. This doesn't change the point much, which is that watching stage magicians is not a good way to get an idea of how hard it is to do X, for basically any value of X. LockPickingLawyer on YouTube is a fun way to learn about locks.

Desired AI safety tool: a combo translator/chat interface (e.g. a custom webpage) split down the middle. On one side I can type in English and receive English translations. On the other side is a model (I give a model name, host address, and API key). The model receives all my text translated (somehow) into a language of my specification. All the model's outputs are displayed raw on the 'model' side, and then translated to English on 'my' side.

Use case: exploring and red teaming models in languages other than English
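
A minimal sketch of the core loop such a tool would need, assuming an OpenAI-compatible chat endpoint and a `translate` helper (both placeholders I'm introducing for illustration, not an existing implementation):

```python
# Sketch of the translator/chat red-teaming loop described above.
# Assumptions (mine, not from the post): an OpenAI-compatible
# /chat/completions endpoint and some translation backend.
import requests

def translate(text: str, source: str, target: str) -> str:
    # Placeholder: swap in any translation backend (another LLM,
    # a local model, or a translation API).
    raise NotImplementedError

def chat_in_language(api_base: str, api_key: str, model: str, target_lang: str):
    history = []
    while True:
        english_in = input("you (English)> ")
        history.append({"role": "user",
                        "content": translate(english_in, "en", target_lang)})
        resp = requests.post(
            f"{api_base}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "messages": history},
            timeout=120,
        )
        foreign_out = resp.json()["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": foreign_out})
        # 'Model' side: raw output in the target language.
        print(f"[{target_lang}] {foreign_out}")
        # 'My' side: the same output translated back to English.
        print(f"[en] {translate(foreign_out, target_lang, 'en')}")
```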

4Abhinav Pola
Courtesy of Claude Code ;) https://github.com/abhinavpola/crosstalk

Another take on the plausibility of RSI: https://x.com/jam3scampbell/status/1892521791282614643

(I think RSI soon will be a huge deal)

Have you noticed that AI companies have been opening offices in Switzerland recently? I'm excited about it.

1sanyer
Yes, I've heard about it (I'm based in Switzerland myself!). I don't think it changes the situation that much, though, since OpenAI, Anthropic, and Google are still mostly American-owned companies.

This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn't want to publicly publish a benchmark which gave a laser-focused "how to be super dangerous" score. We aimed for a fuzzier decision boundary. This brought criticism from experts at the labs who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.

Also worth considering is that how much an "institution" holds a view on average may not matter nearly as much as how the powerful decision makers within or above that institution feel.

There are a lot of possible plans which I can imagine some group feasibly having which would meet one of the following criteria:

  1. Contains critical elements which are illegal.
  2. Contains critical elements which depend on an element of surprise / misdirection.
  3. Benefits from the actor being first mover on the plan. Others can copy the strategy, but can't lead.

If one of these criteria or similar applies to the plan, then you can't discuss it openly without sabotaging it. Making strategic plans with all your cards laid out on the table (while others hide theirs) makes things substantially harder.

4ozziegooen
I partially agree, but I think this must only be a small part of the issue.
- I think there's a whole lot of key insights people could raise that aren't info-hazards.
- If secrecy were the main factor, I'd hope that there would be some access-controlled message boards or similar. I'd want the discussion to be intentionally happening somewhere. Right now I don't really think that's happening. I think a lot of tiny groups have their own personal ideas, but there's surprisingly little systematic and private thinking between the power players.
- I think that secrecy is often an excuse not to open ideas to feedback, and thus not be open to critique. Often, from what I see, this goes hand-in-hand with "our work just really isn't that great, but we don't want to admit it."
In the last 8 years or so, I've kept on hoping there would be some secret and brilliant "master plan" around EA, explaining the lack of public strategy. I have yet to find one. The closest I know of is some over-time discussion and Slack threads with people at Constellation and similar - I think these are interesting in terms of understanding the perspectives of these (powerful) people, but I don't get the impression that there's all too much comprehensiveness of genius that's being hidden.
That said, I think that policy orgs need to be very secretive, so I agree with you regarding why those orgs don't write more big-picture things.

A point in favor of evals being helpful for advancing AI capabilities: https://x.com/polynoamial/status/1887561611046756740

Noam Brown (@polynoamial): A lot of grad students have asked me how they can best contribute to the field of AI when they are short on GPUs, and making better evals is one thing I consistently point to.

It has been pretty clearly announced to the world by various tech leaders that they are explicitly spending billions of dollars to produce "new minds vastly smarter than any person, which pose double-digit risk of killing everyone on Earth". This pronouncement has not yet incited riots. I feel like discussing whether Anthropic should be on the riot-target-list is a conversation that should happen after the OpenAI/Microsoft, DeepMind/Google, and Chinese datacenters have been burnt to the ground.

Once those datacenters have been reduced to rubble, and the chi... (read more)

People have said that to get a good prompt it's better to have a discussion with a model like o3-mini, o1, or Claude first, and clarify various details about what you are imagining, then give the whole conversation as a prompt to OA Deep Research.

Fair enough. I'm frustrated and worried, and should have phrased that more neutrally. I wanted to make stronger arguments for my point, and then partway through my comment realized I didn't feel good about sharing my thoughts.

I think the best I can do is gesture at strategy games that involve private information and strategic deception like Diplomacy and Stratego and MtG and Poker, and say that in situations with high stakes and politics and hidden information, perhaps don't take all moves made by all players at literally face value. Think a bit to yoursel... (read more)

8Mikhail Samin
The private data is, pretty consistently, Anthropic being very similar to OpenAI where it matters the most and failing to mention in private policy-related settings its publicly stated belief on the risk that smarter-than-human AI will kill everyone. 

I don't believe the nuclear bomb was truly built to not be used from the point of view of the US gov. I think that was just a lie to manipulate scientists who might otherwise have been unwilling to help.

I don't think any of the AI builders are anywhere close to "building AI not to be used". This seems even more clear than with nuclear, since AI has clear beneficial peacetime economically valuable uses.

Regulation does make things worse if you believe the regulation will fail to work as intended for one reason or another. For example, my argument that puttin... (read more)

I don't feel free to share my model, unfortunately. Hopefully someone else will chime in. I agree with your point and that this is a good question!

I am not trying to say I am certain that Anthropic is going to be net positive, just that that's my view as the higher probability.

7ozziegooen
I think it's totally fine to think that Anthropic is a net positive. Personally, right now, I broadly also think it's a net positive. I have friends on both sides of this. I'd flag though that your previous comment suggested more to me than "this is just you giving your probability":
> Give me your model, with numbers, that shows supporting Anthropic to be a bad bet, or admit you are confused and that you don't actually have good advice to give anyone.
I feel like there are much nicer ways to phrase that last bit. I suspect that this is much of the reason you got disagreement points.

I'm pretty sure that measures of the persuasiveness of a model which focus on text are going to greatly underestimate the true potential of future powerful AI.

I think a future powerful AI would need different inputs and outputs to perform at maximum persuasiveness.

Inputs

  • speech audio in
  • live video of target's face (allows for micro-expression detection, pupil dilation, gaze tracking, blood flow and heart rate tracking)
  • EEG signal would help, but is too much to expect for most cases
  • sufficiently long interaction to experiment with the individual and build
... (read more)
4Milan W
This is why I consider it bad informational hygiene to interact with current models in any modality besides text. Why pull the plug now instead of later? To prevent frog-boiling.

Well, or as is often the case, the people arguing against changes are intentionally exploiting loopholes and don't want their valuable loopholes removed.

4Screwtape
Yep, in what's possibly an excess of charity/politeness I sure was glossing "exploiting loopholes and don't want their valuable loopholes removed" as one example of where someone was having an unusual benefit. 

I don't like the idea. Here's an alternative I'd like to propose:

AI mentoring

After a user gets a post or comment rejected, have them be given the opportunity to rewrite and resubmit it with the help of an AI mentor. The AI mentor should be able to give reasonably accurate feedback, and won't accept the revision until it is clearly above a quality line.
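
A rough sketch of the resubmission loop I have in mind (the judge function, quality threshold, and round limit below are placeholder choices of mine, not a worked-out design):

```python
# Rough sketch of the "AI mentor" resubmission loop described above.
# Assumption (illustrative only): judge_comment is backed by an LLM
# prompted with site-specific quality guidelines.
from dataclasses import dataclass

@dataclass
class Review:
    score: float   # estimated quality, 0.0 - 1.0
    feedback: str  # specific suggestions for the author

def judge_comment(draft: str) -> Review:
    # Placeholder: call an LLM judge and parse its verdict into a Review.
    raise NotImplementedError

def mentor_loop(initial_draft: str, quality_bar: float = 0.7, max_rounds: int = 5):
    draft = initial_draft
    for _ in range(max_rounds):
        review = judge_comment(draft)
        if review.score >= quality_bar:
            return draft  # clearly above the line; allow resubmission
        print("Mentor feedback:", review.feedback)
        draft = input("Revised draft:\n")
    return None  # still below the bar; leave it to human moderators
```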

I don't think this is currently easy to make (well), because I think it would be too hard to get current LLMs to be sufficiently accurate in LessWrong specific quality judgement and advice. If, at some poin... (read more)

3Knight Lee
I like it; it is worth a try because it could be very helpful if it works! A possible objection is that "you can't mentor others on something you suck at yourself," and this would require AGI capable of making valuable LessWrong comments themselves, which may be comparably hard to automating AI research (considering the math/programming advantages of LLMs). This objection doesn't doom your idea, because even if the AI is bad at writing valuable comments, and bad at judging valuable comments written by itself, it may be good at judging the failure modes where a human writes a bad comment. It could still work and is worth a try!

Worth taking model wrapper products into account.

For example:

6Matt Goldenberg
I would REALLY like to see some head to head comparisons with you.com from a subject matter expert, which I think would go a long way in answering this question.
4Davidmanheim
Clarifying question: How, specifically? Do you mean Perplexity using the new model, or comparing the new model to Perplexity?

I think the correct way to address this is by also testing the other models with agent scaffolds that supply web search and a python interpreter.

I think it's wrong to jump to the conclusion that non-agent-finetuned models can't benefit from tools.
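
As a rough illustration of what I mean by a scaffold, here is a minimal sketch (the text protocol and the `call_model` / `web_search` wrappers are my own placeholders, not any lab's actual harness):

```python
# Minimal sketch of a tool-use scaffold for eval runs.
# Assumptions (illustrative only): call_model wraps whatever chat API is
# being evaluated; web_search wraps any search backend. The "TOOL:" text
# protocol is a toy convention.
import contextlib, io

def call_model(messages: list[dict]) -> str:
    raise NotImplementedError  # chat call for the model under test

def web_search(query: str) -> str:
    raise NotImplementedError  # any search API, returning snippets as text

def run_python(code: str) -> str:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # sandbox properly in real use
    return buf.getvalue()

def solve(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "system",
                 "content": "Use 'TOOL: search <query>' or 'TOOL: python <code>' "
                            "to call tools; otherwise give a final answer."},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("TOOL: search "):
            result = web_search(reply[len("TOOL: search "):])
        elif reply.startswith("TOOL: python "):
            result = run_python(reply[len("TOOL: python "):])
        else:
            return reply  # final answer
        messages.append({"role": "user", "content": f"TOOL RESULT:\n{result}"})
    return "step limit reached"
```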


See for example:

Frontier Math result

https://x.com/Justin_Halford_/status/1885547672108511281

o3-mini got 32% on Frontier Math (!) when given access to use a Python tool. In an AMA, @kevinweil / @snsf (OAI) both referenced tool use w reasoning models incl retrieval (!) as a future rollout.

METR RE-bench

Model... (read more)

Good work, thanks for doing this.

For future work, you might consider looking into inference suppliers like Hyperdimensional for DeepSeek models.

Well, I upvoted your comment, which I think adds important nuance. I will also edit my shortform to explicitly say to check your comment. Hopefully, the combination of the two is not too misleading. Please add more thoughts as they occur to you about how better to frame this.

Yeah, I just found a Cerebras post which claims 2100 serial tokens/sec.

Yeah, of course. Just trying to get some kind of rough idea at what point future systems will be starting from.

4Nick_Tarleton
I don't think it's an outright meaningless comparison, but I think it's bad enough that it feels misleading or net-negative-for-discourse to describe it the way your comment did. Not sure how to unpack that feeling further.

Oops, bamboozled. Thanks, I'll look into it more and edit accordingly.

[Edit 2: faaaaaaast. https://x.com/jrysana/status/1902194419190706667 ] [Edit: Please also see Nick's reply below for ways in which this framing lacks nuance and may be misleading if taken at face value.]

https://blogs.nvidia.com/blog/deepseek-r1-nim-microservice/

The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system.

[Edit: that's throughput including parallel batches, not serial speed! Sorry, my mistake.

Here's a claim from Cerebras of 2100 tokens/sec serial speed on Llama 80B. https://cerebras.ai/b... (read more)
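
A toy calculation of why the two numbers aren't comparable (the batch size below is made up; only the 3,872 tok/s figure comes from the NVIDIA post):

```python
# Throughput vs. serial speed: batch size is a made-up illustration.
total_throughput = 3872    # tokens/sec summed across parallel requests (NVIDIA figure)
concurrent_requests = 64   # hypothetical batch size
serial_speed = total_throughput / concurrent_requests
print(f"~{serial_speed:.0f} tokens/sec seen by any single request")  # ~60 tok/s
```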

4JBlack
Isn't that 110 tokens/min, or about 2 tokens/sec? (I think the tokens/word might be words/token, too)
4Nick_Tarleton
I don't see how it's possible to make a useful comparison this way; human and LLM ability profiles, and just the nature of what they're doing, are too different. An LLM can one-shot tasks that a human would need non-typing time to think about, so in that sense this underestimates the difference, but on a task that's easy for a human but the LLM can only do with a long chain of thought, it overestimates the difference. Put differently: the things that LLMs can do with one shot and no CoT imply that they can do a whole lot of cognitive work in a single forward pass, maybe a lot more than a human can ever do in the time it takes to type one word. But that cognitive work doesn't compound like a human's; it has to pass through the bottleneck of a single token, and be substantially repeated on each future token (at least without modifications like Coconut). (Edit: The last sentence isn't quite right — KV caching means the work doesn't have to all be recomputed, though I would still say it doesn't compound.)
4ryan_greenblatt
This is overall output throughput not latency (which would be output tokens per second for a single context). This just claims that you can run a bunch of parallel instances of R1.

How much of their original capital did the French nobility retain at the end of the French revolution?

How much capital (value of territorial extent) do chimpanzees retain now as compared to 20k years ago?

3LTM
I agree that in a takeover scenario where AI capabilities rush wildly ahead of human understanding or control, the ability of the world's second species to retain exclusive resource access will be limited. This is a plausible future, but it is not the only one. A lot of effort is, and very likely will continue to be, directed towards controlling frontier AI and making it as economically beneficial for its owners as possible. If this work bears fruit, a world where AI is made by people with capital for people with capital seems very likely.  The French nobility had their capital taken by force by revolutionaries in a violent takeover - the kind which AI may execute. An analogy for the kind of situation I am concerned about would be some portion of the French nobility making the laws so complex, the financial structure of the kingdom so riddled with loopholes, that the other nobles were disenfranchised by due process rather than violence. Taking this analogy further, the French king had to care about the nobility, and the nobility about the peasantry, in part because of their productive capacity. As described in The Intelligence Curse, when AGI renders your work valueless, the incentive for the state to invest in your welfare greatly decreases.

Anthropic people had also said approximately this publicly: that it's too soon to make the rules, since we'd end up misspecifying them due to ignorance of tomorrow's models.

7Zac Hatfield-Dodds
There's a big difference between regulation which says roughly "you must have something like an RSP", and regulation which says "you must follow these specific RSP-like requirements", and I think Mikhail is talking about the latter. I personally think the former is a good idea, and thus supported SB-1047 along with many other lab employees. It's also pretty clear to me that locking in circa-2023 thinking about RSPs would have been a serious mistake, and so I (along with many others) am generally against very specific regulations because we expect they would on net increase catastrophic risk.

Some brief reference definitions for clarifying conversations.

Consciousness:

  1. The state of being awake and aware of one's environment and existence
  2. The capacity for subjective experience and inner mental states
  3. The integrated system of all mental processes, both conscious and unconscious
  4. The "what it's like" to experience something from a first-person perspective
  5. The global workspace where different mental processes come together into awareness

Sentient:

  1. Able to have subjective sensory experiences and feelings. Having the capacity for basic emotional res
... (read more)

I have been discussing thoughts along these lines. My essay A Path to Human Autonomy argues that we need to slow AI progress and speed up human intelligence progress. My plan for how to accomplish slowing AI progress is to use novel decentralized governance mechanisms aided by narrow AI tools. I am working on fleshing out these governance ideas in a doc. Happy to share.

Well... One problem here is that a model could be superhuman at:

  • thinking speed
  • math
  • programming
  • flight simulators
  • self-replication
  • cyberattacks
  • strategy games
  • acquiring and regurgitating relevant information from science articles

And be merely high-human-level at:

  • persuasion
  • deception
  • real world strategic planning
  • manipulating robotic actuators
  • developing weapons (e.g. bioweapons)
  • wetlab work
  • research
  • acquiring resources
  • avoiding government detection of its illicit activities

Such an entity as described could absolutely be an existential threat to hum... (read more)

4Thane Ruthenis
I agree. I think you don't even need most of the stuff on the "superhuman" list, the equivalent of a competent IQ-130 human upload probably does it, as long as it has the speed + self-copying advantages.