Yonatan Cale


Hey,

On antitrust laws, see this comment. I also hope to have more to share soon.

I asked Claude how relevant this is to protecting something like an H100; here are the parts that seem most relevant, given my limited understanding:

 

What the paper actually demonstrates:

1. Reading (not modifying) data from antifuse memory in a Raspberry Pi RP2350 microcontroller
2. Using Focused Ion Beam (FIB) and passive voltage contrast to extract information
 

Key differences between this and modifying an H100 GPU:

  1. 3D Transistor Structures: Modern 5nm chips use FinFET or GAAFET 3D structures rather than planar transistors. The critical parts are buried within the structure, making them fundamentally more difficult to access without destroying them.
  2. Atomic-Scale Limitations: At 5nm, we're approaching atomic limits (silicon atoms are ~0.2nm). The physics of matter at this scale creates fundamental boundaries that better equipment cannot overcome.
  3. Ion Beam Physics: Even with perfect equipment, ion beams create interaction volumes and damage zones that become proportionally larger compared to the target features at smaller nodes.

Thanks! Is this true for a somewhat-modern chip that makes at least some slight attempt at defense, or more like the chip on a Raspberry Pi?

(Could you link to the context?)

Patching security problems in big old organizations involves problems that go well beyond "looking at code and changing it", especially if aiming for a "strong" solution like formal verification.

TL;DR: Political problems, code that makes no sense, and problems that would be easy to fix even with a simple LLM that isn't specialized in improving security.

 

The best public resource I know of on this is Recoding America.

Some examples iirc:

  1. Not having a clear primary key to identify people with.
  2. Having a website (a form) that theoretically works but doesn't run on any browser that people actually use.
  3. Having a security component that is supposed to catch fake form submissions but is far more likely to catch real ones (imo it's net negative).

 

I also learned some surprising things from working on fixing/rewriting systems at a major bank in Israel. I can't publicly share stories as juicy as Recoding America's, but here are some that I can:

  1. "written in kobol" is maybe ~1% of the problem and imo not an interesting pain point to focus on
  2. Many systems are microservices (much harder to define the expected functionality of an async system)
    1. Different parts of the bank are written using different technologies
  3. An example problem you'd find in the code is "representing amounts of money using JavaScript floats" (see the sketch after this list)
    1. It's not complicated to fix, technically
    2. But people might say "we're used to using floats, why change it?"
    3. Making changes might mean they'll have to run manual tests, so they'll be against it
    4. Some teams just don't like having others touch their code, and maybe some care about their job security
  4. Another example of an easy fix is "use enums" (also in the sketch below)
    1. I've seen more than one conversation where "agreeing to use an enum" was a big political debate
  5. Sometimes, to understand a system, I'd do a combination of "looking at the code", "looking at the docs (word documents with the design)", and "talking to one or more of the people in charge of the system" (sometimes different people have different parts of the picture)
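
To make the floats and enums points concrete, here's a minimal TypeScript sketch; all names (Agorot, TransferStatus, etc.) are hypothetical illustrations, not from any real bank codebase:

```typescript
// Why "money as floats" is a bug: binary floats can't represent most
// decimal fractions exactly, so small errors accumulate.
console.log(0.1 + 0.2 === 0.3); // false: 0.1 + 0.2 === 0.30000000000000004

// Fix: store amounts as an integer count of the smallest unit (agorot/cents).
type Agorot = number; // by convention, always an integer

function addAmounts(a: Agorot, b: Agorot): Agorot {
  return a + b; // exact, as long as totals stay below Number.MAX_SAFE_INTEGER
}

// "Use enums": replaces magic strings scattered through the codebase,
// so a typo becomes a compile error instead of a silent runtime bug.
enum TransferStatus {
  Pending = "PENDING",
  Approved = "APPROVED",
  Rejected = "REJECTED",
}

function canExecute(status: TransferStatus): boolean {
  return status === TransferStatus.Approved;
}
```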

 

[written with the hope that orgs trying to patch security problems will do well]

I want the tool to proactively suggest things while I'm working on the document, optimizing for "low friction for getting lots of comments from the LLM". The tool you suggested does optimize for this property very well.

  1. This is very cool, thanks!
    1. I'm tempted to add Claude support
  2. It isn't exactly what I'm going for. Example use cases I have in mind:
    1. "Here's a list of projects I'm considering working on, and I'm adding curxes/considerations for each"
    2. "Here's my new alignment research agenda" (can an AI suggest places where this research is wrong? Seems like checking this would help the Control agenda?)
    3. "Here's a cost-effectiveness analysis of an org"

Things I'd suggest to an AI lab CISO if we had 5 minutes to talk

 

1 minute version:

  • I think there are projects that could prepare the lab for moving to an air-gapped network (protecting more than model weights), would be useful to start early, would have minimal impact on developer productivity, and could (to some extent) be delegated[1]

 

Extra 4 minutes:

Example categories of such projects:

  1. Projects that take serial time but can be done without the final stage that actually hurts developer productivity
    1. Toy example: Add extra Ethernet cables to the building but don't use them yet
  2. Reduce uncertainty about the problems that will be caused by a future security measure
    1. Toy example: Prepare for a (partially?) air-gapped network by monitoring (with consent[2]) which domains employees use and finding alternatives to them (see the sketch after this list), e.g.:
      1. Wikipedia --> Download it
      2. Social media --> Buy some employees a personal use computer, see if they like it?
      3. ... each domain becomes a project to prioritize and delegate, hopefully
  3. Projects that require "product market fit" with the engineers
    1. Toy example: The lab wants a secure[3] way to access model weights[4]. They can try an MVP solution (GitHub PRs?), get user feedback ("too much friction!"), and work on the next draft while the users go back to accessing the weights however they want.
      1. Note how much more it would hurt productivity if we waited on this project until security became critical and then had to force the engineers to use whatever solution we could come up with quickly. This is a common property of many of the projects I'd suggest.
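
As a sketch of the domain-monitoring idea (TypeScript; the one-URL-per-line format and the `proxy.log` filename are assumptions for illustration, not any real tool's output):

```typescript
// Tally which external domains employees actually use, from an opt-in
// proxy log (assumed format: one requested URL per line).
import { readFileSync } from "node:fs";

function tallyDomains(logPath: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of readFileSync(logPath, "utf8").split("\n")) {
    const url = line.trim();
    if (!url) continue;
    try {
      const domain = new URL(url).hostname;
      counts.set(domain, (counts.get(domain) ?? 0) + 1);
    } catch {
      // skip malformed lines rather than crash on them
    }
  }
  return counts;
}

// Most-used domains first; each one becomes a candidate
// "find an offline alternative" project to prioritize and delegate.
const byFrequency = [...tallyDomains("proxy.log").entries()]
  .sort(([, a], [, b]) => b - a);
for (const [domain, count] of byFrequency) {
  console.log(`${count}\t${domain}`);
}
```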

 

  1. ^

    I'm assuming the CISO's team has limited focus, but spending this focus on delegating projects is a good deal. I'm also assuming this is a problem they're happy to solve with money.

  2. ^

    I endorse communicating why you want to do this and getting employee agreement, not just silently monitoring them.

  3. ^

    e.g. monitored

  4. ^

    I'm aware this example is more focused on model weights, but it felt shorter to write than other product-market-fit examples. E.g., I think "experiment with opening a new office for employees who like to WFH" is more realistic for an air-gapped network, but would have taken longer to explain.

I'm looking for an AI tool which feels like Google Docs but has an LLM proactively commenting/suggesting things.

(Is anyone else interested in something like this?)

This post helped me notice I have incoherent beliefs:

  1. "If MAGMA self-destructs, the other labs would look at it with confusion/pity and keep going. That's not a plan"
  2. "MAGMA should self-destruct now even if it's not leading!"

I think I've been avoiding thinking about this.

 

So what do I actually expect?

If OpenAI (currently in the lead) said "our AI did something extremely dangerous, this isn't something we know how to contain, we are shutting down, we are calling on other labs NOT to train over [amount of compute], we are not discussing the algorithm publicly for fear the open source community will do this dangerous thing, and we need the government ASAP", do I expect that to help?

Maybe?

Probably nation-states will steal all the models+algorithms+Slack as quickly as they can, and probably a huge open-source movement will protest, but it still sounds possible (15%?) that the major important actors would listen, especially if it was accompanied by demos or similar.

 

What if Anthropic or xAI or DeepSeek (not currently in the lead) shut down now?

...I think they would be ignored.

 

Does that imply I should help advance the capabilities of the lab most likely to act as you suggest?

Does this imply I should become a major player myself, if I can? If so, should I write on my website that I'm open to a coordinated pause?

Should I give up on being a CooperateBot, given the other players have made it so overwhelmingly clear they are happy to defect?

 

This is painful to think about, and I'm not sure what the right thing to do here is.

Open to ideas from anyone.

 

Anyway, great post, thanks!
