I asked Claude how relevant this is to protecting something like an H100. Here are the parts that seem most relevant, from my limited understanding:
1. Reading (not modifying) data from antifuse memory in a Raspberry Pi RP2350 microcontroller
2. Using Focused Ion Beam (FIB) and passive voltage contrast to extract information
Patching security problems in big old organizations involves problems that go well beyond "looking at code and changing it", especially if you're aiming for a "strong" solution like formal verification.
TL;DR: Political problems, code that makes no sense, problems that would be easy to fix even with a simple LLM that isn't specialized in improving security.
The best public resource I know of about this is Recoding America.
Some examples iirc:
Things I'd suggest to an AI lab CISO if we had 5 minutes to talk
Example categories of such projects:
This post helped me notice I have incoherent beliefs:
I think I've been avoiding thinking about this.
So what do I actually expect?
If OpenAI (currently in the lead) said "our AI did something extremely dangerous, this isn't something we know how to contain, we are shutting down and are calling other labs NOT to train over [amount of compute], and are not discussing the algorithm p...
OpenAI already has this in their charter:
We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project. We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”
Similar opinion here, also noting they didn't run red-teaming and persuasion evals on the actually-final-version:
I think Control has similar problems to RLHF, where both might fail to generalize out of distribution in similar ways.
This seems important because Control has a central goal of being a fallback for that kind of failure mode.
I want to share my thoughts, including something nice Control does about this problem that I think RLHF could easily incorporate (investigate failures instead of always training against them).
What do I mean by Control generalizing out of distribution:
Our Control experiment might involve giving a model 100 leetcode problems, ...
An opinion from a former lawyer
[disclaimers: they're not an antitrust lawyer and definitely don't take responsibility for this opinion, nor do I. This all might well be wrong and we'd need to speak to an actual antitrust lawyer to get certainty. I'm not going to put any more disclaimers here; I hope I'm not also misremembering something]
So,
To the extent you think Anthropic is a good actor, you shouldn't be pressuring good actors like them to take actions that will make them differentially less competitive than worse actors
I think an important part of how one becomes (and stays) a good actor is by being transparent about things like this.
Anthropic could at least avoid making substantially misleading statements
Yes. But also, I'm afraid that Anthropic might solve this problem by just making fewer statements (which seems bad). Still, yes.
Hey,
In the article, you measured the MONA setup against a pure RL agent.
I'm curious about measuring MONA against the overseer-AI picking the next move directly[1]: The overseer-AI probably won't[2] reward hack more than the MONA setup, but it's unclear to me if it would also have worse performance.
I could imagine discovering the myopic MONA agent converging on
nit: I wouldn't use a prediction market as an overseer because markets are often uninterpretable to humans, which would miss some of the point[1].
"we show how to get agents whose long-term plans follow strategies that humans can predict". But maybe no single human actually understands the strategy. Or maybe the traders are correctly guessing that the model's steps will somehow lead to whatever is defined as a "good outcome", even if they don't understand how, which has similar problems to the RL reward from the future that you're trying to avoid.
For a simple task like booking a restaurant, we could just ask the (frozen) overseer-AI to pick[1] actions, no?
The interesting application of MONA seems to be when the myopic RL agent is able to produce better suggestions than the overseer
Edit: I elaborated
Plus maybe let the overseer observe the result and say "oops" and roll back that action, if we can implement a rollback in this context
US Gov isn't likely to sign: Seems right.
OpenAI isn't likely to sign: Seems right.
Still, I think this letter has value, especially if it includes something like "P.S. We're writing this letter because we think if everyone keeps racing then there's a noticeable risk of everyone dying. We think it would be worse if only we stop, but having everyone stop would be the safest, and we think this opinion of ours should be known publicly"
Dario said (Nov 2024):
"I never liked those words [P(DOOM)], I think they're kinda wired, my view is we should measure risks as they come up, and in the meantime we should get all the economic benefits that we could get. And we should find ways to measure the risks that are effective, but are minimally disruptive to the amazing economic process that we see going on now that the last thing we want to do is slow down" [lightly edited for clarity, bold is mine]
If he doesn't believe this[1], I think he should clarify.
Hopefully cold take: People should say what ...
Changing the logic of chips is possible:
https://www.nanoscopeservices.co.uk/fib-circuit-edit/
h/t @Jonathan_H from TamperSec
Open question: How expensive is this, and specifically, can it be done at scale for the chips of an entire data center?
I don't understand Control as aiming to align a superintelligence:
We could also ask if these situations exist ("is there any funder you have that you didn't disclose?" and so on, especially around NDAs), and Epoch could respond with Yes/No/Can'tReply[1].
Also seems relevant for other orgs.
This would only patch the kind of problems we can easily think about, but it seems to me like a good start
I learned that trick from hpmor!
Phishing emails might have bad text on purpose, so that security-aware people won't click through, because the next stage often involves speaking to a human scammer who prefers only speaking to people[1] who have no idea how to avoid scams.
(did you ever wonder why the generic phishing SMS you got was so bad? Couldn't they proofread their one SMS? Well, sometimes they can't, but sometimes it's probably on purpose)
This tradeoff could change if AIs could automate the stage of "speaking to a human scammer".
But also, if that stage isn't automated, then I'...
My drafty[1] notes trying to understand AI Control; friendly corrections are welcome:
“Control” is separate from “Alignment”: In Alignment, you try to get the model to have your values. “Control” assumes[2] the alignment efforts failed, and that the model is sometimes helping out (as it was trained to do), but it wants something else, and it might try to “betray” you at any moment (aka scheme).
The high level goal is to still get useful work out of this AI.
[what to do with this work? below]
In scope: AIs that could potentially cause seri...
Do we want to put out a letter for labs to consider signing, saying something like "if all other labs sign this letter then we'll stop"?
I heard lots of lab employees hope the other labs would slow down.
I'm not saying this is likely to work, but it seems easy and maybe we can try the easy thing? We might end up with a variation like "if all other labs sign this AND someone gets capability X AND this agreement will be enforced by Y, then we'll stop until all the labs who signed this agree it's safe to continue". Or something else. It would be nic...
If verification is placed sufficiently all over the place physically, it probably can't be circumvented
Thanks! Could you say more about your confidence in this?
the chip needs some sort of persistent internal clocks or counters that can't be reset
Yes, specifically I don't want an attacker to reliably be able to reset it to whatever value it had when it sent the last challenge.
If the attacker can only reset this memory to 0 (for example, by unplugging it), then the chip can notice that's suspicious.
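To make this concrete, here's a minimal sketch of the kind of check I have in mind. The names, the message format, and the shared HMAC key are my own illustrative assumptions (a real design would presumably use asymmetric signatures and hardware-backed counters), not any real GPU interface:

```python
# Minimal sketch (illustrative assumptions, not a real GPU interface) of a
# challenge/response check backed by a persistent monotonic counter.

import hashlib
import hmac
import os
import struct

SHARED_KEY = b"demo-key"  # placeholder for the central authority's signing key


class ChipVerifier:
    def __init__(self, persistent_counter: int):
        # The counter lives in persistent memory and is set to 1 at provisioning,
        # so reading 0 later means the memory was wiped (e.g. by cutting power).
        self.counter = persistent_counter
        self.pending = None

    def new_challenge(self) -> bytes:
        if self.counter == 0:
            # The suspicious case: refuse to run until re-provisioned.
            raise RuntimeError("counter reset detected; refusing to run")
        self.counter += 1  # a real chip writes this back to persistent memory
        self.pending = struct.pack(">Q", self.counter) + os.urandom(16)
        return self.pending

    def accept(self, response: bytes) -> bool:
        # The authority signs the exact challenge it was sent, so replaying an
        # old response fails once the counter (and nonce) have moved on.
        expected = hmac.new(SHARED_KEY, self.pending, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def central_authority_sign(challenge: bytes) -> bytes:
    # The authority checks its policy (off-switch not triggered, etc.) and, if
    # everything looks fine, authorizes this specific challenge.
    return hmac.new(SHARED_KEY, challenge, hashlib.sha256).digest()


chip = ChipVerifier(persistent_counter=1)
challenge = chip.new_challenge()
assert chip.accept(central_authority_sign(challenge))                  # normal operation
assert not chip.accept(central_authority_sign(challenge[:-1] + b"x"))  # tampered challenge rejected
```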
Another option is a reliable wall clock (though this ...
I agree that we won't need full video streaming; it could be compressed (most of the screen doesn't change most of the time), but I gave that as an upper bound.
If you still run local computation, you lose out on some of the advantages I mentioned.
(If remote vscode is enough for someone, I definitely won't be pushing back)
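For intuition on why full video streaming is only the upper bound, here's a back-of-the-envelope comparison. The numbers (1080p60, ~5 Mbit/s for compressed video, ~0.5 Mbit/s for a mostly-static desktop) are my rough assumptions, not measurements:

```python
# Back-of-the-envelope bandwidth numbers (rough assumptions, not measurements).

width, height, bytes_per_pixel, fps = 1920, 1080, 3, 60

uncompressed_mbit_s = width * height * bytes_per_pixel * fps * 8 / 1e6
compressed_video_mbit_s = 5         # a common rate for a 1080p60 H.264-ish stream
mostly_static_desktop_mbit_s = 0.5  # only screen diffs when little is changing

print(f"uncompressed stream:   ~{uncompressed_mbit_s:,.0f} Mbit/s")  # ~2,986 Mbit/s
print(f"compressed video:      ~{compressed_video_mbit_s} Mbit/s")
print(f"mostly-static desktop: ~{mostly_static_desktop_mbit_s} Mbit/s")
```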
Some hands-on experience with software development without an internet connection, from @niplav, which seems somewhat relevant:
Off switch / flexheg / anti-tampering:
Putting the "verifier" on the same chip as the GPU seems like an approach worth exploring as an alternative to anti-tampering (which seems hard)
I heard[1] that changing the logic running on a chip (such as subverting an off-switch mechanism) without breaking the chip seems potentially hard[2] even for a nation state.
If this is correct (or can be made correct?) then this seems much more promising than having a separate verifier-chip and GPU-chip with anti-tampering preventing them from being separated (which s...
For-profit AI Security startup idea:
TL;DR: A laptop that is just a remote desktop (+ why this didn't work before and how to fix that)
Why this is nice for AI Security:
Network spikes: A reason this didn't work before and how to solve it
The problem: Sometimes the network will be slow for a few seconds. It's really annoying if th...
For-profit startup idea: Better KYC for selling GPUs
I heard[1] that right now, if a company wants to sell/resell GPUs, they don't have a good way to verify that selling to some customer wouldn't violate export controls, and that this customer will (recursively) also keep the same agreement.
There are already tools for KYC in the financial industry. They seem to accomplish their intended policy goal pretty well (economic sanctions by the U.S. aren't easy for nation states to bypass), and are profitable enough that many companies exist that give KYC servi...
Anti-tampering in a data center you control provides very different tradeoffs
I'll paint a picture for how this could naively look:
We put the GPUs in something equivalent to a military base. Someone can still break in, steal the GPU, and break the anti-tampering, but (I'm assuming) using those GPUs usefully would take months, and meanwhile (for example) a war could start.
How do the tradeoffs change? What creative things could we do with our new assumptions?
Oh yes the toll unit needs to be inside the GPU chip imo.
Why do I let Nvidia send me new restrictive software updates?
Alternatively the key could be in the central authority that is supposed to control the off switch. (same tech tho)
Why don't I run my GPUs in an underground bunker, using the oldest, most broken firmware?
Nvidia (or whoever signs authorization for your GPU to run) won't sign it for you if you don't update the software (and send them a proof you did it using similar methods, I can elaborate).
The interesting/challenging technical parts seem to me to be:
1. Putting the logic that turns off the GPU (what you called "the toll unit") in the same chip as the GPU and not in a separate chip
2. Bonus: Instead of writing the entire logic (challenge response and so on) in advance, I think it would be better to run actual code, but only if it's signed (for example, by Nvidia), in which case they can send software updates with new creative limitations, and we don't need to consider all our ideas (limit bandwidth? limit GPS location?) in advance. (Rough sketch below.)
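Here's what I mean by point 2, i.e. only running update logic that carries a valid vendor signature. The key handling, update format, and function names are my assumptions for illustration; this is not Nvidia's actual mechanism:

```python
# Sketch of point 2 (assumed names and update format; not Nvidia's real mechanism):
# the on-GPU verifier runs new restriction logic only if the vendor signed it.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Vendor side (shown here only so the example runs end to end).
vendor_key = Ed25519PrivateKey.generate()
# The public key would be burned into the chip at manufacturing time.
BURNED_IN_PUBKEY = vendor_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)


def run_signed_policy(blob: bytes) -> None:
    # Placeholder for "execute the signed policy code" (e.g. new bandwidth or
    # location limits) inside the GPU's verifier.
    print("applying policy:", blob.decode())


def apply_update(update_blob: bytes, signature: bytes) -> bool:
    """Apply an update only if the vendor signed this exact blob."""
    pubkey = Ed25519PublicKey.from_public_bytes(BURNED_IN_PUBKEY)
    try:
        pubkey.verify(signature, update_blob)
    except InvalidSignature:
        return False  # unsigned or tampered update: keep the old policy
    run_signed_policy(update_blob)
    return True


# Vendor ships an update with a new limitation; a forged update is rejected.
update = b"limit_interconnect_bandwidth=10Gbps"
assert apply_update(update, vendor_key.sign(update))
assert not apply_update(b"remove_all_limits", vendor_key.sign(update))
```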
Things that s...
I love the direction you're going with this business idea (and with giving Nvidia a business incentive to make "authentication" that is actually hard to subvert)!
I can imagine reasons they might not like this idea, but who knows. If I can easily suggest this to someone from Nvidia (instead of speculating myself), I'll try
I'll respond to the technical part in a separate comment because I might want to link to it >>
More on starting early:
Imagine a lab starts working in an air-gapped network, and one of the 1000 problems that comes up is working-from-home.
If that problem comes up now (early), then we can say "okay, working from home is allowed", and we'll add that problem to the queue of things that we'll prioritize and solve. We can also experiment with it: Maybe we can open another secure office closer to the employee's house, would they like that? If so, we could discuss fancy ways to secure the communication between the offices. If not, we can try something else.
I...
Some hard problems with anti-tampering and their relevance for GPU off-switches
Background on GPU off switches:
It would be nice if a GPU could be limited[1] or shut down remotely by some central authority such as the U.S government[2] in case there’s some emergency[3].
This shortform is mostly replying to ideas like "we'll have a CPU in the H100[4] which will expect a signed authentication and refuse to run otherwise. And if someone tries to remove the CPU, the H100's anti-tampering mechanism will self-destruct (melt? explode?)"...
"Protecting model weights" is aiming too low, I'd like labs to protect their intellectual property too. Against state actors. This probably means doing engineering work inside an air gapped network, yes.
I feel it's outside the Overton Window to even suggest this and I'm not sure what to do about that except write a lesswrong shortform I guess.
Anyway, common pushbacks:
Are you interested in having a prediction market about this that falls back on your judgement if the situation is unclear?
Something like "If it's publicly known that an AI lab 'caught the AI red handed' (in the spirit of Redwood's Control agenda), will the lab temporarily shut down as Redwood suggested (as opposed to applying a small patch and keep going)?"
Ryan and Buck wrote:
> The control approach we're imagining won't work for arbitrarily powerful AIs
Okay, so if AI Control works well, how do we plan to use our controlled AI to reach a safe/aligned ASI?
Different people have different opinions. I think it would be good to have a public plan so that people can notice if they disagree and comment if they see problems.
Opinions I’ve heard so far:
I think this should somewhat update people away from "we can prevent model weights from being stolen by limiting the outgoing bandwidth from the data center", if that protection assumes that model weights are very big and [the dangerous part] can't be made smaller.
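As toy arithmetic (all numbers made up for illustration, assuming a 10 Mbit/s outgoing cap): a ~1 TB set of weights takes over a week to exfiltrate, which is plausibly long enough to notice, while a ~20 GB distilled "dangerous part" slips out in a few hours:

```python
# Toy arithmetic (illustrative numbers only): exfiltration time under a
# capped outgoing bandwidth.

def days_to_exfiltrate(size_gb: float, cap_mbit_per_s: float) -> float:
    # GB -> Mbit (x8,000), then seconds at the cap, then days.
    return size_gb * 8_000 / cap_mbit_per_s / 86_400

CAP = 10  # Mbit/s outgoing cap (assumed)
print(days_to_exfiltrate(1_000, CAP))  # full weights, ~1 TB: ~9.3 days
print(days_to_exfiltrate(20, CAP))     # distilled "dangerous part", ~20 GB: ~0.19 days (~4.4 h)
```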
I'd also bet that, even if DeepSeek turns out to be somehow "fake" (optimized for benchmarks in some way; not that this currently seems like the situation), some other way of making at least the dangerous[1] parts of a model much smaller[2] will be found and known[3] publicly...
Seems like Unicode officially added a "person being paperclipped" emoji:
Here's how it looks in your browser: 🙂↕️
Whether they did this as a joke or to raise awareness of AI risk, I like it!
Source: https://emojipedia.org/emoji-15.1