All of Steve Kommrusch's Comments + Replies

From what I gather reading the ACAS X paper, it formally proved a subset of the whole problem and many issues uncovered by using the formal method were further analyzed using simulations of aircraft behaviors (see the end of section 3.3). One of the assumptions in the model is that the planes react correctly to control decisions and don't have mechanical issues.  The problem space and possible actions were well-defined and well-constrained in the realistic but simplified model they analyzed. I can imagine complex systems making use of provably correct... (read more)

Steve_Omohundro
I agree that would be better than what we usually have now! And it is more in the "Swiss Cheese" approach to security. From a practical perspective, we are probably going to have to do that for some time: components with provable properties combined in unproven ways. But every aspect which is unproven is a potential vulnerability.

The deeper question is whether there are situations where it has to be that way. Where there is some barrier to modeling the entire system and formally combining correctness and security properties of components to obtain them for the whole system. Certainly there are hardware and software components whose detailed behavior is computationally complex to predict in advance (eg. searching for solutions to SAT problems or inverting hash functions). So you are unlikely to be able to prove theorems like "For every specification of these n bits in this SAT problem, it will take f(n) time to discover a satisfying value for the remaining bits". But that's fine! It's just that you shouldn't make the correctness or security of your system depend on that! For example, you might put a time bound on the search and have a failsafe path if it doesn't succeed by then. That software does have a provable time bound.

So, in general, systems need to be designed to be correct and safe. If you can't put provable bounds on the safety of a system, then I would argue that you have no business exposing the public to that system. It would be great to start collecting examples of subcomponents or compositional designs which are especially difficult to prove properties about. My sense is that virtually all of these formal analyses will be done by AIs and not by humans. And I think it will be important to develop libraries and models of problems which are easy to solve formally and provide formal guarantees about, and those which are more difficult.

Thinking about the software verification case, I would argue that every decently written piece of software today, the pr
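As a rough illustration of the time-bounded search with a failsafe described above, here is a minimal Python sketch. The brute-force solver and the "failsafe" action are hypothetical placeholders, not a real verified system; the point is only that the component's running time is bounded by the budget regardless of how hard the search instance is.

```python
import itertools
import time

def solve_sat_bruteforce(clauses, num_vars, deadline):
    """Brute-force SAT search that respects a wall-clock deadline.

    clauses: list of clauses, each a list of signed variable indices
             (e.g. [1, -2] means x1 OR NOT x2).
    Returns a satisfying assignment, or None if the deadline expires
    or no assignment exists.
    """
    for bits in itertools.product([False, True], repeat=num_vars):
        if time.monotonic() > deadline:
            return None  # out of time; caller must take the failsafe path
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

def act(clauses, num_vars, time_budget_s=1.0):
    """Try the expensive search, but always fall back to a safe default.

    Total running time is roughly bounded by time_budget_s (plus one cheap
    loop iteration), no matter how hard the instance is -- that bound is
    the kind of property one could hope to prove about this component.
    """
    deadline = time.monotonic() + time_budget_s
    solution = solve_sat_bruteforce(clauses, num_vars, deadline)
    if solution is not None:
        return ("optimized_path", solution)
    return ("failsafe_path", None)  # hypothetical safe default action

if __name__ == "__main__":
    # (x1 OR x2) AND (NOT x1 OR x3)
    print(act([[1, 2], [-1, 3]], num_vars=3))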

Steve, thanks for your explanations and discussion. I just posted a base reply about formal verification limitations within the field of computer hardware design. In that field, ignoring for now the very real issue of electrical and thermal noise, there is immense value in verifying that the symbolic 1's and 0's of the digital logic will correctly execute the similarly symbolic software instructions. So the problem space is inherently simplified relative to the real world, and silicon designers have an incentive to build designs that are easy to tes... (read more)

Steve_Omohundro
Testing is great for a first pass! And in non-critical and non-adversarial settings, testing can give you actual probabilistic bounds. If the probability distribution of the actual uses is the same as the testing distribution (or close enough to it), then the test statistics can be used to bound the probability of errors during use. I think that is why formal methods are so rarely used in software: testing is pretty good and if errors show up, you can fix them then. Hardware has greater adoption of formal methods because it's much more expensive to fix errors after the fact.  But the real problems arise from adversarial attacks. The statistical correctness of a system doesn't matter to an adversary. They are looking for the weird outlier cases which will enable them to exploit the system (eg. inputs with non-standard characters that break the parser, or super-long inputs which overflow a buffer and enable unexpected access to memory, etc.). Testing can't show the absence of flaws (unless every input is tested!).  I think the increasing plague of cyberattacks is due to adversaries become more sophisticated in their search for non-standard ways of interacting with systems that expose their untested and unproven underbelly. But that kind of sophisticated attack requires highly skilled attackers and those are fortunately still rare.  What is coming, however, are AI-powered cyberattack systems which know all of the standard flaws and vulnerabilities of systems, all of the published 1-day vulnerabilities, all of the latest social engineering techniques discussed on the dark web, and have full access to reverse engineering tools like Ghidra. Those AIs are likely being developed as we speak in various government labs (eg. here is a list of significant recent cybe incidents: https://www.csis.org/programs/strategic-technologies-program/significant-cyber-incidents ).   How long before powerful cyberattack AIs are available on bittorrent to teenage hackers? So, I believe th
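To make the "test statistics can bound the probability of errors" point concrete, here is a small worked sketch using the standard rule-of-three style bound; the test count is made up for illustration, and the bound only holds when test inputs are drawn from the same distribution as real use.

```python
def failure_rate_upper_bound(num_passing_tests: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-use failure probability after
    observing num_passing_tests i.i.d. passing tests drawn from the use
    distribution (the classic "rule of three" when confidence = 0.95).

    If the true failure probability were p, the chance of seeing zero
    failures in n tests is (1 - p)**n; we report the largest p still
    consistent with that at the given confidence level.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / num_passing_tests)

if __name__ == "__main__":
    # 10,000 passing tests from the deployment distribution bound the
    # failure rate at roughly 3e-4 with 95% confidence -- but say nothing
    # about inputs an adversary chooses deliberately.
    print(f"{failure_rate_upper_bound(10_000):.2e}")
```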

Thanks for the study Andrew. In the field of computer hardware design, formal verification is often used on smaller parts of the design, but randomized dynamic verification (running the model and checking results) is still necessary to test corner cases in the larger design. Indeed, the idea that a complex problem can be engineered so as to be easier to formally verify is discussed in this recent paper formally verifying IEEE floating point arithmetic. In that paper, published in 2023, they report using their divide-and-conquer approach on the problem resu... (read more)
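For readers outside hardware: "randomized dynamic verification" here just means driving the design model with (usually biased) random stimulus and checking every result against a golden reference. A toy sketch in Python, with a deliberately buggy adder standing in for an RTL model; all names and the planted bug are hypothetical.

```python
import random

WIDTH = 16
MASK = (1 << WIDTH) - 1

def dut_add(a: int, b: int) -> int:
    """Stand-in for the design under test: a 16-bit adder with a
    deliberately planted bug at a wrap-around corner case."""
    result = (a + b) & MASK
    if a == MASK and b == 1:          # planted corner-case bug
        result ^= 1
    return result

def ref_add(a: int, b: int) -> int:
    """Golden reference model."""
    return (a + b) & MASK

def random_operand(rng: random.Random) -> int:
    """Bias stimulus toward corner values, as constrained-random
    testbenches do; otherwise rare corners are almost never hit."""
    if rng.random() < 0.2:
        return rng.choice([0, 1, MASK - 1, MASK])
    return rng.randrange(0, MASK + 1)

def run_testbench(num_tests: int = 10_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for i in range(num_tests):
        a, b = random_operand(rng), random_operand(rng)
        if dut_add(a, b) != ref_add(a, b):
            print(f"MISMATCH on test {i}: a={a:#x} b={b:#x}")
            return
    print(f"{num_tests} tests passed (which still proves nothing).")

if __name__ == "__main__":
    run_testbench()
```

Formal tools complement this by covering exactly the cases that random stimulus, even corner-biased, can miss.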

Steve_Omohundro
Yes, thanks Steve! Very interesting examples! As I understand it, most chip verification is based on SAT solvers and "Model Checking" (https://en.wikipedia.org/wiki/Model_checking). This is a particular proof search technique which can often provide full coverage for circuits, but it has no access to any kind of sophisticated theorems such as those in the Lean mathematics library. For small circuits, that kind of technique is often fine. But as circuits get more complex, it is subject to the "state explosion problem".

Looking at the floating point division paper, the double precision divider took 7 hours and 30 minutes, indicating a lot of search! But one great thing about proofs is that their validity doesn't depend on how long they take to find or on how big they are. It looks like they did this verification with the Synopsys VC Formal tools (https://www.synopsys.com/verification/static-and-formal-verification/vc-formal.html). This looks like a nice toolkit, but it is certainly not higher-level mathematical proof. It sounds like it's perfectly adequate for this task, but I wouldn't expect it to extend to the whole system very smoothly.

To see what should be possible as AI theorem provers come online, ask how the Synopsys engineers designed the whole chip to begin with. Presumably they had arguments in their heads about why their design was correct. Humans are especially bad at complex mathematical components like floating point divide, so it makes great sense to use a kind of brute force tool to back up their intuitions there. But with general reasoning capabilities (eg. as people are doing in Lean, Coq, etc.) it shouldn't be hard to formalize the engineer's intuitive understanding of why the whole chip is correct.

If the engineers already have a mental proof of correctness, what is the benefit of a formal proof? I believe one of the most important aspects of proof is its social role. The proof in the engineer's head is not convincing to anyone else. They can try to expla
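For a flavor of what "formalizing the engineer's intuition" can look like, here is a minimal sketch in Lean 4, not drawn from any real verification flow: a one-bit full adder defined as Boolean logic, with a machine-checked proof that it agrees with ordinary addition of the input bits. Names are illustrative and details may need adjusting for a particular Lean toolchain.

```lean
-- Illustrative only: a one-bit full adder and a machine-checked proof
-- that its (sum, carry) outputs agree with integer addition of the bits.
def bxor : Bool → Bool → Bool
  | true,  b => !b
  | false, b => b

def fullAdder (a b cin : Bool) : Bool × Bool :=
  (bxor (bxor a b) cin,                 -- sum bit
   (a && b) || (cin && bxor a b))       -- carry-out bit

def bitVal : Bool → Nat
  | true  => 1
  | false => 0

theorem fullAdder_correct (a b cin : Bool) :
    bitVal (fullAdder a b cin).1 + 2 * bitVal (fullAdder a b cin).2
      = bitVal a + bitVal b + bitVal cin := by
  -- Only eight cases; exhaustive case analysis closes each by computation.
  cases a <;> cases b <;> cases cin <;> rfl
```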
Andrew Dickson
Thanks Steve! I love these examples you shared. I wasn't aware of them and I agree that they do a very good job of illustrating the current capability level of formal methods versus what is being proposed for AI safety.

An interesting grouping of fields. While my recent work is in AI and machine learning, I used to work in the field of computer hardware engineering and have thought there are a lot of key parallels:
  1) Critical usage test occurs after design and validation
       In the case of silicon design, fabrication is quite expensive, so a lot of design techniques are included to facilitate debug after manufacturing, and a lot of effort is put into pre-silicon validation. In the case of transformative AI, using... (read more)

The Tom's Hardware article is interesting, thanks. It makes the point that the price quoted may not include the full 'cost of revenue' for the product, in that it might be the bare die price rather than the tested and packaged part (yields from fabs aren't 100%, so extensive functional testing of every part adds cost). The article also notes that R&D costs aren't included in that figure; the R&D at NVIDIA (and TSMC, Intel, AMD, etc.) is what keeps that exponential perf-per-dollar moving along.

For my own curiosity, I looked into current and past inc... (read more)

moridinamael
I think it is also good to consider that it's the good-but-not-great hardware that has the best price-performance at any given point in time. The newest and best chips will always have a price premium; the chips one generation ago will be comparatively much cheaper per unit of performance. This has been generally true since I started recording this kind of information. As I think I mentioned in another comment, I didn't mention Moore's law at all because it has relatively little to do with the price-performance trend. It certainly is easy to end up with a superexponential trend when you have an (economic) exponential trend inside a (technological) exponential trend, but as other commenters point out, the economic term itself is probably superexponential, meaning we shouldn't be surprised to see price-performance fall more quickly than exponential even without exponential progress in chip speed.
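One way to make the "exponential trend inside an exponential trend" point concrete (a toy model, not a fit to any data): suppose the instantaneous growth rate of price-performance $P(t)$ is itself growing exponentially,

$$\frac{d}{dt}\ln P(t) = a\,e^{bt}, \qquad a, b > 0.$$

Integrating gives

$$P(t) = P(0)\,\exp\!\left(\frac{a}{b}\left(e^{bt} - 1\right)\right),$$

which outgrows any fixed exponential. So a superexponential price-performance curve can arise from the composition of two trends that are each only exponential on their own.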

This is very interesting work; showing the fractal graph is a good way to visualize the predictive model being learned. I've had many conversations with folks who struggle with the idea that 'the model is just predicting the next token, so how can it be doing anything interesting?' My standard response had been that conceptually the transformer model matches up tokens at the first layer (using the key and query vectors), then matches up sentences a few layers up, and then paragraphs a few layers above that; hence the model, when presented with an input, was not j... (read more)
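For readers who want to see the "matching up tokens with key and query vectors" step spelled out, here is a minimal single-head attention sketch in numpy; shapes and names are illustrative, and causal masking and multiple heads are omitted for brevity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: each query vector is matched against every
    key vector, and the resulting weights mix the value vectors.

    Q, K, V: arrays of shape (seq_len, d) for one attention head.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # token-vs-token match scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d = 5, 8                               # 5 tokens, 8-dim head
    Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 8)
```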

Thanks for the interesting and thoughtful article. As a current AI researcher and former silicon chip designer, I'd suspect that our perf-per-dollar is trending a bit slower than exponential now, rather than hyperexponential. My first datapoint in support of this is the data from https://en.wikipedia.org/wiki/FLOPS which shows over 100X perf/dollar improvement from 1997 to 2003 (6 years), but the next 100X improvement after 2003 arrived in 2012 (9 years), and our most recent 100X improvement (to the AMD RX 7600 the author cites) took 11 years. This aligns with TOP500 co... (read more)
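As a rough sense of what those intervals imply, here is simple arithmetic on the figures quoted above (taking the RX 7600 data point as 2023); a 100x improvement in N years corresponds to a doubling time of N·ln2/ln100.

```python
import math

# Years taken for each ~100x perf-per-dollar improvement cited above.
intervals = {"1997-2003": 6, "2003-2012": 9, "2012-2023": 11}

for period, years in intervals.items():
    doubling_time = years * math.log(2) / math.log(100)
    print(f"{period}: doubling time ≈ {doubling_time:.2f} years")
```

The doubling time lengthening from roughly 0.9 to 1.4 to 1.7 years is the "a bit slower than exponential" pattern described above.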

Foyle
Current Nvidia GPU prices are highly distorted by scarcity, with profit margins reportedly in the range of 80-90% of sale price: https://www.tomshardware.com/news/nvidia-makes-1000-profit-on-h100-gpus-report. If these were commodified to the point that scarcity didn't influence price, then that $/flop point would seemingly leap up by an order of magnitude to above 1e15 Flop/$1000, scraping the top of that curve, i.e. near brain-equivalent computational power for around $3.5k in manufactured hardware cost; and the latest Blackwell GPU has lifted that performance by another 2.5x with little extra manufacturing cost. Humans as useful economic contributors are so screwed; even with successful alignment the socioeconomic implications are beyond cataclysmic.
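A quick sanity check of the margin arithmetic, using only the 80-90% figure quoted above (real costs are of course more complicated than a single margin number):

```python
# If the sale price is P and the gross margin is m (fraction of sale price),
# the manufacturing cost is (1 - m) * P, so pricing at cost would improve
# flop-per-dollar by a factor of 1 / (1 - m).
for margin in (0.80, 0.85, 0.90):
    print(f"margin {margin:.0%}: flop/$ improvement ≈ {1 / (1 - margin):.1f}x")
```

So the claimed order-of-magnitude shift corresponds to the top of that 80-90% range (5-10x across the range).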