This is a linkpost for On Duct Tape and Fence Posts.

Eliezer writes about fence post security. When people think to themselves "in the current system, what's the weakest point?", and then dedicate their resources to shoring up the defenses at that point, not realizing that after the first small improvement in that area, there's likely now a new weakest point somewhere else.

 

Fence post security happens preemptively, when the designers of the system fixate on the most salient aspect(s) and don't consider the rest of the system. But this sort of fixation can also happen in retrospect, in which case it manifests a little differently but has similarly deleterious effects.

Consider a car that starts shaking whenever it's driven. It's uncomfortable, so the owner gets a pillow to put on the seat. Items start falling off the dash, so they get a tray to put them in. A crack forms, so they tape over it.

I call these duct tape solutions. They address symptoms of the problem, but not the root cause. The underlying issue still exists and will continue to cause problems until it's addressed directly.

Did you know it's illegal to trade onion futures in the United States? In 1955, some people cornered the market on onions, shorted onion futures, then flooded the market with their saved onions, causing a bunch of farmers to lose money. The government responded by banning the sale of futures contracts on onions.

Not by banning futures trading on all perishable items, which would be equally susceptible to such an exploit. Not by banning market-cornering in general, which is pretty universally disliked. By banning futures contracts on onions specifically. So of course the next time someone wants to try such a thing, they can just do it with tomatoes.

Duct-tape fixes are common in the wake of anything that goes publicly wrong. When people get hurt, they demand change, and they pressure whoever is in charge to give it to them. But implementing a proper fix is generally more complicated (since you have to perform a root cause analysis), less visible (therefore not earning the leader any social credit), or just plain unnecessary (if the risk was already priced in). So the incentives are in favor of quickly slapping something together that superficially appears to be a solution, without regard for whether it makes sense.

Of course, not all changes in the wake of a disaster are duct-tape fixes. A competent organization treats a disaster as something that gives it new information about the system in question; it then thinks about how it would design the system from scratch taking that information into account, and proceeds from there to make changes. Proper solutions attempt to fix a general class of issues, not just the exact thing that failed.

  • Bad: "Screw #8463 needs to be reinforced."
  • Better: "The unexpected failure of screw #8463 demonstrates that the structural simulation we ran before construction contained a bug. Let's fix that bug and re-run the simulation, then reinforce every component that falls below the new predicted failure threshold."
  • Even better: "The fact that a single bug in our simulation software could cause a catastrophic failure is unacceptable. We need to implement multiple separate methods of advance modeling and testing that won't all fail in the same way if one of them contains a flaw."
  • Ideal: "The fact that we had such an unsafe design process in the first place means we likely have severe institutional dysfunction. We need to hire some experienced safety/security professionals and give them the authority necessary to identify any other flaws that may exist in our company, including whatever processes in our leadership and hiring teams led to us not having such a security team working for us already."

As this example shows, there isn't necessarily a single objective "root cause". It's always possible to ask "why" another time, and the investigators have to choose where to cut off the analysis. So a "duct tape fix" doesn't refer to any specific level of abstraction; it refers to when the level at which someone chooses to address a problem is not appropriate for the situation, either because the level at which they addressed it is so narrow that it's obvious something else is going to go wrong, or because there exists a fix on a deeper level that wouldn't cost significantly more.

Duct tape fixes are so tempting because they're so easy up front, but often they spiral into higher costs when the cracks keep appearing and you have to keep putting on more and more pieces of duct tape.

One time I was discussing a simple program that checks the precision of a decimal number; due to floating point errors, it would fail on specific inputs like 0.07. One person suggested that I fix this by multiplying the input by an arbitrary constant and then dividing that constant out at the end, recommending a particular constant that they had discovered made the program succeed on the 0.07 example I had given. I pointed out that this didn't actually fix the core problem, just shifted the errors to other numbers, such as 0.29. Their response was that I should make a list of the numbers most likely to be given as inputs, find a constant that succeeded on every number in the list, and resign myself to occasional errors on the uncommon numbers.

This is not how you design a reliable computer program. Checking a number's precision is not a complicated mathematical concept, and there were various one-line fixes I could have applied that would make the function work properly on all potential input numbers, not just some of them. But this person had anchored on the first solution that came to mind, and insisted on tweaking it to cover each new special case rather than realizing that their whole approach was fundamentally flawed.
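For concreteness, here's a minimal sketch (in Python, though the original program wasn't necessarily in Python) of the kind of fix I mean. It assumes the genuine precision of the inputs never gets anywhere near the 15th decimal place, where the float noise lives; the function name and cutoff are illustrative, not the original code:

```python
def decimal_places(x: float) -> int:
    """Count the decimal places of x, treating anything past the 15th
    decimal place as floating point noise rather than real precision."""
    # 0.07 is actually stored as 0.07000000000000000666...; formatting to
    # 15 decimal places rounds that representation error away.
    s = f"{x:.15f}".rstrip("0")
    if s.endswith("."):
        return 0          # integer-valued input, e.g. 10.0
    return len(s.split(".")[1])

print(decimal_places(0.07))   # 2  (the input a naive version choked on)
print(decimal_places(0.29))   # 2  (the input the "magic constant" version choked on)
print(decimal_places(0.125))  # 3
print(decimal_places(10.0))   # 0
```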

Or consider the current approach to designing AI chatbots. They have a tendency to say inappropriate things, so companies use reinforcement learning from human feedback to try to curb this behavior: they give the model examples of what not to say and train it to avoid saying those things. Every time a new version comes out, someone discovers a new unwanted behavior, the company adds that example of what not to do to its reinforcement learning dataset, and goes "ok, all fixed!"

But of course it hasn't been fixed. Someone else is just going to find a new input prompt that leads to inappropriate behavior.

The core problem is that a large language model is a general text-prediction engine, not an agent with any particular goal system. You can tweak it by penalizing strings of text that look a certain way, and hope that once you give it enough examples it will learn to fully generalize, but this is iffy. Sure, it might work someday, just as continuing to put additional screws into an unstable structure might eventually make it stop wobbling. But it hasn't worked so far, and it would be better to understand the underlying forces at play.

Another way that duct-tape fixes manifest is when they address something that is only correlated with the problem, rather than the problem itself. Consider someone who is given a list of photos and asked to write a computer program that identifies when a photo contains a bird. The programmer notices that all the bird photos they were given contain a lot of leaves, and all of the non-bird photos contain no leaves. So they write a program that counts up the green pixels and returns "bird" if the number is high enough.
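Here's roughly what that duct-tape "classifier" looks like in code (Python for illustration; the threshold and the "greenish pixel" test are made up):

```python
import numpy as np
from PIL import Image

def contains_bird(path: str, threshold: int = 50_000) -> bool:
    """Duct-tape 'bird detector': count greenish pixels and hope for the best.

    It matches the example photos only because the birds happened to be
    photographed among leaves; leafiness is correlated with birds in that
    dataset, but it is not the feature that makes something a bird.
    """
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.int32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    greenish = (g > r) & (g > b)   # crude "is this pixel a leaf?" test
    return int(np.count_nonzero(greenish)) > threshold
```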

This program outputs the correct results on the example photos they were looking at, but it will fail pretty much immediately when applied to any new photo. The programmer successfully found a feature of the photos that divided them into the desired final categories, but it was not the relevant feature.

Yet people who lack a security mindset do this sort of thing all the time. I have seen people do almost exactly what I described with the green pixels, because on the dataset they were working with at the time, it looked like they were solving the problem.

This is the danger of duct-tape fixes. They lull people into a false sense of security, letting them feel like the problem has been addressed, when the real issue is still there, lurking.

11 comments:
Jay:

Actually ideal:

  1. Reinforce that screw by the end of the day.
  2. Fix the modeling error by the end of the week.
  3. Develop a more robust modeling methodology over the next few months.
  4. Brainstorm ideas to improve the institutional culture (without sacrificing flexibility, because you're aware that these values require a tradeoff).  Have a proposal ready for the next board meeting.
Jay:

I should have added - Determine whether this is a modeling problem or a manufacturing problem.  If the model was sound but the physical screw was faulty, you'll need an entirely different response.

Checking a number's precision correctly is quite trivial, and there were one-line fixes I could have applied that would make the function work properly on all numbers, not just some of them.

I'm really curious about what such fixes look like. In my experience, those edge cases tend to come about when there is some set of mutually incompatible desired properties of a system, and the mutual incompatibility isn't obvious. For example:

  1. We want to use standard IEEE754 floating point numbers to store our data
  2. If two numbers are not equal to each other, they should not have the same string representation.
  3. The sum of two numbers should have a precision no higher than the operand with the highest precision. For example, adding 0.1 + 0.2 should yield 0.3, not 0.30000000000000004.

It turns out those are mutually incompatible requirements!
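A quick demonstration in Python (any language using IEEE754 doubles behaves the same way):

```python
# 0.1, 0.2, and 0.3 are each stored as the nearest binary fraction, and the
# rounding errors don't cancel: the sum lands on a different double than 0.3.
print(0.1 + 0.2 == 0.3)   # False
print(repr(0.1 + 0.2))    # 0.30000000000000004
print(repr(0.3))          # 0.3
# Requirement 2 (unequal doubles get unequal strings) therefore forces the sum
# to print as 0.30000000000000004, which is exactly what requirement 3 forbids.
```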

You could say "we should drop requirement 1 and use a fixed point or fraction datatype" but that's emphatically not a one line change, and has its own places where you'll run into mutually incompatible requirements.

Or you could add a "duct tape" solution like "use printf("%.2f", result) in the case where we actually ran into this problem, in which we know both operands have a 2 decimal precision, and revisit if this bug comes up again in a different context".

The sum of two numbers should have a precision no higher than the operand with the highest precision. For example, adding 0.1 + 0.2 should yield 0.3, not 0.30000000000000004.

I would argue that the precision should be capped at the lowest precision of the operands. In physics, if you add two lengths, 0.123m + 0.123456m should be rounded to 0.246m.

Also, IEEE754 fundamentally does not contain information about the precision of a number. If you want to track that information correctly, you can use two floating point numbers and do interval arithmetic. There is even an IEEE standard for that nowadays. 

Of course, this comes at a cost. While monotonic functions can be converted for interval arithmetic, the general problem of finding the extremal values of a function in some high-dimensional domain is a hard problem. Of course, if you know how the function is composed out of simpler operations, you can at least find some bounds. 
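A toy sketch of the interval idea (names made up for illustration; a real implementation also needs directed rounding so the bounds stay conservative, which plain Python floats don't give you):

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        # For multiplication the extremes are always among the four corner products.
        corners = (self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi)
        return Interval(min(corners), max(corners))

# 0.123 m measured to the nearest millimetre, 0.123456 m to the nearest micrometre:
a = Interval(0.1225, 0.1235)
b = Interval(0.1234555, 0.1234565)
print(a + b)  # the true sum is guaranteed to lie inside this interval
```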

 

Or you could do what physicists do (at least when they are taking lab courses) and track physical quantities with a value and a precision, and do uncertainty propagation. (This might not be 100% kosher in cases where you first calculate multiple intermediate quantities from the same measurement (whose error will thus not be independent) and continue to treat them as if they were. But that might just give you bigger errors.) Also, this relies on your function being sufficiently well-described in the region of interest by the partial derivatives at the central point. If you calculate the uncertainty of  for  using the partial derivatives you will not have fun.

In the general case I agree it's not necessarily trivial; e.g. if your program uses the whole range of decimal places to a meaningful degree, or performs calculations that can compound floating point errors up to higher decimal places. (Though I'd argue that in both of those cases pure floating point is probably not the best system to use.) In my case I knew that the intended precision of the input would never be fine enough to overlap with floating point errors, so I could just round anything past the 15th decimal place down to 0.

That makes sense. I think I may have misjudged your post, as I expected that you would classify that kind of approach as a "duct tape" approach.

Hmm, interesting. The exact choice of decimal place at which to cut off the comparison is certainly arbitrary, and that doesn't feel very elegant. My thinking is that within the constraint of using floating point numbers, there fundamentally isn't a perfect solution. Floating point notation changes some numbers into other numbers, so there are always going to be some cases where number comparisons are wrong. What we want to do is define a problem domain and check whether floating point will cause problems within that domain; if it doesn't, go for it; if it does, maybe don't use floating point.

In this case my fix solves the problem for what I think is the vast majority of the most likely inputs (in particular it solves it for all the inputs that my particular program was going to get), and while it's less fundamental than e.g. using arbitrary-precision arithmetic, it does better on the cost-benefit analysis. (Just like how "completely overhaul our company" addresses things on a more fundamental level than just fixing the structural simulation, but may not be the best fix given resource constraints.)

The main purpose of my example was not to argue that my particular approach was the "correct" one, but rather to point out the flaws in the "multiply by an arbitrary constant" approach. I'll edit that line, since I think you're right that it's a little more complicated than I was making it out to be, and "trivial" could be an unfair characterization.

BTW as a concrete note, you may want to sub in 15 - ceil(log10(n)) instead of just "15", which really only matters if you're dealing with numbers above 10 (e.g. 1000 is represented as 0x408F400000000000, while the next float 0x408F400000000001 is 1000.000000000000114, which differs in the 13th decimal place).
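If you want to verify those numbers yourself, here's one way to peek at the bits (standard library only; float_bits is just a throwaway helper):

```python
import math, struct

def float_bits(x: float) -> str:
    # Reinterpret the double's 64 bits as an integer to see its hex encoding.
    return hex(struct.unpack("<Q", struct.pack("<d", x))[0])

print(float_bits(1000.0))                          # 0x408f400000000000
print(f"{math.nextafter(1000.0, math.inf):.18f}")  # 1000.000000000000113687
print(15 - math.ceil(math.log10(1000.0)))          # 12 usable decimal places for n = 1000
```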

It's duct tapes all the way down!

Duct-tape fixes are common in the wake of anything that goes publicly wrong. When people get hurt, they demand change, and they pressure whoever is in charge to give it to them. But implementing a proper fix is generally more complicated (since you have to perform a root cause analysis), less visible (therefore not earning the leader any social credit), or just plain unnecessary (if the risk was already priced in). So the incentives are in favor of quickly slapping something together that superficially appears to be a solution, without regard for whether it makes sense.

Wow, I kinda already knew this. But it had never been said so clearly and brought to the front of my mind in this way. It perfectly describes the strategies YouTube has used through its various apocalypses.

Bad: "Screw #8463 needs to be reinforced."

The best: "Book a service appointment, ask them to replace screw #8463, do a general check-up, and report all findings to the central database for all those statistical analyses that inform recalls and design improvements."