I go to Amazon, search for “air conditioner”, and sort by average customer rating. There’s a couple pages of evaporative coolers (not what I’m looking for), one used window unit (?), and then this:
Average rating: 4.7 out of 5 stars.
However, this air conditioner has a major problem. Take a look at this picture:
Key thing to notice: there is one hose going to the window. Only one.
Why is that significant?
Here’s how this air conditioner works. It sucks in some air from the room. It splits that air into two streams, and pumps heat from one stream to the other - making some air hotter, and some air cooler. The cool air, it blows back into the room. The hot air, it blows out the window.
See the problem yet?
Air is blowing out the window. In order for the room to not end up a vacuum, air has to come back into the room from outside. In practice, houses are very not airtight (we don’t want to suffocate), so air from outside will be pulled in through lots of openings throughout the house. And presumably that air being pulled in from outside is hot; one typically does not use an air conditioner on cool days.
The actual effect of this air conditioner is to make the space right in front of the air conditioner nice and cool, but fill the rest of the house with hot outdoor air. Probably not what one wants from an air conditioner!
Ok, that’s amusing, but the point of this post is not physics-101 level case studies in how not to build an air conditioner. The real fact of interest is that this is apparently the top rated new air conditioner on Amazon. How does such a bad design end up so popular?
One aspect of the story, presumably, is fake reviews. That phenomenon is itself a rich source of insight, but not the point of this post, and definitely not enough to account for the popularity of this air conditioner. The reviews shown on the product page are all “verified purchase”, and mostly 5-stars. There are only 4 one-star reviews (out of 104). If most customers noticed how bad this air conditioner is, I do not think a 4.7 rating would be sustainable. Customers actually do like this air conditioner.
And hey, this air conditioner has a lot going for it! There’s wheels on the bottom, so it’s very portable. Setup is super easy - only one hose to the window, much less fiddly than those two-hose designs where you attach one hose and the other pops off.
Sure, the air conditioner has a major problem, but it’s not a major problem which most people will notice. They may notice that most of the house is still hot, but the space right in front of the air conditioner will be cool, so obviously the air conditioner is doing its job. Very few people will realize that the air conditioner is drawing hot air into the rest of the house. (Indeed, I saw zero reviews which mentioned that the air conditioner pulls hot air into the house - even the 1-star reviewers apparently did not realize why the air conditioner was so bad.)
[EDIT: several commenters seem to think that I'm claiming this air conditioner does not work at all, so I want to clarify that it will still cool down a room on net. If the air inside is all perfectly mixed together, it will still end up cooler with the air conditioner than without. The point is not that it doesn't work at all. The point is that it's stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems.]
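The "works on net, but stupidly inefficiently" claim can be made concrete with a toy steady-state energy balance. All numbers below (cooling capacity, exhaust airflow, temperatures) are made-up but plausible; this is a sketch of the reasoning, not a model of any particular unit.

```python
# Toy steady-state comparison of single-hose vs dual-hose portable AC.
# All numbers are illustrative assumptions, not specs of a real unit.

RHO_CP = 1.2 * 1005  # J/(m^3*K): volumetric heat capacity of air

def net_cooling_watts(q_cool_w, exhaust_m3_per_h, t_out_c, t_in_c, dual_hose):
    """Net heat removed from the room, in watts.

    Single hose: every cubic meter blown out the window is replaced by
    outdoor air leaking in elsewhere, adding a heat load proportional to
    the indoor/outdoor temperature difference.
    Dual hose: condenser air comes from (and returns to) the outside,
    so there is no forced infiltration.
    """
    if dual_hose:
        return q_cool_w
    infiltration_w = RHO_CP * (exhaust_m3_per_h / 3600) * (t_out_c - t_in_c)
    return q_cool_w - infiltration_w

# Assumed numbers: a ~10,000 BTU/h unit (~2900 W), 300 m^3/h of
# condenser exhaust, 35 C outside, 25 C inside.
single = net_cooling_watts(2900, 300, 35, 25, dual_hose=False)
dual = net_cooling_watts(2900, 300, 35, 25, dual_hose=True)
print(round(single), "W net (single hose) vs", round(dual), "W net (dual hose)")
```

With these assumed numbers, the single-hose design throws away roughly a third of its cooling capacity on heating up the rest of the house, and the penalty grows with the indoor/outdoor temperature gap; it still cools on net, just much less than the second hose would buy.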
Generalization
Major problems are only fixed when those problems are obvious. Problems which most people won’t notice (or won’t attribute correctly) tend to stick around. There’s no economic incentive to fix them.
And in practice, there are plenty of problems which most people won’t notice. A few more examples:
- Most charities have pretty mediocre impact. But the actual impact is very-not-visible to the person making donations, so people keep donating. (Also people care about things besides impact, but nonetheless I doubt low-impact charities would survive if their ineffectiveness were generally obvious.)
- Medical research has a replication rate below 50%. But when the effect sizes are expected to be small anyways, it’s hard to tell whether it’s working, so doctors (and patients) keep using crap treatments.
- Based on my firsthand experience with the B2B software industry, success is mostly determined by how good the product looks to managers making the decision to purchase. Successful B2B software (think “enterprise software”) is usually crap, but has great salespeople and great dashboards for the managers.
… and presumably this extends to lots of other industries which I’m less familiar with.
Two points to highlight here:
- Regulation does not fix the problem, just moves it from the consumer to the regulator. A regulator will only regulate a problem which is obvious to the regulator. A regulator may sometimes have more expertise than a layperson, but even that requires that the politicians who ultimately appoint the regulators can distinguish real from fake expertise, which is hard in general.
- Waiting longer does not fix the problem. All those people who did not notice their air conditioner pulling hot air into the house will not start noticing if we just wait a few years. Problems do not automatically become obvious over time.
How Does This Relate To Takeoff Speeds?
There’s a common view that, as long as AI does not take off too quickly, we’ll have time to see what goes wrong and iterate on it. It's a view with a lot of intuitive outside-view appeal: AI will work just like other industries. We try stuff, see what goes wrong, fix it. It worked like that in all the other industries, presumably it will work like that in AI too.
The point of the air conditioner is that other industries do not, in fact, work like that. Other industries are absolutely packed with major problems which are not fixed because they’re not obvious. Even assuming that AI does not take off quickly (itself a dubious assumption at best), we should expect the same to be true of AI.
… But Won’t Big Problems Be Obvious?
Most industries have major problems which aren’t fixed because they’re not obvious. But these problems can only be so bad. If they were really disastrous, the disasters would be obvious. Why not expect the same from AI?
Because AI will eventually be far more capable than human industries. It will, by default, optimize way harder than human industries are capable of optimizing.
What does it look like, when the optimization power is turned up to 11 on something like the air conditioner problem? Well, it looks really good. But all the resources are spent on looking good, not on actually being good. It’s “Potemkin village world”: a world designed to look amazing, but with nothing behind the facade. Maybe not even any living humans behind the facade - after all, even generally-happy real humans will inevitably sometimes appear less-than-maximally “good”.
… But Isn’t Solving The Obvious Problems Still Valuable?
The nonobvious problems are the whole reason why AI alignment is hard in the first place.
Think about the “game tree” of alignment - the basic starting points, how they fail, what strategies address the failures, how those fail, etc. The most basic starting points are generally of the form “collect data from humans on which things are good/bad, then train something to do good stuff and avoid bad stuff”. Assuming such a strategy could be implemented efficiently, why would it fail? Well:
- In cases where humans label bad things as “good”, the trained system will also be selected to label bad things as “good”. In other words, the trained AI will optimize for things which look “good” to humans, even when those things are not very good.
- The trained system will likely end up implementing strategies which do “good”-labeled things in the training environment, but those strategies will not necessarily continue to do the things humans would consider “good” in other environments.
(Somewhat more detail on these failure modes here.) Optimizing for things which look “good” to humans runs into exactly the sort of failure which the air conditioner points to. Failure of systems to generalize in “good” ways is less centrally about obviousness, but note that if it were obvious that the system were going to generalize badly, this would also be a pretty easy issue to solve: just don’t deploy the system if it will generalize badly. Problem is, we can’t tell whether a system will do what we want in deployment just by looking at what it does in training; we can’t tell by looking at the system’s behavior whether there are problems in there.
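The first failure mode can be illustrated with a minimal simulation (my own construction, not from the post): if candidates are selected on a human-visible "looks good" score that mixes real quality with quality-independent surface appeal, strong selection pressure mostly picks out appeal.

```python
# Toy illustration of selecting on "looks good to humans" rather than
# "is good". The weighting of showiness vs quality is an assumption.

import random

random.seed(0)

def make_candidate():
    quality = random.gauss(0, 1)          # how good it actually is
    showiness = random.gauss(0, 1)        # quality-independent surface appeal
    looks_good = quality + 2 * showiness  # what the human labeler can see
    return quality, looks_good

candidates = [make_candidate() for _ in range(10_000)]

# What label-based selection picks: the candidate that *looks* best.
best_looking = max(candidates, key=lambda c: c[1])
# For comparison: the candidate that *is* best.
actually_best = max(candidates, key=lambda c: c[0])

print(f"true quality of the best-looking candidate: {best_looking[0]:.2f}")
print(f"true quality of the actually-best candidate: {actually_best[0]:.2f}")
```

Under these assumptions, the winner of the looks-good contest typically has middling true quality: the harder you select on the proxy, the more of the selection pressure goes into the showiness term.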
Point is: problems which are highly visible to humans are already easy, from an alignment perspective. They will probably be solved by default. There’s not much marginal value in dealing with them. The value is in dealing with the problems which are hard to recognize.
Corollary: alignment is not importantly easier in slow-takeoff worlds, at least not due to the ability to iterate. The hard parts of the alignment problem are the parts where it’s nonobvious that something is wrong. That’s true regardless of how fast takeoff speeds are. And the ability to iterate does not make that hard part easier. Iteration mainly helps on the parts of the problem which were already easy anyway.
So I don't really care about takeoff speeds. The technical problems are basically similar either way.
... though admittedly I did not actually learn everything I need to know about takeoff speeds just from air conditioner ratings on Amazon. It took a lot of examples in different industries. Fortunately, there was no shortage of examples to hammer the idea into my head.
To me this is a metaphor for Alignment research, and LW-style rationality in general, but with an opposite message.
To start, I have this exact AC in my window, and it made a huge difference during last year's heat dome. (I will use metric units in the following, because eff imperial units.) It was around 39-40C last summer, some 15C above average, for a few days, and the A/C cooled the place down by about 10C, which made a difference between livable and unlivable. It was cooler all through the place, not just in the immediate vicinity of the unit.
How could this happen, in an apparent contradiction to the laws of physics?
Well, the assumptions behind that reasoning did not all hold in my case.
So, physics is safe! What isn't safe is the theoretical reasoning that "a single-hose AC does not work".
Why? Because of the assumptions that go into the reasoning, some stated, some unstated. For example, that outside air is necessarily hot, that there is an extra ingress of outside air due to AC, that people use this AC for detached houses, etc.
AI research is much more complicated than HVAC analysis, and so there are many more factors, assumptions and effects that go into it. Confidently proclaiming a certain outcome, like "we are all doomed!", is at best a sound analysis of incomplete data; better data, given the bounded rationality constraint, can only be obtained iteratively, through experiment and incremental improvement.
Note that the analysis can look airtight (no pun intended) from outside: in the AC example it's basic energy conservation and the continuity equation. In the dieting case it's calories-in-calories-out. In AI research it's the... inability to steer the direction of recursive self-improvement, or something. But that mode of analysis has been falsified again and again: the Waterfall approach to software development gave way to Agile. SpaceX's fast iterations let it run rings around Boeing's "we carefully design it once and then it will fly".
The more complicated a problem, the more iterative an acceptable solution will be.
And you cannot pronounce something possible or impossible until you've gone through a lot of iterations of actually building something, gaining the experience that becomes knowledge of what works and what does not. That's how Junior developers become Senior ones.
Now, note that MIRI has not built a single AI, unlike those other companies. All their reasoning is a sound theoretical analysis... based on little or no experimental data.
It may well be that Eliezer and Co. are correct about everything. But the outside view puts them in the reference class of those who are more often wrong than right: those who rely exclusively on logic and eschew actually building things to see where they work, where they break, and why.
To be fair, there are plenty of examples where theoretical reasoning is enough. For example, all kinds of Perpetual Motion machines are guaranteed to be bunk. Or EM-drive-style approaches. Or reducing the entropy of a closed system. If we had ironclad, experimentally tested and confirmed laws like that for AI research, we would be able to trust the theoretical conclusions. As it is, we are way too early in the process of experimenting and generalizing our experimental data into laws, as far as AI research and especially Alignment research is concerned. We may be unlucky and accidentally end up all dead. But to figure out the odds of that happening, one needs to do realistic experimental alignment research: build small and then progressively larger AIs and actually try to align them. "Debates" are all nice and fun, but no substitute for data.
So the message I get from the OP is "experiment to get the data and adjust your assumptions", or else you may end up overpaying for a super-duper HVAC system you don't need, or worse, deciding you cannot afford it and dying from a heat stroke.
That's true! When I opened the box, I first dug around looking for the second hose. Then I thought they must have made a mistake and not sent the second hose. Then eventually I noticed that the AC only had one hose-slot, and the pictures only had one hose, and I was just very confused as to why on earth someone would build a portable air conditioner with only one hose.