I agree that people can easily fail to fix alignment problems, and can instead paper over them, even given a long time to iterate. But I'm not really convinced about your analogy with single-hose air conditioners.
Physics:
The air coming out of the exhaust is often quite a bit hotter than the outside air. I've never checked myself, but just googling has many people reporting 130+ degree temperatures coming out of exhaust from single-hose units. I'm not sure how hot this unit's exhaust is in particular, but I'd guess it's significantly hotter than outside air.
If exhaust is 130 and you are trying to cool from 100 to 70 you'd then only be losing 50% efficiency. Most people won't be cooling by 30 degrees so the efficiency losses would be smaller. In practice I think the actual efficiency loss relative to a 2-hose unit is more like 25-30% (see stats on top wirecutter picks below).
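The arithmetic behind that 50% figure can be sketched roughly as follows. This is a minimal model, assuming every unit of air blown out the exhaust is replaced by one unit of outdoor air, and that the heat carried scales linearly with temperature difference (humidity is ignored). The function name is hypothetical.

```python
def infiltration_loss_fraction(indoor_f, outdoor_f, exhaust_f):
    """Fraction of gross cooling cancelled by replacement outdoor air.

    Assumption: each unit of exhausted air is replaced by one unit of outdoor
    air, and heat moved is proportional to the temperature difference.
    """
    removed = exhaust_f - indoor_f    # heat carried out per unit of exhaust air
    returned = outdoor_f - indoor_f   # heat carried in per unit of replacement air
    return returned / removed


# Cooling from 100 to 70 with 130-degree exhaust, as in the comment above.
print(infiltration_loss_fraction(70, 100, 130))  # 0.5
# A smaller cooling target (100 to 80) loses less.
print(infiltration_loss_fraction(80, 100, 130))  # 0.4
```

Under these assumptions the loss shrinks as the indoor target moves closer to the outdoor temperature, which is the point about most people not cooling by 30 degrees.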
Discourse:
I actually think that this factor (sucking in hot air from the outside) is probably already included in the SACC (seasonally adjusted cooling capacity) and hence CEER reported for this air conditioner. I don't really know anything about air conditioners but it's discussed extensively in the definition of...
My overall take on this post and comment (after spending like 1.5 hours reading about AC design and statistics):
Overall I feel like both the OP and this reply say some wrong things. The top Wirecutter recommendation is a dual-hose design. The testing procedure of Wirecutter does not seem to address infiltration in any way, and indeed the whole article does not discuss infiltration as it relates to cooling-efficiency.
The overall efficiency loss from going from dual-hose to single-hose is something like 20-30%, which I do think is much lower than the OP implied, though it is also quite substantial. Indeed, most of the top-ranked Amazon listings do not use any of the updated measurements that Paul is talking about, so consumers likely do end up deceived about that.
The top-rated AC Wentworth links to is really very weak if you take into account those losses, and I would be surprised if it adequately cooled people's homes.
My current model: Wirecutter is doing OK but really not great here (with an actively confused testing procedure), Amazon ratings are indeed performing quite badly, and basically display most of the problems that Wentworth talks about. It's unclea...
Update: I too have now spent like 1.5 hours reading about AC design and statistics, and I can now give a reasonable guess at exactly where the I-claim-obviously-ridiculous 20-30% number came from. Summary: the SACC/CEER standards use a weighted mix of two test conditions, with 80% of the weight on conditions in which outdoor air is only 3°F/1.6°C hotter than indoor air.
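To see what such a weighting does to the effective temperature difference, here is a small sketch. The 3°F / 80% condition comes from the summary above. The 15°F difference for the remaining 20% is my assumption (it would correspond to a 95°F outdoor, 80°F indoor test), and it happens to reproduce the 5.4-degree figure quoted later in the thread.

```python
# (outdoor minus indoor temperature in °F, weight) for the two test conditions.
# The 3°F / 0.8 entry is from the comment; the 15°F / 0.2 entry is an assumption.
CONDITIONS = [(15.0, 0.2), (3.0, 0.8)]


def weighted_delta(conditions):
    """Weighted-average outdoor/indoor temperature difference."""
    total_weight = sum(weight for _, weight in conditions)
    return sum(delta * weight for delta, weight in conditions) / total_weight


print(round(weighted_delta(CONDITIONS), 1))  # 5.4
```

Because the mild condition carries most of the weight, the rating behaves as if the unit only ever had to fight about 5 degrees of outdoor heat.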
The whole backstory of the DOE's SACC/CEER rating rules is here. Single-hose air conditioners take center stage. The comments on the DOE's rule proposals can basically be summarized as:
This quote in particular stands out:
...De’ Longhi [an AC manufacturer] expressed concern that modifying the AHAM PAC-1-2014 method to account for infiltration air would disproportionately impact single-duct portable AC
I still think the 25-30% estimate in my original post was basically correct. I think the typical SACC adjustment for single-hose air conditioners ends up being 15%, not 25-30%. I agree this adjustment is based on generous assumptions (5.4 degrees of cooling whereas 10 seems like a more reasonable estimate). If you correct for that, you seem to get to more like 25-30%. The Goodhart effect is much smaller than this 25-30%, I still think 10% is plausible.
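One way to reproduce that correction, as a rough sketch: assume the infiltration penalty scales linearly with the indoor/outdoor temperature difference. The linear scaling is my assumption, not something the comment states, and the function name is hypothetical.

```python
def scaled_adjustment(rated_adjustment, rated_delta_f, actual_delta_f):
    """Linearly rescale an infiltration adjustment to a different temperature delta.

    Assumption: the penalty is proportional to the outdoor/indoor difference.
    """
    return rated_adjustment * actual_delta_f / rated_delta_f


# 15% adjustment at 5.4 degrees, rescaled to 10 degrees of cooling.
print(round(scaled_adjustment(0.15, 5.4, 10.0), 3))  # 0.278
```

Under that assumption, a 15% adjustment at 5.4 degrees becomes roughly 28% at 10 degrees, which lands inside the 25-30% range.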
I admit that in total I’ve spent significantly more than 1.5 hours researching air conditioners :) So I’m planning to check out now. If you want to post something else, you are welcome to have the last word.
SACC for 1-hose AC seems to be 15% lower than similar 2-hose models, not 25-30%:
I agree the DOE estimate is too generous to 1-hose AC, though I think it’s ...
Regulation does not fix the problem, just moves it from the consumer to the regulator. A regulator will only regulate a problem which is obvious to the regulator. A regulator may sometimes have more expertise than a layperson, but even that requires that the politicians ultimately appointing people can distinguish real from fake expertise, which is hard in general.
It seems like the DOE decided to adopt energy-efficiency standards that take into account infiltration. They could easily have made a different decision (e.g. because of pressure from portable AC manufacturers, or because it's legitimately unclear how to define the standard, or because it makes measurement harder), but it wouldn't be because the issue wasn't obvious (I think it's not even anywhere close to the "failure because the issue wasn't obvious" regime).
Overall I agree with the bottom line that regulation is unlikely to help that much with alignment. But I don't think this seems like the right model of why that is or how you could fix it.
...Waiting longer does not fix the problem. All those people who did not notice their air conditioner pulling hot air into the house will not start noticing if we just wait a few
Obviously the point about air conditioners doesn't matter
I'd like to remark that, at least for me, the facts-of-the-matter about whether this particular air conditioner works by Goodharting consumer preferences actually affect my views on AI. The OP quite surprised my world model, which did not expect one of the most popular AC units on Amazon to work by deceiving consumers. If lots of the modern world works this way, then John's intuition that advanced ML systems are almost certain to work by Goodharting our preferences seems much more likely. Before seeing the above comment and jbash's comment, I was in the process of updating my views, not because I thought the OP was an enlightening allegory, but because it actually changed what I thought the world was like.
Conversely, the world model "sometimes the easiest way to achieve some objective is to actually do the intended thing instead of Goodharting" would predict that air conditioner example was wrong somehow, a prediction which seems to have been right (if Paul's and jbash's comments are correct, that is). I was quite impressed by this, and am now more confident in the "Goodharting isn't omnipresent" world model.
In any case, my main point is that I actually do care about what's going on in this air conditioning example (and I encourage further discussion on whether the OP's characterization of it is accurate or not).
I can’t believe I’m about to write a comment about air conditioners on a thread about world-ending AI, but having bought one of these one-hose systems for my apartment during a particularly hot summer I can say I was pretty disappointed with its performance.
The main drawback to the one hose system is the cool air never makes it outside the room with the unit. I tried putting a bunch of fans to blow the air to the rest of the house, but as you can imagine that didn’t work very well.
I had no idea why until I zoned out one day while thinking about the air conditioner and realized it was sucking the cold air into the intake and blowing it out of the house. And I did indeed read a bunch of reviews from Costco customers before I bought the unit, none of which mentioned the problem.
Wow, the air conditioner systematically sucking the cold air it's generated back into the intake sort of seems like another problem with this design. (Possibly the same problem in another guise, thermodynamically, but in any case, different in terms of actual produced experience.)
I apologize if this is piling on, but I would like to note that this error strikes me as very similar to another one made by the same author in this comment, and which I believe is emblematic of a certain common failure mode within the rationalist community (of which I count myself a part). This common failure mode is to over-value our own intelligence and under-value institutional knowledge (whether from the scientific community or the Amazon marketplace), and thus not feel the need to tread carefully when the two come into conflict.
In the comment in question, johnswentworth asserts, confidently, that there is nothing but correlational evidence of the role of amyloid-β in Alzheimer's disease. However, there is extensive, strong causal evidence for its role: most notably, that certain mutations in the APP, PSEN1, and PSEN2 genes deterministically (as in, there are no known exceptions for anyone living to their 80's) cause Alzheimer's disease, and the corresponding proteins are well understood structurally and functionally to be key players in the production of amyloid-β. Furthermore, the specific mutations in question are shown through multiple lines of evidence (structural analysi...
After this comment there was a long thread about AC efficiency.
Summarizing:
The reason for the adjustments were roughly:
John also attempted to measure the loss empirically, but I'd summarize as "too hard to measure":
In this particular case, I indeed do not think the conflict is worth the cost of exploring - it seems glaringly obvious that people are buying a bad product because they are unable to recognize the ways in which it is bad.
The wirecutter recommendation for budget portable ACs is a single-hose model. Until very recently their overall recommendation was also a single-hose model.
The wirecutter recommendations (and other pages discussing these tradeoffs) are based on a combination of "how cold does it make the room empirically?" and quantitative estimates of cooling that take into account infiltration. This issue is discussed extensively, with quantitative detail, by people who quite often end up recommending 1-hose designs for small rooms (like the one this AC is advertised for).
One AC unit tested by the wirecutter is convertible between 2-hose and 1-hose. They write:
The best thing we took away from our tests was the chance at a direct comparison between a single-hose design and a dual-hose design that were otherwise identical, and our experience confirmed our suspicions that dual-hose portable ACs are slightly more effective than single-hose models but not effective enough to make a real difference
After having looked into this quite a bit, it does really seem like the Wirecutter testing process had no ability to notice infiltration issues, so it seems like the Wirecutter crew themselves is kind of confused here?
The... Wirecutter article also does not seem to discuss the issue of infiltration of hot air in any reasonable way. Instead it just says that:
...This produces a slight vacuum effect, which pulls in “infiltration air” from anywhere it can in order to equalize the pressure. In the presence of a gas-powered device such as a furnace, that negative pressure creates a backdraft or downdraft, which can cause the machine to malfunction—or worse, fill the room with gas fumes and carbon monoxide. We don’t think that most people plan to use their portable AC in such a room, but if your home is set up in such a way that you’re concerned ab
(Also, I expect it to seem like I am refusing to update in the face of any evidence, so I'd like to highlight that this model correctly predicted that the tests were run someplace where it was not hot outside. Had that evidence come out different, I'd be much more convinced right now that one hose vs two doesn't really matter.)
From how we tested:
Over the course of a sweltering summer week in Boston, we set up our five finalists in a roughly 250-square-foot space, taking notes and rating each model on the basic setup process, performance, portability, accessories, and overall user experience.
ETA: it's not clear that's the same testing setup used in the other tests they described. But they do talk about how the 1-vs-2 convertible unit "struggled to make the room any cooler than 70 degrees" which sounds like it was probably reasonably hot.
Does anyone in-thread (or reading along) have any experiments they'd be interested in me running with this air conditioner? It doesn't seem at all hard for me to do some science and get empirical data, with a different setup to Wirecutter, so let me know.
Added: From a skim of the thread, it seems to me the experiment that would resolve matters is testing in a large room with temperature sensors more like 15 feet away in a city or country that's very hot outside, and to compare this with (say) Wirecutter's top pick with two-hoses. Confirm?
... I actually already started a post titled "Preregistration: Air Conditioner Test (for AI Alignment!)". My plan was to use the one-hose AC I bought a few years ago during that heat wave, rig up a cardboard "second hose" for it, and try it out in my apartment both with and without the second hose next time we have a decently-hot day. Maybe we can have an air conditioner test party.
Predictions: the claim which I most do not believe right now is that going from one hose to two hose with the same air conditioner makes only a 20%-30% difference. The main metric I'm interested in is equilibrium difference between average room temp and outdoor temp (because that was the main metric relevant when I was using that AC during the heat wave). I'm at about 80% chance that the difference will be over 50%.
(Back-of-the-envelope math a few years ago said it should be roughly a factor-of-two difference, and my median expectation is close to that.)
I also expect (though less strongly) that, assuming the room's doors and windows are closed, corners of the room opposite the AC in single-hose mode will be closer to outdoor temp than to the temp 3 ft away from the AC, and that this will not be the case ...
I studied the impact of infiltration because of clothes dryers when I was doing energy efficiency consulting. The nonobvious thing that is missing from this discussion is that the infiltration flow rate does not equal the flow rate of the hot air out the window. Basically absent the exhaust flow, there is an equilibrium of infiltration through the cracks in the building equaling the exfiltration through the cracks in the building. When you have a depressurization, this increases the infiltration but also decreases the exfiltration. If the exhaust flow is a small fraction of the initial infiltration, the net impact on infiltration is approximately half as much as the exhaust flow. The rule of thumb for infiltration is it produces about 0.3 air changes per hour, but it depends on the temperature difference to the outside and the wind (and the leakiness of the building). I would guess that if you did this in a house, the exhaust flow would be relatively small compared to the natural infiltration. So roughly the impact due to the infiltration is about half as much as the calculations indicate. But if you were in a tiny tight house, then the exhaust flow would overwhelm the natural infi...
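The rule of thumb above can be sketched numerically. This is only an illustration of the small-exhaust regime described in the comment (exhaust flow small relative to natural infiltration). The house volume and exhaust flow are hypothetical numbers, and both function names are my own.

```python
def natural_infiltration_cfm(volume_ft3, air_changes_per_hour=0.3):
    """Natural infiltration in cubic feet per minute from an air-change rate."""
    return volume_ft3 * air_changes_per_hour / 60.0


def extra_infiltration_cfm(exhaust_cfm):
    """Half-rule: extra infiltration caused by a small exhaust flow.

    Depressurization raises infiltration and lowers exfiltration by about the
    same amount, so only about half the exhaust shows up as new outdoor air.
    Only meant for exhaust flows small relative to natural infiltration.
    """
    return 0.5 * exhaust_cfm


baseline = natural_infiltration_cfm(16000)  # hypothetical 16,000 ft^3 house
extra = extra_infiltration_cfm(20)          # hypothetical 20 CFM exhaust
print(baseline, extra)
```

The sketch deliberately does not model the tight-house case, where the exhaust flow is no longer small compared to natural infiltration.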
Here is the wirecutter discussion of the distinction for reference:
...Starting in 2019, we began comparing dual- and single-hose models according to the same criteria, and we didn’t dismiss any models based on their hose count. Our research, however, ultimately steered us toward single-hose portable models—in part because so many newer models use this design. In fact, we found no compelling new double-hose models from major manufacturers in 2019 or 2020 (although a few new ones cropped up in 2021, including our new top pick). Owner reviews indicate that most people prefer single-hose models, too, since they’re easier to set up and don’t look quite as much like a giant octopus trash sculpture. Although our testing has shown that dual-hose models tend to outperform some single-hose units in extremely hot or muggy weather, the difference is usually minimal, and we don’t think it outweighs the convenience of a single hose.
The one major exception, however, is if you plan on setting up your portable AC in a room with a furnace or hot water heater or anything else that uses combustion. When a single-hose AC model forces air out through its exhaust hose, it can create negative pressure in the
To me this is a metaphor for Alignment research, and LW-style rationality in general, but with an opposite message.
To start, I have this exact AC in my window, and it made a huge difference during last year's heat dome. (I will use metric units in the following, because eff imperial units.) It was around 39-40C last summer, some 15C above average, for a few days, and the A/C cooled the place down by about 10C, which made a difference between livable and unlivable. It was cooler all through the place, not just in the immediate vicinity of the unit.
How could this happen, in an apparent contradiction to the laws of physics?
Well, three things:
So, physics is safe! What isn't safe is the theoretical re...
Ok, I want to say thank you for this comment because it contains a lot of points I strongly agree with. I think the alignment community needs experimental data now more than it needs more theory. However, I don't think this lowers my opinion of MIRI. MIRI, and Eliezer before MIRI even existed yet, was predicting this problem accurately and convincingly enough that people like myself updated. 15 years ago I began studying neuroscience, neuromorphic computing, and machine learning because I believed this was going to become a much bigger deal than it was then. Now the general gist of the message has absolutely been proven out. Machine learning is now a big impressive thing in the world, and scary outcomes are right around the corner. Forecasting that now doesn't win you nearly as many points as forecasting that 15 or 20 years ago. Now we are finally close enough that it makes sense to move from theorizing to experimentation. That doesn't mean the theorizing was useless. It laid an incredible amount of valuable groundwork. It gave the experimental researchers a survey of what they are up against. It laid out the scope of the problem, and made helpful pointers towards important characteri...
Um, the single-hose air conditioners do in fact work passably, probably because they're designed to minimize the volume of air exhausted compared to the amount circulated. The air you're blowing out is way hotter than the air you're drawing in. This makes the heat pump work harder, but it reduces the air exchange problem.
And a lot of structures already have huge amounts of air exchange going on anyhow. And, by the way, a lot of uncooled structures actually do run hotter on the inside than the temperature of the environment, so the air you're drawing in may not be all that hot depending on where it's coming from and when you run the machine.
And the market has noticed that the single hose design is inefficient, which is why there are two-hose ones available. In fact, if I were writing a review, I probably wouldn't bother to mention the matter because I'd assume everybody already knew about the issue. That's even though I do in fact buy two-hose models for exactly the reasons you describe.
Perhaps people are dumb, but they are not as dumb as you are making them out to be. And I think I have to add that an awful lot of "rationalists" are very fond of talking about how everything is stupid, without in fact having studied the matters in question closely enough to really be allowed opinions...
The fact that you chose to use your superior knowledge to buy the much better air conditioner, while also choosing to not leave a review explaining this, is an illustration of OP's point, and not a refutation.
Regarding the back-and-forth on air conditioners, I tried Google searching to find a precedent for this sort of analysis; the first Google result for "air conditioner single vs. dual hose" was this blog post, which acknowledges the inefficiency johnswentworth points out, overall recommends dual-hose air conditioners, but still recommends single-hose air conditioners under some conditions, and claims the efficiency difference is only about 12%.
Highlights:
...In general, a single-hose portable air conditioner is best suited for smaller rooms. The reason being is because if the area you want to cool is on the larger side, the unit will have to work much harder to cool the space.
So how does it work? The single-hose air conditioner yanks warm air and moisture from the room and expels it outside through the exhaust. A negative pressure is created when the air is pushed out of the room, the air needs to be replaced. In turn, any opening in the house like doors, windows, and cracks will draw outside hot air into the room to replace the missing air. The air is cooled by the unit and ejected into the room.
...
Additionally, the single-hose versions are usually less expensive than their dual-hose
I think I'm missing the most important part of this debate. How does the second hose help? The air outside is hot; with one hose, hot air enters the house because of the vacuum effect; with two hoses, the second hose explicitly sucks in air from the outside... which is still hot. Where is the difference?
With two hoses, the air sucked in never mixes with the cool air in the room; it's kept completely separate. Only heat is exchanged by the AC, not air.
A two hose AC does take in both indoor and outdoor air, but they never mix. (The two hoses both carry outdoor air; indoor air is pumped through two vents in the AC.) The AC just pumps heat from the indoor air to the outdoor air. Similar to a fridge.
The nonobvious problems are the whole reason why AI alignment is hard in the first place.
I disagree with the implication that there’s nothing to worry about on the “obvious problems” side.
An out-of-control AGI self-reproducing around the internet, causing chaos and blackouts etc., is an “obvious problem”. I still worry about it.
After all, consider this: an out-of-control virus self-reproducing around the human population, causing death and disability etc., is also an “obvious problem”. We already have this problem; we’ve had this problem for millennia! And yet, we haven’t solved it!
(It’s even worse than that—it’s an obvious problem with obvious mitigations, e.g. end gain-of-function research, and we’re not even doing that.)
I find it funny that there's more discussion in the comments section of the details of how single-hose air conditioners work compared to the object-level claims made in the post about the difficulty distribution of problems that are likely to come up in AI alignment.
I interpreted the air conditioning story as a fable meant to illustrate a point, not as Bayesian evidence for us to use in order to update towards a particular view. Are people here reading the post through a different lens?
No, they're trying to avoid generalizing from fictional evidence. John is offering the Fable of the Air Conditioners as an example of a particular phenomenon that he says also applies to the AI alignment problem. If his chosen example of this phenomenon is not in fact a good example of the phenomenon, then one might reasonably be less inclined to believe that the phenomenon is as common and as important as he suggests it is, and/or less inclined to believe what he says about the phenomenon.
What gives?
Some people may simply have been nerd-sniped, but the OP does seem to present the air conditioner thing as a real piece of evidence, not just a shallow illustrative analogy. When they get literal at the end, they say:
admittedly I did not actually learn everything I need to know about takeoff speeds just from air conditioner ratings on Amazon. It took a lot of examples in different industries.
Also, given that the example was presented with such high confidence, and took up a significant portion of a post that was otherwise only moderately detailed, I don't think it's unreasonable for people's confidence in the poster and the post to drop if the example turns out to be built on a misunderstanding.
(I'm not suggesting the OP was right or wrong, I have no object-level knowledge here.)
Side note: I think that most people are clueless enough of the time that Aumann should mostly be ignored. This also holds for people updating off of what I think: I do not think most readers actually have enough bits of evidence about the reliability of my reasoning that they should Aumann-style update off of it. Instead, I try to make my own reasoning process as legible as possible in my writing, so that people can directly follow the gears and update based on the inside view, rather than just trust my judgement.
What does it look like, when the optimization power is turned up to 11 on something like the air conditioner problem?
I think it looks exactly like it does now; with a lot of people getting very upset that local optimization often looks un-optimized from the global perspective.
If I needed an air-conditioner for working in my attic space, which is well-insulated from my living space and much, much hotter than either my living space or the outside air in the summer, the single-vent model would be more efficient. Indeed, it is effectively combining the m...
The Youtube channel Technology Connections explored the disadvantages of one-hose air conditioners here: https://www.youtube.com/watch?v=_-mBeYC2KGc
Unfortunately I have a meeting and don't remember the conclusion.
I used this type of air conditioner for years (got it for free and needed it only a few days a year, as I lived in a colder climate). It can lower the temperature in the room by several degrees C, but not more. If it is 30 C outside, it can bring the room to 25 C, and that is enough.
I love this air conditioner example, not just for alignment but also as a metaphor for many other inference problems.
Data point: I cover my bedroom door with a curtain rather than closing the door, and it was clear that, with the air conditioner on, the bedroom was at lower pressure than the main room. The temperature effects are weird because my apartment can stay above outside temp for hours even with the obvious fixes done, but it was depressurizing.
Of course the commenters talking about cooling gradients vs net cooling can’t agree on an air conditioning utility function
Corollary: alignment is not importantly easier in slow-takeoff worlds, at least not due to the ability to iterate. The hard parts of the alignment problem are the parts where it’s nonobvious that something is wrong. That’s true regardless of how fast takeoff speeds are.
This is the important part and it seems wrong.
Firstly, there's going to be a community of people trying to find and fix the hard problems, and if they have longer to do that then they will be more likely to succeed.
Secondly, 'nonobvious' isn't an all-or-nothing term. There can easily be...
This post is difficult to understand for me because of the lack of quantitative forecasts. I agree that "the technical problems are similar either way", but iterating gives you the opportunity to solve some problems more easily, and the assumption that "the only problems that matter are the ones iteration can't solve" seems unjustified. There are a lot of problems you'll catch if you have 50 years to iterate compared to only 6 months, and both of those could count as "slow takeoff" depending on your definition of "slow".
To make this more explicit, suppose ...
I feel like an important lesson to learn from analogy to air conditioners is that some technologies are bounded by physics and cannot improve quickly (or at all). I doubt anyone has the data, but I would be surprised if average air conditioning efficiency in BTUs per Watt plotted over the 20th century is not a sigmoid.
Probably everything involving humans is inefficient, especially human values. An AI willing to erase its lifetime memories the moment they aren't needed anymore would be 1% more efficient, and hence take over the universe.
I go to Amazon, search for “air conditioner”, and sort by average customer rating. There’s a couple pages of evaporative coolers (not what I’m looking for), one used window unit (?), and then this:
Average rating: 4.7 out of 5 stars.
However, this air conditioner has a major problem. Take a look at this picture:
Key thing to notice: there is one hose going to the window. Only one.
Why is that significant?
Here’s how this air conditioner works. It sucks in some air from the room. It splits that air into two streams, and pumps heat from one stream to the other - making some air hotter, and some air cooler. The cool air, it blows back into the room. The hot air, it blows out the window.
See the problem yet?
Air is blowing out the window. In order for the room to not end up a vacuum, air has to come back into the room from outside. In practice, houses are very not airtight (we don’t want to suffocate), so air from outside will be pulled in through lots of openings throughout the house. And presumably that air being pulled in from outside is hot; one typically does not use an air conditioner on cool days.
The actual effect of this air conditioner is to make the space right in front of the air conditioner nice and cool, but fill the rest of the house with hot outdoor air. Probably not what one wants from an air conditioner!
Ok, that’s amusing, but the point of this post is not physics-101 level case studies in how not to build an air conditioner. The real fact of interest is that this is apparently the top rated new air conditioner on Amazon. How does such a bad design end up so popular?
One aspect of the story, presumably, is fake reviews. That phenomenon is itself a rich source of insight, but not the point of this post, and definitely not enough to account for the popularity of this air conditioner. The reviews shown on the product page are all “verified purchase”, and mostly 5-stars. There are only 4 one-star reviews (out of 104). If most customers noticed how bad this air conditioner is, I do not think a 4.7 rating would be sustainable. Customers actually do like this air conditioner.
And hey, this air conditioner has a lot going for it! There’s wheels on the bottom, so it’s very portable. Setup is super easy - only one hose to the window, much less fiddly than those two-hose designs where you attach one hose and the other pops off.
Sure, the air conditioner has a major problem, but it’s not a major problem which most people will notice. They may notice that most of the house is still hot, but the space right in front of the air conditioner will be cool, so obviously the air conditioner is doing its job. Very few people will realize that the air conditioner is drawing hot air into the rest of the house. (Indeed, I saw zero reviews which mentioned that the air conditioner pulls hot air into the house - even the 1-star reviewers apparently did not realize why the air conditioner was so bad.)
[EDIT: several commenters seem to think that I'm claiming this air conditioner does not work at all, so I want to clarify that it will still cool down a room on net. If the air inside is all perfectly mixed together, it will still end up cooler with the air conditioner than without. The point is not that it doesn't work at all. The point is that it's stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems.]
Generalization
Major problems are only fixed when those problems are obvious. Problems which most people won’t notice (or won’t attribute correctly) tend to stick around. There’s no economic incentive to fix them.
And in practice, there are plenty of problems which most people won’t notice. A few more examples:
… and presumably this extends to lots of other industries which I’m less familiar with.
Two points to highlight here:
How Does This Relate To Takeoff Speeds?
There’s a common view that, as long as AI does not take off too quickly, we’ll have time to see what goes wrong and iterate on it. It's a view with a lot of intuitive outside-view appeal: AI will work just like other industries. We try stuff, see what goes wrong, fix it. It worked like that in all the other industries, presumably it will work like that in AI too.
The point of the air conditioner is that other industries do not, in fact, work like that. Other industries are absolutely packed with major problems which are not fixed because they’re not obvious. Even assuming that AI does not take off quickly (itself a dubious assumption at best), we should expect the same to be true of AI.
… But Won’t Big Problems Be Obvious?
Most industries have major problems which aren’t fixed because they’re not obvious. But these problems can only be so bad. If they were really disastrous, the disasters would be obvious. Why not expect the same from AI?
Because AI will eventually be far more capable than human industries. It will, by default, optimize way harder than human industries are capable of optimizing.
What does it look like, when the optimization power is turned up to 11 on something like the air conditioner problem? Well, it looks really good. But all the resources are spent on looking good, not on actually being good. It’s “Potemkin village world”: a world designed to look amazing, but with nothing behind the facade. Maybe not even any living humans behind the facade - after all, even generally-happy real humans will inevitably sometimes appear less-than-maximally “good”.
… But Isn’t Solving The Obvious Problems Still Valuable?
The nonobvious problems are the whole reason why AI alignment is hard in the first place.
Think about the “game tree” of alignment - the basic starting points, how they fail, what strategies address the failures, how those fail, etc. The most basic starting points are generally of the form “collect data from humans on which things are good/bad, then train something to do good stuff and avoid bad stuff”. Assuming such a strategy could be implemented efficiently, why would it fail? Well:
(Somewhat more detail on these failure modes here.) Optimizing for things which look “good” to humans obviously raises exactly the sort of failure which the air conditioner points to. Failure of systems to generalize in “good” ways is less centrally about obviousness, but note that if it were obvious that the system were going to generalize badly, this would also be a pretty easy issue to solve: just don’t deploy the system if it will generalize badly. Problem is, we can’t tell whether a system will do what we want in deployment just by looking at what it does in training; we can’t tell by looking at the system's behavior whether there’s problems in there.
Point is: problems which are highly visible to humans are already easy, from an alignment perspective. They will probably be solved by default. There’s not much marginal value in dealing with them. The value is in dealing with the problems which are hard to recognize.
Corollary: alignment is not importantly easier in slow-takeoff worlds, at least not due to the ability to iterate. The hard parts of the alignment problem are the parts where it’s nonobvious that something is wrong. That’s true regardless of how fast takeoff speeds are. And the ability to iterate does not make that hard part easier. Iteration mainly helps on the parts of the problem which were already easy anyway.
So I don't really care about takeoff speeds. The technical problems are basically similar either way.
... though admittedly I did not actually learn everything I need to know about takeoff speeds just from air conditioner ratings on Amazon. It took a lot of examples in different industries. Fortunately, there was no shortage of examples to hammer the idea into my head.