There's an awkward valley between "reasonably reliable, but with a major outage every few years in a storm or something" and "completely reliable, and you can trust your life to it", where the system is reliable enough that we stop thinking of it as something that might go away, but it's not so reliable that we should.
Apropos of my other comment on applying SRE/complex-system-failure ideas to writing/math, this is a known practice: if a service has been too reliable for a while and hasn't used up its promised 'error budget', it will be deliberately taken down to make sure the promised number of errors actually happen.
From ch. 4 of the SRE book:
An SLO is a service level objective: a target value or range of values for a service level that is measured by an SLI. A natural structure for SLOs is thus SLI ≤ target, or lower bound ≤ SLI ≤ upper bound. For example, we might decide that we will return Shakespeare search results "quickly," adopting an SLO that our average search request latency should be less than 100 milliseconds...

Choosing and publishing SLOs to users sets expectations about how a service will perform. This strategy can reduce unfounded complaints to service owners about, for example, the service being slow. Without an explicit SLO, users often develop their own beliefs about desired performance, which may be unrelated to the beliefs held by the people designing and operating the service. This dynamic can lead to both over-reliance on the service, when users incorrectly believe that a service will be more available than it actually is (as happened with Chubby: see "The Global Chubby Planned Outage"), and under-reliance, when prospective users believe a system is flakier and less reliable than it actually is.
"The Global Chubby Planned Outage"
[Written by Marc Alvidrez]
Chubby [Bur06] is Google’s lock service for loosely coupled distributed systems. In the global case, we distribute Chubby instances such that each replica is in a different geographical region. Over time, we found that the failures of the global instance of Chubby consistently generated service outages, many of which were visible to end users. As it turns out, true global Chubby outages are so infrequent that service owners began to add dependencies to Chubby assuming that it would never go down. Its high reliability provided a false sense of security because the services could not function appropriately when Chubby was unavailable, however rarely that occurred.
The solution to this Chubby scenario is interesting: SRE makes sure that global Chubby meets, but does not significantly exceed, its service level objective. In any given quarter, if a true failure has not dropped availability below the target, a controlled outage will be synthesized by intentionally taking down the system. In this way, we are able to flush out unreasonable dependencies on Chubby shortly after they are added. Doing so forces service owners to reckon with the reality of distributed systems sooner rather than later.
...Don’t overachieve:
Users build on the reality of what you offer, rather than what you say you’ll supply, particularly for infrastructure services. If your service’s actual performance is much better than its stated SLO, users will come to rely on its current performance. You can avoid over-dependence by deliberately taking the system offline occasionally (Google’s Chubby service introduced planned outages in response to being overly available), throttling some requests, or designing the system so that it isn’t faster under light loads.
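To make the error-budget mechanic above concrete, here's a minimal sketch of the policy as I understand it from the quoted text. The 99.95% availability target, the quarter length, and the two-week threshold are made-up illustration values, not anything from the book.

```python
# Minimal sketch of the "don't overachieve" / error-budget policy described
# above. The 99.95% target, quarter length, and two-week threshold are
# made-up illustration values, not numbers from the book.

QUARTER_SECONDS = 91 * 24 * 3600   # roughly one quarter
AVAILABILITY_SLO = 0.9995          # hypothetical target: 99.95% availability


def error_budget_remaining(observed_downtime_s: float) -> float:
    """Seconds of downtime still allowed this quarter under the SLO."""
    allowed_downtime_s = (1 - AVAILABILITY_SLO) * QUARTER_SECONDS
    return allowed_downtime_s - observed_downtime_s


def should_schedule_controlled_outage(observed_downtime_s: float,
                                      seconds_left_in_quarter: float) -> bool:
    """If real failures haven't used up the budget and the quarter is almost
    over, deliberately burn the remainder so dependents see an outage."""
    quarter_nearly_over = seconds_left_in_quarter < 14 * 24 * 3600
    return quarter_nearly_over and error_budget_remaining(observed_downtime_s) > 0


# Example: 5 minutes of real downtime so far, 10 days left in the quarter.
print(error_budget_remaining(300))                             # ~3631 seconds
print(should_schedule_controlled_outage(300, 10 * 24 * 3600))  # True
```

The point isn't the exact numbers; it's that "too little downtime" becomes an actionable signal rather than a pleasant surprise.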
Thanks! Chubby planned outages were in fact one of the things I was thinking about in writing this, but I hadn't known that it was public outside Google.
(Quite a lot is public outside Google, I've found. It's not necessarily easy to find, but whenever I talk to Googlers or visit, I find out less than I expected. Only a few things I've been told genuinely surprised me, and honestly, I suspected them anyway. Google's transparency is considerably underrated.)
You inform the power co of the situation and they instantly have a legal liability if your power goes out.
Do you know more about how this works, or know where I could read about it? Searching online I wasn't able to find anyone talking about the system or consumer-facing docs on how to notify the power company and what to expect if you do.
Does it cover damage from natural disasters? Flooding, wind, earthquakes?
I found https://www.energymadeeasy.gov.au/sites/default/files/1519_AER Life Support DL Brochure_D02.pdf which seems to say:
You're responsible for figuring out backup power for your medical equipment
If you register with your utility they have to notify you before they turn off your power, but unexpected outages can still happen.
This doesn't sound that different from most countries? And it sounds much less strict than what you were describing.
Registering looks like visiting https://www.synergy.net.au/Your-home/Manage-account/Register-for-life-support or the equivalent for your utility.
I also found https://www.aemc.gov.au/sites/default/files/content/a4094ca5-dc6a-4dfb-bbe7-8aa9a3baa831/Life-Support-rule-change-RRC0009-Final-Rule-For-Publication.pdf which gives what I think are the full rules, with obligations for retailers and distributors; it doesn't change my understanding from above. The only way it looks like this would have been different in Australia is that the power company would have been required to give more notice.
Specifically, they talk about: "retailer planned interruptions", "distributor planned interruptions", and "unplanned interruptions". And then they say:
The retailer can't intentionally turn off the power except by following the rules for "retailer planned interruptions", which include "4 business days written notice".
Same for the distributor, for "distributor planned interruptions".
I'm having trouble finding the official rules, but I found an example commercial contract (https://www.essentialenergy.com.au/-/media/Project/EssentialEnergy/Website/Files/Our-Network/AERApprovedDeemedHV.pdf?la=en&hash=FA1892961DBA269D0B82E2416991FF9FDFAE25DD) which has:
12.2 Distributor planned interruptions (maintenance, repair, etc)
12.2.a We may make distributor planned interruptions to the supply of Energy to the Premises for the following purposes:
12.2.a.i for the maintenance, repair or augmentation of the Transmission System or the Distribution System, including maintenance of metering equipment; or
12.2.a.ii for the installation of a New Connection or a Connection Alteration to another Customer.
12.2.b If your Energy supply will be affected by a distributor planned interruption and clause 6.4(d)(iii) does not apply:
12.2.b.i we may seek your explicit consent to the Interruption occurring on a specified date; or
12.2.b.ii we may seek your explicit consent to the Interruption occurring on any day within a specified 5 Business Day range; or
12.2.b.iii otherwise, we will give you at least 4 Business Days notice of the Interruption by mail, letterbox drop, press advertisement or other appropriate means, or as specified in the Operating Protocol for your Premises.
12.3 Unplanned Interruptions
12.3.a We may interrupt the supply of Energy to your Premises in circumstances where we consider that a Customer’s Energy installation or the Distribution System poses an immediate threat of injury or material damage to any person, property or the Distribution System, including:
12.3.a.i for unplanned maintenance or repairs; or
12.3.a.ii for health or safety reasons; or
12.3.a.iii in an Emergency; or
12.3.a.iv as required by a Relevant Authority; or
12.3.a.v to shed demand for Energy because the total demand at the relevant time exceeds the total supply available; or
12.3.a.vi to restore supply to a Customer.
12.3.b If an Unplanned Interruption is made, we will use our best endeavours to restore Energy supply to the Premises as soon as possible.
12.3.c We will make information about Unplanned Interruptions (including the nature of any Emergency and, where reasonably possible, an estimate of when Energy supply will be restored) available on a 24 hour telephone information service.
So it sounds like a shutdown like the one in CA, being for safety reasons (preventing sparking a fire), might qualify under the rules for unplanned interruptions, and so not require any notice at all.
Interesting insight - could you explain why you think they are dubious and politically motivated? Thanks!
I haven't looked into it fully, but it sounds to me like, after PG&E was found liable for the Camp Fire, they're responding by trying to pressure the government into not holding them liable ("if we're liable then we just can't take the risk of operating on dry days with high wind") instead of doing things to reduce the risk of fires (clearing brush).
Good point about everyone needing to have backup plans for power outages.
Another observation: it's crazy that, until the day of, few people were aware that an outage was at all likely. Proper communication probably would have avoided $1B in marginal lost productivity.
You will be disappointed to learn that the electric companies all around the United States have little incentive to care about their poles leading to residential areas, because those areas use half as much power as industrial customers. So outages of a few hours after every thunderstorm are pretty common in Midwestern cities.
And yes, our society is woefully unprepared to go more than two hours without power. I really think we should be prepared for five days at all times (not that I am, but just saying). To prepare for such things would be massively expensive and radically change communities if they had to undergo regular stress tests lasting 12 hours or so.
I agree that power outages kill, in a statistical sense: some will die without A/C, some will eat bad food, etc. I disagree that humans have no responsibility over their own safety and health. Almost all services are provided on a best-efforts basis. Police aren't liable if a known criminal attacks you. FDA isn't liable for denying you a life-saving drug. Insurance companies aren't (despite appearances) money machines - any additional benefit comes with increased premiums, and they're pretty much never liable for your suffering.
You can argue that (some of) the current California outages are negligent or are negligently handled (not enough notice or assistance to those whose health is impacted). Courts can sort that out, slowly and usually in favor of the more expensive legal team.
In the meantime, if you need power, you need to have enough backup to be able to survive an outage, and to travel somewhere safer if it lasts too long. Whether insurance covers it is a separate issue, unrelated to your ultimate responsibility for yourself.
Where do you read me as saying "humans have no responsibility over their own safety and health"?
I read the passive voice in your recommendations about insurance and top-down testing as an indication that you don't think the primary responsibility for preparedness lies with individuals. The lack of any recommendation for individual action (have batteries, test and replace them annually, consider whether to leave the area for long-term disruptions) is another data point toward this reading. I apologize if I misunderstood your intent.
I think individuals should take steps to be more prepared, and the main reason they don't is that the grid's reliability falls into an awkward valley where it's reliable enough that you think you can count on it but not so reliable that you should. Planned outages would help fix this, and I expect people would respond by planning.
Backup power supplies are only good for a given amount of time. How do you plan to regulate how long an outage should be possible to buffer with backup generation?
With the dubiously motivated PG&E blackouts in California, there are many stories about how lack of power is a serious problem, especially for people with medical dependencies on electricity. Examples in these stories include people who:
Have severe sleep apnea, and can't safely sleep without a CPAP.
Sleep on a mattress that needs continuous electricity to prevent it from deflating.
Need to keep their insulin refrigerated.
Use a medicine delivery system that requires electricity every four hours to operate.
This outage was dangerous for them and others, but it also seems like a big problem that they're in a position where they need absolutely reliable grid power. Even without politically motivated outages, the grid isn't built to a standard of complete reliability.
There's an awkward valley between "reasonably reliable, but with a major outage every few years in a storm or something" and "completely reliable, and you can trust your life to it", where the system is reliable enough that we stop thinking of it as something that might go away, but it's not so reliable that we should.
We can't get California out of this valley by investing to the point that there won't be outages; earthquakes, if nothing else, ensure that. So instead we should plan for outages, and make outages frequent enough that this planning will actually happen. Specifically:
Insurance should cover backup power supplies for medical equipment, and they should be issued by default.
When there hasn't been an outage in ~1y, there should be a test outage to uncover unknown dependencies.