As I write this, large parts of the city and state around me are without power; for many people, it’s been like this for multiple days.
The state of Texas has had a major blackout in the midst of a historic winter storm. My family slept through two very cold nights in our house without electricity or heat, waking up to indoor temps of 45ºF (7ºC) after the overnight low here in Austin plunged to a historic 7ºF (-14ºC). (For context, Austin’s average overnight low for February is 46ºF (8ºC), and freezes rarely last more than a few hours.) My family is only warm now because we made a risky drive on treacherous, hilly roads to get to relatives who still have electricity.
I’m a self-proclaimed student of disasters, but this is the first time I’ve found myself living through one. It’s been terrifying at times. In situations where I don’t have a lot of control, my coping mechanism tends to be information acquisition, so I’ve learned more in the last 72 hours than anyone could ever want to know about power generation and distribution in the state of Texas. As always, there are some amazing lessons here for those of us who lead software engineering teams.
What happened in Texas?
It seems counter-intuitive that Texas, the heart of oil and gas production in the United States, would be unable to generate enough electricity to keep itself warm. But that’s exactly what happened in the early morning hours of Monday, February 15, 2021.
As an Arctic air mass moved southward and snow blanketed places that rarely see the stuff, heaters across the state were hard at work. Because our winters are so mild, houses in Texas are designed to shed heat in our hot summers rather than retain it in the winter. This meant furnaces had to run almost constantly to keep houses warm. Residential heating in Texas is a mix of mostly natural gas and electric heat pump forced-air furnaces, so this need for heat spiked demand on both the natural gas and electric grids. As temperatures dropped, a couple of critical things happened.
First, the electric heat pumps prevalent in Texas are only able to function above around 20ºF (-7ºC). As temperatures dropped below that point, the backup heating systems on these electric heat pumps started to kick in across the state. These backup systems use resistive heat (like the glowing orange element in your oven) and require about 3x more electricity than a heat pump. Because of this, the Texas grid quickly hit a winter demand record of 69,150 MW, exceeding the previous record by 3,200 MW.
Second, electric generation capacity plummeted. Texas ordinarily has plenty of electric capacity given all the air conditioning we need to survive our sweltering 100ºF (38ºC) summers. Our summer demand record is 74,531 MW – well above what this winter storm required – and the grid handled it well. But just as our houses aren’t designed to handle temperatures this cold, neither are our power plants or our natural gas infrastructure.
In West Texas, wind turbines froze up, pulling 2,500 MW of forecast capacity off the grid. But that was only a small fraction of the roughly 22,000 MW of generation that tripped offline. Coal-fueled plants were fighting coal piles frozen solid, starving their furnaces. South Texas Project Unit 1, a nuclear unit on the Texas Gulf Coast, tripped because a frozen instrument line indicated that no feed water was flowing. The massive 1,320 MW plant was offline for more than 24 hours as workers scrambled to bring the reactor back up after the automated emergency shutdown.
The bulk of lost generation capacity, though, was natural gas. On the morning of February 15th, the natural gas-fueled plants that make up around 50% of Texas’s generating capacity were struggling to get enough fuel to stay online. The need for heating had spiked residential gas demand just as gas producers were struggling to keep wells, gathering lines, and refineries from freezing up. Retail gas providers in Texas have firm delivery contracts, so when residential demand spiked and production fell, there wasn’t enough gas to go around. When power plants couldn’t buy the gas they needed, they had no choice but to go offline.
The balancing act of supply and demand
In total, Texas lost nearly 50% of available generation capacity in the early morning hours of February 15th. Texas’s insistence on operating an independent electric grid, disconnected from its neighbors, meant it couldn’t import available power from out of state either. The loss of generation combined with the surge in demand left the electric grid in a very dangerous place. Supply and demand on electric grids must be carefully balanced in order to maintain the 60Hz frequency of alternating current. When demand exceeds supply, the frequency will start to drop. If it drops too far, it can cause damage to the grid itself.
ERCOT, the independent grid operator in Texas, ordered ‘load shedding’ to begin at 1:23am, dropping 1,000 MW of load from the grid. In practical terms, this means that electric providers around the state began turning off whole neighborhoods to reduce demand on the grid. The initial drop in load wasn’t enough, however, as demand continued to rise and generation capacity continued to be lost. ERCOT ordered another 1,000 MW shed at 1:47am as the frequency on the grid dropped to 59.8Hz.
Four minutes later, as one generation plant after another continued to fall victim to the freezing temperatures, the frequency on the grid rapidly plummeted to a dangerous 59.3Hz. In response, ERCOT ordered another 3,000 MW of load shed at 1:51am. This was enough to stabilize the frequency, but the grid was in a very precarious place.
In order to protect equipment attached to the grid, there are circuit breakers that automatically disconnect things like generators if the frequency stays too low for too long. One of the conditions that can trigger this is a frequency below 59.4Hz for more than 9 minutes. The Texas grid had been below that threshold for nearly 5 minutes at 1:56am when ERCOT ordered another 3,500 MW of load shed, finally allowing the frequency to begin slowly climbing. It would take another 2,000 MW shed at 2:01am, bringing the total to 10,500 MW, to get the grid back up to its designed operating frequency of 60Hz.
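To make the sequence of orders easier to follow, here is a minimal Python sketch that tallies the load-shed orders described above. The times and megawatt figures come straight from the narrative; the names `SHED_ORDERS`, `cumulative_shed`, and `minutes_between` are just illustrative helpers, not anything ERCOT uses.

```python
from datetime import datetime

# Load-shed orders as described in the text: (time on Feb 15, 2021, MW shed).
SHED_ORDERS = [
    ("01:23", 1000),
    ("01:47", 1000),
    ("01:51", 3000),
    ("01:56", 3500),
    ("02:01", 2000),
]


def cumulative_shed(orders):
    """Return a list of (time, running total in MW) after each order."""
    total = 0
    timeline = []
    for time_str, mw in orders:
        total += mw
        timeline.append((time_str, total))
    return timeline


def minutes_between(start, end):
    """Minutes elapsed between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


timeline = cumulative_shed(SHED_ORDERS)
for time_str, total in timeline:
    print(f"{time_str}  cumulative load shed: {total:,} MW")

# Whole sequence, first order to last.
print("Elapsed:", minutes_between("01:23", "02:01"), "minutes")
```

The running total lands on the 10,500 MW quoted above, spread across 38 minutes. If you assume the frequency first fell below 59.4Hz at about 1:51am (the text only says it had been there for nearly 5 minutes by 1:56am), the grid was roughly 4 minutes away from the 9-minute trip condition.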
Had those circuit breakers tripped, the grid would’ve crashed in an uncontrolled blackout, leaving large parts of Texas without electricity for multiple weeks. Avoiding that required intentionally disconnecting customers from electricity (and heat) in the middle of the coldest winter storm Texas has seen in decades. ERCOT’s original intention was to conduct rolling blackouts, but this proved impossible. By the time enough load had been shed to stabilize the grid, the only circuits still energized were critical ones powering things like hospitals and water treatment plants (and the neighborhoods nearby), meaning there were no circuits that could be de-energized to bring others online. What were originally meant to be rolling outages lasting 10–40 minutes stretched on for as long as 4 days for some customers because there was such a shortage of electricity.
When temperatures warmed later in the week, generation capacity recovered and most households had power again by Friday the 19th. As the immediate crisis ended, though, the questions began. Texans wanted to know how their power grid had failed them so badly. The state had experienced cold-weather blackouts related to generation capacity back in 2011 and had resolved to fix them then, so why was the situation even worse 10 years later?
The answer largely comes down to incentives.
Historically, the Public Utility Commission of Texas was in charge of generation capacity and electric rates in the state. That changed in 1999 when then-governor George W. Bush signed a bill into law deregulating the Texas electric market. That bill, heavily influenced by Enron, created a unique ‘energy-only’ market in the state.
In typical markets, generation companies are compensated to some degree for ensuring electric capacity is available to the grid whether there’s demand for it or not. In an energy-only market like Texas, regulators instead rely on the cost of power to incentivize generation companies to make power available. During scarcity events like the recent blackouts, wholesale rates are allowed to spike from their usual $30-$40/MWh to an eye-watering $9,000/MWh – roughly a 30,000% jump. This incentive structure generally works well, as Texas has plenty of ‘peaker’ plants that only come online to produce power when demand (and therefore rates) are exceptionally high.
In normal operating conditions, however, this market structure creates a race towards the bottom for electric prices. Generation companies have to bid their power into the market, with the lowest bidders winning the right to sell their power. This results in desirable cheap wholesale power for consumers, but it perversely incentivizes generation companies to minimize ongoing maintenance and operating expenses as much as possible in order to sell power into the market competitively.
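The bidding dynamic described above can be sketched as a toy "cheapest bids first" auction. This is a simplified illustration, not ERCOT's actual market-clearing process: the `Bid` class, the `dispatch` function, the generator names, and every number are hypothetical, and setting the clearing price to the last accepted bid is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    generator: str        # hypothetical generator name
    capacity_mw: float    # MW offered into the market
    price_per_mwh: float  # offer price in $/MWh


def dispatch(bids, demand_mw):
    """Accept the cheapest bids first until demand is met.

    Returns (accepted, clearing_price), where accepted is a list of
    (generator, MW dispatched) and clearing_price is the price of the
    last bid accepted (None if there was no demand to serve).
    """
    accepted = []
    remaining = demand_mw
    clearing_price = None
    for bid in sorted(bids, key=lambda b: b.price_per_mwh):
        if remaining <= 0:
            break
        mw = min(bid.capacity_mw, remaining)
        accepted.append((bid.generator, mw))
        remaining -= mw
        clearing_price = bid.price_per_mwh
    return accepted, clearing_price


# Hypothetical bids: all figures are made up for illustration.
bids = [
    Bid("gas_b", 500, 35.0),
    Bid("peaker", 200, 150.0),
    Bid("wind", 300, 5.0),
    Bid("gas_a", 500, 28.0),
]

accepted, price = dispatch(bids, demand_mw=1000)
print(accepted)  # cheapest generators fill demand first
print(price)     # price of the last accepted bid
```

In this toy run the expensive peaker never gets dispatched, and `gas_b` only sells part of its capacity. Any generator that can shave a few dollars off its bid (say, by deferring maintenance) moves up the queue, which is exactly the pressure the paragraph above describes.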
This cost-cutting meant that many generation facilities in Texas had not done the level of winterization needed to stay online in single-digit temperatures. ERCOT required power plants in Texas to publish their winterization plans in the wake of the 2011 blackouts in hopes they could prevent future outages. ERCOT has no enforcement powers, though, so they couldn’t actually require generation companies to follow through on those plans. There was no effective counter for the perverse cost-cutting incentive created by the market, and the catastrophic week-long blackouts in Texas were the result.
The power of incentives
There are some obvious lessons to be drawn from the Texas blackout about technical debt and designing systems to deal with unforeseen situations, but the lesson that I find most interesting for us as technical leaders is about incentives. As leaders, incentives are one of our primary currencies. They can be as simple as telling someone ‘Nice work!’ or as complex as multi-year performance-based equity grants. Almost every conversation you have and every action you take with your team is incentivizing something whether you realize it or not.
The problem with incentives is that they’re incredibly powerful at motivating behavior, but it’s not always the behavior you intended to motivate. The designers of the Texas energy market certainly didn’t set out to encourage generation companies to cut corners on maintenance and winterization, but that came along with the incentive towards cheap wholesale electricity. We’ve all seen the following situations in our careers:
- The team that buried itself in technical debt because the company incentivized delivering features rapidly at all costs.
- The person who stayed at a company far longer than they should’ve – dragging their teammates down with them – because they had too much equity on the line to leave.
- The organization stuck in a cycle of hero culture because leaders go out of their way to thank and reward those who work nights and weekends to bail the company out of sticky situations (resulting in those who can’t or won’t put in the extra hours leaving the company).
The lesson in the Texas blackouts is that even the best-designed, most effective incentives bring along a set of perverse ones that you need to be aware of and actively counter on an ongoing basis. When you see undesired behaviors in your team, think about your system of rewards and how it might be accidentally encouraging those outcomes. When someone goes the extra mile, consider how you can thank them without setting that as the bar for your team. Incentives are a wonderful and powerful tool. Using them well can help your team grow and stretch in all the right ways, but it requires incredible thoughtfulness and care to make sure you’re actually encouraging the things you think you are.