Does “ready, fire, aim” really get us to good corrective actions?
Recently we read about the incident where a Hawaii civil defense employee sent an Emergency Alert that a ballistic missile threat was headed for Hawaii, and that residents should seek shelter. It was a mistake, a human error, and it took 38 minutes to get it corrected. We all heard about the panic that resulted, and the predictable questions started to be asked about who should be disciplined for messing up. But we have not heard much about the questions that should have been asked.
Initial information reported that the worker sent the alert by a single misstep of selecting “Missile Alert” instead of “Test Missile Alert” from a drop-down list and sending. The employee was reassigned pending the investigation … pretty clearly signaling tentative assignment of blame, rather than undertaking an analysis of the event to find out how it was possible for it to occur. So, let’s think about what happens when we follow the “ready, fire, aim” approach to thinking about disciplinary action.
Preliminary information from the initial investigation revealed some additional key factors:
- The employee was not aware this was a drill, and had been counseled twice before for confusing drills with real events.
- The supervisor’s recorded message included “exercise, exercise, exercise,” but also “this is not a drill.”
- There was miscommunication between the outgoing and incoming shift supervisors around the time of the event.
- The software in use does not distinguish between a testing environment and a live alert environment, and contained no safeguards to prevent a single person from initiating the missile alert.
- The employee has been fired, and three others have either been suspended or resigned.
We see hints at possible corrective actions in the statement about requiring a two-person process and a software change to distinguish between drill and testing environments, but nothing definitive yet. We see a number of factors that likely played a role in the event, including the “this is not a drill” statement and miscommunication. We should also note this is the third time a similar error has been made. What happened before appears to have been counseling, and what has happened now (so far) is placing the blame on the employees. Interestingly, it appears that nothing has been done to examine and address work process issues in the face of at least two previous similar errors.
What intrigues me is how the conditions and information present at the time enabled the employee to conclude that sending a missile alert (rather than a test) was appropriate; with the understanding that this was a real event, sending the alert was logical. Put on your investigation leader hat and think about the conversation you will have with operational leadership. They will likely want you to discuss your confidence that you understand how this event was able to occur, and how the employee was able to conclude the test was real. You will likely have to talk about your confidence that corrective actions will prevent a recurrence.
To help you think about how you would respond, let’s take a look at how we can likely improve the quality of learning and the corrective actions by applying human & organizational performance (HOP) principles.
First, let’s look at the elephant in the room – discipline – and the long-ingrained tendency to find a human culprit who erred so we can punish him or her. The problem is that the employee likely never intended to make this mistake or cause the resulting panic. (Otherwise, we have a case of sabotage that the police should be investigating as a criminal act.) If the employee never intended to make the mistake or achieve the resulting outcome, we need to ask ourselves a tough question: What part of disciplining the person for something they did not intend to do in the first place will keep the same mistake from happening again? You should be coming up with the answer “practically nothing.” In other words, discipline might make us feel better, and assure us that we took effective action, but in reality, we have done nothing to improve the work process or reduce the potential for a recurrence!
We should be engaging the worker involved and others who know the process to ask: “How was it possible for this event/mistake to take place?” One possible way to explore this aspect of the event is the substitution test: gather a group of other employees who are familiar with the alert process, share with them the conditions and information available in this situation, and ask them to tell us how they would make decisions and take actions. We don’t know if this was done here. What we do know is that once action is taken against the employee(s), it becomes much harder to have this discussion. Why? Because the other employees often become reluctant to talk openly, especially if they think they might reach the same conclusions and take similar actions.
The action of triggering a missile alert to the public meets the criteria for a “critical step,” an action that once taken is irreversible without causing significant harm (panic in this case). Because a mistake at a critical step will cause significant harm, a work process design should have layered defenses in place, two or more depending on the risk. So, when a single employee makes the wrong drop-down selection and clicks “send,” there is no safety margin to accommodate what we know about people. It is part of the human condition to make a mistake and, left to the same conditions, it is highly likely to happen again at some point. But if our analysis of the events looks for the factors that allowed the event to occur, one option we might come up with is to redesign the system so that sending a missile alert first triggers a second message – something to the effect of “confirming send missile alert” (as opposed to a test alert). Or as suggested in the preliminary report, the process might be changed to require two people for completion, or the drill and real production parts of the software environment could be separated. This would add an additional layer of protection at precisely the point where we can anticipate a likely error at a critical step.
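The layered defenses described above can be sketched in code. The following is a minimal illustration only, under stated assumptions: every class, method, and message here is hypothetical and does not represent the actual Hawaii alert software. It shows two of the suggested layers working together, a two-person rule (the confirmer must differ from the requester) and a hard separation between drill and live environments:

```python
from dataclasses import dataclass, field

# Hypothetical sketch -- not the real alert system. Illustrates the
# "layered defenses at a critical step" idea: no single action sends
# a live alert, and a drill environment can never reach the public.

@dataclass
class AlertSystem:
    environment: str                 # "drill" or "live", fixed at setup
    approvals: set = field(default_factory=set)

    def request_send(self, operator: str, alert_type: str) -> str:
        """Layer 1: record one operator's intent; nothing is sent yet.
        The prompt echoes the environment so the sender sees DRILL vs LIVE."""
        self.approvals.add(operator)
        return (f"CONFIRM: send '{alert_type}' in "
                f"{self.environment.upper()} environment?")

    def confirm_send(self, operator: str, alert_type: str) -> str:
        """Layer 2: a *different* person must confirm (two-person rule).
        Layer 3: the drill environment can never emit a live alert."""
        if operator in self.approvals:
            raise PermissionError("Confirming operator must differ from requester")
        if self.environment == "drill":
            return f"DRILL ONLY: {alert_type} (never reaches the public)"
        return f"LIVE ALERT SENT: {alert_type}"


# Usage: in a drill, even a completed two-person sequence stays internal.
drill = AlertSystem(environment="drill")
print(drill.request_send("operator_a", "Missile Alert"))
print(drill.confirm_send("operator_b", "Missile Alert"))
```

The point of the sketch is not the specific code but the structure: the error-prone drop-down selection is no longer a single point of failure, because a wrong selection must pass two more independent checks before anything reaches the public.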
Let’s assume there is a decision to adopt the two-person process improvement. This would give us another chance to catch the same error. Good, but can you explain your confidence that two people would not reach the same mistaken conclusion when facing a similar future event? What if we add the software enhancement to separate the drill and live environments? Likely a better improvement, and combining both actions is better still. Have we learned all there is to gain from this event, or is it possible there are gaps in our learning that leave a hidden ongoing vulnerability for the organization? Is discipline appropriate, or did the system set our people up for failure?
That brings us back to the original question of “when is discipline appropriate?” In this case, taking disciplinary action (temporary reassignment, often followed by later termination, is a form of discipline … just ask anyone to whom it has happened) before learning how the event was able to take place chills the learning potential and makes it less likely that any corrective actions will address the actual factors involved. Don’t get me wrong, there is a time and place for discipline, and HOP embraces accountability. But we need to take the time to learn first. Or more colloquially, to use the ready, aim (learn), fire approach and not ready, fire, aim.
So, how do you think you would respond?