Finding Root Cause

The purpose of a Root Cause investigation is to identify fixable causes of the problem being investigated, and solutions for them, in order to prevent a recurrence of that problem.
When something goes wrong, we need to know how and why it happened if we hope to prevent it from happening again. Addressing only the surface circumstances seldom prevents a recurrence; we need to get to the root of the problem and find the root cause.
Root Cause – a working definition:
Root cause is the most basic, initiating, and correctable cause of the problem being investigated which, if eliminated, would have prevented the problem or resulted in less severe consequences.
The above working definition includes "correctable" because if the most basic initiating cause cannot be corrected in a practical manner, it is not a root cause we can work with.
This primer uses DMAIC, an acronym for Define, Measure, Analyze, Improve, and Control. DMAIC is a proven, practical method for identifying, solving, and controlling simple or complex problems. The DMAIC phases, as applied to incident analysis, are detailed below.
Define
In the Define phase, it is important to state the issue being analyzed clearly in a concise Problem Statement.
The Define phase also captures the where, when, and who of the incident and identifies the Desired State: the goal, or acceptable level of improvement.
Human Performance issues, if known, should also be identified in the Define phase. Human Performance refers to the performance of the people involved with the process or problem. This method of Root Cause analysis assumes positive intent by those involved and does not seek to assign blame.
Measure
In the Measure phase, a Sequence of Events timeline is built and Relevant Facts are collected.
Sequence of Events – The events leading up to the problem, listed in chronological order, starting at a point that puts the problem in context. The timeline may include potential contributing factors and root causes.
Relevant Facts – Known facts and information that may be important to the issue being investigated. This information can help resolve speculation about what apparently happened and, together with the timeline, establish what actually happened.
Speculation or information that cannot be verified should not be recorded as fact. A reasonable assumption based on other known facts can be included, but it should be labeled as an assumption or as an event likely to have occurred. If a stated fact could be taken as opinion, include supporting examples.
The sequence of events and the facts identified should substantiate the Problem Statement and any Human Performance issues identified. If they do not, further investigation may be needed, or the validity of the Problem Statement or Human Performance issue should be reviewed.
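If the Sequence of Events is kept in a small script rather than a document, one way to keep verified facts and labeled assumptions apart is sketched below. This is a minimal sketch, assuming Python 3.9+; the `Entry` class, its fields, and the sample fire-incident entries are hypothetical illustrations, not part of the method itself.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Entry:
    """One item in the Sequence of Events (hypothetical structure)."""
    when: datetime
    description: str
    verified: bool  # True for a confirmed fact, False for a labeled assumption


def build_timeline(entries: list[Entry]) -> list[str]:
    """Return entries in chronological order, labeling each as FACT or ASSUMPTION."""
    lines = []
    for e in sorted(entries, key=lambda item: item.when):
        label = "FACT" if e.verified else "ASSUMPTION"
        lines.append(f"{e.when:%Y-%m-%d %H:%M} [{label}] {e.description}")
    return lines


if __name__ == "__main__":
    # Made-up entries for illustration only, recorded out of order.
    entries = [
        Entry(datetime(2024, 3, 1, 14, 5), "Alarm sounded in storage room", True),
        Entry(datetime(2024, 3, 1, 13, 30), "Space heater likely left on", False),
        Entry(datetime(2024, 3, 1, 13, 0), "Last staff member left the room", True),
    ]
    for line in build_timeline(entries):
        print(line)
```

Sorting at output time means entries can be recorded in whatever order they are discovered, while every unverified item stays visibly marked as an assumption.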
Analyze
In the Analyze phase, the problem is analyzed to determine its root causes. All potential causes should be considered and all root causes identified. It is important to stay focused on the problem defined in the Problem Statement.
The root causes are generally found in the facts that apply to the incident being investigated. A fact that, if changed or eliminated, would have prevented the problem or resulted in less severe consequences can be identified as a root cause. Root causes should be backed by verified evidence and corroborated facts, and they must pass the "correctable" test.
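The test for a root cause combines three checks: the counterfactual (would changing the fact have prevented the problem or reduced its consequences?), corroboration, and correctability. A minimal sketch of that filter follows, assuming each judgment has already been made by the investigation team; the code only records and applies those judgments. The `CandidateCause` class and the sample candidates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateCause:
    """A fact under consideration in the Analyze phase (hypothetical structure)."""
    fact: str
    prevents_or_reduces: bool  # would changing it have prevented or lessened the problem?
    corroborated: bool         # is it backed by evidence and corroborated facts?
    correctable: bool          # can it be corrected in a practical manner?


def root_causes(candidates: list[CandidateCause]) -> list[str]:
    """Keep only the candidates that pass all three checks."""
    return [
        c.fact
        for c in candidates
        if c.prevents_or_reduces and c.corroborated and c.correctable
    ]


if __name__ == "__main__":
    # Made-up candidates for a fire incident, for illustration only.
    candidates = [
        CandidateCause("Space heater had no automatic shut-off", True, True, True),
        CandidateCause("Oxygen was present in the room", True, True, False),
        CandidateCause("Alarm sounded at 14:05", False, True, True),
    ]
    print(root_causes(candidates))  # → ['Space heater had no automatic shut-off']
```

The oxygen candidate shows why "correctable" is part of the definition: removing it would have prevented the fire, but it cannot be corrected in a practical manner, so it is not a root cause we can work with.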
The Analyze phase can be time-consuming, but it is arguably the most important part of the DMAIC process because this is where fixable root causes are identified.
Improve
The time spent identifying root causes will be wasted if Preventive Measures are not identified in the Improve phase of the DMAIC process.
In some cases a single Preventive Measure will address multiple root causes, while in other cases multiple measures are necessary to address a single root cause.
To be effective, the Preventive Measures should be comprehensive and address all identified root causes in order to prevent a recurrence of the incident. If any of the root causes identified are generic in nature, broader application of the Preventive Measures should also be considered.
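Because one Preventive Measure may address several root causes and one root cause may need several measures, the relationship is many-to-many, and it is easy to leave a root cause unaddressed. A minimal sketch of a coverage check follows, assuming root causes and measures are tracked as plain strings; the function name and the sample data are hypothetical.

```python
def uncovered_root_causes(
    root_causes: set[str], measures: dict[str, set[str]]
) -> set[str]:
    """Return the root causes that no Preventive Measure addresses.

    `measures` maps each measure to the set of root causes it addresses.
    """
    covered: set[str] = set()
    for addressed in measures.values():
        covered |= addressed
    return root_causes - covered


if __name__ == "__main__":
    # Made-up data for illustration only.
    causes = {"no heater shut-off", "no heater-use policy", "alarm not monitored"}
    measures = {
        "replace heaters with auto-shut-off models": {"no heater shut-off"},
        "publish and train on heater-use policy": {"no heater-use policy"},
    }
    print(uncovered_root_causes(causes, measures))  # → {'alarm not monitored'}
```

An empty result means every identified root cause has at least one measure; it does not show that the measures are effective, which is what the Control phase monitors.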
Control
Control is the last phase in the DMAIC process. In this phase, a plan is formulated to implement the identified Preventive Measures. The Control Plan may also include monitoring to verify that the Preventive Measures have been effective in preventing a repeat or similar problem.
The Preventive Measures and the Control Plan should align to achieve the Desired State identified in the Define phase.
Corrective Actions vs. Preventive Measures – These terms are often used interchangeably, but for the purposes of this primer and the investigation, corrective actions are the actions taken to correct the problem before or during the investigation. Preventive Measures are actions that still need to be taken to prevent a recurrence.
Example: If the problem being investigated was a fire, putting the fire out would be a corrective action taken before the investigation. Discovering the source of the fire is part of the investigation; the steps identified to prevent a future fire are preventive measures.