Wednesday, May 12, 2010

Classical Troubleshooting – A Trip Around It

Change is difficult. For manufacturers, process control and product quality are battered by incessant change. For employees, new methods bring fresh problems & strange mistakes. For customers, favorite products mutate or disappear.

Once upon a time, the pace of change was slower: before PCs, before Motorola introduced Six Sigma (~1987), and before the strength of Japanese manufacturing methods was widely recognized. Businesses faced a different mix of challenges. Teams supporting manufacturing operations were schooled in a type of problem solving called “troubleshooting”, whose purpose was to restore capability rapidly. This is classical troubleshooting. The focused approach can be applied to any deviating system, process, or machine. It’s a specialized tool, and it’s very powerful.

As with all problem solving, the goal is to find the root cause. In classical troubleshooting, the original performance metrics serve as both the baseline and the goal. The root cause is deduced by analyzing the details of the deviation from baseline performance.

If good alternatives are available, avoid classical troubleshooting. The goal is to resolve the deviation rapidly, so try other solutions first if they would be quicker, low-risk, & cheap: for example, recheck the service manual, use intelligent guesswork, or contact the service representative. Creating a troubleshooting team to resolve a big crisis is very disruptive. However, don’t wait too long to begin.

Describe the problem
State the problem in simple terms. What is affected? What is the observed symptom, deviation, or defect?

List symptom details
  • Answer what, where, when, and how much (including the rate of change), in as much detail as possible.
  • Categorize all findings into IS and IS NOT groups for contrast. This exposes knowledge gaps. For example, “product cracks were observed from Machine A” and “not from any other machine”.
  • Continually accumulate more facts. This body of knowledge can grow quickly; stay organized. As soon as possible, determine “when” the symptom began; this helps limit the data required.
  • The symptom description is never finished. Besides adding fresh facts, previous findings often need to be verified or amended.
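The IS / IS NOT bookkeeping described above can be kept in a simple structure. The sketch below is one possible layout, not a standard format: the `SymptomSpec` class and its methods are hypothetical names, and it treats any IS fact that has no IS NOT contrast yet as a knowledge gap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SymptomSpec:
    """IS / IS NOT record for one problem statement (hypothetical structure)."""
    problem: str
    # Maps a dimension ("what", "where", "when", "how much") to (IS, IS NOT) pairs.
    facts: Dict[str, List[Tuple[str, Optional[str]]]] = field(default_factory=dict)

    def add(self, dimension: str, is_fact: str, is_not_fact: Optional[str] = None) -> None:
        """Record an IS fact, with its IS NOT contrast if one is known."""
        self.facts.setdefault(dimension, []).append((is_fact, is_not_fact))

    def gaps(self) -> List[str]:
        """List IS facts that still have no IS NOT contrast: these are knowledge gaps."""
        return [
            f"{dimension}: {is_fact}"
            for dimension, pairs in self.facts.items()
            for is_fact, is_not_fact in pairs
            if is_not_fact is None
        ]


# Example using the cracked-product case; the "how much" entry is a made-up placeholder.
spec = SymptomSpec("Product cracks")
spec.add("where", "cracks observed from Machine A", "not from any other machine")
spec.add("how much", "roughly 2% of parts (placeholder)")
print(spec.gaps())  # ['how much: roughly 2% of parts (placeholder)']
```

Because the record is never finished, new facts are simply appended, and `gaps()` can be rerun at each team meeting to show which contrasts are still missing.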
Determine potential causes
  • When the symptom is well defined, experienced team members & subject matter experts can suggest good ideas about the root cause.
  • The distinctions between IS and IS NOT will inspire new ideas.
  • Investigate whether the timing, sequence, or interactions of the symptom can be linked to any external activities (even ones that seem unrelated to the problem). For example, a machine’s problems may have begun when it was re-calibrated. If a link is found, collect more details.
  • Reconsider whether you and your team members have adequate knowledge & experience.
  • Every team includes some loud voices. Listen, take note, and then find ways to hear the quieter comments.
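The timing check in the list above can be partly automated when change or maintenance activities are logged with timestamps. This is a minimal sketch, assuming such a log exists as (timestamp, description) pairs. The function `changes_before_onset` and the 7-day window are hypothetical choices, and the log entries are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]


def changes_before_onset(onset: datetime, change_log: List[Event],
                         window: timedelta = timedelta(days=7)) -> List[Event]:
    """Return logged activities within `window` before the symptom began, most recent first."""
    candidates = [(when, what) for when, what in change_log
                  if onset - window <= when <= onset]
    # Sort by how close each activity was to the onset of the symptom.
    return sorted(candidates, key=lambda event: onset - event[0])


# Hypothetical maintenance log and symptom onset.
log = [
    (datetime(2010, 4, 1, 9, 0), "Lubricant supplier changed"),
    (datetime(2010, 5, 3, 8, 0), "Machine A re-calibrated"),
    (datetime(2010, 5, 6, 14, 0), "New operator trained"),
]
onset = datetime(2010, 5, 4, 6, 0)
recent = changes_before_onset(onset, log)
for when, what in recent:
    print(when, what)
```

A short window keeps the list manageable, but a match only suggests a link; it still has to be confirmed with more details.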
Prioritize & evaluate potential causes

  • This is a thought process.
  • Which of the possible causes best fit the symptoms?
  • Take some time to analyze each cause.
  • Expect the analysis to suggest more causes and reveal new knowledge gaps.
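One way to make the “which causes best fit” question concrete is to check each candidate cause against every IS and IS NOT fact. This is a sketch, assuming each cause is summarized as the set of facts it would explain. The function name, the facts, and the causes are hypothetical. The count is only a prompt for discussion, and any unexplained fact deserves a closer look before a cause is tested.

```python
from typing import Dict, List, Set, Tuple


def rank_causes(facts: List[str],
                causes: Dict[str, Set[str]]) -> List[Tuple[str, int, List[str]]]:
    """Rank candidate causes by how many IS / IS NOT facts each one explains.

    Returns (cause, number of facts explained, facts left unexplained), best fit first.
    """
    ranked = []
    for cause, explained in causes.items():
        unexplained = [f for f in facts if f not in explained]
        ranked.append((cause, len(facts) - len(unexplained), unexplained))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Hypothetical facts and candidate causes for the cracked-product example.
facts = [
    "IS: cracks from Machine A",
    "IS NOT: cracks from other machines",
    "IS: began after re-calibration",
]
causes = {
    "Bad raw material batch": {"IS: cracks from Machine A"},  # would also hit other machines
    "Machine A mis-calibrated": set(facts),
}
for cause, score, missing in rank_causes(facts, causes):
    print(f"{cause}: explains {score}/{len(facts)}; unexplained: {missing}")
```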
Fix the root cause

  • Attempt to resolve or isolate the likely cause(s).
  • Team leaders need to stay alert to the team spiraling around a root cause while the clock is ticking. If a major system is identified as a likely cause, decide whether to exchange that entire system and evaluate the result. When possible, move detailed debugging & nitpicking off the critical path.
  • Don’t waste effort testing causes that are clearly unlikely.
Start over

  • Sometimes the most likely root cause is wrong.
  • Were any discussions rushed? Were too many assumptions included (vs. facts)?
  • Reconsider whether you and your team members have sufficient knowledge & experience. New team members might be unwelcome, but fresh perspectives often help.
  • Consider circulating the problem analysis for review by other people.
  • Stay organized and coordinated.  This is not the time to try random tweaks.
Don’t start over

  • At some point, being practical is worth more than being smart. If all evidence suggests the problem is too difficult and too time-consuming to solve, then maybe you should rebuild. This is a one-way decision: repairs continue until the symptom is gone. Besides the cost, this plan carries a risk that the original root cause, never identified, recurs later.
  • My toughest extraordinary problems happened at the edge of tolerances and specifications. Designs aim at a center point. Eventually, requirements shift, tolerances stack up, or quality standards slip. Any of these can cause a problem that is invisible at first glance; only careful observation & experiments can detect it.
  • My toughest routine problems have had randomly intermittent symptoms. Nobody enjoys fixing a problem today and seeing the same symptom return tomorrow. I wasn't comforted by knowing that intermittent problems only “seem” to be intermittent. Once the true root cause was understood, the pattern wasn't random or intermittent.
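The point about tolerances accumulating can be made concrete with a simple tolerance stack-up. The sketch below uses two common estimates: worst-case (the sum of the tolerances) and root-sum-square (RSS, for independent variation). The `stack_up` function, the part dimensions, and the ±0.3 mm assembly limit are hypothetical numbers chosen for illustration.

```python
import math
from typing import List, Tuple


def stack_up(nominals: List[float], tolerances: List[float]) -> Tuple[float, float, float]:
    """Return (nominal total, worst-case ± tolerance, RSS ± tolerance) for parts stacked in series."""
    total = sum(nominals)
    worst_case = sum(tolerances)  # every part at the same limit at once
    rss = math.sqrt(sum(t * t for t in tolerances))  # statistical estimate for independent parts
    return total, worst_case, rss


# Hypothetical: five parts, each 10.0 ± 0.1 mm, in an assembly specified as 50.0 ± 0.3 mm.
total, worst_case, rss = stack_up([10.0] * 5, [0.1] * 5)
print(f"nominal {total:.1f} mm, worst case ±{worst_case:.2f} mm, RSS ±{rss:.2f} mm")

# Every part below is within its own ±0.1 mm tolerance, yet the assembly misses ±0.3 mm.
parts = [10.1, 10.1, 10.1, 10.1, 10.08]
print(f"assembly deviation: {sum(parts) - 50.0:+.2f} mm")
```

Each part passes its own inspection, so the problem stays invisible until someone measures the assembly as a whole.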
