ReliaSoft's Reliability Edge
Volume 4, Issue 1
Fault Tree Analysis, Reliability Block Diagrams and BlockSim FTI Edition
Fault trees and reliability block diagrams are both symbolic analytical logic techniques that can be applied to analyze system reliability and related characteristics. Although the symbols and structures of the two diagram types differ, most of the logical constructs in a fault tree diagram (FTD) can also be modeled with a reliability block diagram (RBD). Given this similarity, ReliaSoft is expanding the BlockSim software family by introducing the BlockSim FTI edition (Fault Tree Interface edition) in August of this year. The FTI edition is a complete and integrated package in which you can use fault trees or RBDs or combinations of both in your analyses. This article presents a brief introduction to fault tree analysis concepts, illustrates the similarities between fault tree diagrams and reliability block diagrams and introduces some of the capabilities of BlockSim FTI.

Fault Tree Analysis: Brief Introduction
Bell Telephone Laboratories developed the concept of fault tree analysis in 1962 for the U.S. Air Force for use with the Minuteman system. It was later adopted and extensively applied by the Boeing Company. A fault tree diagram follows a top-down structure and represents a graphical model of the pathways within a system that can lead to a foreseeable, undesirable loss event (or a failure). The pathways interconnect contributory events and conditions using standard logic symbols (AND, OR, etc.). Fault tree diagrams consist of gates and events connected with lines.

The AND and OR gates are the two most commonly used gates in a fault tree. To illustrate the use of these gates, consider two events (called "input events") that can lead to another event (called the "output event"). If the occurrence of either input event causes the output event to occur, then these input events are connected using an OR gate. Alternatively, if both input events must occur in order for the output event to occur, then they are connected by an AND gate.
Figure 1 shows a simple fault tree diagram in which either A or B must occur in order for the output event to occur. In this diagram, the two events are connected to an OR gate.
Figure 1: Fault tree where either A or B can occur
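As a rough numerical illustration of the OR and AND gates described above, the following Python sketch computes the probability of the output event from the probabilities of the input events. It assumes the input events are statistically independent, which the article does not state, and the probabilities for A and B are made up. The function names are illustrative and are not part of BlockSim.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one input event occurs.

    Assumes the input events are independent (an assumption of this sketch).
    """
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Probability that all input events occur (independence assumed)."""
    return prod(probabilities)


# Hypothetical probabilities of occurrence for input events A and B.
p_a, p_b = 0.10, 0.20

print("OR gate :", round(or_gate([p_a, p_b]), 6))   # A or B occurs
print("AND gate:", round(and_gate([p_a, p_b]), 6))  # A and B both occur
```

With these inputs the OR gate output is more likely than either input event alone, while the AND gate output is less likely than either, which matches the intuition behind the two gates.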
Examining Risk Priority Numbers in FMEA
The Risk Priority Number (RPN) methodology is a technique for analyzing the risk associated with potential problems identified during a Failure Mode and Effects Analysis (FMEA). This article presents a brief overview of the basic RPN method and then examines some additional and alternative ways to use RPN ratings to evaluate the risk associated with a product or process design and to prioritize problems for corrective action. Note that this article discusses RPNs calculated at the level of the potential causes of failure (Severity x Occurrence x Detection). However, there is a great deal of variation among FMEA practitioners as to the specific analysis procedure and some analyses may include alternative calculation methods.

Overview of Risk Priority Numbers
An FMEA can be performed to identify the potential failure modes for a product or process. The RPN method then requires the analysis team to use past experience and engineering judgment to rate each potential problem according to three rating scales:
- Severity, which rates the severity of the potential effect of the failure.
- Occurrence, which rates the likelihood that the failure will occur.
- Detection, which rates the likelihood that the problem will be detected before it reaches the end-user/customer.
Reliability Growth Analysis and Plots Reliability and Maintainability Analysis for a Remote Telecommunications System And much more...
In this Issue
- Fault Tree Analysis, Reliability Block Diagrams and BlockSim FTI Edition
- Examining Risk Priority Numbers in FMEA
- Theoretical Resources for the Reliability Professional
- From the Editor's Desk
- Reliability Growth Analysis and Plots
- Criticality Analysis
- Reliability and Maintainability Analysis for a Remote Telecommunications System
- For Your Information
- Bulletin Board
Theoretical Resources for the Reliability Professional
In addition to world-class software for performing reliability analysis, plotting and reporting, ReliaSoft also provides an extensive array of theoretical resources for the reliability professional in a variety of formats.

Training Seminars
ReliaSoft provides reliability training seminars, both public and on-site, in a variety of locations throughout the year. These seminars provide instruction in reliability engineering principles and theory, as well as hands-on experience with the software tools designed to put that theory into practice.

Theoretical Textbooks
Whenever applicable, ReliaSoft's standard software products are shipped with theoretical reference textbooks that describe the principles and theory that underlie the analyses performed by the software. These textbooks include numerous demonstration examples. Online versions are published on www.weibull.com and additional printed copies of each reference can be purchased from ReliaSoft. Available references include:
- Life Data Analysis Reference
- Accelerated Life Testing Reference
- System Analysis Reference
- Reliability Growth Reference

Knowledgeable Technical Support
ReliaSoft provides free technical support for registered software users via phone, fax or e-mail. Support personnel are knowledgeable about both the software and the underlying principles and theory.

Professional Consulting Services
ReliaSoft Consulting Services (RCS) provides professional consulting services on an as-needed basis. Areas of expertise include reliability program development; test/experiment design; data analysis; developing, evaluating and communicating reliability requirements; equipment reliability, maintainability and availability assessment; and theoretical development.

weibull.com
In addition to the on-line theoretical textbooks on life data analysis and other important topics, the weibull.com Web site provides thousands of pages of free resources that are updated and expanded on a continual basis.
Available resources include:
- Quick Subject Guides and Glossary
- Web Links and Recommended Books
- Case Studies and Training Guides
- Discussion Forums
- Free Reliability Software Tools
- Probability Plotting Papers

Reliability Edge is published up to four times a year. To obtain a free subscription, to send comments or to submit articles for consideration:

ReliaSoft Publishing
ReliaSoft Plaza
115 S. Sherwood Village Drive
Tucson, AZ 85710 USA
Telephone: +1.520.886.0366
Fax: +1.520.886.0399
E-mail: [email protected]

Correspondence with the editor may be published, in whole or in part, in future issues of ReliaSoft publications. An electronic copy of this document can be viewed or downloaded from: www.ReliaSoft.com/newsletter

For information about products and services:

ReliaSoft Sales
ReliaSoft Plaza
115 S. Sherwood Village Drive
Tucson, AZ 85710 USA
Telephone: +1.520.886.0410
Fax: +1.520.886.0399
E-mail: [email protected]
Web Site: www.ReliaSoft.com

This document may be reproduced without permission provided that it is not altered in any way and all pages are included in any reproduction.

©2003 ReliaSoft Corporation, ALL RIGHTS RESERVED. ReliaSoft, Weibull++, Weibull.com, Reliability Edge, ALTA, BlockSim, Xfmea and RGA are registered trademarks of ReliaSoft Corporation.

From the Editor's Desk...
This issue of Reliability Edge is once again packed with a mix of articles covering both reliability principles/theory and examples for practical application. We are particularly pleased to be able to provide some "sneak preview" information for two new software products that will be available from ReliaSoft in the next few months: BlockSim FTI and RGA 6.

BlockSim FTI gives classic fault tree analysis (FTA) a boost with flexible and robust modeling and analysis capabilities achieved through integration with BlockSim's enhanced reliability block diagram (RBD) support. Beginning on page 1, we present a comparison between fault tree and RBD techniques for system analysis, including coverage of all major FTA gate and event symbols.

RGA 6 supports the analysis of data from reliability growth tests (both continuous times-to-failure and discrete success/failure results) using all of the major reliability growth analysis models. This includes the Crow (AMSAA) model and variations that can also be used for reliability growth projections and repairable systems analysis. Beginning on page 3, we present a brief overview of some of RGA's capabilities and introduce some of the results and plots that can be generated for your reliability growth analyses.

As always, we welcome your feedback on these and other articles and appreciate your ideas for other subjects of interest.
--Lisa Hacker
Reliability Growth Analysis and Plots
During the first phases of a product's development, the estimate of the product's final reliability is called the reliability goal. However, the first prototypes produced will almost certainly contain design, manufacturing and/or engineering deficiencies that prevent the product from reaching that goal. In order to identify and correct these deficiencies, prototypes are usually subjected to a rigorous testing program and appropriate corrective actions are implemented to improve the design. This structured process of finding reliability problems and monitoring the increase of the product's reliability through successive phases is called reliability growth.

Until now, the software available for analyzing reliability growth data has been fairly limited. However, ReliaSoft is currently working in cooperation with Dr. Larry Crow, the premier expert in the field of reliability growth, to develop the next generation of reliability growth analysis software, RGA 6. This article presents a brief overview of the capabilities of RGA 6 (scheduled for release in Fall 2003) and an introduction to some of the results and plots that are available for reliability growth and related analyses.

RGA 6 Overview
The RGA 6 software includes a wide variety of features to aid you in your reliability growth analyses so that you can not only obtain the results but also understand them. Features include:
- Data entry spreadsheets that support continuous (time-to-failure), discrete (success/failure) and reliability data.
- Data analysis with the major reliability growth models: Crow (AMSAA), Gompertz, Modified Gompertz, Logistic and Lloyd-Lipow. Maximum Likelihood Estimation (MLE) is used for parameter estimation.
- Projections analysis using A, B and C failure mode classifications and the definition of effectiveness factors for use in the Crow (AMSAA) Projection model. This model can be used to estimate the number of unseen failure modes, the maximum achievable reliability and other important metrics.
- Repairable systems (overhauls) analysis using the Crow (AMSAA) model and Dr. Crow's analysis methodology.
- Chi-Squared and CVM methods for goodness-of-fit testing (depending on the data type) as well as the Statistical Test for Growth.
- Expanded plotting capabilities and automated custom reports created in Microsoft Word and Excel.
- The ability to attach any type of file to the analysis. For example, you can attach an Excel file that has the original data or a Word document that contains the report based on the data analysis.

Basic Analysis Results and Plots
Although the results and plots that can be generated for your analysis will depend on the type of data that you have collected and the reliability growth model selected for analysis, some basic plots and results can be generated for all analyses. Figures 1 and 2 demonstrate two plots that present reliability growth results over time. Figure 1 presents the expected number of failures and Figure 2 presents the instantaneous MTBF. RGA 6's QCP also provides point estimates for these metrics at a given time. In addition, you can generate charts and results for the cumulative MTBF and similar output for instantaneous and cumulative failure rates.
Figure 1: Expected Number of Failures vs. Time
Figure 2: Instantaneous MTBF vs. Time
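To give a feel for the quantities plotted in Figures 1 and 2, here is a minimal Python sketch of the power-law form commonly associated with the Crow (AMSAA) model, in which the expected number of failures by time t is Lambda x t^Beta. The article does not list these equations, so treat the formulas as the standard textbook form rather than a description of RGA 6 internals. The parameter values and function names are hypothetical.

```python
def expected_failures(t, lam, beta):
    """Expected cumulative number of failures by time t: lam * t**beta."""
    return lam * t ** beta


def instantaneous_failure_rate(t, lam, beta):
    """Failure intensity at time t: the derivative of expected_failures."""
    return lam * beta * t ** (beta - 1.0)


def instantaneous_mtbf(t, lam, beta):
    """Instantaneous MTBF at time t: reciprocal of the failure intensity."""
    return 1.0 / instantaneous_failure_rate(t, lam, beta)


def cumulative_mtbf(t, lam, beta):
    """Cumulative MTBF at time t: test time divided by expected failures."""
    return t / expected_failures(t, lam, beta)


# Hypothetical parameters. Beta below 1 means failures arrive more slowly
# as the test proceeds, i.e. reliability is growing.
lam, beta = 0.5, 0.6

for t in (100.0, 500.0, 1000.0):
    print(
        f"t={t:7.1f}  E[N(t)]={expected_failures(t, lam, beta):7.2f}  "
        f"inst. MTBF={instantaneous_mtbf(t, lam, beta):7.2f}  "
        f"cum. MTBF={cumulative_mtbf(t, lam, beta):7.2f}"
    )
```

Under this form the instantaneous MTBF equals the cumulative MTBF divided by Beta, so with Beta below one the instantaneous curve sits above the cumulative one.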
Continued from Page 3: "Reliability Growth Analysis and Plots"
Reliability Growth Projections
RGA 6 will of course support the analysis of data from test-fix-test reliability growth tests, where the fixes are applied as the problems are discovered during the test. In addition, the software will also support the incorporation of test-find-test data, where the fixes are delayed until after the completion of the test. You can use the Crow (AMSAA) Projection model, which utilizes A, B and C failure mode classifications, to analyze this type of data. Using this terminology, you can specify which failure modes you are not going to fix (A), which failure modes will be corrected at the end of the test (B) and which failure modes will be fixed before the test has been completed (C). In addition, you can assign a factor to each B mode that estimates the effectiveness of the correction that will be implemented after the test. There is no reliability growth for A modes and the effectiveness of the corrective actions for C modes is assumed to be demonstrated during the test.

Analysis with Crow's projection model then allows you to consider different management strategies to see if you will still reach your goal for reliability growth. A variety of charts and results are available to support this effort. For example, Figure 3 shows the Growth Potential MTBF plot, which presents the reliability achieved during the test, the reliability that is projected after the implementation of delayed fixes and the maximum achievable reliability, given the current management strategy. If you determine that you will not meet your reliability goal, then you can re-evaluate your failure modes and change some A modes to B modes. In other words, you can decide to correct more failure modes.

While doing projections, the assumption is that Beta is equal to one. Figure 4 shows one of several methods to check whether this assumption is valid, the Beta Bounds plot. This plot displays the confidence bounds on Beta at different confidence levels and demonstrates how these compare to the line where Beta equals one.

RGA also provides pie charts and bar charts for this type of analysis. For example, the bar chart in Figure 5 displays the actual (current) failure rate with the predicted failure rate for all the B modes in the analysis. The chart can also be generated for each individual failure mode. In these charts, the red bar (left) represents the actual failure rate and the green bar (right) represents the failure rate after the fixes have been implemented. From the chart in Figure 5, you can see how each failure mode is contributing to the failure rate of the system. In addition, you can also see how the failure rate for each failure mode is decreasing after the implementation of the fix.

Figure 6 displays a pie chart of failure modes with categories to represent the steps taken to address the modes: "Type A" modes will not be addressed, "Type B-Unseen" modes have not yet appeared in testing but are estimated from the analysis, "Type B-Remain" modes remain in the system because the corrective actions were not 100% effective and "Type B-Removed" modes will be removed through the implementation of corrective actions. For example, according to this pie chart, almost 29% of the failure modes have not even been observed yet through testing.

Repairable Systems Analysis
RGA 6 will also facilitate the analysis of repairable systems data using the Crow (AMSAA) model. For example, you may have a fleet of systems (e.g. a population of cars, motorcycles or ships) such that each of these systems can undergo an overhaul or a repair and be placed back into the field. Analysis of a repairable system using RGA 6 allows you to get an overview of the system without having the large data requirements that would normally be required for system reliability analysis, as in the BlockSim 6 software. You may want to use RGA 6 to track the progress of the system during development and then use BlockSim 6 in accordance with the already known results to gain more detailed information.

In RGA 6's repairable systems interface, you can enter a start and end time for each system, along with any failure data that you may have for the system. You also have the ability to remove individual systems from consideration in a particular analysis if, for example, the data is not representative of the rest of the population. You can then analyze the data to combine each of these individual systems into a single "superposition" system. The parameters Beta and Lambda, along with the results of the Laplace Trend Test and the Cramér-von Mises goodness-of-fit test, are displayed for each system individually and for the combined "superposition" system.

Figure 7 displays one of the plots available for repairable systems analysis in RGA 6. This is the System Operation plot, which displays a timeline of the failures for each of the individual systems, along with the failures for the combined "superposition" system. You can also generate plots of reliability and unreliability vs. time for the extrapolated "superposition" system, as shown in Figure 8. Other plots, such as the Cumulative Number of Failures vs. Time plot with either linear or logarithmic axes, are also available.

Conclusion
ReliaSoft's RGA software provides these and other capabilities and will be released in Fall 2003.

Figure 3: Growth Potential MTBF plot
Figure 4: Beta Bounds plot to confirm assumption
Figure 5: Before and after failure rates for B modes
Figure 6: Failure modes pie chart
Figure 7: Failures for individual, superposition systems
Figure 8: Reliability vs. Time for superposition system
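The repairable systems discussion mentions the Laplace Trend Test. As a hedged illustration, the sketch below computes one common textbook form of the Laplace statistic for a single system observed over a fixed period from 0 to T. The article does not say which variant RGA 6 uses, so the formula, the function name and the failure times here are assumptions made for illustration only.

```python
from math import sqrt


def laplace_trend_statistic(failure_times, end_time):
    """Laplace trend statistic for one system observed over (0, end_time].

    Textbook time-truncated form (an assumption of this sketch):
        U = (sum(t_i) - n*T/2) / (T * sqrt(n/12))
    Values near zero suggest no trend, clearly negative values suggest
    improvement (failures bunched early) and clearly positive values
    suggest deterioration (failures bunched late).
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure time is required")
    return (sum(failure_times) - n * end_time / 2.0) / (end_time * sqrt(n / 12.0))


# Hypothetical cumulative failure times (hours) for one system over 100 hours.
early_failures = [5.0, 10.0, 20.0]
print(round(laplace_trend_statistic(early_failures, 100.0), 3))
```

The statistic is usually compared against standard normal critical values, so a result well below zero for the hypothetical data above would point toward an improving system.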
Continued from Page 1: "Fault Tree Analysis, Reliability Block Diagrams and BlockSim FTI Edition"
If the output event is system failure and the two input events are component failures, then this fault tree indicates that the failure of A or B causes the system to fail. The RBD equivalent for this configuration is a simple series system with two blocks, A and B, as shown next.

Drawing Fault Trees: Gates and Events
Gates are the logic symbols that interconnect contributory events and conditions in a fault tree diagram. In addition to the AND and OR gates described above, fault trees can also logically connect events with other gates, such as the Voting OR gate, in which the output event occurs if a certain number of the input events occur (i.e. k-out-of-n redundancy), the Sequence Enforcing gate, in which the output event occurs if all events occur in a specific sequence, etc.

An event (or a condition) in a fault tree is similar to a standard block in an RBD in that it can be associated with a probability of occurrence (or a distribution function). However, fault trees also use several graphical symbols to represent different types of events. For example, a circle typically represents a basic initiating event in a fault tree diagram, while a pentagon represents an event that is normally expected to occur. All events are treated the same from an analytical perspective.

Table 1 shows the gate symbols that are used in classic fault tree analysis and Table 2 (page 9) shows the event symbols. For both tables, the reliability block diagram equivalents are described when applicable. Note: the "classic" FTA symbols in these tables are based on the definitions used in the Fault Tree Handbook (NUREG-0492) prepared by the U.S. Nuclear Regulatory Commission.

Comparing Fault Trees and RBDs
The most fundamental difference between FTDs and RBDs is that you work in the "success space" in an RBD while you work in the "failure space" in a fault tree. In other words, the RBD looks at success combinations while the fault tree looks at failure combinations. In addition, fault trees have traditionally been used to analyze fixed probabilities (i.e. each event that comprises the tree has a fixed probability of occurring) while RBDs may include time-varying distributions for the success (reliability equation) and other properties, such as repair/restoration distributions. In general (and with some specific exceptions), a fault tree can be easily converted to an RBD. However, it is generally more difficult to convert an RBD into a fault tree, especially if one allows for highly complex configurations.

As you can see from Tables 1 and 2, there is an RBD equivalent for most of the constructs that are supported by classic FTA. The one exception is the XOR gate, which specifies that the output event occurs if exactly one input event occurs. This is similar to an OR gate with the exception that if more than one input event occurs then the output event does not occur. For example, if there are two input events then the XOR gate indicates that the output event occurs if one of those events occurs but not if zero or both of those events occur. From a system reliability perspective, if each input event is the failure of a component and the output event is system failure, this would imply that a two-component system would function, even if both components had failed.

BlockSim FTI
Given the similarities described above, ReliaSoft set out to blur the distinction between fault trees and RBDs. BlockSim FTI allows RBDs and fault trees to be used interchangeably in an analysis. To accomplish this integration, we introduced two new constructs (gates) that are supported in BlockSim's RBDs but do not have an equivalent in classic FTA. These are the Load Sharing gate and the True Standby gate with a quiescent probability.

In a load sharing configuration, the output event occurs if all input events occur; however, the events are dependent. That is, the occurrence of each event affects the probability of occurrence of the other events. This type of dependency has not been utilized in classic FTA methods.

Likewise, a traditional fault tree cannot take into account both of the probabilities in a true standby configuration: the probability of occurrence when active and when on standby (dormant, quiescent, inactive). A Priority AND gate or a Sequence Enforcing gate could be used to represent standby redundancy in classic FTA. However, it would not take into account the quiescent probability of occurrence. Therefore, we replaced these gates in BlockSim FTI with a more general Standby gate with a switch that can fail and be restored.

Finally, to provide true interoperability between fault trees and RBDs, all repair, maintenance and logistic properties available for RBD blocks are also available for fault tree event blocks.

Examples Comparing FTDs and RBDs
A couple of examples will further illustrate the concepts of FTA and its relationship to reliability block diagram techniques. First, Figure 2 presents a fault tree with a Voting OR gate along with the equivalent reliability block diagram. As you can see, a Voting OR gate in FTA is equivalent to a k-out-of-n parallel RBD configuration, in which some quantity (k) of all input events (qty = n) must occur for the output event to occur.

Figure 2: Fault tree and RBD for k-out-of-n configuration

As another comparison example, consider a "bridge" configuration like the one shown in Figure 3. An inspection of the reliability-wise configuration of this system reveals that any of the following failures will cause the system to fail:
- Failure of components 1 and 2.
- Failure of components 3 and 4.
- Failure of components 1 and 5 and 4.
- Failure of components 2 and 5 and 3.

These sets of events are also called "minimal cut sets." In probability terminology, this configuration can be described as: (1 AND 2) OR (3 AND 4) OR (1 AND 5 AND 4) OR (2 AND 5 AND 3).

Representation of this bridge configuration as a fault tree diagram requires the utilization of duplicate (or mirrored) events, since gates can only represent components in series and parallel. Figure 4 shows the fault tree diagram for this situation, in which the top output event is the failure of the system and the input events are individual component failures. Events with the same number represent the failure of the same component. Figure 5 presents this configuration in a reliability block diagram. This diagram also requires the use of more than one block in the diagram to represent the same component. Blocks with the same number in the diagram are identical. These are called "mirrored" blocks in BlockSim.

Conclusion
As this article demonstrates, fault tree diagrams and reliability block diagrams can be used to model and analyze similar types of logical configurations required for system reliability and related analyses. The BlockSim FTI software provides the full array of reliability block diagram capabilities that are available in the standard version of BlockSim and adds an integrated capability for fault tree analysis.

With BlockSim FTI, you can define and analyze fault trees using the major gates and event symbols. You can also expand your traditional fault tree analyses with the maintainability, throughput and other options that are available in BlockSim's RBDs. You can automatically convert a fault tree to a reliability block diagram and you can also "mix and match" FTDs and RBDs within the same project by, for example, linking a fault tree diagram as a subdiagram to a higher level RBD.

More information is available at http://BlockSim.ReliaSoft.com.
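Returning to the Voting OR gate comparison from Figure 2, the short Python sketch below checks the failure-space and success-space views against each other for a small case. It assumes independent, identical components with a made-up failure probability. The function name is illustrative and has nothing to do with BlockSim's interface.

```python
from math import comb


def at_least_k_of_n(k, n, p):
    """Probability that at least k of n independent events occur,
    where each event has the same probability p."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical case: the output event (system failure) occurs when at least
# 2 of the 3 input events (component failures) occur.
n, k = 3, 2
q = 0.1  # assumed failure probability of each component

# Failure space (fault tree): at least k of n components fail.
p_system_failure = at_least_k_of_n(k, n, q)

# Success space (RBD): the system works when at least n - k + 1 components work.
r_system = at_least_k_of_n(n - k + 1, n, 1.0 - q)

print("P(system fails):", round(p_system_failure, 6))
print("P(system works):", round(r_system, 6))
print("Sum            :", round(p_system_failure + r_system, 6))
```

The two results add up to one, which is the numerical counterpart of the statement that the fault tree and the RBD describe the same configuration from opposite sides.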
Figure 3: Complex "bridge" configuration
Figure 4: Fault tree for complex "bridge" configuration
Figure 5: Reliability block diagram for complex "bridge" configuration
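The minimal cut sets listed for the bridge configuration can be turned into a system failure probability by brute force. The following Python sketch enumerates every combination of failed and working components and adds up the probability of each combination that contains at least one cut set. It assumes independent component failures, and the failure probabilities are hypothetical. This is a simple enumeration for illustration, not the algebraic method that BlockSim uses.

```python
from itertools import product

# Minimal cut sets for the bridge configuration, as listed in the article.
CUT_SETS = [{1, 2}, {3, 4}, {1, 5, 4}, {2, 5, 3}]


def system_failed(failed_components):
    """True if the failed components include every member of some cut set."""
    return any(cut_set <= failed_components for cut_set in CUT_SETS)


def system_failure_probability(q):
    """Exact system failure probability by enumerating all component states.

    q maps each component number to its failure probability. Component
    failures are assumed to be independent (an assumption of this sketch).
    """
    components = sorted(q)
    total = 0.0
    for states in product([False, True], repeat=len(components)):
        failed = {c for c, is_failed in zip(components, states) if is_failed}
        if system_failed(failed):
            probability = 1.0
            for c, is_failed in zip(components, states):
                probability *= q[c] if is_failed else 1.0 - q[c]
            total += probability
    return total


# Hypothetical failure probability of 0.1 for each of the five components.
q = {component: 0.1 for component in range(1, 6)}
print(round(system_failure_probability(q), 5))
```

Enumeration is only practical for small systems because the number of states doubles with each added component, but for five components it is a convenient way to cross-check a result obtained from the fault tree or the RBD.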
BlockSim algebraically computes exact system reliability results and optimum reliability allocations and also provides a sophisticated discrete event simulation engine for maintainability, availability, throughput, life cycle cost summaries and related analyses.
Flexible Reliability Block Diagram (RBD) Creation
BlockSim's interface for reliability block diagram creation is the most intuitive, flexible and polished in the industry. Simple drag-and-drop techniques allow you to build RBDs for the simplest to the most complex systems. Configuration options include series, parallel, complex and k-out-of-n plus load sharing and standby redundancy. Mirrored, multi and subdiagram blocks provide additional modeling flexibility.
Exact Reliability Results/Plots and Optimum Reliability Allocation
BlockSim algebraically computes the exact system reliability function, which can be used to obtain exact system reliability results/plots and to determine the most cost-effective component reliability allocation strategy to meet a system reliability goal.
Extensive Simulation Options for Analysis of Repairable Systems
BlockSim's sophisticated and realistic simulation engine considers the effects of reliability, corrective maintenance, preventive maintenance (PM) and inspections on system performance. This information can be used to generate reliability, maintainability and availability results/plots for the system and also for resource allocation, throughput, life cycle cost summaries and related analyses. Configuration options include the ability to define:
- Maintenance Policies (conditions for corrective maintenance, PM and inspections)
- Maintenance Duration and Restoration Factors (100% restoration or imperfect repairs)
- Resource Availability (spare parts and maintenance crews)
- Direct and Indirect Maintenance Costs
- Throughput (output in a given time)
BlockSim allows you to attach other files to the analysis and provides direct integration with ReliaSoft's Weibull++, ALTA and Xfmea, as well as RAC's PRISM.
The most comprehensive RBD software on the market PLUS complete Fault Tree Analysis support ... integrated together in the same powerful software package.
The BlockSim FTI edition provides all of the powerful capabilities of the standard BlockSim software plus integrated support for Fault Tree Analysis (FTA)... all for under $3,000!
Support for All Major Fault Tree Analysis Gates and Events
BlockSim's interface for fault tree analysis supports all of the traditional FTA gates and event symbols that are applicable to system reliability and related analyses. In addition, BlockSim FTI allows you to expand the modeling capabilities through the introduction of new logic gates to represent load sharing and standby redundancy configurations.
Gates: AND, OR, Voting, Inhibit, Load Sharing, Standby
Events: Basic, Trigger (External, House), Undeveloped, Conditional, Resultant
Work with Fault Trees and/or RBDs in the Same Environment
With the FTI edition, your BlockSim projects can contain both fault trees and reliability block diagrams, together in the same analysis environment. You can even integrate your fault trees and RBDs in several ways, including:
- Mix and match fault trees and RBDs by linking a fault tree as a subdiagram to an RBD or vice versa.
- Copy events from a fault tree diagram and paste them as blocks in an RBD.
- Automatically convert any fault tree diagram to a reliability block diagram.
Expanded Analysis Capabilities
Because BlockSim FTI's fault trees are directly integrated with an equivalent RBD model, you can expand the analysis capabilities for your fault trees beyond the basic results that are normally possible. Regardless of diagram type (fault tree or RBD), all of BlockSim's analytical and simulation capabilities are available. This includes the ability to:
- Define blocks (events and gates) with time-dependent distributions or fixed probabilities.
- Consider maintenance characteristics, including maintenance policies (corrective and preventive maintenance, inspections), maintenance durations, restoration factors, resource availability, maintenance costs and even throughput.
- Model load sharing and true standby redundancy configurations through the application of new logic gates introduced by ReliaSoft.
- Generate plots, reports and point estimate results based on the analysis.
Scheduled for release in Fall 2003. Upgrade pricing is available for current BlockSim users.
Continued from Page 1: "Examining Risk Priority Numbers in FMEA"
Severity, Occurrence and Detection rating scales usually range from 1 to 5 or from 1 to 10, with the higher number representing the higher seriousness or risk. For example, on a ten point Occurrence scale, 10 indicates that the failure is very likely to occur and is worse than 1, which indicates that the failure is very unlikely to occur. The specific rating descriptions and criteria are defined by the organization or the analysis team to fit the products or processes that are being analyzed. As an example, Figure 1 shows a generic five point scale for Severity [Stamatis, 445].

Rating   Description                  Criteria
1        Very Low or None             Minor nuisance.
2        Low or Minor                 Product operable at reduced performance.
3        Moderate or Significant      Gradual performance degradation.
4        High                         Loss of function.
5        Very High or Catastrophic    Safety-related catastrophic failures.

Figure 1: Generic five point Severity scale

After the ratings have been assigned, the RPN for each issue is calculated by multiplying Severity x Occurrence x Detection. The RPN value for each potential problem can then be used to compare the issues identified within the analysis. Typically, if the RPN falls within a pre-determined range, corrective action may be recommended or required to reduce the risk (i.e. to reduce the likelihood of occurrence, increase the likelihood of prior detection or, if possible, reduce the severity of the failure effect). When using this risk assessment technique, it is important to remember that RPN ratings are relative to a particular analysis (performed with a common set of rating scales and an analysis team that strives to make consistent rating assignments for all issues identified within the analysis). Therefore, an RPN in one analysis is comparable to other RPNs in the same analysis but it may not be comparable to RPNs in another analysis. The rest of this article discusses related techniques that can be used in addition to or instead of the basic RPN method described here.

Revised RPNs and Percent Reduction in RPN
In some cases, it may be appropriate to revise the initial risk assessment based on the assumption (or the fact) that the recommended actions have been completed. This provides an indication of the effectiveness of corrective actions and can also be used to evaluate the value to the organization of performing the FMEA. To calculate revised RPNs, the analysis team assigns a second set of Severity, Occurrence and Detection ratings for each issue (using the same rating scales) and multiplies the revised ratings to calculate the revised RPNs. If both initial and revised RPNs have been assigned, the percent reduction in RPN can also be calculated as follows:

% Reduction in RPN = (Initial RPN - Revised RPN) / Initial RPN x 100

For example, if the initial ratings for a potential problem are S = 7, O = 8 and D = 5 and the revised ratings are S = 7, O = 6 and D = 4, then the percent reduction in RPN from initial to revised is (280-168)/280, or 40%. This indicates that the organization was able to reduce the risk associated with the issue by 40% through the performance of the FMEA and the implementation of corrective actions.

           Severity   Occurrence   Detection   RPN
Initial    7          8            5           280
Revised    7          6            4           168
% Reduction in RPN: 40%
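The RPN and percent reduction calculations described in this section can be sketched in a few lines of Python. This is only an illustration using the ratings from the article's example. The function names are assumptions made for this sketch and are not part of Xfmea.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: the product of the three ratings."""
    return severity * occurrence * detection


def percent_reduction(initial_rpn, revised_rpn):
    """Percent reduction in RPN from the initial to the revised assessment."""
    return 100.0 * (initial_rpn - revised_rpn) / initial_rpn


# Ratings from the example: S = 7, O = 8, D = 5 revised to S = 7, O = 6, D = 4.
initial = rpn(7, 8, 5)
revised = rpn(7, 6, 4)
print(initial, revised, percent_reduction(initial, revised))  # 280 168 40.0
```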
Occurrence/Severity Matrix
Because the RPN is the product of three ratings, different circumstances can produce similar or identical RPNs. For example, an RPN of 100 can occur when S = 10, O = 2 and D = 5; when S = 1, O = 10 and D = 10; when S = 4, O = 5 and D = 5, etc. In addition, it may not be appropriate to give equal weight to the three ratings that make up the RPN. For example, an organization may consider issues with high severity and/or high occurrence ratings to represent a higher risk than issues with high detection ratings. Therefore, basing decisions solely on the RPN (considered in isolation) may result in inefficiency and/or increased risk.

The Occurrence/Severity matrix provides an additional or alternative way to use rating scales to prioritize potential problems. This matrix displays the Occurrence scale vertically and the Severity scale horizontally. The points represent potential causes of failure and they are marked at the location where the Severity and Occurrence ratings intersect. The analysis team can then establish boundaries on the matrix to identify high, medium and low priorities.

Figure 2 displays a matrix chart generated with ReliaSoft's Xfmea software. In this example, the Occurrence and Severity ratings were set based on a ten point scale. The high priority issues are identified with a red triangle (up), the medium priority issues with a yellow circle and the low priority issues with a green triangle (down). Within the software, when the user clicks a point in the matrix, the description of the potential problem is displayed. For presentation in other documents, a text legend can be used to accompany the matrix graphic.
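To illustrate how easily different rating combinations collide on the same RPN, the following sketch enumerates every Severity/Occurrence/Detection combination on a ten point scale that multiplies to a given RPN. The function name and the ten point default are assumptions made for this illustration.

```python
from itertools import product


def combos_with_rpn(target, scale_max=10):
    """All (S, O, D) triples on a 1..scale_max scale whose product equals target."""
    scale = range(1, scale_max + 1)
    return [(s, o, d) for s, o, d in product(scale, repeat=3) if s * o * d == target]


matches = combos_with_rpn(100)
print(len(matches))  # 12 distinct rating combinations share an RPN of 100
print((10, 2, 5) in matches, (1, 10, 10) in matches, (4, 5, 5) in matches)
```

The three combinations cited in the text are among the matches, even though they describe very different risk profiles.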
Continued from Page 14: "Examining Risk Priority Numbers in FMEA"
Rank Issues by Severity, Occurrence or Detection
Ranking issues according to their individual Severity, Occurrence or Detection ratings is another way to analyze potential problems. For example, the organization may determine that corrective action is required for any issue with an RPN that falls within a specified range and also for any issue with a high severity rating. In this case, a potential problem may have an RPN of 40 (Severity = 10, Occurrence = 2 and Detection = 2). This may not be high enough to trigger corrective action based on RPN but the analysis team may decide to initiate a corrective action anyway because of the very high severity of the potential effect of the failure.

Figure 3 presents a graphical view of failure causes ranked by likelihood of occurrence in a Pareto (bar) chart generated by Xfmea. This chart provides the ability to click a bar to display the issue description and to generate a detailed legend for print-ready output. Xfmea also provides this information in a print-ready tabular format and generates similar charts and reports for Severity and Detection ratings.

Risk Ranking Tables
In addition to, or instead of, the other risk assessment tools described here, the organization may choose to develop risk ranking tables to assist the decision-making process. These tables will typically identify whether corrective action is required based on some combination of Severity, Occurrence, Detection and/or RPN values. As an example, the table in Figure 4 places Severity horizontally and Occurrence vertically [McCollin, 39]. The letters and numbers inside the table indicate whether a corrective action is required for each case.
N = No corrective action needed.
C = Corrective action needed.
# = Corrective action needed if the Detection rating is equal to or greater than the given number.

For example, according to the risk ranking table in Figure 4, if Severity = 6 and Occurrence = 5, then corrective action is required if Detection = 4 or higher. If Severity = 9 or 10, then corrective action is always required. If Occurrence = 1 and Severity = 8 or lower, then corrective action is never required, and so on. Other variations of this decision-making table are possible and the appropriate table will be determined by the organization or analysis team based on the characteristics of the product or process being analyzed and other organizational factors, such as budget, customer requirements, applicable legal regulations, etc.

Higher Level RPNs
Finally, it may be desirable to assign RPNs at higher levels in the analysis based on the RPNs calculated for the causes of failure. For example, Item RPNs might be useful as a way to compare components to determine priority for corrective action or to determine which component will be selected for inclusion in the design. The higher level RPN can be calculated by obtaining the sum of all RPNs for all associated causes of failure. For example, to calculate the Item RPN, it is necessary to calculate the RPNs for each cause associated with the item and then to obtain the sum of those RPNs, as shown next:

Item RPN = Cause RPN 1 + Cause RPN 2 + ... + Cause RPN n

Within Xfmea, users have the option to roll up the RPN from the cause level to the effect, failure, function and/or item levels of the analysis. Figure 5 shows these calculated values in the hierarchical view of the analysis within Xfmea.

Conclusion
As this article demonstrates, the Risk Priority Number (RPN) methodology can be used to assess the risk associated with potential problems in a product or process design and to prioritize issues for corrective action. A particular analysis team may choose to supplement or replace the basic RPN methodology with other related techniques, such as revised RPNs, the Occurrence/Severity matrix, ranking lists, risk ranking tables and/or higher level RPNs. All of these techniques rely heavily on engineering judgment and must be customized to fit the product or process that is being analyzed and the particular needs/priorities of the organization.

ReliaSoft's Xfmea software facilitates analysis, data management and reporting for all types of FMEA, with features to support most of the RPN techniques described here. On the Web at http://Xfmea.ReliaSoft.com.

References
The following references relate directly to the examples presented in this article. Numerous other resources are available on FMEA techniques and styles. For more information, see http://Xfmea.ReliaSoft.com/resources.htm.

Crowe, Dana and Alec Feinberg, Design for Reliability, Chapter 12 "Failure Modes and Effects Analysis." CRC Press, Boca Raton, FL, 2001.
McCollin, Chris, "Working Around Failure." Manufacturing Engineer, February 1999. Pages 37-40.
Stamatis, D.H., Failure Mode and Effect Analysis: FMEA from Theory to Execution. American Society for Quality (ASQ), Milwaukee, Wisconsin, 1995.

Figure 2: Occurrence/Severity Matrix generated with Xfmea's Plot Viewer

Figure 3: Chart of causes ranked by Occurrence rating generated with Xfmea

(Rows: Occurrence. Columns: Severity.)
O/S    1    2    3    4    5    6    7    8    9    10
1      N    N    N    N    N    N    N    N    C    C
2      N    N    N    N    N    N    10   8    C    C
3      N    N    N    N    10   7    6    5    C    C
4      N    N    N    8    6    5    4    4    C    C
5      N    N    10   6    5    4    3    3    C    C
6      N    N    7    5    4    3    3    2    C    C
7      N    10   6    4    3    3    2    2    C    C
8      N    8    5    4    3    2    2    2    C    C
9      N    7    5    3    3    2    2    1    C    C
10     N    6    4    3    2    2    1    1    C    C

Figure 4: Sample risk ranking table
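The decision rule encoded by a risk ranking table such as Figure 4 can be sketched as a simple lookup. The table below is transcribed from Figure 4 with one row per Occurrence rating and one column per Severity rating, which is an assumption about the layout based on the article's description. The function name is hypothetical.

```python
# Figure 4 transcribed with one row per Occurrence rating (1..10) and one
# column per Severity rating (1..10). "N" = no corrective action, "C" =
# corrective action, a number = corrective action when Detection >= number.
RISK_TABLE = [
    "N N N N N N N N C C",    # Occurrence = 1
    "N N N N N N 10 8 C C",   # Occurrence = 2
    "N N N N 10 7 6 5 C C",   # Occurrence = 3
    "N N N 8 6 5 4 4 C C",    # Occurrence = 4
    "N N 10 6 5 4 3 3 C C",   # Occurrence = 5
    "N N 7 5 4 3 3 2 C C",    # Occurrence = 6
    "N 10 6 4 3 3 2 2 C C",   # Occurrence = 7
    "N 8 5 4 3 2 2 2 C C",    # Occurrence = 8
    "N 7 5 3 3 2 2 1 C C",    # Occurrence = 9
    "N 6 4 3 2 2 1 1 C C",    # Occurrence = 10
]


def action_required(severity, occurrence, detection):
    """Return True if the table calls for corrective action."""
    cell = RISK_TABLE[occurrence - 1].split()[severity - 1]
    if cell == "N":
        return False
    if cell == "C":
        return True
    return detection >= int(cell)


print(action_required(severity=6, occurrence=5, detection=4))  # True
print(action_required(severity=6, occurrence=5, detection=3))  # False
```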
Criticality Analysis is another method of risk assessment that can be used in conjunction with an FMEA. MIL-STD-1629A describes the requirements for two types of failure modes, effects and criticality analysis (FMECA): quantitative and qualitative.

To perform a quantitative criticality analysis, the analysis team must:
- Identify the functions, failures, effects and causes for each item of interest.
- Define the reliability/unreliability for each item, at a given operating time.
- Identify the portion of the item's unreliability that can be attributed to each potential failure mode.
- Rate the probability of loss (or severity) for the effects of each failure mode, using a number from 0 to 1.
- Calculate the criticality for each potential failure mode by multiplying the three factors: Item Unreliability x Mode Ratio of Unreliability x Probability of Loss.
- Calculate the criticality for each item by obtaining the sum of the criticalities for each failure mode that has been identified for the item.

A qualitative criticality analysis, as described in the military standard, is similar to the Risk Priority Number (RPN) method. The analysts use pre-defined rating scales to rate the likelihood of occurrence for each failure mode and the severity of the potential effects of failure. However, the probability of prior detection is not considered. A matrix with severity on the horizontal and occurrence on the vertical axes can be used to compare failure modes from this analysis.
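The quantitative criticality calculation described above reduces to a product per failure mode and a sum per item. The sketch below uses made-up numbers purely for illustration. The item unreliability, mode ratios and probabilities of loss are hypothetical and the function names are not taken from any standard or tool.

```python
def mode_criticality(item_unreliability, mode_ratio, prob_of_loss):
    """Item Unreliability x Mode Ratio of Unreliability x Probability of Loss."""
    return item_unreliability * mode_ratio * prob_of_loss


def item_criticality(item_unreliability, modes):
    """Sum of the mode criticalities; modes is a list of (mode_ratio, prob_of_loss)."""
    return sum(mode_criticality(item_unreliability, r, p) for r, p in modes)


# Hypothetical item: unreliability of 0.10 at the operating time of interest,
# split across three failure modes whose ratios sum to 1.
modes = [(0.6, 1.0), (0.3, 0.5), (0.1, 0.1)]
print(round(item_criticality(0.10, modes), 4))  # 0.076
```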
Figure 5: Xfmea project with higher level RPNs calculated
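The roll-up of cause RPNs to the item level, as displayed in Figure 5, amounts to a grouped sum. The following sketch shows the idea with hypothetical items and ratings. It does not reproduce Xfmea's data model.

```python
from collections import defaultdict

# Hypothetical causes of failure: (item, severity, occurrence, detection).
causes = [
    ("Item A", 7, 8, 5),
    ("Item A", 4, 5, 5),
    ("Item B", 10, 2, 2),
    ("Item B", 6, 3, 4),
]

# Item RPN = sum of the RPNs of all causes associated with the item.
item_rpn = defaultdict(int)
for item, s, o, d in causes:
    item_rpn[item] += s * o * d

for item, total in sorted(item_rpn.items()):
    print(item, total)  # Item A 380, then Item B 112
```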
RAC provides total, turn-key reliability solutions including training, data, consulting services, and information for both government and commercial customers. We offer a number of free services through our web site including technical information, searchable bibliographic databases, industry directories, an open industry forum, calendar of events, and more.
Reliability Analysis Center 201 Mill Street, Rome, NY 13440-6916 Phone: 315.337.0900 or 1.888.RAC.USER Fax: 315.337.9932 E-mail: [email protected] Web Site: http://rac.alionscience.com
Reliability and Maintainability Analysis for a Remote Telecommunications System
This article presents a fictional example designed to demonstrate some useful techniques for system reliability, maintainability and availability analysis. The purpose is to investigate the reliability and maintainability of a telecommunications system that will be constructed in an uninhabited stretch of jungle. The BlockSim 6 software is used to model the system and perform the analysis.

Reliability-Wise System Configuration
The first step in the analysis is to model the reliability-wise configuration of the system, which consists of a transmitter and receiver with six relay stations to connect them. The relays are situated so that the signal originating from one station can be picked up by the next two stations down the line. For example, a signal from the transmitter can be received by relays 1 and 2; a signal from relay 1 can be received by relays 2 and 3; and so forth. Thus, this arrangement requires two consecutive relays to fail for the system to fail. (This is also known as a consecutive-k-out-of-n:F system.) Figure 1 displays the reliability block diagram (RBD) to describe the reliability-wise configuration of the system.

In addition, the transmitter and receiver are made up of three subassemblies each, while the relay stations have two subassemblies each (all in series). Specifically:
- Subassembly SPS1 (solar power supply) is common to all.
- The transmitter has two additional subassemblies: TRC1 and TRC2.
- The receiver has two additional subassemblies: RCR1 and RCR2.
- Each relay station has one additional subassembly: RLYC1.

These subassemblies are defined in BlockSim as subdiagrams to the master diagram (i.e. separate diagrams linked to blocks in the main diagram). The subdiagrams are presented in Figure 2. In addition, Table 1 presents the failure distributions and parameters that have been estimated from data collected for each subassembly.
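The claim that the relay chain fails only when two consecutive relays fail can be checked by brute force. The sketch below enumerates all 64 up/down combinations of the six relays, traces whether a signal can hop from the transmitter to the receiver and compares the result with the "two consecutive failures" rule. It assumes the transmitter and receiver themselves are working, and all names are illustrative.

```python
from itertools import product

N_RELAYS = 6
REACH = 2  # a signal can be picked up by the next two stations down the line


def signal_gets_through(relay_up):
    """Trace the signal from the transmitter (position 0) to the receiver (last)."""
    n = len(relay_up)
    up = [True] + list(relay_up) + [True]  # transmitter and receiver assumed up
    reached = [False] * (n + 2)
    reached[0] = True
    for i in range(n + 2):
        if not reached[i]:
            continue
        for j in range(i + 1, min(i + REACH, n + 1) + 1):
            if up[j]:
                reached[j] = True
    return reached[n + 1]


def has_two_consecutive_failures(relay_up):
    return any(not a and not b for a, b in zip(relay_up, relay_up[1:]))


states = list(product([True, False], repeat=N_RELAYS))
working = [s for s in states if signal_gets_through(s)]
print(len(working), "of", len(states), "relay states keep the link up")
```

Every state in which the signal gets through is a state with no two adjacent failed relays, and vice versa, which is what makes the consecutive-2-out-of-6:F description appropriate.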
Basic Reliability Analysis
Once the analysts have modeled the reliability-wise configuration of the system and defined the reliability characteristics of the components, they can use BlockSim to calculate the reliability function for the system and answer questions of interest regarding the reliability of the system. For example, they can use the Analytical QCP to determine that the reliability of the system after 1000 hours of operation is 97.67%.

Table 1: Distribution and Parameters to Describe the Failure Properties of Each Subassembly (in hours)

Component   Failure Distribution   Parameters
SPS1        Weibull                Beta = 2, Eta = 25,000
TRC1        Weibull                Beta = 3, Eta = 20,000
TRC2        Exponential            Mu = 85,000
RCR1        Exponential            Mu = 150,000
RCR2        Weibull                Beta = 2, Eta = 30,000
RLYC1       Exponential            Mu = 100,000
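As a rough cross-check of the 1000 hour figure, the system reliability can be computed by hand from Table 1. The sketch below is not BlockSim's algorithm. It assumes that all subassemblies fail independently, that each station has its own SPS1 unit, that the exponential parameter Mu is the mean time to failure and that the relay section works when no two adjacent relays are failed.

```python
import math


def weibull_rel(t, beta, eta):
    """Two-parameter Weibull reliability at time t."""
    return math.exp(-((t / eta) ** beta))


def expo_rel(t, mean):
    """Exponential reliability at time t, with the mean time to failure given."""
    return math.exp(-t / mean)


def no_two_adjacent_failures(p, n):
    """Probability that n independent units (each up with probability p)
    contain no two adjacent failed units."""
    ends_up, ends_down = p, 1.0 - p
    for _ in range(n - 1):
        ends_up, ends_down = (ends_up + ends_down) * p, ends_up * (1.0 - p)
    return ends_up + ends_down


def system_reliability(t):
    sps1 = weibull_rel(t, 2, 25_000)
    transmitter = sps1 * weibull_rel(t, 3, 20_000) * expo_rel(t, 85_000)
    receiver = sps1 * expo_rel(t, 150_000) * weibull_rel(t, 2, 30_000)
    relay = sps1 * expo_rel(t, 100_000)
    return transmitter * receiver * no_two_adjacent_failures(relay, 6)


print(f"R(1000 h) = {system_reliability(1000):.4f}")
```

Under these assumptions the result should land close to the 97.67% quoted in the text.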
Figure 2: Subdiagrams for the three types of component
Figure 1: RBD for remote telecommunications system
Table 2: Distribution and Parameters for Corrective Maintenance Durations for Each Subassembly (in hours)

Component   Repair Distribution   Parameters
SPS1        Lognormal             ln (mean) = 2.3, ln (std) = 0.55
TRC1        Exponential           Mu = 10
TRC2        Exponential           Mu = 10
RCR1        Exponential           Mu = 10
RCR2        Exponential           Mu = 10
RLYC1       Exponential           Mu = 10
In addition, the analysts can generate a Reliability Importance vs. Time plot and use it to determine whether different relays have different impacts on the reliability of the system when they fail, based on their position in the configuration. As shown in Figure 3, the position of the relays within the diagram does matter, even though they are reliability-wise identical. The failure of relay 2 or 5 has the greatest impact on the reliability of the system. Relays 3 and 4 have the second greatest impact and relays 1 and 6 have the smallest impact on system reliability.

Basic Maintainability and Availability Analysis
By expanding the analysis to include information on the maintenance plan for the system, the analysts can make important estimates regarding the maintainability and availability of the system. In this example, all of the components described in Table 1 are line replaceable units that can be 100% restored by replacing the failed component with a new one. To simplify the example, we will assume that each repair begins immediately upon the failure of a unit, that there are unlimited maintenance crews and spare parts to perform the maintenance and that no logistical delays exist. In addition, the components do not continue to operate (i.e. accumulate age) when the system is down. Table 2 presents the repair distributions and parameters that have been estimated from data collected for each subassembly. Note that the maintenance plan described here consists of corrective maintenance (CM) only and does not include preventive maintenance (PM) or inspections.

When the maintenance characteristics for the system have been added to the model, the analysts can use BlockSim's simulation utility to obtain desired results regarding the maintainability and availability of the system. The results generated by completing 10,000 simulation runs for one year (8760 hours) of operation include:
- The point availability of the system after one year of operation, A(t = 8760), is 99.86%. This represents the probability that the system is operational at the given time.
- The average availability of the system after one year of operation is 99.93%. This is also called "operational availability" and it represents the total uptime divided by the total operating time.
- The mean time to first failure (MTTFF) of the system is 16,397 hours, or almost two years.
- The total system downtime is 6.16 hours per year.

In addition, the analysts can rank the components according to their Failure Criticality Index (RS FCI), which represents the percentage of the system failures that were due to the failure of the given component. As shown in Figure 4, 99.9% of all system failures were due to the transmitter or the receiver. Among those failures, 54% were due to the transmitter and 20.5% of the transmitter failures were due to the solar power supply (SPS1) component. Therefore, improvement to the availability of the SPS1 component will have the greatest impact on the availability of the system.
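The relationship between the reported downtime and the average availability can be verified with simple arithmetic. The sketch below treats average availability as uptime divided by the total period, using the one year (8760 hour) window from the example. The function name is an assumption for illustration.

```python
HOURS_PER_YEAR = 8760


def mean_availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Average availability = total uptime / total time in the period."""
    uptime = period_hours - downtime_hours
    return uptime / period_hours


# 6.16 hours of expected downtime per year, as reported by the simulation.
print(f"{mean_availability(6.16):.2%}")  # 99.93%
```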
Figure 3: Reliability Importance vs. Time plot to compare the impact of the relays on the system reliability
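The position effect shown in Figure 3 can be reproduced approximately with a Birnbaum-type importance measure: the system reliability with a relay known to be up minus the system reliability with that relay known to be down. The sketch below is an assumption-laden illustration, not BlockSim's implementation. It looks only at the relay section, uses an illustrative relay reliability of 0.9885 and assumes independent relays.

```python
from itertools import product


def relays_ok(state):
    """True if no two adjacent relays are failed (True = relay is up)."""
    return not any(not a and not b for a, b in zip(state, state[1:]))


def section_reliability(p, fixed=None):
    """Probability that the relay section works. fixed = (index, is_up) pins one relay."""
    total = 0.0
    for state in product([True, False], repeat=len(p)):
        if fixed is not None and state[fixed[0]] != fixed[1]:
            continue
        prob = 1.0
        for i, up in enumerate(state):
            if fixed is not None and i == fixed[0]:
                continue  # the pinned relay contributes no probability factor
            prob *= p[i] if up else 1.0 - p[i]
        if relays_ok(state):
            total += prob
    return total


def importance(p):
    """Reliability with relay i up minus reliability with relay i down, per relay."""
    return [
        section_reliability(p, (i, True)) - section_reliability(p, (i, False))
        for i in range(len(p))
    ]


for number, value in enumerate(importance([0.9885] * 6), start=1):
    print(f"Relay {number}: {value:.5f}")
```

With identical relays, the end relays (1 and 6) come out lowest because each has only one neighboring relay whose failure makes it critical, while relays 2 and 5 come out slightly ahead of relays 3 and 4.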
Figure 4: Summary of selected RS FCI results
Finally, the analysts can estimate the number of spare parts required to maintain the system by looking at the expected number of failures for each component, presented in Figure 5. Because all maintenance in this example involves the replacement of a failed component, a spare part will be required for each failure. For SPS1, 0.9579 failures are expected per year. Another way to look at this is that maintenance personnel can expect to need roughly one spare SPS1 part during the year. Of course, the choice as to whether to keep spare parts in stock is based on additional economic and logistic information (e.g. How quickly can the part be obtained? How much does it cost to keep the part on-hand? etc.)

More Complex System Analysis
Although the analysis up to this point has been purposely simplified to consider only failure and repair distributions and system configuration, other factors may impact the reliability, maintainability and availability of a system in a real-world situation. This may include other maintenance approaches (PM and/or inspections), components that continue to operate when the system is down, dormant (hidden) failures, imperfect repairs (i.e. the component is less than 100% restored by the maintenance action), limitations on maintenance personnel and/or spare parts, etc.

To demonstrate two such factors, we will modify this example to suppose that a subcontractor has been engaged to repair the system when needed and that it takes an average of 36 hours (following a normal distribution with a standard deviation of 6) for a technician to reach the site and begin the repair. Furthermore, we will assume that only two technicians are qualified to service the system and that the subcontractor keeps a single spare for SPS1 and RLYC1 on-hand. When one of these parts is used, another is ordered. On-hand spares are available immediately but other parts must be ordered and shipped when needed. The time of arrival for all parts that are ordered and shipped follows a normal distribution with a mean of 72 hours and a standard deviation of 12.

Under this scenario, the analysts must expand the system model to include additional information on the resources that are required to perform repairs (i.e. maintenance personnel and spare parts). In BlockSim, this requires the assignment of a maintenance crew policy and spare parts policy to each component. The maintenance crew policy describes any limitations on the number of simultaneous repairs that can be performed on the system (two), any logistical delay time before the maintenance personnel can initiate the action (duration follows a normal distribution with Mean = 36 and Std = 6) and any costs associated with engaging the crew (none). The spare parts policy describes the number of parts in stock (1 each for SPS1 and RLYC1, 0 for the rest), any logistical delay time before an available part can be used for a maintenance action (none) and the conditions for ordering and shipping parts when needed (order 1 when the stock drops to 0, time for arrival follows normal distribution with Mean = 72 and Std = 12).

When the simulation is repeated for one year of operation according to the modified maintenance plan, new maintainability and availability results are generated. For example, the average availability after one year of operation is 99.6% with the personnel and parts limitations established by the subcontractor. This is slightly less than the 99.93% estimated for unlimited spares and maintenance crews. The expected system downtime is 36.21 hours per year, which is greater than the downtime estimate that did not take maintenance resources into account.

Conclusion
As this article demonstrates, there are many factors that can affect the performance of a repairable system. Flexible reliability block diagram (RBD) techniques along with powerful analysis and simulation engines enable analysts to model systems as realistically as possible in order to obtain reliability, maintainability and availability estimates that can be used to improve performance, reduce cost and avoid risk. ReliaSoft's BlockSim 6 software supports these and other analysis techniques, plots and results. Other case study examples are available on the Web at http://BlockSim.ReliaSoft.com.
Unless otherwise attributed, the articles in Reliability Edge have been developed by ReliaSoft's R&D staff. This dedicated team of engineers, statisticians, mathematicians and programmers works continuously with top experts in the discipline to develop principles and theory that significantly advance the current state of research and with industry practitioners to successfully apply those principles in the field. Contributing authors hold advanced degrees in reliability engineering and related fields.

Part [Subsystem]       Expected # per Part [Subsystem]   Expected # per Part
RCR1 [Receiver]        0.0582                            0.0582
RCR2 [Receiver]        0.0832                            0.0832
RLYC1 [Relay]          0.0827                            0.5248
RLYC1 [Relay]          0.0858
RLYC1 [Relay]          0.0874
RLYC1 [Relay]          0.0889
RLYC1 [Relay]          0.0897
RLYC1 [Relay]          0.0903
SPS1 [Receiver]        0.1200                            0.9579
SPS1 [Relay]           0.1166
SPS1 [Relay]           0.1181
SPS1 [Relay]           0.1205
SPS1 [Relay]           0.1208
SPS1 [Relay]           0.1211
SPS1 [Relay]           0.1242
SPS1 [Transmitter]     0.1166
TRC1 [Transmitter]     0.0803                            0.0803
TRC2 [Transmitter]     0.1088                            0.1088

Figure 5: Expected failures (spare parts required)
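The per-part totals in Figure 5 are simply the per-subsystem expected failures summed by part type. The sketch below redoes that aggregation from the values listed in the figure. The data layout is an assumption made for this illustration.

```python
from collections import defaultdict

# (part, subsystem, expected failures per year), transcribed from Figure 5.
expected_failures = [
    ("RCR1", "Receiver", 0.0582), ("RCR2", "Receiver", 0.0832),
    ("RLYC1", "Relay", 0.0827), ("RLYC1", "Relay", 0.0858),
    ("RLYC1", "Relay", 0.0874), ("RLYC1", "Relay", 0.0889),
    ("RLYC1", "Relay", 0.0897), ("RLYC1", "Relay", 0.0903),
    ("SPS1", "Receiver", 0.1200), ("SPS1", "Relay", 0.1166),
    ("SPS1", "Relay", 0.1181), ("SPS1", "Relay", 0.1205),
    ("SPS1", "Relay", 0.1208), ("SPS1", "Relay", 0.1211),
    ("SPS1", "Relay", 0.1242), ("SPS1", "Transmitter", 0.1166),
    ("TRC1", "Transmitter", 0.0803), ("TRC2", "Transmitter", 0.1088),
]

# Sum the expected failures for each part type across all subsystems.
per_part = defaultdict(float)
for part, _subsystem, count in expected_failures:
    per_part[part] += count

for part in sorted(per_part):
    print(f"{part}: {per_part[part]:.4f}")
```

The SPS1 total matches the 0.9579 failures per year quoted in the text, and the RLYC1 total matches the 0.5248 shown in the figure.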
Reliability Training Seminars
ReliaSoft's training seminars provide instruction in reliability engineering principles and theory as well as the ReliaSoft standard software tools designed to put that theory into practice. By applying this training in the workplace, you can decrease defect rates, reduce warranty costs, improve product design, shorten test time, reduce life cycle costs, reduce maintenance costs... and more!
ReliaSoft's applications take care of the math... the rest is fun!
Life Data Analysis
Three day course covering the basics of life data analysis as it applies to reliability engineering. Session includes hands-on experience with ReliaSoft's Weibull++ software.
Accelerated Life Testing
One day course covering the basics of quantitative accelerated life testing (QALT) data analysis and the use of ReliaSoft's ALTA software.
QALT Boot Camp
Intensive three day course covering advanced concepts and applications of quantitative accelerated life testing (QALT) data analysis. ReliaSoft's ALTA 6 PRO software will be utilized.
One day course covering the basics of system reliability, maintainability and availability analysis using a reliability block diagram (RBD) approach. The session includes basic training in the use of ReliaSoft's BlockSim software.
Advanced System Analysis
Three day course covering advanced concepts and applications of reliability, maintainability and availability analysis for repairable systems and the use of advanced features in ReliaSoft's BlockSim. Please visit the Web or contact ReliaSoft for information on other available courses, on-site seminars and the current public seminar schedule. +1.520.886.0410 http://Seminars.ReliaSoft.com
For Your Information
ReliaSoft and the Reliability Analysis Center (RAC), along with our corporate sponsors GE, Quantum and the US Army, are very excited to announce the 1st Annual Applied Reliability Symposium (ARS), scheduled for June 16-18, 2004 in San Diego, California. The 2004 ARS program matrix includes twenty presentations and four tutorial sessions through two concurrent tracks over two and a half days. The presenters are expert industry practitioners who have been applying reliability and maintainability principles in their day-to-day work for years. Topics will include:
- Warranty reduction
- Reliability as an investment, not a cost
- Specifying reliability and reliability metrics
- Supplier management
- Data collection, management and analysis
- Reliability and safety
- Reliability and market share
- Reliability testing
- Software reliability
- Manufacturing reliability

Please mark your calendars and watch for the soon to be launched symposium Web site for more information.
--Doug Ogden, Vice President, Corporate Relations
Seminars
The next "Master the Subject, Master the Tools" basic training seminars are scheduled for October 6 - 10 in Larnaca, Cyprus, October 27 - 31 in Shanghai, China and November 17 - 21 in Tucson, Arizona. On the Web at http://Seminars.ReliaSoft.com.

Software
Detailed product information for ReliaSoft software products, including free product updates and free evaluation copies, is available on the Web.
ReliaSoft's Standard Software Products

Life Data Analysis
Weibull++ Version 6.0.11, Built 6/24/03: http://Weibull.ReliaSoft.com
Weibull++ MT (Machine Tools): http://Weibull.ReliaSoft.com/mt
Weibull++ DE (Developer Edition): http://Weibull.ReliaSoft.com/deved

Accelerated Life Test Data Analysis
ALTA and ALTA PRO Version 6.0.12, Built 5/30/03: http://ALTA.ReliaSoft.com

System Reliability, Maintainability and Availability Analysis
BlockSim Version 6.0.7, Built 6/24/03 (FTI Edition to be released Fall 2003): http://BlockSim.ReliaSoft.com

Maintenance Program Creator
MPC 3 Version 3.0.9, Built 5/22/03: http://MPC.ReliaSoft.com

Failure Modes and Effects Analysis
Xfmea Version 1.0.4, Built 6/26/03: http://Xfmea.ReliaSoft.com

Reliability Growth Analysis
RGA (to be released Fall 2003): http://RGA.ReliaSoft.com

Technical Support
Phone: +1.520.886.0366 Fax: +1.520.886.0399 E-mail: [email protected]
Toll Free: 1.888.886.0410 (U.S. and Canada) Phone: +1.520.886.0410 Fax: +1.520.886.0399 E-mail: [email protected]
Reliability software tools Free downloads and on-line utilities Probability plotting papers On-line eTextbooks, including: Life Data Analysis Reference Accelerated Life Testing Reference System Analysis Reference Reliability Growth Reference HotWire eMagazine Subject indexes, glossaries, guides, links Research papers, case studies, examples Discussion forum
weibull.com is a free service for the reliability community, provided by ReliaSoft Corporation.
ReliaSoft Corporation ReliaSoft Plaza 115 S. Sherwood Village Drive Tucson, AZ 85710