Reliability

NPHS 1530: Analytics

RELIABILITY IN DESIGN

The network designer generally has a reliability requirement stated as part of the network design specifications. This is usually done in one of three ways. First, one or more reliability floors may be specified. These are the minimum levels of reliability that all or some network paths must have. Second, the designer may be asked to "maximize" reliability with respect to cost, that is, to achieve the highest level of reliability within a given budgeted cost. Third, the designer may have as an objective minimizing the total cost of constructing and operating the network. Each of these cases involves tradeoffs between increased reliability and the cost of achieving that reliability. The following section explores the cost/reliability relationship.

As this chapter has shown, there are in general two ways to increase a network system's reliability:

Use components that are inherently more reliable.
Add parallel components.

The network designer must decide which of these alternatives can be used to produce a system with the desired level of reliability and an acceptable cost.

Assessing the impact and cost of components that are more reliable is fairly straightforward. We often refer to the most reliable components as being "gold plated". This is because higher reliability in a component can only be achieved through greater expense in its manufacture. The cost of a component typically increases at a much faster rate than its reliability. Producing a system using the most reliable components may result in a system cost that is prohibitive. In addition, there is an upper bound to the reliability to which a component can be built. The existing levels of material and manufacturing technologies may not permit us to build a component with a reliability beyond a certain point.

Often an economical way to achieve higher reliability is through redundancy - adding more components in parallel. As our preceding analysis has shown, significant reliability improvements can be achieved by adding components in parallel. This practice is common in computer and other electronic systems where very high reliability is desired.

One question that immediately comes to mind is: How much reliability is enough? We can answer this question in terms of total system reliability cost. The total system reliability cost (TC) is the sum of the costs to make the system reliable (TS) plus the cost associated with the system having a failure (TF). Formula 8.27 shows this relation.

TC = TS + TF    (8.27)

We have already shown that designing for increased reliability causes increased component costs. Formula 8.28 shows the cost of making the system reliable, TS, where Cc is the cost of a component and n is the number of components. This relationship is portrayed graphically in Figure 8.20.

TS = n * Cc    (8.28)

Figure 8.20 System Reliability Costs

The total failure cost, TF, is a statistical cost. A failure will not always occur. During some periods of operation no failure will occur, and no failure cost will be incurred. At other times (a fraction F of the time) failures will occur, and we will incur the full cost of the failure, Cf. On average, our cost will be the expected value over these two conditions, as Formula 8.29 shows.

TF = F * Cf + ( 1 - F ) * 0 = F * Cf    (8.29)

To simplify the analysis, we will assume that we are achieving reliability by adding n components of equal reliability, r, in parallel. Such a system fails only when all n components fail, so, with components failing independently, F = ( 1 - r )ⁿ. This results in Formula 8.30. In practice, any of the system failure probabilities can be substituted into Formula 8.29, depending on the nature of the system under construction.

TF = ( 1 - r )ⁿ * Cf    (8.30)

The cost of a failure, Cf, is a difficult quantity to measure. Part of this cost comes from directly measurable primary costs, such as lost revenue while the system is down and the cost of diagnosing and repairing the network. Also included in Cf are secondary costs that are difficult to identify and measure, such as customer dissatisfaction and the potential loss of future business. As Figure 8.20 and Formula 8.30 show, adding parallel components decreases the probability of failure and thus reduces the total expected failure cost.
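The expected failure cost in Formulas 8.29 and 8.30 can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values (r = 0.9, Cf = 50,000) are assumptions made here, and the sketch assumes identical components in parallel that fail independently.

```python
def parallel_failure_probability(r: float, n: int) -> float:
    """Probability that all n parallel components fail: F = (1 - r)^n."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - r) ** n


def expected_failure_cost(r: float, n: int, failure_cost: float) -> float:
    """Expected failure cost TF = F * Cf (Formula 8.29 with F from Formula 8.30)."""
    return parallel_failure_probability(r, n) * failure_cost


if __name__ == "__main__":
    # Illustrative values only: r = 0.9, Cf = 50,000.
    for n in range(1, 4):
        print(n, round(expected_failure_cost(0.9, n, 50_000.0), 2))
```

Each added component multiplies the failure probability by ( 1 - r ), so the expected failure cost falls geometrically while the component cost grows only linearly.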

As Figure 8.20 and Formula 8.31 also show, the total system reliability cost, TC, is the sum of the component and failure costs. Substituting for component and failure cost, we obtain:

TC = n * Cc + ( 1 - r )ⁿ * Cf    (8.31)

If the variables of this function were continuous, we could find the minimum total cost by taking the first derivative of this function with respect to the number of components, n, setting the derivative equal to zero, and solving for n.

dTC/dn = 0 = Cc + ln ( 1 - r ) * Cf * ( 1 - r )ⁿ    (8.32)

In this case, however, the number of components, n, is a discrete variable, so differentiation does not strictly apply. We usually either use Formula 8.32 to approximate the correct number of components or, since the number of parallel components is typically small, find the optimum cost by evaluating Formula 8.31 for increasing numbers of parallel components.
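As a rough sketch of the approximation route, Formula 8.32 can be rearranged to ( 1 - r )ⁿ = -Cc / ( Cf * ln ( 1 - r ) ) and solved for a real-valued n. Because the total cost curve is convex in n, the best whole number of components is one of the two integers on either side of that value. The function names below are hypothetical, and the code assumes 0 < r < 1 and positive costs.

```python
import math


def total_cost(n: int, component_cost: float, failure_cost: float, r: float) -> float:
    """Formula 8.31: TC = n * Cc + (1 - r)^n * Cf."""
    return n * component_cost + (1.0 - r) ** n * failure_cost


def continuous_optimum(component_cost: float, failure_cost: float, r: float) -> float:
    """Real-valued n that satisfies Formula 8.32 (derivative equal to zero)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    if component_cost <= 0.0 or failure_cost <= 0.0:
        raise ValueError("costs must be positive")
    log_q = math.log(1.0 - r)  # ln(1 - r), always negative here
    # Rearranging Formula 8.32: (1 - r)^n = -Cc / (Cf * ln(1 - r))
    target = -component_cost / (failure_cost * log_q)
    return math.log(target) / log_q


def best_integer_n(component_cost: float, failure_cost: float, r: float) -> int:
    """Compare the integers on either side of the continuous solution (at least 1)."""
    n_star = continuous_optimum(component_cost, failure_cost, r)
    candidates = sorted({max(1, math.floor(n_star)), max(1, math.ceil(n_star))})
    return min(candidates, key=lambda n: total_cost(n, component_cost, failure_cost, r))
```

For illustration, with Cc = 100, Cf = 50,000 and r = 0.9, `continuous_optimum` gives roughly 3.06 and `best_integer_n` returns 3.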

As an example, suppose a system is to be built of parallel components costing $100.00 each. The cost of failure of the system has been set at $50,000 for each failure. Each component has a reliability of .9. We wish to design the system so that the total system reliability cost is minimized. How many components must we use?

Table 8.8 gives the component, failure and system costs for various numbers of components in parallel. With few components in parallel, the cost attributed to failures is excessive. With many components in parallel, the cost of the components outweighs the cost expected due to failures. At 3 parallel components, the total cost is minimized.

Table 8.8 Example Reliability Cost Solution

n	TS	F	TF	TC
1	100.00	.1	5000.00	5100.00
2	200.00	.01	500.00	700.00
3	300.00	.001	50.00	350.00*
4	400.00	.0001	5.00	405.00
5	500.00	.00001	0.50	500.50

*Minimum Total System Cost
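The rows of Table 8.8 can be reproduced by evaluating Formula 8.31 for n = 1 through 5. The following is a minimal sketch; the function name `cost_rows` and the output layout are assumptions made for illustration.

```python
def cost_rows(component_cost: float, failure_cost: float, r: float, max_n: int):
    """Return (n, TS, F, TF, TC) tuples for 1..max_n parallel components."""
    rows = []
    for n in range(1, max_n + 1):
        ts = n * component_cost   # component cost, Formula 8.28
        f = (1.0 - r) ** n        # probability that all n components fail
        tf = f * failure_cost     # expected failure cost, Formula 8.30
        rows.append((n, ts, f, tf, ts + tf))
    return rows


if __name__ == "__main__":
    rows = cost_rows(100.0, 50_000.0, 0.9, 5)
    best_n = min(rows, key=lambda row: row[4])[0]
    print("n\tTS\tF\tTF\tTC")
    for n, ts, f, tf, tc in rows:
        marker = "*" if n == best_n else ""
        print(f"{n}\t{ts:.2f}\t{f:.5f}\t{tf:.2f}\t{tc:.2f}{marker}")
```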

PARCOST Computer Program

The accompanying program PARCOST calculates the minimum total system cost for a parallel component system. Given the cost of a component, the component reliability, and the system failure cost, the program computes the number of parallel components that minimizes total system cost. Figure 8.21 below shows a sample run for a system whose components each cost $250.00 and have a reliability of .8. The failure cost for this system is $150,000.00. User inputs appear after each prompt.

Figure 8.21 PARCOST Example Computer Run

Parallel Reliability Cost Calculation

Enter the Cost of a Component 250.00

Enter the Component Reliability .8

Enter the Cost of a Failure 150000.00

Number of Components	Component Cost	Failure Probability	Failure Cost	Total System Cost
1	250.00	.2	30000.00	30250.00
2	500.00	.04	6000.00	6500.00
3	750.00	.008	1200.00	1950.00
4	1000.00	.0016	240.00	1240.00
5	1250.00	.00032	48.00	1298.00

Minimum Total System Cost 1240.00

Number of Components 4
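The PARCOST source is not reproduced here, so the following Python sketch is not the original program. It shows one way such a search could work under the assumptions of this section: keep adding parallel components until the total cost from Formula 8.31 stops decreasing. The function name is hypothetical. Stopping at the first rise is safe because the total cost is convex in n.

```python
def minimum_cost_system(component_cost: float, reliability: float, failure_cost: float):
    """Return (n, total_cost) for the cheapest number of parallel components."""
    if component_cost <= 0.0:
        raise ValueError("component_cost must be positive")
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    if failure_cost < 0.0:
        raise ValueError("failure_cost must not be negative")

    def total_cost(n: int) -> float:
        # Formula 8.31: TC = n * Cc + (1 - r)^n * Cf
        return n * component_cost + (1.0 - reliability) ** n * failure_cost

    n = 1
    best = total_cost(n)
    while True:
        candidate = total_cost(n + 1)
        if candidate >= best:
            # Adding another component no longer lowers the total cost.
            return n, best
        n += 1
        best = candidate


if __name__ == "__main__":
    n, cost = minimum_cost_system(250.00, 0.8, 150_000.00)
    print(f"Minimum Total System Cost {cost:.2f}")
    print(f"Number of Components {n}")
```

With the inputs from Figure 8.21, this sketch reports 4 components and a minimum total system cost of 1240.00.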