Reliability

NPHS 1530: Analytics

Introduction

Reliability is a concept that has applications in a large number of domains. In emergency systems, reliability has been applied to, for example:

Human response
Emergency Support Functions (ESFs)
- Emergency medicine
- Communications systems
Critical Infrastructure
- Energy, particularly electrical
- Transportation systems
Equipment

Definition

Reliability: The confidence that an emergency system will function as planned when the next large-scale incident or disaster occurs.

Source: Evaluating the Reliability of Emergency Response Systems for Large-Scale Incident Operations

Critical Infrastructure: “Systems and assets, whether physical or virtual, so vital to the United States that the incapacity or destruction of such systems and assets would have a debilitating impact on security, national economic security, national public health or safety, or any combination of those matters.”

Source: USA Patriot Act, Sec. 1016(e).

Emergency Support Functions (ESFs): Used by the Federal Government and many State governments as the primary mechanism at the operational level to organize and provide assistance. ESFs align categories of resources and provide strategic objectives for their use. ESFs utilize standardized resource management concepts such as typing, inventorying, and tracking to facilitate the dispatch, deployment, and recovery of resources before, during, and after an incident.

Source: NRF Resource Center.

We want any emergency system to successfully and accurately operate whenever an emergency occurs. This involves three elements:

The system must have the necessary components to respond.
The system components must be configured to allow the system to achieve its goals.
The system must operate without error.

In this lesson we will address the first two of these elements, which relate to the reliability of the network. Reliability is one of the most important design factors for emergency systems.

Reliability can be viewed as an economic or business issue. Low reliability means that the network may not be available to provide service when a user needs it. This may directly result in loss of revenue for the network operators as well as loss of future revenue in terms of customer satisfaction and loyalty. On the cost side, reliability is a network feature that must be consciously designed into a system. While any network design has a certain level of reliability, that level may not be high enough to supply the required availability for service. The amount of reliability needed is often a function of the applications that the network supports. We would typically expect a network for processing inventory requests in a retail store to require a lower level of reliability than a network that supports a critical care unit in a hospital.

Redundant Emergency Equipment

A typical definition of reliability (r) is the probability that a system will operate in the next instant. This probability cannot be assessed directly, since that would require us to predict the future, so we usually estimate it from historical data for identical systems or through laboratory tests. If we assume that systems fail at a constant rate, the reliability of a system can be estimated as the fraction of the tested systems that are still operating at the end of the test. As a probability, the reliability of a device must lie between zero and one.

0 <= r <= 1

A reliability of zero means that the device never works. A reliability of one means that the device is perfect - it never fails. Real devices have reliabilities that are less than one. It is not unusual for a highly reliable device to have a reliability greater than 0.9999, for example. In practice, steps such as periodic maintenance, diagnostic testing, and regular component replacement are taken to improve the reliability of a system.

We can also define a failure probability (f), which is the likelihood of the system failing in the next instant. Operating and failing are mutually exclusive and collectively exhaustive states, so by the laws of probability:

r + f = 1
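As a minimal sketch of the estimate described above, the following Python snippet computes r as the fraction of tested units still operating at the end of a test, and then obtains f from r + f = 1. The function names and the test counts (200 units tested, 194 still operating) are hypothetical and chosen only for illustration.

```python
def estimate_reliability(units_tested: int, units_operating: int) -> float:
    """Estimate r as the fraction of tested units still operating at the end of the test."""
    if units_tested <= 0:
        raise ValueError("units_tested must be positive")
    if not 0 <= units_operating <= units_tested:
        raise ValueError("units_operating must be between 0 and units_tested")
    return units_operating / units_tested


def failure_probability(r: float) -> float:
    """Return f, using the relationship r + f = 1."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("a reliability must lie between 0 and 1")
    return 1.0 - r


# Hypothetical test data: 200 identical devices tested, 194 still operating.
r = estimate_reliability(units_tested=200, units_operating=194)
f = failure_probability(r)
print(f"estimated reliability r = {r:.2f}")
print(f"failure probability  f = {f:.2f}")
```

The input checks enforce the constraint 0 <= r <= 1 stated earlier, so an impossible count (for example, more units operating than were tested) is rejected rather than silently producing a value outside that range.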

The definition of reliability given above, and the test-based way of estimating it, are not particularly useful when dealing with emergency applications as a whole. There are several reasons for this:

Emergency systems are typically so large and diverse that it is economically and logistically infeasible to test them.
It is unethical to create an emergency merely to test an emergency system.
An emergency response system implementation is usually unique. It is designed for a particular application, location, performance and environment. Thus, comparable data from similar emergency systems is unlikely to exist.
Citizens are seldom worried about the reliability of the emergency response system as a whole. They are only concerned with the reliability of the portion of the system that is used to service their request.
Emergency response systems are loosely coupled combinations of links and processors. Portions of the emergency response system can fail without having a significant impact on the rest of the system.

For these and other reasons, it is useful to have methods for estimating system and subsystem reliability that build reliability figures from the reliabilities of testable system components. The following sections explore the development and use of reliability theory as it pertains to telecommunications systems.
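To illustrate what a component-based estimate can look like, here is a minimal Python sketch of two standard combinations: components that must all work (series) and redundant components where any one is enough (parallel). It assumes component failures are independent, which this lesson does not state, and the function names and component reliabilities are hypothetical.

```python
from math import prod
from typing import Iterable, List


def _validated(reliabilities: Iterable[float]) -> List[float]:
    """Return the reliabilities as a list after checking each lies in [0, 1]."""
    values = list(reliabilities)
    if not values:
        raise ValueError("at least one component reliability is required")
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must lie between 0 and 1")
    return values


def series_reliability(reliabilities: Iterable[float]) -> float:
    """All components must work: multiply the reliabilities (assumes independence)."""
    return prod(_validated(reliabilities))


def parallel_reliability(reliabilities: Iterable[float]) -> float:
    """Any one component suffices: 1 minus the probability that all fail (assumes independence)."""
    return 1.0 - prod(1.0 - r for r in _validated(reliabilities))


# Hypothetical subsystem: two redundant generators feeding one radio link.
generators = parallel_reliability([0.9, 0.9])   # either generator is enough
subsystem = series_reliability([generators, 0.95])  # power AND radio must both work
print(f"redundant generators: {generators:.4f}")
print(f"generators + radio:   {subsystem:.4f}")
```

Under these assumptions, adding a redundant unit raises the reliability of that stage above the reliability of either unit alone, while every additional component that must work in series lowers the overall figure.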

Resources

Evaluating the Reliability of Emergency Response Systems for Large-Scale Incident Operations, Brian A. Jackson, Kay Sullivan Faith, Henry H. Willis, RAND Corporation, 2010.

Shari Welch, MD, FACEP, The Concept of Reliability in Emergency Medicine, American Journal of Medical Quality.

Michael O. Ball, Feng L. Lin, A Reliability Model Applied to Emergency Service Vehicle Location, Operations Research, Vol. 41, No. 1, January-February 1993, pp. 18-36.

Edward Mahinda, Brian Whitworth, Evaluating Flexibility and Reliability in Emergency Response Information Systems.

Brian A. Jackson, The Problem of Measuring Emergency Preparedness: The Need for Assessing "Response Reliability" as Part of Homeland Security Planning, RAND Corporation Occasional Paper OP-234.

Bastien Mainaud, Mariem Zekri, Hossam Afifi, Improving Routing Reliability on Wireless Sensors Network with Emergency Paths, pp. 545-550, The 28th International Conference on Distributed Computing Systems Workshops, 2008.

Jung, W.D., Yoon, W.C., Kim, J.W., Structured information analysis for human reliability analysis of emergency tasks in nuclear power plants, 2001.

Paul Sorensen, Richard Church, Integrating expected coverage and local reliability for emergency medical services location problems, Socio-Economic Planning Sciences, Vol. 44, No. 1, March 2010, pp. 8-18.

Walter W. Jones, Paul W. Reneke, High Reliability Safety Systems for Emergency Response in the Built Environment, Building and Fire Research Laboratory, National Institute of Standards and Technology.