The science of resilience aims to determine which variables an organization should pay attention to, and to what degree, in order to maintain a data center that resists failures, tolerates faults, and provides continuity of service to its customers. In February 2011, the European Network and Information Security Agency (ENISA) issued a report on its efforts to develop a framework for data network resilience for EU government services, one that would then serve as the model for data centers throughout Europe.
To ENISA’s surprise, the various stakeholders’ definitions of resilience did not come close to agreeing with one another. “Resilience was not considered to be a well-defined term and depending on the context, it encompassed several interpretations and viewpoints,” the ENISA report concluded. “Additionally, there was consensus on the fact that information sharing and sources of consolidated information on resilience metrics were not readily available. These challenges were recognized as serious obstacles to the adoption of resilience metrics.”
The Resilience Formula
In searching for a common framework that everyone could agree on, ENISA discovered a concept being implemented at the University of Kansas. There, Professor James P. G. Sterbenz, his colleagues and students had developed an architecture called ResiliNets. It utilizes a six-point strategy for assessing and attaining resilience, along with a charting mechanism that can be adapted to any organization.
As Professor Sterbenz explains, resilience under the ResiliNets system is a cost/benefit trade-off. “The more resources you’re willing to devote, the more resilient your system is going to be,” he says. The perfectly fault-resistant network may be unattainable, so an organization needs to determine how much disruption its network can tolerate, and develop policies and procedures to respond fully to events within that tolerance level. “In the extreme, the most resilient network possible is a full mesh,” he says. “That is, every pair of nodes and every pair of routers has a link between them. That is absurdly expensive, so that’s not cost-effective. So organizations have to make a trade-off.”
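To put rough numbers on why a full mesh is so expensive, here is a minimal sketch that counts links. The function names are hypothetical, and the ring topology is included only as an assumed low-cost point of comparison; it is not something the ResiliNets work is quoted on here. A full mesh of n nodes needs one link for every pair of nodes, that is n(n - 1)/2 links.

```python
def full_mesh_links(nodes: int) -> int:
    """Links needed so that every pair of nodes is directly connected."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # One link per unordered pair of nodes: n * (n - 1) / 2.
    return nodes * (nodes - 1) // 2


def ring_links(nodes: int) -> int:
    """Links in a simple ring (assumed comparison topology, not from the article)."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # A ring needs at least three nodes; smaller networks degenerate to a line.
    return nodes if nodes >= 3 else max(nodes - 1, 0)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n} nodes: full mesh = {full_mesh_links(n)} links, "
              f"ring = {ring_links(n)} links")
```

The link count of a full mesh grows with the square of the number of nodes, while the ring grows linearly, which is one way to see the trade-off the quote describes.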

The ResiliNets formula, D2R2 + DR, stands for six steps: Defend against threats and challenges to normal operation, both actively with people and passively with software and controls; Detect when an adverse event has occurred; Remediate the effects of the event in order to minimize its impact (Professor Sterbenz described a concept called graceful degradation as a reasonable alternative to “crashing”); Recover to normal operations; then Diagnose the root cause of the event; and finally Refine future behavior as necessary.
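The six steps can be modeled as an ordered sequence, which also shows where the name of the formula comes from. This is only an illustrative sketch: the names `Phase`, `RESPONSE_STEPS`, `FOLLOW_UP_STEPS`, `abbreviate`, and `run_cycle` are hypothetical, and the handler mechanism is an assumption about how an organization might wire the steps into its own tooling, not part of ResiliNets itself.

```python
from collections import Counter
from enum import Enum


class Phase(Enum):
    DEFEND = "defend"
    DETECT = "detect"
    REMEDIATE = "remediate"
    RECOVER = "recover"
    DIAGNOSE = "diagnose"
    REFINE = "refine"


# Grouping assumed from the formula's two terms: D2R2, then DR.
RESPONSE_STEPS = (Phase.DEFEND, Phase.DETECT, Phase.REMEDIATE, Phase.RECOVER)
FOLLOW_UP_STEPS = (Phase.DIAGNOSE, Phase.REFINE)


def abbreviate(steps) -> str:
    """Collapse step initials into a term such as 'D2R2' or 'DR'."""
    counts = Counter(step.value[0].upper() for step in steps)
    # Counter keeps first-seen order, so 'D' comes before 'R'.
    return "".join(
        letter + (str(count) if count > 1 else "")
        for letter, count in counts.items()
    )


def run_cycle(event: str, handlers: dict) -> list:
    """Run each step in order and return a log of (step name, outcome)."""
    log = []
    for phase in RESPONSE_STEPS + FOLLOW_UP_STEPS:
        handler = handlers.get(phase)
        outcome = handler(event) if handler else "no handler registered"
        log.append((phase.value, outcome))
    return log


if __name__ == "__main__":
    print(abbreviate(RESPONSE_STEPS) + " + " + abbreviate(FOLLOW_UP_STEPS))
    for step, outcome in run_cycle("link failure", {}):
        print(step, "->", outcome)
```

Logging an explicit "no handler registered" outcome makes it visible when one of the six steps has no owner, which is the kind of gap the strategy is meant to expose.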

Once an organization has integrated this formula into its everyday processes, the state of its network at any one time can be plotted on a simple chart, the Resilience State Space (Figure 2). On the chart, the horizontal axis represents the operational state of the network (what the network administrator sees). The vertical axis represents the level of service that the customer perceives from the network. Professor Sterbenz classifies both malicious threats and natural events, such as power outages and device failures, as challenges. When a challenge arises, the state point for the network on this chart will move to the right. The objective for resilience is to keep that point as close to the lower left corner as possible.
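A state point on such a chart can be sketched as a pair of scores, one per axis. The 0-to-1 scales, the `StatePoint` class, and the distance measure below are all assumptions made for illustration; the article describes only what the two axes represent, not how they are quantified.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StatePoint:
    # Assumed scale: 0.0 = normal operation, 1.0 = severely degraded.
    operational: float
    # Assumed scale: 0.0 = fully acceptable service, 1.0 = unacceptable.
    service: float

    def __post_init__(self):
        for name in ("operational", "service"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")

    def distance_from_normal(self) -> float:
        """Straight-line distance from normal operation with acceptable service."""
        return math.hypot(self.operational, self.service)


def worst_point(trajectory):
    """Return the point in a trajectory that strays furthest from normal."""
    return max(trajectory, key=StatePoint.distance_from_normal)


if __name__ == "__main__":
    # Hypothetical trajectory: a challenge degrades operations, service
    # suffers with a lag, and remediation then pulls the point back.
    trajectory = [
        StatePoint(0.0, 0.0),
        StatePoint(0.6, 0.1),
        StatePoint(0.8, 0.5),
        StatePoint(0.2, 0.0),
    ]
    print(worst_point(trajectory))
```

Tracking the worst excursion is one simple way to compare how two incidents, or two network designs, fared against similar challenges.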
The Resilience Framework
The strategy conceived by the University of Kansas was intended to apply to networks, systems, devices, interconnects, and routers. Yet it does not take into account the other principal component of every organization’s network: its people. A recent survey conducted by the Economist Intelligence Unit suggested that executives believe their organizations’ employees to be the least important stakeholders in cyber resilience. The same survey also suggested broad complacency toward cyber threats within organizations, which may explain why employees were not granted due consideration.
Educating employees about cybersecurity and resilience needs to be a key part of any resilience strategy. This requires making resilience principles less mathematical and more contextual, tangible, and teachable. IBM has developed the Business Resiliency Framework (BRF), and its core principle is that the responsibility for resilience cuts across all departments and divisions of an organization equally.

Many companies designate individuals within each of their divisions as responsible for security. That may have seemed sensible in an earlier era, but today’s data networks do not follow the same structure as yesterday’s organizational charts. This demands a rethink of how resilience is defined, approached, and achieved by organizations. Linda Laun, who manages global consulting methods and the Resiliency Consulting Services portfolio development tools at IBM’s Business Continuity and Resiliency Services (BCRS) division, says that organizations need to designate a centralized governance authority (a “chief resilience officer”) or a centralized resilience authority to coordinate the implementation of a single set of policies across the entire organization.
Reality Is Somewhere in Between
The ResiliNets formula and the BRF have their roots in two separate aspects of resilience management: network engineering and human resources, respectively. Both are very complex systems, and organizations have had difficulty wrapping their heads around such systematic approaches. Simpler diagrams and less jargon may address some of the challenges, but as security engineers in the field are suggesting, it is not just a question of translating the jargon. Resilience processes, such as risk assessment and management, need to become practical, simple, everyday affairs, rather than ideas only addressed in annual or semi-annual reviews.
While risk management is an everyday part of business strategy, it remains oddly alien within IT departments. “Businesses are used to making risk decisions,” says Garry Sidaway, director of security strategy for Integralis, an IT and security consulting firm. But those principles tend to be lost when it comes to cybersecurity.