The High Availability Puzzle

Creating a highly available environment requires more than just technology. Make sure you have all the pieces before it's too late.

Michael Otey

April 18, 2005

Of all a DBA's missions, none is more important than ensuring that vital business services are available to end users. All of your high-end scalability hardware and modern .NET coding techniques will make little difference if users can't access data. Unplanned downtime for an application or the database server can cost an organization dearly in money and reputation. Outages for large online retailers or financial institutions can cost millions of dollars per hour, and when users can't access a site or its vital applications, the organization loses face and customer goodwill.

Microsoft and other enterprise database vendors have devised several high-availability technologies. For example, Microsoft Cluster Service (MSCS) lets one or more cluster nodes assume the work of any failed nodes. Log shipping and replication help organizations protect against both server and site failure by duplicating a database on a remote server. And traditional backup-and-restore technology protects against server and site failure as well as application-data corruption by periodically saving a database's data and log files so that you can rebuild the database to a specified date and time. Although these technologies can help you create a highly available environment, by themselves they can go only so far. Technology alone can't address two critical pieces of the complex high-availability puzzle: the people and processes that touch your system.

Server and site failures can cause downtime, but they're relatively rare compared with human error. Servers have a high mean time between failures (MTBF), and today's hardware, although not perfect, is usually reliable, so server failures are uncommon. In contrast, users, operators, programmers, and administrators interact with your systems almost constantly, and every interaction is another chance for a mistake. Thus, the ability to recover quickly and efficiently from human error is essential for a highly available system. An operator error can take down a database or server in a few seconds, but recovery could take hours. With proper planning, however, you can reduce downtime due to human error by creating adequate application documentation and by ensuring that personnel receive proper training.
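One rough way to see why recovery speed matters as much as failure frequency is the textbook steady-state availability formula, A = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair (here, to recover). The following is a minimal Python sketch under that simplified model; the function names and the incident and recovery figures are hypothetical, chosen only for comparison, not measured data.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years for simplicity


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR).

    Hypothetical helper: assumes failures and recoveries alternate and
    that both averages are stable over time.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected yearly downtime implied by an availability fraction."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical scenario: an incident (such as an operator error) every
    # 1,000 hours, recovered either with a rehearsed procedure (15 minutes)
    # or ad hoc (4 hours).
    for label, mttr in [("rehearsed recovery", 0.25), ("ad hoc recovery", 4.0)]:
        a = availability(1000.0, mttr)
        print(f"{label}: availability {a:.5f}, "
              f"about {downtime_hours_per_year(a):.1f} h downtime/year")
```

With the same hypothetical incident rate, cutting recovery from 4 hours to 15 minutes reduces expected downtime from roughly 35 hours to roughly 2 hours a year, which is why documented, practiced recovery procedures pay off even when failures can't be prevented.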

Processes are also critical for a highly available environment. Standardized operating procedures can help reduce unnecessary downtime and enable quicker recovery from both planned and unplanned outages. You need written procedures for performing routine operational tasks as well as documentation that covers the steps necessary to recover from various types of disasters. In addition, the DBA and operations staff should practice these recovery plans to verify their accuracy and effectiveness. Another process-related factor that contributes to high availability is standardized hardware and software configurations. Standardized hardware components simplify repairs and make it easier to obtain replacement parts after a hardware failure. Standardized software configurations make routine operations simpler, reducing the chance of operator error.

Creating a highly available environment requires more than just technology. Technology provides the foundation, but true high availability combines platform capabilities, effective operating procedures, and appropriate training for everyone involved with the system.

About the Author

Michael Otey