Designing Highly Available Voice Networks
When most of us think of reliability, we think "dependable." We think "it will be there when you need it." This is not reliability. Reliability is a function of the observed mean time between failures (MTBF) in a laboratory setting. Further, reliability is typically not measured for a complete system; it is usually measured for individual system components. For example, TDM PBXs have been advertised as being "5 9s" reliable, but the line cards that support all the phones have a much lower level of measured reliability. In other words, a voice system with a failure that has taken down every phone would still be considered reliable as long as its processors were functional. Reliability also excludes external factors such as power, wiring, and the availability of phone services from your local provider. Lastly, reliability does not account for planned downtime for maintenance and upgrades.
Instead of reliability, most organizations are really interested in availability. Availability describes whether your call can be completed or will fail because of a technical problem. By examining availability, we can design a solution that truly accomplishes what we want: a system that works when we want to use it.
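To make the distinction concrete, availability is commonly estimated from MTBF together with a mean time to repair (MTTR), and a call path that depends on several components in series is roughly as available as the product of their individual availabilities. The sketch below assumes independent failures. MTTR, the function names, and every figure used with them are illustrative assumptions, not values from any vendor.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component.

    MTTR (mean time to repair) is an assumption introduced here;
    the laboratory MTBF figure alone says nothing about repair time.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series_availability(component_availabilities):
    """Availability of a call path that needs every component to work.

    Assumes the components fail independently of one another.
    """
    result = 1.0
    for a in component_availabilities:
        result *= a
    return result


def downtime_minutes_per_year(a):
    """Expected unavailable minutes per year for availability a."""
    return (1.0 - a) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: PBX processors, line cards, commercial power.
    path = series_availability([0.99999, 0.999, 0.9995])
    print(f"end-to-end availability: {path:.5f}")
    print(f"downtime per year: {downtime_minutes_per_year(path):.0f} minutes")
```

Under these assumptions, a "5 9s" component allows a little over five minutes of downtime per year, but chaining it with less available line cards, power, or carrier facilities pulls the end-to-end figure well below that.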
While this paper focuses on designing highly available voice networks, many of the same concepts can be applied to data, mobile radio, and SCADA networks. The sections that follow form a comprehensive list of items to consider when building highly available networks. Availability requirements vary between organizations. Each organization will need to make educated decisions about what level of availability is required and whether a given solution provides sufficient value.
Power Requirements
Power is the first item to examine. Without power, nothing else will work. A UPS or battery system is necessary to ride through temporary outages. Often these systems also provide protection from other power events such as spikes and surges. Generators should be considered to provide power during sustained outages. Generators, UPSs, and battery systems should always be sufficiently rated for the load they will carry, and they should be monitored and maintained. A common mistake is to install a power backup system and never perform maintenance. Components of that system may stop working as designed within a year because the energy storage breaks down chemically. If you are unable to commit to a maintenance program, it is better to forgo a power backup system and concede availability during power events.
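A rough way to check whether a UPS is "sufficiently rated" is to compare the connected load with the unit's rating and to estimate how long the batteries will carry that load. The sketch below is a simplification under stated assumptions: the 80% headroom rule, the inverter efficiency, and the battery health factor are all hypothetical defaults, and real sizing should follow the manufacturer's runtime tables.

```python
def is_sufficiently_rated(load_watts, rated_watts, headroom=0.8):
    """True if the load stays within a fraction of the UPS rating.

    headroom=0.8 (load at most 80% of rating) is an assumed rule of
    thumb, not a figure from the article or from any vendor.
    """
    return load_watts <= rated_watts * headroom


def estimated_runtime_minutes(battery_wh, load_watts,
                              inverter_efficiency=0.9,
                              battery_health=1.0):
    """Rough battery runtime for a constant load.

    battery_health models aging: 1.0 is a new battery, 0.5 is one
    that has lost half its capacity (for example, through neglect).
    """
    usable_wh = battery_wh * inverter_efficiency * battery_health
    return usable_wh / load_watts * 60


if __name__ == "__main__":
    # Hypothetical phone room: 700 W of load on a 1000 W UPS.
    print(is_sufficiently_rated(700, 1000))
    print(round(estimated_runtime_minutes(1000, 700)))
    # Same UPS with an unmaintained battery at half capacity.
    print(round(estimated_runtime_minutes(1000, 700, battery_health=0.5)))
```

Lowering battery_health shows why an unmaintained system can fall short of its design runtime even though the load has not changed.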
Electricians and your power company may also be able to enhance your facility. Your power company may be able to deliver diverse power feeds. Electricians may be able to recommend grounding and protection equipment.
Phone Systems
TDM PBXs have extremely stable operating systems. IP Telephony systems are generally built on servers running much less reliable operating systems. However, IP Telephony systems almost always have the following features, which enhance availability:
- Clustering - IP Telephony systems are almost always made up of more than one server. The complete failure of one server may have little or no effect on system operations. Servers can be located in geographically diverse locations (check vendor specifications), avoiding failures caused by a local problem such as a faulty power outlet. This is also an advantage when performing system upgrades.
- Backups - IP Telephony systems may use RAID disk systems, and automatic backup software. TDM PBXs are almost always built using proprietary disk systems, and have a manual tape backup system.
- Secure Remote Access - One would think that TDM PBXs would be more secure than IP systems, but this is not necessarily the case. TDM PBXs generally have one type of remote access: dial-in. They may have only one username and one shared password, and no process for changing these periodically to keep out past staff. IP Telephony systems can take advantage of all of the policies created for your data network, plus additional security added just for IP Telephony servers, and can therefore provide greater security for remote access.
- Residual Benefits - Availability enhancements made to your data network will provide benefits to your voice network.
For those who are planning to migrate to IP Telephony, it may be best to buy an IP Phone system and gradually replace phones. By doing this, you will have two phone systems, which will provide an additional level of risk mitigation.
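Clustering and running two phone systems side by side are both forms of redundancy: service is lost only if every redundant element fails at once. A minimal sketch of that arithmetic follows, assuming the elements fail independently (which diverse locations and separate power help to approximate). The function name and the availability figures are illustrative assumptions.

```python
def redundant_availability(element_availabilities):
    """Availability when any one of the redundant elements is enough.

    The group is unavailable only if all elements are down together,
    so the unavailabilities are multiplied. Assumes independent
    failures; a shared power outlet or shared wiring breaks this.
    """
    all_down = 1.0
    for a in element_availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical: one server at 99% versus a two-server cluster.
    print(f"{redundant_availability([0.99]):.4f}")
    print(f"{redundant_availability([0.99, 0.99]):.4f}")
```

With these hypothetical figures, two independent 99% servers give roughly 99.99% availability, which is why a cluster of less stable servers can still outperform a single very stable system.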
Wiring
It is common to overlook wiring as a threat to system availability. Unfortunately, many organizations have "home run" wiring from their PBX to their phone company demarc or from their PBX to their wiring closets. A localized disaster may sever that wiring. Consider having diverse wiring, or keep a wireless system on hand for emergencies.
Phones
A power event may disable multiple phones, even with power protection devices in place. Keeping a number of spares on hand will greatly reduce downtime.
Telephone Company Facilities
With the increased availability of facilities-based competitive local exchange carriers (CLECs), it may now be possible to have local phone service delivered by two different companies. Simply selecting service from a CLEC is not enough, though. Be sure that the CLEC is not selling UNE-P (unbundled network element-platform) service, which is just a pass-through of incumbent local exchange carrier (ILEC) services. While UNE-P offerings might save money, they will not offer an increase in availability.
Even if you are unable to obtain a facilities-based CLEC as a secondary provider, you will want to consider the value of diverse paths, POPs (points of presence), and demarcs. Communications facilities are often delivered to a building over a single group of copper cables, and a construction crew may accidentally cut through your services (this is not entirely uncommon). AT&T may be able to offer wireless access to your facility. This greatly enhances availability.
Lastly, it is a good idea to have lines PICed (Primary Interexchange Carrier) to different long distance providers, or to use a dial-around code. While there is only a minute chance that a long distance company will be unable to provide service on a global scale for even a short period of time, maintenance and failures at your local phone company may temporarily block access to your long distance provider. It is generally best to use a 10-10 code to accomplish this (it can be programmed into the phone system if necessary), since it allows the greatest flexibility and prevents you from having to manage multiple bills.
Satellite and Cellular
End users who are unable to complete a dialed call are likely to try the same call on their cellular phone, if they have reception. Cellular companies and equipment distributors will generally have equipment that can be mounted on the outside of your facility and deliver lines to your phone system. Satellite phone companies offer the same type of equipment. Surprisingly, this equipment is relatively inexpensive. Unfortunately, usage rates for cellular and satellite can be extremely high. Effective classification of user phones will increase the likelihood of your most important calls being completed.
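One way to picture the classification of user phones is as an ordered list of outbound routes, where the expensive cellular and satellite routes are reserved for higher-priority phones. The sketch below is only an illustration of that idea. The Route type, the priority numbers, and the route names are hypothetical and do not correspond to any particular phone system's configuration.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    is_up: bool
    min_priority: int  # lowest phone priority allowed to use this route


def select_route(routes, phone_priority):
    """Return the name of the first usable route, or None.

    Routes are tried in list order (cheapest first). A route is usable
    only if it is up and the phone's priority meets its minimum.
    """
    for route in routes:
        if route.is_up and phone_priority >= route.min_priority:
            return route.name
    return None


if __name__ == "__main__":
    # Hypothetical outage: both wireline carriers are down.
    routes = [
        Route("primary carrier", is_up=False, min_priority=1),
        Route("secondary carrier", is_up=False, min_priority=1),
        Route("cellular", is_up=True, min_priority=2),
        Route("satellite", is_up=True, min_priority=3),
    ]
    print(select_route(routes, phone_priority=1))  # lobby phone
    print(select_route(routes, phone_priority=3))  # emergency desk
```

Keeping the expensive routes behind a priority threshold limits usage charges during an outage while still letting the most important calls complete.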
Private Networks
By owning your own facilities - fiber, copper, digital microwave, etc. - you are able to control the availability of your communications infrastructure to a greater degree. This can certainly be an extremely expensive undertaking, but in addition to greater control of availability, you are also able to control costs. This is only applicable to organizations with multiple locations.
Maintenance and Change Controls
Regardless of what steps are taken to increase voice network availability, it is absolutely necessary that periodic maintenance be performed to ensure systems are working as designed. Change controls are necessary to prevent unauthorized changes that would defeat or undo steps taken to enhance availability.
Summary
While high availability is often described as the holy grail of technology departments, it is important to assess requirements and weigh costs and benefits. Losing sight of the level of protection required will result in high expenditures that may produce little or no value. A competent technology consultant will be able to help identify and prioritize needs. By making educated decisions, you'll have confidence in the availability of your voice network.