Statement of Kelly Croft
Deputy Commissioner for Systems,
before the Committee on Ways and Means Subcommittee on Social Security, and
the Committee on Transportation and Infrastructure Subcommittee on Economic Development, Public Buildings, and Emergency Management
February 11, 2011
Chairmen, Ranking Members, and Members of the Subcommittees:
Thank you for this opportunity to share information about our data center replacement project.
Many good things are happening at Social Security. Despite large workloads caused by the economic downturn, we have cut the wait for a disability hearing from a high of nearly 18 months in August 2008 to just over a year as of January 2011. We have significantly reduced busy signals and wait times for telephone service, and wait times in field offices are down slightly. Productivity, program integrity work, and employee satisfaction are all up. Our 20 highly regarded Internet applications give the public convenient access to our services while alleviating traffic in our offices.
None of these improvements would have been possible without smarter use of information technology (IT), and that technology relies on a smoothly functioning core computer system that is one of the largest in Government. It certifies the payment of more than $60 billion each month to over 50 million seniors and disabled people, money that is pumped into the national economy. Jobs and lives depend on Social Security doing its job without interruption.
For over 30 years, the National Computer Center (NCC) has housed our core computer systems. Many of the NCC’s facility infrastructure systems are well past their designed life cycle. Without a long-term replacement, the NCC will deteriorate to the point that a major failure of the building systems could jeopardize our ability to handle our increasing workloads without interruption. Recognizing the urgency of the situation, Congress provided us with $500 million in 2009 toward constructing and partially equipping a new data center.
We refer to the new data center as the National Support Center (NSC). Once complete, the NSC will meet our anticipated IT workloads for the next 20-plus years. Throughout the project to construct the NSC, we have worked attentively to ensure that the building will meet our requirements. We developed our requirements based on our expertise in data center design and operations and in consultation with outside experts such as the Uptime Institute and the Lawrence Berkeley National Laboratory. We also used best practices and lessons learned from our recently completed Second Support Center (SSC) in North Carolina.
Congress has provided the General Services Administration (GSA) the authority to lease, purchase, or build facilities for most Federal agencies, including our agency. GSA has broad-ranging experience exercising this authority, and we support its efforts on our behalf. Guided by the functional requirements, GSA recently selected a site for the NSC. As required by law, we notified the Committees on Appropriations of the House of Representatives and the Senate that GSA had selected a site for our new data center and would proceed with its process to purchase it following a 10-day notification period. GSA will also manage the design and construction of the building at the selected site.
Currently, GSA estimates that the NSC project is about a year behind its original schedule. GSA now expects construction to be complete in September 2014 and final commissioning of the building to occur in January 2015. Complete IT migration could take as long as 18 months after commissioning. GSA has indicated that there may be future opportunities to make up lost time without cutting corners. We support GSA’s efforts to identify and capitalize on those opportunities.
As responsible managers, we have taken assertive action to ensure the continuity of our operations through extensive risk mitigation and disaster planning. These improvements should help keep the NCC viable through the point of transition to the NSC. The SSC, which assumed production workloads in May 2009, already serves as a co-processing center for a significant portion of the NCC’s workloads. In the event of an NCC failure, we can currently recover all critical workloads at the SSC within four days. Next year, we anticipate being able to reduce that recovery time to one day.
While our risk mitigation and disaster planning activities provide a bridge between now and the NSC’s completion, by no means do they eliminate the dire need for a new data center. Despite all of our best efforts to preserve the NCC for as long as necessary, there is always the potential that a critical facility infrastructure system could suddenly fail. While the SSC could serve as our sole data center in an emergency, there would then be no backup for the SSC. This scenario would place the Nation in an extremely vulnerable position. Social Security is too important to the national economy and to the lives of a huge number of Americans to rely on a single data center.
Background on the NCC
We designed and built the NCC in the 1970s. Industry standards and best practices for data centers, along with technology, have changed radically since then. Modern data centers now have redundant electrical and cooling systems to provide continuous IT operations during essential preventative maintenance activities on facility systems or in the event one system fails or needs replacement. The lack of redundancy in the NCC’s cooling and electrical systems complicates our efforts to maintain and preserve the building.
Until recently, simply getting additional power to our IT equipment in the NCC was one of our biggest problems. Out of necessity, we mitigated this problem by adding electrical risers, or pathways, to deliver more power to the data center. This improvement came at significant expense and required three data center shutdowns between 2009 and 2010.
Each time we need to add IT equipment, we increase the likelihood of needing more cooling. The computer room air conditioning and overall heating, ventilating, and air-conditioning (HVAC) systems on the data center floor may prove insufficient to accommodate those cooling needs in the future. Moreover, the NCC’s HVAC system is already well beyond its expected life cycle and is impossible to replace while keeping the data center running.
A related NCC design problem is that employee office spaces in other areas of the building share the same power lines and HVAC system as the data center. As a result, an issue that should stay isolated in an area outside the data center, such as a minor receptacle overload at someone’s workstation, could temporarily cut power to parts of the data center and its HVAC system.
However, the biggest power concern at the NCC is the building’s Uninterruptible Power System (UPS). The UPS is not an off-the-shelf product; it was designed specifically for the building, and it is critical to maintaining clean, uninterrupted power to the data center at all times. While we have extended our service contract with the UPS maintenance vendor over the years, the vendor recently advised us that it could not guarantee repairs in the near future. The necessary parts are simply no longer available. If the UPS failed, we would have to bypass the system and deliver unconditioned power to the data center equipment, which could damage the equipment. Replacing the UPS would require significant downtime at the NCC.
We face even more fundamental problems at the NCC, such as tangled and overcrowded telecommunications and electrical cables underneath the data center floor.
Figure 1: Under-Floor Cabling Conditions in the NCC Data Center
Tangled cables can block the under-floor airflow that cools our servers, and we cannot work on the cables safely without shutting down the affected systems. Similarly, troubleshooting is difficult when we cannot easily isolate cable pairs to determine whether a problem lies in the cables or in the IT equipment. There is also an elevated risk of data corruption, because electromagnetic interference from electrical wires located too close to telecommunications wires can distort data transmission.
Another basic threat to the data center is the NCC’s pipes for both the water supply and fire suppression systems. The pipes are original to the building. Many of the pipes are clogged and corroded. Failure of the pipes could result in extensive water damage.
Figure 2: Clogged and Corroded Piping in the NCC
A recent incident illustrates how a problem as basic as failing pipes can affect the data center’s operations. Last year, our facilities staff noticed water on the floor of one of the large battery rooms in the NCC. They quickly traced the source to a leaking water pipe in the room. Any water in close proximity to high-voltage batteries presents a serious hazard to the building and its personnel.
In order to fix the leak, plumbers needed to expose the pipe and cut off the water supply. Unfortunately, without redundant systems, cutting off the water supply to the pipe also required cutting off the water supply to the large air handling equipment that is responsible for cooling our computing space. Since the air handling equipment had to be turned off, we actually had to shut down a portion of our national computing operations while making the repairs.
Thankfully, we did not experience serious service disruptions because we managed to complete the repair in the early morning hours of a weekend. Nonetheless, just to fix a seemingly simple leak ultimately required extensive planning, work staging, and a major IT shutdown. If the leak had been caused by a pipe that burst in the middle of a business day, we would have experienced major IT service disruptions.
As the NCC has aged, we have continuously upgraded and repaired facility infrastructure systems to the best of our ability. Much as homeowners keep an older house in service, we have relied on incremental improvements, which are an industry best practice for maintaining facility systems beyond their life cycle. We must incrementally repair these infrastructure systems, where possible, because we cannot totally replace them in the existing NCC. To replace them, we would have to shut down the building completely for an extended period of time.
Figure 3: Sample of Major Agency Workloads Affected by an NCC Shutdown
Following consultation with Congress, we concluded that building a new, state-of-the-art data center was the best way forward. In support of our efforts, in 2009, Congress provided us the funds to construct and partially equip the NSC. We have provided GSA with our requirements for the NSC’s design and operations.
NSC Project Update
There are several important milestones along the way to full completion of the NSC. GSA, working closely with a team from our agency and outside experts, has completed three of those milestones within the past six months. We are encouraged by the recent project developments, and continue to work with GSA to complete this key investment in a timely manner.
In August 2010, GSA achieved its first important milestone with completion of the Program of Requirements (POR). The objective of the POR is to provide the necessary business requirements, including space, power, cooling, and design guidance, for a Design/Build (D/B) contractor to successfully engineer, design, construct, and deliver the NSC. We worked closely with GSA and its contractor, Jacobs, to develop the POR, and we ensured that the POR met our technical requirements for data center design and operations.
GSA completed a second important project milestone in January 2011 when it posted online the first of two Requests for Proposal (RFP) from D/B contractors. The first RFP requested that interested D/B contractors provide information concerning their prior experience on relevant projects, past performance on relevant projects, and project team qualifications and approach. The second RFP, which GSA plans to issue in April 2011, will request that qualified D/B contractors provide design developments, project management and delivery plans, oral presentations, project labor agreements, and pricing.
The third major project milestone completed within the past six months is site selection. GSA recently informed us that it selected a site for the NSC. We accept GSA’s site selection decision because the selected site meets our functional requirements. Those published requirements include that the site:
- Is contiguous and within 40 miles of our headquarters in Woodlawn, Maryland;
- Provides geometry and topography suitable for development;
- Has no known landfills or hazardous waste, soil, or water contamination, on or near the site, for which cleanup would significantly impact project cost or schedule;
- Has developable area that is not located within the 100- or 500-year flood plain and has no other geological or environmental impairments;
- Has reasonable access to electrical power, water, telephone, satellite and fiber optics; and
- Would not significantly affect the project schedule if assemblage of multiple sites is required.
We look forward to the timely completion of the remaining milestones on GSA’s revised project timeline. Following the issuance of the second RFP in April, GSA anticipates purchasing the selected site in June 2011. It also plans to award the D/B contract in January 2012 and finish constructing the NSC in September 2014. GSA estimates final building commissioning by January 2015, at which point it may take up to 18 months for us to migrate all of our IT from the NCC to the NSC.
Risk Mitigation and Disaster Preparedness Plans
Social Security has a history of providing excellent service to the public. Over our 75-year history, we have repeatedly met challenges, including emergencies such as Hurricane Katrina, when, within days of the storm, we established makeshift offices at evacuation centers and shelters along the Gulf Coast to issue payments to our beneficiaries.
Our engineers and technical staff maintaining the NCC are no exception to this proud tradition of service. They work every day to manage and maintain the building. They diligently continue to explore ideas and opportunities to overcome or work around the limitations of the NCC’s infrastructure systems.
In 2007, we commissioned an independent study to examine the condition of the NCC and identify options to accommodate and support our data processing operations into the future. The study found that we had managed and executed our NCC maintenance practices in an excellent manner and that we would need to continue these efforts to sustain the facility. The study recommended that we build or lease a new data center on an accelerated schedule.
To help us sustain the NCC until the new data center would be operational, the study identified specific areas of risk for us to address. Where economically and operationally feasible, we have implemented most of the study’s recommendations.
For example, we replaced deteriorating electrical feeder cables, which deliver power into the NCC from outside, and repaired or replaced roofs and lightning protection grids. As mentioned earlier, we increased electrical distribution capacity to the data center by adding electrical risers. We also procured available spare parts needed to maintain and repair our UPS and worked with our UPS maintenance vendor to extend our service contract. When the current UPS contract expires in 2012, we will negotiate a new agreement with our vendor through 2015; however, the vendor recently informed us that it would be able to provide only a “best effort” extension from 2015 to 2018. Accordingly, we have already begun discussing options for restoring UPS support to the data center in the event of a catastrophic failure. Those options may include extreme measures like machining spare parts not otherwise commercially available, bringing in a portable UPS to provide power, or replacing the entire UPS.
Of course, we know there are limitations to a comprehensive and aggressive program of preventative maintenance and disaster preparedness at the NCC. Many of the building’s primary infrastructure systems such as the HVAC and plumbing need replacement but are impractical to replace while the NCC is active.
Realizing that we will have to rely on the NCC for at least the next 5 years, we will do what we can to extend the life of the building. Therefore, we are working with GSA to complete a Building Engineering Report and a feasibility study to provide an updated assessment of the NCC facility systems and structure. Specifically, we will work with GSA in:
- Evaluating the current condition of all major systems within the NCC;
- Reevaluating all of the prior study recommendations for the NCC;
- Determining how to maintain the NCC as a viable and functional data center for critical IT equipment through 2020, including a project schedule and cost information on any recommendations; and
- Formulating methods to perform necessary renovations or repairs with little or no downtime at the NCC until we migrate the data center’s critical IT equipment to the NSC.
We remain hopeful that our ongoing efforts to extend the life of the NCC will provide a bridge between now and the NSC’s completion. Still, we know that the longer we must rely on this more than 30-year-old structure, the higher the risk that a significant building failure may require us to move all of our IT operations to the SSC.
The SSC building opened two years ago, and it has grown to become a fully functioning data center that shares a portion of our daily IT processing load with our aging NCC. The two facilities work in tandem. Both serve as primary computing locations for important IT functions, and both have the reserve computing capacity to recover all the critical functions from the other site in the event of a disaster.
Before the SSC came online, our disaster recovery strategy relied on the use of a commercial data center. The old strategy had many weaknesses. It assumed we would always be able to occupy the commercial site even though we shared it with other organizations on a first-come, first-served basis. The commercial site was not large enough to handle our operation at reasonable performance levels, and it lacked specific technology that we required. Even if everything went well at the commercial site, our recovery time was at least a week.
Yet even a one-day IT service outage would cause a major disruption to our customers and cost approximately $25 million in lost agency productivity. In addition, if we lost use of the NCC before completing the NSC, we would be forced to revert to an unsatisfactory reliance on a commercial facility as a backup.
Completion of the NSC will allow us to move all IT activities out of the existing NCC and will add significant stability to our IT enterprise, since both the NSC and SSC would be modern, efficient, and well-designed facilities. We will still face IT disaster risks, but far less risk of a major facility failure, which is much more likely to occur in an older building like the NCC.
The investment Congress has made in the construction of the NSC is a good decision for the country. The American people and the overall economy depend on the benefits we provide. Securing the continuity of our agency’s operations and the very fabric of this country’s safety net requires that we replace the NCC as quickly as possible.
We continue our work to provide the necessary support and input to GSA so that it can effectively manage the NSC site procurement, design, and construction processes. In keeping with our stewardship responsibilities, we have taken assertive action in the area of risk mitigation and disaster preparedness. We will continue to pursue creative preservation of the NCC. We will also use the SSC’s capabilities to assist the NCC and take over in the event of a failure at the NCC. However, these are only temporary solutions. Any outcome other than the completion of the NSC will result in increasing risk of significant disruption in the delivery of Social Security services.