Red Storm upgrade lifts Sandia supercomputer to 2nd in world, but 1st in scalability, say researchers

A $15 million upgrade to Sandia’s Red Storm computer has increased its peak speed from 41.5 to 124.4 teraflops in a computing terrain in which a single teraflop was a big deal only six years ago.
The machine, built by Cray Inc., is now rated second fastest in the world, with a Linpack speed of 101.4 teraflops. The widely recognized Linpack test measures a supercomputer’s speed as applied to a computing problem.
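For readers who want to check the arithmetic, the short Python sketch below works out two ratios from the figures quoted above: how much the peak speed grew, and what share of the new peak the Linpack result represents. The variable and function names are illustrative only, and the "share of peak" ratio is our own framing rather than a figure reported by Sandia.

```python
# Figures quoted in the article, in teraflops.
PEAK_BEFORE_TFLOPS = 41.5   # peak speed before the upgrade
PEAK_AFTER_TFLOPS = 124.4   # peak speed after the upgrade
LINPACK_TFLOPS = 101.4      # measured Linpack speed after the upgrade


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator (hypothetical helper for readability)."""
    return numerator / denominator


# How many times faster the upgraded peak is than the original peak.
upgrade_factor = ratio(PEAK_AFTER_TFLOPS, PEAK_BEFORE_TFLOPS)

# Fraction of the theoretical peak achieved on the Linpack test.
linpack_fraction = ratio(LINPACK_TFLOPS, PEAK_AFTER_TFLOPS)

print(f"Peak speed increase: {upgrade_factor:.2f}x")        # about 3.00x
print(f"Linpack as share of peak: {linpack_fraction:.1%}")  # about 81.5%
```

In other words, the upgrade roughly tripled the peak speed, and the Linpack run reached a little over four-fifths of that peak.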
“While not number one in speed, in terms of scalability, Red Storm is the best in the world,” says Bill Camp, director of Sandia’s Computation, Computers, Information, and Math center.
Scalability refers to a supercomputer’s computational efficiency as the number of processors on a job is increased. “You want to use more processors to get large jobs done more quickly,” says Camp, “but if the computer doesn’t scale well you can lose much of that speedup.” Red Storm loses little efficiency on large numbers of processors.
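The article gives no formula for scalability, but a common way to put a number on it is parallel efficiency: the speedup a job gets from N processors, divided by N. The Python sketch below assumes that textbook definition. The timings are hypothetical, chosen only to show how a machine that scales well differs from one that does not; they are not Red Storm measurements.

```python
def speedup(t_one_processor: float, t_parallel: float) -> float:
    """Speedup: how many times faster the job runs in parallel."""
    return t_one_processor / t_parallel


def parallel_efficiency(t_one_processor: float, t_parallel: float,
                        n_processors: int) -> float:
    """Parallel efficiency: speedup divided by the processor count.

    A value of 1.0 means perfect scaling; lower values mean that some of
    the added processors' work is lost to overhead such as communication.
    """
    return speedup(t_one_processor, t_parallel) / n_processors


# Hypothetical job: 1,000 hours on one processor, rerun on 1,000 processors.
T_SERIAL_HOURS = 1000.0
N_PROCESSORS = 1000

# Hypothetical wall-clock times on two imaginary machines.
well_scaling = parallel_efficiency(T_SERIAL_HOURS, 1.25, N_PROCESSORS)
poorly_scaling = parallel_efficiency(T_SERIAL_HOURS, 4.0, N_PROCESSORS)

print(f"Well-scaling machine:   {well_scaling:.0%} efficient")
print(f"Poorly scaling machine: {poorly_scaling:.0%} efficient")
```

With these made-up timings the first machine keeps 80 percent of the ideal speedup and the second only 25 percent, which is the kind of loss Camp describes.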
“The Cray XT3 supercomputers now dominating the highest end of computing worldwide are based upon Sandia’s Red Storm,” says Camp, who, together with Sandia colleague Jim Tomkins, led the design of the machine. “Scientists love it because they can do bigger science more quickly on it than any other computer in existence, except for molecular dynamics studies on BlueGene/L (Lawrence Livermore National Lab’s supercomputer). Otherwise, it’s the best thing since night baseball.”
“The machine’s also a computational workhorse. It gets the job done,” says Sandia researcher Steve Attaway, a winner of several national computing awards who runs large engineering simulations on the machine.
Red Storm was designed under the National Nuclear Security Administration’s Advanced Simulation & Computing program and is used for NNSA’s stockpile stewardship program, which helps ensure that the U.S. nuclear weapons stockpile is safe and reliable without the resumption of underground nuclear testing. This supercomputer also runs computer codes used for conducting materials science simulations critical to national security. Sandia is an NNSA laboratory.
The Red Storm design became the basis for the Cray XT3™ massively parallel processor (MPP) supercomputer that has been installed at a number of prestigious supercomputing centers around the world.
Purchasers of this design include Oak Ridge National Laboratory, which will create an even bigger supercomputer than Red Storm based on the same design, as well as Lawrence Berkeley Labs, Pittsburgh Supercomputer Center (which is the largest National Science Foundation site), the U.S. Army, the United Kingdom’s AWE Atomic Weapons Establishment program, the national computing centers in Finland, Switzerland and the U.K., and other U.S. and allied government sites.
Red Storm is Sandia’s largest high-performance computer, but it is thrifty in its use of power. It uses 2.2 megawatts, roughly half the power drawn by other supercomputers of its class. This means that comparatively less of Red Storm’s energy is converted to useless heat.
Red Storm also takes up a relatively small area: about 3,500 square feet.
Its Linpack test demonstrated high reliability, repeatedly running for nine hours on over 26,000 processor cores without a failure.
The machine took less than three years to create from concept to customer shipment. It was relatively inexpensive to develop and build ($77.5 million, including engineering and design costs) and is used for large scientific and technical problems.
Sandia developed the architectural specifications of the machine and did much of the software development. “The hardware at Cray was built to meet our specifications,” says Sandia Senior Scientist Jim Tomkins.
The upgrade included adding a fifth row of cabinets and upgrading the entire system to dual-core AMD Opteron™ processors, resulting in a supercomputer with over 26,000 processor cores. Dual-core technology fits two processor cores on a single die, doubling processing capacity with minimal impact on power consumption and temperature levels.
Why is Red Storm so efficient? In part, says Sandia researcher Robert Ballance, because its operating system is based on minimalist software, termed a lightweight kernel, which carries just enough functionality to load the job, put it on the network, and stop it. Any other software is job-specific; thus, each computer node (at which two chips are located) in effect lugs no useless software on its back.
Sandia pioneered the original technology on its ASCI Red machine, the world’s first terascale supercomputer, which was built by Intel Corporation.
Source: Sandia National Laboratories