Computer scientists claim world data sorting record for second year
(PhysOrg.com) -- Not content to rest on their laurels, a team of data center researchers from the Center for Networked Systems (CNS) at the University of California, San Diego recently broke two of their own world records. They also set world records in three other categories, including one for their TritonSort-MR system sorting a terabyte (one trillion bytes) of data in 106 seconds.
The competition that they entered, the Sort Benchmark, is the Formula One World Championship and Daytona 500 rolled into one for the world of large-scale data processing. It attracts competitors from academic and industry labs all over the world, who vie to implement ever-faster data center designs.
“The competition provides excellent feedback on the team’s progress and gives them focus,” said CNS assistant research scientist George Porter. “The Sort Benchmark is like an annual reality check that gives us this objective standard by which we can validate how well we’re doing.” In addition to Dr. Porter, the CNS team included Center Director Amin Vahdat, and Ph.D. students Alex Rasmussen and Michael Conley from the Computer Science and Engineering department of the UCSD Jacobs School of Engineering.
The TritonSort-MR compute cluster is housed in the UCSD division of the California Institute for Telecommunications and Information Technology (Calit2), a close partner of CNS on the La Jolla campus.
Since 1994 the competition has spurred creativity in the realm of data sorting speed, and the number of applications demanding fast data sorting has increased exponentially, making the need for innovation more pressing each year. Massive data centers support processes like searching for tagged pictures of friends on Facebook, checking an order history with Amazon, or typing a term into a search engine. The faster data centers become at retrieving records, the more data-sorting applications can practicably be developed.
This expansion in the use and ubiquity of data centers has resulted in a concomitant explosion in capital expenditures for the enterprises that use them: data centers are expensive to equip, maintain, house, cool and power. Moreover, large-scale data processing tasks remain a significant bottleneck in the efficiency of data center activities. Rather than wait for hardware designers to come up with new equipment, data center architects are looking for better ways to use the equipment that currently exists on the market to achieve new goals in speed and efficiency.
In 2010 the CNS group won the “Indy” division of both the “Gray” and “Minutesort” categories, racing to sort 1 TB of data as quickly as possible, and as much data as possible in a single minute, respectively. The “Indy” division exists only for this competition, so designing a system to compete here is comparable to constructing a racing vehicle that can only be driven on a track. But building on their successful foray in 2010, the team decided to take their game to a new level by adjusting their system to compete in the “Daytona,” or general purpose, division as well.
Rasmussen says the team had unfinished business from the previous competition. “When we set the record the first time, we had only just gotten TritonSort to go as fast as we thought it could go,” noted the Ph.D. student. “But there were a lot of questions about the system’s performance that we just didn’t have answers to.”
“The key to the TritonSort-MR design is seeking an efficient use of resources, and to build balanced systems,” added Porter. “We made some improvements on the data structures and algorithms, basically to make it a lot more efficient in terms of sending records across the network.”
With the modifications, the “Daytona” entry was successful, and the changes also allowed the team to upgrade the original specialized system built to compete in the “Indy” category. Showing impressive improvements in performance, the team entered and won both categories in the “Gray” and “Minutesort” competitions.
Beyond its speed, TritonSort-MR also proved remarkable for its efficiency: while the second-place team used 3,500 nodes to achieve their result, the TritonSort-MR team used only 52. If implemented in a real-world data center, TritonSort-MR would therefore allow a company to sort data more quickly, while making only one-seventh of the investment in equipment, space, and energy costs for cooling and operation.
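As a rough check on the figures quoted in this article, the node counts and the headline terabyte sort can be turned into simple ratios. The short Python sketch below uses only numbers that appear in the text (3,500 nodes, 52 nodes, and one trillion bytes in 106 seconds); the function names are illustrative. It compares node counts only and assumes nothing about what a node costs in either cluster, so it is not a substitute for the one-seventh investment estimate given above.

```python
def node_ratio(ours: int, theirs: int) -> float:
    """Fraction of the competitor's node count that our cluster uses."""
    return ours / theirs


def throughput_gb_per_s(total_bytes: float, seconds: float) -> float:
    """Aggregate sort rate in gigabytes (10**9 bytes) per second."""
    return total_bytes / seconds / 1e9


# Figures taken from the article.
ratio = node_ratio(52, 3500)
rate = throughput_gb_per_s(1e12, 106)

print(f"node ratio: {ratio:.4f} (about 1/{round(1 / ratio)})")
print(f"aggregate rate: {rate:.2f} GB/s")
# node ratio: 0.0149 (about 1/67)
# aggregate rate: 9.43 GB/s
```

Node count is only one input to cost: machines differ in price, power draw, and floor space, which is why the raw node ratio and the article's investment estimate need not match.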
While winning in these four categories exceeded the team’s original goals from 2010, they found themselves intrigued by a new category on offer in 2011. The “100 Terabyte Joulesort” competition challenges teams to build systems that can sort the greatest number of data records while consuming no more than one joule of energy. (By way of illustration, it takes roughly one million joules to watch TV for an hour.) The introduction of this new category reflects the recognition of an increasingly dire challenge facing industry in trying to solve data-intensive computing problems: energy usage. A primary reason data centers are expensive to operate is the staggering scale of their energy consumption. Any design that increases energy efficiency would have a positive and much needed impact on both the environment and a company’s bottom line.
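To make the energy metric concrete, the sketch below works through the arithmetic in Python. The television figure comes from the paragraph above; the cluster run (record count, power draw, and duration) is entirely hypothetical and is not a measurement from TritonSort-MR or any other entrant. The function names are likewise illustrative.

```python
def average_power_watts(energy_joules: float, seconds: float) -> float:
    """Average power implied by spending a given energy over a given time."""
    return energy_joules / seconds


def records_per_joule(records: float, power_watts: float, seconds: float) -> float:
    """Records sorted per joule, with energy computed as power x time."""
    return records / (power_watts * seconds)


# From the article: about one million joules for an hour of television.
tv_watts = average_power_watts(1_000_000, 3600)
print(f"TV draws about {tv_watts:.0f} W on average")

# HYPOTHETICAL run, for illustration only: one billion records sorted
# in 100 seconds by a cluster drawing 5,000 W on average.
score = records_per_joule(1e9, 5000, 100)
print(f"hypothetical score: {score:.0f} records per joule")
```

Under this metric a system can improve its score either by finishing sooner at the same power draw or by drawing less power for the same running time.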
Though intrigued by this new opportunity, the team was skeptical at first that they could compete in the Joulesort arena. “Typically when you look at systems that set records like this, they’re all built out of these incredibly energy efficient pieces,” said Alex Rasmussen. “But you’d never see this equipment deployed in an actual data center setting [because of its high cost].”
The TritonSort-MR team, on the other hand, was focused on making a system of direct applicability to enterprises with real-world needs and resources, rather than breaking a record for its own sake. This is reflected, said Rasmussen, in the type of equipment the TritonSort-MR team employs for its system. “The stuff that we’re using is kind of commodity server stuff,” he said. “We’ve got machines from HP that are a year and a half old,” with multi-core processors and a Cisco Nexus 5596 switch. As an additional challenge to the efficiency of the design, the team elected not to customize their system for energy optimization. Despite placing these limitations upon themselves, the TritonSort-MR group won the Joulesort category handily, proving that the CNS solution was both fast and remarkably energy efficient.
Provided by University of California - San Diego