About PADS
The cost of computing and storage, although decreasing, has not fallen as fast as data volumes have grown. Small research groups lack the resources not only to store this burgeoning data but also to analyze it. Increasingly, groups also want to give remote users access. Thus, there is a need for high-performance "active data stores" that use innovative software to integrate reliable storage, high-speed I/O systems, substantial computing, and high-speed network connections.
The project team, comprising both computer scientists and disciplinary scientists, will use NSF funds to acquire and operate a substantial data storage (~500 terabytes) and analysis (9 teraflop/s) system, the Petascale Active Data Store (PADS). The team will leverage existing projects and the University of Chicago (UChicago) commitment of technical support to achieve six aims:
- Operate PADS as a data storage and analysis facility for the UChicago community and external collaborators that can both (a) store hundreds of terabytes of data and (b) respond rapidly to requests to retrieve or compute on substantial subsets of that data (thus, an active data store).
- Use PADS capabilities to achieve significant scientific advances in 12 projects at UChicago, from the physical, biological, and social sciences.
- Use PADS infrastructure, data, and applications to support innovative computer science research in data analysis, management, and visualization, and in multicore programming, and to produce new data management, analysis, and visualization tools for the NSF community.
- Create a vibrant interdisciplinary data analysis community through workshops involving disciplinary scientists, computer scientists, and statisticians from UChicago and elsewhere.
- Create substantial scientific data and services and deliver them over high-speed networks to national communities in astrophysics, biology, economics, psychology, and other disciplines.
- Operate an outreach program that will make PADS a vital educational resource for students at minority-serving institutions with which the UChicago team has longstanding ties.
PADS is a petabyte (10^15-byte)-scale online storage server capable of sustained multi-gigabyte/s I/O performance, tightly integrated with a 9 teraflop/s computing resource and multi-gigabit/s local and wide area networks. Its hardware and associated software will enable the reliable storage of, access to, and analysis of massive datasets by both local users and the national scientific community.
The PADS design results from a study of the storage and analysis requirements of participating groups in astrophysics and astronomy, computer science, economics, evolutionary and organismal biology, geosciences, high-energy physics, linguistics, materials science, neuroscience, psychology, and sociology. For these groups, PADS represents a significant opportunity to look at their data in new ways, enabling new scientific insights. The infrastructure will also encourage new collaborations across disciplines. PADS is also a vehicle for computer science research into active data store systems and will provide rich data on which to investigate new techniques. Results will be made available as open source software.
You must properly acknowledge any work done on or using the PADS computational cluster, data storage, or networking by including the following citation in all publications and presentations:
This work has been performed using the PADS resource (National Science Foundation grant OCI-0821678) at the Computation Institute, a joint institute of Argonne National Laboratory and the University of Chicago.