Enabling active storage on parallel I/O software stacks

Seung Woo Son, Samuel Lang, Philip Carns, Robert Ross, Rajeev Thakur, Berkin Ozisikyilmaz, Prabhat Kumar, Wei-Keng Liao, Alok Choudhary

Mathematics and Computer Science Division, Argonne National Laboratory

IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010

DOI:10.1109/MSST.2010.5496981

@inproceedings{son2010enabling,
  title={Enabling active storage on parallel I/O software stacks},
  author={Son, S.W. and Lang, S. and Carns, P. and Ross, R. and Thakur, R. and Ozisikyilmaz, B. and Kumar, P. and Liao, W.K. and Choudhary, A.},
  booktitle={Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on},
  pages={1--12},
  year={2010},
  organization={IEEE}
}

As data sizes continue to increase, the concept of active storage is well suited to many data analysis kernels. Nevertheless, while this concept has been investigated and deployed in a number of forms, enabling it from the parallel I/O software stack has been largely unexplored. In this paper, we propose and evaluate an active storage system that allows data analysis, mining, and statistical operations to be executed from within a parallel I/O interface. In our proposed scheme, common analysis kernels are embedded in parallel file systems. We expose the semantics of these kernels to parallel file systems through an enhanced runtime interface so that execution of embedded kernels is possible on the server. In order to allow complete server-side operations without file format or layout manipulation, our scheme adjusts the file I/O buffer to the computational unit boundary on the fly. Our scheme also uses server-side collective communication primitives for reduction and aggregation using interserver communication. We have implemented a prototype of our active storage system and demonstrate its benefits using four data analysis benchmarks. Our experimental results show that our proposed system improves the overall performance of all four benchmarks by 50.9% on average and that the compute-intensive portion of the k-means clustering kernel can be improved by 58.4% through GPU offloading when executed with a larger computational load. We also show that our scheme consistently outperforms the traditional storage model across a wide variety of input dataset sizes, node counts, and computational loads.
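The two mechanisms named in the abstract, aligning each server's I/O buffer to computational unit boundaries and combining per-server results with a reduction, can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: the function names (`aligned_range`, `server_partial_sum`, `active_sum`), the 8-byte float64 unit, and the rule that an element belongs to the stripe containing its first byte are all hypothetical choices made for illustration. In a real parallel file system the bytes past a stripe boundary would have to come from a neighbouring server; here every simulated server simply reads from the same in-memory buffer.

```python
import struct

UNIT = 8  # bytes per element (one float64); an assumption for this sketch


def aligned_range(offset, length, unit=UNIT):
    """Round a stripe's byte range up to unit boundaries.

    An element is assigned to the stripe in which its first byte falls,
    so both the start and the end of the range are rounded up.
    """
    start = -(-offset // unit) * unit
    end = -(-(offset + length) // unit) * unit
    return start, end


def server_partial_sum(file_bytes, offset, length):
    """Sum the float64 elements owned by one simulated server."""
    start, end = aligned_range(offset, length)
    # Ignore any trailing bytes that do not form a whole element.
    end = min(end, len(file_bytes) - len(file_bytes) % UNIT)
    count = (end - start) // UNIT
    if count <= 0:
        return 0.0
    return sum(struct.unpack_from(f"<{count}d", file_bytes, start))


def active_sum(file_bytes, stripe_size):
    """Combine per-server partial sums (stands in for an interserver reduction)."""
    partials = [
        server_partial_sum(file_bytes, off, min(stripe_size, len(file_bytes) - off))
        for off in range(0, len(file_bytes), stripe_size)
    ]
    return sum(partials)


data = struct.pack("<6d", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
total = active_sum(data, stripe_size=20)  # 20-byte stripes cut through elements
```

With 20-byte stripes over six 8-byte values, the stripe boundaries fall in the middle of elements, yet each element is counted exactly once because every simulated server works on its aligned range rather than its raw stripe.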

Tags: Benchmarking, Computer science, CUDA, GPU cluster, MPI, nVidia, Tesla C1060

August 6, 2011 by hgpu
