Large-scale Nanostructure Simulations from X-ray Scattering Data On Graphics Processor Clusters

Abhinav Sarje, Jack Pien, Xiaoye S. Li, Elaine Chan, Slim Chourou, Alexander Hexemer, Arthur Scholz and Edward Kramer

Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720

Lawrence Berkeley National Laboratory, 2012

@article{sarje2012large,
  title={Large-scale Nanostructure Simulations from X-ray Scattering Data On Graphics Processor Clusters},
  author={Sarje, A.},
  year={2012}
}

X-ray scattering is a valuable tool for measuring the structural properties of materials used in the design and fabrication of energy-relevant nanodevices (e.g., photovoltaic, energy storage, battery, fuel, and carbon capture and sequestration devices) that are key to the reduction of carbon emissions. Although today’s ultra-fast X-ray scattering detectors can provide tremendous information on the structural properties of materials, a primary challenge remains in the analysis of the resulting data. We are developing novel high-performance computing (HPC) algorithms, codes, and software tools for the analysis of X-ray scattering data. In this paper we describe two such HPC algorithm advances. First, we have implemented a flexible and highly efficient Grazing Incidence Small Angle Scattering (GISAXS) simulation code based on the Distorted Wave Born Approximation (DWBA) theory with C++/CUDA/MPI on a cluster of GPUs. Our code can compute the scattered light intensity from any given sample in all directions of space, thus allowing full construction of the GISAXS pattern. Preliminary tests on a single GPU show speedups of over 125x compared to the sequential code, and almost linear speedup when executing across a GPU cluster with 42 nodes, resulting in an additional 40x speedup compared to using one GPU node. Second, for the structural fitting problems in inverse modeling, we have implemented a Reverse Monte Carlo simulation algorithm with C++/CUDA using one GPU. Since there are large numbers of parameters to fit in the X-ray scattering simulation model, the earlier single-CPU code required weeks of runtime. Deploying the AccelerEyes Jacket/Matlab wrapper to use the GPU gave around 100x speedup over the pure CPU code. Our further C++/CUDA optimization delivered an additional 9x speedup.
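The speedup figures above are quoted in stages ("over 125x", then "an additional 40x"; "around 100x", then "an additional 9x"). The following is a minimal sketch of how those stages combine, assuming that each "additional" factor multiplies the previous one and that parallel efficiency is defined as speedup divided by node count. The function names are hypothetical and do not come from the paper's code; the inputs are the rounded values from the abstract, so the results are rough estimates rather than reported measurements.

```cpp
#include <cstdio>

// Combined speedup of two successive optimizations, ASSUMING the second
// factor is measured relative to the result of the first (multiplicative).
constexpr double combined_speedup(double first, double additional) {
    return first * additional;
}

// Parallel efficiency, ASSUMED here to mean the speedup obtained on a
// cluster divided by the number of nodes used (1.0 would be perfectly linear).
constexpr double parallel_efficiency(double speedup, int nodes) {
    return speedup / static_cast<double>(nodes);
}

// Hypothetical helper that prints the estimates derived from the abstract.
inline void print_abstract_estimates() {
    // GISAXS simulation: >125x on one GPU, additional 40x across 42 nodes.
    std::printf("GISAXS, cluster vs. sequential: about %.0fx or more\n",
                combined_speedup(125.0, 40.0));
    std::printf("GISAXS, efficiency on 42 nodes: about %.0f%%\n",
                100.0 * parallel_efficiency(40.0, 42));
    // Reverse Monte Carlo: ~100x with Jacket/Matlab, additional 9x in C++/CUDA.
    std::printf("RMC, C++/CUDA vs. pure CPU: about %.0fx\n",
                combined_speedup(100.0, 9.0));
}
```

Calling `print_abstract_estimates()` from a `main` function prints the three estimates. Under these assumptions, the 40x gain on 42 nodes corresponds to roughly 95% parallel efficiency, which is consistent with the "almost linear" scaling described above.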

Tags: Algorithms, CUDA, GPU cluster, Monte Carlo simulation, MPI, nVidia, Optimization, Physics, Tesla C2050, Tesla M2090

May 30, 2012 by hgpu
