Posts
Nov, 9
The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units
We present an algorithm named the “Chamomile Scheme”. The scheme is fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), the NVIDIA GeForce 8800GTX, which has (a) small but fast shared memories (16 KB × 16) with no broadcasting mechanism and (b) floating point arithmetic hardware of 500 Gflop/s but only for […]
Nov, 9
Spherical harmonic transform with GPUs
We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphics processing units (GPUs). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the transform's computation, retaining the efficient parallel […]
Nov, 9
Nodal Discontinuous Galerkin Methods on Graphics Processors
Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied […]
Nov, 9
SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs
LINGOs are a holographic measure of chemical similarity based on text comparison of SMILES strings. We present a new algorithm for calculating LINGO similarities amenable to parallelization on SIMD architectures (such as GPUs and vector units of modern CPUs). We show that it is nearly 3x as fast as existing algorithms on a CPU, and […]
Nov, 9
An exploration of CUDA and CBEA for a gravitational wave source-modelling application
In this paper, we accelerate a gravitational physics numerical modelling application using hardware accelerators — Cell processor and Tesla CUDA GPU. We describe these new technologies and our approach in detail, and then present our final performance results. We obtain well over an order-of-magnitude performance gain in our application by making use of these many-core […]
Nov, 9
Accelerating Scientific Computations with Mixed Precision Algorithms
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can […]
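One common way to combine the two precisions is iterative refinement: solve in 32-bit, then correct the answer using residuals computed in 64-bit. The excerpt above is cut off before it names its method, so the sketch below is only an illustration of that general idea, not the paper's algorithm. The function name `solve_mixed_precision` and its parameters are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def solve_mixed_precision(a, b, max_iters=10, tol=1e-12):
    """Solve a @ x = b with float32 solves and float64 residual corrections.

    Illustrative sketch only (hypothetical helper, not from the paper).
    """
    a64 = np.asarray(a, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    # Low-precision copy of the matrix, used for every (expensive) solve.
    a32 = a64.astype(np.float32)

    # Initial solution from the 32-bit system, promoted to 64-bit.
    x = np.linalg.solve(a32, b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iters):
        # The residual is computed in 64-bit; this is what recovers accuracy.
        r = b64 - a64 @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # Correction solved in 32-bit. A real implementation would factor
        # a32 once and reuse the factors; np.linalg.solve refactors each call.
        d = np.linalg.solve(a32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x
```

For a well-conditioned matrix, a few refinement steps are typically enough to bring the 32-bit solution close to 64-bit accuracy; for ill-conditioned systems the iteration may not converge, and a full 64-bit solve is the safer fallback.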
Nov, 9
Teraflop per second gravitational lensing ray-shooting using graphics processing units
Gravitational lensing calculation using a direct inverse ray-shooting approach is a computationally expensive way to determine magnification maps, caustic patterns, and light-curves (e.g. as a function of source profile and size). However, as an easily parallelisable calculation, gravitational ray-shooting can be accelerated using programmable graphics processing units (GPUs). We present our implementation of inverse ray-shooting […]
Nov, 9
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units: An implementation in CUDA (thesis)
At the end of 2006, NVIDIA introduced a new generation of graphics processing units (GPUs), the so-called G80 architecture. These GPUs are more powerful than any of the GPUs released before; they offer up to 350 billion floating-point operations per second (GFLOP/s) in certain situations. With the introduction of this hardware NVIDIA released a […]
Nov, 9
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units
We present the results of gravitational direct $N$-body simulations using the commercial graphics processing units (GPUs) NVIDIA Quadro FX1400 and GeForce 8800GTX, and compare the results with GRAPE-6Af special purpose hardware. The force evaluation of the $N$-body problem was implemented in Cg using the GPU directly to speed up the calculations. The integration of the equations […]
Nov, 9
Running the NIM Next-Generation Weather Model on GPUs
We are using GPUs to run a new weather model being developed at NOAA’s Earth System Research Laboratory (ESRL). The parallelization approach is to run the entire model on the GPU and only rely on the CPU for model initialization, I/O, and inter-processor communications. We have written a compiler to convert Fortran into CUDA, and […]
Nov, 9
Nonlinear optimization with a massively parallel Evolution Strategy-Pattern Search algorithm on graphics hardware
This paper presents a massively parallel Evolution Strategy-Pattern Search Optimization (ES-PS) algorithm with graphics hardware acceleration for bound-constrained nonlinear continuous optimization problems. The algorithm was specifically designed for a graphics processing unit (GPU) hardware platform featuring ‘Single Instruction Multiple Thread’ (SIMT). Evolution Strategy is a population-based evolutionary algorithm for solving complex optimization problems. GPU […]
Nov, 9
Deployment of parallel linear genetic programming using GPUs on PC and video game console platforms
We present a general method for deploying parallel linear genetic programming (LGP) to the PC and Xbox 360 video game console by using a publicly available common framework for the devices called XNA (for “XNA’s Not Acronymed”). By constructing the LGP within this framework, we effectively produce an LGP “game” for PC and Xbox 360 […]