## Posts

Nov, 9

### An Exploration of OpenCL for a Numerical Relativity Application

Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs), for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]
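The Teukolsky equation itself is not reproduced here. As a hedged illustration of the kind of explicit time stepping a time-domain hyperbolic PDE solver performs, the sketch below advances the 1D wave equation $u_{tt} = c^2 u_{xx}$ with a second-order leapfrog stencil on a periodic grid. The functions `wave_step` and `simulate` and all parameters are hypothetical; the point is that every grid point is updated from its neighbours only, which is what lets one OpenCL work-item handle one point.

```python
import math

def wave_step(u_prev, u_curr, courant):
    """One explicit leapfrog step for u_tt = c^2 u_xx on a periodic 1D grid.

    courant = c*dt/dx; the scheme is stable for courant <= 1.
    Each new value depends only on neighbouring old values, so each
    grid point could be computed by an independent GPU work-item.
    """
    n = len(u_curr)
    r2 = courant * courant
    u_next = [0.0] * n
    for i in range(n):
        left = u_curr[i - 1]           # index -1 wraps to the last point
        right = u_curr[(i + 1) % n]    # periodic wrap at the right edge
        u_next[i] = (2.0 * u_curr[i] - u_prev[i]
                     + r2 * (right - 2.0 * u_curr[i] + left))
    return u_next

def simulate(n=64, steps=100, courant=0.5):
    """Evolve a Gaussian pulse that starts at rest (u_prev == u_curr)."""
    x = [i / n for i in range(n)]
    u0 = [math.exp(-200.0 * (xi - 0.5) ** 2) for xi in x]
    u_prev, u_curr = u0[:], u0[:]
    for _ in range(steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, courant)
    return u0, u_curr
```

On a periodic grid the discrete Laplacian sums to zero, so the total $\sum_i u_i$ is conserved when the pulse starts at rest, which gives a cheap sanity check.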

Nov, 9

### Multi GPU Performance of Conjugate Gradient Algorithm with Staggered Fermions

We report results of GPU performance tests using the conjugate gradient (CG) algorithm for staggered fermions on the MILC fine lattice ($28^3 \times 96$). We use NVIDIA GTX 295 GPUs for the test. When we turn off the MPI communication and use only a single GPU, the performance is 35 […]
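For readers unfamiliar with CG, here is a minimal sketch of the textbook algorithm for a symmetric positive-definite matrix, using a small dense matrix as an assumption for illustration. In staggered-fermion codes the matrix is typically the normal operator $M^\dagger M$ applied matrix-free, and that matrix-vector product is the part GPUs accelerate; the function names below are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product; in lattice codes this is the costly stencil."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = b[:]          # residual b - A x with x = 0
    p = r[:]          # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr                           # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges to $x = (1/11,\ 7/11)$.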

Nov, 9

### SU(2) Lattice QCD Simulations on Fermi GPUs

In this work we explore the performance of CUDA in lattice SU(2) simulations. CUDA (NVIDIA's Compute Unified Device Architecture) is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analysis with multiple GPUs and […]
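A property that makes SU(2) attractive on bandwidth-limited GPUs is that each link can be stored as four real numbers $(a_0, \vec a)$ with $U = a_0 \mathbb{1} + i\,\vec a \cdot \vec\sigma$ and $a_0^2 + |\vec a|^2 = 1$, instead of eight reals for a general complex $2\times 2$ matrix. The sketch below assumes that storage scheme (it is not confirmed to be the one used in this work); the product follows from $(\vec a\cdot\vec\sigma)(\vec b\cdot\vec\sigma) = \vec a\cdot\vec b\,\mathbb{1} + i(\vec a\times\vec b)\cdot\vec\sigma$. Function names are hypothetical.

```python
import math
import random

def su2_mul(a, b):
    """Multiply SU(2) elements stored as (a0, a1, a2, a3),
    where U = a0*I + i*(a1*s1 + a2*s2 + a3*s3) and sum(a_k^2) = 1.

    Result: c0 = a0*b0 - a.b,  c = a0*b + b0*a - (a x b).
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + b0 * a1 - (a2 * b3 - a3 * b2),
        a0 * b2 + b0 * a2 - (a3 * b1 - a1 * b3),
        a0 * b3 + b0 * a3 - (a1 * b2 - a2 * b1),
    )

def to_matrix(a):
    """Expand the 4-real form into the explicit 2x2 complex matrix."""
    a0, a1, a2, a3 = a
    return [[complex(a0, a3), complex(a2, a1)],
            [complex(-a2, a1), complex(a0, -a3)]]

def random_su2(rng):
    """Uniform random SU(2) element: a normalized 4D Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)
```

Multiplying in the 4-real form costs 16 real multiplications and keeps the result on the unit sphere up to rounding, so links can be reunitarized cheaply by renormalizing the 4-vector.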

Nov, 9

### Implementation of the Neuberger-Dirac operator on GPUs

Recent developments have shown that a lot can be gained for QCD simulations from GPU hardware. This can be exploited especially in the case of Ginsparg-Wilson fermions when the computational costs are particularly high. In this work, we use the Neuberger-Dirac operator as our realisation of Ginsparg-Wilson fermions, which greatly facilitate lattice investigations of […]
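For reference, in one common convention (normalizations differ between papers) the Neuberger-Dirac, or overlap, operator is built from a hermitian kernel $H_W = \gamma_5 D_W(-m_0)$, where $D_W(-m_0)$ is the Wilson-Dirac operator at a negative mass $-m_0$, and it satisfies the Ginsparg-Wilson relation because $\operatorname{sign}(H_W)^2 = 1$:

$$
D_N = \frac{1}{a}\left(1 + \gamma_5\,\operatorname{sign}(H_W)\right),
\qquad
\gamma_5 D_N + D_N \gamma_5 = a\, D_N \gamma_5 D_N .
$$

The matrix sign function of the large sparse $H_W$ is what makes the operator expensive: it is usually approximated by polynomials or rational functions, so a single application of $D_N$ costs many applications of $D_W$.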

Nov, 9

### Staggered fermions simulations on GPUs

We present our implementation of the RHMC algorithm for staggered fermions on Graphics Processing Units using the NVIDIA CUDA programming language. While previous studies exclusively deal with the Dirac matrix inversion problem, our code performs the complete MD trajectory on the GPU. After pointing out the main bottlenecks and how to circumvent them, we discuss […]
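Running "the complete MD trajectory on the GPU" means, at its core, running a reversible symplectic integrator. The sketch below is a minimal leapfrog trajectory for a toy Gaussian action $S(\phi) = \tfrac{1}{2} m^2 \sum \phi^2$; in RHMC the `force` callback would instead include the fermion force from a rational approximation (typically evaluated with a multi-shift solver), which this sketch does not attempt. All names are hypothetical.

```python
def gaussian_force(phi, m2=1.0):
    """Toy force -dS/dphi for S = 0.5 * m2 * sum(phi^2)."""
    return [-m2 * x for x in phi]

def hamiltonian(phi, pi, m2=1.0):
    """H = kinetic term + toy action; HMC accepts/rejects on the change in H."""
    return 0.5 * sum(p * p for p in pi) + 0.5 * m2 * sum(x * x for x in phi)

def leapfrog(phi, pi, force, dt, n_steps):
    """Kick-drift-kick leapfrog: half momentum step, then alternating
    full field and momentum steps, ending with a half momentum step."""
    phi = phi[:]
    f = force(phi)
    pi = [p + 0.5 * dt * fi for p, fi in zip(pi, f)]
    for step in range(n_steps):
        phi = [x + dt * p for x, p in zip(phi, pi)]
        f = force(phi)
        scale = dt if step < n_steps - 1 else 0.5 * dt
        pi = [p + scale * fi for p, fi in zip(pi, f)]
    return phi, pi
```

Reversibility (integrate, flip the momenta, integrate again, and return to the start) and a small energy violation $\Delta H$ are the two properties HMC-type algorithms rely on for exactness and a good acceptance rate.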

Nov, 9

### Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA’s Compute Unified Device Architecture (CUDA). This library, interfaced to the […]
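Mixed precision solvers generally do most of the arithmetic in low precision while accumulating the solution and checking the residual in high precision. The sketch below shows that idea with plain defect correction and a Jacobi inner solver emulated in single precision via `struct` rounding; it is a simplified stand-in chosen for brevity, not the Krylov solvers QUDA actually provides, and every function name is hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def residual(A, x, b):
    """r = b - A x, computed in full double precision."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]

def jacobi_single(A, r, sweeps=10):
    """Approximate solve of A e = r; inputs and every iterate are rounded to
    single precision, emulating a cheap low-precision inner solver."""
    n = len(r)
    A_s = [[to_single(a) for a in row] for row in A]
    r_s = [to_single(v) for v in r]
    e = [0.0] * n
    for _ in range(sweeps):
        e = [to_single((r_s[i] - sum(A_s[i][j] * e[j] for j in range(n) if j != i)) / A_s[i][i])
             for i in range(n)]
    return e

def mixed_precision_solve(A, b, tol=1e-12, max_outer=50):
    """Outer defect-correction loop in double around a single-precision inner solve."""
    x = [0.0] * len(b)
    for _ in range(max_outer):
        r = residual(A, x, b)
        if max(abs(v) for v in r) < tol:
            break
        e = jacobi_single(A, r)
        x = [xi + ei for xi, ei in zip(x, e)]   # correction accumulated in double
    return x
```

Although each inner solve is only accurate to single precision, the outer loop recomputes the true residual in double, so the final answer reaches double-precision accuracy (here for a diagonally dominant matrix, which Jacobi requires).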

Nov, 9

### Fast Histograms using Adaptive CUDA Streams

Histograms are widely used in medical imaging, network intrusion detection, packet analysis and other stream-based high-throughput applications. However, when such software stacks are ported to the GPU, the histogram computation is a typical bottleneck, primarily because atomic operations slow the kernel down considerably. In this work, we propose a stream-based […]
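A standard way to reduce atomic contention, shown here only as a generic illustration and not as the adaptive stream method proposed in this work, is privatization: each thread block (or each batch of a stream) fills its own private histogram, and the copies are merged at the end. The sketch emulates blocks with data chunks; the function name and parameters are hypothetical.

```python
def privatized_histogram(data, num_bins, lo, hi, num_chunks):
    """Histogram of values in [lo, hi] built from per-chunk private copies.

    Each chunk stands in for one thread block or stream batch: it updates
    only its own counters, so no atomics on a shared histogram are needed.
    Values outside [lo, hi] are ignored.
    """
    width = (hi - lo) / num_bins
    chunk_size = max(1, -(-len(data) // num_chunks))   # ceiling division
    partials = []
    for start in range(0, len(data), chunk_size):
        local = [0] * num_bins                          # private histogram
        for v in data[start:start + chunk_size]:
            if lo <= v <= hi:
                b = min(int((v - lo) / width), num_bins - 1)  # v == hi goes in the last bin
                local[b] += 1
        partials.append(local)
    # Merge step: on a GPU this is a small reduction over the private copies.
    return [sum(col) for col in zip(*partials)] if partials else [0] * num_bins
```

The trade-off is memory: one private copy per block, which is why privatized histograms are usually kept in fast on-chip shared memory when the bin count allows it.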

Nov, 9

### Rank k Cholesky Up/Down-dating on the GPU: gpucholmodV0.2

In this note we briefly describe our Cholesky modification algorithm for streaming multiprocessor architectures. Our implementation is available in C++ with Matlab binding, using CUDA to utilise the graphics processing unit (GPU). Only limited speed-ups are possible due to the bandwidth-bound nature of the problem. Furthermore, a complex dependency pattern must be obeyed, requiring […]
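To make the dependency pattern concrete, here is the classic sequential rank-1 Cholesky update/downdate for a lower-triangular factor $L$ with $A = LL^T$; a rank-$k$ modification can be done as $k$ successive rank-1 modifications. This is the textbook algorithm, not the gpucholmod implementation, and the function names are hypothetical. Note that column $k$ needs the vector `x` as modified by columns $0,\dots,k-1$.

```python
import math

def chol_rank1(L, x, sign=1.0):
    """Return L' with L' L'^T = L L^T + sign * x x^T.

    sign=+1 is an update, sign=-1 a downdate. L is lower triangular
    (list of rows); the inputs are not modified. A downdate that would
    destroy positive definiteness raises ValueError in math.sqrt.
    """
    n = len(x)
    L = [row[:] for row in L]
    x = list(x)
    for k in range(n):
        r = math.sqrt(L[k][k] ** 2 + sign * x[k] ** 2)
        c = r / L[k][k]
        s = x[k] / L[k][k]
        L[k][k] = r
        for i in range(k + 1, n):
            L[i][k] = (L[i][k] + sign * s * x[i]) / c
            x[i] = c * x[i] - s * L[i][k]   # x feeds the next column: sequential dependency
    return L

def chol_rank_k(L, X, sign=1.0):
    """Rank-k modification as k successive rank-1 modifications (X: list of vectors)."""
    for x in X:
        L = chol_rank1(L, x, sign)
    return L
```

For $A = \begin{pmatrix}4&2\\2&3\end{pmatrix}$, $L = \begin{pmatrix}2&0\\1&\sqrt2\end{pmatrix}$ and $x = (1, 1)$, the update yields a factor of $\begin{pmatrix}5&3\\3&4\end{pmatrix}$, and downdating with the same $x$ recovers $L$.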

Nov, 9

### Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU

Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphics cards has been cited as a […]

Nov, 9

### Numerical modeling of gravitational wave sources accelerated by OpenCL

In this work, we make use of the OpenCL framework to accelerate an EMRI modeling application using two hardware accelerators, the Cell BE and a Tesla CUDA GPU. We describe these compute technologies and our parallelization approach in detail, present our performance results, and then compare them with those from our previous implementations based on the […]

Nov, 9

### Parallel Graph Component Labelling with GPUs and CUDA

Graph component labelling, which is a subset of the general graph colouring problem, is a computationally expensive operation that is of importance in many applications and simulations. A number of data-parallel algorithmic variations to the component labelling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the […]
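One of the simplest data-parallel formulations of component labelling, offered here as a generic illustration rather than one of the specific variants studied in this work, is iterative minimum-label propagation: every vertex starts with its own index as its label, and each sweep lowers both endpoints of every edge to the smaller of their labels until nothing changes. The function name is hypothetical.

```python
def label_components(num_vertices, edges):
    """Connected-component labelling by synchronous min-label propagation.

    Each sweep reads the previous labels and writes a new array, mimicking
    one kernel launch; on a GPU, one thread per edge would use an atomic
    min for the writes. Returns (labels, number_of_sweeps).
    """
    labels = list(range(num_vertices))
    sweeps = 0
    changed = True
    while changed:
        changed = False
        new = labels[:]
        for u, v in edges:
            m = min(labels[u], labels[v])
            if m < new[u]:
                new[u] = m
                changed = True
            if m < new[v]:
                new[v] = m
                changed = True
        labels = new
        sweeps += 1
    return labels, sweeps
```

For 6 vertices with edges `(0, 1), (1, 2), (3, 4)` the labels settle to `[0, 0, 0, 3, 3, 5]`: each component is named by its smallest vertex. The number of sweeps grows with the component diameter, which is why smarter variants (pointer jumping, hierarchical merging) are worth exploring.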

Nov, 9

### A molecular docking system using CUDA

A molecular docking system enables biologists to check, by simulation, whether two molecular models can be combined at a specific position and remain in their stable states. It can be used in developing new materials and designing new drugs. Since the docking simulation consists of several complicated computations at the level of atoms, it requires […]
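Atom-level docking computations are dominated by pairwise interaction terms between ligand and receptor atoms, and those pairs are independent, which suits a GPU. As an assumed, generic example (not the scoring function of this system), the sketch below sums a Lennard-Jones 12-6 energy $4\varepsilon\left[(\sigma/r)^{12} - (\sigma/r)^6\right]$ over all ligand-receptor pairs within a cutoff; the parameters and function name are hypothetical.

```python
import math

def lj_energy(ligand, receptor, epsilon=1.0, sigma=1.0, cutoff=10.0):
    """Sum of Lennard-Jones 12-6 energies over all ligand-receptor atom pairs.

    ligand, receptor: sequences of (x, y, z) coordinates. Each pair is
    independent, so on a GPU one thread could evaluate one pair (or one
    ligand atom against all receptor atoms) before a final reduction.
    """
    total = 0.0
    for a in ligand:
        for b in receptor:
            r = math.dist(a, b)
            if 0.0 < r < cutoff:                 # skip coincident and distant pairs
                sr6 = (sigma / r) ** 6
                total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total
```

A single pair at the potential minimum $r = 2^{1/6}\sigma$ contributes exactly $-\varepsilon$, which is a convenient check; a docking search would evaluate such a score for many candidate ligand poses.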