high performance computing on graphics processing units: hgpu.org

Posts

Nov, 9

Staggered fermions simulations on GPUs

We present our implementation of the RHMC algorithm for staggered fermions on Graphics Processing Units using the NVIDIA CUDA programming language. While previous studies exclusively deal with the Dirac matrix inversion problem, our code performs the complete MD trajectory on the GPU. After pointing out the main bottlenecks and how to circumvent them, we discuss […]

CUDA

Nov, 9

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA’s Compute Unified Device Architecture (CUDA). This library, interfaced to the […]

CUDA

Nov, 9

Fast Histograms using Adaptive CUDA Streams

Histograms are widely used in medical imaging, network intrusion detection, packet analysis and other stream-based high throughput applications. However, while porting such software stacks to the GPU, the computation of the histogram is a typical bottleneck primarily due to the large impact on kernel speed by atomic operations. In this work, we propose a stream-based […]

CUDA

Nov, 9

Rank k Cholesky Up/Down-dating on the GPU: gpucholmodV0.2

In this note we briefly describe our Cholesky modification algorithm for streaming multiprocessor architectures. Our implementation is available in C++ with Matlab binding, using CUDA to utilise the graphics processing unit (GPU). Limited speed ups are possible due to the bandwidth bound nature of the problem. Furthermore, a complex dependency pattern must be obeyed, requiring […]

CUDA
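
For readers unfamiliar with Cholesky modification, here is a minimal sequential sketch of the textbook rank-1 update, applied k times to obtain a rank-k update. It is not the gpucholmod implementation and makes no attempt to reproduce the paper's GPU scheduling; the function names `cholesky_rank1_update`, `rank_k_update` and `times_transpose` are hypothetical and chosen only for this illustration.

```python
import math

def cholesky_rank1_update(L, x):
    """Return lower-triangular L' with L' L'^T = L L^T + x x^T.

    L is a list of rows (lower triangular, positive diagonal).
    The inputs are copied, not modified.
    """
    n = len(x)
    L = [row[:] for row in L]
    x = list(x)
    for k in range(n):
        r = math.hypot(L[k][k], x[k])   # new diagonal entry
        c = r / L[k][k]
        s = x[k] / L[k][k]
        L[k][k] = r
        # Column k below the diagonal depends on the values just computed,
        # which is the dependency pattern that limits parallelism.
        for i in range(k + 1, n):
            L[i][k] = (L[i][k] + s * x[i]) / c
            x[i] = c * x[i] - s * L[i][k]
    return L

def rank_k_update(L, X):
    """Apply one rank-1 update per vector in X (a rank-k update overall)."""
    for x in X:
        L = cholesky_rank1_update(L, x)
    return L

def times_transpose(L):
    """Compute L L^T, used here only to check the result."""
    n = len(L)
    return [[sum(L[i][m] * L[j][m] for m in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    L = [[2.0, 0.0], [1.0, 3.0]]            # factor of [[4, 2], [2, 10]]
    U = rank_k_update(L, [[1.0, 2.0], [0.0, 1.0]])
    print(times_transpose(U))               # close to [[5, 4], [4, 15]]
```

Each inner-loop step reads and writes only a handful of values per arithmetic operation, which is consistent with the abstract's remark that the problem is bandwidth bound.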

Nov, 9

Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU

Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphics cards has been cited as a […]

CUDA

Nov, 9

Numerical modeling of gravitational wave sources accelerated by OpenCL

In this work, we make use of the OpenCL framework to accelerate an EMRI modeling application using the hardware accelerators – Cell BE and Tesla CUDA GPU. We describe these compute technologies and our parallelization approach in detail, present our performance results, and then compare them with those from our previous implementations based on the […]

OpenCL

Nov, 9

Parallel Graph Component Labelling with GPUs and CUDA

Graph component labelling, which is a subset of the general graph colouring problem, is a computationally expensive operation that is of importance in many applications and simulations. A number of data-parallel algorithmic variations to the component labelling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the […]

CUDA
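
As an illustration of what a data-parallel component labelling variant can look like, the sketch below uses minimum-label propagation: every vertex starts with its own index as label, and each sweep lowers the labels at both ends of every edge to the smaller of the two. This is one common approach, assumed here for illustration and not necessarily one of the variants the paper evaluates; `label_components` is a hypothetical name. On a GPU, each edge in a sweep could be handled by its own thread; here the sweep is a plain loop.

```python
def label_components(n, edges):
    """Label the connected components of an undirected graph.

    n     -- number of vertices, numbered 0..n-1
    edges -- iterable of (u, v) pairs
    Returns a list where labels[v] is the smallest vertex index in
    the component containing v.
    """
    edges = list(edges)
    labels = list(range(n))
    changed = True
    while changed:
        changed = False
        # Synchronous sweep: read from `labels`, write to `new`.
        new = labels[:]
        for u, v in edges:
            m = min(labels[u], labels[v])
            if new[u] > m:
                new[u] = m
                changed = True
            if new[v] > m:
                new[v] = m
                changed = True
        labels = new
    return labels

if __name__ == "__main__":
    # Two components {0, 1, 2} and {3, 4}, plus the isolated vertex 5.
    print(label_components(6, [(0, 1), (1, 2), (3, 4)]))
```

The number of sweeps grows with the longest path a label has to travel, which is the main cost of this simple scheme.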

Nov, 9

A molecular docking system using CUDA

A Molecular Docking System enables biologists to check whether two molecular models can be combined at a specific position and remain in their stable states by simulation. It can be used in developing new materials and designing new drugs. Since the docking simulation consists of several complicated computations at the level of atoms, it requires […]

Nov, 9

A framework for simulating and estimating the state and functional topology of complex dynamic geometric networks

We present a framework for simulating signal propagation in geometric networks (i.e. networks that can be mapped to geometric graphs in some space) and for developing algorithms that estimate (i.e. map) the state and functional topology of complex dynamic geometric networks. Within the framework we define the key features typically present in such networks […]

CUDA

Nov, 9

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with […]

CUDA

Nov, 9

Fast Calculation of the Lomb-Scargle Periodogram Using Graphics Processing Units

I introduce a new code for fast calculation of the Lomb-Scargle periodogram, that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate the key parts of the source. Benchmarking calculations indicate no significant differences in accuracy […]

CUDA
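
For context on the Lomb-Scargle entry above, the following is a minimal CPU sketch of the classical periodogram for unevenly sampled data. It is not the author's GPU code; the function name `lomb_scargle` and the unnormalized form of the power are choices made for this illustration. The power at each trial frequency is computed independently of the others, which is what makes the calculation a natural candidate for GPUs.

```python
import math

def lomb_scargle(t, y, freqs):
    """Classical (unnormalized) Lomb-Scargle power at each frequency.

    t     -- sample times (need not be evenly spaced)
    y     -- sample values
    freqs -- trial frequencies in cycles per unit time, all > 0
    """
    n = len(t)
    mean = sum(y) / n
    d = [v - mean for v in y]            # mean-subtracted data
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f            # angular frequency
        # Time offset tau that decouples the sine and cosine terms.
        s2 = sum(math.sin(2.0 * w * tj) for tj in t)
        c2 = sum(math.cos(2.0 * w * tj) for tj in t)
        tau = math.atan2(s2, c2) / (2.0 * w)
        yc = ys = cc = ss = 0.0
        for tj, dj in zip(t, d):
            c = math.cos(w * (tj - tau))
            s = math.sin(w * (tj - tau))
            yc += dj * c
            ys += dj * s
            cc += c * c
            ss += s * s
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power

if __name__ == "__main__":
    # Unevenly sampled sinusoid at 1.3 cycles per unit time.
    t = [0.1 * j + 0.04 * math.sin(7 * j) for j in range(200)]
    y = [math.sin(2.0 * math.pi * 1.3 * tj) for tj in t]
    freqs = [0.1 * k for k in range(1, 31)]
    p = lomb_scargle(t, y, freqs)
    print(freqs[p.index(max(p))])        # frequency of the strongest peak
```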

Nov, 9

The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units

We present an algorithm named “Chamomile Scheme”. The scheme is fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), NVIDIA GeForce 8800 GTX, which has (a) small but fast shared memories (16 K Bytes * 16) with no broadcasting mechanism and (b) floating point arithmetic hardware of 500 Gflop/s but only for […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
