high performance computing on graphics processing units: hgpu.org

Posts

Feb, 16

Using Graphics Processors to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation

This paper describes the use of graphics processors to accelerate the backpropagation method of forming images in Synthetic Aperture Sonar (SAS) systems. SAS systems coherently process multiple pulses to provide a higher level of detail in the resolved image than is otherwise possible with a single pulse. Several models are available to resolve an image […]

CUDA

Feb, 16

An experimental study on performance portability of OpenCL kernels

Accelerator processors allow energy-efficient computation at high performance, especially for computation-intensive applications. There exists a plethora of different accelerator architectures, such as GPUs and the Cell Broadband Engine. Each accelerator has its own programming language, but the recently introduced OpenCL language unifies accelerator programming languages. Hereby, OpenCL achieves functional portability, allowing to reduce the development […]

OpenCL

Feb, 16

Multi-agent traffic simulation with CUDA

Today’s graphics processing units (GPUs) have tremendous resources when it comes to raw computing power. Simulating large groups of agents in transport simulation demands a great deal of computation time. Therefore, it seems reasonable to try to harvest this computing power for traffic simulation. Unfortunately, simulating a network of traffic is inherently connected […]

CUDA

Feb, 16

MuMax: a new high-performance micromagnetic simulation tool

We present MuMax, a general-purpose micromagnetic simulation tool running on Graphical Processing Units (GPUs). MuMax is designed for high-performance computations and specifically targets large simulations. In that case, speedups of more than 100x can easily be obtained compared to the CPU-based OOMMF program developed at NIST. MuMax aims to be general and broadly […]

CUDA

Feb, 15

Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid

We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular, single-precision GPUs as accelerators to the general purpose CPU. This […]

CUDA
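The cyclic reduction named in the title above can be illustrated with a small serial sketch. This is a plain recursive version in Python, not the GPU or mixed-precision implementation from the paper; the function name `cyclic_reduction` and the argument layout are assumptions made for illustration. Each level eliminates the even-indexed unknowns, which halves the system; on a GPU the eliminations within one level are independent and can run in parallel.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] are ignored (treated as zero).
    Hypothetical helper for illustration; assumes no zero pivots,
    e.g. a diagonally dominant matrix.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    a = [0.0] + list(a[1:])
    c = list(c[:-1]) + [0.0]

    # Forward reduction: fold rows i-1 and i+1 into every odd row i.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        na = alpha * a[i - 1]
        nb = b[i] + alpha * c[i - 1]
        nc = 0.0
        nd = d[i] + alpha * d[i - 1]
        if i + 1 < n:
            gamma = -c[i] / b[i + 1]
            nb += gamma * a[i + 1]
            nc = gamma * c[i + 1]
            nd += gamma * d[i + 1]
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)

    # The odd-indexed unknowns form a tridiagonal system of half the size.
    odd = cyclic_reduction(ra, rb, rc, rd)

    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = odd[k]

    # Back substitution: recover each even-indexed unknown from its neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

For example, with the 1D Poisson stencil (-1, 2, -1) on seven unknowns and right-hand side `[0, 0, 0, 0, 0, 0, 8]`, the exact solution is `x[i] = i + 1`.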

Feb, 15

Accelerating Cosmological Data Analysis with Graphics Processors

In this paper we describe a successful effort to accelerate the two-point angular correlation function—a basic statistics tool used in the field of cosmology to characterize the distribution of the matter and energy in the Universe—by using an NVIDIA GPU-based system. We demonstrate the use of GPUs to accelerate the calculation of histograms of angular […]

CUDA
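The histogram step mentioned in the abstract above can be sketched as a brute-force pair count over angular separations. This is a serial Python illustration under stated assumptions, not the paper's GPU code: points are taken to be unit vectors on the sphere, the function name `angular_separation_histogram` and the 20-degree default bin width are hypothetical, and only the raw pair histogram is computed, not a full correlation estimator.

```python
import math
from itertools import combinations

def angular_separation_histogram(points, bin_width_deg=20.0):
    """Count point pairs by angular separation, in equal-width bins over 0..180 degrees.

    `points` is a sequence of 3D unit vectors (assumed already normalised).
    """
    n_bins = int(math.ceil(180.0 / bin_width_deg))
    counts = [0] * n_bins
    for p, q in combinations(points, 2):
        cos_theta = sum(pi * qi for pi, qi in zip(p, q))
        # Clamp against rounding error before taking the arccosine.
        cos_theta = max(-1.0, min(1.0, cos_theta))
        theta = math.degrees(math.acos(cos_theta))
        # A separation of exactly 180 degrees falls into the last bin.
        idx = min(int(theta / bin_width_deg), n_bins - 1)
        counts[idx] += 1
    return counts
```

The double loop over pairs is quadratic in the number of points, which is the part that makes this kind of calculation a natural candidate for GPU acceleration.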

Feb, 15

Direct Self-Consistent Field Computations on GPU Clusters

We present an implementation of one of the direct self-consistent-field (DSCF) calculation techniques, the restricted Hartree-Fock method, on a high-performance computing cluster outfitted with graphics processing units (GPUs) and demonstrate its effectiveness and scalability up to 128 cluster nodes on molecules of as many as 1,732 atoms. We discuss the overall parallel application architecture that […]

CUDA

Feb, 15

Generation of Kernels for Calculating Electron Repulsion Integrals of High Angular Momentum Functions on GPUs – Preliminary Results

Evaluation of electron repulsion integrals (ERIs) takes considerable time in modern quantum chemistry applications and also presents a certain difficulty to be efficiently computed on GPUs. Here, we describe a novel methodology for generating high-arithmetic-density kernels for ERI evaluation of d and higher angular momentum functions, as well as highlight challenges associated with the efficient […]

CUDA

Feb, 15

Accelerating Quantum Chromodynamics Calculations with GPUs

We present a CUDA C implementation of the Conjugate Gradient (CG) and multi-mass CG solver from the MILC quantum chromodynamics package to speed up improved staggered quarks computations on NVIDIA GPUs. The implementation is built on the QUDA package from Boston University.

CUDA
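For readers unfamiliar with the Conjugate Gradient method named above, here is a minimal serial sketch in Python. It shows textbook CG for a symmetric positive-definite system only; it is not the multi-mass variant and does not use any MILC or QUDA interface. The names `conjugate_gradient` and `matvec` are assumptions for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A.

    `matvec(v)` must return the product A v as a list.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Usage with a small 2x2 SPD matrix.
A = [[4.0, 1.0], [1.0, 3.0]]
solution = conjugate_gradient(lambda v: [dot(row, v) for row in A], [1.0, 2.0])
```

The solver only touches the matrix through `matvec`, which is why the matrix-vector product is the natural piece to offload to a GPU.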

Feb, 15

PRNG Random Numbers on GPU

Limited numerical precision of nVidia GeForce 8800 GTX and other GPUs requires careful implementation of PRNGs. The Park-Miller PRNG is programmed using G80’s native Value4f floating point in RapidMind C++. The speedup is more than 40-fold. Code is available via FTP at ftp://cs.ucl.ac.uk/genetic/gp-code/random-numbers/gpu_park-miller.tar.gz

CUDA

Feb, 15

A Fast High Quality Pseudo Random Number Generator for Graphics Processing Units

Limited numerical precision of nVidia GeForce 8800 GTX and other GPUs requires careful implementation of PRNGs. The Park-Miller PRNG is programmed using G80’s native Value4f floating point in RapidMind C++. The speedup is more than 40-fold. Code is available via FTP from cs.ucl.ac.uk under genetic/gp-code/random-numbers/gpu_park-miller.tar.gz.

CUDA
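The two posts above concern the Park-Miller "minimal standard" generator, whose recurrence is x' = 16807 * x mod (2^31 - 1). The sketch below shows one well-known way to keep every intermediate value within a limited range, Schrage's method, using integers in Python. It is not the floating-point RapidMind implementation the posts describe; the function names are assumptions for illustration.

```python
M = 2**31 - 1   # modulus, a Mersenne prime
A = 16807       # multiplier, 7**5
Q = M // A      # 127773
R = M % A       # 2836

def park_miller_next(x):
    """Advance the generator by one step using Schrage's method.

    Because R < Q, neither product below exceeds M in magnitude, so the
    computation fits in a signed 32-bit integer. Python integers do not
    overflow; the decomposition is shown only to illustrate the technique.
    """
    x = A * (x % Q) - R * (x // Q)
    return x if x > 0 else x + M

def park_miller_uniform(seed, count):
    """Return `count` floats in (0, 1) starting from `seed` (1 <= seed < M)."""
    out = []
    x = seed
    for _ in range(count):
        x = park_miller_next(x)
        out.append(x / M)
    return out
```

A common self-check from Park and Miller's original description: starting from seed 1, the state after 10,000 steps should be 1043618065.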

Feb, 15

A CUDA SIMT Interpreter for Genetic Programming

A Single Instruction Multiple Thread CUDA interpreter provides SIMD-like parallel evaluation of the whole GP population of 1/4 million RPN expressions on graphics cards and nVidia Tesla T10P. Using sub-machine code GP, a sustained peak performance of 212 billion GP operations per second (a speedup of 3300) and an average of 4.5 peta GP ops […]

CUDA
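The RPN (reverse Polish notation) evaluation mentioned above can be sketched with a small stack interpreter. This is a serial Python illustration, not the CUDA SIMT interpreter from the paper; the function names, the token format, and the three-operator set are assumptions made for the example.

```python
import operator

# Hypothetical operator set, chosen for illustration only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_rpn(tokens, variables):
    """Evaluate one RPN expression given as a list of string tokens."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[tok](lhs, rhs))
        elif tok in variables:
            stack.append(variables[tok])
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

def eval_population(population, variables):
    """Evaluate every expression in the population on the same inputs."""
    return [eval_rpn(tokens, variables) for tokens in population]
```

Here `eval_rpn(["x", "x", "*", "1", "+"], {"x": 3.0})` computes x*x + 1. Because every expression runs through the same interpreter loop, evaluating a whole population is the kind of work that maps onto many threads executing one instruction stream.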
