high performance computing on graphics processing units: hgpu.org

Posts

Jun, 13

Experiences with High-Level Programming Directives for Porting Applications to GPUs

HPC systems now exploit GPUs within their compute nodes to accelerate program performance. As a result, high-end application development has become extremely complex at the node level. In addition to restructuring the node code to exploit the cores and specialized devices, the programmer may need to choose a programming model such as OpenMP or CPU […]

CUDA • OpenCL

Jun, 13

A comparison of CPU and GPU performance for Fourier pseudospectral simulations of the Navier-Stokes, Cubic Nonlinear Schrodinger and Sine Gordon Equations

We report results comparing the performance of pseudospectral methods on a single CPU and a single GPU. Our CPU implementations use FFTW and we compare serial and OpenMP implementations. Our implementations for Nvidia GPUs use CuFFT and we compare the performance of PGI FORTRAN CUDA, Nvidia CUDA and PGI OpenACC compilers for similar algorithms.

CUDA

Jun, 13

Fluid Dynamics Simulations on Multi-GPU Systems

The thesis describes the original design, implementation and testing of the multi-GPU version of two fluid flow simulation models, focusing on the cellular automaton MAGFLOW lava flow simulator and the GPU-SPH model for Navier-Stokes. In both cases, a spatial subdivision of the domain is performed, with a minimal overlap to ensure the correct evaluation of […]

CUDA

Jun, 11

Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation

Language models play an important role in large vocabulary speech recognition and statistical machine translation systems. For several decades, the dominant approach has been back-off language models. Some years ago, there was a clear tendency to build huge language models trained on hundreds of billions of words. Lately, this tendency has changed and recent works concentrate […]

CUDA

Jun, 11

GPUSync: Architecture-Aware Management of GPUs for Predictable Multi-GPU Real-Time Systems

The integration of graphics processing units (GPUs) into real-time systems has recently become an active area of research. However, prior research on this topic has failed to produce real-time GPU allocation methods that fully exploit the available parallelism in GPU-enabled systems. In this paper, a GPU management framework called GPUSync is described that was designed […]

CUDA

Jun, 11

Range query processing in a multi-GPU environment

Similarity search has been widely studied in recent years, as it can be applied to several fields such as searching by content in multimedia objects, text retrieval or computational biology. These applications usually work on very large databases that are often indexed off-line to enable the acceleration of online searches. However, to maintain an […]

CUDA

Jun, 11

CUDAICA: GPU optimization of Infomax-ICA EEG analysis

In recent years, Independent Component Analysis (ICA) has become a standard method for identifying relevant dimensions of the data in neuroscience. ICA is a very reliable method for analyzing data, but it is computationally very costly. The use of ICA for on-line analysis of the data, as in brain computing interfaces, is almost completely prohibitive. We […]

CUDA

Jun, 11

Solving the Ghost-Gluon System of Yang-Mills Theory on GPUs

We solve the ghost-gluon system of Yang-Mills theory using Graphics Processing Units (GPUs). Working in Landau gauge, we use the Dyson-Schwinger formalism for the mathematical description as this approach is well-suited to directly benefit from the computing power of the GPUs. With the help of a Chebyshev expansion for the dressing functions and a subsequent […]

CUDA

Jun, 10

Using the GPGPU for Scaling Up Mining Software Repositories

The Mining Software Repositories (MSR) field integrates and analyzes data stored in repositories such as source control and bug repositories to support practitioners. Given the abundance of repository data, scaling up MSR analyses has become a major challenge. Recently, researchers have experimented with conventional techniques like a super-computer or cloud computing, but these are either […]

CUDA

Jun, 10

Point to point processing of digital images using parallel computing

This paper presents an approach to the point-to-point processing of digital images using parallel computing, particularly for grayscale, brightening, darkening, thresholding and contrast change. The point-to-point technique applies a transformation to each pixel of an image concurrently rather than sequentially. This approach uses CUDA as a parallel programming tool on a GPU in order […]

CUDA

Jun, 10

CUDA Kernel Design for GPU-Based Beam Dynamics Simulations

Efficient implementation of general purpose particle tracking on GPUs can result in significant performance benefits to large scale particle tracking and tracking-based accelerator optimization simulations. We present our work on accelerating Argonne National Lab’s accelerator simulation code ELEGANT [1, 2] using CUDA-enabled GPUs [3]. In particular, we provide an overview of beamline elements ported to […]

CUDA

Jun, 10

S-buffer: Sparsity-aware Multi-fragment Rendering

This work introduces S-buffer, an efficient and memory-friendly GPU-accelerated A-buffer architecture for multi-fragment rendering. Memory is organized into variable contiguous regions for each pixel, thus avoiding the limitations of linked-list and fixed-array techniques. S-buffer exploits fragment distribution for precise allocation of the needed storage and pixel sparsity (empty pixel ratio) for computing the memory offsets […]

OpenCL • OpenGL

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)