high performance computing on graphics processing units: hgpu.org

Posts

Sep, 14

Accelerating Sparse Matrix Kernels on Graphics Processing Units

After microprocessor clock speeds have levelled off, general purpose computing on Graphics Processing Units (GPGPUs) has shown some promise for the future High Performance Computing (HPC). This can be largely attributed to the performance per unit cost, performance per unit watt and CUDA, a programming model for GPU. For instance, for under $400, one […]

CUDA

Sep, 13

Data Sorting Using Graphics Processing Units

Graphics processing units (GPUs) have been increasingly used for general-purpose computation in recent years. The GPU accelerated applications are found in both scientific and commercial domains. Sorting is considered as one of the very important operations in many applications, so its efficient implementation is essential for the overall application performance. This paper represents an effort […]

CUDA

Sep, 13

GPU Fluid Simulation using Smoothed Particle Hydrodynamics

In this paper we present an overview of our implementation of a fluid simulation technique called "Smoothed Particle Hydrodynamics". Our implementation uses a hybrid CPU+GPU hash based data structure to provide quick lookups of particle nearest neighbors and improve memory access patterns. In our discussion we begin with a brief overview of the Navier Stokes equations […]

OpenCL • OpenGL

Sep, 13

Exploring Heterogeneous Scheduling using the Task-Centric Programming Model

Computer architecture technology is moving towards more heterogeneous solutions, which will contain a number of processing units with different capabilities that may increase the performance of the system as a whole. However, with increased performance comes increased complexity; complexity that is now barely handled in homogeneous multiprocessing systems. The present study tries to solve a […]

CUDA

Sep, 13

Multi-GPU implementation of the NICAM atmospheric model

Climate simulation models are used for a variety of scientific problems, and the accuracy of climate prognoses is mostly limited by the resolution of the models. Finer resolution results in more accurate prognoses but, at the same time, significantly increases computational complexity. This explains the increasing interest in High Performance Computing (HPC), and GPU […]

CUDA

Sep, 13

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU

The use of GPUs has been very beneficial in accelerating dense linear algebra computational kernels (DLA). Many high performance numerical libraries like CUBLAS, MAGMA, and CULA provide BLAS and LAPACK implementations on GPUs as well as hybrid computations involving both, CPUs and GPUs. GPUs usually score better performance than CPUs for compute-bound operations, especially those […]

CUDA

Sep, 12

Hotspot Analysis Based Partial CUDA Acceleration of HMMER 3.0 on GPGPUs

With the introduction of many-core GPUs, there is widespread interest in using GPUs to accelerate non-graphics applications such as bioinformatics, energy, finance and several research areas. Even though the GPUs provide highly parallel processing capability, the communication interface between CPU and GPU could be a performance bottleneck due to heavy data transfer. If data transfer […]

CUDA

Sep, 12

Fast Determination of the Number of Endmembers for Real-Time Hyperspectral Unmixing on GPUs

Spectral unmixing is a very important task for remotely sensed hyperspectral data exploitation. It amounts to identifying a set of spectrally pure components (called endmembers) and their associated per-pixel coverage fractions (called abundances). A challenging problem in spectral unmixing is how to determine the number of endmembers in a given scene. Several automatic techniques exist […]

CUDA

Sep, 12

Mapping a Dataflow Programming Model onto Heterogeneous Architectures

This thesis describes and evaluates how extending Intel’s Concurrent Collections (CnC) programming model can address the problem of hybrid programming with high performance and low energy consumption, while retaining the ease of use of data-flow programming. The CnC model is a declarative, dynamic light-weight task based parallel programming model and is implicitly deterministic by enforcing […]

CUDA

Sep, 12

Medusa: Simplified Graph Processing on GPUs

Graphs are the de facto data structures for many applications, and efficient graph processing is a must for the application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest path. However, it is difficult to write correct and efficient GPU programs and even […]

CUDA

Sep, 12

Dynamical heterogeneities as fingerprints of a backbone structure in Potts models

We investigate slow non-equilibrium dynamical processes in two-dimensional q-state Potts model with both ferromagnetic and $\pm J$ couplings. Dynamical properties are characterized by means of the mean-flipping time distribution. This quantity is known for clearly unveiling dynamical heterogeneities. Using a two-times protocol we characterize the different time scales observed and relate them to growth processes […]

CUDA

Sep, 11

A First Step Towards GPU-assisted Query Optimization

Modern graphics cards bundle high-bandwidth memory with a massively parallel processor, making them an interesting platform for running data-intensive operations. Consequently, several authors have discussed accelerating database operators using graphics cards, often demonstrating promising speed-ups. However, due to limitations stemming from limited device memory and expensive data transfer, GPU-accelerated databases remain a niche technology. We […]

OpenCL

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA