high performance computing on graphics processing units: hgpu.org

Posts

Jun, 7

Scientific Computing on Hybrid Architectures

Modern computer architectures, with multicore CPUs and GPUs or other accelerators, make stronger demands than ever on writers of scientific code. Normally, the most efficient program has to be written – with substantial effort – by expert programmers for a certain application on a particular computer. This thesis deals with several algorithmic and technical […]

CUDA

Jun, 7

CUDA Based Performance Evaluation of the Computational Efficiency of the DCT Image Compression Technique on Both the CPU and GPU

Recent advances in computing such as massively parallel GPUs (Graphics Processing Units), coupled with the need to store and deliver large quantities of digital data, especially images, have brought a number of challenges for computer scientists, the research community and other stakeholders. These challenges, such as prohibitively large costs to manipulate the digital data amongst […]

CUDA

Jun, 6

Parallel Implementation of Finite Element Codes using CUDA

The purpose of this work is to study the performance of parallel computation of the Finite Element Method using NVIDIA’s CUDA. The numerical experiments are performed only on the stiffness matrix using the conjugate gradient method. In addition, the generalized minimal residual method is considered to solve the Stokes problem using both PETSc and CUDA. […]

CUDA

Jun, 6

Development of an explicit pressure-based unstructured solver for three-dimensional incompressible flows with graphics hardware acceleration

In this research, a numerical algorithm was developed to solve the incompressible Navier-Stokes equations using explicit time stepping. The goal of this research was to develop an unsteady SIMPLER-based algorithm with lower computational overhead. The new explicit algorithm uses a four-stage Runge-Kutta scheme to update the velocities and eliminates the need for the […]

CUDA

Jun, 6

Parallelization & checkpointing of GPU applications through program transformation

GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating such applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, […]

OpenCL

Jun, 6

A MapReduce Framework for Heterogeneous Computing Architectures

Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources following different architectures. This applies to the level of a single chip, a single node and even supercomputers and large-scale clusters. With their impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, graphics processing units (GPUs) have […]

CUDA • OpenCL

Jun, 6

A comprehensive study of Dynamic Memory Management in OpenCL kernels

Traditional (sequential) applications use malloc for a variety of dynamic data structures, like linked lists or trees. GPGPU is gaining attention and popularity because its massively-parallel architecture allows for great speed improvement for programs that can be parallelised and implemented for a platform like OpenCL. Programmers who try to port their existing sequential or even […]

OpenCL

Jun, 6

A Reliable Throughput Gain on GPUs

Graphics Processing Units (GPUs) are widely employed in many applications in which high computing capabilities are required and parallelism can be fruitfully exploited. A higher number of parallel threads brings a higher throughput to the GPU, but may also increase the code's neutron-induced error rate. The GPU's sensitivity depends not only on the code throughput, […]

CUDA

Jun, 6

Automating elimination of idle functions by run-time reconfiguration

A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce Reconfiguration Data Flow Graph, a hierarchical graph structure enabling reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filter, and reverse […]

CUDA

Jun, 6

Implicit Skinning: Real-Time Skin Deformation with Contact Modeling

Geometric skinning techniques, such as smooth blending or dual quaternions, are very popular in the industry for their high performance, but fail to mimic realistic deformations. Other methods make use of physical simulation or control volume to better capture the skin behavior, yet they cannot deliver real-time feedback. In this paper, we present the first purely […]

CUDA

Jun, 6

ElastiFace: Matching and Blending Textured Faces

In this paper we present ELASTIFACE, a simple and versatile method for establishing correspondence between textured face models, either for the construction of a blend-shape facial rig or for the exploration of new characters by morphing between a set of input models. While there exists a wide variety of approaches for inter-surface mapping and mesh […]

OpenCL

Jun, 6

Accelerating Fast Fourier Transform for Wideband Channelization

Wideband channelization is a compute-intensive task with performance requirements that are arguably greater than what current multi-core CPUs can provide. To date, researchers have used dedicated hardware such as field programmable gate arrays (FPGAs) to address the performance-critical aspects of the channelizer. In this work, we assess the viability of the graphics processing unit (GPU) […]

OpenCL

Recent source codes

Luthier: Bridging Auto-Tuning and Vendor Libraries for Efficient Deep Learning Inference

Fused Kernel Library (FKL)

GPUHammer: Rowhammer Attacks on GPU Memories are Practical

Block: Balance Loader of LLM Serving with Context, Knowledge and Predictive Scheduling

SIGMo: Scalable Isomorphism Graph Matching on GPUs

DGEMM without FP64 Arithmetic - using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme

GEAK-agent: LLM-based AI agent, which can write correct and efficient GPU kernels automatically

OpenDwarfs 2025: re-engineered version of the OpenDwarfs benchmark suite, for compatibility with modern platforms

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation
