high performance computing on graphics processing units: hgpu.org

Posts

Mar, 27

Practical and Theoretical Aspects of a Parallel Twig Join Algorithm for XML Processing using a GPGPU

With an increasing amount of data and demand for fast query processing, the efficiency of database operations continues to be a challenging task. A common approach is to leverage parallel hardware platforms. With the introduction of general-purpose GPU (Graphics Processing Unit) computing, massively parallel hardware has become available within commodity hardware. XML is based on […]

CUDA

Mar, 27

Accelerating Constraint Automata Composition with GPGPU Parallelization

One of the principal challenges of Constraint Automata composition is the rapid growth of the state space and the difficulty inherent in processing very large state spaces, both in terms of space and computation time. We show that the method outlined here goes some way in tackling both these issues by making it […]

CUDA

Mar, 27

Dynamic Translation of Runtime Environments for Heterogeneous Computing

The current trend towards heterogeneous architectures requires a global rethinking of software and hardware design. The focus is centered around new parallel programming models, design space exploration and run-time resource management techniques to exploit the features of many-core processor architectures. Graphics Processing Units (GPU) have become the platform of choice in this area for accelerating […]

CUDA • OpenCL

Mar, 27

Adaptive Row-grouped CSR Format for Storing of Sparse Matrices on GPU

We present a new adaptive format for storing sparse matrices on GPU. We compare it with several other formats, including CUSPARSE, which is today probably the best choice for processing sparse matrices on GPU in CUDA. Unlike CUSPARSE, which works with the common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and […]

CUDA
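For readers unfamiliar with the common CSR format mentioned in the excerpt above, the following is a minimal sketch of how CSR stores a sparse matrix and how a matrix-vector product walks that layout. It is plain CSR on the CPU, not the paper's adaptive row-grouped format and not CUSPARSE; the function name `csr_spmv` and the array names `values`, `col_idx`, and `row_ptr` are illustrative assumptions.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Multiply a CSR-stored sparse matrix by a dense vector x.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Visit only the nonzeros that belong to this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 example matrix:
# [[10,  0,  0],
#  [ 0,  0, 20],
#  [30, 40,  0]]
values = [10.0, 20.0, 30.0, 40.0]
col_idx = [0, 2, 0, 1]
row_ptr = [0, 1, 2, 4]

result = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(result)  # [10.0, 60.0, 110.0]
```

Each row of the outer loop is independent of the others, which is what makes this operation a natural candidate for one-thread-per-row or one-group-per-row mappings on a GPU. Rows with very different numbers of nonzeros lead to unbalanced work, which is one common motivation for alternative storage formats.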

Mar, 26

OpenMPC: Extended OpenMP for Efficient Programming and Tuning on GPUs

General-Purpose Graphics Processing Units (GPGPUs) provide inexpensive, high performance platforms for compute-intensive applications. However, their programming complexity poses a significant challenge to developers. Even though the CUDA (Compute Unified Device Architecture) programming model offers better abstraction, developing efficient GPGPU code is still complex and error-prone. This paper proposes a directive-based, high-level programming model, called OpenMPC, […]

CUDA

Mar, 26

Massively Parallel Localization of Pulsed Signal Transitions Using a GPU

Computer clock speeds, which had been increasing tremendously over the years, are now slowing down and have reached a saturation limit. To overcome this saturation of the clock speed, aggressive optimization techniques are being developed to get more work done in each clock cycle, in favor of parallel computing and concurrent programming. […]

CUDA

Mar, 26

A Parallel Access Method for Spatial Data Using GPU

Spatial access methods (SAMs) are used for information retrieval in large spatial databases. Many SAMs use sequential tree structures to search for the result set of spatial data contained in a given query region. To improve SAM performance, this paper proposes a parallel method using a GPU. Since […]

CUDA

Mar, 26

GPUstore: Harnessing GPU Computing for Storage Systems in the OS Kernel

Many storage systems include computationally expensive components. Examples include encryption for confidentiality, checksums for integrity, and error correcting codes for reliability. As storage systems become larger, faster, and serve more clients, the demands placed on their computational components increase and they can become performance bottlenecks. Many of these computational tasks are inherently parallel: they can […]

CUDA

Mar, 26

Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting

Modern systems-on-chip are evolving towards complex and heterogeneous platforms with general-purpose processors coupled with massively parallel many-core accelerator fabrics (e.g. embedded GPUs). Platform developers are looking for efficient full-system simulators capable of simulating complex applications, middleware and operating systems on these heterogeneous targets. Unfortunately, current virtual platforms are not able to tackle the complexity […]

CUDA

Mar, 23

Volume-preserving FFD for programmable graphics hardware

Free-Form Deformation (FFD) is a well-established technique for deforming arbitrary object shapes in space. Although more recent deformation techniques have been introduced, among them skeleton-based deformation and cage-based deformation, the simple and versatile nature of FFD is a strong advantage and justifies its presence in today's leading commercial geometric modeling and animation software systems. […]

CUDA • OpenGL

Mar, 23

Accelerating large-scale simulations of cortical neuronal network development

Cultured dissociated cortical cells grown into networks on multi-electrode arrays are used to investigate neuronal network development, activity, plasticity, response to stimuli, the effects of pharmacological agents, etc. We made a computational model of such a neuronal network and studied the interplay of individual neuron activity, cell culture development, and network behavior. For small networks […]

CUDA

Mar, 23

CUDA implementation of Wagener’s 2D convex hull PRAM algorithm
