high performance computing on graphics processing units: hgpu.org

Posts

Mar, 28

Auto-tuning a High-Level Language Targeted to GPU Codes

Determining the best set of optimizations to apply to a kernel to be executed on the graphics processing unit (GPU) is a challenging problem. There are large sets of possible optimization configurations that can be applied, and many applications have multiple kernels. Each kernel may require a specific configuration to achieve the best performance, and […]

CUDA • OpenCL

Mar, 28

Accelerating the FDTD Method Using SSE and Graphics Processing Units

The Finite-Difference Time-Domain (FDTD) method is a computational technique for modelling the behaviour of electromagnetic waves in three-dimensional space. When executed to solve real-world problems the FDTD method is characterised by long execution times involving a large amount of data organised into matrices. The FDTD method exhibits ample data parallelism, and parallel computing techniques are […]

CUDA

Mar, 28

Systematic construction, verification and implementation methodology for LDPC codes

In this article, a novel and systematic low-density parity-check (LDPC) code construction, verification and implementation methodology is proposed. The methodology is composed of a simulated-annealing-based LDPC code constructor, a GPU-based high-speed code selector, an ant-colony-optimization-based pipeline scheduler and an FPGA-based hardware implementer. Compared to the traditional ways, this methodology […]

CUDA

Mar, 28

Fast, parallel and secure cryptography algorithm using Lorenz’s attractor

A novel cryptography method based on the Lorenz attractor chaotic system is presented. The proposed algorithm is secure and fast, making it practical for general use. We introduce the chaotic operation mode, which provides an interaction among the password, the message and a chaotic system. It ensures that the algorithm yields a secure codification, even if […]

CUDA

Mar, 28

Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems

We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and multi-GPU systems to support dense matrix computations efficiently. The main idea is that we treat a heterogeneous system as a distributed-memory machine, and use a heterogeneous multi-level block cyclic distribution method to allocate data to the host and […]

CUDA

Mar, 27

Improving Performance of OpenCL on CPUs

Data-parallel languages like OpenCL and CUDA are an important means to exploit the computational power of today’s computing devices. In this paper, we deal with two aspects of implementing such languages on CPUs: First, we present a static analysis and an accompanying optimization to exclude code regions from control-flow to data-flow conversion, which is the […]

OpenCL

Mar, 27

Practical and Theoretical Aspects of a Parallel Twig Join Algorithm for XML Processing using a GPGPU

With an increasing amount of data and growing demand for fast query processing, making database operations efficient remains a challenging task. A common approach is to leverage parallel hardware platforms. With the introduction of general-purpose GPU (Graphics Processing Unit) computing, massively parallel hardware has become available within commodity hardware. XML is based on […]

CUDA

Mar, 27

Accelerating Constraint Automata Composition with GPGPU Parallelization

One of the principal challenges of Constraint Automata composition is the rapid growth of the state space and the difficulty inherent in processing very large state spaces, in terms of both space and computation time. We show that the method outlined here goes some way in tackling both these issues by making it […]

CUDA

Mar, 27

Dynamic Translation of Runtime Environments for Heterogeneous Computing

The current trend towards heterogeneous architectures requires a global rethinking of software and hardware design. The focus is on new parallel programming models, design space exploration and run-time resource management techniques to exploit the features of many-core processor architectures. Graphics Processing Units (GPUs) have become the platform of choice in this area for accelerating […]

CUDA • OpenCL

Mar, 27

Adaptive Row-grouped CSR Format for Storing of Sparse Matrices on GPU

We present a new adaptive format for storing sparse matrices on GPUs. We compare it with several other formats, including CUSPARSE, which is today probably the best choice for processing sparse matrices on GPUs in CUDA. Contrary to CUSPARSE, which works with the common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and […]

CUDA
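
For readers unfamiliar with the common CSR (compressed sparse row) format mentioned in the abstract above, here is a minimal sketch of how it stores a matrix and how a sparse matrix-vector product reads it. This is plain CSR only, not the paper's adaptive row-grouped format, and the function names `csr_from_dense` and `csr_spmv` are hypothetical, chosen for illustration.

```python
def csr_from_dense(dense):
    """Convert a dense matrix (list of rows) into the three CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)    # nonzero value
                col_idx.append(j)   # column of that value
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        # Only the stored nonzeros of row i are visited.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    dense = [[1, 0, 2],
             [0, 0, 3],
             [4, 5, 0]]
    values, col_idx, row_ptr = csr_from_dense(dense)
    print(values, col_idx, row_ptr)
    print(csr_spmv(values, col_idx, row_ptr, [1, 2, 3]))
```

In a GPU setting the outer loop over rows is the natural candidate for parallelization, for instance one thread per row; that mapping is an assumption made here for illustration and is not taken from the paper.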

Mar, 26

OpenMPC: Extended OpenMP for Efficient Programming and Tuning on GPUs

General-Purpose Graphics Processing Units (GPGPUs) provide inexpensive, high performance platforms for compute-intensive applications. However, their programming complexity poses a significant challenge to developers. Even though the CUDA (Compute Unified Device Architecture) programming model offers better abstraction, developing efficient GPGPU code is still complex and error-prone. This paper proposes a directive-based, high-level programming model, called OpenMPC, […]

CUDA

Mar, 26

Massively Parallel Localization of Pulsed Signal Transitions Using a GPU

Computer clock speeds, which had been increasing tremendously for years, are now slowing down and have reached saturation. To overcome this saturation of the clock speed, optimization techniques are being aggressively developed to get more work done in each clock cycle, favoring parallel computing and concurrent programming. […]

CUDA

* * *

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)