high performance computing on graphics processing units: hgpu.org

Posts

Sep, 27

Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA

Modern graphics processing units offer general purpose parallel computing capabilities. Thus they have become a relatively low cost alternative for applications requiring extensive parallel computations. Evolutionary algorithms are especially well suited for parallel SIMD architecture. This paper deals with the modification of AllToAll variation of self-organizing migration algorithm, which has high computational demand for one […]

CUDA

Sep, 27

Deterministic Parallelism

A program is deterministic if it always produces the same output for a given input. Although sequential programs are often deterministic by default, parallel programs are more susceptible to behaving nondeterministically because instructions from different threads can be interleaved unpredictably. Non-determinism complicates the task of developing and maintaining software because it makes reasoning about program […]

CUDA

Sep, 27

GPU-based tuning of quantum-inspired genetic algorithm for a combinatorial optimization problem

This paper concerns efficient parameter tuning (meta-optimization) of a state-of-the-art metaheuristic, the Quantum-Inspired Genetic Algorithm (QIGA), in a GPU-based massively parallel computing environment (NVIDIA CUDA technology). A novel approach to parallel implementation of the algorithm is presented. In a block of threads, each thread transforms a separate quantum individual or a different quantum gene; in each […]

CUDA

Sep, 27

Lattice QCD based on OpenCL

We present an OpenCL-based Lattice QCD application using a heatbath algorithm for the pure gauge case and Wilson fermions in the twisted mass formulation. The implementation is platform independent and can be used on AMD or NVIDIA GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870 our double precision dslash implementation […]

OpenCL

Sep, 27

GPU Acceleration of Image Convolution using Spatially-varying Kernel

Image subtraction in astronomy is a tool for transient object discovery such as asteroids, extra-solar planets and supernovae. To match point spread functions (PSFs) between images of the same field taken at different times a convolution technique is used. Particularly suitable for large-scale images is a computationally intensive spatially-varying kernel. The underlying algorithm is inherently […]

CUDA

Sep, 26

Improved Row-Grouped CSR Format for Storing of Sparse Matrices on GPU

We present a new format for storing sparse matrices on the GPU. We compare it with several other formats, including CUSPARSE, which is today probably the best choice for processing sparse matrices on the GPU in CUDA. Unlike CUSPARSE, which works with the common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and vector […]

CUDA

Sep, 26

GPU Shape Grammars

GPU Shape Grammars provide a solution for interactive procedural generation, tuning and visualization of massive environment elements for both video games and production rendering. Our technique generates detailed models without explicit geometry storage. To this end we reformulate the grammar expansion for generation of detailed models at the tessellation control and geometry shader stages. Using […]

Sep, 26

Enabling Development of OpenCL Applications on FPGA platforms

FPGAs can potentially deliver tremendous acceleration in high-performance server and embedded computing applications. Whether used to augment a processor or as a stand-alone device, these reconfigurable architectures are being deployed in a large number of implementations owing to the massive amounts of parallelism offered. At the same time, a significant challenge encountered in their wide-spread […]

OpenCL

Sep, 26

A Parallel Auxiliary Grid AMG Method for GPU

In this paper, we develop a new parallel auxiliary grid algebraic multigrid (AMG) method to leverage the power of graphics processing units (GPUs). In the construction of the hierarchical coarse grid, we use a simple and fixed coarsening procedure based on a region quadtree generated from an auxiliary grid. This allows us to explicitly control […]

CUDA

Sep, 26

Accelerating Iterative SpMV for Discrete Logarithm Problem using GPUs

In the cryptanalytic context, computing discrete logarithms in large cyclic groups using index-calculus-based methods, such as the number field sieve or the function field sieve, requires solving large sparse systems of linear equations modulo the group order. Most of the fast algorithms used to solve such systems — e.g., the conjugate gradient or the Lanczos […]

CUDA

Sep, 25

GPF: a framework for general packet classification on GPU co-processors

This thesis explores the design and experimental implementation of GPF, a novel protocol-independent, multi-match packet classification framework. This framework is targeted and optimised for flexible, efficient execution on NVIDIA GPU platforms through the CUDA API, but should not be difficult to port to other platforms, such as OpenCL, in the future. GPF was conceived and […]

CUDA

Sep, 25

Implementation and Analysis of AES Encryption on GPU

The GPU is continuing its trend of vastly outperforming the CPU while becoming more general purpose. In order to improve the efficiency of the AES algorithm, this paper proposes a CUDA implementation of the Electronic Codebook (ECB) mode encoding process and the Cipher Block Chaining (CBC) mode decoding process on GPU. In our implementation, the frequently accessed T-boxes were allocated on […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)