high performance computing on graphics processing units: hgpu.org

Posts

Jan, 10

Parallelizing Kernel Polynomial Method Applying Graphics Processing Units

The Kernel Polynomial Method (KPM) is a fast diagonalization method used to simulate quantum systems in condensed matter physics and chemistry. The algorithm is difficult to parallelize on a cluster or a supercomputer because of its fine-grained recursive calculations. This paper proposes an implementation of the […]

CUDA

Jan, 10

Acceleration of AES encryption on CUDA GPU

GPUs, despite their low cost, are well suited to applications with a high level of parallelism. The support of integer and logical instructions by the latest generation of GPUs enables us to implement cipher algorithms more easily. However, decisions such as parallel processing granularity and memory allocation impose a heavy burden on programmers. Therefore, this […]

CUDA

Jan, 9

Providing Source Code Level Portability Between CPU and GPU with MapCG

Graphics processing units (GPUs) have taken on an important role in the general-purpose computing market in recent years. At present, the common approach to programming GPUs is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues as programmers […]

CUDA • OpenCL

Jan, 9

Gauge Fixing in Lattice QCD on GPUs

Quantum Chromodynamics (QCD) [1, 2] is the theory of the strong interaction which is responsible for the hadron spectrum and therefore for all matter in our everyday life. QCD, being a quantum field theory and part of the standard model of elementary particles, describes the interactions between color-charged quarks and gluons. Hadrons, e.g., protons, neutrons […]

CUDA

Jan, 9

A new parallel tool for classification of remotely sensed imagery

In this paper, we describe a new tool for classification of remotely sensed images. Our processing chain is based on three main parts: (1) pre-processing, performed using morphological profiles which model both the spatial (high resolution) and the spectral (color) information available from the scenes; (2) classification, which can be performed in unsupervised fashion using […]

CUDA

Jan, 9

Top-k Queries Processing With Uncertain Data on Graphics Processing Units

Because uncertain databases are complex, top-k query processing in them is semantically and computationally different from classical top-k processing. Score is not the only factor to consider: the interplay between score and membership uncertainty makes the computation complex. The powerful computing capability of the Graphics Processing Unit (GPU) is needed in the processing of this kind of […]

CUDA

Jan, 9

Designing Numerical Solvers for Next Generation High Performance Computing

High Performance Computing (HPC) is moving towards massive scales of parallelism. The changes in hardware towards large-scale on-chip parallelism require the rewriting of existing solvers for various Computational Fluid Dynamics (CFD) problems. The aim of the project is to write and optimise novel solvers for various common CFD numerical problems that can take […]

CUDA

Jan, 9

LU Factorization for Accelerator-based Systems

Multicore architectures enhanced with multiple GPUs are likely to become mainstream High Performance Computing (HPC) platforms in the near future. In this paper, we present the design and implementation of an LU factorization using a tile algorithm that can fully exploit the potential of such platforms in spite of their complexity. We use a methodology derived […]

CUDA

Jan, 9

Neural Network Simulation: The recognition application

This paper presents the GPU mapping of the recognition algorithm of a Convolution Neural Network (CNN). This work is based on a C-implementation of the application. The mapping to GPU was performed through different approaches which are explained in detail. The improvements achieved by each approach are presented as well as the overall speed up […]

CUDA

Jan, 9

Spatial Sorting Algorithms for Parallel Computing in Networks

Many basic techniques in computer science have been founded on the assumption that physical computing resources are scarce but orderly, and that the cost of effective direct communication between physically distant parts of a computer system is affordable. In large scale cluster computing installations, fine-grained parallel computing hardware, or wireless mesh networks, these familiar assumptions […]

Jan, 9

High Performance and Scalable GPU Graph Traversal

Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to […]

CUDA

Jan, 9

Fast GPU-based Locality Sensitive Hashing for K-Nearest Neighbor Computation

We present an efficient GPU-based parallel LSH algorithm to perform approximate k-nearest neighbor computation in high-dimensional spaces. We use the Bi-level LSH algorithm, which can compute k-nearest neighbors with higher accuracy and is amenable to parallelization. During the first level, we use the parallel RP-tree algorithm to partition datasets into several groups so that items […]
