high performance computing on graphics processing units: hgpu.org

Posts

May, 1

Parallel For Loops on Heterogeneous Resources

In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal for not just graphical applications, but many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine is a difficult challenge, […]

OpenCL

May, 1

GPU-based Steady-State Solution of the Chemical Master Equation

The Chemical Master Equation (CME) is a stochastic and discrete-state continuous-time model for macromolecular reaction networks inside the cell. Under this theoretical framework, the solution of a sparse linear system provides the steady-state probability landscape over the molecular microstates. The CME framework can in fact reveal important insights into basic principles on how biological networks […]

CUDA

May, 1

ACL2 Meets the GPU: Formalizing a CUDA-based Parallelizable All-Pairs Shortest Path Algorithm in ACL2

As Graphics Processing Units (GPUs) have gained in capability and GPU development environments have matured, developers are increasingly turning to the GPU to off-load the main host CPU of numerically-intensive, parallelizable computations. Modern GPUs feature hundreds of cores, and offer programming niceties such as double-precision floating point, and even limited recursion. This shift from CPU […]

CUDA

May, 1

Graphics Programming on the Web WebCL Course Notes

This document introduces WebCL [1], a new standard under development by the Khronos Group, for high-performance computing in web browsers. Since WebCL wraps OpenCL, the course starts by reviewing important OpenCL [2] concepts. Next, we detail how to program with WebCL in the browser and on devices such as GPUs. Finally, we discuss WebCL – […]

OpenCL

Apr, 30

The 2013 International Conference on Network Computing and Information Security and the 2013 International Conference on Multimedia and Signal Processing, NCIS’13-CMSP’13

The 2013 International Conference on Network Computing and Information Security (NCIS’13) and the 2013 International Conference on Multimedia and Signal Processing (CMSP’13) will be jointly held in Guiyang, China, on September 20-22, 2013. NCIS’13-CMSP’13 aims to provide a high-level international forum for scientists and researchers to present the state of the art of Network […]

Apr, 30

Automatic Compilation for Heterogeneous Architectures with Single Assignment C

In recent years, we have witnessed an increasing heterogeneity of computing resources. A typical laptop today combines at least one multicore processor with one general purpose graphics processing unit (GPGPU), while supercomputer nodes typically have several of each. Exploiting all these available computing resources effectively is very important, but also still very challenging. In this […]

CUDA

Apr, 30

Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms

Scientific computing is bound only by the limits of Moore’s Law and the scalability of high performance mathematical library implementations. Most mathematical libraries, however, tend to focus only on general inputs, limiting their potential performance and scalability by not tailoring their implementation to specific inputs, such as non-negative inputs. By removing this limitation it is […]

CUDA

Apr, 30

MPI Derived Datatypes Processing on Noncontiguous GPU-resident Data

Driven by the goals of efficient and generic communication of noncontiguous data layouts in GPU memory, for which solutions do not currently exist, we present a parallel, noncontiguous data-processing methodology through the MPI datatypes specification. Our processing algorithm utilizes a kernel on the GPU to pack arbitrary noncontiguous GPU data by enriching the datatypes encoding […]

CUDA

Apr, 30

High Performance Data Leak Detection

We describe a novel deep packet inspection technique that provides precise quantitative measures for detecting data exfiltration. We point out the fundamental differences between our data leak detection and the conventional intrusion detection systems (IDS). The key to our solution is a powerful sampling algorithm and a sophisticated local alignment algorithm. Our sampling method has […]

CUDA

Apr, 30

Mr. Scan: Extreme Scale Density-Based Clustering using a Tree-Based Network of GPGPU Nodes

Density-based clustering algorithms are a widely used class of data mining techniques that can find irregularly shaped clusters and cluster data without prior knowledge of the number of clusters it contains. DBSCAN is the best-known density-based clustering algorithm. We introduce our version of DBSCAN, called Mr. Scan, which uses a hybrid parallel implementation that combines […]

CUDA

Apr, 29

RealTime GPU-Based Motion Planning for Task Executions

We present a real-time GPU-based motion planning algorithm for robot task executions. Many task execution strategies break down a high-level task planning problem into multiple low-level motion planning problems, and it is essential to solve those problems at interactive rates. In order to achieve high performance for the planning, our method exploits a high number […]

CUDA

Apr, 29

Multigrid Optimization Methods for High Performance Computing

The aim of this work was to investigate the implementability and efficiency of an algorithm for solving optimal control problems on a new hardware architecture. For an academic test problem, the collective smoothing multigrid method (CSMG) was realized on a commodity graphics card (GPU) and the performance in terms of elapsed time compared to those […]

CUDA

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
