high performance computing on graphics processing units: hgpu.org

Posts

Nov, 13

Manycore processing of repeated range queries over massive moving objects observations

The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper we focus on a specific data-intensive problem, concerning the repeated processing […]

CUDA

Nov, 12

Brute force de-shredding algorithm using the GPU

The graphics processing unit (GPU) has seen a significant increase in performance over the past few years. Hence, interest in using GPUs for more general purposes has increased. The higher number of cores on a GPU allows it to outperform central processing units (CPUs). However, since in certain aspects instructions executed on the GPU must […]

OpenCL

Nov, 12

Locality-Aware Mapping of Nested Parallel Patterns on GPUs

Recent work has explored using higher level languages to improve programmer productivity on GPUs. These languages often utilize high level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are […]

CUDA

Nov, 12

Accelerated Runtime Verification of LTL Specifications with Counting Semantics

Runtime verification is an effective automated method for specification-based offline testing and analysis as well as online monitoring of complex systems. The specification language is often a variant of regular expressions or a popular temporal logic, such as LTL. This paper presents a novel and efficient parallel algorithm for verifying a more expressive version of […]

CUDA

Nov, 12

Grace: a Cross-platform Micromagnetic Simulator On Graphics Processing Units

A micromagnetic simulator running on a graphics processing unit (GPU) is presented. It achieves a significant performance boost compared to previous central processing unit (CPU) simulators, up to two orders of magnitude for large input problems. Unlike GPU implementations from other research groups, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and […]

Nov, 12

Code Optimization on Kepler GPUs and Xeon Phi

Kepler GTX Titan Black and Kepler Tesla K40 are still the best GPUs for high performance computing, although Maxwell GPUs such as GTX 980 are available in the market. Hence, we measure the performance of our lattice QCD codes using the Kepler GPUs. We also upgrade our code to use the latest CPS (Columbia Physics […]

CUDA

Nov, 9

An Execution Model for OpenCL 2.0

A popular approach to programming manycore GPUs is the Single Instruction Multiple Thread (SIMT) abstraction. SIMT has the benefit of presenting a "single thread" view, alleviating the complexity of explicitly vectorizing the source code. However, due to the SIMD nature of the underlying hardware it is often difficult to fully hide all aspects from the […]

OpenCL

Nov, 9

Real-time 3D Reconstruction for FPGAs: A Case Study for Evaluating the Performance, Area, and Programmability Trade-offs of the Altera OpenCL SDK

Embedding real-time 3D reconstruction of a scene from a low-cost depth sensor can improve the development of technologies in the domains of augmented reality, mobile robotics, and more. However, current implementations require a computer with a powerful GPU, which limits their prospective applications with low-power requirements. To implement low-power 3D reconstruction we embedded two prominent […]

OpenCL

Nov, 9

Relax-Miracle: GPU Parallelization of Semi-Analytic Fourier-Domain solvers for Earthquake Modeling

Effective utilization of GPU processing capacity for scientific workloads is often limited by memory throughput and PCIe communication transfer times. This is particularly true for semi-analytic Fourier-domain computations in earthquake modeling (Relax) where operations on large-scale 3D data structures can require moving large volumes of data from storage to the compute in predictable but orthogonal […]

CUDA

Nov, 9

Parallel FIM Approach on GPU using OpenCL

In this paper, we describe the GPU-Eclat algorithm, a GPGPU (general-purpose graphics processing unit) enhanced implementation of Frequent Itemset Mining (FIM). Frequent itemsets are extracted from a transactional database, an essential task in the data mining field because of its broad applications in mining association rules, time series, correlations, etc. The […]

OpenCL

Nov, 9

Dogwild! – Distributed Hogwild for CPU & GPU

Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time-consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library [3], allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture, implementing asynchronous […]

CUDA

Nov, 5

Graphics Processing Unit-Based Computer-Aided Design Algorithms for Electronic Design Automation

This dissertation presents research focusing on reshaping the design paradigm of electronic design automation (EDA) applications to embrace the computational throughput of a massively parallel computing architecture. The EDA industry has gone through major evolution in algorithm designs over the past several decades, delivering improved and more sophisticated design tools. Today, these tools provide a […]

CUDA

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container