
Posts

Jun, 4

Plasma Visualization in Parallel using Particle Systems on Graphical Processing Units

Visualising and simulating charged plasma systems presents additional challenges to conventional particle methods. Plasmas exhibit multi-scale phenomena that often prevent the use of standard localisation approximations. As light-emitting particle systems, plasmas are important in many visual effects in games and computer-animated movies, such as weapons fire, explosions, and astronomical effects. They also have […]
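The nonlocal behaviour mentioned above is easiest to see in an all-pairs force computation: every charged particle interacts with every other, so a fixed-radius cutoff misses part of the dynamics. Below is a minimal NumPy sketch of such an O(N^2) Coulomb force accumulation; the particle count, charges, and softening term eps are illustrative assumptions, not details from the paper.

    import numpy as np

    def coulomb_forces(pos, charge, eps=1e-3):
        """All-pairs Coulomb forces, O(N^2): every particle feels every other.

        pos:    (N, 3) particle positions
        charge: (N,)   particle charges
        eps:    softening term (assumed) to avoid the r -> 0 singularity
        """
        diff = pos[:, None, :] - pos[None, :, :]      # (N, N, 3) pairwise separations
        dist2 = np.sum(diff**2, axis=-1) + eps**2     # softened squared distances
        inv_r3 = dist2**-1.5
        np.fill_diagonal(inv_r3, 0.0)                 # no self-interaction
        qq = charge[:, None] * charge[None, :]        # pairwise charge products
        return np.sum((qq * inv_r3)[:, :, None] * diff, axis=1)

    # Illustrative usage with random particles
    rng = np.random.default_rng(0)
    pos = rng.uniform(-1.0, 1.0, size=(256, 3))
    charge = rng.choice([-1.0, 1.0], size=256)
    forces = coulomb_forces(pos, charge)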
Jun, 4

Using sparse optical flow for multiple Kinect applications

The use of multiple Microsoft Kinects has become prominent in the last two years and enjoyed widespread acceptance. While several works have been published to mitigate quality degradation in the precomputed depth image, this work focuses on employing an optical flow suitable for dot patterns, as employed in the Kinect, to retrieve subtle scene data […]
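For orientation, a generic sparse (pyramidal Lucas-Kanade) optical flow pass with OpenCV looks like the sketch below; this is only a hedged illustration of sparse flow on two consecutive frames, not the dot-pattern-specific flow developed in this work, and the file names and parameters are assumptions.

    import cv2

    # Two consecutive grayscale frames (e.g. Kinect IR images); loading them
    # from disk here is an assumption made for illustration.
    prev_gray = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    next_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

    # Pick sparse features to track (e.g. corners of the speckle/dot pattern).
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=5)

    # Pyramidal Lucas-Kanade: estimate where each feature moved.
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(15, 15), maxLevel=3)

    # Keep only points that were tracked successfully.
    good_old = prev_pts[status.flatten() == 1]
    good_new = next_pts[status.flatten() == 1]
    flow = good_new - good_old   # per-feature motion vectors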
Jun, 4

Data-driven versus Topology-driven Irregular Computations on GPUs

Irregular algorithms are algorithms with complex main data structures such as directed and undirected graphs, trees, etc. A useful abstraction for many irregular algorithms is their operator formulation, in which the algorithm is viewed as the iterated application of an operator to certain nodes, called active nodes, in the graph. Each operator application, called an […]
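To make the operator-formulation vocabulary concrete, the sketch below contrasts the two strategies in the title on a shortest-path-style relaxation operator: topology-driven execution applies the operator to every node in each sweep, while data-driven execution keeps a worklist of active nodes. The graph encoding and the relax operator are illustrative assumptions.

    from collections import deque

    INF = float("inf")

    def topology_driven(adj, src):
        """Topology-driven: every sweep applies the operator to ALL nodes,
        active or not, until nothing changes."""
        dist = {v: INF for v in adj}
        dist[src] = 0
        changed = True
        while changed:
            changed = False
            for u in adj:                      # visit every node each sweep
                for v, w in adj[u]:
                    if dist[u] + w < dist[v]:  # the "relax" operator
                        dist[v] = dist[u] + w
                        changed = True
        return dist

    def data_driven(adj, src):
        """Data-driven: a worklist holds only the active nodes; applying the
        operator to a node may push its neighbours onto the worklist."""
        dist = {v: INF for v in adj}
        dist[src] = 0
        work = deque([src])
        while work:
            u = work.popleft()
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    work.append(v)             # v becomes active
        return dist

    # Illustrative graph: adjacency lists of (neighbour, edge weight)
    adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
    assert topology_driven(adj, 0) == data_driven(adj, 0)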
Jun, 4

GPU Acceleration of a Genetic Algorithm for the Synthesis of FSM-based Bimodal Predictors

This paper presents a fast GPU implementation of a genetic algorithm for synthesizing bimodal predictor FSMs of a given size. Bimodal predictors, i.e., predictors that make binary yes/no predictions, are ubiquitous in microprocessors. Many of these predictors are based on finite-state machines (FSMs). However, there are countless possible FSMs and even heuristic searches for finding […]
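The best-known FSM in this family is the textbook 2-bit saturating counter; the sketch below writes it out as an explicit state-transition table and evaluates it on a branch-outcome trace, to show what a candidate predictor FSM of a given size looks like. The counter and the trace are standard illustrations, not results from the paper.

    # A bimodal predictor as an explicit FSM: 4 states, binary prediction.
    # States 0,1 predict "not taken"; states 2,3 predict "taken".
    PREDICT = [0, 0, 1, 1]        # prediction made in each state
    NEXT = [
        [0, 1],   # strongly not-taken: stay on 0, move up on 1
        [0, 2],   # weakly   not-taken
        [1, 3],   # weakly   taken
        [2, 3],   # strongly taken
    ]

    def simulate(outcomes, state=0):
        """Run the FSM over a branch-outcome trace and count correct predictions."""
        correct = 0
        for outcome in outcomes:              # outcome: 1 = taken, 0 = not taken
            correct += (PREDICT[state] == outcome)
            state = NEXT[state][outcome]
        return correct

    trace = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]    # illustrative branch history
    print(simulate(trace), "of", len(trace), "predicted correctly")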
Jun, 4

CUDA Leaks: Information Leakage in GPU Architectures

Graphics Processing Units (GPUs) are deployed on most present-day server, desktop, and even mobile platforms. Nowadays, a growing number of applications leverage the high parallelism offered by this architecture to speed up general-purpose computation. This phenomenon is called GPGPU computing (General Purpose GPU computing). The aim of this work is to discover and highlight security […]
Jun, 2

GPU-accelerated generation of correctly-rounded elementary functions

The IEEE 754-2008 standard recommends the correct rounding of elementary functions. This requires solving the Table Maker’s Dilemma, which implies a huge amount of CPU computation time. In this paper we consider accelerating such computations, namely the Lefevre algorithm, on Graphics Processing Units (GPUs), which are massively parallel architectures with partial SIMD execution (Single […]
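As a reminder of what makes correct rounding hard: the Table Maker’s Dilemma concerns arguments x for which f(x) lies extremely close to a rounding breakpoint of the target format, so the intermediate precision needed to decide the rounding is not known in advance. A hedged sketch of the usual round-to-nearest formulation (p is the target significand size, m an assumed run length):

    % Hard-to-round cases: the infinitely precise significand of f(x) has the form
    \[
      \underbrace{1.\,b_2 b_3 \cdots b_p}_{p\ \text{bits}}\;
      \underbrace{1\,0\,0\cdots 0}_{m\ \text{bits}}\;\cdots
      \qquad\text{or}\qquad
      \underbrace{1.\,b_2 b_3 \cdots b_p}_{p\ \text{bits}}\;
      \underbrace{0\,1\,1\cdots 1}_{m\ \text{bits}}\;\cdots
    \]
    % i.e. f(x) sits extremely close to the midpoint between two consecutive
    % floating-point numbers, so roughly p+m bits of f(x) must be evaluated
    % before the rounding can be decided. Lefevre-style searches enumerate such
    % worst cases over the whole domain, which is the computation offloaded
    % to the GPU here.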
Jun, 2

GPUburn: A System to Test and Mitigate GPU Hardware Failures

Due to many factors, such as high transistor density, high frequency, and low voltage, today’s processors are more than ever subject to hardware failures. These errors have various impacts depending on the location of the error and the type of processor. Because of the hierarchical structure of the compute units and work scheduling, the hardware […]
Jun, 2

Novel Multi-Layer Network Decomposition Boosting Acceleration of Multi-core Algorithms

Complex networks are a technique for the modeling and analysis of large data sets in many scientific and engineering disciplines. Due to their excessive size, conventional algorithms and single-core processors struggle with the efficient processing of such networks. Employing multi-core graphics processing units (GPUs) could provide sufficient processing power for the analysis of such […]
Jun, 2

On GPU Fourier Transformations

The Fourier Transform is one of the most influential mathematical equations of our time. The Discrete Fourier Transform (DFT), which is equal to the Fourier Transform for signals with equally spaced samples, has been improved by a more efficient algorithm called the Fast Fourier Transform, contributed by Cooley-Tukey [8] and Gentleman-Sande [11]. Improvements since then have […]
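For reference, the DFT discussed above and the even/odd split behind the radix-2 Cooley-Tukey FFT are:

    % Discrete Fourier Transform of N equally spaced samples x_0,...,x_{N-1}:
    \[
      X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i\, kn/N}, \qquad k = 0,\dots,N-1 .
    \]
    % Cooley-Tukey (radix-2) splits the sum over even and odd samples,
    % reducing the O(N^2) direct evaluation to O(N log N):
    \[
      X_k = \sum_{m=0}^{N/2-1} x_{2m}\, e^{-2\pi i\, km/(N/2)}
          + e^{-2\pi i\, k/N} \sum_{m=0}^{N/2-1} x_{2m+1}\, e^{-2\pi i\, km/(N/2)} .
    \]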
Jun, 2

Task scheduling in hybrid CPU-GPU systems

The distribution of workload among available computational units is an essential problem for every parallel system. It has been studied thoroughly from many perspectives, such as thread scheduling in operating systems, task scheduling in frameworks for parallel computations, or constrained scheduling in real-time systems. However, each system has unique properties and requirements, thus we cannot […]
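One of the simplest policies in this design space is a greedy heuristic that places each task on the device expected to finish it first, given per-device cost estimates. The sketch below illustrates that idea only; the cost model and the example workload are assumptions, not the scheduler proposed in this work.

    def greedy_schedule(tasks, devices=("cpu", "gpu")):
        """Assign each task to the device expected to finish it earliest.

        tasks: list of dicts like {"name": "fft", "cost": {"cpu": 5.0, "gpu": 1.2}}
        Returns (assignment per device, makespan).
        """
        free_at = {d: 0.0 for d in devices}        # when each device becomes idle
        assignment = {d: [] for d in devices}
        for task in tasks:
            # Earliest finish time if the task were placed on each device.
            best = min(devices, key=lambda d: free_at[d] + task["cost"][d])
            free_at[best] += task["cost"][best]
            assignment[best].append(task["name"])
        return assignment, max(free_at.values())

    # Illustrative workload: per-device cost estimates are assumptions.
    tasks = [
        {"name": "fft",    "cost": {"cpu": 5.0, "gpu": 1.2}},
        {"name": "sort",   "cost": {"cpu": 2.0, "gpu": 0.8}},
        {"name": "parse",  "cost": {"cpu": 0.5, "gpu": 3.0}},
        {"name": "matmul", "cost": {"cpu": 8.0, "gpu": 1.0}},
    ]
    print(greedy_schedule(tasks))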
May, 31

Composition and Reuse with Compiled Domain-Specific Languages

Programmers who need high performance currently rely on low-level, architecture-specific programming models (e.g. OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Performance optimization with these frameworks usually requires expertise in the specific programming model and a deep understanding of the target architecture. Domain-specific languages (DSLs) are a promising alternative, allowing compilers to map problem-specific […]
May, 31

Geometric Optimisation using Karva for Graphical Processing Units

Population-based evolutionary algorithms continue to play an important role in artificially intelligent systems, but cannot always easily use parallel computation. We have combined a geometric (any-space) particle swarm optimisation algorithm with the use of Ferreira’s Karva language of gene expression programming to produce a hybrid that can accelerate the genetic operators and which can rapidly […]
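For context, the core update of a standard particle swarm optimiser, the kind of population-based algorithm referred to above, is only a few lines. The sketch below shows this plain form rather than the geometric/Karva hybrid of the paper; the objective function and hyper-parameters are chosen purely for illustration.

    import numpy as np

    def pso(objective, dim, n_particles=32, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
        """Minimal standard particle swarm optimiser (minimisation)."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(-5.0, 5.0, (n_particles, dim))    # positions
        v = np.zeros_like(x)                              # velocities
        pbest = x.copy()                                  # personal best positions
        pbest_val = np.apply_along_axis(objective, 1, x)  # personal best values
        gbest = pbest[np.argmin(pbest_val)].copy()        # global best position
        for _ in range(iters):
            r1 = rng.random((n_particles, dim))
            r2 = rng.random((n_particles, dim))
            # Velocity update: inertia + pull toward personal and global bests.
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = x + v
            vals = np.apply_along_axis(objective, 1, x)
            improved = vals < pbest_val
            pbest[improved] = x[improved]
            pbest_val[improved] = vals[improved]
            gbest = pbest[np.argmin(pbest_val)].copy()
        return gbest, float(pbest_val.min())

    # Illustrative usage: minimise the sphere function in 4 dimensions.
    best, val = pso(lambda p: float(np.sum(p**2)), dim=4)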

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
