high performance computing on graphics processing units: hgpu.org

Posts

Aug, 28

GPUVerify: A Verifier for GPU Kernels

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in […]

CUDA • OpenCL

Aug, 28

Intelligent Edge Detection using a CUDA Simulator of Multilayer Neural Network Based on Multi-Valued Neurons

In this paper, we consider the edge detection problem using an intelligent approach. We use a multilayer neural network based on multi-valued neurons (MLMVN) as an intelligent edge enhancer. MLMVN is a complex-valued neural network and it has many advantages over classical neural networks. It significantly outperforms a classical multilayer feedforward neural network in terms […]

CUDA

Aug, 28

Performance Comparison Between Cg-based and CUDA-based Matrix Multiplications

In this paper, we compare the performance of the Cg-based and CUDA-based GPU programming APIs. In particular, we consider their performance on square matrix multiplication. We also discuss other aspects of these widely used GPU programming APIs. This work can help gain insight into various applications that involve matrix multiplication that are better suited for a specific […]

CUDA

Aug, 28

Optimization Techniques for CUDA Application

In this paper, we summarize the experimental results of applying various optimization techniques to CUDA applications running on NVIDIA Fermi GPUs. Our experiments on matrix multiplication and breadth-first search algorithms show that optimization techniques such as coalesced global memory access, conflict-free shared memory access and data pre-fetching improve the performance of applications running on […]

CUDA

Aug, 28

A Research of MapReduce with GPU Acceleration

MapReduce is an efficient distributed computing model for large data sets. The data processing is fully distributed across a huge number of nodes, and a MapReduce cluster is highly scalable. However, single-node performance is gradually becoming a bottleneck in compute-intensive jobs, which makes it difficult to extend the MapReduce model to wider application fields […]

OpenCL

Aug, 27

A Unified Optimizing Compiler Framework for Different GPGPU Architectures

This paper presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performance GPGPU programs: effective utilization of GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naive GPU kernel function, which is functionally correct but without […]

CUDA • OpenCL

Aug, 27

Low-Latency Elliptic Curve Scalar Multiplication

This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9 […]

CUDA

Aug, 27

An Implementation of Coincidence Algorithm on Graphic Processing Units

Genetic Algorithms (GAs) are powerful search techniques. However, when they are applied to complex problems, they consume considerable computing power. One way to make them faster is to use a parallel implementation. This paper presents a parallel implementation of Combinatorial Optimisation with Coincidence Algorithm (COIN) on Graphic Processing Units. COIN is a modern […]

CUDA

Aug, 27

Perceptually Optimized Real-Time Computer Graphics

Perceptual optimization, the application of human visual perception models to remove imperceptible components in a graphics system, has been proven effective in achieving significant computational speedup. Previous implementations of this technique have focused on spatial level of detail reduction, which typically results in noticeable degradation of image quality. This thesis introduces refresh rate modulation (RRM), […]

OpenCL

Aug, 27

A Novel Approach to Visualizing Dark Matter Simulations

In recent decades, cosmological N-body dark matter simulations have enabled ab initio studies of the formation of structure in the Universe. Gravity amplified small density fluctuations generated shortly after the Big Bang, leading to the formation of galaxies in the cosmic web. These calculations have led to a growing demand for methods to analyze […]

OpenGL

Aug, 26

GPU Accelerated Nonlinear Optimization in Radio Interferometric Calibration

We present the GPU-based acceleration of two well-known nonlinear optimization routines: Levenberg-Marquardt (LM) and Limited Memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) in radio interferometric calibration. Radio interferometric calibration is a heavily compute-intensive operation where the same nonlinear optimization problem has to be solved over many time intervals, with different data. We achieve a speedup of […]

CUDA

Aug, 26

Efficient Dynamic Program Monitoring on Multi-Core Platforms

Software security and reliability have become increasingly important in the modern world. An effective approach to enforcing software security and reliability is to monitor a program’s execution at run time. However, instrumentation-based implementation of a dynamic program monitor on single-core systems suffers significant performance overhead. As multi-core architecture becomes more mainstream, implementing efficient dynamic program […]

CUDA
