high performance computing on graphics processing units: hgpu.org

Posts

Feb, 11

Confidentiality Issues on a GPU in a Virtualized Environment

General-Purpose computing on Graphics Processing Units (GPGPU) combined to cloud computing is already a commercial success. However, there is little literature that investigates its security implications. Our objective is to highlight possible information leakage due to GPUs in virtualized and cloud computing environments. We provide insight into the different GPU virtualization techniques, along with their […]

CUDA

Feb, 11

Exploiting GPU Parallelism to Optimize Real-World Problems

Construction of optimal schedule for airline crew-scheduling requires high computation time. The main objective to create this optimal schedule is to assign all the crews to available flights in a minimum amount of time. This is a highly constrained optimization problem. In this paper, we implement co-evolutionary genetic algorithm in order to solve this problem. […]

CUDA

Feb, 11

Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems

One of the major challenges faced by the HPC community today is user-friendly and accurate heterogeneous performance modeling. Although performance prediction models exist to fine-tune applications, they are seldom easy-to-use and do not address multiple levels of design space abstraction. Our research aims to bridge the gap between reliable performance model selection and user-friendly analysis. […]

CUDA

Feb, 11

Benchmarking the Intel Xeon Phi Coprocessor

This document summarizes our first experience with the Intel Xeon Phi. This is a coprocessor that uses Intel’s Many Integrated Core (MIC) architecture to speed up highly parallel processes involving intensive numerical computations. The MIC coprocessor communicates with a regular Intel Xeon ("host") processor through its operating system. The Xeon Phi coprocessor is sometimes referred […]

Feb, 11

Point Rendering in CUDA Path Tracer

A novel technique for point rendering in a CUDA path tracer is introduced in this proposal. The approach makes it possible to render point represented geometries with global illumination effects. Octree data structure is combined in order for more efficient intersection determination. Furthermore, Octree enables the users/artists to choose the level of details of the […]

CUDA

Feb, 11

Accurate Cross-Architecture Performance Modeling for Sparse Matrix-Vector Multiplication (SpMV) on GPUs

This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool to specifically provide inter-architecture performance prediction for Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPU architectures. To design and construct the tool, we investigate the inter-architecture relative performance for multiple SpMV kernels. For a sparse matrix, based on its SpMV kernel performance measured on […]

CUDA

Feb, 11

Implementing the Projected Spatial Rich Features on a GPU

The Projected Spatial Rich Model (PSRM) generates powerful steganalysis features, but requires the calculation of tens of thousands of convolutions with image noise residuals. This makes it very slow: the reference implementation takes an impractical 20{30 minutes per 1 megapixel (Mpix) image. We present a case study which first tweaks the definition of the PSRM […]

CUDA

Feb, 11

Genetically Improved CUDA C++ Software

Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speed ups of more than six times on recent nVidia GPU cards are reported compared to the original kernel on the […]

CUDA

Feb, 9

GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers

With an aim to realizing highly accurate position estimation, we propose in this paper a method for efficiently and accurately detecting the 3D positions and poses of traditional fiducial markers with black frames in high-resolution images taken by ordinary web cameras. Our tracking method can be efficiently executed utilizing GPGPU computation, and in order to […]

OpenCL

Feb, 9

Benchmarks for Intel MIC Architecture

Intel Many Integrated Core (MIC) Architecture combines about 60 cores onto a single chips. Intel MIC brand named Xeon Phi offers a theoretical maximum of more than 3 double precision GFLOPs than Intel Xeon E5 core. We carry out benchmarks for Intel MIC with a Monte Carlo simulation of LIBOR Market Model. The results show […]

Feb, 9

Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems

The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA, which makes it a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high level of programming abstraction […]

OpenCL

Feb, 9

A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow

Graphics processing units (GPUs) are increasingly used for non-graphics computing. However, applications with divergent control flow incur performance degradation on current GPUs. These GPUs implement the SIMT execution model by serializing the execution of different control flow paths encountered by a warp. This serialization can mask thread level parallelism among the scalar threads comprising a […]

high performance computing on graphics processing units: hgpu.org

Posts

Confidentiality Issues on a GPU in a Virtualized Environment

Exploiting GPU Parallelism to Optimize Real-World Problems

Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems

Benchmarking the Intel Xeon Phi Coprocessor

Point Rendering in CUDA Path Tracer

Accurate Cross-Architecture Performance Modeling for Sparse Matrix-Vector Multiplication (SpMV) on GPUs

Implementing the Projected Spatial Rich Features on a GPU

Genetically Improved CUDA C++ Software

GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers

Benchmarks for Intel MIC Architecture

Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems

A Scalable Multi-Path Microarchitecture for Efficient GPU Control Flow

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)