high performance computing on graphics processing units: hgpu.org

Posts

Mar, 24

How to Benefit from AMD, Intel and Nvidia Accelerator Technologies in Scilab

This paper presents how to use, in Scilab, accelerator technologies from AMD, Intel and Nvidia in a flexible and portable manner. The proposed approach aims to simplify the incremental speed-up of Scilab programs through the directive-based parallel programming provided by CAPS OpenHMPP technology.

CUDA, OpenCL

Mar, 23

A Novel Data Structure for Particle System Simulation based on GPU with the Use of Neighborhood Grids

Simulation and visualization of particles in real time can be a computationally intensive task. This intensity comes from diverse factors, one of them being the O(n^2) complexity of the traversal algorithm necessary for the proximity queries over all pairs of particles that decide the need to compute collisions. Previous works reduced this complexity by considerably […]
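To make the O(n^2) cost concrete, here is a minimal CPU sketch in Python that compares the all-pairs proximity test with a generic uniform cell grid. It is not the neighborhood-grid structure proposed in the paper, and it is not GPU code; the function names, the 2D points, and the choice of cell side equal to the query radius are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def close_pairs_brute(points, radius):
    """All-pairs traversal: one distance test per pair, O(n^2) in total."""
    r2 = radius * radius
    pairs = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= r2:
                pairs.add((i, j))
    return pairs


def close_pairs_grid(points, radius):
    """Bin particles into square cells of side `radius` (an assumed choice),
    then test each particle only against the 3x3 block of cells around it."""
    r2 = radius * radius
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // radius), int(y // radius))].append(idx)

    pairs = set()
    for (cx, cy), members in cells.items():
        for ox, oy in product((-1, 0, 1), repeat=2):
            neighbours = cells.get((cx + ox, cy + oy), ())
            for i in members:
                for j in neighbours:
                    if i < j:  # report each unordered pair once
                        dx = points[i][0] - points[j][0]
                        dy = points[i][1] - points[j][1]
                        if dx * dx + dy * dy <= r2:
                            pairs.add((i, j))
    return pairs
```

Both functions return the same set of index pairs; the grid version only reduces the number of distance tests when particles are spread over many cells.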

CUDA

Mar, 23

Depth-First Search versus Jurema Search on GPU Branch-and-Bound Algorithms: a case study

Branch-and-Bound (B&B) is a general problem-solving paradigm and it has been successfully used to prove the optimality of combinatorial optimization problems. The development of GPU-based parallel Branch-and-Bound algorithms is a brand-new and challenging topic in high performance computing and combinatorial optimization, motivated by the GPU’s high performance and low cost. This work presents a strategy […]
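As a reminder of what a depth-first B&B traversal looks like, the following is a minimal sequential sketch in Python on the 0/1 knapsack problem. The problem choice, the fractional upper bound, and the function name are assumptions made for illustration; the sketch does not reproduce the GPU strategy or the Jurema Search studied in the paper.

```python
def knapsack_dfs_bnb(values, weights, capacity):
    """Depth-first branch-and-bound for 0/1 knapsack (weights must be positive)."""
    # Sort by value density so the greedy fractional fill is a valid upper bound.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0

    def bound(k, value, room):
        # Optimistic estimate: fill the remaining room greedily and allow
        # a fraction of the first item that no longer fits.
        for v, w in items[k:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w
        return value

    def dfs(k, value, room):
        nonlocal best
        if value > best:
            best = value  # new incumbent
        # Prune when no items remain or the bound cannot beat the incumbent.
        if k == len(items) or bound(k, value, room) <= best:
            return
        v, w = items[k]
        if w <= room:
            dfs(k + 1, value + v, room - w)  # branch: take item k
        dfs(k + 1, value, room)              # branch: skip item k

    dfs(0, 0, capacity)
    return best
```

When the search finishes, every unexplored subtree has been ruled out by its bound, which is what makes the returned value provably optimal.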

CUDA

Mar, 23

Real-Time Implementation of the Pixel Purity Index Algorithm for Endmember Identification on GPUs

Spectral unmixing amounts to automatically finding the signatures of pure spectral components (called endmembers in the hyperspectral imaging literature) and their associated abundance fractions in each pixel of the hyperspectral image. Many algorithms have been proposed to automatically find spectral endmembers in hyperspectral data sets. Perhaps one of the most popular ones is the pixel purity […]

Mar, 23

GRay: a Massively Parallel GPU-Based Code for Ray Tracing in Relativistic Spacetimes

We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This GPU-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single precision floating-point arithmetic on a single GPU exceeds 300 […]

CUDA

Mar, 23

Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

Graphics processors, or GPUs, have recently been widely used as accelerators in shared environments such as clusters and clouds. In such shared environments, many kernels are submitted to GPUs from different users, and throughput is an important metric for performance and total ownership cost. Despite the recently improved runtime support for concurrent GPU kernel […]

CUDA

Mar, 21

Efficient GPU implementation of the integral histogram

The integral histogram for images is an efficient preprocessing method for speeding up diverse computer vision algorithms including object detection, appearance-based tracking, recognition and segmentation. Our proposed Graphics Processing Unit (GPU) implementation uses parallel prefix sums on row and column histograms in a cross-weave scan with high GPU utilization and communication-aware data transfer between CPU […]
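The row-then-column prefix-sum idea can be sketched sequentially in plain Python as follows. This is only an illustration of the data structure, not the paper's cross-weave GPU scan; the function names are hypothetical, and the image is assumed to be a list of rows whose pixel values are already quantized to bin indices.

```python
def integral_histogram(image, num_bins):
    """H[y][x][b] = number of pixels with bin b in the rectangle
    from (0, 0) to (y, x), inclusive."""
    h, w = len(image), len(image[0])
    H = [[None] * w for _ in range(h)]
    # Pass 1: prefix sums of per-pixel histograms along each row.
    for y in range(h):
        running = [0] * num_bins
        for x in range(w):
            running[image[y][x]] += 1
            H[y][x] = running.copy()
    # Pass 2: prefix sums of the row results down each column.
    for y in range(1, h):
        for x in range(w):
            for b in range(num_bins):
                H[y][x][b] += H[y - 1][x][b]
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of an inclusive rectangle from four corner lookups."""
    num_bins = len(H[0][0])

    def at(y, x):
        # Positions just outside the top or left edge contribute zero counts.
        return H[y][x] if y >= 0 and x >= 0 else [0] * num_bins

    a = at(bottom, right)
    b = at(top - 1, right)
    c = at(bottom, left - 1)
    d = at(top - 1, left - 1)
    return [a[i] - b[i] - c[i] + d[i] for i in range(num_bins)]
```

Once the table is built, the histogram of any rectangle costs four lookups per bin regardless of the rectangle's size, which is why the structure suits the detection and tracking uses the abstract mentions.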

CUDA

Mar, 21

Performance study of filtered back-projection algorithms implemented on GPUs

In recent years the use of graphical processing units (GPUs) in diverse fields of science has increased dramatically. This increase is not only due to the GPU's tremendous computational power, but also because GPUs are relatively cheap when compared to clusters. In this work we explore the use of the GPU to reduce the […]

CUDA

Mar, 21

GPGPU Test Suite Minimisation: Search Based Software Engineering Performance Improvement Using Graphics Cards

It has often been claimed that SBSE uses so-called "embarrassingly parallel" algorithms that will imbue SBSE applications with easy routes to dramatic performance improvements. However, despite recent advances in multicore computation, this claim remains largely theoretical; there are few reports of performance improvements using multicore SBSE. This paper shows how inexpensive General Purpose computing on […]

OpenCL

Mar, 21

Duplicate Detection on GPUs

With the ever increasing volume of data and the ability to integrate different data sources, data quality problems abound. Duplicate detection, as an integral part of data cleansing, is essential in modern information systems. We present a complete duplicate detection workflow that utilizes the capabilities of modern graphics processing units (GPUs) to increase the efficiency […]

OpenCL

Mar, 21

Stream Join Processing on Heterogeneous Processors

The window-based stream join is an important operator in all data streaming systems. It often has high resource requirements, so many efficient sequential as well as parallel versions of it have been proposed in the literature. Parallel stream join operators have recently gained increasing interest because hardware is getting more and more parallel. Most of […]
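For readers unfamiliar with the operator, a sequential window-based join can be sketched in Python as below. The class name, the time-based window, the equality predicate on a key, and the requirement that timestamps arrive in nondecreasing order are all assumptions made for illustration; none of this is taken from the paper.

```python
from collections import deque


class WindowJoin:
    """Symmetric sliding-window equi-join over two streams (illustrative sketch)."""

    def __init__(self, window):
        self.window = window   # window length in time units
        self.left = deque()    # buffered (timestamp, key, payload) tuples
        self.right = deque()

    def _expire(self, buf, now):
        # Drop tuples that are older than the window relative to `now`.
        while buf and buf[0][0] < now - self.window:
            buf.popleft()

    def insert(self, side, timestamp, key, payload):
        """Add a tuple to one stream and return its matches from the other,
        as (left_payload, right_payload) pairs."""
        if side == "left":
            own, other = self.left, self.right
        else:
            own, other = self.right, self.left
        self._expire(own, timestamp)
        self._expire(other, timestamp)
        if side == "left":
            matches = [(payload, p) for (_, k, p) in other if k == key]
        else:
            matches = [(p, payload) for (_, k, p) in other if k == key]
        own.append((timestamp, key, payload))
        return matches
```

Each arriving tuple scans the opposite window in full, which hints at why the operator is resource-hungry and why parallel variants are attractive.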

OpenCL

Mar, 20

Symbolic Crosschecking of Data-Parallel Floating Point Code

In this thesis we present a symbolic execution-based technique for cross-checking programs accelerated using SIMD or OpenCL against an unaccelerated version, as well as a technique for detecting data races in OpenCL programs. Our techniques are implemented in KLEE-CL, a symbolic execution engine based on KLEE that supports symbolic reasoning on the equivalence between expressions […]

OpenCL

Recent source codes

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management
