high performance computing on graphics processing units: hgpu.org

Posts

Jul, 6

Two-way partitioning of a recursive Gaussian filter in CUDA

Recursive Gaussian filters are more efficient than basic Gaussian filters when the filter window size is large. Since the computation of a point can start only after the computation of its neighboring points, recursive Gaussian filters are line oriented. Thus, the degree of parallelism is restricted by the length of the data image. In order to […]

CUDA

Jul, 6

SimCommSys: taking the errors out of error-correcting code simulations

In this study, we present SimCommSys, a simulator of communication systems that we are releasing under an open source license. The core of the project is a set of C++ libraries defining communication system components and a distributed Monte Carlo simulator. Of principal interest is the error-control coding component, where various kinds of […]

CUDA

Jul, 6

A Parallelized Implementation for H.264 Real-time Encoding Scheme

In this paper, a high-speed video stream encoder for the H.264 digital video codec standard is accelerated with contemporary parallel processing architectures. Based on GPU parallel processing techniques, we used OpenCL-based GPU kernel programs and achieved a high level of CPU-GPU interoperability. In its design, our system makes the CPU perform all […]

OpenCL

Jul, 6

High-level Parallel Programming Support for Heterogeneous Systems

This master's thesis focuses on several high-level parallel programming models for heterogeneous systems that have become increasingly popular in the field of high-performance computing. Heterogeneous systems are an inexpensive and effective way to achieve further performance improvements. A powerful combination of graphics processing units (GPUs) and central processing units (CPUs) is one of the most […]

CUDA • OpenCL

Jul, 4

Writing self-adaptive codes for heterogeneous systems

Heterogeneous systems are becoming increasingly common. Relatedly, the popularity of OpenCL is growing, as it provides a unified means to program a wide variety of devices, including GPUs and multicore CPUs. More recently, the Heterogeneous Programming Library (HPL) targets the same variety of systems as OpenCL, intending to improve their programmability. The main drawback of […]

OpenCL

Jul, 4

Molecular dynamics simulations through GPU video games technologies

Bioinformatics is the scientific field that focuses on the application of computer technology to the management of biological information. Over the years, bioinformatics applications have been used to store, process and integrate biological and genetic information, using a wide range of methodologies. One of the most recent techniques used to understand the physical movements […]

CUDA • OpenCL

Jul, 4

High-Level Energy Model of Embedded GPU for Real-Time Graphic Rendering

An embedded graphics processing unit (GPU) accelerates the real-time rendering process of a graphics application on mobile devices, but at the cost of consuming a considerable portion of the system energy [1], which is one of the most critical design issues for battery-operated devices. To estimate the power consumption of a graphics application, conventional approaches collect […]

OpenGL

Jul, 4

A second generation of DEFG: Declarative Framework for GPUs

DEFG is our declarative language and framework for the efficient generation of OpenCL GPU applications. Using our new DEFG implementation, run-time and lines-of-code comparisons are provided for three well-known algorithms: Sobel image filtering, breadth-first search and all-pairs shortest path. The DEFG declarative language and corresponding OpenCL kernels provide complete OpenCL applications. The lines-of-code comparison demonstrates […]

OpenCL

Jul, 4

On Static Timing Analysis of GPU Kernels

We study static timing analysis of programs running on GPU accelerators. Such programs follow a data-parallel programming model that allows massive parallelism on manycore processors. Data-parallel programming and GPUs as accelerators have seen wide use in recent years. The timing analysis of programs running on single-core machines is well known and […]

OpenCL

Jul, 4

The Design and Implementation of a GPU-enabled Multi-objective Tabu-search Intended for Real World and High-dimensional Applications

Metaheuristics are a class of approximate methods based on heuristics that can effectively handle real-world (usually NP-hard) problems of high dimensionality with multiple objectives. An existing multi-objective Tabu Search (MOTS2) has been redesigned and ported to the Compute Unified Device Architecture (CUDA) so as to effectively deal with a scalable multi-objective problem with a range of […]

CUDA

Jul, 4

Parallel Implementation of Travelling Salesman Problem using Ant Colony Optimization

In this paper, we propose a parallel implementation of the Ant Colony Optimization (ACO) Ant System algorithm on GPU using OpenCL. We compare different ACO parameters that directly or indirectly affect the result. A comparison of speedup between the CPU and GPU implementations is made, with a speedup of 3.11x in CPU […]

OpenCL

Jul, 4

SIMD Implementation of a Multiplicative Schwarz Smoother for a Multigrid Poisson Solver on an Intel Xeon Phi Coprocessor

In this paper, we discuss an efficient implementation of the three-dimensional multigrid Poisson solver on a many-core coprocessor, the Intel Xeon Phi. We have used the modified block red-black (mBRB) Gauss-Seidel (GS) smoother to achieve a sufficient degree of parallelism and a high cache hit ratio. We have vectorized (SIMDized) the GS steps in the smoother by introducing […]


Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
