high performance computing on graphics processing units: hgpu.org

Posts

Nov, 18

Processing Hard Sphere Collisions on a GPU Using OpenCL

Physically accurate hard sphere collisions are inherently sequential, as the order in which collisions occur can have a significant impact on the resulting system. This makes processing hard sphere collisions on parallel hardware challenging. We present an approach to solving this problem that can be implemented in OpenCL and runs on current hardware. This approach […]

OpenCL

Nov, 9

An Execution Model for OpenCL 2.0

A popular approach to programming manycore GPUs is the Single Instruction Multiple Thread (SIMT) abstraction. SIMT has the benefit of presenting a "single thread" view, alleviating the complexity of explicitly vectorizing the source code. However, due to the SIMD nature of the underlying hardware it is often difficult to fully hide all aspects from the […]

OpenCL

Nov, 9

Real-time 3D Reconstruction for FPGAs: A Case Study for Evaluating the Performance, Area, and Programmability Trade-offs of the Altera OpenCL SDK

Embedding real-time 3D reconstruction of a scene from a low-cost depth sensor can improve the development of technologies in the domains of augmented reality, mobile robotics, and more. However, current implementations require a computer with a powerful GPU, which limits its prospective applications with low-power requirements. To implement low-power 3D reconstruction we embedded two prominent […]

OpenCL

Nov, 9

Parallel FIM Approach on GPU using OpenCL

In this paper, we describe the GPU-Eclat algorithm, a GPU (General Purpose Graphics Processing Unit) enhanced implementation of Frequent Itemset Mining (FIM). Frequent itemsets are extracted from a transactional database, an essential task in the data mining field because of its broad applications in mining association rules, time series, correlations, etc. The […]

OpenCL

Oct, 31

Using an OpenCL Framework to Evaluate Interconnect Implementations on FPGAs

Field Programmable Gate Arrays (FPGAs) are an ideal platform for building systems with custom hardware accelerators; however, managing these systems is still a major challenge. The OpenCL standard has become accepted as a good programming model for managing heterogeneous platforms due to its rich constructs. Although commercial OpenCL frameworks are now emerging, there is a […]

OpenCL

Oct, 29

Implementing Level-3 BLAS Routines in OpenCL on Different Processing Units

This paper presents an implementation of different matrix-matrix multiplication routines in OpenCL. We utilize the high-performance GEMM (GEneral Matrix-Matrix Multiply) implementation from our previous work for the present implementation of other matrix-matrix multiply routines in Level-3 BLAS (Basic Linear Algebra Subprograms). The other routines include SYMM (Symmetric Matrix-Matrix Multiply), SYRK (Symmetric Rank-K Update), SYR2K (Symmetric […]

OpenCL

Oct, 25

GPGPU Acceleration for Skeletal Animation-comparing OpenCL with CUDA and GLSL

Existing matrix palette algorithms for skeletal animation are accelerated with GPGPU techniques based on GLSL or CUDA. Because GLSL is an extension of the OpenGL graphics library, it couples rendering and computation closely, which makes it inconvenient to reuse; meanwhile, CUDA is designed only for NVIDIA GPUs. In this paper GPGPU based […]

CUDA • OpenCL • OpenGL

Oct, 16

The Distribution of OpenCL Kernel Execution Across Multiple Devices

Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s […]

OpenCL

Oct, 16

OpenCL Implementation of Montgomery Multiplication on FPGA

Galois Field arithmetic has been used very frequently in popular security and error-correction applications. Montgomery multiplication is among the suitable methods used for accelerating modular multiplication, which is the most time consuming basic arithmetic operation. Montgomery multiplication is also suitable to be implemented in parallel. OpenCL, which is a portable, heterogeneous and parallel programming framework, […]

OpenCL

Oct, 14

A Case Study of OpenCL on an Android Mobile GPU

Supercomputing over the past decade illustrates a transition in which pervasive commodity products are integrated into the world’s fastest systems. Given today’s exploding popularity of mobile devices, we investigate the possibilities for high performance mobile computing. Because parallel processing on mobile devices will be the key element in developing a mobile and computationally […]

OpenCL

Oct, 11

Monte Carlo Path Tracing with OpenCL

We introduce an interactive Monte Carlo path tracer that uses the OpenCL framework. A path tracer draws a photo-realistic image of a 3D scene by simulating physical effects of light. Interactivity enables the user to move around the scene in real time, while OpenCL makes it possible to run the same code on either CPU […]

OpenCL

Sep, 25

Performance Evaluation of Edge Detection Techniques on GPU Using OpenCL

GPUs (graphics processing units) enhance performance in the computing field thanks to their hundreds of cores operating in parallel. CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are programming models for GPUs. The advantage of these two programming models is that developers don’t have to understand any […]

OpenCL
