high performance computing on graphics processing units: hgpu.org

Posts

Aug, 26

Challenging cloning related problems with GPU-based algorithms

Graphics Processing Units (GPUs) have been around for a while. Although they are primarily used for high-end 3D graphics processing, they are now also recognized as a means of general-purpose, massively parallel computing. This paper presents an original technique based on [10] to compute many instances of the longest common subsequence problem on a generic GPU architecture using […]
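The abstract does not describe the technique from [10], so the following is only a generic sketch of why the longest common subsequence (LCS) problem suits a GPU. It uses the textbook dynamic program, filled one anti-diagonal at a time: cells on the same anti-diagonal are independent and could each map to a thread, and many separate instances add a second, coarser level of parallelism. The function names `lcs_length_wavefront` and `batch_lcs` are hypothetical, and the code runs sequentially.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """LCS length via the classic DP, filled in anti-diagonal (wavefront) order.

    Cells with i + j == d depend only on diagonals d - 1 and d - 2, so all
    cells of one diagonal could be computed in parallel on a GPU.
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):
        # Every (i, j) on this diagonal with 1 <= i <= n and 1 <= j <= m.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def batch_lcs(pairs):
    """Solve many independent LCS instances (one GPU block each, hypothetically)."""
    return [lcs_length_wavefront(a, b) for a, b in pairs]
```

For example, `lcs_length_wavefront("ABCBDAB", "BDCABA")` gives 4 (one LCS is "BCBA").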

Aug, 26

Considerations when evaluating microprocessor platforms

Motivated by recent papers comparing CPU and GPU performance, this paper explores the questions: Why do we compare microprocessors and by what means should we compare them? We distinguish two distinct perspectives from which to make comparisons: application developers and computer architecture researchers. We survey the distinct concerns of these groups, identifying essential information each […]

Aug, 26

Exploring graphics processing units as parallel coprocessors for online aggregation

Multidimensional aggregation is one of the most important computational building blocks and hence also a potential performance bottleneck in Online Analytic Processing (OLAP). In order to deliver fast query responses for interactive operations such as slicing, dicing, roll-up and drill-down, it is essential that aggregates along the relevant dimensions of a data cube can be […]

CUDA

Aug, 26

Parallel Viewshed Analysis on GPU Using CUDA

Viewshed analysis is a long-established function of many geographical information systems that determines which cells of an input raster are visible from one or more observers. Extending it to larger areas or higher resolutions calls for a parallel implementation to keep run times tolerable. In this paper, we describe a GPU parallelization of viewshed analysis using […]

CUDA
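The abstract is truncated before the algorithm, so this is not the paper's method. As an assumption, it sketches the common line-of-sight test along a single ray from the observer: a cell is visible if its elevation slope, seen from the observer's eye, is at least the largest slope of any cell in between. Rays (or target cells) are independent, which is what makes the problem a natural fit for GPU threads. The function name `visible_along_profile` and its parameters are hypothetical.

```python
def visible_along_profile(elevations, observer_height=0.0, cell_size=1.0):
    """Visibility flags along one ray; the observer stands on cell 0.

    A cell is visible when its slope from the eye is >= the maximum slope of
    all cells closer to the observer (grazing rays count as visible here).
    """
    eye = elevations[0] + observer_height
    visible = [True]  # the observer's own cell
    max_slope = float("-inf")
    for k in range(1, len(elevations)):
        slope = (elevations[k] - eye) / (k * cell_size)
        visible.append(slope >= max_slope)
        max_slope = max(max_slope, slope)
    return visible
```

For the profile `[10, 10, 15, 12, 20]` with the eye at ground level, the cell of height 12 is hidden behind the 15-unit ridge, while the 20-unit cell further away is visible again.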

Aug, 26

GPU Based Real-time Correction for Optical Distortions in Head-Mounted Displays

This paper presents a GPU-based real-time method to correct optical distortions in head-mounted displays (HMDs). The HMD to be corrected is a lightweight, wide field-of-view HMD system with a free-form-surface (FFS) prism, in which the image distortion is neither rectilinear nor centrosymmetric. A special predistortion model is constructed to correct the distortion of the HMD. […]

Aug, 26

Acceleration of an improved Retinex algorithm

Retinex is an image restoration method, and center/surround Retinex is well suited to parallelization because it uses a convolution with a large kernel to achieve dynamic range compression and color/lightness rendition. However, its strong image-enhancement capability comes at the cost of intensive computation. This paper presents GPURetinex, a data-parallel algorithm based […]

CUDA
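The abstract names the expensive step (a large-kernel center/surround convolution) but not GPURetinex's details. As an assumption-labelled sketch, the classic single-scale form computes R = log I − log(G * I), where G is a normalized Gaussian surround. The function names, the 3σ kernel radius, and clamp-to-edge borders are illustrative choices, not taken from the paper. The separable blur is the part a GPU version would parallelize, since every output pixel is computed independently.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    s = sum(w)
    return [x / s for x in w]


def convolve_rows(img, kernel):
    """Convolve each row with a 1D kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(kernel[t + r] * row[min(max(x + t, 0), n - 1)] for t in range(-r, r + 1))
            for x in range(n)
        ])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def single_scale_retinex(img, sigma=2.0, eps=1e-6):
    """R = log(I) - log(G * I) for a 2D grayscale image given as lists of floats."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    # Separable Gaussian: blur rows, then blur columns via a transpose.
    blurred = transpose(convolve_rows(transpose(convolve_rows(img, k)), k))
    return [
        [math.log(p + eps) - math.log(b + eps) for p, b in zip(prow, brow)]
        for prow, brow in zip(img, blurred)
    ]
```

A uniform image maps to (numerically) zero everywhere, because the surround equals the center. A bright spot on a dark field comes out positive at the spot and negative around it.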

Aug, 26

Accelerating tetrahedral interpolation with data-level and Thread-Level Parallel optimization

The tetrahedral interpolation method for color space conversion is the most time-consuming step in the entire color management process. This makes it difficult to implement a purely software-based high-end image processing system. In this study, SIMD (Single Instruction Multiple Data) and GPGPU (General Purpose Graphics Processing Unit) based optimizations for tetrahedral interpolation are implemented. To exploit […]
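For readers unfamiliar with the kernel being optimized, here is a minimal sketch of standard tetrahedral interpolation in a 3D lookup table (LUT). Sorting the fractional coordinates selects one of the six tetrahedra in a grid cube, and the result walks from the base corner to the opposite corner, adding one axis at a time. Assumptions: a scalar LUT on a uniform `grid_size`³ grid over [0, 1]³ with inputs inside that range (a real color LUT would apply this per output channel), and the function name is hypothetical. Each pixel is independent, which is what SIMD and GPGPU versions exploit.

```python
def tetrahedral_interp(lut, grid_size, r, g, b):
    """Interpolate lut[i][j][k] (scalar samples on [0,1]^3) at (r, g, b)."""
    scale = grid_size - 1
    coords = [r * scale, g * scale, b * scale]
    # Base corner of the enclosing cube; clamp so an input of 1.0 stays in range.
    base = [min(int(c), grid_size - 2) for c in coords]
    frac = [c - bi for c, bi in zip(coords, base)]
    # Descending fractional parts pick one of the 6 tetrahedra in the cube.
    order = sorted(range(3), key=lambda a: frac[a], reverse=True)
    corner = list(base)
    prev = lut[corner[0]][corner[1]][corner[2]]
    result = prev
    for a in order:
        # Step to the next tetrahedron vertex and add its weighted difference.
        corner[a] += 1
        cur = lut[corner[0]][corner[1]][corner[2]]
        result += frac[a] * (cur - prev)
        prev = cur
    return result
```

Because the scheme is exact for linear functions, a LUT sampled from f(r, g, b) = r + 2g + 3b returns 2.0 at (0.3, 0.7, 0.1), up to rounding.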

Aug, 26

Multi-level parallelism, global arrays, GPGPU Programming: Unify programming paradigms on Grid computing with efficiency

As technology advances, computing resources improve in many respects: larger capacity, greater capability, and higher speed. However, with heterogeneously distributed resources in a Grid computing environment, developing an application that fully utilizes those resources is a challenge. In particular, the computing resources themselves regularly upgrade their computing power, for example by recruiting General […]

Aug, 25

TH-1: China’s first petaflop supercomputer

In recent years, heterogeneous systems and cooperative computing have become popular research directions in the field of high performance computing. With fast scaling of the size of high performance computer systems, problems such as power consumption and reliability come to the forefront. The aim of high performance computing has thus shifted from merely seeking peak […]

Aug, 25

Hera-JVM: a runtime system for heterogeneous multi-core architectures

Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores support different instruction set architectures, and the processor as a whole does not provide coherence between the different cores’ local memories. We present Hera-JVM, an implementation of the Java Virtual Machine which […]

Aug, 25

Parallelizing compiler framework and API for power reduction and software productivity of real-time heterogeneous multicores

Heterogeneous multicores have been attracting much attention as a way to attain high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores are very difficult for programmers to program, and the resulting long application development period lowers product competitiveness. To overcome this situation, this paper proposes a compilation framework which bridges a gap between […]

Aug, 25

MobiRT: an implementation of OpenGL ES-based CPU-GPU hybrid ray tracer for mobile devices

Three-dimensional user interfaces on mobile devices are increasingly important. For more realistic three-dimensional visualization on mobile devices, we present the implementation of an OpenGL ES-based CPU-GPU hybrid ray tracer. This ray tracer exploits the availability of CPU and GPU architectures to fully support reflection, refraction, hard shadows, and dynamic scenes. To the best of our […]

OpenGL
