Posts
Oct, 18
Heterogeneous FDTD for Seismic Processing
In the early days of computing, scientific calculations were done by specialized hardware. More recently, increasingly powerful CPUs took over and have been dominant for a long time. Now, though, scientific computation is no longer confined to the general-purpose CPU. GPUs are specialized processors with their own memory hierarchy that require more effort to program, […]
Oct, 18
Efficient SVM Training Using Parallel Primal-Dual Interior Point Method on GPU
Training an SVM can be viewed as a Convex Quadratic Programming (CQP) problem, which becomes difficult to solve when dealing with large-scale data sets. Traditional methods for SVM training, such as Sequential Minimal Optimization (SMO), solve a sequence of small-scale sub-problems, which costs a large amount of […]
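For reference, the CQP the abstract refers to is the standard soft-margin SVM dual (a textbook form, not taken from this paper): maximize the margin objective over the Lagrange multipliers alpha, subject to box and equality constraints. SMO exploits this structure by fixing all but two multipliers and solving the resulting two-variable sub-problem analytically.

```latex
\max_{\alpha}\;\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

An interior point method instead iterates toward the optimum of this whole problem at once, which is why its dense linear-algebra steps map well onto a GPU.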
Oct, 18
Using CUDA GPU to Accelerate the Ant Colony Optimization Algorithm
Graphics Processing Units (GPUs) have recently evolved into massively multi-core, fully programmable architectures. In the CUDA programming model, programmers can straightforwardly implement the parallel parts of a task on GPUs. The purpose of this paper is to accelerate Ant Colony Optimization (ACO) for Traveling Salesman Problems (TSP) with GPUs. In this paper, […]
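To make the abstract concrete, here is a minimal sequential Python sketch of one ACO iteration for the TSP: a single ant builds a tour using the usual pheromone/visibility rule, then pheromone evaporates and the tour deposits new pheromone. This is a generic textbook formulation, not the paper's CUDA implementation; parameter names (`alpha`, `beta`, `rho`) follow common ACO convention.

```python
import math
import random

def tour_length(tour, dist):
    # Total length of a closed tour over the distance matrix.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ant_tour(dist, tau, alpha=1.0, beta=2.0, rng=random):
    # One ant constructs a tour: city j is chosen with probability
    # proportional to tau[i][j]^alpha * (1/dist[i][j])^beta.
    n = len(dist)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        i = tour[-1]
        weights = [(j, (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta))
                   for j in sorted(unvisited)]
        r = rng.random() * sum(w for _, w in weights)
        for j, w in weights:
            r -= w
            if r <= 0:
                break
        tour.append(j)
        unvisited.remove(j)
    return tour

def update_pheromone(tau, tours, dist, rho=0.5, q=1.0):
    # Evaporate, then let each tour deposit pheromone inversely
    # proportional to its length (shorter tours reinforce more).
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour in tours:
        deposit = q / tour_length(tour, dist)
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            tau[a][b] += deposit
            tau[b][a] += deposit
```

On a GPU, the natural parallelization is one thread (or thread block) per ant during tour construction, with the pheromone update done as a separate parallel pass.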
Oct, 18
Dynamic Load Balancing in GPU-Based Systems – Early Experiments
The dynamic load-balancing framework in Charm++/AMPI, developed at the University of Illinois, is based on using processor virtualization to allow thread migration across processors. This framework has been successfully applied to many scientific applications in the past, such as BRAMS, NAMD, ChaNGa, and others. Most of these applications use only CPUs to perform their operations. […]
Oct, 17
Understanding and Modeling the Synchronization Cost in the GPU Architecture
Graphics Processing Units (GPUs) have grown increasingly popular for general-purpose computation. GPUs are massively parallel processors, which makes them a far better fit than the CPU for many algorithms. The drawback of using a GPU for a computation is that GPUs are much less efficient at […]
Oct, 17
Empirical performance modeling of GPU kernels using active learning
We focus on a design-of-experiments methodology for developing empirical performance models of GPU kernels. Recently, we developed an iterative active learning algorithm that adaptively selects parameter configurations in batches for concurrent evaluation on CPU architectures in order to build performance models over the parameter space. In this paper, we illustrate the adoption of the algorithm […]
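The batch-selection idea in the abstract can be sketched in a few lines. The snippet below uses greedy farthest-point selection over the parameter space as a simple diversity-based stand-in for the model-uncertainty criterion an active learner would use; it is an illustrative assumption, not the paper's actual algorithm.

```python
import math

def euclid(a, b):
    # Euclidean distance between two parameter configurations (tuples).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_batch(candidates, evaluated, k):
    # Greedily pick the k candidate configurations farthest from all
    # configurations already evaluated (and from each other), so each
    # batch probes under-explored regions of the parameter space.
    chosen = []
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        ref = evaluated + chosen
        best = max(pool, key=lambda c: min(euclid(c, r) for r in ref))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

Each selected batch would then be evaluated concurrently (one kernel run per configuration), the performance model refit, and the loop repeated.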
Oct, 17
A Dynamic Resource Management System for Network-Attached Accelerator Clusters
Over the years, cluster systems have become increasingly heterogeneous as cluster nodes are equipped with one or more accelerators such as graphics processing units (GPUs). These devices are typically attached to a compute node via PCI Express. As a consequence, batch systems such as TORQUE/Maui and SLURM have been extended to be aware of those additional […]
Oct, 17
Real-time computation of interactive waves using the GPU
The Maritime Research Institute Netherlands (MARIN) supplies innovative products for the offshore industry and shipping companies. Among their products are highly realistic, real-time bridge simulators [2], see Figure 1. Currently, the waves are deterministic and are not affected by ships, moles, breakwaters, piers, or any other object. To bring the simulators to the next level, […]
Oct, 17
cudaMap: a GPU accelerated program for gene expression connectivity mapping
BACKGROUND: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. […]
Oct, 15
Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems
The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications. Normalized cross-correlation is used as a benchmark, because this algorithm includes convolution, a common operation in image processing and elsewhere. Normalized cross-correlation is a template matching algorithm that is used to locate predefined objects in […]
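The benchmark the thesis names is easy to state in code. Below is a plain-Python sketch of zero-mean normalized cross-correlation and the brute-force template search over an image (lists of lists of grayscale values); the GPU, DSP, and FPGA implementations being compared parallelize exactly this loop nest. This is the standard textbook formulation, assumed rather than copied from the thesis.

```python
import math

def ncc(patch, template):
    # Zero-mean normalized cross-correlation between two equal-sized
    # patches. Result lies in [-1, 1]; 1.0 means a perfect match up to
    # brightness and contrast.
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) *
                    sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0

def match_template(image, template):
    # Slide the template over every position and return the top-left
    # coordinate with the highest NCC score (brute force).
    th, tw = len(template), len(template[0])
    best, best_pos = -2.0, (0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            s = ncc(patch, template)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best
```

Because every sliding-window position is independent, the search maps directly onto one GPU thread per position, onto DSP SIMD loops, or onto an FPGA pipeline.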
Oct, 15
Scaling Soft Matter Physics to Thousands of GPUs in Parallel
We describe a multi-GPU implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. We describe our methodology in augmenting the original CPU version with GPU functionality in a maintainable fashion. We present several optimisations that maximize […]
Oct, 15
Domain-Specific Languages for Heterogeneous Parallel Computing
The heterogeneous parallel computing era has been accompanied by an ever-increasing number of disparate programming models. As a result, improving performance via heterogeneous computing is currently very challenging for application programmers. Domain-specific languages (DSLs) are a potential solution to this problem, as they can provide productivity, performance, and portability within the confines of a specific […]