high performance computing on graphics processing units: hgpu.org

Posts

Mar, 20

Joint Forces: From Multithreaded Programming to GPU Computing

Desktop software developers’ interest in graphics hardware is increasing as a result of modern graphics cards’ ability to act as compute devices that augment the main processor. This capability means parallel computing is no longer a dedicated task for the CPU. A trend toward heterogeneous computing combines the main processor and graphics processing unit (GPU). […]

Mar, 19

Mean Shift Parallel Tracking on GPU

We propose a parallel Mean Shift (MS) tracking algorithm on the Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). The traditional MS algorithm uses a large number of color histogram bins, typically 16x16x16, which makes parallel implementation infeasible. We thus employ K-Means clustering to partition the object color space, which enables us to represent color […]

CUDA
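To make the 16x16x16 figure concrete, here is a minimal sketch of a quantized RGB color histogram, assuming 8-bit channels split into 16 levels each. The function names `bin_index` and `color_histogram` are hypothetical and are not taken from the paper; the paper's K-Means partitioning is not shown.

```python
def bin_index(r, g, b, levels=16):
    """Map an 8-bit RGB triple to a flat histogram bin (assumed layout)."""
    step = 256 // levels  # width of one quantization level per channel
    return (r // step) * levels * levels + (g // step) * levels + (b // step)


def color_histogram(pixels, levels=16):
    """Build a normalized histogram with levels**3 bins from RGB tuples."""
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        hist[bin_index(r, g, b, levels)] += 1.0
    total = len(pixels)
    # Normalize so the bins sum to 1; an empty input stays all zeros.
    return [h / total for h in hist] if total else hist
```

With 16 levels per channel the histogram has 4096 bins, which illustrates why a more compact color representation can be attractive for a per-thread GPU implementation.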

Mar, 19

Method for simulation of coastal terrain on GPU

Shaders on the GPU are widely used to model coastal terrain, but the resulting terrains are highly similar and fail to capture the differences between coastal features. To overcome this disadvantage, we present a new sketch-map-based method for modeling terrain. By specifying the coastal feature types, the proposed […]

Mar, 19

Exploring utilisation of GPU for database applications

This study explores possible applications of GPU technology for accelerating database access. We use an n-gram based approximate text search engine as a test bed for GPU-based acceleration algorithms. Two solutions, hybrid CPU/GPU and pure GPU algorithms for query processing, are studied and compared with the baseline CPU […]

CUDA

Mar, 19

Efficient and Quality Contouring Algorithms on the GPU

Interactive isosurface extraction has recently become possible through successful efforts to map algorithms such as Marching Cubes (MC) and Marching Tetrahedra (MT) to modern Graphics Processing Unit (GPU) architectures. Other isosurfacing algorithms, however, are not so easily portable to GPUs, either because they involve more complex operations or because they are not based on discrete […]

Mar, 19

Implementing a GPU Programming Model on a non-GPU Accelerator Architecture

Parallel codes are written primarily for the purpose of performance. It is highly desirable that parallel codes be portable between parallel architectures without significant performance degradation or code rewrites. While performance portability and its limits have been studied thoroughly on single processor systems, this goal has been less extensively studied and is more difficult to […]

CUDA

Mar, 19

Scalable Multi Agent Simulation on the GPU

We present a unique and elegant graphics hardware realization of multi-agent simulation. Specifically, we adapt Velocity Obstacles, which are well suited to parallel computation on single instruction, multiple thread (SIMT) architectures. We explore hash-based nearest-neighbor search to considerably optimize the algorithm when mapped onto the GPU. Moreover, to alleviate inefficiencies of agent […]

CUDA

Mar, 19

Password Recovery for RAR Files Using CUDA

Driven by the insatiable demand for real-time graphics, especially from the computer games market, the Graphics Processing Unit (GPU) has become a major source of computing power in recent years, as GPU performance now surpasses that of contemporary CPUs. This paper presents our study on how to efficiently recover the passwords for encrypted RAR […]

CUDA

Mar, 19

Password recovery for encrypted ZIP archives using GPUs

Password protection of documents such as DOC and PDF files or RAR and ZIP archives has been shown to be weak under dictionary attacks. The time needed to recover the passwords of such documents depends mainly on two factors: the size of the password search space and the computing power of the underlying system. In this paper, we […]

CUDA
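The two factors named in the abstract can be related with simple arithmetic. The sketch below is an illustration only, assuming an exhaustive search over all passwords up to a given length; the function names and parameters are hypothetical and do not come from the paper.

```python
def search_space_size(alphabet_size, max_length, min_length=1):
    """Count all candidate passwords with lengths in [min_length, max_length]."""
    return sum(alphabet_size ** n for n in range(min_length, max_length + 1))


def worst_case_seconds(space_size, guesses_per_second):
    """Time to try every candidate at a fixed (assumed) guessing rate."""
    return space_size / guesses_per_second
```

For example, `search_space_size(26, 2)` counts every lowercase password of one or two letters. Raising the guessing rate, which is what a GPU implementation aims for, shrinks the worst-case time proportionally.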

Mar, 19

RAR password decryption by utilizing GPU

The graphics processing unit (GPU) supports data-parallel computation through single instruction, multiple data (SIMD) execution and provides powerful logic computation ability. We have verified that the RAR password decryption rate is greatly improved by utilizing the parallel computation ability of the GPU.

Mar, 19

Clipmapping on the GPU

Dealing with high-resolution imagery with billions or trillions of samples is an enormous challenge that often overwhelms the graphics subsystem of any computer. Silicon Graphics, Inc. addressed this issue by providing explicit hardware support for offset registers and texture sub-loads in their InfiniteReality machine. The clipmap algorithm uses sub-textures and incremental updates based on a toroidal mapping to […]

OpenGL

Mar, 18

A Fast GEMM Implementation On a Cypress GPU

We present benchmark results of optimized dense matrix multiplication kernels for the Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP) precision. Our SGEMM and DGEMM kernels show ~2 Tflop/s and ~470 Gflop/s, respectively. These results for SP and DP correspond to 73% and 87% of the theoretical […]
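As background on how GEMM throughput figures like these are commonly computed, here is a minimal sketch. It assumes the conventional count of 2·M·N·K floating-point operations for an M×K by K×N product, and it assumes the truncated sentence compares measured rates against a theoretical peak; the function names are hypothetical and none of this is taken from the paper's code.

```python
def gemm_flops(m, n, k):
    """Conventional operation count: one multiply and one add per inner step."""
    return 2 * m * n * k


def achieved_gflops(m, n, k, seconds):
    """Measured rate in Gflop/s for one GEMM call that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e9


def implied_peak(achieved_rate, fraction_of_peak):
    """Peak rate implied by a measured rate and its stated share of peak."""
    return achieved_rate / fraction_of_peak
```

Under those assumptions, `implied_peak(470.0, 0.87)` gives the double-precision peak in Gflop/s that the quoted percentage would imply.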
