high performance computing on graphics processing units: hgpu.org

Posts

Aug, 5

The Reduction Problem in CUDA and Its Simulation with P Systems

We introduce P systems with dynamic communication graphs which simulate the functioning of the CUDA architecture when solving the parallel reduction problem.

CUDA

Aug, 5

Image Encryption Using Parallel RSA Algorithm on CUDA

In this paper we discuss Image Encryption and Decryption using RSA Algorithm which was earlier used for text encryption. In today’s era it is a crucial concern that proper encryption decryption should be applied so that unauthorized access can be prevented. We intend to build a general RSA algorithm which can be combined with other […]

CUDA

Aug, 5

Roberts edge detection algorithm based on GPU

With the development of the semiconductor technology, the GPU’s floating point computing capacity improves rapidly. How to apply the GPU technology to the non-graphic computing field becomes a highlight in the research of high performance computing. The Roberts edge detection algorithm is a typical image processing algorithms. A fast Roberts edge detection algorithm is presented […]

CUDA

Aug, 5

GIS Polygon Overlay Processing: New Parallel Algorithm and System Prototype

Polygon overlay is one of the complex operations in computational geometry. It is applied in many fields such as Geographic Information Systems (GIS), computer graphics, VLSI CAD, etc. We have two significant results to report. Our first result is the first output-sensitive CREW PRAM algorithm for simple polygons, which can perform typical set operations including […]

CUDA

Aug, 5

A Moving Least Squares Based Approach for Contour Visualization of Multi-Dimensional Data

Analysis of high dimensional data is a common task. Often, small multiples are used to visualize 1 or 2 dimensions at a time, such as in a scatterplot matrix. Associating data points between different views can be difficult though, as the points are not fixed. Other times, dimensional reduction techniques are employed to summarize the […]

CUDA

Aug, 3

Integrating Profiling into MDE Compilers

Scientific computation requires more and more performance in its algorithms. New massively parallel architectures suit well to these algorithms. They are known for offering high performance and power efficiency. Unfortunately, as parallel programming for these architectures requires a complex distribution of tasks and data, developers find difficult to implement their applications effectively. Although approaches based […]

OpenCL

Aug, 3

Multithreading for Visual Effects

Tackle the Challenges of Parallel Programming in the Visual Effects Industry: In Multithreading for Visual Effects, developers from DreamWorks Animation, Pixar, Side Effects, Intel, and AMD share their successes and failures in the messy real-world application area of production software. They provide practical advice on multithreading techniques and visual effects used in popular visual effects […]

OpenCL

Aug, 3

Accelerating Krylov Subspace Solvers on Graphics Processing Units

Krylov subspace solvers are often the method of choice when solving sparse linear systems iteratively. At the same time, hardware accelerators such as graphics processing units (GPUs) continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a […]

CUDA

Aug, 3

Extending Lyapack for the Solution of Band Lyapunov Equations on Hybrid CPU-GPU Platforms

The solution of large-scale Lyapunov equations is an important tool for the solution of several engineering problems arising in optimal control and model order reduction. In this work we investigate the case when the coefficient matrix of the equations presents a band structure. Exploiting the structure of this matrix we can achive relevant reductions in […]

CUDA

Aug, 3

Multi-Threaded Automatic Integration Using OpenMP and CUDA

Problems in many areas give rise to computationally expensive integrals that beg the need of efficient techniques to solve them, e.g., in computational finance for the modeling of cash flows; for the computation of Feynman loop integrals in high energy physics; and in stochastic geometry with applications to computer graphics. We demonstrate feasible numerical approaches […]

CUDA

Aug, 2

Design of an FPGA-Based FDTD Accelerator Using OpenCL

High-performance computing systems with dedicated hardware on FPGAs can achieve power efficient computations compared with CPUs and GPUs. However, the hardware design on FPGAs needs more time than the software design on CPUs and GPUs. We designed an FDTD hardware accelerator using the OpenCL compiler for FPGAs in this paper. Since it is possible to […]

OpenCL

Aug, 2

An Analysis of OpenACC Programming Model: Image Processing Algorithms as a Case Study

Graphics processing units and similar accelerators have been intensively used in general purpose computations for several years. In the last decade, GPU architecture and organization changed dramatically to support an ever-increasing demand for computing power. Along with changes in hardware, novel programming models have been proposed, such as NVIDIA’s Compute Unified Device Architecture (CUDA) and […]

CUDA

high performance computing on graphics processing units: hgpu.org

Posts

The Reduction Problem in CUDA and Its Simulation with P Systems

Image Encryption Using Parallel RSA Algorithm on CUDA

Roberts edge detection algorithm based on GPU

GIS Polygon Overlay Processing: New Parallel Algorithm and System Prototype

A Moving Least Squares Based Approach for Contour Visualization of Multi-Dimensional Data

Integrating Profiling into MDE Compilers

Multithreading for Visual Effects

Accelerating Krylov Subspace Solvers on Graphics Processing Units

Extending Lyapack for the Solution of Band Lyapunov Equations on Hybrid CPU-GPU Platforms

Multi-Threaded Automatic Integration Using OpenMP and CUDA

Design of an FPGA-Based FDTD Accelerator Using OpenCL

An Analysis of OpenACC Programming Model: Image Processing Algorithms as a Case Study

Recent source codes

Luthier: Bridging Auto-Tuning and Vendor Libraries for Efficient Deep Learning Inference

Fused Kernel Library (FKL)

GPUHammer: Rowhammer Attacks on GPU Memories are Practical

Block: Balance Loader of LLM Serving with Context, Knowledge and Predictive Scheduling

SIGMo: Scalable Isomorphism Graph Matching on GPUs

DGEMM without FP64 Arithmetic - using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme

GEAK-agent: LLM-based AI agent, which can write correct and efficient GPU kernels automatically

OpenDwarfs 2025: re-engineered version of the OpenDwarfs benchmark suite, for compatibility with modern platforms

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Most viewed papers (last 30 days)