high performance computing on graphics processing units: hgpu.org

Posts

Aug, 9

Parallel Distributed Breadth First Search on the Kepler Architecture

We present results obtained with an evolution of our CUDA-based solution for exploring large graphs via Breadth First Search. This latest version makes the best use of the features of the Kepler architecture and relies on a 2D decomposition of the adjacency matrix to reduce the number of communications among the […]

CUDA

Aug, 9

GPU Parallel Implementation of the Approximate K-SVD Algorithm Using OpenCL

Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. We investigate a parallel version of the approximate K-SVD algorithm, where multiple atoms are updated simultaneously, and implement it using OpenCL, for execution on graphics processing units (GPUs). […]

OpenCL

Aug, 7

Optimizing memory management on heterogeneous systems using polyhedral, compile-time techniques

The target of this thesis is to optimize memory management on heterogeneous systems. Our approach involves performing memory access pattern analysis on kernels in order to produce an accurate estimation of the memory usage. This information is produced in the form of array ranges describing which elements are accessed as well as whether they are […]

OpenCL

Aug, 7

On the Fly Porn Video Blocking Using Distributed Multi-GPU and Data Mining Approach

Preventing users from accessing adult videos while still allowing them to access good educational videos and other materials through a campus-wide network is a big challenge for organizations. Major existing web filtering systems are based on textual content or link analysis. As a result, potential users cannot access qualitative and informative video content […]

CUDA

Aug, 7

Dense Arithmetic over Finite Fields with the CUMODP Library

CUMODP is a CUDA library for exact computations with dense polynomials over finite fields. A variety of operations like multiplication, division, computation of subresultants, multi-point evaluation, interpolation and many others are provided. These routines are primarily designed to offer GPU support to polynomial system solvers and a bivariate system solver is part of the library. […]

CUDA

Aug, 7

Multi-Agent Systems and General-Purpose Computing on Graphics Processing Units: A Survey

In some application domains, using a Multi-Agent Systems (MAS) modeling approach may require handling a large number of agents (crowds, traffic, animal societies, ecosystems, etc.). Today, as this number is constantly growing, the computational resources needed can no longer be supplied by the CPU of a single personal computer (PC). Considering this issue, […]

CUDA

Aug, 7

Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabled GPU

Methods for Molecular Dynamics (MD) simulations are investigated. MD simulation is a widely used computer simulation approach for studying the properties of molecular systems. Force calculation in MD is computationally intensive. Parallel programming techniques can be applied to improve those calculations. The major aim of this paper is to speed up the MD simulation calculations by using […]

CUDA

Aug, 5

FPGA Acceleration of Multifunction Printer Image Processing using OpenCL

OpenCL adoption in the High Performance Computing, entertainment and scientific computing markets continues to grow. The flexibility and portability of OpenCL make it an excellent platform upon which to develop image processing applications. However, OpenCL has not yet been applied to the hardcopy printer and Multi-Function Printer (MFP) markets. The printer/MFP markets traditionally use full […]

OpenCL

Aug, 5

The Reduction Problem in CUDA and Its Simulation with P Systems

We introduce P systems with dynamic communication graphs which simulate the functioning of the CUDA architecture when solving the parallel reduction problem.

CUDA
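As an illustration of the parallel reduction problem named in the entry above, here is a minimal sketch in Python. It is not the paper's P-system model and not CUDA code: it sequentially simulates the tree-shaped pairwise combination that GPU reductions typically use, where each round's combinations are independent and could run on separate threads. The function name `tree_reduce` and its return value are assumptions made for this example.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Sequentially simulate a tree-shaped parallel reduction.

    Returns (result, rounds). Within one round every combination touches
    disjoint elements, so on a GPU each could be done by its own thread.
    """
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")

    n = len(data)
    stride = 1
    rounds = 0
    while stride < n:
        # "Thread" i combines its element with the partner `stride` away.
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
        rounds += 1
    return data[0], rounds


if __name__ == "__main__":
    total, rounds = tree_reduce(range(1, 9))
    print(total, rounds)
```

For n inputs the loop runs ceil(log2 n) rounds, which is why the pattern suits hardware with many threads; a real kernel would also need synchronization between rounds.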

Aug, 5

Image Encryption Using Parallel RSA Algorithm on CUDA

In this paper we discuss image encryption and decryption using the RSA algorithm, which was earlier used for text encryption. Today it is a crucial concern that proper encryption and decryption be applied so that unauthorized access can be prevented. We intend to build a general RSA algorithm which can be combined with other […]

CUDA

Aug, 5

Roberts edge detection algorithm based on GPU

With the development of semiconductor technology, the GPU’s floating point computing capacity is improving rapidly. How to apply GPU technology to non-graphics computing has become a highlight in high performance computing research. The Roberts edge detection algorithm is a typical image processing algorithm. A fast Roberts edge detection algorithm is presented […]

CUDA
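For readers unfamiliar with the Roberts operator named in the entry above, the following is a minimal CPU sketch in plain Python of the standard Roberts cross gradient. It is not the paper's GPU implementation; the function name `roberts_cross` and the list-of-rows image layout are assumptions for illustration. Each output pixel depends only on a 2x2 neighbourhood of the input, so every pixel can be computed independently, which is what makes the operator a natural fit for one-thread-per-pixel GPU execution.

```python
import math


def roberts_cross(image):
    """Return the Roberts cross gradient magnitude of a grayscale image.

    `image` is a list of equal-length rows of numbers. The result has one
    fewer row and column, because each output uses a 2x2 input window.
    """
    height = len(image)
    width = len(image[0])
    edges = [[0.0] * (width - 1) for _ in range(height - 1)]
    for y in range(height - 1):
        for x in range(width - 1):
            # Differences along the two diagonals of the 2x2 window.
            gx = image[y][x] - image[y + 1][x + 1]
            gy = image[y][x + 1] - image[y + 1][x]
            edges[y][x] = math.hypot(gx, gy)
    return edges


if __name__ == "__main__":
    # A vertical step edge between the second and third columns.
    img = [[0, 0, 10, 10]] * 3
    for row in roberts_cross(img):
        print([round(v, 2) for v in row])
```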

Aug, 5

GIS Polygon Overlay Processing: New Parallel Algorithm and System Prototype

Polygon overlay is one of the complex operations in computational geometry. It is applied in many fields such as Geographic Information Systems (GIS), computer graphics, VLSI CAD, etc. We have two significant results to report. Our first result is the first output-sensitive CREW PRAM algorithm for simple polygons, which can perform typical set operations including […]

CUDA
