high performance computing on graphics processing units: hgpu.org

Posts

Aug, 9

Real-Time Automatic Object Classification and Tracking using Genetic Programming and NVIDIA CUDA

Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP’s problem-solving ability is usually hindered by its long execution times. In this thesis, GP is applied toward real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages […]

CUDA

Aug, 9

Vivaldi: A Domain-Specific Language for Volume Processing and Visualization on Distributed Heterogeneous Systems

As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and […]

CUDA

Aug, 9

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

In semantic scene segmentation, every pixel of an image is assigned a category label. This task can be made easier by incorporating depth information, which structured light sensors provide. Depth, however, has very different properties from RGB image channels. In this paper, we present a novel method to provide depth information to convolutional neural networks. […]

CUDA

Aug, 9

Parallel Distributed Breadth First Search on the Kepler Architecture

We present the results obtained by using an evolution of our CUDA-based solution for the exploration, via a Breadth First Search, of large graphs. This latest version makes full use of the features of the Kepler architecture and relies on a 2D decomposition of the adjacency matrix to reduce the number of communications among the […]

CUDA

Aug, 9

GPU Parallel Implementation of the Approximate K-SVD Algorithm Using OpenCL

Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and to the complexity of the training algorithms. We investigate a parallel version of the approximate K-SVD algorithm, where multiple atoms are updated simultaneously, and implement it using OpenCL, for execution on graphics processing units (GPUs). […]

OpenCL

Aug, 7

Optimizing memory management on heterogeneous systems using polyhedral, compile-time techniques

The target of this thesis is to optimize memory management on heterogeneous systems. Our approach involves performing memory access pattern analysis on kernels in order to produce an accurate estimation of the memory usage. This information is produced in the form of array ranges describing which elements are accessed as well as whether they are […]

OpenCL

Aug, 7

On the Fly Porn Video Blocking Using Distributed Multi-GPU and Data Mining Approach

Preventing users from accessing adult videos while still allowing them to access good educational videos and other materials through a campus-wide network is a big challenge for organizations. Major existing web filtering systems are based on textual content or link analysis. As a result, potential users cannot access qualitative and informative video content […]

CUDA

Aug, 7

Dense Arithmetic over Finite Fields with the CUMODP Library

CUMODP is a CUDA library for exact computations with dense polynomials over finite fields. A variety of operations like multiplication, division, computation of subresultants, multi-point evaluation, interpolation and many others are provided. These routines are primarily designed to offer GPU support to polynomial system solvers and a bivariate system solver is part of the library. […]

CUDA

Aug, 7

Multi-Agent Systems and General-Purpose Computing on Graphics Processing Units: A Survey

In some application domains, using a Multi-Agent Systems (MAS) modeling approach may require handling a large number of agents (crowds, traffic, animal societies, ecosystems, etc.). Today, as this number is constantly growing, the computational resources needed can no longer be provided by the CPU of a single Personal Computer (PC). Considering this issue, […]

CUDA

Aug, 7

Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabled GPU

Methods for Molecular Dynamics (MD) simulations are investigated. MD simulation is a widely used computer simulation approach for studying the properties of molecular systems. Force calculation in MD is computationally intensive. Parallel programming techniques can be applied to improve those calculations. The major aim of this paper is to speed up the MD simulation calculations by/using […]

CUDA

Aug, 5

FPGA Acceleration of Multifunction Printer Image Processing using OpenCL

OpenCL adoption in the High Performance Computing, entertainment and scientific computing markets continues to grow. The flexibility and portability of OpenCL make it an excellent platform upon which to develop image processing applications. However, OpenCL has not yet been applied to the hardcopy printer and Multi-Function Printer (MFP) markets. The printer/MFP markets traditionally use full […]

OpenCL

Aug, 5

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

* * *

The Reduction Problem in CUDA and Its Simulation with P Systems

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)