high performance computing on graphics processing units: hgpu.org

Posts

Dec, 9

A Video Deblurring Optimization Algorithm Based on Motion Detection

Although the performance of image acquisition devices has improved dramatically in recent years, especially in resolution and clarity, defocus and motion blur remain significant problems. Upgrading the devices with better hardware is one way to solve the problem, but the cost usually increases disproportionately compared with the benefit gained. The […]

CUDA

Dec, 9

Locality Analysis for Characterizing Applications Based on Sparse Matrices

We propose a method for judging how well a sparse matrix suits a target cache memory, using two metrics based on spatial locality and temporal locality. For the indirect access sequences of sparse matrix-vector multiplication, one metric is the number of valid data items within a cache line, and the other is the average reference interval. We also develop […]

CUDA

Dec, 9

Dense Real-Time Mapping of Object-Class Semantics from RGB-D Video

We propose a real-time approach to learning semantic maps from moving RGB-D cameras. Our method models geometry, appearance, and semantic labeling of surfaces. We recover camera pose using simultaneous localization and mapping while concurrently recognizing and segmenting object classes in the images. Our object-class segmentation approach is based on random decision forests and yields a […]

CUDA

Dec, 9

Advanced ultrasound beam forming using GPGPU technology

Ultrasound scanners are often used in medical diagnostics for visualising body parts without entering the body. An image is created by visualising the reflections of an ultrasound pulse transmitted into the body. Current scanners use a scanning method that creates an image line by line, using focused pulses on each line separately. This method results in high […]

CUDA

Dec, 8

A Highly Extensible Framework for Molecule Dynamic Simulation on GPUs

Molecular dynamics (MD) is widely used in chemistry and in the study of biomolecules. Numerous attempts have been made to accelerate MD simulations. CUDA-enabled NVIDIA graphics processing units (GPUs) can be used as general-purpose parallel processors, much like CPUs. However, it is not easy to port a program to the GPU. We present a highly extensible framework for […]

CUDA

Dec, 8

Partial Parallelization of the Successive Projections Algorithm using Compute Unified Device Architecture

This paper proposes a partial parallelization for the Successive Projections Algorithm (SPA), which is a variable selection technique designed for use with Multiple Linear Regression. This implementation is aimed at improving the computational efficiency of SPA, without changing the outcome of the algorithm. For this purpose, a new strategy of inverse matrix calculation is employed. […]

CUDA

Dec, 8

A GPU-based Multiresolution Pipeline for Compressed Volume Rendering

Recent improvements in data-acquisition methods have resulted in the emergence of increasingly large volumetric datasets. The design of GPU volume rendering solutions must take this trend into account while dealing with the limited memory available on a graphics card. In this work, we present a pipeline for volume rendering that stores a compressed version […]

CUDA

Dec, 8

Waste Not… Efficient Co-Processing of Relational Data

The variety of memory devices in modern computer systems holds opportunities as well as challenges for data management systems. In particular, the exploitation of Graphics Processing Units (GPUs) and their fast memory has been studied quite intensively. However, current approaches treat GPUs as systems in their own right and fail to provide a generic strategy […]

OpenCL

Dec, 8

GPU-Accelerated Crack Path Computation Based on a Phase Field Approach for Brittle Fracture

In recent years, a new approach to analyzing fracture has been developed. The so-called phase field models approximate cracks by a scalar, macroscopic field variable that distinguishes between broken and undamaged material. The phase field approach to fracture has significant advantages over more established methods. However, it is necessary to solve a coupled set of […]

CUDA

Dec, 6

A Distributed Data Mining Framework Accelerated with Graphics Processing Units

In the context of processing high volumes of data, recent developments have led to numerous models and frameworks for distributed processing running on clusters of commodity hardware. On the other hand, the Graphics Processing Unit (GPU) has seen much enthusiastic development as a device for general-purpose intensive parallel computation. In this paper we propose […]

Dec, 6

A Quantitative Comparison of Emulated Shared Memory Architectures to Current Multicore CPUs and GPUs

The performance of current multicore CPUs and GPUs is limited in computations that make frequent use of communication/synchronization between the subtasks executed in parallel. This is because directory-based cache systems scale weakly and/or the cost of synchronization is high. The Emulated Shared Memory (ESM) architectures, relying on multithreading and efficient synchronization mechanisms, have been developed […]

CUDA

Dec, 6

Parallel tree-ensemble algorithms for GPUs using CUDA

We present two new parallel implementations of the tree-ensemble algorithms Random Forest (RF) and Extremely randomized trees (ERT) for emerging many-core platforms, e.g., contemporary graphics cards suitable for general-purpose computing (GPGPU). Random Forest and Extremely randomized trees are ensemble learners for classification and regression. They operate by constructing a multitude of decision trees at training […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container
