high performance computing on graphics processing units: hgpu.org

Posts

Jun, 27

Performance Evaluation of Parallel AES Implementations over CUDA GPU Framework

Because the AES encryption algorithm is computationally complex, especially when applied to huge volumes of real-time data, GPUs have recently offered an alternative to traditional CPU threads. They significantly speed up compute-intensive parallel data encryption thanks to their tremendous number of processing cores and non-generic computational processing architecture […]

CUDA

Jun, 27

Industrial Robot Collision Handling in Harsh Environments

The focus in this thesis is on robot collision handling systems, mainly collision detection and collision avoidance for industrial robots operating in harsh environments (e.g. potentially explosive atmospheres found in the oil and gas sector). Collision detection should prevent the robot from colliding and therefore avoid a potential accident. Collision avoidance builds on the concept […]

CUDA

Jun, 27

Computation on GPU of Eigenvalues and Eigenvectors of a Large Number of Small Hermitian Matrices

This paper presents a Graphics Processing Unit (GPU) implementation of the QR-Householder algorithm that finds all the eigenvalues and eigenvectors of many small Hermitian matrices in double precision within a very short time, in order to meet the timing constraints of radar applications.

CUDA

Jun, 27

Image Noise Removal on Heterogeneous CPU-GPU Configurations

A parallel algorithm to remove impulsive noise in digital images using heterogeneous CPU/GPU computing is proposed. The parallel denoising algorithm is based on the peer group concept and uses a Euclidean metric. To determine how many pixels should be allocated to the multi-core CPU and to the GPUs, a performance analysis using large images is presented. […]

CUDA
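The peer-group test mentioned above can be illustrated with a minimal sequential sketch. This is not the authors' implementation: the 3x3 window, the thresholds `d` and `m`, the function name `peer_group_denoise`, and the use of the vector median as the replacement value are all assumptions made for illustration. A pixel is treated as noise-free when at least `m` of its neighbours lie within Euclidean RGB distance `d` of it; otherwise it is flagged as an impulse and replaced.

```python
import math

def peer_group_denoise(img, d=45.0, m=2):
    """Detect and replace impulsive noise in an RGB image with a peer-group test.

    img: list of rows, each row a list of (r, g, b) tuples.
    d:   Euclidean distance threshold in RGB space (hypothetical default).
    m:   minimum number of close neighbours for a pixel to be considered
         noise-free (hypothetical default).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # the input image is left untouched
    for y in range(h):
        for x in range(w):
            # Collect the 3x3 window around (y, x), clipped at the borders.
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            centre = img[y][x]
            # Peer count: pixels within distance d of the centre, minus the
            # centre itself (which is always at distance 0).
            peers = sum(1 for p in window if math.dist(p, centre) <= d) - 1
            if peers < m:
                # Flagged as an impulse: replace it with the vector median of
                # the window, i.e. the pixel with the smallest total distance
                # to all other pixels in the window.
                out[y][x] = min(window,
                                key=lambda p: sum(math.dist(p, q) for q in window))
    return out
```

Each output pixel depends only on its own window in the input image, so the loop body could run as one GPU thread per pixel, and rows could be split between CPU cores and GPUs in whatever proportion a performance analysis suggests.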

Jun, 27

Speeding up a Video Summarization Approach Using GPUs and Multicore CPUs

The recent progress of digital media has stimulated the creation, storage and distribution of data, such as digital videos, generating a large volume of data and requiring efficient technologies to increase the usability of these data. Video summarization methods generate concise summaries of video contents and enable faster browsing, indexing and accessing of large video […]

CUDA

Jun, 26

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and […]

OpenCL

Jun, 26

Effect And Analysis of Elastic Fidelity Computing On GPUs

The graphics processing unit (GPU) has become an integral part of high-end computing, where it plays a vital role. Although GPUs can readily reduce execution time, this comes at the expense of power and energy consumption. There are various approaches to reducing power and energy consumption, Elastic Fidelity Computing (EFC) in […]

CUDA • OpenCL

Jun, 26

Fast LBP Face Detection on low-power SIMD architectures

This paper presents an embedded implementation of a face detection method based on boosted LBP features for Single Instruction Multiple Data (SIMD) architectures. The implementation exploits parallelism and data reuse in the detection algorithm and is integrated into CogniVue’s Gen-1 APEX platform, which uses a SIMD design and is extremely energy efficient. The proposed embedded […]
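The basic LBP operator behind such detectors is simple enough to sketch. The code below computes the classic 8-neighbour, 3x3 LBP code; the paper's boosted features may well use a different variant (for example multi-block LBP), and the function names `lbp_code` and `lbp_image` are hypothetical.

```python
def lbp_code(img, y, x):
    """Basic 3x3 local binary pattern (LBP) code of the pixel at (y, x).

    img is a grayscale image given as a list of rows of ints, and (y, x)
    must not lie on the image border. Neighbours are visited clockwise from
    the top-left; each contributes one bit, set when it is >= the centre.
    """
    # Clockwise neighbour offsets (dy, dx), starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << (7 - bit)  # the first neighbour is the most significant bit
    return code

def lbp_image(img):
    """LBP codes for all interior pixels, returned as a list of rows."""
    return [[lbp_code(img, y, x) for x in range(1, len(img[0]) - 1)]
            for y in range(1, len(img) - 1)]
```

Every pixel runs the identical sequence of eight comparisons, so neighbouring pixels can be processed in lockstep on SIMD lanes, with the comparisons turned into masks rather than branches.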

Jun, 26

Multi-GPU Implementation of a Hybrid Thermal Lattice Boltzmann Solver using the TheLMA Framework

In this contribution, a single-node multi-GPU thermal lattice Boltzmann solver is presented. We implement a simplified version of the hybrid model developed by Lallemand and Luo in 2003, which combines multiple-relaxation-time lattice Boltzmann for the fluid flow with a finite-difference method for temperature. The program is based on the TheLMA framework which was developed for […]

CUDA

Jun, 26

C and CUDA Implementation for SIRT and SART Reconstruction Algorithms

Tomographic reconstruction techniques deserve study because of their many applications in interdisciplinary fields. Because they do not need a set of uniformly distributed projections for precise reconstruction, make it easy to incorporate a priori knowledge about the reconstructed object, and give good image quality, we chose the Simultaneous Iterative Reconstruction Technique (SIRT) and the Simultaneous Algebraic Reconstruction Technique (SART) as […]

CUDA
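For reference, the standard SIRT iteration updates all pixels at once from the residuals of all rays: x_{k+1} = x_k + lambda * C * A^T * R * (b - A x_k), where R and C are diagonal matrices holding the inverse row and column sums of the system matrix A. The sketch below is a minimal dense, sequential version of that textbook update, not the paper's C/CUDA code; the function name `sirt`, the zero initial guess and the default relaxation factor are assumptions. SART applies the same kind of update one projection angle (a subset of the rows) at a time instead of to all rows at once.

```python
def sirt(A, b, iterations=100, relax=1.0):
    """Simultaneous Iterative Reconstruction Technique for A x = b.

    A:     system (projection) matrix as a list of rows of non-negative weights.
    b:     measured projection values, one per ray (row of A).
    relax: relaxation factor lambda, assumed to lie in (0, 2).
    Returns the reconstructed image as a flat list of pixel values.
    """
    m, n = len(A), len(A[0])
    # Row sums (per ray) and column sums (per pixel) define R and C.
    row_sum = [sum(row) for row in A]
    col_sum = [sum(A[i][j] for i in range(m)) for j in range(n)]
    x = [0.0] * n  # zero initial guess
    for _ in range(iterations):
        # Residual of every ray from the current estimate, scaled by R.
        r = [(b[i] - sum(A[i][j] * x[j] for j in range(n))) / row_sum[i]
             if row_sum[i] else 0.0
             for i in range(m)]
        # Back-project all residuals at once, scaled by C ("simultaneous").
        for j in range(n):
            if col_sum[j]:
                x[j] += relax * sum(A[i][j] * r[i] for i in range(m)) / col_sum[j]
    return x
```

In a GPU port, the residual step (one thread per ray) and the back-projection step (one thread per pixel) could become two separate kernels, since each step reads only the results of the previous one.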

Jun, 25

Customizing Driving Directions with GPUs

Computing driving directions interactively on continental road networks requires preprocessing. This step can be costly, limiting our ability to incorporate new optimization functions, including traffic information or personal preferences. We show how the performance of the state-of-the-art customizable route planning (CRP) framework is boosted by GPUs, even though it has highly irregular structure. Our experimental […]

CUDA

Jun, 25

Concurrent Analytical Query Processing with GPUs

In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, we observe that the utilization of main GPU resources is only up […]

CUDA
