Posts
Apr, 19
Performance Exploration of Selected Manually and Automatically Parallelized Codes on GPUs
General-Purpose computing on GPUs (GPGPU) provides the opportunity to utilize the tremendous computational power of graphics accelerators for a wider set of problems. These devices leverage massive parallelism to achieve high performance; however, creating highly parallelized code that is optimized for the characteristics of GPUs is no simple task. The polyhedron model is used successfully […]
Apr, 19
GPU-Accelerated Numerical Simulations of the Knudsen Gas on Time-Dependent Domains
We consider the long-time behaviour of a free-molecular gas in a time-dependent vessel with absorbing boundary, in any space dimension. We first show, at the theoretical level, that the convergence towards equilibrium heavily depends on the initial data and on the time evolution law of the vessel. Subsequently, we describe a numerical strategy to simulate […]
Apr, 19
Algorithm Construction for GPGPU
Today every personal computer and almost every work-related computer has a GPU powerful enough to be used as a supplementary computational device. One framework that enables this is OpenCL. We asked how one writes efficient algorithms for these GPGPU devices. We found that there are two major ways to run […]
Apr, 19
Multi-level Parallelism for Time- and Cost-efficient Parallel Discrete Event Simulation on GPUs
Developing complex technical systems requires a systematic exploration of the given design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires an extensive number of simulation runs. This in turn results in excessive runtime demands which severely hamper thorough design […]
Apr, 19
GPU computing in medical physics: A review
The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization […]
Apr, 18
Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs
In this paper, we consider the automatic performance tuning of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of level 1 and level 2 routines in the Basic Linear Algebra Subroutines (BLAS) library and are therefore of great importance in many scientific applications. As examples, we develop single-precision CUDA kernels for […]
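The paper's kernels are single-precision CUDA, but the core auto-tuning loop it describes can be sketched in plain Python: enumerate candidate configurations (here a hypothetical row-block size standing in for a thread-block size), time each on a representative input, and keep the fastest. The names `matvec_blocked` and `autotune` are illustrative, not from the paper.

```python
import time

def matvec_blocked(A, x, block):
    """Dense matrix-vector product, processing `block` rows at a time.
    A stand-in for a GPU kernel whose block size is being tuned."""
    n = len(A)
    y = [0.0] * n
    for start in range(0, n, block):
        for i in range(start, min(start + block, n)):
            y[i] = sum(a * b for a, b in zip(A[i], x))
    return y

def autotune(A, x, candidates=(1, 2, 4, 8, 16, 32)):
    """Time each candidate block size and return the fastest one."""
    best, best_t = None, float("inf")
    for block in candidates:
        t0 = time.perf_counter()
        matvec_blocked(A, x, block)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = block, elapsed
    return best

A = [[float(i + j) for j in range(64)] for i in range(64)]
x = [1.0] * 64
print(autotune(A, x))
```

In a real tuner the timed kernel runs on the GPU and each candidate is launched several times to average out noise; the selection logic stays the same.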
Apr, 18
Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study
In this paper, we analyze the trade-offs encountered when minimizing the total execution time of rake-based applications on GPUs. We use clustering data streams as a case study, and present a rake-based implementation for it, making it more efficient in terms of memory usage. In order to maximize performance for different problem sizes and […]
Apr, 18
MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL
With demand for high-performance computing at an all-time high, especially from the scientific/numerical analysis community, leveraging the power of existing heterogeneous architectures has become increasingly desirable. The attempt to use GPUs for non-graphics computations has bred programming models and innovative architectures that have trended towards a general-purpose computing platform. The latest generation of programming tools […]
Apr, 18
OpenCL vs. OpenMP: A Programmability Debate
OpenCL and OpenMP are the most commonly used programming models for homogeneous multi-core processors. They are also fundamentally different in their approach to parallelization, in terms of granularity level, explicit/implicit constructs, and usability. In this paper, we compare these two models in terms of programmability, with a special focus on performance and productivity. For our […]
Apr, 18
High-Performance Matrix-Vector Multiplication on the GPU
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing […]
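The usual GPU decomposition for dense matrix-vector multiplication assigns (at least) one thread per output element; each thread computes one row's dot product independently. A minimal Python sketch of that decomposition (not the paper's Fermi-optimized kernel, which also tiles and coalesces memory accesses):

```python
def matvec_row(A, x, i):
    # Each GPU thread computes one entry of y; here, one call per entry.
    return sum(a_ij * x_j for a_ij, x_j in zip(A[i], x))

def matvec(A, x):
    # On the GPU, this loop over rows is what runs in parallel.
    return [matvec_row(A, x, i) for i in range(len(A))]

A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
print(matvec(A, x))  # → [3.0, 7.0]
```

The row-per-thread view captures the parallelism; "fully utilizing" the hardware, as the paper puts it, is then a matter of how those rows are mapped to warps and how A is laid out in memory.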
Apr, 17
Implementation of Massive Artificial Neural Networks with CUDA
People have always been amazed by the inner workings of the human brain. The brain is capable of solving a variety of problems that are unsolvable by any computer. It is capable of detecting minute changes in light, sound, or smell. It is capable of instantly recognizing a face or accurately reading handwritten text. The brain […]
Apr, 17
Monte Carlo Modeling of Electron Transport Using CUDA Technology
Statistical algorithms are presented for modeling the interaction processes between electrons and matter. A software implementation has been developed for hybrid supercomputers making use of NVIDIA CUDA technology. Standard Monte Carlo schemes are modified for effectively exploiting the parallel computing capabilities of graphical processors. The model of individual collisions (MIC) is used to describe the […]
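A building block common to Monte Carlo transport codes like this one is sampling the free-flight distance between collisions from an exponential distribution, s = −ln(u)/Σ, where Σ is the total interaction cross-section. A minimal serial sketch (the cross-section value and function names are illustrative, not taken from the paper; the GPU version runs one such particle history per thread):

```python
import math
import random

def free_path(sigma_total, rng):
    """Sample a free-flight distance: s = -ln(u) / sigma_total,
    with u uniform in (0, 1]."""
    u = 1.0 - rng.random()  # shift to (0, 1] so log(0) cannot occur
    return -math.log(u) / sigma_total

def mean_free_path_estimate(sigma_total, n, seed=0):
    """Monte Carlo estimate of the mean free path, 1 / sigma_total."""
    rng = random.Random(seed)
    return sum(free_path(sigma_total, rng) for _ in range(n)) / n

# With sigma_total = 2.0, the sample mean should approach 0.5.
est = mean_free_path_estimate(sigma_total=2.0, n=100_000)
print(est)
```

The modification the abstract alludes to, adapting such schemes for GPUs, mostly concerns giving each thread its own independent random-number stream and keeping the per-particle state small.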