high performance computing on graphics processing units: hgpu.org

Posts

Oct, 26

Quasars spectra classification with the help of GPU computing

Finding interesting celestial objects among tens of thousands, or even millions, of records of raw data is not an easy task. In this paper we speed up this process with Thrust, a high-level NVIDIA CUDA C++ template library, which makes our database with an R interface much more efficient.

CUDA

Oct, 26

Efficient Probabilistic Latent Semantic Indexing using Graphics Processing Unit

In this paper, we propose a scheme to accelerate Probabilistic Latent Semantic Indexing (PLSI), an automated document indexing method based on a statistical latent semantic model, by exploiting the high parallelism of the Graphics Processing Unit (GPU). Our proposal is composed of three techniques: the first one is to accelerate the Expectation-Maximization (EM) computation […]

CUDA

Oct, 26

A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU

This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative […]

CUDA
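As a rough illustration of the kind of sparse matrix vector product discussed in the entry above, here is a minimal CPU sketch of a plain ELL-style layout with per-row lengths. It is not the Sliced ELLR-T format from the paper, whose details are not given in the excerpt; the function names `to_ell` and `ell_spmv` are hypothetical and chosen only for this example.

```python
def to_ell(dense):
    """Convert a dense matrix (list of rows) into an ELL-style layout.

    Returns (values, cols, row_len). Every row is padded to the width of
    the longest row; row_len records how many entries are real.
    This is a simplified relative of ELLR-T, not the paper's format.
    """
    row_len = [sum(1 for v in row if v != 0) for row in dense]
    width = max(row_len) if row_len else 0
    values, cols = [], []
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nonzeros)
        cols.append([j for j, _ in nonzeros] + [0] * pad)
        values.append([v for _, v in nonzeros] + [0.0] * pad)
    return values, cols, row_len


def ell_spmv(values, cols, row_len, x):
    """Compute y = A @ x, reading only the row_len[i] real entries of row i.

    On a GPU each row (or group of rows) would typically be handled by its
    own thread(s); here the rows are simply processed in a loop.
    """
    y = []
    for i in range(len(values)):
        acc = 0.0
        for k in range(row_len[i]):
            acc += values[i][k] * x[cols[i][k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[4, 0, 1],
         [0, 0, 2],
         [3, 5, 0]]
    print(ell_spmv(*to_ell(A), [1, 2, 3]))
```

Storing the row lengths lets the inner loop skip the padding, which is the basic idea that ELL variants with a row-length array build on.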

Oct, 26

Accelerated MD Program Using CUDA Technology

Molecular dynamics (MD) simulation has proven to be an important tool for studying the structure and physical properties of materials at the atomic level. However, it requires a huge amount of computing time, which limits its ability to handle large-scale simulations. In this paper we present a solution to speed up […]

CUDA

Oct, 26

Evaluation of Speedup of Monte Carlo Calculations of Two Simple Reactor Physics Problems Coded for the GPU/CUDA Environment

Monte Carlo simulation is ideally suited for solving the Boltzmann neutron transport equation in inhomogeneous media. However, routine applications require the computation time to be reduced to hours or even minutes on a desktop system. The interest in adopting GPUs for Monte Carlo acceleration is rapidly mounting, fueled partially by the parallelism afforded by the latest […]

CUDA

Oct, 26

Flexible neuronal network simulation framework using code generation for NVidia CUDA

Simulating large-scale computer models of brain structures with spiking neuronal networks has become increasingly popular and feasible with the advent of general-purpose computing on graphics processing units (GPGPU). Modern graphics cards, such as the NVidia range supporting the Compute Unified Device Architecture (CUDA), provide massively parallel computing architectures for this purpose. Earlier GPU […]

CUDA

Oct, 26

Hand Tracking based on Hierarchical Clustering of Range Data

Fast and robust hand segmentation and tracking is an essential basis for gesture recognition and thus an important component of contact-less human-computer interaction (HCI). Hand gesture recognition based on 2D video data has been intensively investigated. However, in practical scenarios purely intensity-based approaches suffer from uncontrollable environmental conditions like cluttered background colors. In this […]

Oct, 25

Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

High performance computing using graphics processing units (GPUs) is gaining popularity in the scientific computing field, with many large compute clusters being augmented with multiple GPUs in each node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters of up to 128 GPUs. This work […]

CUDA

Oct, 25

Current and Nascent SETI Instruments in the Radio and Optical

Here we describe our ongoing efforts to develop high-performance and sensitive instrumentation for use in the search for extra-terrestrial intelligence (SETI). These efforts include our recently deployed Search for Extraterrestrial Emissions from Nearby Developed Intelligent Populations Spectrometer (SERENDIP V.v) and two instruments currently under development: the Heterogeneous Radio SETI Spectrometer (HRSS) for SETI observations in […]

Oct, 25

Design and Implementation of GPU-Based Prim’s Algorithm

The minimum spanning tree is a classical problem in graph theory that plays a key role in a broad range of applications. This paper proposes a minimum spanning tree algorithm using Prim’s approach on an Nvidia GPU under the CUDA architecture. By using a newly developed GPU-based Min-Reduction data parallel primitive in the key step of the algorithm, higher […]

CUDA
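To make the role of the min-reduction step in the entry above concrete, here is a minimal sequential sketch of Prim's algorithm in which picking the next vertex is written as a single minimum over all candidates. On a GPU that minimum is the part that would be computed by a parallel reduction; the paper's actual primitive is not described in the excerpt, so this is only an assumption-laden illustration, and the name `prim_mst` is hypothetical.

```python
import math


def prim_mst(weights):
    """Prim's algorithm on an adjacency matrix.

    weights[u][v] is the edge weight, or None if there is no edge.
    Returns (total_weight, list_of_tree_edges).
    """
    n = len(weights)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0           # start from vertex 0
    total, edges = 0.0, []
    for _ in range(n):
        # Min-reduction step: find the cheapest vertex not yet in the tree.
        # This is the step a GPU version would run as a parallel reduction.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        if best[u] == math.inf:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relaxation step: each update is independent, so it is data parallel.
        for v in range(n):
            w = weights[u][v]
            if w is not None and not in_tree[v] and w < best[v]:
                best[v] = w
                parent[v] = u
    return total, edges


if __name__ == "__main__":
    N = None
    graph = [[N, 1, 4, N],
             [1, N, 2, 6],
             [4, 2, N, 3],
             [N, 6, 3, N]]
    print(prim_mst(graph))
```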

Oct, 25

Parallel Execution of AES-CTR Algorithm Using Extended Block Size

Data encryption and decryption are common operations in secure network-based application programs. In order to keep pace with the input data rate in such applications, real-time processing of data encryption/decryption is essential. For example, in an environment where multimedia data is streamed, high-speed data encryption/decryption is crucial. In this paper, […]

CUDA
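Counter (CTR) mode lends itself to parallel execution because the keystream for each block depends only on the key, a nonce, and that block's index. The sketch below illustrates that property only. It assumes a stand-in keystream function built from SHA-256 rather than real AES (the Python standard library has no AES), it does not reproduce the extended block size scheme of the paper, and all function names are hypothetical. It must not be used for actual encryption.

```python
import hashlib

BLOCK = 16  # bytes per block, matching the AES block size


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for AES_encrypt(key, nonce || counter). NOT real AES.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()[:BLOCK]


def ctr_xor_block(key: bytes, nonce: bytes, index: int, block: bytes) -> bytes:
    # XOR one block with its keystream; needs no data from any other block.
    ks = keystream_block(key, nonce, index)
    return bytes(b ^ k for b, k in zip(block, ks))


def ctr_process(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data (the same operation in CTR mode)."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    out = [b""] * len(blocks)
    # Blocks are visited in reverse on purpose: the order does not matter,
    # which is why each one could be handed to a separate GPU thread.
    for i in reversed(range(len(blocks))):
        out[i] = ctr_xor_block(key, nonce, i, blocks[i])
    return b"".join(out)


if __name__ == "__main__":
    key, nonce = b"k" * 16, b"n" * 8
    message = b"streamed multimedia payload, 40 bytes lo"
    ciphertext = ctr_process(key, nonce, message)
    print(ctr_process(key, nonce, ciphertext) == message)
```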

Oct, 25

The Model of Computation of CUDA and its Formal Semantics

We formalize the model of computation of modern graphics cards based on the specification of Nvidia’s Compute Unified Device Architecture (CUDA). CUDA programs are executed by thousands of threads concurrently and have access to several different types of memory with unique access patterns and latencies. The underlying hardware uses a single instruction, multiple threads execution […]

CUDA


Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)