high performance computing on graphics processing units: hgpu.org

Posts

Sep, 16

Parallel Implementation of Moving Averages and Stock Market Prediction

In recent years, graphics processing units have made parallel processing affordable at the price of a personal desktop computer. This report investigates the computational aspects of calculating the simple moving average and the exponential moving average, two of the most popular financial indicators. In this report, we also investigate the use of GPUs to run artificial neural […]

CUDA
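For readers unfamiliar with the two indicators named in the abstract above, here is a minimal sequential sketch in Python. It is not the report's implementation: the function names, the `window` and `alpha` parameters, and the choice to seed the exponential average with the first value are all assumptions made for illustration.

```python
def simple_moving_average(values, window):
    """Return the mean of each full window of `window` consecutive values.

    Hypothetical helper, not taken from the report.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        if i >= window - 1:
            averages.append(running_sum / window)
    return averages


def exponential_moving_average(values, alpha):
    """Return the EMA series, seeded with the first value (an assumption).

    Each output is alpha * current value + (1 - alpha) * previous output.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    averages = []
    ema = None
    for value in values:
        ema = value if ema is None else alpha * value + (1.0 - alpha) * ema
        averages.append(ema)
    return averages


if __name__ == "__main__":
    prices = [1, 2, 3, 4, 5]
    print(simple_moving_average(prices, 3))
    print(exponential_moving_average(prices, 0.5))
```

Each simple moving average output depends only on its own window, so the outputs can be computed independently, whereas the exponential form carries a dependency from one output to the next.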

Sep, 16

Accelerating the Smith-Waterman Algorithm for Bio-sequence Matching on GPU

Nowadays, the GPU has emerged as a promising computing platform to accelerate bio-sequence analysis applications by exploiting all kinds of parallel optimization strategies. In this paper, we take a well-known algorithm in the field of pairwise sequence alignment and database searching, the Smith-Waterman (S-W) algorithm, as an example, and demonstrate approaches that fully exploit its performance […]

CUDA
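As background for the entry above, the following is a minimal CPU-only sketch of the Smith-Waterman local alignment score in Python. It is not the paper's GPU approach: it assumes a linear gap penalty, and the function name and the `match`, `mismatch`, and `gap` values are illustrative choices.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Hypothetical helper with an assumed linear gap penalty.
    Only two rows of the scoring matrix are kept in memory.
    """
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                0,                                   # start a new local alignment
                previous_row[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous_row[j] + gap,               # gap in b
                current_row[j - 1] + gap,            # gap in a
            )
            current_row.append(cell)
            best = max(best, cell)
        previous_row = current_row
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
```

Each cell depends on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another.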

Sep, 16

High Performance Computing on Astrophysics with Artificial Intelligence Algorithms

This paper presents the applications that have been developed in astrophysics by using Artificial Intelligence (AI) algorithms and high performance computing and the ongoing research with grid computing. In astrophysics, we deal with the time delay problem. Nowadays, the time delay is estimated from observed data gathered from radio or optical telescopes around the world. […]

CUDA

Sep, 15

NT-SIM: A Co-Simulator for Networked Signal Processing Applications

In networked signal processing systems, network nodes that perform embedded processing on sensory inputs and other data interact across wired or wireless communication networks. In such applications, the processing on individual network nodes can be described in terms of dataflow graphs. However, to analyze the correctness and performance of these applications, designers must understand the […]

CUDA

Sep, 15

Real-time Kd-tree Based Importance Sampling of Environment Maps

We present a new real-time importance sampling algorithm for environment maps. Our method is based on representing environment maps using kd-tree structures, and generating samples with a single data lookup. An efficient algorithm has been developed for real-time image-based lighting applications. In this paper, we compare our algorithm with the inversion method [Fishman 1996]. We show […]

CUDA

Sep, 15

Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm

In this paper, we address the design and implementation of GPU-accelerated Branch-and-Bound algorithms (B&B) for solving Flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multi-threaded accelerators using the SIMD model at execution. A major issue which arises when executing on GPU a B&B […]

Sep, 15

Efficient computation of condition estimates for linear least squares problems

Linear least squares (LLS) is a classical linear algebra problem in scientific computing, arising for instance in many parameter estimation problems. In addition to computing LLS solutions efficiently, an important issue is to assess the numerical quality of the computed solution. The notion of conditioning provides a theoretical framework that can be used to measure […]

CUDA

Sep, 15

High-Throughput parallel blind Virtual Screening using BINDSURF

BACKGROUND: Virtual Screening (VS) methods can considerably aid clinical research, predicting how ligands interact with drug targets. Most VS methods suppose a unique binding site for the target, usually derived from the interpretation of the protein crystal structure. However, it has been demonstrated that in many cases, diverse ligands interact with unrelated parts of the […]

CUDA

Sep, 14

Parallelize L-BFGS-B on the GPU

Nonlinear optimization is at the heart of many algorithms in engineering. Recently, with the rise of general-purpose graphics processing units (GPGPU), it has become promising to investigate how much the performance of optimization methods improves once they are parallelized. While much has been done for simple optimization methods such as conjugate gradient, due to the strong dependencies contained, […]

CUDA

Sep, 14

An Optimized Parallel IDCT on Graphics Processing Units

In this paper, we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. By exploiting the fact that most of the IDCT input data for real videos are zero-valued coefficients, a new compacted data representation is created that allows for several optimizations. Experimental evaluations […]

OpenCL

Sep, 14

Parallel Ray Tracing Simulations with MATLAB for Dynamic Lens Systems

Ray tracing simulations are required for investigating the dynamical behavior of optical systems. By means of image simulations, an exposed image can be generated. However, this requires a high number of rays which have to be traced through an optical system. Since all rays are independent of each other, they can be traced individually using […]

CUDA

Sep, 14

On the Validation and Applications of a Parallel Flexible Multi-Body Dynamics Implementation

This work discusses how a flexible body formalism, specifically, the Absolute Nodal Coordinate Formulation (ANCF), is combined with the Discrete Element Method (DEM) and the Newmark implicit integration method to address many-body dynamics problems; i.e., problems with hundreds of thousands of rigid and deformable bodies. DEM is used to model friction and contact between elements, […]

CUDA
