high performance computing on graphics processing units: hgpu.org

Posts

Sep, 16

Parallel Benefit on Different Programming Paradigms

Multi-core platforms have become ubiquitous; even laptops now contain multi-core processors. A chip, socket, or die holds multiple cores, and a computing node contains multiple chips. Both the number of multi-core platforms and the number of cores on each are increasing rapidly. How to enjoy the benefits of parallel computing on the […]

CUDA

Sep, 16

Parallel Implementation of Moving Averages and Stock Market Prediction

In recent years, graphics processing units have made parallel processing affordable at the price of a personal desktop computer. This report investigates the computational aspects of calculating the simple moving average and the exponential moving average, two of the most popular financial indicators. In this report, we also investigate the use of GPUs to run artificial neural […]

CUDA

Sep, 16

Accelerating the Smith-Waterman Algorithm for Bio-sequence Matching on GPU

GPUs have emerged as a promising computing platform for accelerating bio-sequence analysis applications by exploiting a variety of parallel optimization strategies. In this paper, we take a well-known algorithm in the field of pairwise sequence alignment and database searching, the Smith-Waterman (S-W) algorithm, as an example and demonstrate approaches that fully exploit its performance […]

CUDA

Sep, 16

High Performance Computing on Astrophysics with Artificial Intelligence Algorithms

This paper presents the applications that have been developed in astrophysics by using Artificial Intelligence (AI) algorithms and high performance computing and the ongoing research with grid computing. In astrophysics, we deal with the time delay problem. Nowadays, the time delay is estimated from observed data gathered from radio or optical telescopes around the world. […]

CUDA

Sep, 15

NT-SIM: A Co-Simulator for Networked Signal Processing Applications

In networked signal processing systems, network nodes that perform embedded processing on sensory inputs and other data interact across wired or wireless communication networks. In such applications, the processing on individual network nodes can be described in terms of dataflow graphs. However, to analyze the correctness and performance of these applications, designers must understand the […]

CUDA

Sep, 15

Real-time Kd-tree Based Importance Sampling of Environment Maps

We present a new real-time importance sampling algorithm for environment maps. Our method is based on representing environment maps using kd-tree structures and generating samples with a single data lookup. An efficient algorithm has been developed for real-time image-based lighting applications. In this paper, we compare our algorithm with the inversion method [Fishman 1996]. We show […]

CUDA

Sep, 15

Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm

In this paper, we address the design and implementation of GPU-accelerated Branch-and-Bound algorithms (B&B) for solving Flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multi-threaded accelerators using the SIMD model at execution. A major issue which arises when executing on GPU a B&B […]

Sep, 15

Efficient computation of condition estimates for linear least squares problems

Linear least squares (LLS) is a classical linear algebra problem in scientific computing, arising for instance in many parameter estimation problems. In addition to computing LLS solutions efficiently, an important issue is to assess the numerical quality of the computed solution. The notion of conditioning provides a theoretical framework that can be used to measure […]

CUDA

Sep, 15

High-Throughput parallel blind Virtual Screening using BINDSURF

BACKGROUND: Virtual Screening (VS) methods can considerably aid clinical research, predicting how ligands interact with drug targets. Most VS methods suppose a unique binding site for the target, usually derived from the interpretation of the protein crystal structure. However, it has been demonstrated that in many cases, diverse ligands interact with unrelated parts of the […]

CUDA

Sep, 14

Parallelize L-BFGS-B on the GPU

Nonlinear optimization is at the heart of many algorithms in engineering. With the recent rise of the general-purpose graphics processing unit (GPGPU), it is promising to investigate how much the performance of optimization methods improves after parallelization. While much has been done for simple optimization methods such as conjugate gradient, due to the strong dependencies contained, […]

CUDA

Sep, 14

An Optimized Parallel IDCT on Graphics Processing Units

In this paper, we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. By exploiting the fact that most of the IDCT input data for real videos are zero-valued coefficients, a new compacted data representation is created that allows for several optimizations. Experimental evaluations […]

OpenCL

Sep, 14

Parallel Ray Tracing Simulations with MATLAB for Dynamic Lens Systems

Ray tracing simulations are required for investigating the dynamical behavior of optical systems. By means of image simulations, an exposed image can be generated. However, this requires a high number of rays which have to be traced through an optical system. Since all rays are independent of each other, they can be traced individually using […]

CUDA

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)