high performance computing on graphics processing units: hgpu.org

Posts

Dec, 31

Accelerator weather forecasting

Advection is the transport of a quantity by fluid flow, and it is an important, computationally intensive part of any fluid simulation. OpenACC GPU acceleration of the advection components of MONC, an atmospheric large-eddy simulation (LES) model, was pursued. Although this yielded no speedup, the reasons are examined, along with the conditions under which it may become […]
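To show why advection maps naturally onto data-parallel loops, here is a minimal sketch of 1D linear advection with a first-order upwind scheme. The scheme, the constant positive velocity and the periodic boundaries are assumptions chosen for brevity; MONC's own advection schemes are more sophisticated. The function name `upwind_advect` is hypothetical.

```python
def upwind_advect(q, u, dx, dt, steps):
    """Advance dq/dt + u * dq/dx = 0 using first-order upwinding.

    Assumptions (for illustration only): constant velocity u > 0 and
    periodic boundaries, so q[i - 1] wraps around to q[-1] at i = 0.
    """
    c = u * dt / dx  # Courant number; this explicit scheme needs 0 <= c <= 1
    if not 0.0 <= c <= 1.0:
        raise ValueError("CFL condition violated")
    n = len(q)
    for _ in range(steps):
        # Every new value depends only on old neighbouring values, so each
        # cell could be updated independently, e.g. by one GPU thread in an
        # OpenACC parallel loop.
        q = [q[i] - c * (q[i] - q[i - 1]) for i in range(n)]
    return q


# With a Courant number of exactly 1 the profile shifts by one cell per step.
print(upwind_advect([0.0, 1.0, 0.0, 0.0], u=1.0, dx=1.0, dt=1.0, steps=1))
```

The per-cell work here is tiny (a few flops per memory access), which is one general reason why offloading a single advection kernel may not pay off once data-transfer costs are included.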

Dec, 31

Study of basic vector operations on Intel Xeon Phi and NVIDIA Tesla using OpenCL

The present work is an analysis of the performance of the basic vector operations AXPY, DOT and SpMV using OpenCL. The code was tested on the NVIDIA Tesla S2050 GPU and Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented, the routine to be executed by […]

OpenCL
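As a reference for what the three kernels compute, here is a minimal serial sketch of AXPY, DOT and SpMV. The CSR (compressed sparse row) storage format for SpMV is an assumption; the abstract does not say which sparse format was used. The function names are hypothetical.

```python
def axpy(a, x, y):
    """Return a * x + y elementwise; each element is independent work."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Inner product; on a GPU this becomes a parallel reduction."""
    return sum(xi * yi for xi, yi in zip(x, y))


def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x with A stored in CSR format (assumed format).

    values[k] is the k-th nonzero, col_idx[k] its column, and the nonzeros
    of row r occupy positions row_ptr[r] .. row_ptr[r + 1] - 1.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# A = [[4, 0, 1], [0, 2, 0], [3, 0, 5]] in CSR form, multiplied by [1, 2, 3].
print(spmv_csr([4.0, 1.0, 2.0, 3.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5], [1.0, 2.0, 3.0]))
```

AXPY and DOT are purely memory-bound, while SpMV adds irregular, indirect accesses through `col_idx`, which is why the three operations stress a device differently.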

Dec, 31

A Deep Generative Deconvolutional Image Model

A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework. Stochastic unpooling is employed to link consecutive layers in the model, yielding top-down image generation. A Bayesian support vector machine is linked to the top-layer features, yielding max-margin discrimination. Deep deconvolutional inference is employed when testing, […]

CUDA
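The idea of stochastic unpooling can be illustrated with a simplified sketch: each value of a pooled map is placed at one randomly selected position inside its p x p block, and the remaining positions are set to zero. The uniform choice of position is an assumption made here for simplicity; in a full generative model the position would follow a learned distribution. The function name `stochastic_unpool` is hypothetical.

```python
import random


def stochastic_unpool(pooled, p, rng=random):
    """Expand an H x W map to (H * p) x (W * p).

    Each pooled value lands at one randomly chosen location inside its
    p x p block; the other p * p - 1 locations in that block are zero.
    The uniform choice of location is an illustrative assumption.
    """
    h, w = len(pooled), len(pooled[0])
    out = [[0.0] * (w * p) for _ in range(h * p)]
    for i in range(h):
        for j in range(w):
            k = rng.randrange(p * p)  # index of the chosen cell in the block
            di, dj = divmod(k, p)
            out[i * p + di][j * p + dj] = pooled[i][j]
    return out


for row in stochastic_unpool([[1.0, 2.0], [3.0, 4.0]], 2, random.Random(0)):
    print(row)
```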

Dec, 31

Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping on manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and renowned […]

CUDA • OpenCL
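The 3D-FWT is separable: one level applies a 1D transform along x, then along y, then along z, and every 1D line within a pass is independent, which is the parallelism that CUDA, OpenCL, OpenMP or Pthreads implementations can exploit. The sketch below uses the orthonormal Haar filter purely as the simplest stand-in; the filters used in the paper may differ. Function names are hypothetical, and the cube side length is assumed to be even.

```python
import math


def haar_1d(v):
    """One level of the orthonormal Haar transform: [approximations | details]."""
    s = 1.0 / math.sqrt(2.0)
    half = len(v) // 2
    approx = [(v[2 * i] + v[2 * i + 1]) * s for i in range(half)]
    detail = [(v[2 * i] - v[2 * i + 1]) * s for i in range(half)]
    return approx + detail


def haar_3d(cube):
    """One level of a separable 3D Haar transform on an n x n x n list (n even)."""
    n = len(cube)
    out = [[list(row) for row in plane] for plane in cube]
    # Pass 1: along x (innermost index); each row is independent.
    for z in range(n):
        for y in range(n):
            out[z][y] = haar_1d(out[z][y])
    # Pass 2: along y; each (z, x) column is independent.
    for z in range(n):
        for x in range(n):
            col = haar_1d([out[z][y][x] for y in range(n)])
            for y in range(n):
                out[z][y][x] = col[y]
    # Pass 3: along z; each (y, x) column is independent.
    for y in range(n):
        for x in range(n):
            col = haar_1d([out[z][y][x] for z in range(n)])
            for z in range(n):
                out[z][y][x] = col[z]
    return out


# A constant 2 x 2 x 2 cube puts all its energy into the low-pass coefficient.
print(haar_3d([[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]])[0][0][0])
```

The column passes (y and z) access memory with large strides, which is typically where GPU implementations spend effort on data layout.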

Dec, 31

Accelerating Fluids Simulation Using SPH and Implementation on GPU

Fluid simulation is usually performed with CFD methods, which offer high precision but need days, weeks or months of computation on desktop CPUs; this limits their practical use in industrial control systems. To reduce the computation time, the Smoothed Particle Hydrodynamics (SPH) method is used. SPH is commonly used to simulate fluids in the computer graphics field, especially […]

CUDA
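At the core of SPH, each particle's density is a kernel-weighted sum over its neighbours, rho_i = sum_j m_j W(|r_i - r_j|, h). The sketch below assumes the 3D poly6 smoothing kernel from Müller et al. (2003), which is common in graphics; the kernel used in the paper may differ. It uses a brute-force O(N^2) neighbour loop for clarity, whereas GPU implementations normally assign one thread per particle and use a spatial grid to find neighbours. Function names are hypothetical.

```python
import math


def poly6(r2, h):
    """3D poly6 smoothing kernel (assumed choice), taking the squared distance r2."""
    if r2 > h * h:
        return 0.0  # compact support: particles beyond h do not interact
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3


def densities(positions, masses, h):
    """rho_i = sum_j m_j * W(|r_i - r_j|, h), including the particle itself."""
    rho = []
    for xi in positions:
        total = 0.0
        for xj, mj in zip(positions, masses):
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += mj * poly6(r2, h)
        rho.append(total)
    return rho


print(densities([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)], [1.0, 1.0], h=1.0))
```

Each outer-loop iteration reads shared positions but writes only its own `rho` entry, so the density pass parallelises without write conflicts.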

Dec, 23

Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines

Deep learning (DL) has achieved notable successes in many machine learning tasks. A number of frameworks have been developed to expedite the process of designing and training deep neural networks (DNNs), such as Caffe, Torch and Theano. Currently they can harness multiple GPUs on a single machine, but are unable to use GPUs that are […]
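Systems that train across multiple machines generally build on synchronous data parallelism: every worker computes a gradient on its own data shard, the gradients are averaged over the network, and every replica applies the same update. The sketch below shows only that generic pattern on a one-parameter linear model; it is not Poseidon's architecture, and the model, learning rate and function names are assumptions for illustration.

```python
def local_gradient(w, shard):
    """Mean-squared-error gradient for the model y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_sgd(shards, w=0.0, lr=0.05, steps=200):
    """Synchronous data-parallel SGD with gradient averaging across workers."""
    for _ in range(steps):
        # In a real system each gradient is computed on a different GPU or machine.
        grads = [local_gradient(w, shard) for shard in shards]
        # Communication step: average the gradients (parameter server or all-reduce).
        g = sum(grads) / len(grads)
        w -= lr * g  # every replica applies the same update
    return w


# Two workers, each holding part of a data set generated by y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (-1.0, -3.0)]]
print(data_parallel_sgd(shards))
```

For real networks the averaging step moves millions of parameters per iteration, so the communication cost, not the arithmetic, is what limits scaling across machines.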

Dec, 23

On the Way to Future’s High Energy Particle Physics Transport Code

High Energy Physics (HEP) needs a huge amount of computing resources. In addition, data acquisition, transfer, and analysis require a well-developed infrastructure. To prove new physics, the luminosity of the accelerator facilities must be increased, and these facilities produce more and more data in the experimental detectors. Both testing new theories and […]

OpenCL

Dec, 23

On the Development and Implementation of High-Order Flux Reconstruction Schemes for Computational Fluid Dynamics

High-order numerical methods for unstructured grids combine the superior accuracy of high-order spectral or finite difference methods with the geometric flexibility of low-order finite volume or finite element schemes. The Flux Reconstruction (FR) approach unifies various high-order schemes for unstructured grids within a single framework. Additionally, the FR approach exhibits a significant degree of element […]

CUDA • OpenCL

Dec, 23

Real time mitigation of atmospheric turbulence in long distance imaging using the lucky region fusion algorithm with FPGA and GPU hardware acceleration

"Lucky-region" fusion (LRF) is a synthetic imaging technique that has proven successful in enhancing the quality of images distorted by atmospheric turbulence. The LRF algorithm selects sharp regions of an image obtained from a series of short exposure frames, and fuses the sharp regions into a final, improved image. In previous research, the LRF algorithm […]

OpenCL
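The select-and-fuse idea can be sketched as follows: split each frame into tiles, score every tile with a sharpness metric, and copy each tile from the frame where it scores highest. This hard per-tile selection with a squared-gradient metric is a simplification assumed for illustration; the LRF algorithm fuses regions using image-quality weighting rather than a simple tile copy. Function names are hypothetical, and frame dimensions are assumed to be multiples of the tile size.

```python
def sharpness(img, r0, c0, size):
    """Sum of squared horizontal and vertical differences inside one tile."""
    s = 0.0
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            if c + 1 < c0 + size:
                s += (img[r][c + 1] - img[r][c]) ** 2
            if r + 1 < r0 + size:
                s += (img[r + 1][c] - img[r][c]) ** 2
    return s


def lucky_fuse(frames, size):
    """For every size x size tile, copy it from the frame where it is sharpest.

    Ties are resolved in favour of the earliest frame.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            best = max(frames, key=lambda f: sharpness(f, r0, c0, size))
            for r in range(r0, r0 + size):
                for c in range(c0, c0 + size):
                    out[r][c] = best[r][c]
    return out


# Frame A is sharp in the top-left tile, frame B in the bottom-right tile.
frame_a = [[0, 9, 1, 1], [9, 0, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
frame_b = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 0, 9], [1, 1, 9, 0]]
for row in lucky_fuse([frame_a, frame_b], 2):
    print(row)
```

Tiles are scored and fused independently, which is what makes the approach a good fit for FPGA pipelines and GPU thread blocks alike.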

Dec, 23

SqueezCL: Squeezing OpenCL Kernels for Approximate Computing on Contemporary GPUs

Approximate computing provides an opportunity for exploiting application characteristics to improve performance of computing systems. However, such opportunity must be balanced against generality of methods and quality guarantees that the system designer can provide to the application developer. Improved parallel processing in graphics processing units (GPUs) provides one such means for data-level parallel applications. We […]

OpenCL
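To make the performance/quality trade-off concrete, here is a sketch of loop perforation, a generic approximate-computing technique that skips a fraction of loop iterations. It is used here only as an illustration of approximate computing in general and is not necessarily how SqueezCL transforms OpenCL kernels. On a GPU, the analogous change would be launching fewer work-items or having each work-item skip elements. Function names are hypothetical.

```python
def mean_exact(xs):
    """Reference result: visit every element."""
    return sum(xs) / len(xs)


def mean_perforated(xs, stride):
    """Loop perforation: visit only every `stride`-th element.

    Work drops by roughly a factor of `stride`; the accuracy loss depends
    on how smooth or redundant the data is.
    """
    sampled = xs[::stride]
    return sum(sampled) / len(sampled)


def relative_error(approx, exact):
    """Quality metric used to decide whether an approximation is acceptable."""
    return abs(approx - exact) / abs(exact)


data = [float(i) for i in range(100)]
approx = mean_perforated(data, 2)
print(approx, relative_error(approx, mean_exact(data)))
```

The quality check matters as much as the speedup: an approximation is only usable if the application can bound or measure the error it introduces.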

Dec, 22

OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and […]

OpenCL

Dec, 22

GPU-accelerated Bernstein-Bezier discontinuous Galerkin methods for wave problems

We evaluate the computational performance of the Bernstein-Bezier basis for discontinuous Galerkin (DG) discretizations and show how to exploit properties of derivative and lift operators specific to Bernstein polynomials. Issues of efficiency and numerical stability are discussed in the context of a model wave propagation problem. We compare the performance of Bernstein-Bezier kernels to both […]
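One property that makes Bernstein derivative operators cheap is that differentiation acts on coefficients as a two-term stencil: if p(t) = sum_i c_i B_{i,n}(t) with B_{i,n}(t) = C(n, i) t^i (1 - t)^(n - i), then p'(t) = n sum_i (c_{i+1} - c_i) B_{i,n-1}(t). The 1D sketch below shows this identity and de Casteljau evaluation; the DG setting in the paper works on simplices with multi-index Bernstein polynomials, so this is only the univariate analogue. Function names are hypothetical.

```python
from math import comb


def bernstein(i, n, t):
    """Bernstein basis polynomial B_{i,n}(t) = C(n, i) t^i (1 - t)^(n - i)."""
    return comb(n, i) * t ** i * (1 - t) ** (n - i)


def de_casteljau(coeffs, t):
    """Evaluate sum_i c_i B_{i,n}(t) by repeated convex combinations."""
    b = list(coeffs)
    for r in range(1, len(b)):
        for i in range(len(b) - r):
            b[i] = (1 - t) * b[i] + t * b[i + 1]
    return b[0]


def derivative_coeffs(coeffs):
    """Bernstein coefficients of p'(t): n * (c_{i+1} - c_i), a sparse stencil."""
    n = len(coeffs) - 1
    return [n * (coeffs[i + 1] - coeffs[i]) for i in range(n)]


# t^2 has degree-2 Bernstein coefficients [0, 0, 1]; its derivative is 2t.
c = [0.0, 0.0, 1.0]
print(de_casteljau(c, 0.3), de_casteljau(derivative_coeffs(c), 0.3))
```

Because each derivative coefficient touches only two inputs, the operator is applied in O(n) work instead of the O(n^2) of a dense differentiation matrix, which is the kind of structure a GPU kernel can exploit.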


Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
