high performance computing on graphics processing units: hgpu.org

Posts

Jul, 22

GPU accelerated computation of Polarized Subsurface BRDF for Flat Particulate Layers

The BRDF of most real-world materials has two components: the surface BRDF, due to light reflecting at the surface of the material, and the subsurface BRDF, due to light entering the material and undergoing many scattering events inside it. Each of these events modifies the light's path, power, and polarization state. Computing polarized subsurface BRDF […]

OpenCL
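The abstract describes subsurface scattering as a chain of events, each changing a light path's direction and power. A minimal scalar Monte Carlo random walk illustrates this. It is a sketch under simplifying assumptions, not the paper's method: light is unpolarized, scattering is isotropic, the layer is index-matched (no Fresnel refraction at the boundary), and absorption is folded into a single-scattering albedo. The function `slab_reflectance` and its parameters are illustrative names. A polarized version would replace the scalar `weight` with a Stokes vector and multiply it by a Mueller matrix at every scattering event.

```python
import math
import random


def slab_reflectance(optical_thickness, albedo, n_photons=20000, seed=1):
    """Estimate diffuse reflectance of a flat, index-matched particulate layer.

    Hypothetical simplified model: isotropic scattering, no Fresnel
    refraction, unpolarized (scalar) light entering at normal incidence.
    """
    rng = random.Random(seed)
    reflected = 0.0
    for _ in range(n_photons):
        # depth (in optical units), direction cosine, carried power
        z, mu, weight = 0.0, 1.0, 1.0
        while True:
            # Sample a free path length from the exponential distribution.
            step = -math.log(1.0 - rng.random())
            z += mu * step
            if z < 0.0:                 # escaped through the top surface
                reflected += weight
                break
            if z > optical_thickness:   # transmitted through the bottom
                break
            # Scattering event: lose power to absorption, pick a new direction.
            weight *= albedo
            mu = 2.0 * rng.random() - 1.0   # isotropic: cos(theta) uniform in [-1, 1]
            if weight < 1e-4:           # simple cutoff for negligible paths
                break
    return reflected / n_photons
```

For example, `slab_reflectance(1.0, 0.99)` returns a larger estimate than `slab_reflectance(1.0, 0.5)`, because less power is absorbed per event. The fixed weight cutoff slightly biases the estimate low; Russian roulette is the usual unbiased alternative.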

Jul, 22

Parallelization of an Unsteady ALE Solver with Deforming Mesh Using OpenACC

This paper presents a parallel, GPU-based, deforming-mesh-enabled unsteady numerical solver for moving-body problems, implemented with OpenACC. Both 2D and 3D parallel algorithms based on spring-like deforming mesh methods are proposed and then implemented through the OpenACC programming model. Furthermore, these algorithms are coupled with an unstructured-mesh-based, implicit time scheme integrated […]
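The spring-analogy idea in the abstract can be sketched as follows. This assumes a common textbook variant, not necessarily the paper's formulation: each mesh edge is a spring whose stiffness is inversely proportional to its length, and interior node displacements are relaxed with Jacobi iteration until each node balances its neighbors. `spring_deform` is a hypothetical helper. Within a sweep, every node reads only the previous sweep's displacements, so the per-node loop is the kind of independent loop a directive model such as OpenACC can offload.

```python
import math


def spring_deform(coords, edges, boundary_disp, sweeps=200):
    """Propagate prescribed boundary displacements into the mesh interior.

    coords: list of (x, y) node positions
    edges: list of (i, j) node index pairs
    boundary_disp: dict node -> (dx, dy) for fixed or moving boundary nodes
    """
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        # Spring stiffness inversely proportional to edge length.
        k = 1.0 / math.dist(coords[i], coords[j])
        neighbors[i].append((j, k))
        neighbors[j].append((i, k))
    disp = [boundary_disp.get(i, (0.0, 0.0)) for i in range(n)]
    for _ in range(sweeps):
        new = list(disp)
        # Each node reads only the previous sweep's `disp`: parallelizable.
        for i in range(n):
            if i in boundary_disp or not neighbors[i]:
                continue
            ksum = sum(k for _, k in neighbors[i])
            dx = sum(k * disp[j][0] for j, k in neighbors[i]) / ksum
            dy = sum(k * disp[j][1] for j, k in neighbors[i]) / ksum
            new[i] = (dx, dy)
        disp = new
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(coords, disp)]
```

Moving the right end of a three-node chain by 0.2 shifts the middle node by 0.1 when both springs have equal length; a shorter (stiffer) spring pulls the middle node closer to its neighbor's displacement.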

Jul, 22

Automatically Selecting Profitable Thread Block Sizes Using Machine Learning

Graphics processing units (GPUs) provide high performance at low power consumption as long as resources are well utilized. Thread block size is one factor in determining a kernel’s occupancy, which is a metric for measuring GPU utilization. A general guideline is to find the block size that leads to the highest occupancy. However, many combinations […]

CUDA
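Occupancy is the ratio of resident warps per streaming multiprocessor (SM) to the hardware maximum, and it is capped by whichever per-SM resource runs out first: threads, blocks, registers, or shared memory. The sketch below is a simplified estimate of that calculation and of the "highest occupancy" guideline from the abstract. The limits in `DEFAULT_LIMITS` are assumed, illustrative values, and register and shared-memory allocation granularity is ignored, so results will not match NVIDIA's occupancy calculator exactly. `occupancy` and `best_block_size` are hypothetical helpers.

```python
import math

# Assumed per-SM limits (illustrative; check your GPU's documented values).
DEFAULT_LIMITS = {
    "warp_size": 32,
    "max_threads": 2048,
    "max_blocks": 32,
    "registers": 65536,
    "shared_mem": 65536,  # bytes
}


def occupancy(block_size, regs_per_thread, smem_per_block, limits=DEFAULT_LIMITS):
    """Fraction of the maximum warps per SM that can be resident at once."""
    warps_per_block = math.ceil(block_size / limits["warp_size"])
    max_warps = limits["max_threads"] // limits["warp_size"]
    by_threads = max_warps // warps_per_block
    by_blocks = limits["max_blocks"]
    regs_per_block = regs_per_thread * warps_per_block * limits["warp_size"]
    by_regs = limits["registers"] // regs_per_block if regs_per_block else by_blocks
    by_smem = limits["shared_mem"] // smem_per_block if smem_per_block else by_blocks
    # The scarcest resource determines how many blocks fit on one SM.
    blocks = min(by_threads, by_blocks, by_regs, by_smem)
    return blocks * warps_per_block / max_warps


def best_block_size(regs_per_thread, smem_per_block):
    """Highest-occupancy guideline: try multiples of the warp size up to 1024."""
    candidates = range(32, 1025, 32)
    return max(candidates, key=lambda b: occupancy(b, regs_per_thread, smem_per_block))
```

With 32 registers per thread and no shared memory, a 32-thread block reaches only 50% occupancy (the block-count limit binds), while 64 and many larger sizes tie at 100%. `max` simply returns the first of the tied sizes, which shows how occupancy alone often fails to single out one block size.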

Jul, 19

GPU LSM: A Dynamic Dictionary Data Structure for the GPU

We develop and implement a concurrent dictionary data structure based on the Log-Structured Merge tree (LSM), suitable for current massively parallel GPU architectures. Our GPU LSM is dynamic (mutable) in that it provides fast updates (insertions and deletions). For example, on an NVIDIA K40c GPU we can get an average update rate of 225 […]

CUDA
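The LSM idea can be illustrated with a sequential, single-threaded Python sketch. It assumes a binary-counter layout: each update batch is sorted and then merged down into the first empty level, so lower levels hold newer data, and deletions are stored as tombstones. This is not the GPU LSM implementation (which performs the sorts and merges in parallel on the GPU); the class `LSM` and the `TOMBSTONE` marker are hypothetical names.

```python
import bisect

TOMBSTONE = object()  # marks a deleted key


def _merge(newer, older):
    """Merge two sorted runs of (key, value); entries from `newer` win on ties."""
    out, i, j = [], 0, 0
    while i < len(newer) and j < len(older):
        if newer[i][0] < older[j][0]:
            out.append(newer[i])
            i += 1
        elif newer[i][0] > older[j][0]:
            out.append(older[j])
            j += 1
        else:
            out.append(newer[i])  # keep the newer entry, drop the stale one
            i += 1
            j += 1
    out.extend(newer[i:])
    out.extend(older[j:])
    return out


class LSM:
    def __init__(self):
        self.levels = []  # levels[i] is None or a sorted run; lower i = newer

    def update(self, batch):
        """Apply a batch of (key, value) pairs; a TOMBSTONE value deletes the key."""
        run = sorted(dict(batch).items())  # later duplicates in the batch win
        i = 0
        # Binary-counter carry: merge full levels into the run until an empty one.
        while i < len(self.levels) and self.levels[i] is not None:
            run = _merge(run, self.levels[i])
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = run

    def lookup(self, key):
        """Search from the newest level to the oldest; the first hit wins."""
        for run in self.levels:
            if run is None:
                continue
            k = bisect.bisect_left(run, (key,))
            if k < len(run) and run[k][0] == key:
                value = run[k][1]
                return None if value is TOMBSTONE else value
        return None
```

Because a lookup stops at the newest level containing the key, a later tombstone hides an older value without rewriting older levels. This sketch never purges tombstones; a real implementation would discard them when merging into the oldest level.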

Jul, 19

Termination Analysis for GPU Kernels

We describe a thread-modular technique for proving termination of massively parallel GPU kernels. The technique reduces the termination problem for these kernels to a sequential termination problem by abstracting the shared state, and as such allows us to leverage termination analysis techniques for sequential programs. An implementation in KITTeL is able to show termination of […]

CUDA, OpenCL

Jul, 19

Parallelization and Performance of the NIM Weather Model on CPU, GPU and MIC Processors

Next-generation supercomputers containing millions of processors will require that weather prediction models be designed and developed by teams of scientists, software engineers, and parallelization experts, so that the models are portable and run efficiently on increasingly diverse HPC systems. The design and performance of the NIM global weather prediction model are described. NIM is a dynamical core designed […]

Jul, 19

Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

We present high-performance implementations of the QR and singular value decompositions of a batch of small matrices hosted on the GPU, with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used, for its simplicity and inherent parallelism, as a building block for the SVD of low-rank blocks using […]

CUDA
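The one-sided (Hestenes) Jacobi method mentioned in the abstract applies plane rotations to pairs of columns until all columns are mutually orthogonal; the singular values are then the column norms. Disjoint column pairs can be rotated independently, which is one reason the method suits parallel hardware. Below is a minimal pure-Python sketch for a single small matrix that returns singular values only. The tolerance and sweep limit are assumed parameters, and `jacobi_singular_values` is a hypothetical helper, not the paper's batched GPU implementation.

```python
import math


def jacobi_singular_values(a, tol=1e-12, max_sweeps=60):
    """Singular values of a small m x n matrix (list of rows), m >= n.

    One-sided Jacobi: rotate column pairs until all columns are mutually
    orthogonal; the column norms are then the singular values.
    """
    a = [list(map(float, row)) for row in a]  # work on a copy
    m, n = len(a), len(a[0])
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(a[i][p] ** 2 for i in range(m))
                beta = sum(a[i][q] ** 2 for i in range(m))
                gamma = sum(a[i][p] * a[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product of columns p, q.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    ap, aq = a[i][p], a[i][q]
                    a[i][p] = c * ap - s * aq
                    a[i][q] = s * ap + c * aq
        if not rotated:
            break  # a full sweep made no rotation: converged
    norms = [math.sqrt(sum(a[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)
```

For `[[1, 1], [0, 1]]` the result is approximately the golden ratio and its reciprocal. Because rotations preserve the Frobenius norm, the squared singular values always sum to the sum of squared matrix entries, which is a handy sanity check.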

Jul, 19

Fine-Grain Acceleration of Graph Algorithms on a Heterogeneous Chip

With the rise of heterogeneous chips on the market, in which integrated GPU cores and CPU cores reside on the same chip and share a unified memory, it is possible to have better execution schemes for many graph algorithms. Graph algorithms can exhibit producer-consumer behavior, a varying amount of parallelism during execution, and irregularity which […]

OpenCL
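A level-synchronous breadth-first search shows two of the traits the abstract lists: each level consumes the frontier that the previous level produced, and the frontier size, and with it the available parallelism, changes from level to level. The sketch below is sequential; the dict-of-lists adjacency representation and the helper `bfs_levels` are assumptions, and the paper's fine-grain CPU/GPU scheduling is not modeled.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS.

    adj: dict mapping a vertex to a list of neighbors.
    Returns (dist, frontier_sizes), where frontier_sizes[k] is the number of
    vertices processed at level k, i.e. the parallel work available there.
    """
    dist = {source: 0}
    frontier = [source]
    frontier_sizes = []
    while frontier:
        frontier_sizes.append(len(frontier))
        next_frontier = []  # produced by this level, consumed by the next
        # A parallel version would split this loop across threads,
        # using an atomic check-and-set on the visited (dist) state.
        for u in frontier:
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return dist, frontier_sizes
```

On the small graph `{0: [1, 2, 3], 1: [4], 2: [4], 3: [], 4: [5]}` the frontier sizes are `[1, 3, 1, 1]`: the work available jumps at the second level and then shrinks again, the kind of variation that makes a static CPU/GPU split hard to tune.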

Jul, 15

International Conference on Image and Graphics Processing (ICIGP), 2018

The 2018 International Conference on Image and Graphics Processing (ICIGP 2018) aims to bring together researchers and practitioners from academia and industry to exchange technical knowledge and to boost technical and educational collaboration within the related fields of image and graphics processing. It will be held in Hong Kong on February 24-26, 2018. It is sponsored by […]

Jul, 15

8th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB), 2018

The ICBBB conference series is held annually to provide an interactive forum for the presentation and discussion of research in Bioscience, Biochemistry and Bioinformatics. The conference welcomes participants from all over the world who are interested in developing professional ties to and/or exploring career opportunities in the region. The conference should serve as an ideal forum to establish relationships from […]

Jul, 15

The 10th International Conference on Computer Modeling and Simulation (ICCMS), 2018

The 10th International Conference on Computer Modeling and Simulation is the main annual research conference that aims to bring together researchers from around the world to exchange research results and address open issues in all aspects of Computer Modeling and Simulation. ICCMS 2009-2017 were held in Macau, Sanya, Mumbai, Hong Kong, Rome, Barcelona, Amsterdam, Brisbane, and […]

Jul, 15

4th International Conference on Virtual Reality (ICVR), 2018

The 2018 4th International Conference on Virtual Reality (ICVR 2018) will be held in Hong Kong on February 24-26, 2018. ICVR 2018 will bring together top researchers from the Asia-Pacific region, North America, Europe, and around the world to exchange research results and address open issues in all aspects of Virtual Reality. Publication: Accepted papers […]

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
