high performance computing on graphics processing units: hgpu.org

Posts

Dec, 14

Visualizing Complex Functions Using GPUs

This document explains some common methods of visualizing complex functions and how to implement them on the GPU. Using the fragment shader, we visualize complex functions in the complex plane with the domain coloring method. Then, using the vertex shader, we visualize complex functions defined on a unit sphere, such as spherical harmonics. Finally, we redesign […]

OpenGL
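In domain coloring, every pixel is treated as a point z of the complex plane, and the value w = f(z) is encoded as a color. On the GPU this runs once per fragment. The CPU-side Python sketch below shows that per-pixel logic under stated assumptions: hue encodes arg(w), and lightness encodes |w| via (2/π)·atan(|w|), so zeros appear black and poles white. This is a common scheme and not necessarily the one the paper uses. The names `domain_color` and `render` are illustrative only.

```python
import cmath
import colorsys
import math


def domain_color(w: complex) -> tuple[float, float, float]:
    """Map a complex value to an RGB triple in [0, 1]^3 (one common domain-coloring scheme)."""
    # Hue encodes the argument: arg(w) in (-pi, pi] mapped to [0, 1).
    hue = (cmath.phase(w) / (2 * math.pi)) % 1.0
    # Lightness encodes the modulus: zeros -> 0 (black), |w| = 1 -> 0.5, poles -> 1 (white).
    lightness = (2 / math.pi) * math.atan(abs(w))
    return colorsys.hls_to_rgb(hue, lightness, 1.0)


def render(f, width, height, re_range=(-2.0, 2.0), im_range=(-2.0, 2.0)):
    """Evaluate f at every pixel center of a grid over the complex plane.

    Each inner loop iteration plays the role of one fragment-shader invocation.
    """
    image = []
    for row in range(height):
        # Row 0 is the top of the image, so the imaginary part decreases with the row index.
        y = im_range[1] - (row + 0.5) * (im_range[1] - im_range[0]) / height
        line = []
        for col in range(width):
            x = re_range[0] + (col + 0.5) * (re_range[1] - re_range[0]) / width
            try:
                w = f(complex(x, y))
            except ZeroDivisionError:
                w = complex(math.inf, 0.0)  # an exact pole is drawn as infinite modulus
            line.append(domain_color(w))
        image.append(line)
    return image
```

For example, `render(lambda z: (z - 1) / (z + 1), 256, 256)` would show a black zero at z = 1 and a white pole at z = -1, with the hue cycling once around each.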

Dec, 14

A Static Load Balancing Scheme for Parallel Volume Rendering on Multi-GPU Clusters

GPU-based clusters are an attractive option for parallel volume rendering. One of the key issues in parallel volume rendering is load balancing: keeping a balanced workload per node is essential for improving performance. A good number of dynamic load balancing schemes have been proposed over the years. However, most of these approaches require runtime dynamic […]

CUDA
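The abstract is cut off before the paper's own scheme is described, so the following is not that scheme. It is a generic illustration of what "static" means here: work is split across GPUs once, before rendering, using per-brick cost estimates. This sketch uses the well-known greedy longest-processing-time (LPT) heuristic. The function name `static_partition` and the idea of estimating cost from, for example, non-empty voxel counts are assumptions made for illustration.

```python
import heapq


def static_partition(brick_costs: list[float], num_gpus: int) -> list[list[int]]:
    """Assign volume bricks to GPUs ahead of time with the greedy LPT heuristic.

    brick_costs[i] is a hypothetical a-priori cost estimate for brick i.
    Returns, for each GPU, the list of brick indices it will render.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    # Min-heap of (current load, gpu id): the least-loaded GPU is always on top.
    heap = [(0.0, g) for g in range(num_gpus)]
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(num_gpus)]
    # Longest processing time first: place the most expensive bricks early.
    order = sorted(range(len(brick_costs)), key=lambda i: brick_costs[i], reverse=True)
    for brick in order:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(brick)
        heapq.heappush(heap, (load + brick_costs[brick], gpu))
    return assignment
```

Because the partition is fixed, there is no runtime rebalancing overhead. The trade-off is that the result can only be as good as the cost estimates.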

Dec, 12

Towards Domain-specific Computing for Stencil Codes in HPC

High Performance Computing (HPC) systems are increasingly heterogeneous. A single node may contain different processor types, including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level […]

CUDA

•

OpenCL
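A stencil code updates each grid point from a fixed pattern of neighboring points. Kernels of this kind are what domain-specific approaches such as this one generate for CUDA or OpenCL targets. For reference, the plain Python loop below shows one Jacobi sweep of the 2D 5-point Laplace stencil. It is a hand-written sketch, not code produced by the paper's generator, and the name `jacobi_step` is illustrative.

```python
def jacobi_step(u: list[list[float]]) -> list[list[float]]:
    """One Jacobi sweep of the 2D 5-point Laplace stencil; boundary values stay fixed."""
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]  # separate output grid: every update reads only old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each interior point becomes the average of its four axis-aligned neighbors.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return new
```

On a GPU, each interior (i, j) update would typically map to one thread. Because the output is written to a separate grid, no synchronization is needed within a sweep.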

Dec, 12

Development of a CUDA Implementation of the 3D FDTD Method

The use of general-purpose computing on a GPU is an effective way to accelerate the FDTD method. This paper adds flexibility to the approach that is theoretically the best available. It examines the performance on both Tesla- and Fermi-architecture GPUs, and identifies the best way to determine the GPU parameters for the proposed method.

CUDA

Dec, 12

Multi-level Parallelism for Incompressible Flow Computations on GPU Clusters

We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues […]

CUDA

Dec, 12

Fast and Robust Linear Motion Deblurring

We investigate efficient algorithmic realisations for robust deconvolution of grey-value images with a known space-invariant point-spread function, with emphasis on 1D motion blur scenarios. The goal is to make deconvolution suitable as a preprocessing step in automated image processing environments with tight time constraints. Candidate deconvolution methods are selected for their restoration quality, robustness and efficiency. Evaluation […]

CUDA

Dec, 12

Hybrid Parallel Streamline Extraction Combining MPI and OpenCL

Scientific simulation applications increasingly take advantage of modern accelerator technology. In this setting, scalability becomes an issue, especially for in-situ visualization techniques. In this work we present a scalability evaluation of a hybrid parallelized streamline extraction algorithm.

OpenCL

Dec, 12

ACM International Conference on Computing Frontiers, CF2013

The Computing Frontiers conference focuses on a wide spectrum of advanced methodologies, technologies and radically new solutions relevant to the development of the whole spectrum of computing over the next few decades. The scope of the meeting ranges from embedded to high-performance computing. We seek contributions on novel algorithms, computing paradigms, computational models, application […]

Dec, 12

CUDACLAW: a Data Parallel Solution Framework for Hyperbolic PDEs

We present CUDACLAW, a data-parallel solution framework for 2D and 3D hyperbolic partial differential equation (PDE) systems. CUDACLAW is a finite volume method based on time-adaptive, point-wise Riemann problem solvers, and can handle linear and nonlinear problems. The framework is tailored for the GPU architecture and optimized to take advantage of its powerful computational potential, […]

CUDA

Dec, 12

FFT Parallel Implementation for MRI Image Reconstruction

This paper describes an implementation of the Cooley-Tukey FFT algorithm for MRI image reconstruction on a revolutionary parallel computing machine, the Connex Array. By taking advantage of its vectorial structure and processing manner, MRI image reconstruction was much faster than on most commercial MRI scanners. Results are remarkable. Our proposal in this paper is the use of […]

CUDA
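For fully sampled Cartesian k-space data, MRI image reconstruction reduces to a 2D inverse FFT, and a 2D transform separates into 1D transforms along rows and then columns. The sketch below is the textbook recursive radix-2 decimation-in-time Cooley-Tukey FFT in plain Python, with an inverse built from the conjugation identity and a row-column 2D inverse. It does not reproduce the paper's mapping onto the Connex Array's vector structure. The helper names `fft`, `ifft`, and `ifft2` are illustrative.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time Cooley-Tukey FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: the twiddle factor e^(-2*pi*i*k/n) merges the two half-size DFTs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse DFT via the conjugation identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([complex(c).conjugate() for c in X])]


def ifft2(kspace):
    """2D inverse FFT by the row-column method: 1D transforms along rows, then along columns."""
    rows = [ifft(r) for r in kspace]
    cols = [ifft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order
```

Each recursion level does O(n) butterfly work over log2(n) levels, which gives the O(n log n) cost that makes FFT-based reconstruction practical.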

Dec, 12

Gaussian Mixture Model Based Volume Visualization

Representing uncertainty in visualizations is becoming increasingly important for understanding and analyzing scientific data. Uncertainty may come from different sources, such as ensembles of experiments or unavoidable information loss when performing data reduction. One natural model to represent uncertainty is to assume that each position in space instead of a single value may take […]

CUDA
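The abstract is truncated at the point where the model is defined. The sketch below therefore assumes that each position stores a 1D Gaussian mixture (weights, means, standard deviations) instead of a single scalar value. Under that assumption, a renderer can evaluate the mixture's density at a value, or the probability that the value falls in a range of interest. The names `Component`, `gmm_pdf`, and `gmm_interval_prob` are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One Gaussian in a per-position mixture (hypothetical storage layout)."""
    weight: float
    mean: float
    std: float


def gmm_pdf(components: list[Component], x: float) -> float:
    """Density of a 1D Gaussian mixture at value x (weights assumed to sum to 1)."""
    return sum(
        c.weight * math.exp(-0.5 * ((x - c.mean) / c.std) ** 2) / (c.std * math.sqrt(2 * math.pi))
        for c in components
    )


def gmm_interval_prob(components: list[Component], lo: float, hi: float) -> float:
    """P(lo <= X <= hi) under the mixture, using the Gaussian CDF expressed with math.erf."""
    def cdf(c: Component, v: float) -> float:
        return 0.5 * (1.0 + math.erf((v - c.mean) / (c.std * math.sqrt(2.0))))

    return sum(c.weight * (cdf(c, hi) - cdf(c, lo)) for c in components)
```

In a ray caster, a quantity such as `gmm_interval_prob` could be computed per sample and fed into the transfer function, so that opacity reflects how likely the position is to contain a feature value.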

Dec, 12

Multi-level Debugging for Multi-stage, Parallelizing Compilers

A multi-stage compilation framework transforms portions of programs written in a productivity-level language into an efficiency-level language, such as C, with explicit hardware-specific optimizations. It is challenging for compiler programmers to debug errors in the compilation because they must perform complicated end-to-end reasoning, relating the programs across the multiple stages of compilation. To simplify this […]

CUDA
