high performance computing on graphics processing units: hgpu.org

Posts

Feb, 16

Towards Porting a Real-World Seismological Application to the Intel MIC Architecture

This whitepaper discusses first experiences with porting an MPI-based, real-world geophysical application to the new Intel Many Integrated Core (MIC) architecture. The selected code, SeisSol, is a Fortran application that simulates earthquake rupture and the propagation of the radiated seismic waves in complex 3-D heterogeneous materials. The PRACE prototype cluster […]

Feb, 16

Direct Numerical Simulation and Large Eddy Simulation on a Turbulent Wall-Bounded Flow Using Lattice Boltzmann Method and Multiple GPUs

Direct numerical simulation (DNS) and large eddy simulation (LES) of the wall-bounded flow at Re_τ = 180 were performed using the lattice Boltzmann method (LBM) on multiple graphics processing units (GPUs). The DNS used eight K20M GPUs. The largest mesh has 6.7×10^7 points, which gives a non-dimensional mesh size of Δ+ = 1.41 for […]

CUDA
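As a quick consistency check on the quoted resolution: assuming Re_τ is based on the friction velocity u_τ, the kinematic viscosity ν and the channel half-height δ (the usual convention for wall-bounded channel flow, not stated in the abstract), a mesh size of Δ+ = 1.41 corresponds to roughly 128 lattice spacings across the half-height:

```latex
\Delta^+ = \frac{\Delta\, u_\tau}{\nu}, \qquad
\mathrm{Re}_\tau = \frac{u_\tau\, \delta}{\nu}
\quad\Longrightarrow\quad
\frac{\delta}{\Delta} = \frac{\mathrm{Re}_\tau}{\Delta^+} = \frac{180}{1.41} \approx 127.7
```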

Feb, 16

CUDA K-NN: application to the segmentation of the retinal vasculature within SD-OCT volumes of mice

In this work, the speed of a GPU-based CUDA k-NN implementation is compared with that of the ANN implementation on three sets of medical imaging data. The results show that for higher-dimensional data, the CUDA-based k-NN approach can achieve a speedup of up to two orders of magnitude. Otherwise, ANN would be a better implementation to […]

CUDA
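The abstract does not describe the CUDA kernel itself. Assuming it follows the common brute-force formulation (an assumption, not confirmed by the abstract), k-NN search is an all-pairs distance computation followed by a per-query selection of the k smallest distances; one natural GPU mapping gives each query its own thread or block. A sequential sketch of that pattern, with a hypothetical function name:

```python
import heapq


def knn_brute_force(reference, queries, k):
    """Return, for each query, the indices of its k nearest reference points.

    Brute force: every query is compared with every reference point
    (O(n_query * n_ref * dim) work). The queries are independent of each
    other, which is what makes this pattern attractive on a GPU.
    """
    if k > len(reference):
        raise ValueError("k cannot exceed the number of reference points")
    result = []
    for q in queries:
        # Squared Euclidean distances are enough for ranking; no sqrt needed.
        dists = ((sum((a - b) ** 2 for a, b in zip(q, r)), i)
                 for i, r in enumerate(reference))
        # Keep the k smallest (distance, index) pairs, nearest first.
        result.append([i for _, i in heapq.nsmallest(k, dists)])
    return result
```

Because the cost grows with both the number of points and the dimension, while each query is independent of the others, brute force tends to benefit most from massive parallelism as the dimensionality increases.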

Feb, 16

Application of the Characteristic Basis Function Method using CUDA

The Characteristic Basis Function Method (CBFM) is a popular technique for efficiently solving the Method of Moments (MoM) matrix equations. In this work, we address the adaptation of this method to a relatively new computing infrastructure provided by NVIDIA, the Compute Unified Device Architecture (CUDA), and take into account some of the limitations which appear […]

CUDA

Feb, 16

LDetector: A Low Overhead Race Detector For GPU Programs

Data race detection is an important problem in GPU programming. This paper presents a novel solution that uses compiler support to privatize shared data and then parallelizes the race checking at run time. It has two distinct features. First, there is no per-access monitoring, so race detection has a low overhead and […]

CUDA
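The privatize-then-compare idea can be illustrated with a toy, sequential sketch. This is not LDetector's implementation: it simply assumes that each thread runs against its own private copy of a shared array, and that the copies are afterwards diffed against the original to find elements written by more than one thread, so no individual access is instrumented. The function name and thread model are hypothetical, and a value-based diff like this one misses writes that leave a value unchanged.

```python
def find_write_write_races(shared, thread_bodies):
    """Toy value-based write-write race check in the spirit of privatization.

    shared:        the original shared array (a list); it is never modified
    thread_bodies: one callable per thread; each mutates the list it receives

    Returns a dict mapping each index changed by two or more threads to the
    list of ids of the threads that changed it.
    """
    writers = {}  # index -> ids of threads whose private copy changed it
    for tid, body in enumerate(thread_bodies):
        private = list(shared)  # privatized copy: no per-access monitoring
        body(private)
        # The per-thread diffs are independent, so they could run in parallel.
        for idx, (old, new) in enumerate(zip(shared, private)):
            if old != new:  # a write of the same value goes undetected
                writers.setdefault(idx, []).append(tid)
    return {idx: tids for idx, tids in writers.items() if len(tids) > 1}
```

For example, threads that each write only `a[tid]` produce no report, while threads that all write `a[0]` are reported as racing on index 0.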

Feb, 15

ADBIS workshop on GPUs In Databases, GID 2014

The high performance of modern graphics processing units can be exploited not only for graphics-related applications but also for general-purpose computing. This computing power has been used in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains benefit strongly from GPUs, database-related applications seem not […]

Feb, 15

High-Performance Graphics 2014

High-Performance Graphics is the leading international forum for performance-oriented graphics and imaging systems research, including innovative algorithms, efficient implementations, languages, parallelism, compilers, hardware and architectures for high-performance graphics. High-Performance Graphics was founded in 2009 to synthesize and broaden two important and well-respected conferences in computer graphics: Graphics Hardware and Interactive Ray Tracing. The conference […]

Feb, 15

Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths

Finding the shortest paths from a single source to all other vertices is a fundamental method used in a variety of higher-level graph algorithms. We present three parallel-friendly and work-efficient methods to solve this Single-Source Shortest Paths (SSSP) problem: Workfront Sweep, Near-Far and Bucketing. These methods choose different approaches to balance the tradeoff between saving […]

CUDA
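The abstract names the Near-Far method without describing it. As a hedged illustration, the sequential sketch below follows the usual description of a Near-Far split: vertices whose tentative distance is below a moving threshold form the "near" frontier and are relaxed in bulk, while the rest wait in a "far" pile until the threshold advances by a band width `delta`. The function name, the adjacency format and `delta` are assumptions, and the paper's GPU kernels and its other two methods are not reproduced.

```python
def sssp_near_far(adj, source, delta):
    """Single-source shortest paths with a Near-Far style split of the work.

    adj:   dict mapping every vertex to a list of (neighbor, weight) pairs,
           with nonnegative weights
    delta: positive amount by which the distance threshold advances
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    threshold = delta
    near, far = [source], []
    while near or far:
        # Relax the near frontier until nothing below the threshold changes.
        # On a GPU each pass over `near` would be one data-parallel step.
        while near:
            next_near = []
            for u in near:
                for v, w in adj[u]:
                    nd = dist[u] + w
                    if nd < dist[v]:
                        dist[v] = nd
                        (next_near if nd < threshold else far).append(v)
            near = next_near
        # Near pile empty: advance the threshold and split the far pile.
        # Entries now below the old threshold were already relaxed with their
        # current distance, so they are dropped; the set removes duplicates.
        old = threshold
        threshold += delta
        pending = {v for v in far if dist[v] >= old}
        near = [v for v in pending if dist[v] < threshold]
        far = [v for v in pending if dist[v] >= threshold]
    return dist
```

A small `delta` does little redundant relaxation but needs many phases (approaching Dijkstra's ordering), while a very large `delta` puts everything in the near pile (approaching Bellman-Ford); this is one way to see the work-versus-parallelism tradeoff the abstract refers to.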

Feb, 15

GPGPUs: How to Combine High Computational Power with High Reliability

GPGPUs are increasingly used in several domains, from gaming to various kinds of computationally intensive applications. In some cases, their reliability is becoming a serious issue, and several research activities are focusing on evaluating it. This paper gives an overview of some major results in the area. First, it shows and analyzes the results of some […]

CUDA

Feb, 15

Computation of Galois field expressions for quaternary logic functions on GPUs

Galois field (GF) expressions are polynomials used to represent multiple-valued logic (MVL) functions. For this purpose, MVL functions are considered as functions defined over a finite (Galois) field of order p, denoted GF(p). Computing these functional expressions plays an important role in areas such as digital signal processing and logic design. […]

OpenCL
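To make the computation concrete, the sketch below handles a single quaternary variable, assuming the field is GF(4) (the finite field with four elements, the natural choice for four-valued logic), encoded as 2-bit integers with α² = α + 1. It computes the coefficients of f(x) = c0 + c1·x + c2·x² + c3·x³ directly from the truth vector using the identity f(x) = Σ_a f(a)(1 − (x − a)³), which gives c0 = f(0) and c_i = Σ_a f(a)·a^(3−i) for i ≥ 1 (with 0⁰ = 1; signs vanish in characteristic 2). This is not the paper's GPU method, and all names are hypothetical; for n variables the same one-variable transform is applied along each variable in turn.

```python
# GF(4) elements as 2-bit ints: bit i is the coefficient of alpha**i, where
# alpha is a root of x**2 + x + 1 (so 2 = alpha and 3 = alpha + 1).

def gf4_add(a, b):
    return a ^ b  # characteristic 2: addition and subtraction are both XOR


def gf4_mul(a, b):
    r = 0
    for i in range(2):  # carry-less (polynomial) multiplication over GF(2)
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:  # reduce the degree-2 term modulo x**2 + x + 1
        r ^= 0b111
    return r


def gf4_pow(a, e):
    r = 1  # convention: 0**0 == 1
    for _ in range(e):
        r = gf4_mul(r, a)
    return r


def gf4_expression(f):
    """Coefficients [c0, c1, c2, c3] of a one-variable quaternary function.

    f is the truth vector [f(0), f(1), f(2), f(3)] with entries in 0..3.
    """
    coeffs = [f[0]]
    for i in range(1, 4):
        c = 0
        for a in range(4):
            c = gf4_add(c, gf4_mul(f[a], gf4_pow(a, 3 - i)))
        coeffs.append(c)
    return coeffs


def gf4_evaluate(coeffs, x):
    """Evaluate sum(c_i * x**i) over GF(4)."""
    acc = 0
    for i, c in enumerate(coeffs):
        acc = gf4_add(acc, gf4_mul(c, gf4_pow(x, i)))
    return acc
```

For instance, the identity function `[0, 1, 2, 3]` yields `[0, 1, 0, 0]`, that is, f(x) = x.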

Feb, 15

Optimizing exact computation of Betweenness Centrality for CUDA

Betweenness centrality (BC) is an important metric in network analysis, particularly of small-world networks, but it is expensive to compute. This report discusses the exact computation of the betweenness centrality index. A new strategy is presented to parallelize the best known serial algorithm for […]

CUDA
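The best known serial algorithm for exact BC is usually taken to be Brandes' algorithm, presumably the one meant here; the report's parallelization strategy is not shown. A sequential sketch for unweighted, undirected graphs: from each source, a BFS counts shortest paths (σ) and records predecessors, then a reverse sweep accumulates each vertex's dependency δ. The function name and adjacency format are assumptions.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness centrality of an unweighted, undirected graph
    via Brandes' algorithm (O(V * E) time).

    adj: dict mapping each vertex to its neighbors (each edge listed both ways)
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS counting shortest paths and predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair (s, t) is counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}
```

The outer loop over sources is independent per source, while each BFS is level-synchronous; these are the two usual places where parallelism is sought.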

Feb, 15

Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL

Recently, MPI implementations have been extended to support accelerator devices such as Intel Many Integrated Core (MIC) coprocessors and NVIDIA GPUs. This has required changes at different levels of the software stack and of the MPI implementations. To evaluate the performance and scalability of accelerator-aware MPI libraries, we developed portable micro-benchmarks to identify factors that influence […]

CUDA, OpenCL
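The abstract does not say which metrics the micro-benchmarks report. A common way to summarize point-to-point timings (an assumption here, not taken from the paper) is the Hockney model t(n) = α + n/β, where α is the start-up latency and β the asymptotic bandwidth. The sketch below fits both by least squares from measured (message size, one-way time) pairs, for example half of a ping-pong round trip between two ranks; the function name is hypothetical.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of the Hockney model t(n) = alpha + n / beta.

    sizes: message sizes in bytes (at least two distinct values)
    times: measured one-way transfer times in seconds
    Returns (alpha, beta): latency in seconds, bandwidth in bytes per second.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx  # seconds per byte, i.e. 1 / beta
    if slope <= 0:
        raise ValueError("times do not grow with message size")
    alpha = mean_y - slope * mean_x
    return alpha, 1.0 / slope
```

Fitting separate runs (for instance host-to-host versus device-to-device buffers) and comparing the resulting α and β is one simple way to see whether an accelerator-aware path mainly adds latency or limits bandwidth.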
