high performance computing on graphics processing units: hgpu.org

Posts

Feb, 19

Large-Scale Geospatial Processing on Multi-Core and Many-Core Processors: Evaluations on CPUs, GPUs and MICs

Geospatial processing tasks, such as queries based on point-to-polyline shortest distance and point-in-polygon tests, are fundamental to many scientific and engineering applications, including post-processing large-scale environmental and climate model outputs and analyzing traffic and travel patterns from massive GPS collections in transportation engineering and urban studies. Commodity parallel hardware, such as multi-core CPUs, many-core GPUs […]

CUDA

Feb, 19

Using of GPUs for cluster analysis of large data by K-means method

This problem was solved within the framework of the grant project "Solving of problems of cluster analysis with application of parallel algorithms and cloud technologies" at the Institute of Mathematics and Mathematical Modelling in Almaty. Cluster analysis of large amounts of data is very important in many areas of science – […]

CUDA

Feb, 19

Parallel algorithms for problems of cluster analysis with very large amount of data

In this paper we use GPUs to solve massive problems with large amounts of data that are not well suited to SIMD technology. For the given problem we consider a three-level parallelization. CPU multithreading is used at the top level, and graphics processors are used for massive computation. For solving problems of cluster analysis […]

CUDA

Feb, 19

Fast Hamiltonian Monte Carlo Using GPU Computing

In recent years, the Hamiltonian Monte Carlo (HMC) algorithm has been found to work more efficiently than other popular Markov Chain Monte Carlo (MCMC) methods (such as random walk Metropolis-Hastings) in generating samples from a posterior distribution. A general framework for HMC based on the use of graphics processing units (GPUs) is shown to […]

CUDA

Feb, 17

Towards a Performance-Portable FFT Library for Heterogeneous Computing

The fast Fourier transform (FFT), a spectral method that computes the discrete Fourier transform and its inverse, pervades many applications in digital signal processing, such as imaging, tomography, and software-defined radio. Its importance has caused the research community to expend significant resources to accelerate the FFT, of which FFTW is the most prominent example. With […]

OpenCL

Feb, 17

A Similarity-Based Analysis Tool for Scientific Application Porting

Porting applications to a new system is a nontrivial job in the HPC field. It is a very time-consuming, labor-intensive process, and the quality of the results will depend critically on the experience of the experts involved. In order to ease the porting process, we propose a methodology to address an important aspect of software […]

CUDA

Feb, 17

The battle of the giants: a case study of GPU vs FPGA optimisation for real-time image processing

This paper focuses on a thorough comparison of the two main hardware targets for real-time optimization of a computer vision algorithm: GPU and FPGA. Based on a complex case study algorithm for threaded isle detection, implementation on both hardware targets is compared in terms of resulting time performance, code translation effort, hardware cost, power efficiency […]

OpenCL

Feb, 17

Petascale elliptic solvers for anisotropic PDEs on GPU clusters

Memory bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with up to one trillion (10^12) unknowns the code has to make efficient use of several million […]

CUDA

Feb, 17

GPU Programming with CUDA: A brief overview

In this paper we describe the architecture of an NVIDIA GPU, as well as the CUDA programming model. The basic statements are explained. We also provide an example of CUDA code, explaining its execution workflow on a GPU device.

CUDA

Feb, 17

Optimizing Performance of Stencil Code with SPL Conqueror

A standard technique to numerically solve elliptic partial differential equations on structured grids is to discretize them via finite differences and then to apply an efficient geometric multi-grid solver. Unfortunately, finding the optimal choice of multi-grid components and parameters is challenging and platform-dependent, especially in cases where domain knowledge is incomplete. Auto-tuning is a […]

CUDA

•

OpenCL

Feb, 17

Interactive Design Exploration for Constrained Meshes

In architectural design, surface shapes are commonly subject to geometric constraints imposed by material, fabrication or assembly. Rationalization algorithms can convert a freeform design into a form feasible for production, but often require design modifications that might not comply with the design intent. In addition, they only offer limited support for exploring alternative feasible shapes, […]

CUDA

Feb, 17

Efficient pseudo-random number generation for monte-carlo simulations using graphic processors

A hybrid approach to pseudo-random number generation for GPU programming, combining three Tausworthe generators and one linear congruential generator as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick and dirty […]

CUDA

•

OpenCL
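The combination named in the abstract above (three Tausworthe steps XORed with one linear congruential step) can be sketched on the CPU as follows. This is a minimal illustration, not the paper's implementation: the shift, mask, and multiplier constants are the ones widely circulated for this combined generator in NVIDIA's GPU programming literature, and the seed values and the `HybridTaus` class name are hypothetical. The per-thread seeding scheme from the truncated abstract is not reproduced here.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic in Python


def taus_step(z, s1, s2, s3, m):
    """One step of a single Tausworthe component on a 32-bit state."""
    b = (((z << s1) & MASK32) ^ z) >> s2
    return (((z & m) << s3) & MASK32) ^ b


def lcg_step(z, a=1664525, c=1013904223):
    """One step of a 32-bit linear congruential generator."""
    return (a * z + c) & MASK32


class HybridTaus:
    """Hypothetical wrapper: three Tausworthe states plus one LCG state."""

    def __init__(self, z1, z2, z3, z4):
        # Assumption: the Tausworthe seeds should be larger than 128,
        # as is commonly recommended for these parameter sets.
        self.z1, self.z2, self.z3, self.z4 = (
            z1 & MASK32, z2 & MASK32, z3 & MASK32, z4 & MASK32)

    def next_u32(self):
        # Advance each component, then XOR the four outputs together.
        self.z1 = taus_step(self.z1, 13, 19, 12, 0xFFFFFFFE)
        self.z2 = taus_step(self.z2, 2, 25, 4, 0xFFFFFFF8)
        self.z3 = taus_step(self.z3, 3, 11, 17, 0xFFFFFFF0)
        self.z4 = lcg_step(self.z4)
        return self.z1 ^ self.z2 ^ self.z3 ^ self.z4

    def next_float(self):
        # Scale the 32-bit integer to a float in [0, 1).
        return self.next_u32() / 4294967296.0


if __name__ == "__main__":
    rng = HybridTaus(129, 130, 131, 7)  # illustrative seeds only
    print([rng.next_float() for _ in range(3)])
```

On a GPU, each thread would hold its own four-word state in registers, so the generator needs no shared memory or synchronization between threads.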

