high performance computing on graphics processing units: hgpu.org

Posts

May, 29

Efficient GPU-based Graph Cuts for Stereo Matching

Although graph cuts (GC) is widely used in many computer vision problems, its slow execution time, caused by high complexity, hinders wider adoption. A manycore solution using a Graphics Processing Unit (GPU) may solve this problem. However, conventional GC implementations do not fully exploit the GPU's computing power. To address this issue, a new GC algorithm which is […]

CUDA

May, 29

Parallelization of Mesh Contraction and Fairing using OpenCL

We propose a parallel method for computing local Laplacian curvature flows for triangular meshes. The Laplace operator is widely used in mesh processing for mesh fairing, noise removal, or curvature estimation. If the Laplacian flow is used in a global sense, constraining a whole mesh with an iterative weighted linear system, it can be used even for […]

OpenCL

May, 28

Effects of Concurrency Techniques and Algorithm Performance: A Comparative Analysis of Single-Threaded, Multi-Threaded, and GPGPU Programming Techniques

Deployment of parallel architectures in computing systems is increasing. In this paper we study the performance effects of a variety of programming techniques and technologies that utilize these parallel architectures as applied to example algorithms. We demonstrate that algorithms that are highly parallel in nature gain significant performance increases through proper application of both parallel […]

OpenCL

May, 28

MATLAB Medical Images Classification on Graphics Processors

Due to their massively parallel hardware design, graphics processors can easily outperform ordinary CPUs in applications that involve large amounts of data. Considering their great potential, the objective of this paper is to continue previous work and optimize the speed and efficiency of texture and fractal analysis, as used for medical image classification processes for […]

CUDA

May, 28

Power Modeling and Optimization for GPGPUs

State-of-the-art General-Purpose computing on Graphics Processing Units (GPGPU) faces a severe power challenge due to the increasing number of cores placed on a chip with decreasing feature size. To hide long-latency operations, GPGPU employs fine-grained multi-threading among numerous active threads, leading to sizeable register files with massive power consumption. […]

May, 28

On Leveraging GPUs for Security: discussing k-anonymity and pattern matching

In recent years, the need to solve complex problems that require large computing resources in less time has grown. Some of these in the scientific field are: weather forecasting, seismic simulations, chemical reaction simulation, and studies on the human genome [1]. All of them belong to the "Grand Challenge Problems" set. As can be […]

OpenCL

May, 28

Analysis of Parallel Montgomery Multiplication in CUDA

For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in performance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose […]

CUDA

May, 27

Performance Portability in Accelerated Parallel Kernels

Heterogeneous architectures, by definition, include multiple processing components with very different microarchitectures and execution models. In particular, computing platforms from supercomputers to smartphones can now incorporate both CPU and GPU processors. Disparities between CPU and GPU processor architectures have naturally led to distinct programming models and development patterns for each component. Developers for a specific system decompose […]

OpenCL

May, 27

A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs

This paper presents a performance modeling and optimization analysis tool to predict and optimize the performance of sparse matrix-vector multiplication (SpMV) on GPUs. We make the following contributions: (1) We present an integrated analytical and profile-based performance modeling to accurately predict the kernel execution times of CSR, ELL, COO, and HYB SpMV kernels. Our proposed […]

CUDA

May, 27

Rapid Computation of Sodium Bioscales Using GPU-Accelerated Image Reconstruction

Quantitative sodium magnetic resonance imaging permits noninvasive measurement of the tissue sodium concentration (TSC) bioscale in the brain. Computing the TSC bioscale requires reconstructing and combining multiple datasets acquired with a non-Cartesian acquisition that highly oversamples the center of k-space. Even with an optimized implementation of the algorithm to compute TSC, the overall processing time […]

CUDA

May, 27

Trapping of giant-planet cores – I. vortex aided trapping at the outer dead zone edge

In this paper the migration of a 10 Earth mass planetary core is investigated at the outer boundary of the dead zone of a protoplanetary disc by means of 2D hydrodynamic simulations done with the GPU version of the FARGO code. In the dead zone the effective viscosity is greatly reduced due to the disc […]

CUDA

May, 27

Scaling Radio Astronomy Signal Correlation on Heterogeneous Supercomputers Using Various Data Distribution Methodologies

Next generation radio telescopes will require orders of magnitude more computing power to provide a view of the universe with greater sensitivity. In the initial stages of the signal processing flow of a radio telescope, signal correlation is one of the largest challenges in terms of handling huge data throughput and intensive computations. We implemented […]

OpenCL

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
