high performance computing on graphics processing units: hgpu.org

Posts

Apr, 2

Parallel implementation of the Finite-Difference Time-Domain method in Open Computing Language

In this paper we evaluate the usability and performance of Open Computing Language (OpenCL) targeted for implementation of the Finite-Difference Time-Domain (FDTD) method. The simulation speed was compared to implementations based on alternative techniques of parallel processor programming. Moreover, the portability of OpenCL FDTD code between modern computing architectures was assessed. The average speed of […]

CUDA • OpenCL

Apr, 2

Speeding-up Pearson Correlation Coefficient calculation on graphical processing units

Sample correlation coefficient is used widely for finding signal similarity in data processing, multimedia, pattern recognition and artificial intelligence applications. Pearson Correlation Coefficient is the most common measure for the correlation coefficient between discrete signals. Similarity search in huge pattern databases requires a fast way of calculating the correlation coefficient between numerical vectors. In this […]

CUDA • OpenCL

Apr, 2

GPU-Enabled AI

GPU-enabled AI is a subset of so-called general-purpose GPU computing (GPGPU), but it promises to be one of the fastest-growing subsets. The rise of cloud computing, recent high-powered graphics-chip releases by AMD’s competitor Nvidia, and the growing acceptance of the OpenCL programming platform have all converged to allow GPU-enabled AI to take off in […]

OpenCL

Apr, 2

Uncertainty-Aware Guided Volume Segmentation

Although direct volume rendering is established as a powerful tool for the visualization of volumetric data, efficient and reliable feature detection is still an open topic. Usually, a tradeoff between fast but imprecise classification schemes and accurate but time-consuming segmentation techniques has to be made. Furthermore, the issue of uncertainty introduced with the feature detection […]

OpenCL • OpenGL

Apr, 2

A characterization and analysis of PTX kernels

General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA’s CUDA, OpenCL, and Intel’s Ct. While significant effort has been focused on developing and evaluating applications and software tools, comparatively […]

CUDA

Apr, 2

Parallel computing with CUDA

Summary form only given. NVIDIA’s CUDA architecture provides a powerful platform for writing highly parallel programs. By providing simple abstractions for hierarchical thread organization, memories, and synchronization, the CUDA programming model allows programmers to write scalable programs without the burden of learning a multitude of new programming constructs. The CUDA architecture can support many languages […]

CUDA

Apr, 1

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

New high-performance computing (HPC) applications now have to cope with scaling over an increasing number of nodes and with the programming of special accelerator hardware. Hybrid composition of large computing systems leads to a new dimension in complexity of software development. This paper presents a novel approach to gain insight into accelerator interaction and utilization without […]

CUDA • OpenCL

Apr, 1

Message Passing Interface support for the runtime adaptive multi-processor system-on-chip RAMPSoC

Parallel processor architectures are a promising solution to provide the required computing performance for current and future high performance applications. Certainly, the impact on the computational power of such a parallel computer system is related to the inherent parallelism of the algorithm to be implemented. The implementation of an algorithm onto a parallel computer architecture, […]

CUDA • OpenCL • OpenGL

Apr, 1

Remote Sensing Processing: From Multicore to GPU

As the amount of data and the complexity of the processing rise, the demand for processing power in remote sensing applications is increasing. Processing speed is critical to enabling a productive interaction between the human operator and the machine in order to achieve ever more complex tasks satisfactorily. Graphics processing units (GPUs) […]

OpenCL

Apr, 1

Poster: GPU-accelerated rigid body fitting of atomic structures into electron density maps

Three initial fits of 1ubi in a 6.6 Å resolution synthesized density map had backbone RMSDs to the correct placement of 2.7, 2.9 and 6.6 Å. They have been refined with a Powell optimizer [5] in 10 iterations using 6 directions, 3 rotations α, β with 0.15 radians and γ with 0.075 radians starting direction to cover […]

OpenCL

Apr, 1

A Dynamic Resource Management and Scheduling Environment for Embedded Multimedia and Communications Platforms

We present a framework, OpenCLosE, for dynamic resource management and scheduling of applications written in Open Computing Language (OpenCL) for heterogeneous multimedia and graphics platforms, such as those found in multimedia smartphones and automotive infotainment clusters. We describe the design of a resource manager and master scheduler for the OpenCLosE environment that allows efficient realization […]

OpenCL

Apr, 1

Poster: GPU-accelerated artificial neural network for QSAR modeling

Here, we present a GPU-accelerated OpenCL implementation of a back-propagation artificial neural network for the creation of QSAR models for drug discovery and virtual high-throughput screening. A QSAR model for HSD achieved an enrichment of 5.9 and area under the curve of 0.83 on an independent data set which signifies sufficient predictive ability for virtual […]

OpenCL


