high performance computing on graphics processing units: hgpu.org

Posts

Sep, 25

Automatic Software Synthesis from High-Level ForSyDe Models Targeting Massively Parallel Processors

In the past decade we have witnessed an abrupt shift to parallel computing, driven by a demand for performance and functionality that conventional paradigms can no longer satisfy. As a consequence, the abstraction gap between applications and the underlying hardware has widened, prompting research in several directions in both industry and academia. This […]

CUDA

Sep, 25

Fast k-NNG construction with GPU-based quick multi-select

In this paper we describe a new brute-force algorithm for building the k-Nearest Neighbor Graph (k-NNG). The k-NNG has many applications in areas such as machine learning, bioinformatics, and clustering analysis. While there are very efficient algorithms for low-dimensional data, for high-dimensional data brute-force search is the best […]

CUDA
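For readers unfamiliar with the brute-force k-NNG construction mentioned above, here is a minimal CPU sketch in NumPy. It assumes Euclidean distance and excludes each point from its own neighbor list; `knn_graph_bruteforce` is a hypothetical name, and `np.argpartition` (an introselect-based partial selection) merely stands in for the paper's GPU quick multi-select, which this excerpt does not describe.

```python
import numpy as np

def knn_graph_bruteforce(points: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array of neighbor indices per point, nearest first (self excluded)."""
    n = points.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    # All-pairs squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(points * points, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (points @ points.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # Selection step: the k smallest per row, without sorting the whole row
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Order only the k selected neighbors by distance
    rows = np.arange(n)[:, None]
    order = np.argsort(d2[rows, idx], axis=1)
    return idx[rows, order]

pts = np.array([[0.0], [1.0], [3.0], [7.0]])
print(knn_graph_bruteforce(pts, 2).tolist())
```

The cost is quadratic in the number of points, which is why the selection step (k smallest out of n per row) is worth optimizing separately from the distance computation.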

Sep, 25

The density matrix renormalization group algorithm on kilo-processor architectures: implementation and trade-offs

In the numerical analysis of strongly correlated quantum lattice models one of the leading algorithms developed to balance the size of the effective Hilbert space and the accuracy of the simulation is the density matrix renormalization group (DMRG) algorithm, in which the run-time is dominated by the iterative diagonalization of the Hamilton operator. As the […]

CUDA
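The abstract above notes that DMRG run-time is dominated by iteratively diagonalizing the Hamilton operator. The excerpt does not say which eigensolver the paper uses; a Lanczos iteration is one common choice, sketched below under that assumption. The operator is passed only as a matrix-vector product (`matvec`), mirroring how DMRG avoids forming the effective Hamiltonian explicitly; `lanczos_ground_energy` and its parameters are hypothetical.

```python
import numpy as np

def lanczos_ground_energy(matvec, n, m=50, seed=0):
    """Estimate the smallest eigenvalue of a symmetric operator given only x -> H x."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    V = [v / np.linalg.norm(v)]  # orthonormal Lanczos vectors
    alphas, betas = [], []
    m = min(m, n)
    for j in range(m):
        w = matvec(V[j])
        alphas.append(float(V[j] @ w))
        # Full reorthogonalization against all previous vectors (numerically robust)
        for u in V:
            w = w - (u @ w) * u
        beta = float(np.linalg.norm(w))
        if j == m - 1 or beta < 1e-12:  # iteration budget used or invariant subspace found
            break
        betas.append(beta)
        V.append(w / beta)
    # Tridiagonal projection of H onto the Krylov subspace
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return float(np.linalg.eigvalsh(T)[0])
```

In practice m is kept far smaller than n; the lowest eigenvalue of the small tridiagonal matrix T converges quickly toward the ground-state energy.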

Sep, 24

Evaluation of autoparallelization toolkits for commodity graphics hardware

In this paper we evaluate the performance of the OpenACC and Mint toolkits against C and CUDA implementations of the standard PolyBench test suite. Our analysis reveals that performance is similar in many cases, but that certain code constructs impede the ability of Mint to generate optimal code. We then present some […]

CUDA

Sep, 24

Pipeline strategies to accelerate range query processing on a multi-GPU environment

Similarity search is attracting increasing interest because its methods can be applied to many areas of computer science and engineering, such as voice and image recognition, text retrieval, and many others. However, when large volumes of data are processed, query response times can be quite high. In this case, it […]

CUDA
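As background for the range queries discussed above: a range query returns every database object within distance r of the query. The sketch below assumes Euclidean vectors and an exhaustive scan; processing queries in fixed-size batches only hints at how work can be split into chunks, and it does not reproduce the paper's multi-GPU pipeline strategies. The function name and `batch_size` parameter are hypothetical.

```python
import numpy as np

def range_query_batched(db: np.ndarray, queries: np.ndarray, radius: float, batch_size: int = 1024):
    """For each query, return the sorted indices of database vectors within `radius`."""
    results = []
    r2 = radius * radius  # compare squared distances to avoid square roots
    for start in range(0, len(queries), batch_size):
        q = queries[start:start + batch_size]
        # (batch, n_db) squared distances for this batch only, bounding memory use
        d2 = np.sum((q[:, None, :] - db[None, :, :]) ** 2, axis=2)
        for row in d2:
            results.append(np.flatnonzero(row <= r2).tolist())
    return results
```

Each batch is independent of the others, which is what makes this kind of workload amenable to overlapping transfers and computation across devices.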

Sep, 24

Realtime Deformation of Constrained Meshes Using GPU

Constrained meshes play an important role in freeform architectural design, as they can represent panel layouts on freeform surfaces. It is challenging to perform realtime manipulation on such meshes, because all constraints need to be respected during the deformation while the shape quality needs to be maintained. This usually leads to nonlinear constrained optimization problems, […]

CUDA

Sep, 24

Parallel multi-agent path planning in dynamic environments for real-time applications

Current path-planning algorithms are not efficient enough to provide optimal path planning in dynamic environments for a large number of agents in real time. Furthermore, there are no real-time algorithms that fully exploit the potential of parallelism. The goal of this thesis is to find a basis for such an algorithm. Based on the literature study, […]

CUDA

Sep, 24

GPU Based Massive Parallel Kawasaki Kinetics In Monte Carlo Modelling of Lipid Microdomains

This paper introduces a novel method for simulating lipid biomembranes based on the Metropolis-Hastings algorithm and the computational power of Graphics Processing Units. The method achieves up to a 55-fold speedup compared with classical computations. An extensive study of the algorithm's correctness is provided, and simulation results are analyzed alongside results obtained with classical simulation methodologies.

CUDA
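Kawasaki kinetics, as referenced in the title above, moves the system by exchanging neighboring particles, so the overall composition is conserved, and each exchange is accepted with the Metropolis rule min(1, exp(-ΔE/kT)). The sketch below is not the paper's model: it assumes a hypothetical two-species square lattice with periodic boundaries and an Ising-like energy E = -J Σ s_i s_j over nearest-neighbor bonds (s = ±1 for the two lipid types).

```python
import math
import random

def local_energy(lat, x, y, J):
    """Energy of the four bonds touching site (x, y): -J * s * (sum of neighbor spins)."""
    L = len(lat)
    s = lat[x][y]
    nb = lat[(x + 1) % L][y] + lat[(x - 1) % L][y] + lat[x][(y + 1) % L] + lat[x][(y - 1) % L]
    return -J * s * nb

def kawasaki_step(lat, J, kT, rng):
    """One Kawasaki exchange attempt; returns True if the swap was accepted."""
    L = len(lat)
    x, y = rng.randrange(L), rng.randrange(L)
    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    x2, y2 = (x + dx) % L, (y + dy) % L
    if lat[x][y] == lat[x2][y2]:
        return False  # swapping identical species changes nothing
    # The bond between the two sites is unlike before and after, so it cancels in dE
    e_old = local_energy(lat, x, y, J) + local_energy(lat, x2, y2, J)
    lat[x][y], lat[x2][y2] = lat[x2][y2], lat[x][y]
    e_new = local_energy(lat, x, y, J) + local_energy(lat, x2, y2, J)
    dE = e_new - e_old
    if dE <= 0 or rng.random() < math.exp(-dE / kT):
        return True  # Metropolis acceptance
    lat[x][y], lat[x2][y2] = lat[x2][y2], lat[x][y]  # rejected: undo the swap
    return False
```

A GPU version typically has to avoid two threads updating overlapping sites at once, for example by partitioning the lattice into non-interacting sublattices; how the paper handles this is not stated in the excerpt.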

Sep, 23

Performance of OpenCL

OpenCL is a relatively new standard that supports computation on a variety of parallel architectures. The author was unable to find reliable information about the performance of OpenCL programs on CPUs compared with traditional parallel processing standards such as OpenMP. This paper describes the results of an experiment that tries to answer the following question: "Which […]

OpenCL

Sep, 23

Multi-GPU Acceleration of Black-Scholes Equation based Option Pricing

In high-frequency options trading, where "milliseconds earn or lose millions", the speed of computing option prices is the crucial factor that lets traders efficiently decide prices and evaluate the corresponding risk. The Black-Scholes equation is a mathematical equation describing the price of an option over time. Multi-GPU is a recently developed platform for high-performance computing, which […]

CUDA
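The abstract above does not state which option type or numerical scheme the paper uses. As a reference point, the Black-Scholes equation has a well-known closed-form solution for a European call: C = S·N(d1) - K·e^(-rT)·N(d2), with d1 = (ln(S/K) + (r + σ²/2)T) / (σ√T) and d2 = d1 - σ√T, where N is the standard normal CDF. The sketch below evaluates it (the put follows from put-call parity); the function names are hypothetical.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S: float, K: float, r: float, sigma: float, T: float) -> float:
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    if sigma <= 0 or T <= 0:
        raise ValueError("sigma and T must be positive")
    d1 = (math.log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def bs_put(S: float, K: float, r: float, sigma: float, T: float) -> float:
    """European put via put-call parity: P = C - S + K e^(-rT)."""
    return bs_call(S, K, r, sigma, T) - S + K * math.exp(-r * T)

print(round(bs_call(100.0, 100.0, 0.05, 0.2, 1.0), 4))
```

Because each option is priced independently, a batch of such evaluations maps naturally onto many GPU threads, one option per thread.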

Sep, 23

Improving Resource Utilization in Heterogeneous CPU-GPU Systems

Graphics processing units (GPUs) have attracted enormous interest over the past decade due to substantial increases in both performance and programmability. Programmers can potentially leverage GPUs for substantial performance gains, but at the cost of significant software engineering effort. In practice, most GPU applications do not effectively utilize all of the available resources in a […]

CUDA • OpenCL

Sep, 23

BenchFriend: Correlating the Performance of GPU Benchmarks

Graphics processing units (GPUs) have become an important platform for general-purpose computing, thanks to their high parallel throughput and high memory bandwidth. GPUs present significantly different architectures from CPUs and require specific mappings and optimizations to achieve high performance. As a result, GPU workloads exhibit application characteristics different from those of CPU workloads. It is critical […]

CUDA

