high performance computing on graphics processing units: hgpu.org

Posts

Jul, 2

Efficient linear-scaling quantum transport calculations on graphics processing units and applications on electron transport in graphene

We implement, optimize, and validate the linear-scaling Kubo-Greenwood quantum transport simulation on graphics processing units by examining resonant scattering in graphene. We consider two practical representations of the Kubo-Greenwood formula: a Green-Kubo formula based on the velocity auto-correlation and an Einstein formula based on the mean square displacement. The code is fully implemented on graphics […]

CUDA

Jul, 2

GPU Accelerated Fluid Flow Computations Using the Lattice Boltzmann Method

We propose a Graphics Processing Unit (GPU) implementation that reduces the execution time of the Lattice Boltzmann Method. The performance analysis is based on three three-dimensional benchmark applications: Poiseuille flow, lid-driven cavity flow, and flow in an elbow-shaped domain. Three different, recently released GPU cards are considered for […]

CUDA

Jul, 1

vSMC: Parallel Sequential Monte Carlo in C++

Sequential Monte Carlo is a family of algorithms for sampling from a sequence of distributions. Some of these algorithms, such as particle filters, are widely used in physics and signal processing research. More recent developments have established their application in more general inference problems such as Bayesian modeling. These algorithms have attracted considerable attention […]

OpenCL

Jul, 1

A Survey On Parallelization Of Data Mining Techniques

This paper gives an overview of various parallelization techniques that improve the performance of existing data mining algorithms and make them capable of handling large amounts of data. There are a variety of techniques for achieving parallelization in the data mining field; this paper presents a brief introduction to a few of the popular ones. […]

CUDA • OpenCL

Jul, 1

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. Heterogeneous systems with CPUs and multiple GPUs, and distributed-memory clusters, are examples of such systems. Past works that try to automate data movement for distributed-memory […]

Jul, 1

Exploiting multi-level parallelism in streaming applications for heterogeneous platforms with GPUs

Heterogeneous computing platforms support the traditional types of parallelism, such as instruction-level, data, task, and pipeline parallelism, and provide the opportunity to exploit a combination of different types of parallelism at different platform levels. The architectural diversity of platform components makes tapping into the platform potential a challenging programming task. This thesis makes an […]

CUDA

Jul, 1

Towards Performance-Portable, Scalable, and Convenient Linear Algebra

The rise of multi- and many-core architectures also gave birth to a plethora of new parallel programming models. Among these, the open industry standard OpenCL addresses this heterogeneity of programming environments by providing a unified programming framework. The price to pay, however, is that OpenCL requires additional low-level boilerplate code, when compared to vendor-specific solutions, […]

OpenCL

Jun, 30

Cropped Quad-Tree Based Solid Object Colouring with CUDA

In this study, the surfaces of solid objects are coloured with the Cropped Quad-Tree method using GPU computing. Numerous methods are used in solid object colouring. Studies carried out in different fields show that the quad-tree method stands out in terms of speed and performance. Cropped […]

CUDA • OpenGL

Jun, 30

Accelerating SELECT WHERE and SELECT JOIN Queries on a GPU

This paper presents implementations of a few selected SQL operations using the CUDA programming framework on the GPU platform. GPU parallel architectures now deliver high speed-ups on certain problems, so the number of non-graphical problems that can be run and accelerated on the GPU keeps growing. In particular, there has been a lot of […]

CUDA

Jun, 30

HadoopCL: MapReduce on Distributed Heterogeneous Platforms Through Seamless Integration of Hadoop and OpenCL

As the scale of high performance computing systems grows, three main challenges arise: the programmability, reliability, and energy efficiency of those systems. Accomplishing all three without sacrificing performance requires a rethinking of legacy distributed programming models and homogeneous clusters. In this work, we integrate Hadoop MapReduce with OpenCL to enable the use of heterogeneous processors […]

OpenCL

Jun, 30

Intel Xeon Phi Coprocessor High-Performance Programming

This book is useful even before you ever touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these […]

Jun, 30

Best Practice Guide – Intel Xeon Phi

This best practice guide provides information about Intel’s MIC architecture and programming models for the Intel Xeon Phi coprocessor in order to enable programmers to achieve good performance of their applications. The guide covers a wide range of topics from the description of the hardware of the Intel Xeon Phi coprocessor through information about the […]
