high performance computing on graphics processing units: hgpu.org

Posts

Aug, 16

GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm

Genome-wide association studies (GWAS) have yielded significant advances in defining the genetic architecture of complex traits and diseases. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS, where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex […]

CUDA

Aug, 15

Parallel Gravitation Field Algorithm Based on the CUDA Platform

The Gravitation Field Algorithm (GFA) is a simple but very effective heuristic search algorithm. It has clear advantages over SA and GA in multimodal function optimization problems. However, obtaining a more precise global optimum requires a large number of initial dusts in the computation, which causes a low efficiency […]

CUDA

Aug, 15

General Transformations for GPU Execution of Tree Traversals

With the advent of programmer-friendly GPU computing environments, there has been much interest in offloading workloads that can exploit the high degree of parallelism available on modern GPUs. Exploiting this parallelism and optimizing for the GPU memory hierarchy are well understood for regular applications that operate on dense data structures such as arrays and matrices. However, […]

CUDA

Aug, 15

Programming Dense Linear Algebra Kernels on Vectorized Architectures

The high performance computing (HPC) community is obsessed with the general matrix-matrix multiply (GEMM) routine. This obsession is not without reason. Most, if not all, Level 3 Basic Linear Algebra Subroutines (BLAS) can be written in terms of GEMM, and the performance of many higher-level linear algebra solvers (e.g., LU, Cholesky) depends on GEMM’s […]

Aug, 15

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over roughly the past five years. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own […]

Aug, 15

Detecting Data Races on OpenCL Kernels with Symbolic Execution

We present an automatic analysis technique for checking data races on OpenCL kernels. Our method defines symbolic execution techniques based on separation logic with suitable abstractions to automatically detect non-benign racy behaviours on kernels.

OpenCL

Aug, 14

Lattice Boltzmann Method for Simulating Turbulent Flows

The lattice Boltzmann method (LBM) is a relatively new method for fluid flow simulations and has recently been gaining popularity due to its simple algorithm and parallel scalability. Although the method has been successfully applied to a wide range of flow physics, its capabilities in simulating turbulent flow are still under-validated. Hence, in this project, a […]

CUDA

Aug, 14

The Yin and Yang of Processing Data Warehousing Queries on GPU Devices

The database community has made significant research efforts to optimize query processing on GPUs in the past few years. However, GPUs have hardly been adopted in major warehousing production systems. In preparation for integrating GPUs into warehousing systems, we have identified and addressed several critical issues in a three-dimensional study of […]

CUDA • OpenCL

Aug, 14

GPU Acceleration of a Basket Option Pricing Engine

One of the most important methods for pricing complex derivatives is Monte Carlo simulation. However, this method requires a large amount of computing resources for accurate estimates. Since Monte Carlo simulations used in derivatives pricing are often parallelisable, one way to reduce the computing time is to use GPUs, which allow many copies of the […]

CUDA • OpenCL

Aug, 14

A Haptic Device Interface for Medical Simulations using OpenCL

The project evaluates how well a haptic device can be used to interact with a visualization of volumetric data. Since the interface to the haptic device requires explicit surface descriptions, triangles had to be constructed from the volumetric data. The algorithm used to extract these triangles is marching cubes. The triangles produced by marching cubes […]

OpenCL

Aug, 14

An Automated Video Surveillance System Using Viewpoint Feature Histogram and CUDA-enabled GPUs

This paper presents an automated video surveillance system that handles content monitoring and activity changes in the environment. We use the Viewpoint Feature Histogram, an image descriptor for object recognition and pose estimation, for the purpose of monitoring in the surveillance system. In order to enhance the performance of the system, we exploit the GPU architecture […]

CUDA

Aug, 13

Exploiting the Parallelism of Heterogeneous Systems using Dataflow Graphs on Top of OpenCL

Programming heterogeneous systems has been greatly simplified by OpenCL, which provides a common low-level API for a large variety of compute devices. However, many low-level details, including data transfer, task scheduling, or synchronization, must still be managed by the application designer. Often, it is desirable to program heterogeneous systems in a higher-level language, making the […]

OpenCL

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
