high performance computing on graphics processing units: hgpu.org

Posts

Jul, 10

Many-Core Compiler Fuzzing

We address the compiler correctness problem for many-core systems through novel applications of fuzz testing to OpenCL compilers. Focusing on two methods from prior work, random differential testing and testing via equivalence modulo inputs (EMI), we present several strategies for random generation of deterministic, communicating OpenCL kernels, and an injection mechanism that allows EMI testing […]

OpenCL

Jul, 3

Modelling the Formation of Ordered Acentrosomal Microtubule Arrays

Acentrosomal microtubules are not bound to a microtubule organising centre yet are still able to form ordered arrays. Two clear examples of this behaviour are the acentrosomal apico-basal (side wall) array in epithelial cells and the parallel organisation of plant cortical microtubules. This research investigates their formation through mathematical modelling and Monte Carlo simulations with […]

OpenCL

Jun, 22

GPU accelerated spectral finite elements on all-hex meshes

This paper presents a spectral element finite element scheme that efficiently solves elliptic problems on unstructured hexahedral meshes. The discrete equations are solved using a matrix-free preconditioned conjugate gradient algorithm. An additive Schwarz two-scale preconditioner is employed that allows h-independent convergence. An extensible multi-threading programming API is used as a common kernel language that allows […]

CUDA • OpenCL

Jun, 17

Automatic Data Layout Optimizations for GPUs

Memory optimizations have become increasingly important for fully exploiting the computational power of modern GPUs. The data arrangement has a big impact on performance, and it is very hard for GPU programmers to identify a well-suited data layout. Classical data layout transformations include grouping together data fields that have similar access patterns, […]

OpenCL

Jun, 16

Perfect Hashing Structures for Parallel Similarity Searches

Seed-based heuristics have proved to be efficient for studying similarity between genetic databases with billions of base pairs. This paper focuses on algorithms and data structures for the filtering phase in seed-based heuristics, with an emphasis on efficient parallel GPU/manycore implementation. We propose a 2-stage index structure which is based on neighborhood indexing and perfect […]

OpenCL

Jun, 7

A Parallel Implementation of the Galerkin Method for Solving Partial Differential Equations on a Triangular Mesh

Finite Element Methods are techniques for estimating solutions to boundary value problems for partial differential equations from an approximating subspace. These methods are based on weak or variational forms of the BVP that require less of the problem functions than what the original PDE would suggest in terms of order of differentiability and continuity. In […]

OpenCL

Jun, 5

Accelerated Nodal Discontinuous Galerkin Simulations for Reverse Time Migration with Large Clusters

Improving both accuracy and computational performance of numerical tools is a major challenge for seismic imaging and generally requires specialized implementations to make full use of modern parallel architectures. We present a computational strategy for reverse-time migration (RTM) with accelerator-aided clusters. A new imaging condition computed from the pressure and velocity fields is introduced. The […]

CUDA • OpenCL

May, 20

Physically Based Rendering: Implementation of Path Tracer

The main topic of this thesis was to implement a computer program that can render photorealistic images by simulating the laws of physics. In practice, the program builds an image by finding every possible path that a light ray can travel. The technique presented in this thesis naturally simulates many physical phenomena such as reflections, […]

OpenCL • OpenGL

May, 16

Efficient Resource Scheduling for Big Data Processing on Accelerator-based Heterogeneous Systems

Accelerators are becoming widespread in heterogeneous processing, performing computation tasks across a wide range of applications. In this paper, we examine the heterogeneity of modern computing systems, in particular how to achieve a good level of resource utilization and fairness when multiple tasks with different load and computation ratios are […]

OpenCL

May, 15

Adaptive discrete cosine transform-based image compression method on a heterogeneous system platform using Open Computing Language

The discrete cosine transform (DCT) is one of the major operations in image compression standards, and it requires intensive and complex computations. Recent computer systems and handheld devices are equipped with high-capability computing devices such as a general-purpose graphics processing unit (GPGPU) in addition to the traditional multicore CPU. We develop an optimized parallel implementation […]

OpenCL

May, 7

SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core types into mainstream programming use. The framework allows equal treatment of different computing devices under the Spark framework and introduces […]

OpenCL

May, 3

PyTransit: Fast and Easy Exoplanet Transit Modelling in Python

We present PyTransit, a fast and user-friendly exoplanet transit light curve modelling package implementing optimised versions of the Giménez and the Mandel & Agol transit models. The package offers an object-oriented Python interface to access the two models implemented natively in Fortran with OpenMP parallelisation. A partial OpenCL version of the quadratic Mandel-Agol model […]

OpenCL


Recent source codes

CuPBoP-AMD: Extending CUDA to AMD Platforms

Adopter: Automated Deep Learning Optimization via DSL-based Source Code Transformation

ROCm's implementation of Gromacs

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds
