high performance computing on graphics processing units: hgpu.org

Posts

Jul, 22

An algebraic parallel treecode in arbitrary dimensions

We present a parallel treecode for fast kernel summation in high dimensions – a common problem in data analysis and computational statistics. Fast kernel summations can be viewed as approximation schemes for dense kernel matrices. Treecode algorithms (or simply treecodes) construct low-rank approximations of certain off-diagonal blocks of the kernel matrix. These blocks are identified […]

CUDA

Jul, 22

Generating Binary Optimal Codes Using Heterogeneous Parallel Computing

Generation of optimal codes is a well known problem in coding theory. Many computational approaches exist in the literature for finding record breaking codes. However generating codes with long lengths n using serial algorithms is computationally very expensive, for example the worst case time complexity of a Greedy algorithm is O(n4^n). In order to improve […]

CUDA

Jul, 20

Performance Analysis of GPU-Accelerated Filter-Based Source Finding for HI Spectral Line Image Data

Searching for sources of electromagnetic emission in spectral-line radio astronomy interferometric data is a computationally intensive process. Parallel programming techniques and High Performance Computing hardware may be used to improve the computational performance of a source finding program. However, it is desirable to further reduce the processing time of source finding in order to decrease […]

CUDA

•

OpenCL

Jul, 20

Accelerating a Movie Recommender System Using VirtualCL on a Heterogeneous GPU Cluster

Present day market offers a large number of movies which overwhelm people with choices. In order to quickly navigate through all the possible movies and find the interesting ones, the user can take advantage of recommender systems for movies. This thesis studies a movie recommender system which uses image processing and computer vision algorithms. The […]

OpenCL

Jul, 20

Parallel Programming in Actor-Based Applications via OpenCL

GPU and multicore hardware architectures are commonly used in many different application areas to accelerate problem solutions relative to single CPU architectures. The typical approach to accessing these hardware architectures requires embedding logic into the programming language used to construct the application; the two primary forms of embedding are: calls to API routines to access […]

OpenCL

Jul, 20

Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations

OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, it does not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus extensively used. However, locality concerns exposed in GPU-specific OpenCL code […]

OpenCL

Jul, 20

GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced […]

CUDA

•

OpenCL

Jul, 17

GPU-based visualization of domain-coloured algebraic Riemann surfaces

We examine an algorithm for the visualization of domain-coloured Riemann surfaces of plane algebraic curves. The approach faithfully reproduces the topology of the surface and also preserves some of its geometry. We discuss how the algorithm can be implemented efficiently in OpenGL with geometry shaders, and (less efficiently) even in WebGL with multiple render targets […]

OpenGL

Jul, 17

Scaling Monte Carlo Tree Search on Intel Xeon Phi

Many algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to parallelize efficiently. They are, for instance, present in artificial intelligence search algorithms such as Monte Carlo Tree Search (MCTS). In this paper we […]

Jul, 17

Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the […]

CUDA

Jul, 17

DeepFont: Identify Your Font from An Image

As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo has been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem, and advance the state-of-the-art remarkably by developing the DeepFont system. First of all, we build up the […]

CUDA

Jul, 17

Overhauling SC atomics in C11 and OpenCL

Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A […]

OpenCL

* * *

high performance computing on graphics processing units: hgpu.org

Posts

An algebraic parallel treecode in arbitrary dimensions

Generating Binary Optimal Codes Using Heterogeneous Parallel Computing

Performance Analysis of GPU-Accelerated Filter-Based Source Finding for HI Spectral Line Image Data

Accelerating a Movie Recommender System Using VirtualCL on a Heterogeneous GPU Cluster

Parallel Programming in Actor-Based Applications via OpenCL

Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations

GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

GPU-based visualization of domain-coloured algebraic Riemann surfaces

Scaling Monte Carlo Tree Search on Intel Xeon Phi

Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

DeepFont: Identify Your Font from An Image

Overhauling SC atomics in C11 and OpenCL

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

Most viewed papers (last 30 days)