high performance computing on graphics processing units: hgpu.org

Posts

Feb, 9

Interactive GPU active contours for segmenting inhomogeneous objects

We present a segmentation software package primarily targeting medical and biological applications, with a high level of visual feedback and several usability enhancements over existing packages. Specifically, we provide a substantially faster GPU implementation of the local Gaussian distribution fitting energy model, which can segment inhomogeneous objects with poorly defined boundaries as often encountered in […]

OpenCL

Feb, 9

Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach

Many-core accelerators, as represented by Xeon Phi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve overall system performance. Unlocking this performance potential requires software to partition the hardware resources effectively to maximize the overlap between host-device communication and accelerator computation, and to match the granularity […]

Feb, 3

REOH: Runtime Energy Optimization for Heterogeneous Systems

Significant efforts have been devoted to choosing the best configuration of a computing system to run an application energy-efficiently. However, available tuning approaches mainly focus on homogeneous systems and do not extend to heterogeneous systems, which include several components (e.g., CPUs, GPUs) with different architectures. This study proposes a holistic tuning approach called REOH, based […]

OpenCL

Feb, 3

Accelerating recurrent neural network language model based online speech recognition system

This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems. Firstly, a lossy compression of the past hidden layer outputs (history vector) with caching is introduced in order to reduce the number of LM queries. Next, RNNLM computations are deployed in a CPU-GPU hybrid manner, which computes […]

CUDA

Feb, 3

A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques

Developing efficient software and hardware has never been harder, whether it is for a tiny IoT device or an Exascale supercomputer. Apart from the ever-growing design and optimization complexity, there exist even more fundamental problems such as the lack of interdisciplinary knowledge required for effective software/hardware co-design, and a growing technology transfer gap between academia […]

OpenCL

Feb, 3

Efficient SIMD Vectorization for Hashing in OpenCL

Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – […]

OpenCL

Feb, 3

Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning

The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework […]

Jan, 30

The 4th International Conference on Control, Automation and Robotics (ICCAR), 2018

ICCAR 2018 is a not-to-be-missed opportunity that distills the most current knowledge on a rapidly advancing discipline in one conference. Join key researchers and established professionals in the field of control, automation and robotics as they assess the current state-of-the-art and road-map crucial areas for future research. It will provide a valuable opportunity for researchers, […]

Jan, 30

6th International Conference on Sustainable Development (ICSD), 2018

The International Conference on Sustainable Development is organized by the European Center of Sustainable Development in collaboration with CIT University. The 6th ICSD 2018 is inspired by the critical challenge of human, environmental, and economic sustainability concerning present and future generations in a global-scale context. The Conference venue is: Roma Eventi, Congress Center, Piazza […]

Jan, 30

3rd International Conference on Computer and Communication Systems (ICCCS), 2018

Accepted papers will be published in the conference proceedings, which will be submitted for inclusion in IEEE Xplore and for indexing in Ei Compendex and Scopus. The conference proceedings of ICCCS 2015 are available in IEEE Xplore, and papers of ICCCS 2015 have been indexed by Ei Compendex and Scopus. The conference proceeding […]

Jan, 28

SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems

In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new, flexible and type-safe, interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows […]

CUDA • OpenCL

Jan, 28

Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on Next-Generation Architectures

Sparse-Matrix Vector products (SpMV) are highly irregular computational kernels that can be found in a diverse collection of high-performance science applications. Performance for this important kernel is often highly correlated with the associated matrix sparsity, which, in turn, governs the computational granularity, and therefore, the efficiency of the memory system. In this paper, we propose […]

CUDA
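For readers unfamiliar with the kernel named in the entry above, the following is a minimal sketch of SpMV over a matrix in compressed sparse row (CSR) form. It is not the paper's autotuning method. The function name `spmv_csr`, the array names, and the 3x3 example matrix are all assumptions chosen for illustration. The sketch only shows why the work per row and the accesses into `x` depend on the sparsity pattern.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # The nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1,
        # so the inner trip count varies from row to row with the sparsity pattern.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access into x through col_idx is what makes the kernel irregular.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[10,  0,  0],
#  [ 0,  0, 20],
#  [30, 40, 50]]
row_ptr = [0, 1, 2, 5]
col_idx = [0, 2, 0, 1, 2]
values = [10.0, 20.0, 30.0, 40.0, 50.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
nnz_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
```

In this example the third row does three times the work of the other two. That kind of imbalance is one reason a single fixed kernel configuration rarely suits every matrix.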

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
