high performance computing on graphics processing units: hgpu.org

Posts

Apr, 30

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet it poses the fundamental difficulty of reducing a large portion of the computation for pixel-wise label inference. We propose a compressed-PSPNet-based image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. […]

CUDA

Apr, 30

Automatic source code adaptation for heterogeneous platforms

The demise of frequency scaling, which was the easiest way to improve computing performance, together with the growing gap between CPU and memory speeds and the increase in arithmetic intensity in current problems, has given rise to a new range of devices created to improve performance. Heterogeneous Computing (HC) and many-cores are examples of […]

CUDA

Apr, 30

Accelerating Discrete Wavelet Transforms on Parallel Architectures

The 2-D discrete wavelet transform (DWT) lies at the heart of many image-processing algorithms. Until recently, several studies compared the performance of this transform on various shared-memory parallel architectures, especially on graphics processing units (GPUs). All these studies, however, considered only separable calculation schemes. We show that corresponding separable parts can be […]

OpenCL • OpenGL

Apr, 30

Low-complexity Distributed Tomographic Backprojection for large datasets

In this manuscript we present a fast GPU implementation for tomographic reconstruction of large datasets using data obtained at the Brazilian synchrotron light source. The algorithm is distributed across a cluster with 4 GPUs through a fast pipeline implemented in the C programming language. Our algorithm is theoretically based on a recently discovered low-complexity formula, […]

CUDA

Apr, 26

Developing a massive real-time crowd simulation framework on the GPU

Crowd simulations are used to imitate the behaviour of a large group of people. Such simulations are used in industries ranging from video games to public security. In recent years, research has turned to the parallel nature of GPUs to simulate the behaviour of individuals in a crowd in parallel. This allows for real-time visualisation […]

OpenCL

Apr, 26

Lattice Quantum Chromodynamics on Intel Xeon Phi based supercomputers

The aim of this master’s thesis project was to expand the QPhiX library for twisted-mass fermions with and without clover term. To this end, I continued work initiated by Mario Schrock et al. [63]. In writing this thesis, I pursued two main goals. Firstly, I wanted to stress the intricate interplay of the four […]

Apr, 26

A Training Framework and Architectural Design for Distributed Deep Learning

Deep learning has recently gained a lot of attention on account of its incredible success in many complex data-driven applications, such as image classification. However, deep learning is quite user-hostile and is thus difficult to apply. For example, it is tricky and slow to train a large model which may consume a lot of memory. […]

CUDA

Apr, 26

OpenCL-Based FPGA Accelerator for 3D FDTD with Periodic and Absorbing Boundary Conditions

The finite-difference time-domain (FDTD) method is a very popular way of numerically solving partial differential equations. FDTD has a low operational intensity, so its performance on CPUs and GPUs is often restricted by the memory bandwidth. Recently, deeply pipelined FPGA accelerators have shown a lot of success by exploiting streaming data flows in […]

OpenCL

Apr, 26

OpenCL JIT Compilation for Dynamic Programming Languages

Graphics Processing Units (GPUs) are powerful hardware to parallelize and speed up applications. However, programming these devices is too complex for most users and the existing standards for GPU programming are available only for low-level languages such as C. Dynamic programming languages offer higher abstractions and functionality for many users. GPU programming is possible for dynamic […]

OpenCL

Apr, 23

4th International Conference on Biomedical and Bioinformatics Engineering (ICBBE), 2017

ICBBE 2017 aims to bring together innovative academics and industrial experts in the field of Biomedical and Bioinformatics Engineering in a common forum. The primary goal of the conference is to promote research and developmental activities in Biomedical and Bioinformatics Engineering. Another goal is to promote scientific information interchange between researchers, developers, engineers, students, and […]

Apr, 23

9th International Conference on Signal Processing Systems (ICSPS), 2017

The 2017 9th International Conference on Signal Processing Systems (ICSPS 2017) is the main annual research conference that aims to bring together top researchers from around the world to exchange research results and address open issues in all aspects of Signal Processing Systems. Publication: Two options: 1 Conference Proceedings, Ei Compendex and Scopus and submitted to be reviewed […]

Apr, 23

The 5th International conference on Control, Mechatronics and Automation (ICCMA), 2017

The 2017 5th International conference on Control, Mechatronics and Automation will be held at the University of Alberta, Canada, during October 11-13, 2017. ICCMA 2013 was held in Sydney, ICCMA 2014 was held in Dubai, and ICCMA 2015 and ICCMA 2016 were both held in Barcelona. The idea of the conference is for the scientists, scholars, engineers […]
