May, 12

EngineCL: Usability and Performance in Heterogeneous Computing

Heterogeneous systems composed by a CPU and a set of hardware accelerators have become one of the most common architectures today, thanks to their excellent performance and energy consumption. However, due to their heterogeneity they are very complex to program and even more to achieve performance portability on different devices. This paper presents EngineCL, a […]
May, 12

Modeling and Evaluation of Synchronous Stochastic Gradient Descent in Distributed Deep Learning on Multiple GPUs

With huge amounts of training data, deep learning has made great breakthroughs in many artificial intelligence (AI) applications. However, such large-scale data sets present computational challenges, requiring training to be distributed on a cluster equipped with accelerators like GPUs. With the fast increase of GPU computing power, the data communications among GPUs have become a […]
May, 12

Efficient Hardware Acceleration on SoC-FPGA with OpenCL

Field Programmable Gate Arrays (FPGAs) are taking over the conventional processors in the field of High Performance computing. With the advent of FPGA architectures and High level synthesis tools, FPGAs can now be easily used to accelerate computationally intensive applications like, e.g., AI and Cognitive computing. One of the advantages of raising the level of […]
May, 12

Parallel Programming for FPGAs

This book focuses on the use of algorithmic high-level synthesis (HLS) to build application-specific FPGA systems. Our goal is to give the reader an appreciation of the process of creating an optimized hardware design using HLS. Although the details are, of necessity, different from parallel programming for multicore processors or GPUs, many of the fundamental […]
May, 5

Simulating Quantum Computers Using OpenCL

I present QCGPU, an open source Rust library for simulating quantum computers. QCGPU uses the OpenCL framework to enable acceleration by devices such as GPUs, FPGAs and DSPs. I perform a number of optimizations including parallelizing operations such as the application of gates and the calculation of various state probabilities for the purpose of measurment. […]
May, 5

Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

Computing speed is a significant issue of large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most of the large-scale flood simulations are generally run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite […]
May, 5

Auto-tuning Streamed Applications on Intel Xeon Phi

Many-core accelerators, as represented by the XeonPhi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve the overall system performance. To unlock this performance potential requires software to effectively partition the hardware resource to maximize the overlap between hostdevice communication and accelerator computation, and to match the granularity […]
May, 5

Tiramisu: A Code Optimization Framework for High Performance Systems

This paper introduces Tiramisu, an optimization framework designed to generate efficient code for high-performance systems such as multicores, GPUs, FPGAs, distributed machines, or any combination of these. Tiramisu relies on a flexible representation based on the polyhedral model and introduces a novel four-level IR that allows full separation between algorithms, schedules, data-layouts and communication. This […]
May, 5

A Survey of ReRAM-based Architectures for Processing-in-memory and Neural Networks

As data movement operations and power-budget become key bottlenecks in the design of computing systems, the interest in unconventional approaches such as processing-in-memory (PIM) and machine learning (ML), especially neural network (NN) based accelerators has grown significantly. Resistive RAM (ReRAM) is a promising technology for efficiently architecting PIM and NN based accelerators due to its […]
Apr, 28

Multidimensional Parallelization for Streaming Text Processing Applications Based on Parabix Framework

Streaming text processing is important for transforming and analyzing the rapidly growing data in modern society. Unfortunately, text processing software written using the sequential byte-at-a-time processing model fails to take full advantage of the resources available on modern processors for many reasons, including significant branch misprediction penalties due to the input-dependent branching structures of text […]
Apr, 28

A Comparative Study on Exact Triangle Counting Algorithms on the GPU

We implement exact triangle counting in graphs on the GPU using three different methodologies: subgraph matching to a triangle pattern; programmable graph analytics, with a set-intersection approach; and a matrix formulation based on sparse matrix-matrix multiplies. All three deliver best-of-class performance over CPU implementations and over comparable GPU implementations, with the graph-analytic approach achieving the […]
Apr, 28

A Strategy for Automatic Performance Tuning of Stencil Computations on GPUs

We propose and evaluate a novel strategy for tuning the performance of a class of stencil computations on Graphics Processing Units. The strategy uses a machine learning model to predict the optimal way to load data from memory followed by a heuristic that divides other optimizations into groups and exhaustively explores one group at a […]
Page 4 of 952« First...23456...102030...Last »

* * *

* * *

Featured events

Hida Takayama, Japan

The Third International Workshop on GPU Computing and AI (GCA), 2018

Nagoya University, Japan

The 5th International Conference on Power and Energy Systems Engineering (CPESE), 2018

MediaCityUK, Salford Quays, Greater Manchester, England

The 10th International Conference on Information Management and Engineering (ICIME), 2018

No. 1037, Luoyu Road, Hongshan District, Wuhan, China

The 4th International Conference on Control Science and Systems Engineering (ICCSSE), 2018

Nanyang Executive Centre in Nanyang Technological University, Singapore

The 2018 International Conference on Cloud Computing and Internet of Things (CCIOT’18), 2018

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors

Contact us: