high performance computing on graphics processing units: hgpu.org

Posts

Aug, 4

TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization

We have developed a task-parallel runtime system, called TREES, that is designed for high performance on CPU/GPU platforms. On platforms with multiple CPUs, Cilk’s "work-first" principle underlies how task-parallel applications can achieve performance, but work-first is a poor fit for GPUs. We build upon work-first to create the "work-together" principle that addresses the specific strengths […]

OpenCL

Aug, 4

A survey of sparse matrix-vector multiplication performance on large matrices

We contribute a third-party survey of sparse matrix-vector (SpMV) product performance on industrial-strength, large matrices using: (1) The SpMV implementations in Intel MKL, the Trilinos project (Tpetra subpackage), the CUSPARSE library, and the CUSP library, each running on modern architectures. (2) NVIDIA GPUs and Intel multi-core CPUs (supported by each software package). (3) The CSR, […]

CUDA
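
For readers unfamiliar with the CSR (compressed sparse row) format named in the survey entry above, the following is a minimal pure-Python sketch of a CSR matrix-vector product. It is an illustration only, not the implementation used by Intel MKL, Trilinos, CUSPARSE, or CUSP; the function name `csr_spmv` and the argument names `values`, `col_idx`, and `row_ptr` are assumptions chosen for this example.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i] .. row_ptr[i + 1] delimits row i inside `values`
    (All names are hypothetical, chosen for this sketch.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Visit only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```

Each row's dot product is independent of the others, which is why the outer loop is the usual candidate for parallel execution on multi-core CPUs and GPUs.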

Aug, 4

Programming Embedded Manycore: Refinement and Optimizing Compilation of a Parallel Action Language for Hierarchical State Machines

Modeling languages propose convenient abstractions and transformations to handle the complexity of today’s embedded systems. Based on the formalism of Hierarchical State Machines, they enable the expression of hierarchical control parallelism. However, they face two important challenges when it comes to modeling data-intensive applications: no unified approach that also accounts for data-parallel actions; and […]

OpenCL

Aug, 4

A Gb/s Parallel Block-based Viterbi Decoder for Convolutional Codes on GPU

In this paper, we propose a parallel block-based Viterbi decoder (PBVD) on the graphics processing unit (GPU) platform for the decoding of convolutional codes. The decoding procedure is simplified and parallelized, and the characteristics of the trellis are exploited to reduce the metric computation. Based on the compute unified device architecture (CUDA), two kernels with […]

CUDA

Aug, 1

The ANTAREX Approach to Autotuning and Adaptivity for Energy Efficient HPC Systems

The ANTAREX project aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application […]

OpenCL

Aug, 1

Drug Drug Interaction Extraction from Biomedical Literature Using Syntax Convolutional Neural Network

MOTIVATION: Detecting drug-drug interaction (DDI) has become a vital part of public health safety. Therefore, using text mining techniques to extract DDIs from biomedical literature has received great attention. However, this research is still at an early stage and its performance has much room to improve. RESULTS: In this paper, we present a syntax convolutional […]

CUDA

Aug, 1

3D visualization of astronomy data cubes using immersive displays

We report on an exploratory project aimed at performing immersive 3D visualization of astronomical data, starting with spectral-line radio data cubes from galaxies. This work is done as a collaboration between the Department of Physics and Astronomy and the Department of Computer Science at the University of Manitoba. We are building our prototype using the […]

OpenGL

Aug, 1

Automatic Loop Partitioning for Heterogeneous Systems

In this work, we implement a tool that automatically partitions loops and then executes these partitions on heterogeneous systems. Partitioning a loop is the process of dividing a loop to form two or more new loops, each iterating over a portion of the original loop's iteration space. A heterogeneous system is a system that is […]

CUDA
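
The definition of loop partitioning in the entry above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool the authors describe: the helper names `partition_range`, `scale_original`, and `scale_partitioned` are hypothetical, the split is a simple near-equal contiguous one, and the partitions run sequentially here rather than on separate devices.

```python
def partition_range(n, parts):
    """Split the iteration space range(n) into `parts` contiguous
    (start, stop) chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        # The first `extra` chunks take one additional iteration.
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def scale_original(data, factor):
    """The original loop: one pass over the whole iteration space."""
    out = [0] * len(data)
    for i in range(len(data)):
        out[i] = data[i] * factor
    return out


def scale_partitioned(data, factor, parts=2):
    """The same computation expressed as several smaller loops."""
    out = [0] * len(data)
    for start, stop in partition_range(len(data), parts):
        # In a heterogeneous setting, each of these loops could be
        # dispatched to a different device; here they run in sequence.
        for i in range(start, stop):
            out[i] = data[i] * factor
    return out


data = list(range(10))
print(partition_range(len(data), 3))
print(scale_partitioned(data, 2, parts=3) == scale_original(data, 2))
```

The chunks cover the original iteration space exactly once, so the partitioned version yields the same result as the original loop whenever the iterations are independent of one another.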

Aug, 1

Unified system of code transformation and execution for heterogeneous multi-core architectures

Heterogeneous architectures have been widely used in the domain of high performance computing. However, developing applications on heterogeneous architectures is time consuming and error-prone because going from a single accelerator to multiple ones requires dealing with potentially non-uniform domain decomposition, inter-accelerator data movements, and dynamic load balancing. The aim of this thesis is […]

OpenCL

Jul, 31

Perspectives of GPU computing in Science, 2016

A meeting to discuss and assess impacts and perspectives of GPU and many-core computing in various fields of scientific research. The meeting is focused on applications and developments, to share ideas and foster discussions on the invaluable off-the-shelf tools as well as dedicated solutions (hardware and software) that have helped in achieving outstanding scientific advances, […]

Jul, 31

The Second International Workshop on Pattern Recognition (IWPR), 2017

Publication: Submitted and accepted papers will be published in the conference proceedings, which will be indexed by Ei, Scopus and ISI.

Submission Methods: Full Paper (publication and oral presentation); Abstract (oral presentation only).

Electronic Submission System (.pdf): http://www.easychair.org/conferences/?conf=icopr2017

Jul, 30

Accelerating Database Query Processing on OpenCL-based FPGAs

The release of OpenCL support for FPGAs represents a significant improvement in extending database applications to the reconfigurable domain. Taking advantage of the programmability offered by the OpenCL HLS tool, an OpenCL database can be easily ported and re-designed for FPGAs. A single SQL query in these database systems usually consists of multiple operators, and […]

OpenCL