Jan, 15

The 2018 International Conference on High Performance Computing & Simulation (HPCS), 2018

The 2018 International Conference on High Performance Computing & Simulation (HPCS 2018) will be held on July 16 – 20, 2018 in Orléans, France (tentative). Under the theme of “HPC and Modeling & Simulation for the 21st Century,” HPCS 2018 will focus on a wide range of state-of-the-art as well as emerging topics pertaining […]
Jan, 13

ImageCL: Language and source-to-source compiler for performance portability, load balancing, and scalability prediction on heterogeneous systems

Applications written for heterogeneous CPU-GPU systems often suffer from poor performance portability. Finding good work partitions can also be challenging, as different devices suit different applications. This article describes ImageCL, a high-level domain-specific language and source-to-source compiler targeting single systems as well as distributed heterogeneous hardware. Initially targeting image processing algorithms, our framework […]
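As background on the work-partitioning problem the abstract mentions, here is a minimal sketch (not from the paper) of proportional load balancing: image rows are split across devices in proportion to measured relative throughput. The device speeds are made-up illustrative numbers.

```python
def partition_rows(n_rows, speeds):
    """Assign each device a contiguous row range proportional to its speed."""
    total = sum(speeds)
    bounds, start, acc = [], 0, 0.0
    for i, s in enumerate(speeds):
        acc += s
        # last device takes the remainder to avoid rounding gaps
        end = n_rows if i == len(speeds) - 1 else round(n_rows * acc / total)
        bounds.append((start, end))
        start = end
    return bounds

# Example: a CPU roughly 1x and a GPU roughly 3x as fast share 1000 rows.
print(partition_rows(1000, [1.0, 3.0]))  # [(0, 250), (250, 1000)]
```

Real systems like ImageCL derive the speed ratios from profiling or performance models rather than fixed constants.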
Jan, 13

Graph Processing on GPUs: A Survey

In the big data era, much real-world data can be naturally represented as graphs, and consequently many application domains can be modeled as graph processing. Graph processing, especially of large-scale graphs with vertices and edges numbering in the billions or even hundreds of billions, has attracted much attention […]
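To illustrate the kind of workload such surveys cover: many GPU graph frameworks parallelise level-synchronous (frontier-based) BFS, where each iteration expands the entire frontier at once. A plain-Python stand-in of that pattern (not GPU code):

```python
def bfs_levels(adj, source):
    """Return the BFS level of every vertex reachable from source."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:              # on a GPU: one thread per frontier vertex/edge
            for v in adj.get(u, []):
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_levels(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

The inner loops are where GPU frameworks differ: mapping threads to vertices versus edges is one of the central design choices the survey literature discusses.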
Jan, 13

pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment

BACKGROUND: Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA GPUs is platform-specific and has therefore been adopted less widely than it could have been. The OpenCL language is more widely supported and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of […]
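For readers unfamiliar with the algorithm, a minimal pure-Python sketch of Smith-Waterman local alignment scoring. The scoring parameters here are illustrative defaults, not the ones pyPaSWAS uses, and the GPU versions parallelise the anti-diagonals of this matrix.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # local alignment: scores never drop below zero
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGT", "CG"))  # the exact "CG" substring aligns: 4
```

Cells on the same anti-diagonal depend only on earlier diagonals, which is what makes the dynamic-programming matrix amenable to GPU parallelism.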
Jan, 13

Deep In-GPU Experience Replay

Experience replay allows a reinforcement learning agent to train on samples drawn from a large pool of its most recent experiences. A simple in-RAM experience replay stores these experiences in a list in RAM, then copies sampled batches to the GPU for training. I moved this list to the GPU, thus creating an […]
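The insert/sample logic being moved is simple; the paper's point is that keeping the buffer in GPU memory avoids the RAM-to-GPU copy for every sampled batch. A plain-Python ring-buffer stand-in (illustrative only, no GPU storage):

```python
import random

class ReplayBuffer:
    """Fixed-capacity experience replay: overwrite oldest entries once full,
    sample uniformly for training batches."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.pos = 0   # next slot to overwrite once the buffer is full

    def add(self, transition):
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition   # evict the oldest
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        return random.sample(self.data, batch_size)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add(t)
print(sorted(buf.data))  # oldest transitions 0 and 1 were overwritten: [2, 3, 4]
```

In an in-GPU variant, `self.data` would be a preallocated GPU tensor and `sample` would gather rows by index on the device, so no batch ever crosses the PCIe bus.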
Jan, 13

High Performance Stencil Code Generation with Lift

Stencil computations are widely used, from physical simulations to machine learning. They are embarrassingly parallel and perfectly fit modern hardware such as Graphics Processing Units. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain-Specific Languages (DSLs) have raised the programming abstraction and offer good performance. However, this places […]
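For concreteness, the pattern such code generators target: every output point is a fixed-shape combination of its neighbours, and all points are independent. A minimal 1D three-point averaging stencil in plain Python (illustrative; Lift generates OpenCL, not Python):

```python
def stencil_step(u):
    """One 3-point averaging sweep with fixed boundary values."""
    v = u[:]
    for i in range(1, len(u) - 1):   # every i is independent: one GPU thread each
        v[i] = (u[i-1] + u[i] + u[i+1]) / 3.0
    return v

print(stencil_step([0.0, 0.0, 3.0, 0.0, 0.0]))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

The hard part on real hardware is not this loop but the tiling, boundary handling, and local-memory reuse choices, which is exactly what a DSL compiler must decide for each device.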
Jan, 6

GPU Acceleration of a High-Order Discontinuous Galerkin Incompressible Flow Solver

We present a GPU-accelerated version of a high-order discontinuous Galerkin discretization of the unsteady incompressible Navier-Stokes equations. The equations are discretized in time using a semi-implicit scheme with explicit treatment of the nonlinear term and implicit treatment of the split Stokes operators. The pressure system is solved with a conjugate gradient method together with a […]
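As background on the pressure solve mentioned above, a textbook conjugate gradient iteration for a symmetric positive-definite system in pure Python (illustrative only; the paper's solver is preconditioned and runs on the GPU):

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A (lists of lists)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                          # residual b - A x, with x = 0 initially
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))  # close to [1/11, 7/11]
```

On a GPU, the dominant costs are the matrix-vector product and the reductions, both of which map naturally onto device kernels.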
Jan, 6

Rubus: A compiler for seamless and extensible parallelism

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having a single […]
Jan, 6

ThunderSVM: A Fast SVM Library on GPUs and CPUs

Support Vector Machines (SVMs) are classic supervised learning models for classification, regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that 26% of data mining and machine learning practitioners use SVMs. However, SVM training and prediction are computationally very expensive for large and complex problems. This paper presents an […]
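To give a sense of why SVM training is costly, here is a toy Pegasos-style subgradient trainer for a linear SVM. This is background only and nothing like ThunderSVM's actual solver; all parameters and data are illustrative.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Hinge-loss subgradient descent for a linear SVM; labels must be +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:           # hinge loss active: push the margin out
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:                    # only apply regularisation shrinkage
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# Linearly separable toy data.
X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1 for xi in X]
print(preds)  # [1, 1, -1, -1]
```

Kernel SVMs replace the dot products with kernel evaluations over all support vectors, which is the quadratic-cost part that GPU libraries attack.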
Jan, 6

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

We explore scaling of standard distributed Tensorflow with GRPC primitives on up to 512 Intel Xeon Phi (KNL) nodes of the Cori supercomputer with synchronous stochastic gradient descent (SGD), and identify causes of scaling inefficiency at higher node counts. To our knowledge, this is the first exploration of distributed GRPC Tensorflow scalability on an HPC […]
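The synchronous-SGD step being scaled is conceptually simple: every worker's local gradient is averaged before one shared update is applied, and it is this all-to-all gradient exchange that becomes the bottleneck at high node counts. A minimal illustrative sketch (worker gradients are made-up numbers):

```python
def sync_sgd_step(w, worker_grads, lr):
    """One synchronous SGD step: average all workers' gradients, then
    apply a single shared parameter update."""
    n = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n for i in range(len(w))]
    return [wi - lr * gi for wi, gi in zip(w, avg)]

w = [1.0, 1.0]
grads = [[0.2, 0.0], [0.4, 0.2], [0.0, 0.4]]   # three workers' local gradients
print([round(v, 6) for v in sync_sgd_step(w, grads, lr=0.5)])  # [0.9, 0.9]
```

In distributed TensorFlow the averaging happens over the network rather than in a local loop, so communication cost grows with both model size and worker count.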
Jan, 6

Analysing the Performance of GPU Hash Tables for State Space Exploration

In the past few years, General-Purpose Graphics Processing Units (GPUs) have been used to significantly speed up numerous applications. One area in which GPUs have recently led to a significant speed-up is model checking. In model checking, state spaces, i.e., large directed graphs, are explored to verify whether models satisfy desirable properties. GPUexplore […]
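GPU hash tables for state-space exploration typically use flat, array-backed open addressing so that inserts can be done with atomic compare-and-swap. A sequential Python sketch of the linear-probing variant (illustrative; the atomics and the schemes the paper actually benchmarks are omitted):

```python
class LinearProbeTable:
    """Open-addressing hash table with linear probing over a flat slot array."""

    EMPTY = None

    def __init__(self, capacity):
        self.slots = [self.EMPTY] * capacity

    def insert(self, key):
        cap = len(self.slots)
        i = hash(key) % cap
        for _ in range(cap):
            if self.slots[i] == self.EMPTY or self.slots[i] == key:
                self.slots[i] = key     # on a GPU: atomicCAS on this slot
                return True
            i = (i + 1) % cap
        return False                    # table full

    def contains(self, key):
        cap = len(self.slots)
        i = hash(key) % cap
        for _ in range(cap):
            if self.slots[i] == self.EMPTY:
                return False
            if self.slots[i] == key:
                return True
            i = (i + 1) % cap
        return False

t = LinearProbeTable(8)
for state in ["s0", "s1", "s2"]:
    t.insert(state)
print(t.contains("s1"), t.contains("s9"))  # True False
```

In exploration, `insert` returning "already present" is what tells a thread a state has been visited before, so insert and lookup are fused into one operation.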
Dec, 28

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse […]
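As a reminder of what a QRD computes, a classical Gram-Schmidt QR factorization in pure Python. This is illustrative only: hardware implementations typically use Givens rotations or Householder reflections, which are more numerically stable and map better onto FPGAs.

```python
import math

def qr_gram_schmidt(A):
    """Classical Gram-Schmidt QR of A given as a list of column vectors.
    Returns Q (orthonormal columns) and upper-triangular R with A = Q R."""
    n = len(A)
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = A[j][:]
        for i in range(j):
            # project out the components along the already-built columns
            R[i][j] = sum(qk * ak for qk, ak in zip(Q[i], A[j]))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, Q[i])]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        Q.append([vk / R[j][j] for vk in v])
    return Q, R

A = [[1.0, 1.0], [1.0, 0.0]]          # two columns
Q, R = qr_gram_schmidt(A)
print([[round(x, 4) for x in row] for row in R])  # [[1.4142, 0.7071], [0.0, 0.7071]]
```

The projection loops are dense dot products, which is why QR-type factorizations are attractive targets for reconfigurable accelerators.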

* * *


HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors
