high performance computing on graphics processing units: hgpu.org

Posts

Dec, 10

Distributed learning of CNNs on heterogeneous CPU/GPU architectures

Convolutional Neural Networks (CNNs) have been shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs, leading to longer training times that not […]

CUDA • OpenCL

Dec, 7

Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time-consuming part of the numerical simulations of lattice QCD is solving a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather […]

Dec, 7

A tutorial on the implementations of linear image filters in CPU and GPU

This article presents an overview of the implementation of linear image filters in CPU and GPU. The main goal is to present a self-contained discussion of different implementations and their background using tools from digital signal processing. First, using signal processing tools, we discuss different algorithms and estimate their computational cost. Then, we discuss […]

CUDA

Dec, 7

A programming framework for data streaming on the Xeon Phi

ALICE (A Large Ion Collider Experiment) is the dedicated heavy-ion detector studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). After the second long shut-down of the LHC, the ALICE detector will be upgraded to cope with an interaction rate of 50 kHz in Pb-Pb collisions, […]

Dec, 7

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on […]

CUDA

Dec, 7

Study of Bandwidth Partitioning for Co-executing GPU Kernels

Co-executing GPU kernels on a partitioned GPU has been shown to improve utilization efficiency of poorly scaling tasks. While kernels can be executed in parallel, data transfers to the GPU are serial, which can negatively impact parallelism and predictability of the kernels. In this work we implement a fairness-based approach to memory transfers by chunking data […]

CUDA

Dec, 6

5th World Machine Learning and Deep Learning Congress, 2018

The 5th World Machine Learning and Deep Learning Congress welcomes you to the Machine Learning 2018 conference, to be held in Dubai, UAE, on August 30-31, 2018, which brings together brief keynote presentations, speaker talks, exhibitions, symposiums, and workshops. Machine Learning 2018 is the Congress which will be most visited by all the most innovative minds, practitioners, experts, […]

Dec, 3

HCudaBLAST: an implementation of BLAST on Hadoop and Cuda

DNA sequencing has been a difficult field since it was first worked upon, and it is also growing at an exponential rate. The amount of data involved in DNA searching is huge, so conventional tools and algorithms are not suited to handling this degree of data processing. BLAST is a […]

CUDA

Dec, 3

Methods for GPU Acceleration of Big Data Applications

Big Data applications are trivially parallelizable because they typically consist of simple and straightforward operations performed on a large number of independent input records. GPUs appear to be particularly well suited for this class of applications given their high degree of parallelism and high memory bandwidth. However, a number of issues severely complicate matters when […]

CUDA

Dec, 3

Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

Manycore processors are becoming established in the HPC community as a way of improving performance while preserving power efficiency. Knights Landing is the recently released second generation of the Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first-generation Xeon Phis has been widely studied in recent years, the new features in Knights Landing processors require […]

Dec, 3

A Hybrid-parallel Architecture for Applications in Bioinformatics

Since the advent of Next Generation Sequencing (NGS) technology, the amount of data from whole genome sequencing has been rising fast. In turn, the availability of these resources led to the tapping of whole new research fields in molecular and cellular biology, producing even more data. On the other hand, the available computational power is […]

CUDA

Dec, 3

STAR-RT: Visual attention for real-time video game playing

In this paper we present STAR-RT, the first working prototype of the Selective Tuning Attention Reference (STAR) model and Cognitive Programs (CPs). The Selective Tuning (ST) model received substantial support through psychological and neurophysiological experiments. The STAR framework expands ST and applies it to practical visual tasks. In order to do so, similarly to many […]

OpenCL • OpenGL