
Posts

Nov, 25

PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

In this paper we present a novel method for the correction of motion artifacts that are present in fetal Magnetic Resonance Imaging (MRI) scans of the whole uterus. Contrary to current slice-to-volume registration (SVR) methods, requiring an inflexible anatomical enclosure of a single investigated organ, the proposed patch-to-volume reconstruction (PVR) approach is able to reconstruct […]
Nov, 25

Efficient Kernel Synthesis for Performance Portable Programming

The diversity of microarchitecture designs in heterogeneous computing systems allows programs to achieve high performance and energy efficiency, but results in substantial software re-development cost for each type or generation of hardware. To mitigate this cost, a performance portable programming system is required. One fundamental difference between architectures that makes performance portability challenging is the […]
Nov, 25

dMath: Distributed Linear Algebra for DL

The paper presents a parallel math library, dMath, that demonstrates leading scaling when using intranode, internode, and hybrid-parallelism for deep learning (DL). dMath provides easy-to-use distributed primitives and a variety of domain-specific algorithms including matrix multiplication, convolutions, and others allowing for rapid development of scalable applications like deep neural networks (DNNs). Persistent data stored in […]
Nov, 23

Performance Analysis of CUDA and OpenCL By Implementation of Cryptographic Algorithms

This paper presents a performance analysis of CUDA and OpenCL. Three different cryptographic algorithms, i.e. DES, MD5, and SHA-1, have been selected as the benchmarks for extensive analysis of the performance gaps between the two. Our results show that, in the average scenario, CUDA performs 27% better than OpenCL while in the best case scenario […]
Nov, 23

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high efficiency GPU programming remains difficult: it suffers from high complexity, […]
Nov, 23

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) have become a de facto standard for image classification and segmentation problems. These networks have also had early success in the video domain, despite failing to capture motion continuity and other rich temporal correlations. Evidence has since emerged that extending ConvNets to three dimensions leads to state-of-the-art performance across a broad […]
Nov, 23

GA3C: GPU-based A3C for Deep Reinforcement Learning

We introduce and analyze the computational aspects of a hybrid CPU/GPU implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. Our analysis concentrates on the critical aspects to leverage the GPU’s computational power, including the introduction of a system of queues and a dynamic scheduling […]
Nov, 23

Optimization and Evaluation of VLPL-S Particle-in-cell Code on Knights Landing

The VLPL-S code is developed based on the particle-in-cell (PIC) algorithm, which is the mainstream algorithm in plasma behavior research. In this paper, we report our early experience porting and optimizing the VLPL-S particle-in-cell code on Knights Landing. By applying general optimization methods such as memory access optimization, thread level parallelism and vectorization to […]
Nov, 22

High performance pattern matching and data remanence on graphics processing units

Pattern matching is an important task in a plethora of different fields ranging from computer science to medical applications, but it is also a resource-consuming problem. With the increase in network link speed and the tremendous amounts of data generated, serial pattern matching on the Central Processing Unit (CPU) is close to being rendered obsolete. The ubiquitous […]
Nov, 22

Processing OLTP Workloads on Hybrid CPU/GPU Systems

In recent times there has been a plethora of research on the utilization of co-processors such as GPUs and FPGAs in database management systems (DBMS). The reason for this trend is that modern processors have reached a performance threshold. Two major factors that have led to this behaviour are the Memory Wall and the Power Wall. This […]
Nov, 22

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

With recent advances in the Internet of Things (IoT), it has become very attractive to implement deep convolutional neural networks (DCNNs) on embedded/portable systems. Presently, executing software-based DCNNs requires high-performance server clusters in practice, restricting their widespread deployment on mobile devices. To overcome this issue, considerable research efforts have been conducted in the context […]
Nov, 22

GPU-accelerated Red Blood Cells Simulations with Transport Dissipative Particle Dynamics

Mesoscopic numerical simulations provide a unique approach for the quantification of the chemical influences on red blood cell functionalities. The transport Dissipative Particle Dynamics (tDPD) method can lead to such effective multiscale simulations due to its ability to simultaneously capture mesoscopic advection, diffusion, and reaction. In this paper, we present a GPU-accelerated red blood cell […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org