Deepak Majeti
With the end of Dennard scaling and the emergence of dark silicon, heterogeneous architectures are widely expected to deliver both application performance and energy efficiency. However, the diversity of these architectures poses severe programming challenges in data layout, memory coherence, task partitioning, data distribution, and sharing of virtual addresses. Existing high-level programming languages […]
Rajesh Gandham
This thesis presents high-order numerical methods for time-dependent simulations of oceanic wave propagation on modern many-core hardware architectures. Simulating waves such as tsunamis is challenging because of varying fluid depths, propagation across many regions, the need for high resolution near the shore, complex nonlinear wave phenomena, and the need for faster-than-real-time predictions. […]
Rafael Asenjo, Angeles Navarro, Andres Rodriguez, Jose Nunez-Yanez
In this paper we evaluate the performance and energy effectiveness of FPGA and CPU devices for a class of parallel computing applications in which the workload can be distributed in a way that enables simultaneous computing in addition to simple offloading. The FPGA device is programmed via OpenCL using the recent availability of commercial […]
Anton Lokhmotov, Grigori Fursin
Designing faster, more energy-efficient, and more reliable computer systems requires effective collaboration between hardware designers, system programmers, and performance analysts, as well as feedback from system users. We present Collective Knowledge (CK), an open framework for reproducible and collaborative design and optimization. CK enables systematic and reproducible experimentation, combined with leading-edge predictive analytics to […]
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training, allowing all four methods to successfully train neural network controllers. […]
Gang Mei, Hong Tian
This paper focuses on evaluating the impact of different data layouts on the computational efficiency of the GPU-accelerated Inverse Distance Weighting (IDW) interpolation algorithm. First, we redesign and improve our previous GPU implementation, which exploited CUDA dynamic parallelism (CDP). Then we implement three GPU versions, i.e., the naive […]
Milan Ceska, Petr Pilar, Nicola Paoletti, Lubos Brim, Marta Kwiatkowska
In this paper we present PRISM-PSY, a novel tool that performs precise GPU-accelerated parameter synthesis for continuous-time Markov chains and time-bounded temporal logic specifications. We redesign, in terms of matrix-vector operations, the recently formulated algorithms for precise parameter synthesis in order to enable effective data-parallel processing, which results in significant acceleration on many-core architectures. High […]
Jinhwan Park, Wonyong Sung
Deep neural networks (DNNs) demand a very large amount of computation and weight storage, and thus efficient implementation using special-purpose hardware is highly desired. In this work, we have developed an FPGA-based fixed-point DNN system that uses only on-chip memory, so that it does not need to access external DRAM. The execution time and energy consumption of the developed […]
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware can help with the computation, fetching the weights from DRAM can be as much as two orders of magnitude […]
Zeke Wang, Bingsheng He, Wei Zhang, Shunning Jiang
Recently, FPGA vendors such as Altera and Xilinx have released OpenCL SDKs for programming FPGAs. However, the architecture of FPGAs is significantly different from that of CPUs and GPUs, for which OpenCL was originally designed. Tuning OpenCL code for good performance on FPGAs is still an open problem, since the existing OpenCL tools and models designed […]
Patrick O. Glauner
Inspired by recent successes of deep learning in computer vision, we propose a novel application of deep convolutional neural networks to facial expression recognition, in particular smile recognition. A smile recognition test accuracy of 99.45% is achieved for the Denver Intensity of Spontaneous Facial Action (DISFA) database, significantly outperforming existing approaches based on hand-crafted features […]
Ashwin Trikuta Srinath
Compact finite difference schemes are widely used in the direct numerical simulation of fluid flows for their ability to better resolve the small scales of turbulence. However, they can be expensive to evaluate and difficult to parallelize. In this work, we present an approach for the computation of compact finite differences and similar tridiagonal schemes […]
HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors
