
Posts

May, 24

Espresso: Efficient Forward Propagation for BCNNs

There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented […]
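Binary networks save compute and memory because, once weights and activations are restricted to +1/-1, their signs can be packed into machine words and a dot product reduces to an XNOR followed by a population count. The sketch below illustrates that general technique under the assumed encoding +1 → bit 1, -1 → bit 0; the helper names `pack_signs` and `binary_dot` are hypothetical, and this is not Espresso's actual kernel code.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 where values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary networks use only +1/-1 values")
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, w_bits, n):
    """Dot product of two packed length-n +1/-1 vectors using XNOR and popcount."""
    mask = (1 << n) - 1
    # XNOR sets a bit wherever the two signs agree (their product is +1).
    agree = ~(a_bits ^ w_bits) & mask
    matches = bin(agree).count("1")
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]
print(binary_dot(pack_signs(a), pack_signs(w), len(a)))  # 2, same as sum(x * y for x, y in zip(a, w))
```

On real hardware the packed words would be 32- or 64-bit lanes and the popcount a single instruction; Python's unbounded integers stand in for them here.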
May, 22

GPU System Call

GPUs are becoming first-class compute citizens and are being tasked to perform increasingly complex work. Modern GPUs increasingly support programmability-enhancing features such as shared virtual memory and hardware cache coherence, enabling them to run a wider variety of programs. But a key aspect of general-purpose programming where GPUs are still found lacking is the ability […]
May, 22

GPUMap: A Transparently GPU-Accelerated Map Function

As GPGPU computing becomes more popular, it will be used to tackle a wider range of problems. However, due to the current state of GPGPU programming, programmers are typically required to be familiar with the architecture of the GPU in order to effectively program it. Fortunately, there are software packages that attempt to simplify GPGPU […]
May, 22

MEDINA: MECCA Development in Accelerators – KPP Fortran to CUDA source-to-source Preprocessor

The global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC) is a modular model that simulates climate change and air quality scenarios. The application includes different sub-models for calculating chemical species concentrations, their interaction with land and sea, and human interaction. The paper presents a source-to-source parser that enables support for Graphics Processing […]
May, 22

Inferring the Scheduling Policies of an Embedded CUDA GPU

Embedded systems augmented with graphics processing units (GPUs) are seeing increased use in safety-critical real-time systems such as autonomous vehicles. Due to monetary cost requirements along with size, weight, and power (SWaP) constraints, embedded GPUs are often computationally impoverished compared to those used in non-embedded systems. In order to maximize performance on these impoverished GPUs, […]
May, 22

Design of Hardware Accelerator for Lempel-Ziv 4 (LZ4) Compression

Hardware accelerators are increasingly considered important architectural components in the context of datacenter customization, where they help achieve high performance and low power. Compression has played an important role in computer systems by enhancing storage and communication efficiency at the cost of extra computation. In this letter, we present a fully pipelined compression accelerator for […]
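The core work of an LZ4 compressor, and therefore what a compression pipeline has to sustain, is finding earlier occurrences of 4-byte sequences and extending them into matches described by an offset and a length. The sketch below shows that LZ77-style loop in software under simplifying assumptions: it uses a Python dict keyed on the exact 4 bytes in place of LZ4's fixed-size hash table, emits Python tuples rather than the LZ4 block format, and the names `compress` and `decompress` are hypothetical.

```python
MIN_MATCH = 4  # LZ4 uses a minimum match length of 4 bytes


def compress(data: bytes):
    """Greedy LZ77-style tokenizer keyed on 4-byte sequences, in the spirit of LZ4.

    Returns a list of (literals, offset, match_length) tuples; a final token with
    offset 0 and match_length 0 carries any trailing literals.
    """
    table = {}   # 4-byte sequence -> most recent position where it started
    tokens = []
    anchor = 0   # start of literals not yet emitted
    i = 0
    while i + MIN_MATCH <= len(data):
        key = data[i:i + MIN_MATCH]
        candidate = table.get(key)
        table[key] = i
        if candidate is not None and i - candidate <= 0xFFFF:  # LZ4 offsets are 16-bit
            # Extend the match forward for as long as the bytes agree.
            length = MIN_MATCH
            while i + length < len(data) and data[candidate + length] == data[i + length]:
                length += 1
            tokens.append((data[anchor:i], i - candidate, length))
            i += length
            anchor = i
        else:
            i += 1
    if anchor < len(data):
        tokens.append((data[anchor:], 0, 0))
    return tokens


def decompress(tokens):
    out = bytearray()
    for literals, offset, length in tokens:
        out += literals
        start = len(out) - offset
        for k in range(length):
            # Copy byte by byte so overlapping matches (offset < length) work.
            out.append(out[start + k])
    return bytes(out)


print(compress(b"abcabcabcabcabcabcxyz"))  # [(b'abc', 3, 15), (b'xyz', 0, 0)]
```

The example shows why LZ4-style matches may overlap their own source: the 15-byte match at offset 3 is decoded by repeatedly copying bytes that were themselves just produced.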
May, 18

CLBlast: A Tuned OpenCL BLAS Library

This work demonstrates how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, […]
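GEMM is the BLAS operation C ← alpha·A·B + beta·C. Fast GPU implementations typically split the matrices into tiles so that each tile of A and B is loaded into fast local memory once and reused many times. The pure-Python sketch below only illustrates that loop structure; `gemm_tiled` and its `tile` parameter are hypothetical and do not correspond to CLBlast's API or tuning parameters, and transposes are omitted.

```python
def gemm_tiled(alpha, A, B, beta, C, tile=2):
    """Compute C = alpha * A @ B + beta * C in place, iterating over square tiles.

    A is m x k, B is k x n and C is m x n, all given as lists of lists of floats.
    """
    m, k, n = len(A), len(B), len(C[0])
    assert all(len(row) == k for row in A), "A must have as many columns as B has rows"
    # Apply beta once up front so every tile can simply accumulate into C.
    # As in BLAS, beta == 0 means C's input values are ignored (even NaNs).
    for i in range(m):
        for j in range(n):
            C[i][j] = 0.0 if beta == 0 else beta * C[i][j]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # One tile of A times one tile of B; on a GPU these tiles would be
                # staged in local/shared memory and reused by a work-group.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] += alpha * acc
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
print(gemm_tiled(1.0, A, B, 2.0, C))  # [[60.0, 66.0], [141.0, 156.0]]
```

The tile size changes only the order in which partial products are accumulated, not the result; choosing it per device is the kind of decision a tuned library automates.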
May, 18

Fast GPU-Based Seismogram Simulation from Microseismic Events in Marine Environments Using Heterogeneous Velocity Models

A novel approach is presented for fast generation of synthetic seismograms due to microseismic events, using heterogeneous marine velocity models. The partial differential equations (PDEs) for the 3D elastic wave equation have been numerically solved using the Fourier-domain pseudo-spectral method, which is parallelizable on graphics processing unit (GPU) cards, thus making it faster […]
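In a Fourier-domain pseudo-spectral method, spatial derivatives are computed by transforming a field to the Fourier domain, multiplying each coefficient by i·k (k being its wavenumber), and transforming back. The sketch below shows that step for a 1D periodic grid; it assumes periodic boundaries, uses a naive O(N²) DFT where a real solver would use an FFT, and the names `dft` and `spectral_derivative` are hypothetical. A 3D elastic solver would apply the same operation along each axis.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; a real solver would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * math.pi * k * j / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out


def spectral_derivative(f, length):
    """d/dx of samples f taken on a periodic grid of the given physical length."""
    n = len(f)
    F = dft(f)
    for k in range(n):
        if 2 * k == n:
            m = 0  # Nyquist mode (even n only): its derivative is conventionally zeroed
        elif k < n / 2:
            m = k  # non-negative wavenumbers
        else:
            m = k - n  # upper half of the spectrum holds negative wavenumbers
        F[k] *= 2j * math.pi * m / length  # multiply by i*k_m with k_m = 2*pi*m/L
    return [v.real for v in dft(F, inverse=True)]


n = 16
xs = [2 * math.pi * j / n for j in range(n)]
d = spectral_derivative([math.sin(3 * x) for x in xs], 2 * math.pi)
print(max(abs(v - 3 * math.cos(3 * x)) for v, x in zip(d, xs)))  # close to machine precision
```

Every Fourier coefficient is scaled independently, which is one reason this formulation parallelizes well.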
May, 18

Group Marching Tree: Sampling-Based Approximately Optimal Motion Planning on GPUs

This paper presents a novel approach, named the Group Marching Tree (GMT*) algorithm, to planning on GPUs at rates amenable to application within control loops, allowing planning in real-world settings via repeated computation of near-optimal plans. GMT*, like the Fast Marching Tree (FMT) algorithm, explores the state space with a "lazy" dynamic programming recursion on […]
May, 18

Efficient Parallel Methods for Deep Reinforcement Learning

We propose a novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine. The framework is algorithm-agnostic and can be applied to on-policy, off-policy, value-based and policy-gradient-based algorithms. Given its inherent parallelism, the framework can be efficiently implemented on […]
May, 18

Real-Time Adaptive Image Compression

We present a machine learning-based approach to lossy image compression which outperforms all existing codecs, while running in real-time. Our algorithm typically produces files 2.5 times smaller than JPEG and JPEG 2000, 2 times smaller than WebP, and 1.7 times smaller than BPG on datasets of generic images across all quality levels. At the same […]
May, 11

Block-Parallel IDA* for GPUs

We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We show that straightforward thread-based parallelization techniques which were previously proposed for massively parallel SIMD processors perform poorly due to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to […]
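For reference, serial IDA* repeats a depth-first search bounded by f = g + h, raising the bound after each pass to the smallest f value that exceeded it. The sketch below shows only that serial baseline on a small grid; it does not model BPIDA*'s assignment of subtrees to thread blocks, and `ida_star`, `GRID` and `grid_neighbors` are hypothetical names chosen for illustration.

```python
import math


def ida_star(start, goal, neighbors, h):
    """Iterative-Deepening A*: repeated depth-first searches bounded by f = g + h.

    neighbors(node) yields (next_node, step_cost); h must be admissible.
    Returns (path, cost), or (None, math.inf) if the goal is unreachable.
    """
    path = [start]

    def search(g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return False, f  # report the f value that exceeded the bound
        if node == goal:
            return True, g
        minimum = math.inf
        for nxt, cost in neighbors(node):
            if nxt in path:  # avoid cycles along the current path
                continue
            path.append(nxt)
            found, value = search(g + cost, bound)
            if found:
                return True, value
            minimum = min(minimum, value)
            path.pop()
        return False, minimum

    bound = h(start)
    while True:
        found, value = search(0, bound)
        if found:
            return path, value
        if value == math.inf:
            return None, math.inf
        bound = value  # next iteration: smallest f that exceeded the old bound


GRID = ["....",
        ".#..",
        ".#.."]


def grid_neighbors(cell):
    """4-connected moves with unit cost; '#' cells are walls."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc), 1


goal = (2, 2)
path, cost = ida_star((2, 0), goal, grid_neighbors,
                      lambda cell: abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))
print(cost)  # 6
```

With the Manhattan heuristic the bound grows 2, 4, 6 here, and each pass re-expands the shallow part of the tree; every iteration is an independent bounded depth-first search, which is the work a GPU version has to distribute across threads.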

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors
