
Posts

May, 18

Group Marching Tree: Sampling-Based Approximately Optimal Motion Planning on GPUs

This paper presents a novel approach, named the Group Marching Tree (GMT*) algorithm, to planning on GPUs at rates amenable to application within control loops, allowing planning in real-world settings via repeated computation of near-optimal plans. GMT*, like the Fast Marching Tree (FMT) algorithm, explores the state space with a "lazy" dynamic programming recursion on […]
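A minimal sketch of the group-expansion idea behind GMT*, under strong simplifying assumptions: uniformly sampled points, a fixed connection radius, obstacle-free straight-line edges (so no collision checking), and an illustrative slack factor. Unlike FMT*, which expands one lowest-cost open sample per iteration, the whole group of open samples within a slack of the minimum cost-to-come is expanded at once, and that group is what GMT* can process in parallel on the GPU; the real algorithm's lazy collision checking and cost updates are omitted here.

```python
# Simplified "group marching" expansion: per iteration, connect unvisited
# samples to the group of open samples whose cost-to-come is within a slack
# of the current minimum. Collision checking is omitted (assumption).
import math, random

def group_marching_tree(samples, start, goal, radius, slack=1.0):
    cost = {start: 0.0}
    parent = {start: None}
    open_set = {start}
    unvisited = set(samples) - {start}
    while open_set:
        # Expand every open sample whose cost is close to the current minimum.
        threshold = min(cost[v] for v in open_set) + slack * radius
        group = {v for v in open_set if cost[v] <= threshold}
        new_open = set()
        for x in unvisited:
            near = [v for v in group if math.dist(v, x) <= radius]
            if near:
                best = min(near, key=lambda v: cost[v] + math.dist(v, x))
                cost[x] = cost[best] + math.dist(best, x)
                parent[x] = best
                new_open.add(x)
        unvisited -= new_open
        open_set = (open_set - group) | new_open
        if goal in cost:
            break
    return cost, parent

# Toy usage on random samples in the unit square.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(300)]
start, goal = (0.0, 0.0), (1.0, 1.0)
cost, parent = group_marching_tree(points + [goal], start, goal, radius=0.2)
print("goal reached:", goal in cost, "cost-to-come:", cost.get(goal))
```
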
May, 18

Efficient Parallel Methods for Deep Reinforcement Learning

We propose a novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine. The framework is algorithm agnostic and can be applied to on-policy, off-policy, value-based, and policy-gradient-based algorithms. Given its inherent parallelism, the framework can be efficiently implemented on […]
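A minimal sketch of the single-machine actor/learner split such a framework builds on: several actor processes generate experience in parallel and push it onto a shared queue, while one learner consumes it. The random-walk environment, the stand-in policy, and the running-mean "update" below are placeholders for illustration, not the paper's algorithms; only the process-and-queue structure is the point.

```python
# Actor processes produce transitions; a single learner consumes them from a
# shared queue. Environment, policy and update rule are stand-ins (assumptions).
import multiprocessing as mp
import random

def actor(actor_id, queue, steps):
    state = 0.0
    for _ in range(steps):
        action = random.choice([-1.0, 1.0])        # stand-in policy
        next_state = state + 0.1 * action
        reward = -abs(next_state)                  # stand-in reward
        queue.put((actor_id, state, action, reward, next_state))
        state = next_state
    queue.put(None)                                # signal this actor is done

def learner(queue, num_actors):
    finished, value_estimate, n = 0, 0.0, 0
    while finished < num_actors:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        _, _, _, reward, _ = item
        n += 1
        value_estimate += (reward - value_estimate) / n   # running-mean "update"
    print(f"processed {n} transitions, mean reward {value_estimate:.3f}")

if __name__ == "__main__":
    num_actors, steps = 4, 1000
    q = mp.Queue()
    workers = [mp.Process(target=actor, args=(i, q, steps)) for i in range(num_actors)]
    for w in workers:
        w.start()
    learner(q, num_actors)
    for w in workers:
        w.join()
```
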
May, 18

Real-Time Adaptive Image Compression

We present a machine learning-based approach to lossy image compression which outperforms all existing codecs, while running in real-time. Our algorithm typically produces files 2.5 times smaller than JPEG and JPEG 2000, 2 times smaller than WebP, and 1.7 times smaller than BPG on datasets of generic images across all quality levels. At the same […]
May, 11

Block-Parallel IDA* for GPUs

We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We show that straightforward thread-based parallelization techniques which were previously proposed for massively parallel SIMD processors perform poorly due to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to […]
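At its core, IDA* repeats a cost-bounded depth-first search with an increasing bound; in BPIDA* each thread block runs such a bounded search over the subtree rooted at one frontier node. Below is a plain sequential Python sketch of that cost-bounded search on a toy grid with a Manhattan heuristic; the grid, unit edge costs, and heuristic are illustrative choices, and the paper's per-block work assignment is only indicated in comments.

```python
# Sequential IDA*: iteratively deepened, cost-bounded depth-first search.
# In BPIDA*, each GPU thread block runs this bounded search on one subtree.
def ida_star(start, goal, neighbors, h):
    def search(path, g, bound):
        node = path[-1]
        f = g + h(node, goal)
        if f > bound:
            return f, None                  # prune; report next candidate bound
        if node == goal:
            return f, list(path)
        minimum = float("inf")
        for nxt, cost in neighbors(node):   # in BPIDA*, subtrees below the
            if nxt in path:                 # frontier are farmed out per block
                continue
            path.append(nxt)
            t, found = search(path, g + cost, bound)
            if found is not None:
                return t, found
            minimum = min(minimum, t)
            path.pop()
        return minimum, None

    bound = h(start, goal)
    while True:                             # iterative deepening on the f-bound
        t, found = search([start], 0.0, bound)
        if found is not None:
            return found
        if t == float("inf"):
            return None
        bound = t

# Toy usage: a 5x5 grid with a wall forcing a detour.
blocked = {(2, y) for y in range(4)}
def grid_neighbors(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (x + dx, y + dy)
        if 0 <= q[0] < 5 and 0 <= q[1] < 5 and q not in blocked:
            yield q, 1.0
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

print(ida_star((0, 0), (4, 0), grid_neighbors, manhattan))
```
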
May, 11

Accelerating solutions of PDEs with GPU-based swept time-space decomposition

The expedient design of precision components in aerospace and other high-tech industries requires simulations of physical phenomena often described by partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution difficult to achieve in reasonable amounts of time—even in effectively parallelized solvers. Though the scale of the problem […]
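For context on what such solvers spend their time doing, here is a minimal explicit finite-difference update for the 1-D heat equation: each new interior value needs its two neighbours from the previous time step, and that per-step neighbour exchange is the synchronization that swept time-space decomposition seeks to reduce. The grid size, diffusivity, and time step below are arbitrary illustrative choices, not from the paper.

```python
# Explicit (FTCS) stencil update for the 1-D heat equation; parameters are
# illustrative only. Each step depends on the previous step's neighbours.
import numpy as np

nx, nt, alpha, dx, dt = 101, 500, 1.0, 0.01, 0.00004
u = np.zeros(nx)
u[nx // 2] = 1.0                          # initial heat spike in the middle

r = alpha * dt / dx**2                    # stability requires r <= 0.5
assert r <= 0.5
for _ in range(nt):
    # On a GPU or cluster, this neighbour exchange is the per-step
    # synchronization point between subdomains.
    u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])

print("peak temperature after", nt, "steps:", u.max())
```
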
May, 11

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

In recent years deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been proposed for convolutional neural networks (CNNs) that enable high performance for classification tasks at lower power than CPU and GPU processors. However, […]
May, 11

DeepMetabolism: A Deep Learning System to Predict Phenotype from Genome Sequencing

Life science is entering a new era of petabyte-level sequencing data. Converting such big data to biological insights represents a huge challenge for computational analysis. To this end, we developed DeepMetabolism, a biology-guided deep learning system to predict cell phenotypes from transcriptomics data. By integrating unsupervised pre-training with supervised training, DeepMetabolism is able to predict […]
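A hedged sketch of the two-stage training pattern the abstract describes: an unsupervised stage learns a compressed representation of the expression data, and a supervised stage maps that representation to phenotypes. Below, a truncated SVD stands in for the unsupervised pre-training and ordinary least squares for the supervised predictor, on synthetic data; none of this is DeepMetabolism's biology-guided architecture, it only illustrates pre-train-then-supervise.

```python
# Two-stage pattern on synthetic data: unsupervised representation learning
# (SVD as a stand-in for autoencoder pre-training), then supervised prediction.
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_latent = 200, 50, 5

# Synthetic "transcriptomics" matrix with low-dimensional structure, plus a
# phenotype that depends on the hidden factors (all illustrative assumptions).
hidden = rng.normal(size=(n_cells, n_latent))
expression = hidden @ rng.normal(size=(n_latent, n_genes)) \
    + 0.1 * rng.normal(size=(n_cells, n_genes))
phenotype = hidden @ rng.normal(size=n_latent) + 0.05 * rng.normal(size=n_cells)

# Stage 1 (unsupervised "pre-training"): learn a low-dimensional encoding.
mean = expression.mean(axis=0)
_, _, vt = np.linalg.svd(expression - mean, full_matrices=False)
encode = lambda X: (X - mean) @ vt[:n_latent].T

# Stage 2 (supervised training): fit a predictor on the encoded representation.
Z = encode(expression)
coef, *_ = np.linalg.lstsq(np.c_[Z, np.ones(n_cells)], phenotype, rcond=None)
pred = np.c_[Z, np.ones(n_cells)] @ coef
print("train R^2:", 1 - np.var(phenotype - pred) / np.var(phenotype))
```
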
May, 11

Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays

FPGA vendors have recently started focusing on OpenCL for FPGAs because of its ability to leverage the parallelism inherent to heterogeneous computing platforms. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive compilation times (specifically the […]
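The run-time compilation the abstract refers to is the standard OpenCL host-side flow: kernel source is handed to the runtime and built for whatever device is present at execution time. The pyopencl sketch below shows only that generic flow (and times the build step), assuming an OpenCL platform is installed; it does not model FPGA overlays or the paper's resource-aware JIT compiler.

```python
# Generic OpenCL run-time kernel compilation and launch via pyopencl.
# Requires an installed OpenCL platform (assumption); the kernel is a toy vadd.
import time
import numpy as np
import pyopencl as cl

kernel_src = """
__kernel void vadd(__global const float *a, __global const float *b, __global float *c) {
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

t0 = time.perf_counter()
prg = cl.Program(ctx, kernel_src).build()      # run-time (JIT) compilation step
print(f"kernel built in {time.perf_counter() - t0:.3f} s")

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg.vadd(queue, a.shape, None, a_buf, b_buf, c_buf)
result = np.empty_like(a)
cl.enqueue_copy(queue, result, c_buf)
print("max error:", np.max(np.abs(result - (a + b))))
```
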
May, 9

Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing

The proliferation of diverse heterogeneous computing platforms, combined with the plethora of programming languages and optimization techniques for each language on each underlying architecture, hampers widespread adoption of such platforms. This is especially true for novice programmers and less technically savvy users, who are largely precluded from enjoying the advantages of high-performance […]
May, 9

Efficient Parallel Strategy Improvement for Parity Games

We study strategy improvement algorithms for solving parity games. While these algorithms are known to solve parity games using a very small number of iterations, experimental studies have found that a high step complexity causes them to perform poorly in practice. In this paper we seek to address this situation. Every iteration of the algorithm […]
May, 9

Fast Sorting Algorithms using AVX-512 on Intel Knights Landing

This paper describes fast sorting techniques using the recent AVX-512 instruction set. Our implementations benefit from the latest possibilities offered by AVX-512 to vectorize a two-part hybrid algorithm: we sort small arrays using a branch-free Bitonic variant, and we provide a vectorized partitioning kernel which is the main component of the well-known Quicksort. […]
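The hybrid structure described above (a partition-based Quicksort that hands small sub-arrays to a Bitonic sorting network) can be sketched without any vector intrinsics. The scalar Python version below keeps that structure, Hoare-style partitioning plus a textbook Bitonic network padded to a power of two, purely as an illustration; the 16-element threshold and the padding scheme are assumptions, and the paper's speed comes from AVX-512 vectorization of both kernels, which is not reproduced here.

```python
# Scalar illustration of the hybrid: Quicksort partitioning down to a small
# threshold, then a Bitonic sorting network on the small sub-array.
import random

SMALL = 16   # switch-over threshold (assumption, not the paper's value)

def sort_small(a, lo, hi):
    """Sort a[lo:hi] with a textbook Bitonic network, padding to a power of two."""
    chunk = a[lo:hi]
    n = 1
    while n < len(chunk):
        n *= 2
    chunk += [float("inf")] * (n - len(chunk))
    k = 2
    while k <= n:                                 # network stages
        j = k // 2
        while j > 0:                              # compare-exchange distance
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and chunk[i] > chunk[partner]) or \
                       (not ascending and chunk[i] < chunk[partner]):
                        chunk[i], chunk[partner] = chunk[partner], chunk[i]
            j //= 2
        k *= 2
    a[lo:hi] = chunk[:hi - lo]

def partition(a, lo, hi):
    """Hoare-style partition of a[lo:hi]; returns the split point."""
    pivot = a[(lo + hi - 1) // 2]
    i, j = lo - 1, hi
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j + 1
        a[i], a[j] = a[j], a[i]

def hybrid_sort(a, lo=0, hi=None):
    """Quicksort that hands small sub-arrays to the sorting network."""
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return
    if hi - lo <= SMALL:
        sort_small(a, lo, hi)
        return
    p = partition(a, lo, hi)
    hybrid_sort(a, lo, p)
    hybrid_sort(a, p, hi)

data = [random.randint(0, 999) for _ in range(10000)]
hybrid_sort(data)
print("sorted correctly:", data == sorted(data))
```
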
May, 9

Acceleration of Deep Learning on FPGA

In recent years, deep convolutional neural networks (ConvNets) have become popular in various real-world applications. To provide more accurate results, state-of-the-art ConvNets require millions of parameters and billions of operations to process a single image, which represents a computational challenge for general-purpose processors. As a result, hardware accelerators such as Graphic […]
