Posts

Nov, 24

Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers […]
Nov, 20

Recurrent Neural Networks Hardware Implementation on FPGA

Recurrent Neural Networks (RNNs) have the ability to retain memory and learn data sequences, and are a recent breakthrough in machine learning. Due to the recurrent nature of RNNs, it is sometimes hard to parallelize all their computations on conventional hardware. CPUs do not currently offer large parallelism, while GPUs offer limited parallelism due to […]
Nov, 20

Supervised Hashing with Deep Neural Networks

In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively "shallow" networks limited by the issues arising in back propagation (vanishing gradients) as well as computational efficiency. We propose a novel and efficient training algorithm inspired by alternating direction method of […]
Nov, 20

Large Scale Artificial Neural Network Training Using Multi-GPUs

This paper describes a method for accelerating large-scale Artificial Neural Network (ANN) training on multiple GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix multiplication and integrate the algorithm with the ANN training. The experiments demonstrate that our matrix multiplication algorithm achieves linear speedup on multiple inhomogeneous […]
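As a rough illustration of what "reducing the forward and backward passes to matrix multiplication" can mean, the sketch below treats a single fully connected linear layer, with no activation function, as three matrix products. It is not the paper's out-of-core multi-GPU algorithm. The helper names (`matmul`, `transpose`, `forward`, `backward`) and the sample values are assumptions made for this example only.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows (plain Python, no GPU)."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for row in a
    ]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def forward(x, w):
    """Forward pass of a linear layer: Y = X W (x is batch x n_in, w is n_in x n_out)."""
    return matmul(x, w)


def backward(x, w, grad_y):
    """Backward pass of the same layer, also expressed as matrix products.

    grad_w = X^T dY  (gradient with respect to the weights)
    grad_x = dY W^T  (gradient passed to the previous layer)
    """
    grad_w = matmul(transpose(x), grad_y)
    grad_x = matmul(grad_y, transpose(w))
    return grad_x, grad_w


# Hypothetical sample data: a batch of 2 inputs, 2 input units, 3 output units.
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[0.5, -1.0, 0.0], [1.5, 2.0, 1.0]]
y = forward(x, w)
grad_x, grad_w = backward(x, w, [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
```

Because every step is a matrix product, a faster multiplication routine would, under this simplification, speed up both passes.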
Nov, 20

GPU-accelerated adjoint algorithmic differentiation

Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store […]
Nov, 20

GPU-Based Inverse Rendering With Multi-Objective Particle Swarm Optimization

We present a novel, GPU-accelerated per-pixel inverse rendering (IR) optimization algorithm based on Particle Swarm Optimization (PSO), IRPSO. IRPSO estimates the per-pixel scene attributes including reflectance properties of a 3D model, and is fast enough to do in situ visualization of the optimization in real-time. We utilize the GPU framebuffer as a computational domain, where […]
Nov, 13

Fast Neuromimetic Object Recognition using FPGA Outperforms GPU Implementations

Recognition of objects in still images has traditionally been regarded as a difficult computational problem. Although modern automated methods for visual object recognition have achieved steadily increasing recognition accuracy, even the most advanced computational vision approaches are unable to obtain performance equal to that of humans. This has led to the creation of many biologically-inspired […]
Nov, 13

GEMMbench: a framework for reproducible and collaborative benchmarking of matrix multiplication

The generic matrix-matrix multiplication (GEMM) is arguably the most popular computational kernel of the 20th century. Yet, surprisingly, no common methodology for evaluating GEMM performance has been established over the many decades of using GEMM for comparing architectures, compilers and ninja-class programmers. We introduce GEMMbench, a framework and methodology for evaluating performance of GEMM implementations. […]
Nov, 13

Accelerating Recommender Systems using GPUs

We describe GPU implementations of the matrix recommender algorithms CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with existing multi-core versions of the same algorithms. Results on the GPU are better than the results of the multi-core versions (maximum speedup of 14.8).
Nov, 13

Accelerating Adaptive IDW Interpolation Algorithm on a Single GPU

This paper focuses on the design and implementation of a GPU-accelerated Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm. AIDW is an improved version of standard IDW that adaptively determines the power parameter according to the spatial distribution pattern of the points and achieves more accurate predictions than IDW. In this paper, we first […]
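For orientation, here is a minimal sketch of standard IDW with a fixed power parameter, the baseline that AIDW refines. It does not reproduce the adaptive selection of the power parameter or the GPU implementation described in the paper. The function name `idw`, its signature, and the sample points are assumptions made for this example.

```python
import math


def idw(points, values, query, power=2.0):
    """Standard inverse distance weighting at one 2D query location.

    points: list of (x, y) sample locations
    values: known value at each sample location
    power:  fixed distance exponent (AIDW would choose this adaptively)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (px, py), value in zip(points, values):
        dist = math.hypot(query[0] - px, query[1] - py)
        if dist == 0.0:
            # The query coincides with a sample point: return its value directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sample data: two points on the x-axis.
samples = [(0.0, 0.0), (2.0, 0.0)]
known = [1.0, 3.0]
midpoint = idw(samples, known, (1.0, 0.0))
near_first = idw(samples, known, (0.5, 0.0))
```

A larger `power` makes nearby samples dominate the estimate, which is why the choice of this parameter affects prediction accuracy.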
Nov, 13

A Survey Of Techniques for Architecting and Managing Asymmetric Multicore Processors

To meet the needs of a diverse range of workloads, asymmetric multicore processors (AMPs) have been proposed, which feature cores of different microarchitectures or ISAs. However, given the diversity inherent in their design and application scenarios, several challenges need to be addressed to effectively architect AMPs and leverage their potential in optimizing both sequential and parallel […]
Nov, 12

Assembly-Free Structural Dynamics On CPU and GPU

Finite Element Analysis helps designers at the early stages of product design through simulation and behavioral prediction. This thesis addresses transient finite element analysis, specifically structural dynamics, where the goal is to determine the behavior of a product under time-dependent loads. A critical computational challenge in structural dynamics is that it typically requires significant amounts of […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
