17142

Posts

Apr, 20

Exploration of cyber-physical systems for GPGPU computer vision-based detection of biological viruses

This work presents a method for a computer vision-based detection of biological viruses in PAMONO sensor images and, related to this, methods to explore cyber-physical systems such as those consisting of the PAMONO sensor, the detection software, and processing hardware. The focus is especially on an exploration of Graphics Processing Units (GPU) hardware for "General-Purpose […]
Apr, 20

A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems

In this paper, we study various parallelization schemes for the Variable Neighborhood Search (VNS) metaheuristic on a CPU-GPU system via OpenMP and OpenACC. A hybrid parallel VNS method is applied to recent benchmark problem instances for the multi-product dynamic lot sizing problem with product returns and recovery, which appears in reverse logistics and is known […]
Apr, 20

Evaluation of GPU-based track-triggering for the CMS detector at CERN’s HL-LHC

In this work we present an evaluation of GPUs as a possible L1 Track Trigger for the High Luminosity LHC, effective after Long Shutdown 3 around 2025. The novelty lies in presenting an implementation based on calculations done entirely in software, in contrast to currently discussed solutions relying on specialized hardware, such as FPGAs and […]
Apr, 17

Random Finite Set Based Bayesian Filtering with OpenCL in a Heterogeneous Platform

While most filtering approaches based on random finite sets have focused on improving performance, in this paper, we argue that computation times are very important in order to enable real-time applications such as pedestrian detection. Towards this goal, this paper investigates the use of OpenCL to accelerate the computation of random finite set-based Bayesian filtering […]
Apr, 17

Investigation of heterogeneous computing through novel parallel programming platforms

The computational landscape is dominated by the use of a very high number of CPU resources; this has however provided diminishing returns in recent years, pushing for a paradigm shift in the choice for computational systems. The following work was aimed at determining the maturity of heterogeneous computer systems in terms of computational performance and […]
Apr, 17

Parallel Multi Channel Convolution using General Matrix Multiplication

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) […]
Apr, 17

GPU implementation of the Rosenbluth generation method for static Monte Carlo simulations

We present parallel version of Rosenbluth Self-Avoiding Walk generation method implemented on Graphics Processing Units (GPUs) using CUDA libraries. The method scales almost linearly with the number of CUDA cores and the method efficiency has only hardware limitations. The method is introduced in two realizations: on a cubic lattice and in real space. We find […]
Apr, 17

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation […]
Apr, 15

Portable, high-performance containers for HPC

Building and deploying software on high-end computing systems is a challenging task. High performance applications have to reliably run across multiple platforms and environments, and make use of site-specific resources while resolving complicated software-stack dependencies. Containers are a type of lightweight virtualization technology that attempt to solve this problem by packaging applications and their environments […]
Apr, 15

Faster across the PCIe bus: A GPU library for lightweight decompression

This short paper present a collection of GPU lightweight decompression algorithms implementations within a FOSS library, Giddy – the first to be published to offer such function-ality. As the use of compression is important in ameliorating PCIe data transfer bottlenecks, we believe this library and its constituent implementations can serve as useful building blocks in […]
Apr, 15

Unfolding and Shrinking Neural Machine Translation Ensembles

Ensembling is a well-known technique in neural machine translation (NMT). Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for production systems […]
Apr, 15

A Domain Specific Language for Performance Portable Molecular Dynamics Algorithms

Developers of Molecular Dynamics (MD) codes face significant challenges when adapting existing simulation packages to new hardware. In a continuously diversifying hardware landscape it becomes increasingly difficult for scientists to be experts both in their own domain (physics/chemistry/biology) and specialists in the low level parallelisation and optimisation of their codes. To address this challenge, we […]
Page 2 of 91612345...102030...Last »

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: