
Posts

Apr, 17

Parallel Multi Channel Convolution using General Matrix Multiplication

Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) […]
Apr, 17

GPU implementation of the Rosenbluth generation method for static Monte Carlo simulations

We present a parallel version of the Rosenbluth Self-Avoiding Walk generation method implemented on Graphics Processing Units (GPUs) using CUDA libraries. The method scales almost linearly with the number of CUDA cores, and its efficiency is limited only by the hardware. The method is introduced in two realizations: on a cubic lattice and in real space. We find […]
Apr, 17

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently performed mainly on powerful GPU-accelerated workstations and compute clusters. However, there are many applications, such as smart surveillance cameras, that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation […]
Apr, 15

Faster across the PCIe bus: A GPU library for lightweight decompression

This short paper presents a collection of GPU implementations of lightweight decompression algorithms within a FOSS library, Giddy, the first to be published to offer such functionality. As the use of compression is important in ameliorating PCIe data transfer bottlenecks, we believe this library and its constituent implementations can serve as useful building blocks in […]
Apr, 15

Portable, high-performance containers for HPC

Building and deploying software on high-end computing systems is a challenging task. High performance applications have to reliably run across multiple platforms and environments, and make use of site-specific resources while resolving complicated software-stack dependencies. Containers are a type of lightweight virtualization technology that attempts to solve this problem by packaging applications and their environments […]
Apr, 15

Unfolding and Shrinking Neural Machine Translation Ensembles

Ensembling is a well-known technique in neural machine translation (NMT). Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for production systems […]
Apr, 15

A Domain Specific Language for Performance Portable Molecular Dynamics Algorithms

Developers of Molecular Dynamics (MD) codes face significant challenges when adapting existing simulation packages to new hardware. In a continuously diversifying hardware landscape it becomes increasingly difficult for scientists to be experts both in their own domain (physics/chemistry/biology) and specialists in the low level parallelisation and optimisation of their codes. To address this challenge, we […]
Apr, 15

Parallelized Kendall’s Tau Coefficient Computation via SIMD Vectorized Sorting On Many-Integrated-Core Processors

Measuring pairwise association is an important operation in data analytics. Kendall’s tau coefficient is a widely used correlation coefficient for identifying non-linear relationships between ordinal variables. In this paper, we investigated a parallel algorithm accelerating all-pairs Kendall’s tau coefficient computation via single instruction multiple data (SIMD) vectorized sorting on Intel Xeon Phis by taking advantage of […]
Apr, 11

Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units

Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have become widely used for modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges remain due to the high dimensionality of typical biomolecular systems. In this […]
Apr, 11

Machine Learning from Streaming Data in Heterogeneous Computing Environments

With the advent of many-core general-purpose processors (CPUs), the use of an increased number of cores has provided a certain speedup for algorithms that can be parallelized. Nowadays, there are distributed and parallel data processing platforms, such as Apache Flink, which inherently make use of parallel computing. On the other hand, graphics processors (GPUs) offer high […]
Apr, 11

Performance and energy optimization of the iterative solution of sparse linear systems on multicore processors

Large sparse systems of linear equations are ubiquitous problems in diverse scientific and engineering applications and big-data analytics. The interest in these applications and the fact that the solution of the linear system is usually a significant time-consuming stage have promoted the design and high-performance implementation of numerous matrix storage formats, algorithms, and libraries to […]
Apr, 11

A modular GPU raytracer using OpenCL for non-interactive graphics

We describe the development of a modular, plugin-based raytracer called RenderGirl, suitable for running inside the OpenCL framework. We aim to take advantage of heterogeneous computing devices such as GPUs and many-core CPUs, focusing on parallelism. We implemented the traditional partitioning scheme called bounding volume hierarchies, where each scene is hierarchically subdivided into […]

* * *


HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
