Posts

Jul, 10

GPU-based Parallel Computation Support for Stan

This paper details an extensible OpenCL framework that allows Stan to utilize heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, other matrix algebra primitives and some commonly used likelihoods, with more additions planned for the near future. Stan users can now benefit from speedups offered by GPUs with little effort […]
Jul, 10

Optimizing Xeon Phi for Interactive Data Analysis

The Intel Xeon Phi manycore processor is designed to provide high performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving optimal performance of matrix operations within data analysis environments requires tuning the Xeon Phi OpenMP settings, process pinning, and memory […]
Jul, 10

PANNA: Properties from Artificial Neural Network Architectures

Prediction of material properties from first principles is often a computationally expensive task. Recently, artificial neural networks and other machine learning approaches have been successfully employed to obtain accurate models at a low computational cost by leveraging existing example data. Here, we present a software package "Properties from Artificial Neural Network Architectures" (PANNA) that provides […]
Jul, 7

Exploring Portability and Performance of OpenCL FPGA Kernels on Intel HARPv2

FPGAs offer a heterogeneous compute solution to the continuous desire for increased performance by enabling the creation of application-specific hardware that accelerates computation. While the barrier to entry has historically been steep, advances in High Level Synthesis (HLS) are making FPGAs more accessible. Specifically, the Intel FPGA OpenCL SDK allows software designers to abstract away […]
Jul, 7

Efficient Spatial Anti-Aliasing Rendering for Line Joins on Vector Maps

The spatial anti-aliasing technique for line joins (intersections of road segments) on vector maps is crucial to both visual experience and system performance. Due to limitations of the OpenGL API, one common practice to achieve the anti-aliased effect is splicing multiple triangles at varying scale levels to approximate the fan-shaped line joins. However, this approximation […]
Jul, 7

Novel Methodologies for Predictable CPU-To-GPU Command Offloading

There is an increasing industrial and academic interest towards a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). In this paper, we analyze an important aspect that has not yet been considered in the […]
Jul, 7

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Modern deep learning applications increasingly push model inference to edge devices for multiple reasons, such as achieving shorter latency, relieving the burden on the network connecting to the cloud, and protecting user privacy. The Convolutional Neural Network (CNN) is one of the most widely used model families in these applications. […]
Jul, 7

FusionAccel: A General Re-configurable Deep Learning Inference Accelerator on FPGA for Convolutional Neural Networks

The deep learning accelerator is one of the methods to accelerate deep learning network computations, which is mainly based on convolutional neural network acceleration. To address the fact that concurrent convolutional neural network accelerators are not solely open-source and the exclusiveness of platforms, FusionAccel, a scalable convolutional neural network accelerator hardware architecture with supporting software […]
Jul, 4

Semantic Product Search

We study the problem of semantic matching in product search, that is, given a customer query, retrieve all semantically related products from the catalog. Pure lexical matching via an inverted index falls short in this respect due to several factors: a) lack of understanding of hypernyms, synonyms, and antonyms, b) fragility to morphological variants (e.g. […]
Jul, 4

PIConGPU: Predictive Simulations of Laser-Particle Accelerators with Manycore Hardware

The presented thesis establishes simulations on modern massively parallel computing hardware to investigate relativistic laser-driven plasmas. The latter are of special interest as they may provide a compact source for energetic ion beams. Computer simulations provide valuable insight into ultrafast plasma processes, evolving in the ultrahigh intensity (I_0 >> 10^18 W/cm^2) focus of the ultrashort […]
Jul, 4

Themis: Fair and Efficient GPU Cluster Scheduling for Machine Learning Workloads

Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads while ensuring overall cluster efficiency. We find that established cluster scheduling disciplines that provide instantaneous fair […]
Jun, 30

Automated Generation of OpenCL Programs Based on Algebra-Algorithmic Approach

The paper proposes the further development of algebra-algorithmic design and synthesis tools towards the development of OpenCL programs. The method for semi-automatic parallelization of cyclic operators is proposed. The particular feature of the approach consists in using high-level algebra-algorithmic program specifications (schemes) and rewriting rules technique. The developed tools provide the construction of parallel algorithm […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors