Posts

Jun, 21

cltorch: a Hardware-Agnostic Backend for the Torch Deep Neural Network Library, Based on OpenCL

This paper presents cltorch, a hardware-agnostic backend for the Torch neural network framework. cltorch enables training of deep neural networks on GPUs from diverse hardware vendors, including AMD, NVIDIA, and Intel. cltorch contains sufficient implementation to run models such as AlexNet, VGG, Overfeat, and GoogleNet. It is written using the OpenCL language, a portable compute […]
Jun, 21

Acceleration of Statistical Detection of Zero-day Malware in the Memory Dump Using CUDA-enabled GPU Hardware

This paper focuses on the anticipatory enhancement of methods of detecting stealth software. Cyber security detection tools are insufficiently powerful to reveal the most recent cyber-attacks which use malware. In this paper, we first present the idea of the stealthiest malware, as this is the most complicated scenario for detection because it combines […]
Jun, 21

A Parallel Algorithm for LZW Decompression, with GPU Implementation

The main contribution of this paper is to present a parallel algorithm for LZW decompression and to implement it in a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, its parallelization is not an easy task. We first present a parallel LZW decompression […]
Jun, 16

Electric potential and field calculation of charged BEM triangles and rectangles by Gaussian cubature

It is a widely held view that analytical integration is more accurate than the numerical one. In some special cases, however, numerical integration can be more advantageous than analytical integration. In our paper we show this benefit for the case of electric potential and field computation of charged triangles and rectangles applied in the boundary […]
Jun, 16

NCAM: Near-Data Processing for Nearest Neighbor Search

Deep down in many applications like natural language processing (NLP), vision, and robotics is a form of the k-nearest neighbor search algorithm (kNN). The kNN algorithm is primarily bottlenecked by data movement, limiting throughput and incurring latency in these applications. While there do exist well bounded kNN approximations that improve the performance of kNN, these […]
Jun, 16

Splotch: porting and optimizing for the Xeon Phi

With the increasing size and complexity of data produced by large scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous High Performance Computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize […]
Jun, 16

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

We perform a study of the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, we study how to minimize the time to train this model on a cluster of commodity CPUs and GPUs. Our first contribution focuses on the single-node setting, in which we show that […]
Jun, 16

Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application

Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on each node of the cluster is not efficient, resulting in high costs and power consumption as well as underutilisation of the accelerator. The research reported in this paper is motivated towards the use of few physical GPUs by […]
Jun, 14

International Conference on Robotics and Machine Vision (ICRMV’16), 2016

Index: Scopus, Ei Compendex, Web of Science (CPCI), Inspec, Google Scholar, Microsoft Academic Search, etc.

Agenda:
September 14, 2016: Registration & Conference Materials Collection
September 15, 2016: Keynote Speeches & Participants’ Oral Presentation
September 16, 2016: Visit

Publication: ICRMV 2016 conference Proceedings

Contact us: Ms. Janet Hsiao, E-mail: icrmv@academic.net
Jun, 14

International Conference on Cybernetics, Robotics and Control (ICCRC’16), 2016

Publication: All accepted papers of CRC 2016 (Registered & Presented) will be collected in the conference proceedings, which will be indexed by EI and Scopus. Selected papers will be published in International Journal of Mechanical Engineering and Robotics Research (ISSN: 2278-0149), which is indexed by Index Copernicus, Scopus (since 2016), etc. Contact: Ethell Shin E-mail: […]
Jun, 14

Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. Towards utilizing their full potential, it is worth investigating performance portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGPUs, […]
Jun, 14

First Application of Lattice QCD to Pezy-SC Processor

The Pezy-SC processor is a novel architecture developed by Pezy Computing K. K. that has achieved large computational power with low electric power consumption. It works as an accelerator device similarly to GPGPUs. A programming environment that resembles OpenCL is provided. Using a hybrid parallel system "Suiren" installed at KEK, we port and tune a […]

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors
