17774

Posts

Nov, 16

Hydra: a C++11 framework for data analysis in massively parallel platforms

Hydra is a header-only, templated and C++11-compliant framework designed to perform the typical bottleneck calculations found in common HEP data analyses on massively parallel platforms. The framework is implemented on top of the C++11 Standard Library and a variadic version of the Thrust library and is designed to run on Linux systems, using OpenMP, CUDA […]
Nov, 16

Launch-time Optimization of OpenCL Kernels

OpenCL kernels are compiled first before kernel arguments and launch geometry are provided later at launch time. Although some of these values remain constant during execution, the compiler is unable to optimize for them since it has no access to them. We propose and implement a novel approach that identifies such arguments, geometry, and optimizations […]
Nov, 16

Deep learning for galaxy surface brightness profile fitting

Numerous ongoing and future large area surveys (e.g. DES, EUCLID, LSST, WFIRST), will increase by several orders of magnitude the volume of data that can be exploited for galaxy morphology studies. The full potential of these surveys can only be unlocked with the development of automated, fast and reliable analysis methods. In this paper we […]
Nov, 16

Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source to source compilation […]
Nov, 16

Accelerating HPC codes on Intel(R) Omni-Path Architecture networks: From particle physics to Machine Learning

We discuss practical methods to ensure near wirespeed performance from clusters with either one or two Intel(R) Omni-Path host fabric interfaces (HFI) per node, and Intel(R) Xeon Phi(TM) 72xx (Knight’s Landing) processors, and using the Linux operating system. The study evaluates the performance improvements achievable and the required programming approaches in two distinct example problems: […]
Nov, 12

GPU computing and Many Integrated Core Computing (PDP), 2018

TOPICS: * GPU computing, multi GPU processing, hybrid computing * Programming models, programming frameworks, CUDA, OpenCL, communication libraries * Mechanisms for mapping codes * Task allocation * Fault tolerance * Performance analysis * Many Integrated Core architecture, MIC * Intel coprocessor, Xeon Phi * Vectorization * Applications: image processing, signal processing, linear algebra, numerical simulation, […]
Nov, 12

Vectorized algorithm for multidimensional Monte Carlo integration on modern GPU, CPU and MIC architectures

The aim of this paper is to show that the multidimensional Monte Carlo integration can be efficiently implemented on computers with modern multicore CPUs and manycore accelerators including Intel MIC and GPU architectures using a new vectorized version of LCG pseudorandom number generator which requires limited amount of memory. We introduce two new implementations of […]
Nov, 12

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

We present a highly scalable Monte Carlo (MC) 3D photon transport simulation platform designed for heterogeneous computing systems. By developing a massively parallel MC algorithm using the OpenCL framework, this research extends our existing GPU-accelerated MC technique to a highly-scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel […]
Nov, 12

Best Practice Guide – GPGPU

Graphics Processing Units (GPUs) were originally developed for computer gaming and other graphical tasks, but for many years have been exploited for general purpose computing across a number of areas. They offer advantages over traditional CPUs because they have greater computational capability, and use high-bandwidth memory systems (where memory bandwidth is the main bottleneck for […]
Nov, 12

Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

The High Performance Computing (HPC) community recognizes energy consumption as a major problem. Extensive research is underway to identify means to increase energy efficiency of HPC systems including consideration of alternative building blocks for future systems. This thesis considers one such system, the Texas Instruments Keystone II, a heterogeneous Low-Power System-on-Chip (LPSoC) processor that combines […]
Nov, 12

Performance Evaluation of Deep Learning Tools in Docker Containers

With the success of deep learning techniques in a broad range of application domains, many deep learning software frameworks have been developed and are being updated frequently to adapt to new hardware features and software libraries, which bring a big challenge for end users and system administrators. To address this problem, container techniques are widely […]
Nov, 7

Radeon PRO Solid State Graphics (SSG) API User Manual

The Radeon Pro SSG software library enables peer-to-peer (P2P) data transfers between GPU and Radeon on board SSD devices. It allows a methodology to read OS file data from SSDs to OpenCL, OpenGL and DirectX buffers with very low-latency P2P communication. The development kit version of this library supports only the Microsoft Windows 10 operating […]
Page 4 of 938« First...23456...102030...Last »

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: