
Posts

Nov, 16

Launch-time Optimization of OpenCL Kernels

OpenCL kernels are compiled before their kernel arguments and launch geometry are provided at launch time. Although some of these values remain constant during execution, the compiler cannot optimize for them because it has no access to them at compile time. We propose and implement a novel approach that identifies such arguments, geometry, and optimizations […]
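The underlying idea can be illustrated with standard OpenCL alone. Below is a minimal, hedged sketch (not the authors' system; the kernel name, argument and value are invented for illustration): once the host knows that a value such as a row width stays constant for the whole run, it can rebuild the kernel with that value injected as a preprocessor define, letting the device compiler constant-fold it and unroll the loop.

/* Hedged sketch: re-building an OpenCL kernel at launch time with a
 * known-constant value baked in via -D, so the device compiler can
 * constant-fold and unroll. Error checking omitted for brevity. */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void row_sum(__global const float *a, __global float *out) {\n"
    "    int row = get_global_id(0);\n"
    "    float s = 0.0f;\n"
    "    for (int j = 0; j < WIDTH; ++j)   /* trip count fixed at build time */\n"
    "        s += a[row * WIDTH + j];\n"
    "    out[row] = s;\n"
    "}\n";

int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);

    int width = 1024;                     /* value known only at launch time */
    char opts[64];
    snprintf(opts, sizeof opts, "-DWIDTH=%d", width);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, opts, NULL, NULL);   /* specialized build */
    cl_kernel krn = clCreateKernel(prog, "row_sum", NULL);
    printf("built row_sum specialized for WIDTH=%d\n", width);

    clReleaseKernel(krn);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}

A generic build would instead pass the width as a kernel argument, at the cost of an opaque loop bound inside the kernel.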
Nov, 16

Deep learning for galaxy surface brightness profile fitting

Numerous ongoing and future large-area surveys (e.g. DES, EUCLID, LSST, WFIRST) will increase by several orders of magnitude the volume of data that can be exploited for galaxy morphology studies. The full potential of these surveys can only be unlocked with the development of automated, fast and reliable analysis methods. In this paper we […]
Nov, 16

Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source-to-source compilation […]
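To make the general pattern concrete, here is a hedged illustration (not the authors' compiler, which operates on FORTRAN 77 source; the example loop and names are invented): a data-parallel loop whose iteration index a source-to-source translator typically turns into the work-item id of a generated OpenCL kernel.

/* Original loop in the legacy code (conceptually, FORTRAN 77):
 *
 *     DO 10 I = 1, N
 *        C(I) = A(I) + ALPHA*B(I)
 * 10  CONTINUE
 *
 * OpenCL C kernel a source-to-source translator would typically emit:
 * the loop index becomes the global work-item id, with a guard in case
 * the launch range is rounded up past N. */
__kernel void loop_body(__global const float *a,
                        __global const float *b,
                        __global float *c,
                        const float alpha,
                        const int n)
{
    int i = get_global_id(0);        /* loop index -> work-item id */
    if (i < n)
        c[i] = a[i] + alpha * b[i];
}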
Nov, 16

Accelerating HPC codes on Intel(R) Omni-Path Architecture networks: From particle physics to Machine Learning

We discuss practical methods to ensure near-wirespeed performance from clusters with either one or two Intel(R) Omni-Path host fabric interfaces (HFIs) per node and Intel(R) Xeon Phi(TM) 72xx (Knights Landing) processors running the Linux operating system. The study evaluates the performance improvements achievable and the required programming approaches in two distinct example problems: […]
Nov, 12

GPU computing and Many Integrated Core Computing (PDP), 2018

TOPICS:
* GPU computing, multi-GPU processing, hybrid computing
* Programming models, programming frameworks, CUDA, OpenCL, communication libraries
* Mechanisms for mapping codes
* Task allocation
* Fault tolerance
* Performance analysis
* Many Integrated Core architecture, MIC
* Intel coprocessor, Xeon Phi
* Vectorization
* Applications: image processing, signal processing, linear algebra, numerical simulation, […]
Nov, 12

Vectorized algorithm for multidimensional Monte Carlo integration on modern GPU, CPU and MIC architectures

The aim of this paper is to show that multidimensional Monte Carlo integration can be efficiently implemented on computers with modern multicore CPUs and manycore accelerators, including Intel MIC and GPU architectures, using a new vectorized version of the LCG pseudorandom number generator that requires a limited amount of memory. We introduce two new implementations of […]
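As a rough illustration of why such generators parallelize well, here is a hedged sketch (not the paper's generator or integrand): a leapfrogged 64-bit LCG gives each of L lanes its own stride-L subsequence from one small state word per lane, and the lane loop is the one a compiler can vectorize. The example estimates the volume of the unit 4-ball.

/* Hedged sketch: leapfrogged LCG streams driving a plain Monte Carlo
 * estimate of the volume of the unit DIM-ball. Constants are Knuth's
 * MMIX LCG parameters; LANES plays the role of the SIMD width. */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define LANES 8
#define DIM   4

static const uint64_t A = 6364136223846793005ULL;  /* multiplier */
static const uint64_t C = 1442695040888963407ULL;  /* increment  */

int main(void) {
    /* Leapfrog constants for advancing LANES steps at once:
       x[n+L] = A^L * x[n] + C*(A^(L-1) + ... + A + 1)  (mod 2^64). */
    uint64_t A_L = 1, C_L = 0;
    for (int i = 0; i < LANES; ++i) { C_L = A * C_L + C; A_L *= A; }

    /* Lane k starts k steps into the base sequence, so the lanes
       together cover it without overlap. */
    uint64_t s[LANES], x = 88172645463325252ULL;    /* seed */
    for (int k = 0; k < LANES; ++k) { s[k] = x; x = A * x + C; }

    const long n_per_lane = 1000000;
    double hits = 0.0;
    for (long i = 0; i < n_per_lane; ++i) {
        for (int k = 0; k < LANES; ++k) {           /* vectorizable lane loop */
            double r2 = 0.0;
            for (int d = 0; d < DIM; ++d) {
                s[k] = A_L * s[k] + C_L;            /* leapfrog step          */
                double u = (s[k] >> 11) * (1.0 / 9007199254740992.0); /* [0,1) */
                double v = 2.0 * u - 1.0;           /* map to [-1,1)          */
                r2 += v * v;
            }
            if (r2 < 1.0) hits += 1.0;
        }
    }
    const double pi = 3.14159265358979323846;
    double volume = pow(2.0, DIM) * hits / (double)(n_per_lane * LANES);
    printf("estimated volume of the unit 4-ball: %f (exact: %f)\n",
           volume, pi * pi / 2.0);
    return 0;
}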
Nov, 12

Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

The High Performance Computing (HPC) community recognizes energy consumption as a major problem. Extensive research is underway to identify means to increase the energy efficiency of HPC systems, including consideration of alternative building blocks for future systems. This thesis considers one such system, the Texas Instruments Keystone II, a heterogeneous Low-Power System-on-Chip (LPSoC) processor that combines […]
Nov, 12

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

We present a highly scalable Monte Carlo (MC) 3D photon transport simulation platform designed for heterogeneous computing systems. By developing a massively parallel MC algorithm using the OpenCL framework, this research extends our existing GPU-accelerated MC technique to a highly scalable, vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel […]
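To make the per-photon parallel structure concrete, here is a heavily simplified, hedged sketch of such an OpenCL kernel (one work-item per photon, isotropic infinite homogeneous medium, no geometry or fluence scoring); it is illustrative only and not the platform described above.

/* Hedged sketch: each work-item random-walks one photon, drawing
 * exponential step lengths and attenuating the photon weight by the
 * scattering albedo until the weight is negligible. A real simulator
 * would also sample scattering directions, track positions against a
 * voxelized medium and score fluence (typically with atomics). */
__kernel void photon_walk(__global float *pathlen,
                          const float mu_s,      /* scattering coefficient */
                          const float mu_a,      /* absorption coefficient */
                          const uint seed)
{
    uint gid = get_global_id(0);
    uint rng = (seed ^ (gid * 2654435761u)) | 1u;  /* per-photon xorshift state */
    float w = 1.0f;                                /* photon weight             */
    float total = 0.0f;

    while (w > 1e-4f) {
        rng ^= rng << 13; rng ^= rng >> 17; rng ^= rng << 5;   /* xorshift32 */
        float u = ((rng >> 8) + 1u) * (1.0f / 16777216.0f);    /* (0, 1]     */

        total += -log(u) / (mu_s + mu_a);          /* exponential free path   */
        w *= mu_s / (mu_s + mu_a);                 /* deposit absorbed weight */
    }
    pathlen[gid] = total;                          /* one result per photon   */
}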
Nov, 12

Best Practice Guide – GPGPU

Graphics Processing Units (GPUs) were originally developed for computer gaming and other graphical tasks, but for many years have been exploited for general-purpose computing across a number of areas. They offer advantages over traditional CPUs because they have greater computational capability and use high-bandwidth memory systems (where memory bandwidth is the main bottleneck for […]
Nov, 12

Performance Evaluation of Deep Learning Tools in Docker Containers

With the success of deep learning techniques in a broad range of application domains, many deep learning software frameworks have been developed and are being updated frequently to adapt to new hardware features and software libraries, which poses a significant challenge for end users and system administrators. To address this problem, container techniques are widely […]
Nov, 7

Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass

Cosmological N-body simulations play a vital role in studying how the Universe evolves. To compare with observations and draw scientific inferences, statistical analysis of large simulation datasets, e.g., finding halos and obtaining multi-point correlation functions, is crucial. However, traditional in-memory methods for these tasks do not scale to datasets that are prohibitively large in modern […]
Nov, 7

Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL) each […]
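As an illustration of the kind of loop such comparisons exercise, here is a hedged sketch (an invented mini-example, not one of the benchmarked applications) of an indirect edge-to-node gather typical of unstructured-mesh codes, offloaded with OpenMP 4.5 target directives; the CUDA and OpenACC variants would express the same loop as a kernel or with "parallel loop" pragmas.

/* Hedged sketch: unstructured-mesh style indirect addressing offloaded
 * with OpenMP 4.5 target directives. Falls back to the host if no
 * accelerator is available or the pragma is ignored. */
#include <stdio.h>

void edge_lengths(int nedges, int nnodes,
                  const int *edge_node,     /* 2 node indices per edge */
                  const double *node_x,     /* 1-D node coordinates    */
                  double *edge_len)
{
    #pragma omp target teams distribute parallel for \
        map(to: edge_node[0:2*nedges], node_x[0:nnodes]) \
        map(from: edge_len[0:nedges])
    for (int e = 0; e < nedges; ++e) {
        int a = edge_node[2*e], b = edge_node[2*e + 1];   /* indirect reads */
        double d = node_x[a] - node_x[b];
        edge_len[e] = d < 0.0 ? -d : d;
    }
}

int main(void) {
    int    edges[6] = { 0,1, 1,2, 2,3 };
    double x[4]     = { 0.0, 1.0, 3.0, 6.0 };
    double len[3];
    edge_lengths(3, 4, edges, x, len);
    for (int e = 0; e < 3; ++e) printf("edge %d length %g\n", e, len[e]);
    return 0;
}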
