high performance computing on graphics processing units: hgpu.org

Posts

Nov, 26

Computing the distance between two finite element solutions defined on different 3D meshes on a GPU

This article introduces a new method to efficiently compute the distance (i.e., L^p norm of the difference) between two functions supported by two different meshes of the same 3D domain. The functions that we consider are typically finite element solutions discretized in different function spaces supported by meshes that are potentially completely unrelated. Our method […]

OpenGL

Nov, 26

Efficient Target and Application Specific Selection and Ordering of Compiler Passes

Programmers usually rely on one of a set of optimization level flags shipped with the compiler they use to compile their source code. Those flags represent fixed compiler pass sequences, so in some situations better performance and/or other metrics such as code size can be achieved by using compiler sequences […]

CUDA • OpenCL

Nov, 26

GPU Pro 7: Advanced Rendering

The latest edition of this bestselling game development reference offers proven tips and techniques for the real-time rendering of special effects and visualization data that are useful for beginners and seasoned game and graphics programmers alike. Exploring recent developments in the rapidly evolving field of real-time rendering, GPU Pro 7: Advanced Rendering Techniques assembles a […]

CUDA • OpenGL

Nov, 21

A survey on graphic processing unit computing for large-scale data mining

General purpose computation using Graphic Processing Units (GPUs) is a well-established research area focusing on high-performance computing solutions for massively parallelizable and time-consuming problems. Classical methodologies in machine learning and data mining cannot handle processing of massive and high-speed volumes of information in the context of the big data era. GPUs have successfully improved the […]

CUDA

Nov, 21

Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR

Given their massively parallel computing capabilities, heterogeneous architectures comprised of CPUs and accelerators have been increasingly used to speed up scientific and engineering applications. Nevertheless, programming such architectures is a challenging task for most non-expert programmers, as typical accelerator programming languages (e.g. CUDA and OpenCL) demand a thorough understanding of the underlying hardware to enable an […]

OpenCL

Nov, 21

Bitmap Filter: Speeding up Exact Set Similarity Joins with Bitwise Operations

The Exact Set Similarity Join problem aims to find all similar sets between two collections of sets, with respect to a threshold and a similarity function such as overlap, Jaccard, Dice or cosine. The naive approach verifies all pairs of sets and is often considered impractical due to the high number of combinations. So, Exact […]

CUDA

Nov, 21

GPU Parallelization for Unstructured Sparse Matrix Problems with OpenMP 4.5 and OpenACC

The effective use of parallelized hardware is an important goal of today’s computer developments. Nvidia GPUs are an important footing in this context. While CUDA-implemented algorithms focus on detailed, optimized usage of GPU elements, pragma directive parallelization targets GPU computation for a broader community. In this paper we focus on the implementation of […]

CUDA

Nov, 21

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Deep learning and complex machine learning have quickly become one of the most important computationally intensive applications for a wide variety of fields. The combination of large data sets, high-performance computational capabilities, and evolving and improving algorithms has enabled many successful applications which were previously difficult or impossible to consider. This paper explores the challenges […]

OpenCL

Nov, 16

Hydra: a C++11 framework for data analysis in massively parallel platforms

Hydra is a header-only, templated and C++11-compliant framework designed to perform the typical bottleneck calculations found in common HEP data analyses on massively parallel platforms. The framework is implemented on top of the C++11 Standard Library and a variadic version of the Thrust library and is designed to run on Linux systems, using OpenMP, CUDA […]

CUDA

Nov, 16

Launch-time Optimization of OpenCL Kernels

OpenCL kernels are compiled before kernel arguments and launch geometry are provided at launch time. Although some of these values remain constant during execution, the compiler is unable to optimize for them since it has no access to them. We propose and implement a novel approach that identifies such arguments, geometry, and optimizations […]

OpenCL

Nov, 16

Deep learning for galaxy surface brightness profile fitting

Numerous ongoing and future large area surveys (e.g. DES, EUCLID, LSST, WFIRST) will increase by several orders of magnitude the volume of data that can be exploited for galaxy morphology studies. The full potential of these surveys can only be unlocked with the development of automated, fast and reliable analysis methods. In this paper we […]

Nov, 16

Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source-to-source compilation […]

OpenCL