Posts
Jul, 21
Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms
The trend towards processor heterogeneity and distributed-memory has significantly increased the complexity of parallel programming. In addition, the mix of applications that need to run on parallel platforms today is very diverse, and includes graph applications that typically have irregular memory accesses and unpredictable control-flow. To simplify the programming of graph applications on such platforms, […]
Jul, 21
ARC: Adaptive Ray-tracing with CUDA, a New Ray Tracing Code for Parallel GPUs
We present the methodology of a photon-conserving, spatially-adaptive, ray-tracing radiative transfer algorithm, designed to run on multiple parallel Graphic Processing Units (GPUs). Each GPU has thousands computing cores, making them ideally suited to the task of tracing independent rays. This ray-tracing implementation has speed competitive with approximate momentum methods, even with thousands of ionization sources, […]
Jul, 21
LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks
Recent work has shown that Field-Programmable Gate Arrays (FPGAs) play an important role in the acceleration of Machine Learning applications. Initial specification of machine learning applications are often done using a high-level Python-oriented framework such as Tensorflow, followed by a manual translation to either C or RTL for synthesis using vendor tools. This manual translation […]
Jul, 15
Cross-Compiling Shading Languages
Shading languages are the major class of programming languages for a modern mainstream Graphics Processing Unit (GPU). The programs of those languages are called "Shaders" as they were originally used to describe shading characteristics for computer graphics applications. To make use of GPU accelerated shaders a sophisticated rendering Application Programming Interface (API) is required and […]
Jul, 15
Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on High-Performance Accelerators
Kokkos [1], [2] is a C++ programming model that offers the ability to write portable code that targets a wide degree of parallelism found in current HPC systems. It works by providing abstractions for parallel execution and data layouts that are mapped to different hardware resources during compilation. Some parameters, such as the size of […]
Jul, 15
CloudCL: Single-Paradigm Distributed Heterogeneous Computing for Cloud Infrastructures
The ever-growing demand for compute resources has reached a wide range of application domains, and with that has created a larger audience for compute-intensive tasks. In this paper, we present the CloudCL framework, which empowers users to run compute-intensive tasks without having to face the total cost of ownership of operating an extensive high-performance compute […]
Jul, 15
Multicore architecture and cache optimization techniques for solving graph problems
With the advent of era of Big Data and Internet of Things, there has been an exponential increase in the availability of large data sets. These data sets require in-depth analysis that provides intelligence for improvements in methods for academia and industry. Majority of the data sets are represented and available in the form of […]
Jul, 15
Data-Parallel Hashing Techniques for GPU Architectures
Hash tables are one of the most fundamental data structures for effectively storing and accessing sparse data, with widespread usage in domains ranging from computer graphics to machine learning. This study surveys the state-of-the-art research on data-parallel hashing techniques for emerging massively-parallel, many-core GPU architectures. Key factors affecting the performance of different hashing schemes are […]
Jul, 7
Application of Deep-Learning to Compiler-Based Graphs
Graph-structured data is used in many domains to represent complex objects, such as the molecular structure of chemicals or interactions between members of a social network. However, extracting meaningful information from these graphs is a difficult task, which is often undertaken on a case by case basis. Devising automated methods to mine information from graphs […]
Jul, 7
Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition
Optical Character Recognition (OCR) on contemporary and historical data is still in the focus of many researchers. Especially historical prints require book specific trained OCR models to achieve applicable results (Springmann and L"udeling, 2016, Reul et al., 2017a). To reduce the human effort for manually annotating ground truth (GT) various techniques such as voting and […]
Jul, 7
Energy Consumption of Algorithms for Solving the Compressible Navier-Stokes Equations on CPU’s, GPU’s and KNL’s
In addition to the hardware wall-time restrictions commonly seen in high-performance computing systems, it is likely that future systems will also be constrained by energy budgets. In the present work, finite difference algorithms of varying computational and memory intensity are evaluated with respect to both energy efficiency and runtime on an Intel Ivy Bridge CPU […]
Jul, 7
FluidFFT: common API (C++ and Python) for Fast Fourier Transform HPC libraries
The Python package fluidfft provides a common Python API for performing Fast Fourier Transforms (FFT) in sequential, in parallel and on GPU with different FFT libraries (FFTW, P3DFFT, PFFT, cuFFT). fluidfft is a comprehensive FFT framework which allows Python users to easily and efficiently perform FFT and the associated tasks, such as as computing linear […]