18381

Posts

Jul, 21

ARC: Adaptive Ray-tracing with CUDA, a New Ray Tracing Code for Parallel GPUs

We present the methodology of a photon-conserving, spatially-adaptive, ray-tracing radiative transfer algorithm, designed to run on multiple parallel Graphic Processing Units (GPUs). Each GPU has thousands computing cores, making them ideally suited to the task of tracing independent rays. This ray-tracing implementation has speed competitive with approximate momentum methods, even with thousands of ionization sources, […]
Jul, 21

LeFlow: Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks

Recent work has shown that Field-Programmable Gate Arrays (FPGAs) play an important role in the acceleration of Machine Learning applications. Initial specification of machine learning applications are often done using a high-level Python-oriented framework such as Tensorflow, followed by a manual translation to either C or RTL for synthesis using vendor tools. This manual translation […]
Jul, 15

Cross-Compiling Shading Languages

Shading languages are the major class of programming languages for a modern mainstream Graphics Processing Unit (GPU). The programs of those languages are called "Shaders" as they were originally used to describe shading characteristics for computer graphics applications. To make use of GPU accelerated shaders a sophisticated rendering Application Programming Interface (API) is required and […]
Jul, 15

Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on High-Performance Accelerators

Kokkos [1], [2] is a C++ programming model that offers the ability to write portable code that targets a wide degree of parallelism found in current HPC systems. It works by providing abstractions for parallel execution and data layouts that are mapped to different hardware resources during compilation. Some parameters, such as the size of […]
Jul, 15

Multicore architecture and cache optimization techniques for solving graph problems

With the advent of era of Big Data and Internet of Things, there has been an exponential increase in the availability of large data sets. These data sets require in-depth analysis that provides intelligence for improvements in methods for academia and industry. Majority of the data sets are represented and available in the form of […]
Jul, 15

CloudCL: Single-Paradigm Distributed Heterogeneous Computing for Cloud Infrastructures

The ever-growing demand for compute resources has reached a wide range of application domains, and with that has created a larger audience for compute-intensive tasks. In this paper, we present the CloudCL framework, which empowers users to run compute-intensive tasks without having to face the total cost of ownership of operating an extensive high-performance compute […]
Jul, 15

Data-Parallel Hashing Techniques for GPU Architectures

Hash tables are one of the most fundamental data structures for effectively storing and accessing sparse data, with widespread usage in domains ranging from computer graphics to machine learning. This study surveys the state-of-the-art research on data-parallel hashing techniques for emerging massively-parallel, many-core GPU architectures. Key factors affecting the performance of different hashing schemes are […]
Jul, 7

Application of Deep-Learning to Compiler-Based Graphs

Graph-structured data is used in many domains to represent complex objects, such as the molecular structure of chemicals or interactions between members of a social network. However, extracting meaningful information from these graphs is a difficult task, which is often undertaken on a case by case basis. Devising automated methods to mine information from graphs […]
Jul, 7

Energy Consumption of Algorithms for Solving the Compressible Navier-Stokes Equations on CPU’s, GPU’s and KNL’s

In addition to the hardware wall-time restrictions commonly seen in high-performance computing systems, it is likely that future systems will also be constrained by energy budgets. In the present work, finite difference algorithms of varying computational and memory intensity are evaluated with respect to both energy efficiency and runtime on an Intel Ivy Bridge CPU […]
Jul, 7

Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition

Optical Character Recognition (OCR) on contemporary and historical data is still in the focus of many researchers. Especially historical prints require book specific trained OCR models to achieve applicable results (Springmann and L"udeling, 2016, Reul et al., 2017a). To reduce the human effort for manually annotating ground truth (GT) various techniques such as voting and […]
Jul, 7

FluidFFT: common API (C++ and Python) for Fast Fourier Transform HPC libraries

The Python package fluidfft provides a common Python API for performing Fast Fourier Transforms (FFT) in sequential, in parallel and on GPU with different FFT libraries (FFTW, P3DFFT, PFFT, cuFFT). fluidfft is a comprehensive FFT framework which allows Python users to easily and efficiently perform FFT and the associated tasks, such as as computing linear […]
Jul, 7

An Efficient Dispatcher for Large Scale GraphProcessing on OpenCL-based FPGAs

High parallel framework has been proved to be very suitable for graph processing. There are various work to optimize the implementation in FPGAs, a pipeline parallel device. The key to make use of the parallel performance of FPGAs is to process graph data in pipeline model and take advantage of on-chip memory to realize necessary […]

* * *

* * *

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors

Contact us: