Posts
Jul, 18
Optimisation and GPU code generation of Stencils for Futhark
Stencils are a common problem in the area of scientific computing. Exploitation of parallel computing is a central part when optimising for faster execution times of stencils running on large amounts of data. For this reason stencils are well suited to be run in a GPGPU setting. However, programming stencils to run on massively-parallel hardware […]
Jul, 18
GPTPU: Accelerating Applications using Edge Tensor Processing Units
Neural network (NN) accelerators have been integrated into a wide spectrum of computer systems to accommodate the rapidly growing demands for artificial intelligence (AI) and machine learning (ML) applications. NN accelerators share the idea of providing native hardware support for operations on multidimensional tensor data. Therefore, NN accelerators are theoretically tensor processors that can improve system […]
Jul, 11
Bringing OpenCL to Commodity RISC-V CPUs
The importance of open-source hardware has been increasing in recent years with the introduction of the RISC-V Open ISA. This has also accelerated the push for support of the open-source software stack from compiler tools to full-blown operating systems. Parallel computing with today’s Application Programming Interfaces such as OpenCL has proven to be effective at […]
Jul, 11
KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks
Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC’s larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific […]
Jul, 11
Dynamic Adaptation Techniques and Opportunities to Improve HPC Runtimes
Exascale, a new era of computing, is knocking at the door. Leaving behind the days of high-frequency, single-core processors, the new paradigm of multicore/manycore processors in complex heterogeneous systems dominates today’s HPC landscape. With the advent of accelerators and special-purpose processors alongside general processors, the role of high performance computing (HPC) runtime systems has […]
Jul, 11
Block Conjugate Gradient Solver in OpenCL
The conjugate gradient method for solving certain systems of linear equations is widely used due to its iterative nature and fast convergence. Its boiled-down algorithm contains simple matrix and vector operations which can be done in parallel with potential for great speedup. With the advent of GPGPU computing and accompanying programming models like OpenCL, […]
Jul, 11
Opening the Black Box: Performance Estimation during Code Generation for GPUs
Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. To cover the huge search space, code generation frameworks may apply time-intensive autotuning, exploit scenario-specific performance models, or treat performance […]
Jul, 4
A Sorting Library for FPGA Implementation in OpenCL Programming
In this study, we focus on data sorting, which is a basic arithmetic operation, and we present a sorting library that can be used with the OpenCL programming model for field-programmable gate arrays (FPGAs). Our sorting library is built by combining three hardware sorting algorithms. It consumes more than twice the overall hardware resources compared […]
Jul, 4
Productivity, Portability, Performance: Data-Centric Python
Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. […]
Jul, 4
Optimization of Heterogeneous Parallel Computing Systems using Machine Learning
Background: Heterogeneous parallel computing systems combine different resources (CPUs and GPUs) to achieve high performance and reduced latency and energy consumption. Programming applications that target various processing units requires employing different tools and programming models/languages. Furthermore, selecting the optimal implementation, which may either target different processing units (i.e. CPU or GPU) […]
Jul, 4
HALF: Holistic Auto Machine Learning for FPGAs
Deep Neural Networks (DNNs) are capable of solving complex problems in domains related to embedded systems, such as image and natural language processing. To efficiently implement DNNs on a specific FPGA platform for a given cost criterion, e.g. energy efficiency, an enormous number of design parameters have to be considered from the topology down to […]
Jul, 4
Object Detection Based Handwriting Localization
We present an object detection based approach to localize handwritten regions in documents, which initially aims to enhance anonymization during data transmission. The concatenated fusion of original and preprocessed images containing both printed texts and handwritten notes or signatures is fed into the convolutional neural network, where the bounding boxes are learned to […]