Posts

Jul, 18

Designing a high-performance boundary element library with OpenCL and Numba

The Bempp boundary element library is a well-known library for the simulation of a range of electrostatic, acoustic and electromagnetic problems in homogeneous bounded and unbounded domains. It started as a traditional C++ library with a Python interface. Over the last two years we have completely redesigned Bempp as a native Python library, […]
Jul, 18

Optimisation and GPU code generation of Stencils for Futhark

Stencils are a common problem in the area of scientific computing. Exploitation of parallel computing is a central part when optimising for faster execution times of stencils running on large amounts of data. For this reason stencils are well suited to be run in a GPGPU setting. However, programming stencils to run on massively-parallel hardware […]
Jul, 18

GPTPU: Accelerating Applications using Edge Tensor Processing Units

Neural network (NN) accelerators have been integrated into a wide spectrum of computer systems to accommodate the rapidly growing demands for artificial intelligence (AI) and machine learning (ML) applications. NN accelerators share the idea of providing native hardware support for operations on multidimensional tensor data. Therefore, NN accelerators are theoretically tensor processors that can improve system […]
Jul, 11

Bringing OpenCL to Commodity RISC-V CPUs

The importance of open-source hardware has been increasing in recent years with the introduction of the RISC-V Open ISA. This has also accelerated the push for support of the open-source software stack from compiler tools to full-blown operating systems. Parallel computing with today’s Application Programming Interfaces such as OpenCL has proven to be effective at […]
Jul, 11

KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks

Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC’s larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific […]
Jul, 11

Dynamic Adaptation Techniques and Opportunities to Improve HPC Runtimes

Exascale, a new era of computing, is knocking at the door. Leaving behind the days of high-frequency, single-core processors, the new paradigm of multicore/manycore processors in complex heterogeneous systems dominates today’s HPC landscape. With the advent of accelerators and special-purpose processors alongside general processors, the role of high performance computing (HPC) runtime systems has […]
Jul, 11

Block Conjugate Gradient Solver in OpenCL

The conjugate gradient method for solving certain systems of linear equations is widely used due to its iterative nature and fast convergence. Its boiled-down algorithm contains simple matrix and vector operations, which can be done in parallel with potential for great speedup. With the advent of GPGPU computing and accompanying programming models like OpenCL, […]
Jul, 11

Opening the Black Box: Performance Estimation during Code Generation for GPUs

Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. To cover the huge search space, code generation frameworks may apply time-intensive autotuning, exploit scenario-specific performance models, or treat performance […]
Jul, 4

A Sorting Library for FPGA Implementation in OpenCL Programming

In this study, we focus on data sorting, which is a basic arithmetic operation, and we present a sorting library that can be used with the OpenCL programming model for field-programmable gate arrays (FPGAs). Our sorting library is built by combining three hardware sorting algorithms. It consumes more than twice the overall hardware resources compared […]
Jul, 4

Productivity, Portability, Performance: Data-Centric Python

Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. […]
Jul, 4

Optimization of Heterogeneous Parallel Computing Systems using Machine Learning

Background: Heterogeneous parallel computing systems combine different resources (CPUs and GPUs) to achieve high performance and reduced latency and energy consumption. Programming applications that target various processing units requires employing different tools and programming models/languages. Furthermore, selecting the optimal implementation, which may either target different processing units (i.e. CPU or GPU) […]
Jul, 4

Object Detection Based Handwriting Localization

We present an object detection based approach to localize handwritten regions in documents, which initially aims to enhance anonymization during data transmission. The concatenated fusion of original and preprocessed images containing both printed texts and handwritten notes or signatures is fed into the convolutional neural network, where the bounding boxes are learned to […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
