Posts

Jul, 18

OpenCL FPGA Optimization guided by memory accesses and roofline model analysis applied to tomography acceleration

Backward projection is one of the most time-consuming steps in method-based iterative-reconstruction computed tomography. The memory access pattern of 3D backprojection is potentially regular enough to efficiently exploit the computational power of acceleration boards based on GPUs or FPGAs. High-level tools such as HLS or OpenCL make it easier to take such particular memory accesses into account during the design […]
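
As a rough illustration of the memory access pattern described above, the following NumPy sketch accumulates parallel-beam projections into a 2D image. It is illustrative only: the geometry, filtering, and the paper's actual OpenCL FPGA kernels are not reproduced, and all names and data are made up.

    import numpy as np

    def backproject(sinogram, angles, n):
        """Accumulate parallel-beam projections into an n x n image."""
        image = np.zeros((n, n))
        xs = np.arange(n) - n / 2.0
        ys = np.arange(n) - n / 2.0
        X, Y = np.meshgrid(xs, ys)
        for proj, theta in zip(sinogram, angles):
            # Detector coordinate of every pixel for this view: the regular,
            # view-dependent gather the abstract refers to.
            t = X * np.cos(theta) + Y * np.sin(theta) + n / 2.0
            idx = np.clip(t.astype(int), 0, proj.size - 1)
            image += proj[idx]
        return image

    # Hypothetical usage with random data: 180 views, 128 detector bins.
    sino = np.random.rand(180, 128)
    img = backproject(sino, np.deg2rad(np.arange(180)), 128)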
Jul, 18

Accelerating Regular-Expression Matching on FPGAs with High-Level Synthesis

The importance of security infrastructures for high-throughput networks has rapidly grown as a result of expanding internet traffic and increasingly high-bandwidth connections. Intrusion-detection systems (IDSs), such as SNORT, rely upon rule sets designed to alert system administrators to malicious packets. Methods for deep-packet inspection, which often depend upon regular-expression searches, can be accelerated on programmable-logic […]
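
For readers unfamiliar with the software side of such rule sets, here is a minimal Python sketch of regular-expression payload scanning. The rule names and payload are invented, and this is a plain CPU baseline, not the HLS/FPGA design the paper describes.

    import re

    # Hypothetical rule set: names and patterns are made up for illustration.
    rules = {
        "suspicious-uri": re.compile(rb"/etc/passwd"),
        "nop-sled": re.compile(rb"\x90{16,}"),
    }

    def scan(payload: bytes):
        """Return the names of all rules whose pattern occurs in the payload."""
        return [name for name, pattern in rules.items() if pattern.search(payload)]

    print(scan(b"GET /etc/passwd HTTP/1.1"))   # ['suspicious-uri']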
Jul, 18

Designing a high-performance boundary element library with OpenCL and Numba

The Bempp boundary element library is a well-known library for the simulation of a range of electrostatic, acoustic and electromagnetic problems in homogeneous bounded and unbounded domains. It originally started as a traditional C++ library with a Python interface. Over the last two years we have completely redesigned Bempp as a native Python library, […]
Jul, 18

Optimisation and GPU code generation of Stencils for Futhark

Stencils are a common computational pattern in scientific computing. Exploiting parallelism is central to achieving faster execution times for stencils that operate on large amounts of data; for this reason, stencils are well suited to a GPGPU setting. However, programming stencils to run on massively parallel hardware […]
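
As a small reminder of what a stencil computation looks like, the sketch below performs a five-point Jacobi sweep in NumPy. Futhark's generated GPU code is structured very differently; the grid size and number of sweeps here are arbitrary.

    import numpy as np

    def jacobi_step(grid):
        """One sweep: each interior cell becomes the mean of its four neighbours."""
        out = grid.copy()
        out[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                                  grid[1:-1, :-2] + grid[1:-1, 2:])
        return out

    g = np.random.rand(256, 256)
    for _ in range(10):          # ten relaxation sweeps over the whole grid
        g = jacobi_step(g)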
Jul, 18

GPTPU: Accelerating Applications using Edge Tensor Processing Units

Neural network (NN) accelerators have been integrated into a wide spectrum of computer systems to accommodate the rapidly growing demands for artificial intelligence (AI) and machine learning (ML) applications. NN accelerators share the idea of providing native hardware support for operations on multidimensional tensor data. Therefore, NN accelerators are theoretically tensor processors that can improve system […]
Jul, 11

Bringing OpenCL to Commodity RISC-V CPUs

The importance of open-source hardware has been increasing in recent years with the introduction of the RISC-V Open ISA. This has also accelerated the push for support of the open-source software stack from compiler tools to full-blown operating systems. Parallel computing with today’s Application Programming Interfaces such as OpenCL has proven to be effective at […]
Jul, 11

KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks

Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC’s larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific […]
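
To make the memory-footprint trade-off concrete, the schematic NumPy contrast below compares a first-order SGD step with a generic preconditioned update that stores a full curvature matrix. This is not K-FAC or KAISA; it only illustrates, under made-up data, where the extra memory and computation come from.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    w = rng.random(n)                    # parameters
    grad = rng.random(n)                 # gradient from one mini-batch
    lr = 0.01

    # First-order SGD: only the gradient is needed.
    w_sgd = w - lr * grad

    # Second-order style update: an n x n curvature estimate must be stored
    # and solved against, which is where the larger footprint arises.
    M = rng.random((n, n))
    curvature = M @ M.T / n + np.eye(n)  # symmetric positive definite stand-in
    w_precond = w - lr * np.linalg.solve(curvature, grad)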
Jul, 11

Dynamic Adaptation Techniques and Opportunities to Improve HPC Runtimes

Exascale, a new era of computing, is knocking at the door. Leaving behind the days of high-frequency, single-core processors, the new paradigm of multicore/manycore processors in complex heterogeneous systems dominates today’s HPC landscape. With the advent of accelerators and special-purpose processors alongside general processors, the role of high performance computing (HPC) runtime systems has […]
Jul, 11

Block Conjugate Gradient Solver in OpenCL

The conjugate gradient method for solving certain systems of linear equations is widely used due to its iterative nature and fast convergence. At its core, the algorithm consists of simple matrix and vector operations that can be performed in parallel, with potential for significant speedup. With the advent of GPGPU computing and accompanying programming models like OpenCL, […]
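
For reference, a textbook single right-hand-side conjugate gradient iteration is sketched below in NumPy to show the matrix-vector and vector-vector operations the abstract refers to; the block variant and the paper's OpenCL kernels are not reproduced, and the test system is made up.

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
        x = np.zeros_like(b)
        r = b - A @ x                 # residual
        p = r.copy()                  # search direction
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p                # dominant, highly parallel mat-vec step
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    # Hypothetical symmetric positive definite test system.
    n = 200
    M = np.random.rand(n, n)
    A = M @ M.T + n * np.eye(n)
    b = np.random.rand(n)
    x = conjugate_gradient(A, b)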
Jul, 11

Opening the Black Box: Performance Estimation during Code Generation for GPUs

Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. To cover the huge search space, code generation frameworks may apply time-intensive autotuning, exploit scenario-specific performance models, or treat performance […]
Jul, 4

A Sorting Library for FPGA Implementation in OpenCL Programming

In this study, we focus on data sorting, which is a fundamental operation, and we present a sorting library that can be used with the OpenCL programming model for field-programmable gate arrays (FPGAs). Our sorting library is built by combining three hardware sorting algorithms. It consumes more than twice the overall hardware resources compared […]
Jul, 4

Productivity, Portability, Performance: Data-Centric Python

Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. […]
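
As a tiny example of the productivity the abstract mentions, the snippet below contrasts a vectorised NumPy expression with an equivalent pure-Python loop. This is generic NumPy usage, not the paper's data-centric toolchain, and the arrays are arbitrary.

    import numpy as np

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)

    # Vectorised expression: dispatched to optimised native code by NumPy.
    c = 2.0 * a + b

    # Equivalent, but far slower, pure-Python loop.
    c_loop = np.empty_like(a)
    for i in range(a.size):
        c_loop[i] = 2.0 * a[i] + b[i]

    assert np.allclose(c, c_loop)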


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org