
Posts

Jan, 16

Dopia: Online Parallelism Management for Integrated CPU/GPU Architectures

Recent desktop and mobile processors often integrate the CPU and GPU onto the same die. The limited memory bandwidth of these integrated architectures can negatively affect the performance of data-parallel workloads when all computational resources are active. The combination of active CPU and GPU cores that achieves maximum performance depends on a workload’s characteristics, making manual […]
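
The right degree of parallelism can be searched for at run time rather than fixed by hand. The sketch below is not Dopia's algorithm; it is a minimal, hypothetical C++ illustration of picking the best number of active CPU worker threads for a memory-bound kernel by timing short probe runs, the kind of online decision the abstract refers to.

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Memory-bound toy kernel: scale a chunk of a large array (hypothetical workload).
    static void scale_chunk(std::vector<float>& data, std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) data[i] *= 1.0001f;
    }

    // Run the kernel with a given number of worker threads and return elapsed seconds.
    static double run_with_threads(std::vector<float>& data, unsigned threads) {
        const auto t0 = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        const std::size_t chunk = data.size() / threads;
        for (unsigned t = 0; t < threads; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = (t + 1 == threads) ? data.size() : begin + chunk;
            pool.emplace_back(scale_chunk, std::ref(data), begin, end);
        }
        for (auto& th : pool) th.join();
        return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    }

    int main() {
        std::vector<float> data(1 << 24, 1.0f);
        unsigned best = 1;
        double best_time = 1e9;
        // Online probing: try increasing thread counts; memory-bound workloads often
        // stop scaling (or even slow down) once the memory bandwidth is saturated.
        for (unsigned t = 1; t <= std::thread::hardware_concurrency(); t *= 2) {
            double s = run_with_threads(data, t);
            std::printf("%u threads: %.3f s\n", t, s);
            if (s < best_time) { best_time = s; best = t; }
        }
        std::printf("best degree of parallelism for this workload: %u threads\n", best);
    }
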
Jan, 2

System-Level Optimization and Code Generation for Graphics Processors using a Domain-Specific Language

As graphics processing units (GPUs) are increasingly being used for general-purpose processing, efficient tooling for programming such parallel architectures becomes essential. Despite continuous efforts to improve the programmability of CUDA and OpenCL, they remain relatively low-level languages and require in-depth architectural knowledge to achieve high-performance implementations. Developers have to perform memory management manually to […]
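
As a concrete illustration of the manual memory management the abstract refers to, the sketch below spells out the explicit buffer allocation, host-to-device copies, kernel-argument setup, launch, and read-back that an OpenCL host program requires and that a higher-level DSL could generate automatically. It is an illustrative sketch, not code from the paper, and error handling is omitted for brevity.

    // Minimal OpenCL host program for a vector add, showing the explicit
    // memory-management steps a DSL could hide. Illustrative sketch only.
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    static const char* kSrc =
        "__kernel void vadd(__global const float* a, __global const float* b,\n"
        "                   __global float* c) {\n"
        "  size_t i = get_global_id(0);\n"
        "  c[i] = a[i] + b[i];\n"
        "}\n";

    int main() {
        const size_t n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

        // Boilerplate: pick the first platform/device and build the kernel.
        cl_platform_id platform; clGetPlatformIDs(1, &platform, nullptr);
        cl_device_id device; clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);
        cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, nullptr);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
        clBuildProgram(prog, 1, &device, "", nullptr, nullptr);
        cl_kernel vadd = clCreateKernel(prog, "vadd", nullptr);

        // Manual memory management: allocate device buffers and copy inputs over.
        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY, n * sizeof(float), nullptr, nullptr);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY, n * sizeof(float), nullptr, nullptr);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), nullptr, nullptr);
        clEnqueueWriteBuffer(q, da, CL_TRUE, 0, n * sizeof(float), a.data(), 0, nullptr, nullptr);
        clEnqueueWriteBuffer(q, db, CL_TRUE, 0, n * sizeof(float), b.data(), 0, nullptr, nullptr);

        // Bind arguments and launch one work-item per element.
        clSetKernelArg(vadd, 0, sizeof(cl_mem), &da);
        clSetKernelArg(vadd, 1, sizeof(cl_mem), &db);
        clSetKernelArg(vadd, 2, sizeof(cl_mem), &dc);
        size_t global = n;
        clEnqueueNDRangeKernel(q, vadd, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);

        // Copy the result back and release everything explicitly.
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, n * sizeof(float), c.data(), 0, nullptr, nullptr);
        std::printf("c[0] = %f\n", c[0]);
        clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
        clReleaseKernel(vadd); clReleaseProgram(prog);
        clReleaseCommandQueue(q); clReleaseContext(ctx);
    }
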
Dec, 12

CitiusSynapse: A Deep Learning Framework for Embedded Systems

As embedded systems, such as smartphones with limited resources, have become increasingly popular, active research has recently been conducted on performing on-device deep learning in such systems. Therefore, in this study, we propose a deep learning framework that is specialized for embedded systems with limited resources, the operation processing structure of which differs from that […]
Dec, 12

High performance computing on Android devices – a case study

High-performance computing on low-power devices can speed up calculations on processors that run at a lower clock rate than computers for which energy efficiency is not a concern. In this case study, different high-performance techniques for Android devices are compared, with a special focus on the use of the GPU. […]
Dec, 12

GPU backed Data Mining on Android Devices

Choosing an appropriate programming paradigm for high-performance computing on low-power devices can help speed up calculations. Many Android devices have an integrated GPU and, although not officially supported, the OpenCL framework can be used on Android to address these GPUs. OpenCL supports thread and data parallelism. Applications that use the […]
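
As a reminder of what OpenCL's data parallelism looks like, the snippet below shows a typical data-parallel kernel in which each work-item handles one element; the host-side setup is the same on Android as on desktop OpenCL. The distance computation here is a hypothetical data-mining building block, not the paper's code.

    // A data-parallel OpenCL kernel, stored as a C++ string for clBuildProgram.
    // Each work-item computes the squared Euclidean distance of one point to a
    // centroid -- a typical building block of data-mining kernels (hypothetical).
    static const char* kDistanceKernel = R"CLC(
    __kernel void sq_dist(__global const float* points,   // n * dim, row-major
                          __global const float* centroid, // dim
                          __global float* out,            // n
                          const int dim) {
        size_t i = get_global_id(0);        // one work-item per point
        float acc = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = points[i * dim + d] - centroid[d];
            acc += diff * diff;
        }
        out[i] = acc;
    }
    )CLC";
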
Nov, 14

Performance Optimisations for Heterogeneous Managed Runtime Systems

High demand for increased computational capabilities and power efficiency has led commodity devices to integrate diverse hardware resources. Desktops, laptops, and smartphones have embraced heterogeneity through multi-core Central Processing Units (CPUs), energy-efficient integrated Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), powerful discrete GPUs, and Tensor Processing Units (TPUs). To ease the programmability of […]
Oct, 17

Accelerating AutoDock VINA with GPUs

AutoDock VINA is one of the most widely used docking tools in the early stages of modern drug discovery. It uses a Monte-Carlo-based iterated search method and a multithreading parallelism scheme on multicore machines to improve docking accuracy and speed. However, virtual screening of huge compound databases is common in modern drug discovery, which puts forward a […]
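
The "Monte-Carlo based iterated search" mentioned above boils down to repeatedly perturbing a candidate solution and keeping improvements, with independent searches run in parallel. The sketch below is a generic, hypothetical C++ version of that idea using a toy scoring function and std::thread; it is not VINA's actual scoring function or search code.

    #include <algorithm>
    #include <cstdio>
    #include <random>
    #include <thread>
    #include <vector>

    // Toy "scoring function": lower is better (stands in for a docking score).
    static double score(const std::vector<double>& x) {
        double s = 0.0;
        for (double v : x) s += (v - 1.0) * (v - 1.0);
        return s;
    }

    // One Monte-Carlo iterated search: random start, then perturb-and-accept-if-better.
    static double mc_search(unsigned seed, int iters, std::vector<double>& best_x) {
        std::mt19937 rng(seed);
        std::uniform_real_distribution<double> init(-5.0, 5.0);
        std::normal_distribution<double> step(0.0, 0.1);
        std::vector<double> x(6);
        for (double& v : x) v = init(rng);
        double best = score(x);
        for (int i = 0; i < iters; ++i) {
            std::vector<double> cand = x;
            for (double& v : cand) v += step(rng);   // random perturbation
            double s = score(cand);
            if (s < best) { best = s; x = cand; }    // greedy acceptance
        }
        best_x = x;
        return best;
    }

    int main() {
        const unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
        std::vector<double> results(n_threads);
        std::vector<std::vector<double>> poses(n_threads);
        std::vector<std::thread> pool;
        // Multithreading scheme: independent searches from different random seeds.
        for (unsigned t = 0; t < n_threads; ++t)
            pool.emplace_back([&, t] { results[t] = mc_search(t + 1, 100000, poses[t]); });
        for (auto& th : pool) th.join();
        std::printf("best score: %f\n", *std::min_element(results.begin(), results.end()));
    }
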
Oct, 3

HLS Portability from Intel to Xilinx: A Case Study

Field-programmable gate arrays (FPGAs) are a hardware accelerator option that is growing in popularity. However, FPGAs are notoriously hard to program. To this end, high-level synthesis (HLS) tools have been developed to allow programmers to design hardware accelerators with FPGAs using familiar software languages. The two largest FPGA vendors, Intel and Xilinx, support both C/C++ […]
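
To make the portability question concrete, the sketch below shows roughly what C++ for HLS looks like: an ordinary function plus pragmas that steer how the tool maps loops onto hardware. The pragma spelling shown is Xilinx (Vitis HLS) style; Intel's HLS compiler expresses similar intent with its own pragma and attribute syntax, which is exactly the kind of difference a porting effort has to bridge. This is an illustrative sketch, not code from the case study.

    // A toy HLS kernel in C++: the loop pragma tells the synthesis tool how to
    // schedule the loop in hardware. Xilinx Vitis HLS pragma style shown here;
    // an Intel HLS build would need equivalent but differently spelled directives.
    extern "C" void vadd(const float* a, const float* b, float* c, int n) {
        for (int i = 0; i < n; ++i) {
    #pragma HLS PIPELINE II=1   // aim to start a new loop iteration every clock cycle
            c[i] = a[i] + b[i];
        }
    }
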
Oct, 3

Embedded Software Synthesis using Heterogeneous Dataflow Models

Dataflow process networks (DPNs) consist of statically defined process nodes with First-In-First-Out (FIFO) buffered point-to-point connections. DPNs are intrinsically data-driven, i.e., node actions are not synchronized with each other and may fire whenever sufficient input operands have arrived at a node. In this original form, DPNs are data-driven and therefore a suitable model of computation (MoC) […]
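
A minimal way to picture a DPN node: it reads tokens from FIFO-buffered input channels and fires as soon as enough operands are available, independent of any global schedule. The C++ sketch below models this with std::queue FIFOs and a single adder node; it is a toy illustration of the firing rule, not the synthesis approach of the paper.

    #include <cstdio>
    #include <queue>

    // FIFO-buffered point-to-point channel between two DPN nodes.
    template <typename T>
    using Fifo = std::queue<T>;

    // An adder node: fires whenever both input FIFOs hold at least one token,
    // consuming one token from each and producing one output token (data-driven,
    // with no synchronization against other nodes).
    static bool fire_adder(Fifo<int>& in_a, Fifo<int>& in_b, Fifo<int>& out) {
        if (in_a.empty() || in_b.empty()) return false;  // not enough operands yet
        int a = in_a.front(); in_a.pop();
        int b = in_b.front(); in_b.pop();
        out.push(a + b);
        return true;
    }

    int main() {
        Fifo<int> a, b, sum;
        a.push(1); a.push(2); a.push(3);
        b.push(10); b.push(20);             // deliberately fewer tokens on this input
        while (fire_adder(a, b, sum)) {}    // keep firing while operands are available
        while (!sum.empty()) { std::printf("%d\n", sum.front()); sum.pop(); }  // 11, 22
    }
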
Jul, 18

Accelerating Regular-Expression Matching on FPGAs with High-Level Synthesis

The importance of security infrastructures for high-throughput networks has rapidly grown as a result of expanding internet traffic and increasingly high-bandwidth connections. Intrusion-detection systems (IDSs), such as SNORT, rely upon rule sets designed to alert system administrators of malicious packets. Methods for deep-packet inspection, which often depend upon regular-expression searches, can be accelerated on programmable-logic […]
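
At its core, rule-based deep-packet inspection means running many regular expressions over every payload. The C++ sketch below, using std::regex and a few made-up rules, shows the software baseline that FPGA regex engines aim to accelerate; it is not SNORT's rule engine or the paper's hardware design.

    #include <cstdio>
    #include <regex>
    #include <string>
    #include <vector>

    // A few made-up inspection rules: pattern + alert message (illustrative only).
    struct Rule { std::regex pattern; const char* alert; };

    int main() {
        std::vector<Rule> rules = {
            { std::regex(R"(\.\./\.\./)"),              "possible path traversal" },
            { std::regex(R"(SELECT\s+.*\s+FROM)",
                         std::regex::icase),            "possible SQL injection" },
            { std::regex(R"(\x90{8,})"),                "long NOP sled" },
        };

        // Payloads stand in for reassembled packet contents.
        std::vector<std::string> payloads = {
            "GET /../../etc/passwd HTTP/1.1",
            "user=alice&query=select name from users",
            "hello world",
        };

        // Deep-packet inspection baseline: match every rule against every payload.
        for (const auto& p : payloads)
            for (const auto& r : rules)
                if (std::regex_search(p, r.pattern))
                    std::printf("ALERT: %s in \"%s\"\n", r.alert, p.c_str());
    }
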
Jul, 18

Optimisation and GPU code generation of Stencils for Futhark

Stencil computations are common in scientific computing. Exploiting parallelism is central to achieving faster execution times for stencils that run over large amounts of data, which makes them well suited to a GPGPU setting. However, programming stencils to run on massively parallel hardware […]
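
For readers unfamiliar with the pattern: a stencil updates every element of an array from a fixed neighbourhood of surrounding elements, so all outputs of one sweep can be computed independently, which is what makes GPUs a good fit. The sketch below is a plain sequential C++ version of a 1D 3-point stencil; the paper's implementations are written in Futhark, not C++.

    #include <cstdio>
    #include <vector>

    // One sweep of a 1D 3-point stencil: each output depends only on a fixed
    // neighbourhood of the input, so every iteration of the loop is independent
    // and could be mapped to one GPU thread per element.
    static void stencil_step(const std::vector<float>& in, std::vector<float>& out) {
        const std::size_t n = in.size();
        for (std::size_t i = 1; i + 1 < n; ++i)
            out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
        out[0] = in[0];            // keep the boundary values fixed
        out[n - 1] = in[n - 1];
    }

    int main() {
        std::vector<float> a(16, 0.0f), b(16, 0.0f);
        a[8] = 1.0f;                               // a single "hot" cell
        for (int step = 0; step < 4; ++step) {     // repeated sweeps diffuse it outwards
            stencil_step(a, b);
            std::swap(a, b);
        }
        for (float v : a) std::printf("%.3f ", v);
        std::printf("\n");
    }
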
Jul, 18

GPTPU: Accelerating Applications using Edge Tensor Processing Units

Neural network (NN) accelerators have been integrated into a wide spectrum of computer systems to accommodate the rapidly growing demands for artificial intelligence (AI) and machine learning (ML) applications. NN accelerators share the idea of providing native hardware support for operations on multidimensional tensor data. Therefore, NN accelerators are theoretically tensor processors that can improve system […]
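
The "operations on multidimensional tensor data" that such accelerators implement natively are, at their simplest, dense matrix multiplications like the plain C++ loop nest below; an edge TPU executes the same mathematical operation on (typically quantized) tensors in hardware. The code is a reference illustration, not GPTPU's programming interface.

    #include <cstdio>
    #include <vector>

    // Reference dense matrix multiply C = A * B (row-major), the canonical tensor
    // operation that NN accelerators such as edge TPUs support natively in hardware.
    static void matmul(const std::vector<float>& A, const std::vector<float>& B,
                       std::vector<float>& C, int m, int k, int n) {
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p)
                    acc += A[i * k + p] * B[p * n + j];
                C[i * n + j] = acc;
            }
    }

    int main() {
        const int m = 2, k = 3, n = 2;
        std::vector<float> A = {1, 2, 3, 4, 5, 6};        // 2x3
        std::vector<float> B = {7, 8, 9, 10, 11, 12};     // 3x2
        std::vector<float> C(m * n, 0.0f);
        matmul(A, B, C, m, k, n);
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < n; ++j) std::printf("%6.1f ", C[i * n + j]);
            std::printf("\n");
        }
    }
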

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
