19219

Posts

Dec, 8

GPU Computing with Python: Performance, Energy Efficiency and Usability

In this work, we examine the performance, energy efficiency and usability when using Python for developing HPC codes running on the GPU. We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between GPU generations; and between low-end, mid-range and high-end GPUs. Our findings show that the impact of using Python is […]
Dec, 1

FBLAS: Streaming Linear Algebra Kernels on FPGA

Reconfigurable hardware represents an attractive alternative to load-store architectures, as it allows eliminating expensive control and data movement overheads in computations. In practice, these devices are often not considered in the highperformance computing community, due to the steep learning curve and low productivity of hardware design, and the lack of available library support for fundamental […]
Dec, 1

Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

The efficient parallel execution of scientific applications is a key challenge in high-performance computing (HPC). With growing parallelism and heterogeneity of compute resources as well as increasingly complex software, performance analysis has become an indispensable tool in the development and optimization of parallel programs. It is a recurring task as HPC systems and their software […]
Dec, 1

FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things

The growing number of low-power smart devices in the Internet of Things is coupled with the concept of "Edge Computing", that is moving some of the intelligence, especially machine learning, towards the edge of the network. Enabling machine learning algorithms to run on resource-constrained hardware, typically on low-power smart devices, is challenging in terms of […]
Dec, 1

Titan: A Parallel Asynchronous Library for Multi-Agent and Soft-Body Robotics using NVIDIA CUDA

While most robotics simulation libraries are built for low-dimensional and intrinsically serial tasks, soft-body and multi-agent robotics have created a demand for simulation environments that can model many interacting bodies in parallel. Despite the increasing interest in these fields, no existing simulation library addresses the challenge of providing a unified, highly-parallelized, GPU-accelerated interface for simulating […]
Dec, 1

FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads

Performance optimization is the art of continuous seeking a harmonious mapping between the application domain and hardware. Recent years have witnessed a surge of deep learning (DL) applications in industry. Conventional wisdom for optimizing such workloads mainly focus on compute intensive ops (GEMM, Convolution, etc). Yet we show in this work, that the performance of […]
Nov, 24

Understanding the Performance of HPC Applications

High performance computing is an important asset to scientific research, enabling the study of phenomena such as nuclear physics or climate change, that are difficult or impossible to be studied in traditional experiments or allowing researchers to utilize large amounts of data from experiments such as the Large Hadron Collider. No matter the use of […]
Nov, 24

Benchmarking Deep Learning Models on Jetson TX2

In conclusion, the present work brings an overview of artificial intelligence and, mainly, deep learning fields with a focus on image recognition and the history behind the models and techniques present nowadays. Beyond that, we explored how embedded hardware work with the new scenarios that AI brings to the table and how companies are developing […]
Nov, 24

Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU

There is growing interest in graph mining algorithms such as motif counting. Generic graph mining systems have been developed to provide unified interfaces for programming these algorithms. However, existing systems take minutes or even hours to mine even simple patterns in moderate-sized graphs, which significantly limits their real-world usability. We present Pangolin, a high-performance and […]
Nov, 24

Hacking Neural Networks: A Short Introduction

A large chunk of research on the security issues of neural networks is focused on adversarial attacks. However, there exists a vast sea of simpler attacks one can perform both against and with neural networks. In this article, we give a quick introduction on how deep learning in security works and explore the basic methods […]
Nov, 24

FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10

Deep learning and Convolutional Neural Network (CNN) have becoming increasingly more popular and important in both academic and industrial areas in recent years cause they are able to provide better accuracy and result in classification, detection and recognition areas, compared to traditional approaches. Currently, there are many popular frameworks in the market for deep learning […]
Nov, 17

A Computing Kernel for Network Binarization on PyTorch

Deep Neural Networks have now achieved state-of-the-art results in a wide range of tasks including image classification, object detection and so on. However, they are both computation consuming and memory intensive, making them difficult to deploy on low-power devices. Network binarization is one of the existing effective techniques for model compression and acceleration, but there […]

Recent source codes

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org