20723

Posts

Apr, 26

GEVO: GPU Code Optimization using Evolutionary Computation

GPUs are a key enabler of the revolution in machine learning and high performance computing, functioning as de facto co-processors to accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have also become accessible to programmers, who may lack detailed knowledge of the underlying architecture and fail to fully leverage the […]
Apr, 26

Evaluating FPGA Accelerator Performance with a Parameterized OpenCL Adaptation of the HPCChallenge Benchmark Suite

FPGAs have found increasing adoption in data center applications since a new generation of high-level tools have become available which noticeably reduce development time for FPGA accelerators and still provide high quality of results. There is however no high-level benchmark suite available which specifically enables a comparison of FPGA architectures, programming tools and libraries for […]
Apr, 26

Cpp-Taskflow: A General-purpose Parallel and Heterogeneous Task Programming System at Scale

The Cpp-Taskflow project addresses the long-standing question: How can we make it easier for developers to write parallel and heterogeneous programs with high performance and simultaneous high productivity? Cpp-Taskflow develops a simple and powerful task programming model to enable efficient implementations of heterogeneous decomposition strategies. Our programming model empowers users with both static and dynamic […]
Apr, 19

OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework

Object detection is a technology that deals with recognizing classes of objects and their location. It is used in many different areas, such as in face-detecting systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. These days, deep learning object detection approaches have achieved significantly better […]
Apr, 19

Design Space Exploration of an OpenCL Based SAXPY Kernel Implementation on FPGAs

High-performance computing researchers are trying to find new options, tools to satisfy the performance criteria of a hardware design. FPGA (Field Programmable Gate Array) is one of the accelerators which is widely used for power-efficient applications due to its reconfigurability and high performance. Traditionally FPGA can be programmed using Hardware Description Language (HDL). Using HDL, […]
Apr, 19

FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System

Tensor computation plays a paramount role in a broad range of domains, including machine learning, data analytics, and scientific computing. The wide adoption of tensor computation and its huge computation cost has led to high demand for flexible, portable, and high-performance library implementation on heterogeneous hardware accelerators such as GPUs and FPGAs. However, the current […]
Apr, 19

Deep-Edge: An Efficient Framework for Deep Learning Model Update on Heterogeneous Edge

Deep Learning (DL) model-based AI services are increasingly offered in a variety of predictive analytics services such as computer vision, natural language processing, speech recognition. However, the quality of the DL models can degrade over time due to changes in the input data distribution, thereby requiring periodic model updates. Although cloud data-centers can meet the […]
Apr, 19

A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs

GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronizations at different levels of granularity in a single GPU. Additionally, the emergence of dense GPU nodes also calls for multi-GPU synchronization. Nvidia’s latest CUDA provides a variety of synchronization methods. Until now, there is no full understanding of the characteristics of […]
Apr, 12

MNN: A Universal and Efficient Inference Engine

Deploying deep learning models on mobile devices draws more and more attention recently. However, designing an efficient inference engine on devices is under the great challenges of model compatibility, device diversity, and resource limitation. To deal with these challenges, we propose Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications. […]
Apr, 12

Open Source Face Recognition API

Face recognition applications are widely used today for a variety of tasks, whether personal or professional. When looking for a service that provides face detection and classification, it is easy to find several solutions. In this project another way is described so that it is possible to perform this task according to the desired needs […]
Apr, 12

Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based SpMV Implementation on an FPGA

Hardware designers use High-Level Synthesis (HLS) tools in order to reduce the design time and design complexity. OpenCL is a framework that uses HLS tools and permits the programmer to write standardized C-like code for the host as well as for the hardware accelerators. Using OpenCL, a program can be written using different memory access […]
Apr, 12

Neural Architecture Search for Lightweight Non-Local Networks

Non-Local (NL) blocks have been widely studied in various vision tasks. However, it has been rarely explored to embed the NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally have heavy computation cost which makes it difficult to be applied in applications where computational resources are limited, and […]

Recent source codes

* * *

* * *

HGPU group © 2010-2020 hgpu.org

All rights belong to the respective authors

Contact us: