Posts

Jun, 22

The 4th International Conference on Communication and Information Processing (ICCIP), 2018

Keynote speakers: Prof. Jalel Ben-Othman, University of Paris 13, France; Prof. Herwig Unger, University of Hagen, Germany. Published by: Accepted papers of ICCIP 2018 will be published in the ICCIP 2018 Conference Proceedings, which will appear in the ACM International Conference Proceedings Series and be archived in the ACM Digital Library. The […]
Jun, 20

Energy Efficient Computing on Multi-core Processors: Vectorization and Compression Techniques

Over the past few years, energy consumption has become the main limiting factor for computing in general. This has led CPU vendors to aggressively promote parallel computing using multiple cores without significantly increasing the thermal design power of the processor. However, achieving maximum performance and energy efficiency from the available resources on the multi-core and […]
Jun, 20

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

Deep neural networks (DNNs) have demonstrated effectiveness in various applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on general-purpose processors, ASIC designs, or FPGA accelerators, all of which suffer from data movement overhead due to limited on-chip memory and data transfer bandwidth. […]
Jun, 20

Comparing Two Generations of Embedded GPUs Running a Feature Detection Algorithm

Graphics processing units (GPUs) in embedded mobile platforms are reaching performance levels where they may be useful for computer vision applications. We compare two generations of embedded GPUs for mobile devices when running a state-of-the-art feature detection algorithm, i.e., Harris-Hessian/FREAK. We compare architectural differences, execution time, temperature, and frequency on Sony Xperia Z3 and Sony […]
Jun, 20

AVX-512 extension to OpenQCD 1.6

We publish an extension of openQCD-1.6 with AVX-512 vector instructions using Intel intrinsics. Recent Intel processors support extended instruction sets with operations on 512-bit wide vectors, increasing both the capacity for floating point operations and register memory. Optimal use of the new capabilities requires reorganising data and floating point operations into these wider vector units. […]
Jun, 20

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language. However, none of the existing methods are sufficient to comprehend program […]
Jun, 17

Combining Multiple Optimised FPGA-based Pulsar Search Modules Using OpenCL

Field-Programmable Gate Arrays (FPGAs) are widely used as acceleration hardware in the central signal processing design of the Square Kilometre Array (SKA). The frequency domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for yet-to-be-finalised hardware, to support cross-discipline interoperability, and to achieve fast […]
Jun, 17

Dank Learning: Generating Memes Using Deep Neural Networks

We introduce a novel meme generation system which, given any image, can produce a humorous and relevant caption. Furthermore, the system can be conditioned not only on an image but also on a user-defined label relating to the meme template, giving the user control over meme content. The system uses a pretrained Inception-v3 network […]
Jun, 17

Neural scene representation and rendering

Scene representation – the process of converting visual sensory data into concise descriptions – is a requirement for intelligent behaviour. Recent work has shown that neural networks excel at this task when provided large labelled datasets. However, removing the reliance on human labelling remains an important open problem. To this end, we introduce the Generative […]
Jun, 17

Acceleration of k-Nearest Neighbor and SRAD Algorithms Using Intel FPGA SDK for OpenCL

Field-Programmable Gate Arrays (FPGAs) have been widely used for accelerating machine learning algorithms. However, the high design cost and time required to implement FPGA-based accelerators with traditional HDL-based design methodologies have discouraged users from designing FPGA-based accelerators. In recent years, a new CAD tool called Intel FPGA SDK for OpenCL (IFSO) has allowed fast and efficient […]
Jun, 17

NCRF++: An Open-source Neural Sequence Labeling Toolkit

This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building custom model structures through a configuration file, with flexible neural feature design and utilization. Built on PyTorch, the core operations […]
Jun, 13

Implementing general matrix-matrix multiplication algorithm on the Intel Xeon Phi Knights Landing Processor

This paper presents the design and implementation of a general matrix-matrix multiplication (GEMM) algorithm for the second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). We illustrate several development guidelines for achieving optimal performance with the C programming language and the Advanced Vector Extensions (AVX-512) instruction set. Further, we present several environment variable issues associated with […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors