
Posts

Jun, 5

The 4th International Conference on Control Science and Systems Engineering (ICCSSE), 2018

Meeting time: August 21-23, 2018. Meeting place: Huazhong University of Science and Technology, No. 1037, Luoyu Road, Hongshan District, Wuhan, China. Published by: Selected and registered papers will be published by IEEE Conference Publication. After a careful reviewing process, all accepted papers, after proper registration and presentation, will be published in the conference […]
Jun, 5

The 2018 International Conference on Cloud Computing and Internet of Things (CCIOT’18), 2018

Meeting time: October 29-31, 2018. Meeting place: Nanyang Executive Centre at Nanyang Technological University, Singapore. Host unit: ACM Singapore Chapter. Keynote speakers: Prof. Latif Ladid, University of Luxembourg, Luxembourg; Prof. Dimitrios Georgakopoulos, Swinburne University of Technology, Australia. Published by: Accepted papers will be published in the conference proceedings, which are indexed by EI Compendex, Scopus, Thomson […]
Jun, 2

clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core and many-core processors. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS […]
Jun, 2

Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL

CPUs have insufficient resources for efficient computation of Convolutional Neural Networks (CNNs), especially in embedded applications. Therefore, heterogeneous computing platforms such as GPUs, FPGAs and ASICs are widely used to accelerate CNN tasks. Among these, FPGAs can accelerate the computation by mapping the algorithm to parallel hardware instead of the CPU, which […]
Jun, 2

NengoDL: Combining deep learning and neuromorphic modelling methods

NengoDL is a software framework designed to combine the strengths of neuromorphic modelling and deep learning. NengoDL allows users to construct biologically detailed neural models, intermix those models with deep learning elements (such as convolutional networks), and then efficiently simulate those models in an easy-to-use, unified framework. In addition, NengoDL allows users to apply deep […]
Jun, 2

Marian: Cost-effective High-Quality Neural Machine Translation in C++

This paper describes the submissions of the "Marian" team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we […]
Jun, 2

FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL

The Square Kilometre Array (SKA) project will be the world's largest radio telescope array. With its large number of antennas, the number of signals that need to be processed is enormous. One important element of the SKA’s Central Signal Processor package is pulsar search. This paper focuses on the FPGA-based acceleration of the Frequency-Domain Acceleration […]
May, 26

OpenCL 2.2 API Specification

Modern processor architectures have embraced parallelism as an important pathway to increased performance. Facing technical challenges with higher clock speeds in a fixed power envelope, Central Processing Units (CPUs) now improve performance by adding multiple cores. Graphics Processing Units (GPUs) have also evolved from fixed-function rendering devices into programmable parallel processors. As today's computer […]
May, 26

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learning systems. However, existing systems rely on manually optimized libraries such as cuDNN where only a narrow range of server class GPUs are […]
May, 26

Transformations of High-Level Synthesis Codes for High-Performance Computing

Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to […]
May, 26

One machine, one minute, three billion tetrahedra

This paper presents a new scalable parallelization scheme to generate the 3D Delaunay triangulation of a given set of points. Our first contribution is an efficient serial implementation of the incremental Delaunay insertion algorithm. A simple dedicated data structure and a number of improvements in the insertion algorithm have yielded an acceleration by a factor […]
May, 26

Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe

High throughput and low latency inference of deep neural networks are critical for the deployment of deep learning applications. This paper presents the efficient inference techniques of IntelCaffe, the first Intel optimized deep learning framework that supports efficient 8-bit low precision inference and model optimization techniques of convolutional neural networks on Intel Xeon Scalable Processors. […]
