high performance computing on graphics processing units: hgpu.org

Posts

Jun, 9

A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

Convolutional neural networks (CNNs) are widely used in many computer vision applications. Previous FPGA implementations of CNNs are mainly based on the conventional convolution algorithm. However, the high arithmetic complexity of the conventional convolution algorithm restricts the performance of accelerators and significantly increases the difficulty of design. It has been proved that the Winograd […]

Jun, 9

Fast Locality Sensitive Hashing for Beam Search on GPU

We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to speed up beam search for sequence models. We utilize the winner-take-all (WTA) hash, which is based on the relative ranking order of hidden dimensions and is thus resilient to perturbations in numerical values. Our algorithm is designed by fully considering the underlying architecture of CUDA-enabled GPUs (Algorithm/Architecture […]

CUDA
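The rank-based property mentioned in the abstract above can be illustrated with a minimal sketch of a winner-take-all hash. This is a generic, serial, pure-Python illustration and not the paper's GPU implementation; the function names `make_permutations` and `wta_hash`, the `window` parameter, and the fixed seed are assumptions made for the example.

```python
import random


def make_permutations(dim, num_hashes, seed=0):
    """Draw `num_hashes` random permutations of the indices 0..dim-1 (hypothetical helper)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def wta_hash(vector, perms, window):
    """Return one code per permutation: the position of the largest value
    among the first `window` entries of the permuted vector."""
    codes = []
    for perm in perms:
        head = [vector[i] for i in perm[:window]]
        codes.append(max(range(window), key=head.__getitem__))
    return codes


if __name__ == "__main__":
    perms = make_permutations(dim=8, num_hashes=4)
    x = [0.1, 0.9, -0.3, 0.5, 0.7, 0.2, -1.0, 0.4]
    # Any order-preserving change of the values leaves the codes unchanged.
    y = [2.0 * v + 1.0 for v in x]
    print(wta_hash(x, perms, window=4) == wta_hash(y, perms, window=4))
```

Because each code depends only on which entry in the window is largest, two vectors with the same ordering of values receive identical codes, which is the sense in which the hash tolerates numerical perturbations.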

Jun, 9

Deep Fluids: A Generative Network for Parameterized Fluid Simulations

This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative model is able to accurately […]

Jun, 5

The Third International Workshop on GPU Computing and AI (GCA), 2018

The Third International Workshop on GPU Computing and AI (GCA), http://is-candar.org/GCA18/, to be held in conjunction with The Sixth International Symposium on Computing and Networking (CANDAR’18), Hida Takayama, Japan, November 27-30, 2018, http://is-candar.org/. [Introduction] Built for massive parallelism, General Purpose computing on Graphics Processing Units (GPGPU) has superseded high-performance CPU in several important […]

Jun, 5

The 5th International Conference on Power and Energy Systems Engineering (CPESE), 2018

Meeting time: September 19-21, 2018. Meeting place: Nagoya University, Japan. Keynote speakers: Prof. Tony C.Y. Chung – Fellow of IEEE, University of Saskatchewan, Canada; Prof. Hassan Bevrani – University of Kurdistan, Iran. Published by: All accepted papers, after proper registration and presentation, will be published in the CPESE 2018 conference proceedings. Important dates: Paper Submission: […]

Jun, 5

The 10th International Conference on Information Management and Engineering (ICIME), 2018

Meeting time: September 22-24, 2018. Meeting place: MediaCityUK, Salford Quays, Greater Manchester, England. Keynote speakers: Prof. Sunil Vadera – University of Salford, UK; Prof. Marat Akhmet – Middle East Technical University, Turkey. Published by: All registered and presented papers will be published in the International Conference Proceedings Series by ACM, which will be archived in […]

Jun, 5

The 4th International Conference on Control Science and Systems Engineering (ICCSSE), 2018

Meeting time: August 21-23, 2018. Meeting place: Huazhong University of Science and Technology, No. 1037, Luoyu Road, Hongshan District, Wuhan, China. Published by: Selected and registered papers will be published by IEEE Conference Publication. After a careful reviewing process, all accepted papers, after proper registration and presentation, will be published in the conference […]

Jun, 5

The 2018 International Conference on Cloud Computing and Internet of Things (CCIOT’18), 2018

Meeting time: October 29-31, 2018. Meeting place: Nanyang Executive Centre, Nanyang Technological University, Singapore. Host unit: ACM Singapore Chapter. Keynote speakers: Prof. Latif Ladid, University of Luxembourg, Luxembourg; Prof. Dimitrios Georgakopoulos, Swinburne University of Technology, Australia. Published by: Accepted papers will be published in the conference proceedings, which are indexed by EI Compendex, Scopus, Thomson […]

Jun, 2

clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Alternating least squares (ALS) has been proved to be an effective solver for matrix factorization in recommender systems. To speed up factorizing performance, various parallel ALS solvers have been proposed to leverage modern multi-cores and many-cores. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS […]

OpenCL
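For readers unfamiliar with the alternating least squares idea named in the abstract above, here is a minimal sketch on a small dense matrix with a single latent factor. It is a generic textbook-style illustration and not the clMF solver: it is serial, uses no OpenCL, and assumes every rating is observed. The function name `als_rank1` and the parameters `iterations` and `reg` are assumptions made for the example.

```python
def als_rank1(ratings, iterations=20, reg=0.0):
    """Approximate a dense matrix as the outer product u * v^T.

    Each half-step fixes one factor and solves the resulting least squares
    problem for the other in closed form. `reg` is an optional L2 penalty
    (hypothetical parameter); with reg=0.0 the matrix must be non-zero.
    """
    rows, cols = len(ratings), len(ratings[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iterations):
        # Fix v, solve for each u_i: u_i = sum_j r_ij v_j / (sum_j v_j^2 + reg).
        denom = sum(x * x for x in v) + reg
        u = [sum(ratings[i][j] * v[j] for j in range(cols)) / denom
             for i in range(rows)]
        # Fix u, solve for each v_j by the symmetric formula.
        denom = sum(x * x for x in u) + reg
        v = [sum(ratings[i][j] * u[i] for i in range(rows)) / denom
             for j in range(cols)]
    return u, v


if __name__ == "__main__":
    # A matrix that is exactly rank one, so the fit should be essentially exact.
    a, b = [1.0, 2.0, 3.0], [4.0, 5.0]
    ratings = [[ai * bj for bj in b] for ai in a]
    u, v = als_rank1(ratings)
    print(max(abs(u[i] * v[j] - ratings[i][j])
              for i in range(3) for j in range(2)))
```

The per-row and per-column updates are independent of one another within each half-step, which is the structure parallel ALS solvers generally exploit.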

Jun, 2

Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL

CPUs have insufficient resources for efficient computation of Convolutional Neural Networks (CNNs), especially in embedded applications. Therefore, heterogeneous computing platforms such as GPUs, FPGAs and ASICs are widely used to accelerate CNN tasks. Among these, FPGA can accelerate the computation by mapping the algorithm to the parallel hardware instead of CPU, which […]

OpenCL

Jun, 2

NengoDL: Combining deep learning and neuromorphic modelling methods

NengoDL is a software framework designed to combine the strengths of neuromorphic modelling and deep learning. NengoDL allows users to construct biologically detailed neural models, intermix those models with deep learning elements (such as convolutional networks), and then efficiently simulate those models in an easy-to-use, unified framework. In addition, NengoDL allows users to apply deep […]

OpenCL

Jun, 2

Marian: Cost-effective High-Quality Neural Machine Translation in C++

This paper describes the submissions of the "Marian" team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we […]

CUDA

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?
