Posts

Jun, 13

Assessment of various GPU acceleration strategies in text categorization processing flow

Automatic text categorization presents many difficulties. Modern algorithms are getting better at extracting meaningful information from human language. However, they often significantly increase the computational complexity. This increased demand for computational capability can be met by using hardware accelerators such as general-purpose graphics cards. In this paper we present a full processing flow […]
Jun, 13

Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction

Linear operators used in iterative methods like conjugate gradient have typically been implemented either as "matrix-driven" subroutines backed by explicit sparse or dense matrices, or as "matrix-free" subroutines that implement specific linear operations directly (e.g. FFTs). The matrix-driven approach is generally more portable because it can target widely available BLAS libraries, but it can be […]
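To make the matrix-driven versus matrix-free distinction concrete, here is a minimal Python sketch. It is not Indigo's API, which this excerpt does not show. A conjugate-gradient solver that only needs a function applying the operator is run twice: once with an explicit dense matrix and once with a matrix-free routine for the same 1D second-difference operator, tridiag(-1, 2, -1). All function names are hypothetical.

```python
from typing import Callable, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_A: Callable[[Vec], Vec], b: Vec,
                       tol: float = 1e-10, max_iter: int = 100) -> Vec:
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)      # residual b - A x for the initial guess x = 0
    p = list(r)      # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def matrix_driven(A: List[Vec]) -> Callable[[Vec], Vec]:
    """Matrix-driven: wrap an explicitly stored (dense) matrix as an operator."""
    return lambda v: [dot(row, v) for row in A]


def second_difference_free(v: Vec) -> Vec:
    """Matrix-free: apply tridiag(-1, 2, -1) directly, storing no matrix."""
    n = len(v)
    return [2 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
b = [1.0, 0.0, 0.0, 1.0]
x_dense = conjugate_gradient(matrix_driven(A), b)
x_free = conjugate_gradient(second_difference_free, b)
print(x_dense, x_free)  # both approximately [1.0, 1.0, 1.0, 1.0]
```

The solver is identical in both cases; only the operator changes, which is the portability versus specialization trade-off the abstract describes.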
Jun, 13

Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Writing mixed-precision kernels makes it possible to achieve higher throughput while keeping output precision within given limits. The recent introduction of native half-precision arithmetic in several GPUs, such as the NVIDIA P100 and AMD Vega 10, has made precision tuning even more relevant. However, it is not trivial to manually find which […]
Jun, 13

Efficient Large-scale Approximate Nearest Neighbor Search on OpenCL FPGA

We present a new method for Product Quantization (PQ) based approximate nearest neighbor (ANN) search in high-dimensional spaces. Specifically, we first propose a quantization scheme for the codebooks of the coarse quantizer, the product quantizer, and the rotation matrix, to reduce the cost of accessing these codebooks. Our approach also combines a highly parallel k-selection method, which […]
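For orientation, a hedged Python sketch of plain product-quantization search with per-query distance lookup tables. It omits the paper's coarse quantizer, rotation matrix, and codebook compression, uses hand-picked toy codebooks in place of k-means training, and uses `heapq.nsmallest` as a sequential stand-in for the parallel k-selection. Function names are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

Codebook = List[List[float]]  # centroids for one subspace


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v: Sequence[float], m: int) -> List[List[float]]:
    """Split a vector into m equal-length subvectors."""
    d = len(v) // m
    return [list(v[i * d:(i + 1) * d]) for i in range(m)]


def pq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Replace each subvector by the index of its nearest centroid."""
    subs = split(v, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(s, cb[c]))
            for s, cb in zip(subs, codebooks)]


def pq_search(query: Sequence[float], codes: List[List[int]],
              codebooks: List[Codebook], k: int) -> List[Tuple[float, int]]:
    """Return the k (approximate_distance, index) pairs closest to the query."""
    subs = split(query, len(codebooks))
    # tables[m][c] = squared distance from query subvector m to centroid c
    tables = [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]
    # Each database distance is now just M table lookups and additions.
    scores = ((sum(tables[m][c] for m, c in enumerate(code)), idx)
              for idx, code in enumerate(codes))
    return heapq.nsmallest(k, scores)


codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
database = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 0, 0], [0.1, 0, 0.9, 1]]
codes = [pq_encode(v, codebooks) for v in database]
print(codes)  # [[0, 0], [1, 1], [1, 0], [0, 1]]
print([idx for _, idx in pq_search([1, 1, 1, 0.9], codes, codebooks, k=2)])  # [1, 2]
```

Because every query reads the codebooks to build its tables, shrinking or quantizing those codebooks (as the abstract proposes) directly cuts memory traffic.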
Jun, 9

Optimizing Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures

Sparse matrix-vector multiplication (SpMV) is one of the most common operations in scientific and high-performance applications, and is often the application's performance bottleneck. While the sparse matrix representation has a significant impact on the resulting application performance, choosing the right representation typically relies on expert knowledge and trial and error. This paper […]
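As one concrete example of a sparse representation, here is a minimal Python sketch of the widely used CSR (compressed sparse row) format and an SpMV over it. The excerpt does not say which formats the paper evaluates, so CSR is an illustrative choice; function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(dense: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Build CSR arrays: nonzero values, their column indices, and row start offsets."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def spmv_csr(values: List[float], col_idx: List[int],
             row_ptr: List[int], x: Sequence[float]) -> List[float]:
    """Compute y = A x, touching only the stored nonzeros."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


A = [[4, 0, 0], [0, 0, 2], [1, 3, 0]]
csr = dense_to_csr(A)
print(csr)                        # ([4, 2, 1, 3], [0, 2, 0, 1], [0, 1, 2, 4])
print(spmv_csr(*csr, [1, 2, 3]))  # [4.0, 6.0, 7.0]
```

On parallel hardware a simple CSR kernel often assigns one thread per row, so very uneven row lengths lead to load imbalance; this is one commonly cited reason why the best format depends on the matrix and the target architecture.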
Jun, 9

GPU Virtualization and Scheduling Methods: A Comprehensive Survey

The integration of graphics processing units (GPUs) on high-end compute nodes has established a new accelerator-based heterogeneous computing model, which now permeates high performance computing. The same paradigm, however, has seen limited adoption in cloud computing and other large-scale distributed computing environments. Heterogeneous computing with GPUs can benefit the Cloud by reducing operational costs and improving […]
Jun, 9

A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

Convolutional neural networks (CNNs) are widely used in many computer vision applications. Previous FPGA implementations of CNNs are mainly based on the conventional convolution algorithm. However, the high arithmetic complexity of the conventional convolution algorithm restricts the performance of CNN accelerators and makes their design significantly more challenging. It has been proved that the Winograd […]
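The excerpt breaks off before describing the Winograd algorithm. For orientation, here is a minimal Python sketch of the standard 1D minimal-filtering case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The accelerator in the paper targets 2D CNN layers, so this is an illustration of the idea rather than its design; function names are hypothetical. As in CNN frameworks, "convolution" here means a sliding dot product without flipping the filter.

```python
from typing import List, Sequence


def direct_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Two outputs of a 3-tap sliding dot product: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Same two outputs with 4 data-dependent multiplications."""
    # Filter transform: depends only on the weights, so it can be precomputed once per filter.
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_alt = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]


def sliding_winograd(x: Sequence[float], g: Sequence[float]) -> List[float]:
    """Valid 3-tap filtering of x in overlapping 4-sample tiles (assumes len(x) - 2 is even)."""
    out: List[float] = []
    for i in range(0, len(x) - 2, 2):
        out.extend(winograd_f23(x[i:i + 4], g))
    return out


print(winograd_f23([1, 2, 3, 4], [1, 0, -1]))  # [-2.0, -2.0]
```

The 2D variant F(2×2, 3×3) used for CNN layers nests the same transforms along both axes, cutting multiplications per 2×2 output tile from 36 to 16.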
Jun, 9

Fast Locality Sensitive Hashing for Beam Search on GPU

We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to speed up beam search for sequence models. We utilize the winner-take-all (WTA) hash, which is based on the relative ranking order of hidden dimensions and is thus resilient to perturbations in numerical values. Our algorithm is designed by fully considering the underlying architecture of CUDA-enabled GPUs (Algorithm/Architecture […]
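As a rough illustration of the WTA hash itself, here is a minimal Python sketch of the usual construction: for each hash function, permute the vector's dimensions, look at the first K entries, and record the position of the largest. It omits the GPU kernel design and how codes are bucketed for beam search; names and parameters are hypothetical.

```python
import random
from typing import List, Sequence


def make_permutations(dim: int, num_hashes: int, seed: int = 0) -> List[List[int]]:
    """One random permutation of the dimension indices per hash function."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), dim) for _ in range(num_hashes)]


def wta_hash(vec: Sequence[float], perms: List[List[int]], k: int) -> List[int]:
    """For each permutation, the position (0..k-1) of the max among its first k entries."""
    codes = []
    for perm in perms:
        window = [vec[i] for i in perm[:k]]
        codes.append(max(range(k), key=window.__getitem__))
    return codes


perms = make_permutations(dim=8, num_hashes=6, seed=42)
h = [0.1, 0.5, -0.3, 0.9, 0.2, 0.0, 0.7, -0.8]
h_scaled = [3.0 * x + 1.0 for x in h]  # monotonic change of the values
print(wta_hash(h, perms, k=4) == wta_hash(h_scaled, perms, k=4))  # True
```

Since each code depends only on which entry is largest, any order-preserving change to the hidden values leaves the hash unchanged, which is the robustness property the abstract refers to.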
Jun, 9

Deep Fluids: A Generative Network for Parameterized Fluid Simulations

This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative model is able to accurately […]
Jun, 5

The Third International Workshop on GPU Computing and AI (GCA), 2018

The Third International Workshop on GPU Computing and AI (GCA), http://is-candar.org/GCA18/, to be held in conjunction with the Sixth International Symposium on Computing and Networking (CANDAR’18), Hida Takayama, Japan, November 27-30, 2018, http://is-candar.org/

[Introduction] Built for massive parallelism, general-purpose computing on graphics processing units (GPGPU) has superseded high-performance CPUs in several important […]
Jun, 5

The 5th International Conference on Power and Energy Systems Engineering (CPESE), 2018

Meeting time: September 19-21, 2018
Meeting place: Nagoya University, Japan
Keynote speakers: Prof. Tony C.Y. Chung (IEEE Fellow), University of Saskatchewan, Canada; Prof. Hassan Bevrani, University of Kurdistan, Iran
Publication: All accepted papers, after proper registration and presentation, will be published in the CPESE 2018 conference proceedings.
Important dates: Paper Submission: […]
Jun, 5

The 10th International Conference on Information Management and Engineering (ICIME), 2018

Meeting time: September 22-24, 2018
Meeting place: MediaCityUK, Salford Quays, Greater Manchester, England
Keynote speakers: Prof. Sunil Vadera, University of Salford, UK; Prof. Marat Akhmet, Middle East Technical University, Turkey
Publication: All registered and presented papers will be published in the International Conference Proceedings Series by ACM, which will be archived in […]


HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors
