
Posts

Jun, 30

International Conference on Wireless Networks and Embedded Systems (ICWNES’20), 2020

★ 2020 International Conference on Wireless Networks and Embedded Systems (ICWNES 2020) | Ei Compendex & Scopus | Call for papers | December 14-16, 2020 | Bangkok, Thailand ★ Researchers, scientists, engineers and industry professionals will join together this year at ICWNES 2020, where the latest research will be unveiled and groundbreaking research projects will be presented. The field […]
Jun, 30

International Joint Conference on Signals, Systems and Computers (CSSC’20), 2020

★ 2020 International Joint Conference on Signals, Systems and Computers (CSSC 2020) | Ei Compendex & Scopus | Call for papers | December 14-16, 2020 | Bangkok, Thailand ★ 2020 International Joint Conference on Signals, Systems and Computers (CSSC 2020) will be held in Bangkok, Thailand. From keynote lectures by internationally recognized academics and leading experts, to a forum […]
Jun, 28

Performance benchmarking of deep learning framework on Intel Xeon Phi

With the success of deep learning (DL) methods in diverse application domains, several deep learning software frameworks have been proposed to facilitate the usage of these methods. By knowing the frameworks which are employed in big data analysis, the analysis process will be more efficient in terms of time and accuracy. Thus, benchmarking DL software […]
Jun, 28

GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices

Matrix-free solvers for finite element method (FEM) avoid assembly of elemental matrices and replace sparse matrix-vector multiplication required in iterative solution method by an element level dense matrix-vector product. In this paper, a novel matrix-free strategy for FEM is proposed which computes element level matrix-vector product by using only the symmetric part of the elemental […]
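
To make the matrix-free idea in this abstract concrete, here is a minimal sketch, assuming a 1D mesh of linear elements with uniform length `h`. It shows the generic gather, element-level dense product, and scatter-add pattern, checked against an assembled reference. It does not reproduce the paper's symmetry-exploiting strategy or its GPU implementation; all function names are hypothetical.

```python
def element_matrix(h):
    """2x2 stiffness matrix of a linear 1D element of length h (assumed model problem)."""
    k = 1.0 / h
    return [[k, -k], [-k, k]]


def matrix_free_matvec(num_elements, h, x):
    """Compute y = K x element by element, without ever assembling K."""
    ke = element_matrix(h)
    y = [0.0] * (num_elements + 1)
    for e in range(num_elements):
        nodes = (e, e + 1)              # element connectivity
        xe = [x[n] for n in nodes]      # gather local values
        for i in range(2):
            # element-level dense matrix-vector product, then scatter-add
            y[nodes[i]] += ke[i][0] * xe[0] + ke[i][1] * xe[1]
    return y


def assembled_matvec(num_elements, h, x):
    """Reference path: assemble the global matrix densely, then multiply."""
    n = num_elements + 1
    ke = element_matrix(h)
    K = [[0.0] * n for _ in range(n)]
    for e in range(num_elements):
        nodes = (e, e + 1)
        for i in range(2):
            for j in range(2):
                K[nodes[i]][nodes[j]] += ke[i][j]
    return [sum(K[r][c] * x[c] for c in range(n)) for r in range(n)]
```

Both functions return the same vector; the matrix-free version only ever stores the small element matrix, which is the memory saving that motivates the approach.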
Jun, 28

Sparse GPU Kernels for Deep Learning

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications have relatively moderate levels of sparsity that are not sufficient for existing sparse kernels to outperform their dense counterparts. In this […]
Jun, 28

Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP

With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as […]
Jun, 28

Automatic Kernel Generation for Volta Tensor Cores

A commonly occurring computation idiom in neural networks is to perform some pointwise operations on the result of a matrix multiplication. Such a sequence of operations is typically represented as a computation graph in deep learning compilers. When compiling to a GPU target, these computations can be individually mapped to manually tuned implementations provided by […]
Jun, 21

Autotuning for Automatic Parallelization on Heterogeneous Systems

To meet the surging demand for high-speed computation in an era of stagnating increase in performance per processor, systems designers resort to aggregating many and even heterogeneous processors into single systems. Automatic parallelization tools relieve application developers of the tedious and error-prone task of programming these heterogeneous systems. For these tools, there are two […]
Jun, 21

FPGA Based Satisfiability Checking

The Boolean satisfiability problem, abbreviated as SAT, is the backbone of many applications in VLSI design automation and verification. Over the years, many SAT solvers, both complete and incomplete, have been developed. Complete solvers are usually based on the DPLL (Davis–Putnam–Logemann–Loveland) algorithm, which is a backtracking algorithm. Industrial strength problems are very large and make […]
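
For readers unfamiliar with DPLL, the backtracking scheme mentioned in this abstract can be sketched in software as follows. This is a minimal textbook-style version, assuming clauses are given as lists of nonzero integers in the DIMACS convention (`3` means x3, `-3` means not x3). It is not the FPGA design described in the post, and the function name `dpll` is chosen only for illustration.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying (possibly partial) assignment {var: bool}, or None if unsatisfiable."""
    if assignment is None:
        assignment = {}

    # Simplify the formula under the current assignment.
    simplified = []
    for clause in clauses:
        remaining = []
        satisfied = False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if assignment[var] == (lit > 0):
                    satisfied = True
                    break
            else:
                remaining.append(lit)
        if satisfied:
            continue
        if not remaining:
            return None  # every literal is false: conflict, backtrack
        simplified.append(remaining)

    if not simplified:
        return assignment  # all clauses satisfied

    # Unit propagation: a one-literal clause forces its variable.
    for clause in simplified:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})

    # Branch on a variable, trying True first and then False.
    var = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

Variables missing from the returned dictionary are unconstrained and may take either value. Practical solvers add clause learning and branching heuristics that this sketch omits.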
Jun, 21

Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS

The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we […]
Jun, 21

Unsupervised Deep Learning of Incompressible Fluid Dynamics

Fast and stable fluid simulations are an essential prerequisite for applications ranging from computer aided aerodynamic design of automobiles or airplanes to simulations of physical effects in CGI to research in meteorology. Recent differentiable fluid simulations allow gradient based methods to optimize e.g. fluid control systems in an informed manner. Solving the partial differential equations […]
Jun, 21

Ansor: Generating High-Performance Tensor Programs for Deep Learning

High-performance tensor programs are crucial to guarantee efficient execution of deep learning models. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously difficult. Currently, deep learning systems rely on vendor-provided kernel libraries or various search strategies to get performant tensor programs. These approaches either require significant engineering efforts in developing […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
