Feb, 18

The 5th International Conference on Electrical and Electronics Engineering (ICEEE), 2018

We would like to invite you to contribute to and participate in 2018 5th International Conference on Electrical and Electronics Engineering (ICEEE 2018), which will be held in Istanbul, Turkey during May 3-5, 2018. ICEEE 2018 is a forum for presenting excellent results and new challenges facing the field of the reliability and availability of […]
Feb, 17

Evaluating High-Level Synthesis Techniques for Scalable Hardware-Accelerated Computing

Hardware acceleration is considered a powerful tool in parallel-computing, able to overcome the limitations imposed by sequential execution of software applications and, at the same time, provide energy-efficient alternatives to other parallel computing platforms such as GPUs. However, the increasing application complexity makes it unaffordable to map algorithms directly into HDL. Hence, High-Level Synthesis tools […]
Feb, 17

Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing

Modern partial differential equation (PDE) models across scientific disciplines require sophisticated numerical methods resulting in complex codes as well as large numbers of simulations for analysis like parameter studies and uncertainty quantification. To evaluate the behavior of the model for sufficeintly long times, for instance, to compare to laboratory time scales, often requires long-time simulations […]
Feb, 17

The performances of R GPU implementations of the GMRES method

Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on single-threaded model of computation, using only a small fraction of the computational power available now for most desktops and laptops. Modern statistical software packages rely on high performance implementations […]
Feb, 17

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

Efficient implementations of HPC applications for parallel architectures generally rely on external software packages (e.g., BLAS, LAPACK, CUDNN). While these libraries provide highly optimized routines for certain characteristics of inputs (e.g., square matrices), they generally do not retain optimal performance across the wide range of problems encountered in practice. In this paper, we present an […]
Feb, 17

Transforming and Optimizing Irregular Applications for Parallel Architectures

Parallel architectures, including multi-core processors, many-core processors, and multi-node systems, have become commonplace, as it is no longer feasible to improve single-core performance through increasing its operating clock frequency. Furthermore, to keep up with the exponentially growing desire for more and more computational power, the number of cores/nodes in parallel architectures has continued to dramatically […]
Feb, 15

A Survey of Techniques for Improving Security of Non-volatile Memories

Due to their high density and near-zero leakage power consumption, non-volatile memories (NVMs) are promising candidates for designing future memory systems. However, compared to conventional memories, NVMs also face more-severe security threats, e.g., the limited write endurance of NVMs makes them vulnerable to write-attacks. Also, the non-volatility of NVMs allows the data to persist even […]
Feb, 15

Accelerating Interpreted Programming Languages on GPUs with Just-In-Time Compilation and Runtime Optimisations

Nowadays, most computer systems are equipped with powerful parallel devices such as Graphics Processing Units (GPUs). They are present in almost every computer system including mobile devices, tablets, desktop computers and servers. These parallel systems have unlocked the possibility for many scientists and companies to process significant amounts of data in shorter time. But the […]
Feb, 15

TVM: End-to-End Optimization Stack for Deep Learning

Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current popularity and utility of deep learning. However, these frameworks are optimized for a narrow range of server-class GPUs and deploying workloads to other platforms such as mobile phones, embedded devices, and specialized accelerators (e.g., FPGAs, ASICs) requires laborious manual effort. We propose TVM, […]
Feb, 15

Improving Locality of Unstructured Mesh Algorithms on GPUs

To most efficiently utilize modern parallel architectures, the memory access patterns of algorithms must make heavy use of the cache architecture: successively accessed data must be close in memory (spatial locality) and one piece of data must be reused as many times as possible (temporal locality). In this work we analyse the performance of unstructured […]
Feb, 15

GPU Accelerated Finite Element Assembly with Runtime Compilation

In recent years, high performance scientific computing on graphics processing units (GPUs) have gained widespread acceptance. These devices are designed to offer massively parallel threads for running code with general purpose. There are many researches focus on finite element method with GPUs. However, most of the works are specific to certain problems and applications. Some […]
Feb, 10

Using Meta-heuristics and Machine Learning for Software Optimization of Parallel Computing Systems: A Systematic Literature Review

While the modern parallel computing systems offer high performance, utilizing these powerful computing resources to the highest possible extent demands advanced knowledge of various hardware architectures and parallel programming models. Furthermore, optimized software execution on parallel computing systems demands consideration of many parameters at compile-time and run-time. Determining the optimal set of parameters in a […]

* * *

* * *

HGPU group © 2010-2019 hgpu.org

All rights belong to the respective authors

Contact us: