
Posts

Mar, 3

OpenCL Acceleration for TensorFlow

There is huge demand for targeting complex, large-scale machine learning applications, particularly those based on popular, actively maintained frameworks such as TensorFlow and CAFFE, to a variety of platforms with accelerators ranging from high-end desktop GPUs to resource-constrained embedded or mobile GPUs, FPGAs, and DSPs. However, to deliver good performance, different platforms may require different […]
Mar, 3

Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers

High-performance DSL developers work hard to take advantage of modern hardware. The DSL compilers have to build their own complex middle-ends before they can target a common back-end such as LLVM, which only handles single instruction streams with SIMD instructions. We introduce Tiramisu, a common middle-end that can generate efficient code for modern processors and […]
Mar, 3

Equalizer 2.0 – Convergence of a Parallel Rendering Framework

Developing complex, real-world graphics applications which leverage multiple GPUs and computers for interactive 3D rendering is a demanding task. It requires expertise in distributed systems and parallel rendering in addition to the application domain itself. We present a mature parallel rendering framework which provides a large set of features, algorithms and system integration […]
Mar, 3

New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code

We introduce "Hybrid Fortran", a new approach that allows a high performance GPGPU port for structured grid Fortran codes. This technique only requires minimal changes for a CPU targeted codebase, which is a significant advancement in terms of productivity. It has been successfully applied to both dynamical core and physical processes of ASUCA, a Japanese […]
Mar, 3

QMCPACK: An open source ab initio Quantum Monte Carlo package for the electronic structure of atoms, molecules, and solids

QMCPACK is an open source quantum Monte Carlo package for ab-initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater-Jastrow type trial wave functions in conjunction with a sophisticated optimizer […]
Feb, 18

The 2018 International Workshop on Embedded Multicore Systems (ICPP-EMS), 2018

The 2018 International Workshop on Embedded Multicore Systems will be held in conjunction with the 47th International Conference on Parallel Processing (ICPP 2018): https://sites.google.com/view/icpp-ems2018/ Embedded systems with multicore designs are a major focus of both industry and academia. While embedded multicore systems are expected to play an important role in future system designs, many challenging […]
Feb, 18

The 5th International Conference on Electrical and Electronics Engineering (ICEEE), 2018

We would like to invite you to contribute to and participate in the 2018 5th International Conference on Electrical and Electronics Engineering (ICEEE 2018), which will be held in Istanbul, Turkey, on May 3-5, 2018. ICEEE 2018 is a forum for presenting excellent results and new challenges facing the field of the reliability and availability of […]
Feb, 17

Evaluating High-Level Synthesis Techniques for Scalable Hardware-Accelerated Computing

Hardware acceleration is considered a powerful tool in parallel computing, able to overcome the limitations imposed by sequential execution of software applications and, at the same time, provide energy-efficient alternatives to other parallel computing platforms such as GPUs. However, increasing application complexity makes it unaffordable to map algorithms directly into HDL. Hence, High-Level Synthesis tools […]
Feb, 17

Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing

Modern partial differential equation (PDE) models across scientific disciplines require sophisticated numerical methods resulting in complex codes as well as large numbers of simulations for analysis like parameter studies and uncertainty quantification. To evaluate the behavior of the model for sufficiently long times, for instance, to compare to laboratory time scales, often requires long-time simulations […]
Feb, 17

The performances of R GPU implementations of the GMRES method

Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on a single-threaded model of computation, using only a small fraction of the computational power now available on most desktops and laptops. Modern statistical software packages rely on high performance implementations […]
Feb, 17

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

Efficient implementations of HPC applications for parallel architectures generally rely on external software packages (e.g., BLAS, LAPACK, CUDNN). While these libraries provide highly optimized routines for certain characteristics of inputs (e.g., square matrices), they generally do not retain optimal performance across the wide range of problems encountered in practice. In this paper, we present an […]
Feb, 17

Transforming and Optimizing Irregular Applications for Parallel Architectures

Parallel architectures, including multi-core processors, many-core processors, and multi-node systems, have become commonplace, as it is no longer feasible to improve single-core performance through increasing its operating clock frequency. Furthermore, to keep up with the exponentially growing desire for more and more computational power, the number of cores/nodes in parallel architectures has continued to dramatically […]

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors