Posts
Mar, 10
Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs
Chebyshev filter diagonalization is well established in quantum chemistry and quantum physics to compute bulks of eigenvalues of large sparse matrices. Choosing a block vector implementation, we investigate optimization opportunities on the new class of high-performance compute devices featuring both high-bandwidth and low-bandwidth memory. We focus on the transparent access to the full address space […]
Mar, 3
Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers
High-performance DSL developers work hard to take advantage of modern hardware. The DSL compilers have to build their own complex middle-ends before they can target a common back-end such as LLVM, which only handles single instruction streams with SIMD instructions. We introduce Tiramisu, a common middle-end that can generate efficient code for modern processors and […]
Mar, 3
OpenCL Acceleration for TensorFlow
There is huge demand for targeting complex and large-scale machine learning applications, particularly those based on popular, actively maintained frameworks such as TensorFlow and CAFFE, to a variety of platforms with accelerators ranging from high-end desktop GPUs to resource-constrained embedded or mobile GPUs, FPGAs, and DSPs. However, to deliver good performance, different platforms may require different […]
Mar, 3
New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code
We introduce "Hybrid Fortran", a new approach that allows a high performance GPGPU port for structured grid Fortran codes. This technique only requires minimal changes for a CPU targeted codebase, which is a significant advancement in terms of productivity. It has been successfully applied to both dynamical core and physical processes of ASUCA, a Japanese […]
Mar, 3
Equalizer 2.0 – Convergence of a Parallel Rendering Framework
Developing complex, real-world graphics applications that leverage multiple GPUs and computers for interactive 3D rendering is a demanding task. It requires expertise in distributed systems and parallel rendering in addition to the application domain itself. We present a mature parallel rendering framework which provides a large set of features, algorithms and system integration […]
Mar, 3
QMCPACK: An open source ab initio Quantum Monte Carlo package for the electronic structure of atoms, molecules, and solids
QMCPACK is an open source quantum Monte Carlo package for ab initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater-Jastrow type trial wave functions in conjunction with a sophisticated optimizer […]
Feb, 18
The 2018 International Workshop on Embedded Multicore Systems (ICPP-EMS), 2018
The 2018 International Workshop on Embedded Multicore Systems will be held in conjunction with the 47th International Conference on Parallel Processing (ICPP 2018): https://sites.google.com/view/icpp-ems2018/ Embedded systems with multicore designs are a major focus of both industry and academia. While embedded multicore systems look set to play an important role in future system designs, many challenging […]
Feb, 18
The 5th International Conference on Electrical and Electronics Engineering (ICEEE), 2018
We would like to invite you to contribute to and participate in the 2018 5th International Conference on Electrical and Electronics Engineering (ICEEE 2018), which will be held in Istanbul, Turkey, on May 3-5, 2018. ICEEE 2018 is a forum for presenting excellent results and new challenges facing the field of the reliability and availability of […]
Feb, 17
Evaluating High-Level Synthesis Techniques for Scalable Hardware-Accelerated Computing
Hardware acceleration is considered a powerful tool in parallel computing, able to overcome the limitations imposed by sequential execution of software applications and, at the same time, provide energy-efficient alternatives to other parallel computing platforms such as GPUs. However, increasing application complexity makes it unaffordable to map algorithms directly into HDL. Hence, High-Level Synthesis tools […]
Feb, 17
Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing
Modern partial differential equation (PDE) models across scientific disciplines require sophisticated numerical methods resulting in complex codes as well as large numbers of simulations for analyses like parameter studies and uncertainty quantification. Evaluating the behavior of the model for sufficiently long times, for instance to compare to laboratory time scales, often requires long-time simulations […]
Feb, 17
The performances of R GPU implementations of the GMRES method
Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on a single-threaded model of computation, using only a small fraction of the computational power now available on most desktops and laptops. Modern statistical software packages rely on high performance implementations […]
Feb, 17
Input-Aware Auto-Tuning of Compute-Bound HPC Kernels
Efficient implementations of HPC applications for parallel architectures generally rely on external software packages (e.g., BLAS, LAPACK, CUDNN). While these libraries provide highly optimized routines for certain characteristics of inputs (e.g., square matrices), they generally do not retain optimal performance across the wide range of problems encountered in practice. In this paper, we present an […]