Posts
Feb, 10
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
We introduce BinaryNet, a method which trains DNNs with binary weights and activations when computing parameters’ gradient. We show that it is possible to train a Multi Layer Perceptron (MLP) on MNIST and ConvNets on CIFAR-10 and SVHN with BinaryNet and achieve nearly state-of-the-art results. At run-time, BinaryNet drastically reduces memory usage and replaces most […]
Feb, 10
GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths
Bitwidth optimization of FPGA datapaths can save hardware resources by choosing the fewest number of bits required for each datapath variable to achieve a desired quality of result. However, it is an NP-hard problem that requires unacceptably long runtimes when using sequential CPU-based heuristics. We show how to parallelize the key steps of bitwidth optimization […]
Feb, 10
FARGO3D: A new GPU-oriented MHD code
We present the FARGO3D code, recently publicly released. It is a magnetohydrodynamics code developed with special emphasis on protoplanetary disks physics and planet-disk interactions, and parallelized with MPI. The hydrodynamics algorithms are based on finite difference upwind, dimensionally split methods. The magnetohydrodynamics algorithms consist of the constrained transport method to preserve the divergence-free property of […]
Feb, 10
Performance Portable GPU Code Generation for Matrix Multiplication
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to attempt to address this issue. However, they rely on device-specific heuristics or hard-coded library implementations to achieve good performance resulting in […]
Feb, 9
Guided Profiling for Auto-Tuning Array Layouts on GPUs
Auto-tuning for Graphics Processing Units (GPUs) has become very popular in recent years. It removes the necessity to hand-tune GPU code especially when a new hardware architecture is released. Our auto-tuner optimizes memory access patterns. This is a key aspect to exploit the full performance of modern GPUs. As the memory hierarchy has historically changed […]
Feb, 8
Portable Programming Models for Heterogeneous Platforms
With the end of Dennard scaling and emergence of dark silicon, the bets are high on heterogeneous architectures to achieve both application performance and energy efficiency. However, diversity in heterogeneous architectures poses severe programming challenges in terms of data layout, memory coherence, task partitioning, data distribution, and sharing of virtual addresses. Existing high-level programming languages […]
Feb, 8
Workload distribution and balancing in FPGAs and CPUs with OpenCL and TBB
In this paper we evaluate the performance and energy effectiveness of FPGA and CPU devices for a kind of parallel computing applications in which the workload can be distributed in a way that enables simultaneous computing in addition to simple off loading. The FPGA device is programmed via OpenCL using the recent availability of commercial […]
Feb, 8
Integrating GPGPU computations with CPU coroutines in C++
We present results on integration of two major GPGPU APIs with reactor-based event processing model in C++ that utilizes coroutines. With current lack of universally usable GPGPU programming interface that gives optimal performance and debates about the style of implementing asynchronous computing in C++, we present a working implementation that allows a uniform and seamless […]
Feb, 8
Collaborative design and optimization using Collective Knowledge
Designing faster, more energy efficient and reliable computer systems requires effective collaboration between hardware designers, system programmers and performance analysts, as well as feedback from system users. We present Collective Knowledge (CK), an open framework for reproducible and collaborative design and optimization. CK enables systematic and reproducible experimentation, combined with leading edge predictive analytics to […]
Feb, 8
High performance high-order numerical methods: applications in ocean modeling
This thesis presents high-order numerical methods for time-dependent simulations of oceanic wave propagation on modern many-core hardware architecture. Simulation of the waves such as tsunami, is challenging because of the varying fluid depths, propagation in many regions, requirement of high resolution near the shore, complex nonlinear wave phenomenon, and necessity of faster than real-time predictions. […]
Feb, 8
Utilizing GPUs to Accelerate Turbomachinery CFD Codes
GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. […]
Feb, 6
GPU Hackathons, 2016
Background General-purpose Graphics Processing Units (GPGPUs) potentially offer exceptionally high memory bandwidth and performance for a wide range of applications. The challenge in utilizing such accelerators has been the difficulty in programming them. The OpenACC Directives for Accelerators offers straightforward pragma extensions to C++ and Fortran to address this programming hurdle, but other GPU programming […]