Posts
Jun, 21
Autotuning for Automatic Parallelization on Heterogeneous Systems
To meet the surging demand for high-speed computation in an era of stagnating per-processor performance, system designers aggregate many, often heterogeneous, processors into single systems. Automatic parallelization tools relieve application developers of the tedious and error-prone task of programming these heterogeneous systems. For these tools, there are two […]
Jun, 21
FPGA Based Satisfiability Checking
The Boolean satisfiability problem, abbreviated as SAT, is the backbone of many applications in VLSI design automation and verification. Over the years, many SAT solvers, both complete and incomplete, have been developed. Complete solvers are usually based on the DPLL (Davis–Putnam–Logemann–Loveland) algorithm, which is a backtracking algorithm. Industrial strength problems are very large and make […]
Jun, 21
Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS
The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we […]
Jun, 21
Unsupervised Deep Learning of Incompressible Fluid Dynamics
Fast and stable fluid simulations are an essential prerequisite for applications ranging from computer-aided aerodynamic design of automobiles or airplanes to simulations of physical effects in CGI to research in meteorology. Recent differentiable fluid simulations allow gradient-based methods to optimize, e.g., fluid control systems in an informed manner. Solving the partial differential equations […]
Jun, 21
Ansor: Generating High-Performance Tensor Programs for Deep Learning
High-performance tensor programs are crucial to guarantee efficient execution of deep learning models. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously difficult. Currently, deep learning systems rely on vendor-provided kernel libraries or various search strategies to get performant tensor programs. These approaches either require significant engineering efforts in developing […]
Jun, 14
The Rodinia Benchmark Suite in SYCL
We apply the SYCL programming model to the Rodinia benchmark suite, describe the transformations from the OpenCL implementations to the SYCL implementations, and evaluate the benchmarks on microprocessors with a CPU and an integrated GPU. The publicly available implementations of the benchmark suite will track the development of the SYCL compilers, and provide programs for […]
Jun, 14
A Compiler Infrastructure for Embedded Multicore SoCs
Compilers play a pivotal role in the software development process for microprocessors, by automatically translating high-level programming languages into machine-specific executable code. For a long time, while processors were scalar, compilers were regarded as a black box within the software community, due to their successful internal encapsulation of machine-specific details. Over a decade ago, major […]
Jun, 14
Software Testing – Test Suite Compilation and Execution Optimizations
The requirements and responsibilities assumed by software have increasingly rendered it large and complex. Testing to ensure that software meets all its requirements and is free from failures is a difficult and time-consuming task that necessitates the use of large test suites, containing many test cases. The time needed to compile and execute large […]
Jun, 14
AutoMat – Automatic Differentiation for Generalized Standard Materials on GPUs
We propose a universal method for the evaluation of generalized standard materials that greatly simplifies the material law implementation process. By means of automatic differentiation and a numerical integration scheme, AutoMat reduces the implementation effort to two potential functions. By moving AutoMat to the GPU, we close the performance gap to conventional evaluation routines and […]
Jun, 14
Neural Architecture Search without Training
The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be extremely slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be remedied […]
Jun, 7
OpenABLext: An automatic code generation framework for agent-based simulations on CPU-GPU-FPGA heterogeneous platforms
The execution of agent-based simulations (ABSs) on hardware accelerator devices such as graphics processing units (GPUs) has been shown to offer great performance potentials. However, in heterogeneous hardware environments, it can become increasingly difficult to find viable partitions of the simulation and provide implementations for different hardware devices. To automate this process, we present OpenABLext, […]
Jun, 7
SOFF: An OpenCL High-Level Synthesis Framework for FPGAs
Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor performance and usability. This paper proposes a high-level synthesis framework of OpenCL for FPGAs, called SOFF. It automatically synthesizes a datapath to execute many OpenCL kernel threads in a pipelined manner. It […]