
Posts

Jun, 21

Autotuning for Automatic Parallelization on Heterogeneous Systems

To meet the surging demand for high-speed computation in an era of stagnating per-processor performance, systems designers resort to aggregating many, and even heterogeneous, processors into single systems. Automatic parallelization tools relieve application developers of the tedious and error-prone task of programming these heterogeneous systems. For these tools, there are two […]
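
The autotuning mentioned in this excerpt boils down to empirically timing a computation under different configurations and keeping the fastest one. The sketch below is a minimal Python illustration of that measure-and-select loop; the workload and the single tuned parameter (the worker count) are invented for the example and are not the paper's search space. On CPython, the GIL means the tuner will usually pick a single worker for this pure-Python kernel, which is exactly the point of tuning empirically rather than assuming.

    # Minimal autotuning sketch: time each candidate configuration and keep
    # the fastest. Workload and parameter space are illustrative assumptions.
    import time
    from concurrent.futures import ThreadPoolExecutor

    def workload(chunk):
        # Stand-in compute kernel.
        return sum(i * i for i in chunk)

    def run_parallel(data, workers):
        chunks = [data[i::workers] for i in range(workers)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(workload, chunks))

    def autotune(data, candidate_workers=(1, 2, 4, 8)):
        timings = {}
        for w in candidate_workers:
            start = time.perf_counter()
            run_parallel(data, w)
            timings[w] = time.perf_counter() - start
        return min(timings, key=timings.get)   # best-performing configuration

    best = autotune(list(range(200_000)))
    print("selected worker count:", best)
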
Jun, 21

FPGA Based Satisfiability Checking

The Boolean satisfiability problem, abbreviated as SAT, is the backbone of many applications in VLSI design automation and verification. Over the years, many SAT solvers, both complete and incomplete, have been developed. Complete solvers are usually based on the DPLL (Davis–Putnam–Logemann–Loveland) algorithm, which is a backtracking algorithm. Industrial-strength problems are very large and make […]
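
For readers unfamiliar with DPLL, the following minimal Python sketch shows the backtracking search the excerpt refers to: unit propagation followed by branching on a literal. It operates on CNF formulas given as lists of integer literals and is a textbook illustration, not the FPGA-based solver described in the post.

    # Compact DPLL sketch on CNF formulas, e.g.
    # (x1 or not x2) and (x2 or x3) and (not x1 or not x3) -> [[1,-2],[2,3],[-1,-3]]

    def simplify(clauses, lit):
        # Assign `lit` true: drop satisfied clauses, remove the negated literal.
        out = []
        for clause in clauses:
            if lit in clause:
                continue
            reduced = [l for l in clause if l != -lit]
            if not reduced:
                return None          # empty clause means a conflict
            out.append(reduced)
        return out

    def dpll(clauses, assignment=()):
        if clauses is None:
            return None              # conflict on this branch
        if not clauses:
            return assignment        # all clauses satisfied
        # Unit propagation: a one-literal clause forces its literal.
        for clause in clauses:
            if len(clause) == 1:
                lit = clause[0]
                return dpll(simplify(clauses, lit), assignment + (lit,))
        # Branch on the first literal of the first clause, then backtrack.
        lit = clauses[0][0]
        return (dpll(simplify(clauses, lit), assignment + (lit,))
                or dpll(simplify(clauses, -lit), assignment + (-lit,)))

    print(dpll([[1, -2], [2, 3], [-1, -3]]))   # (1, -3, 2): a satisfying assignment
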
Jun, 21

Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS

The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we […]
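
As a rough illustration of the cut-off-based pair searching mentioned above, the sketch below builds a neighbour (Verlet-style) list with a uniform cell grid in Python. It is a generic textbook construction, not GROMACS's cluster-pair algorithm; the coordinates and cut-off are invented for the example.

    # Minimal neighbour-list sketch: find all particle pairs within a cut-off
    # radius by binning particles into cells of cut-off width.
    import itertools
    import math
    import random

    def build_neighbor_list(positions, cutoff):
        cells = {}
        for idx, (x, y, z) in enumerate(positions):
            key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
            cells.setdefault(key, []).append(idx)

        pairs = []
        for (cx, cy, cz), members in cells.items():
            # Compare against this cell and its 26 surrounding cells.
            for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
                for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                    for i in members:
                        if i < j and math.dist(positions[i], positions[j]) < cutoff:
                            pairs.append((i, j))
        return pairs

    random.seed(0)
    atoms = [(random.uniform(0, 5), random.uniform(0, 5), random.uniform(0, 5))
             for _ in range(100)]
    print(len(build_neighbor_list(atoms, cutoff=1.0)), "pairs within the cut-off")
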
Jun, 21

Unsupervised Deep Learning of Incompressible Fluid Dynamics

Fast and stable fluid simulations are an essential prerequisite for applications ranging from computer-aided aerodynamic design of automobiles or airplanes to simulations of physical effects in CGI to research in meteorology. Recent differentiable fluid simulations allow gradient-based methods to optimize, e.g., fluid control systems in an informed manner. Solving the partial differential equations […]
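
To make the idea of gradient-based optimization through a simulation concrete, here is a toy Python sketch that tunes a control force on a damped 1-D system so its final state hits a target. The system, the target, and the finite-difference gradient are illustrative assumptions; actual differentiable fluid solvers back-propagate through the PDE solver itself rather than using finite differences.

    # Toy "differentiable control" sketch: gradient descent on a control force
    # whose effect is only observable by running the simulation.

    def simulate(force, steps=100, dt=0.01, drag=0.5):
        v = 0.0
        for _ in range(steps):
            v += dt * (force - drag * v)   # explicit Euler step
        return v

    def loss(force, target=2.0):
        return (simulate(force) - target) ** 2

    def grad(f, x, eps=1e-5):
        return (f(x + eps) - f(x - eps)) / (2 * eps)   # central difference

    force = 0.0
    for _ in range(200):
        force -= 0.1 * grad(loss, force)   # gradient descent on the control
    print(f"tuned force = {force:.3f}, final velocity = {simulate(force):.3f}")
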
Jun, 21

Ansor: Generating High-Performance Tensor Programs for Deep Learning

High-performance tensor programs are crucial to guarantee efficient execution of deep learning models. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously difficult. Currently, deep learning systems rely on vendor-provided kernel libraries or various search strategies to get performant tensor programs. These approaches either require significant engineering efforts in developing […]
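
The search strategies mentioned in this excerpt share a simple core loop: generate candidate program variants, measure them, keep the best. The Python sketch below applies that loop to one made-up knob, the tile size of a blocked matrix multiply; Ansor's real search space of tensor-program structures is far richer, and the matrix size here is kept small only so pure Python finishes quickly.

    # Toy "generate, measure, select" loop over candidate tile sizes for a
    # blocked matrix multiply. Candidate space is an illustrative assumption.
    import random
    import time

    N = 96
    A = [[random.random() for _ in range(N)] for _ in range(N)]
    B = [[random.random() for _ in range(N)] for _ in range(N)]

    def blocked_matmul(a, b, tile):
        c = [[0.0] * N for _ in range(N)]
        for ii in range(0, N, tile):
            for kk in range(0, N, tile):
                for jj in range(0, N, tile):
                    for i in range(ii, min(ii + tile, N)):
                        for k in range(kk, min(kk + tile, N)):
                            aik = a[i][k]
                            for j in range(jj, min(jj + tile, N)):
                                c[i][j] += aik * b[k][j]
        return c

    best = None
    for tile in (8, 16, 32, 96):          # candidate program variants
        start = time.perf_counter()
        blocked_matmul(A, B, tile)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[1]:
            best = (tile, elapsed)
    print(f"best tile size: {best[0]} ({best[1]:.3f}s)")
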
Jun, 14

The Rodinia Benchmark Suite in SYCL

We apply the SYCL programming model to the Rodinia benchmark suite, describe the transformations from the OpenCL implementations to the SYCL implementations, and evaluate the benchmarks on microprocessors with a CPU and an integrated GPU. The publicly available implementations of the benchmark suite will track the development of the SYCL compilers, and provide programs for […]
Jun, 14

A Compiler Infrastructure for Embedded Multicore SoCs

Compilers play a pivotal role in the software development process for microprocessors, by automatically translating high-level programming languages into machine-specific executable code. For a long time, while processors were scalar, compilers were regarded as a black box by the software community, due to their successful internal encapsulation of machine-specific details. Over a decade ago, major […]
Jun, 14

Software Testing – Test Suite Compilation and Execution Optimizations

The requirements and responsibilities assumed by software have increasingly rendered it large and complex. Testing to ensure that software meets all its requirements and is free from failures is a difficult and time-consuming task that necessitates the use of large test suites, containing many test cases. The time needed to compile and execute large […]
Jun, 14

AutoMat – Automatic Differentiation for Generalized Standard Materials on GPUs

We propose a universal method for the evaluation of generalized standard materials that greatly simplifies the material law implementation process. By means of automatic differentiation and a numerical integration scheme, AutoMat reduces the implementation effort to two potential functions. By moving AutoMat to the GPU, we close the performance gap to conventional evaluation routines and […]
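
Automatic differentiation is the ingredient that lets the approach above get by with only potential functions. The minimal forward-mode sketch below, using dual numbers in Python, shows how an exact derivative falls out of an unmodified scalar function; the Dual class and the example potential are generic illustrations, not part of AutoMat.

    # Forward-mode automatic differentiation with dual numbers: seed the input
    # with derivative 1 and read the derivative off the output.

    class Dual:
        def __init__(self, value, deriv=0.0):
            self.value, self.deriv = value, deriv

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value + other.value, self.deriv + other.deriv)
        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value * other.value,
                        self.value * other.deriv + self.deriv * other.value)
        __rmul__ = __mul__

    def derivative(potential, x):
        return potential(Dual(x, 1.0)).deriv

    # Example potential: psi(e) = 0.5 * k * e**2, so d psi / d e = k * e.
    k = 210.0
    psi = lambda e: 0.5 * k * e * e
    print(round(derivative(psi, 0.01), 6))   # 2.1 == k * 0.01
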
Jun, 14

Neural Architecture Search without Training

The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be extremely slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be remedied […]
Jun, 7

OpenABLext: An automatic code generation framework for agent-based simulations on CPU-GPU-FPGA heterogeneous platforms

The execution of agent-based simulations (ABSs) on hardware accelerator devices such as graphics processing units (GPUs) has been shown to offer great performance potential. However, in heterogeneous hardware environments, it can become increasingly difficult to find viable partitions of the simulation and provide implementations for different hardware devices. To automate this process, we present OpenABLext, […]
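
For context, an agent-based simulation advances many agents by applying the same local update rule to each of them every time step, which is what makes ABSs attractive targets for GPUs and FPGAs. The Python sketch below shows such a per-agent update with an invented toy behaviour; the device partitioning that OpenABLext automates is not shown here.

    # Minimal agent-based simulation sketch: each agent updates its state from
    # local neighbour information once per step.
    import random

    random.seed(1)
    agents = [{"x": random.uniform(0, 10), "v": random.uniform(-1, 1)}
              for _ in range(50)]

    def step(agents, dt=0.1, radius=1.0):
        new_agents = []
        for a in agents:
            # Align velocity with the average of nearby agents (toy behaviour).
            near = [b["v"] for b in agents if abs(b["x"] - a["x"]) < radius]
            v = 0.5 * a["v"] + 0.5 * (sum(near) / len(near))
            new_agents.append({"x": a["x"] + dt * v, "v": v})
        return new_agents

    for _ in range(100):
        agents = step(agents)
    print("mean velocity after 100 steps:",
          round(sum(a["v"] for a in agents) / len(agents), 3))
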
Jun, 7

SOFF: An OpenCL High-Level Synthesis Framework for FPGAs

Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor performance and usability. This paper proposes a high-level synthesis framework of OpenCL for FPGAs, called SOFF. It automatically synthesizes a datapath to execute many OpenCL kernel threads in a pipelined manner. It […]
