Papers on hgpu.org (.txt-file)

Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs

Automatic fitting of spiking neuron models to electrophysiological recordings

Automatic Fusions of CUDA-GPU Kernels for Parallel Map

Automatic Generation Of Application-Specific Accelerators for FPGAs from Python Loop Nests

Automatic generation of CUDA code performing tensor manipulations using C++ expression templates

Automatic Generation of FFT Libraries for GPU Platforms

Automatic generation of heterogeneous spectrometers for radio astronomy

Automatic Generation of Multicore Chemical Kernels

Automatic Generation of OpenCL Code for ARM Architectures

Automatic Generation of OpenCL Code through Polyhedral Compilation with LLM

Automatic generation of software pipelines for heterogeneous parallel systems

Automatic generation of warp-level primitives and atomic instructions for fast and portable parallel reduction on GPUs

Automatic GPU optimization through higher-order functions in functional languages

Automatic Hepatic Vessel Segmentation Using Graphics Hardware

Automatic Implementation of Evolutionary Algorithms on GPUs using ESDL

Automatic Kernel Generation for Volta Tensor Cores

Automatic library generation for BLAS3 on GPUs

Automatic Loop Partitioning for Heterogeneous Systems

Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms

Automatic Mapping of Stream Programs on Multicore Architectures

Automatic Multi-Camera Setup Optimization for Optical Tracking

Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines

Automatic NUMA Characterization using Cbench

Automatic Online Tuning (AutoTune): Fully Extended Analysis

Automatic OpenCL code generation for multi-device heterogeneous architectures

Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design

Automatic OpenCL Task Adaptation for Heterogeneous Architectures

Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging

Automatic Optimization of OpenCL-Based Stencil Codes for FPGAs and Its Evaluation

Automatic Optimization of Thread Mapping for a GPGPU Programming Framework

Automatic Parallelization for GPUs

Automatic parallelization for graphics processing units

Automatic Parallelization for Heterogeneous Embedded Systems

Automatic Parallelization of a Gap Model using Java and OpenCL

Automatic Parallelization of Tiled Loop Nests with Enhanced Fine-Grained Parallelism on GPUs

Automatic Parallelization of Tiled Stencil Loop Nests on GPUs

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

Automatic Performance Optimisation of Parallel Programs for GPUs via Rewrite Rules

Automatic Performance Optimization in ViennaCL for GPUs

Automatic Performance Optimization on Heterogeneous Computer Systems using Manycore Coprocessors

Automatic Performance Tuning of Pipeline Patterns for Heterogeneous Parallel Architectures

Automatic Performance Tuning of Stencil Computations on Graphics Processing Units

Automatic Point Target Detection for Interactive Visual Analysis of SAR Images

Automatic Pose Estimation for Range Images on the GPU

Automatic program analysis for data parallel kernels

Automatic program parallelization for multicore processors

Automatic Resource-Constrained Static Task Parallelization

Automatic run-time mapping of polyhedral computations to heterogeneous devices with memory-size restrictions

Automatic safety proofs for asynchronous memory operations

Automatic Scan Parallelization in OpenMP

Automatic scanning of nuclear emulsions with wide-angle acceptance for nuclear fragment detection

Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures

Automatic Selection of Sparse Matrix Representation on GPUs

Automatic shader level of detail

Automatic SIMD Code Generation

Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification

Automatic Software Synthesis from High-Level ForSyDe Models Targeting Massively Parallel Processors

Automatic source code adaptation for heterogeneous platforms

Automatic Synthesis of Heterogeneous CPU-GPU Embedded Applications from a UML Profile

Automatic Termination Analysis for GPU Kernels

Automatic Test Case Reduction for OpenCL

Automatic test case reduction of randomly generated OpenCL kernels

Automatic transformation and optimization of applications on GPUs and GPU clusters

Automatic Translation of CUDA to OpenCL and Comparison of Performance Optimizations on GPUs

Automatic tuning matrix multiplication performance on graphics hardware

Automatic Tuning of Local Memory Use on GPGPUs

Automatic Virtualization of Accelerators

Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation

Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation

Automatically Generating Efficient Simulation Codes on GPUs from Partial Differential Equations

Automatically Harnessing Sparse Acceleration

Automatically Selecting Profitable Thread Block Sizes Using Machine Learning

Automatically translating a general purpose C++ image processing library for GPUs

Automatically Tuned Dense Linear Algebra for Multicore+GPU

Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures

Automating a Labour Performance Measurement and Risk Assessment: An Evaluation of Methods for a Computer Vision based System

Automating elimination of idle functions by run-time reconfiguration

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

Automating GPU computing in MATLAB

Automating Heterogeneous Parallelism in Numerical Differential Equations

Automating the Last-Mile for High Performance Dense Linear Algebra

AutOMP: An Automatic OpenMP Parallelization Generator for Variable-Oriented High-Performance Scientific Codes

AutoParBench: A Unified Test Framework for OpenMP-based Parallelizers

AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

Autotuning CUDA Compiler Parameters for Heterogeneous Applications using the OpenTuner Framework

Autotuning CUDA: Applying NLP Techniques to LS-CAT

Autotuning for Automatic Parallelization on Heterogeneous Systems

Autotuning GPU Kernels via Static and Predictive Analysis

Autotuning of Pattern Runtimes for Accelerated Parallel Systems

Autotuning OpenACC Work Distribution via Direct Search

Autotuning OpenCL Workgroup Size for Stencil Patterns

Autotuning Programs with Algorithmic Choice

Autotuning Stencil-Based Computations on GPUs

Autotuning Stencils Codes with Algorithmic Skeletons

Autotuning Tensor Contraction Computations on GPUs

Autotuning Wavefront Abstractions for Heterogeneous Architectures

Autotuning Wavefront Patterns for Heterogeneous Architectures

Autotuning, Code Generation and Optimizing Compiler Technology for GPUs

Auxiliary Image Regularization for Deep CNNs with Noisy Labels

Titles: 100
open PDFs: 96
packages: 17
