Papers on hgpu.org (.txt-file)
Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs
Auto-Tuning Dedispersion for Many-Core Accelerators
Auto-tuning Dense Matrix Multiplication for GPGPU with Cache
Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs
Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU
Auto-tuning interactive ray tracing using an analytical GPU architecture model
Auto-tuning of fast fourier transform on graphics processors
Auto-Tuning of Level 1 and Level 2 BLAS for GPUs
Auto-tuning on the macro scale: high level algorithmic auto-tuning for scientific applications
Auto-tuning Shallow water simulations on GPUs
Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems
Auto-tuning Streamed Applications on Intel Xeon Phi
Auto-Tunning of Data Communication on Heterogeneous Systems
Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application
AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication
AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
AutoMat – Automatic Differentiation for Generalized Standard Materials on GPUs
Automated and interactive approaches for optimal surface finding based segmentation of medical image data
Automated and parallel code generation for finite-differencing stencils with arbitrary data types
Automated Architecture Design for Deep Neural Networks
Automated architecture-aware mapping of streaming applications onto GPUs
Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow
Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models
Automated Deep Learning Optimization via DSL-Based Source Code Transformation
Automated development of applications for graphical processing units using rewriting rules
Automated Dynamic Analysis of CUDA Programs
Automated Enhanced Parallelization of Sequential C to Parallel OpenMP
Automated Generation of OpenCL Programs Based on Algebra-Algorithmic Approach
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications
Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline
Automated Long-Term Monitoring of Parallel Microfluidic Operations Applying a Machine Vision-Assisted Positioning Method
Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation
Automated pose estimation in 3D point clouds applying annealing particle filters and inverse kinematics on a GPU
Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing
Automated Software Testing of Memory Performance in Embedded GPUs
Automated Techniques for Enabling Efficient MPI Application Migration
Automated test generation for OpenCL kernels using fuzzing and constraint solving
Automated Testing of Graphics Shader Compilers
Automated Tool to Generate Parallel CUDA code from a Serial C Code
Automatic abstraction and fault tolerance in cortical microarchitectures
Automatic acceleration of Numpy applications on GPUs and multicore CPUs
Automatic and Explicit Parallelization Approaches for Mathematical Simulation Models
Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems
Automatic bi-layer video segmentation based on sensor fusion
Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper
Automatic C-to-CUDA Code Generation for Affine Programs
Automatic classification of object code using machine learning
Automatic Code Generation and Adaptive Grid Scheduling for GPU Cluster Computing
Automatic code generation and tuning for stencil kernels on modern shared memory architectures
Automatic code generation for solvers of cardiac cellular membrane dynamics in GPUs
Automatic Code Generation for Stencil Computations on GPU Architectures
Automatic code generation methods applied to numerical linear algebra in high performance computing
Automatic Code Rewriting for Performance Portability
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL
Automatic Compilation for Heterogeneous Architectures with Single Assignment C
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
Automatic Compiler Based FPGA Accelerator for CNN Training
Automatic contention detection and amelioration for data-intensive operations
Automatic CPU-GPU communication management and optimization
Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU architectures
Automatic Data Layout Generation and Kernel Mapping for CPU+GPU Architectures
Automatic Data Layout Optimizations for GPUs
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
Automatic Detection and Denoising of Signals in Large Geophysical Datasets
Automatic Discovery of Algorithms for Multi-Agent Systems
Automatic Dynamic Task Distribution between CPU and GPU for Real-Time Systems
Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs
Automatic fitting of spiking neuron models to electrophysiological recordings
Automatic Fusions of CUDA-GPU Kernels for Parallel Map
Automatic Generation Of Application-Specific Accelerators for FPGAs from Python Loop Nests
Automatic generation of CUDA code performing tensor manipulations using C++ expression templates
Automatic Generation of FFT Libraries for GPU Platforms
Automatic generation of heterogeneous spectrometers for radio astronomy
Automatic Generation of Multicore Chemical Kernels
Automatic Generation of OpenCL Code for ARM Architectures
Automatic Generation of OpenCL Code through Polyhedral Compilation with LLM
Automatic generation of software pipelines for heterogeneous parallel systems
Automatic generation of warp-level primitives and atomic instructions for fast and portable parallel reduction on GPUs
Automatic GPU optimization through higher-order functions in functional languages
Automatic Hepatic Vessel Segmentation Using Graphics Hardware
Automatic Implementation of Evolutionary Algorithms on GPUs using ESDL
Automatic Kernel Generation for Volta Tensor Cores
Automatic library generation for BLAS3 on GPUs
Automatic Loop Partitioning for Heterogeneous Systems
Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms
Automatic Mapping of Stream Programs on Multicore Architectures
Automatic Multi-Camera Setup Optimization for Optical Tracking
Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines
Automatic NUMA Characterization using Cbench
Automatic Online Tuning (AutoTune): Fully Extended Analysis
Automatic OpenCL code generation for multi-device heterogeneous architectures
Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design
Automatic OpenCL Task Adaptation for Heterogeneous Architectures
Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging
Automatic Optimization of OpenCL-Based Stencil Codes for FPGAs and Its Evaluation
Automatic Optimization of Thread Mapping for a GPGPU Programming Framework
Automatic Parallelization for GPUs
Automatic parallelization for graphics processing units
Automatic Parallelization for Heterogeneous Embedded Systems
Automatic Parallelization of a Gap Model using Java and OpenCL
Titles: 100
open PDFs: 95
packages: 16