Papers on hgpu.org (.txt-file)
Flexible Performant GEMM Kernels on GPUs
Flexible Pixel Compositor for Plug-and-Play Multi-Projector Displays
Flexible Software Profiling of GPU Architectures
Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS
Flexible, high performance convolutional neural networks for image classification
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing
Flip-Flop: Convex Hull Construction via Star-Shaped Polyhedron in 3D
Floating Point Arithmetic for Transport Triggered Architectures
Floating-Point Arithmetic in Transport Triggered Architectures
Floating-point data compression at 75 Gb/s on a GPU
Floating-point Mixed-radix FFT Core Generation for FPGA and Comparison with GPU and CPU
Flocking Implementation for the Blender Game Engine
Flow Charts: Visualization of Vector Fields on Arbitrary Surfaces
FLOWER: A Comprehensive Dataflow Compiler for High-Level Synthesis
FlowPM: Distributed TensorFlow Implementation of the FastPM Cosmological N-body Solver
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow
FlowTour: An Automatic Guide for Exploring Internal Flow Features
Fluid Dynamics Simulations on Multi-GPU Systems
Fluid Motion Modelling Using Vortex Particle Method on GPU
Fluid Simulation and Generating Textures with Reaction-Diffusion Systems on Surfaces in the GPU
Fluid Simulation by the Smoothed Particle Hydrodynamics Method: A Survey
Fluid Simulation on Surfaces in the GPU
Fluid simulation with SIMPLE method using graphic processors
Fluid Simulation: Smoothed Particle Hydrodynamics on the GPU
Fluid-solid coupling on a cluster of GPU graphics cards for seismic wave propagation
FluidFFT: common API (C++ and Python) for Fast Fourier Transform HPC libraries
FluoroSim: A Visual Problem-Solving Environment for Fluorescence Microscopy
Flux tubes at Finite Temperature
FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method
fMRI analysis on the GPU-possibilities and challenges
Focus measurement on programmable graphics hardware for all in-focus rendering from light fields
Focused Volumetric Visual Hull with Color Extraction
Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ
Forecasting time series with constraints
Forensics on GPU Coprocessing in Databases – Research Challenges, First Experiments, and Countermeasures
Formal Analysis of GPU Programs with Atomics via Conflict-Directed Delay-Bounding
Formal Description and Optimization Based High-Performance Computing on CUDA
Formal Semantics of Heterogeneous CUDA-C: A Modular Approach with Applications
Formal specification and verification of OpenCL Kernel optimization
Formalizing Address Spaces with application to Cuda, OpenCL, and beyond
ForOpenCL: Transformations Exploiting Array Syntax in Fortran for Accelerator Programming
Fortran High-Level Synthesis: Reducing the barriers to accelerating HPC codes on FPGAs
Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang
FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran
Four styles of parallel and net programming
Four-dimensional Cone Beam CT Reconstruction and Enhancement using a Temporal Non-Local Means Method
Fourier Volume Rendering on the GPU Using a Split-Stream-FFT
FPGA accelerated 3D reconstruction using compressive sensing
FPGA Accelerated Simulation of Biologically Plausible Spiking Neural Networks
FPGA Acceleration of Multifunction Printer Image Processing using OpenCL
FPGA acceleration of rigid-molecule docking codes
FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL
FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods
FPGA Accelerators on Heterogeneous Systems: An Approach Using High Level Synthesis
FPGA and GPU implementation of large scale SpMV
FPGA Based Acceleration of Decimal Operations
FPGA Based High Performance and Scalable Block LU Decomposition Architecture
FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only
FPGA Based Satisfiability Checking
FPGA based Speeded Up Robust Features
FPGA implementation of a Convolutional Neural Network for "Wake up word" detection
FPGA Implementation of Bluetooth Low Energy Physical Layer with OpenCL
FPGA Implementation of Reduced Precision Convolutional Neural Networks
FPGA in HPC: High Level Synthesis of OpenCL kernels for Molecular Dynamics
FPGA vs. GPU for sparse matrix vector multiply
FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application
FPGA-Accelerated Image Processing Using High Level Synthesis with OpenCL
FPGA-based acceleration of a particle simulation High Performance Computing application
FPGA-based acceleration of CHARMM-potential minimization
FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL
FPGA-Based Accelerator Design from a Domain-Specific Language
FPGA-Based Design of Numerical Algorithms for Kernel Density Estimation Using High Level Synthesis Approach
FPGA-based Tsunami Simulation: Performance Comparison with GPUs, and Roofline Model for Scalability Analysis
FPGA-GPU architecture for kernel SVM pedestrian detection
FPGA-GPU-CPU Heterogenous Architecture for Real-time Cardiac Physiological Optical Mapping
FPGA: An Efficient And Promising Platform For Real-Time Image Processing Applications
fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs
FPGAs, GPUs and the PS2 – A Single Programming Methodology
Fractal Art Generation using GPUs
Fractal Based Method on Hardware Acceleration for Natural Environments
Fractal Video Compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms
Fractals Image Rendering and Compression using GPUs
Frame-based parallelization of MPEG-4 on compute unified device architecture (CUDA)
Framework for Batched and GPU-resident Factorization Algorithms Applied to Block Householder Transformations
Framework for Parallel Kernels Auto-tuning
Framework for utilizing computational devices within simulation
Frameworks for GPU Accelerators: A comprehensive evaluation using 2D/3D image registration
Frameworks for multi-core architectures: a comprehensive evaluation using 2D/3D image registration
Frameworks in Medical Image Analysis with Deep Neural Networks
Free Launch: Optimizing GPU Dynamic Kernel Launches through Thread Reuse
Free surface flow simulations on GPGPUs using the LBM
Free-form interest rate term structure decomposition: a 2nd order optimization problem
Frequent itemset mining on graphics processors
From Constraint Programming to Heterogeneous Parallelism
From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming
From English To Foreign Languages: Transferring Pre-trained Language Models
From Experiment to Design – Fault Characterization and Detection in Parallel Computer Systems Using Computational Accelerators
Titles: 100
open PDFs: 98
packages: 18