Papers on hgpu.org (.txt-file)
Fine-grained Parallel ILU Preconditioners with Fill-ins for Multi-core CPUs and GPUs

Fine-Grained Parallel Incomplete LU Factorization

Fine-grained parallelization of a Vlasov-Poisson application on GPU

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels

Fine-Grained Synchronizations and Dataflow Programming on GPUs

Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation

Fine-Granular Parallel EBCOT and Optimization with CUDA for Digital Cinema Image Compression

Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit

Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices

Fingerprint grid enhancement on GPU

Fingerprint Local Invariant Feature Extraction on GPU with CUDA

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

Finite Difference Time-Domain Modelling of Metamaterials: GPU Implementation of Cylindrical Cloak

Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations

Finite element assembly strategies on multi-and many-core architectures

Finite Element Integration on GPUs

Finite Element Integration with Quadrature on the GPU

Finite Element Matrix Generation on a GPU

Finite Element Modelling of Prostate Deformation and Needle-Tissue Interactions

Finite element numerical integration for first order approximations on multi-core architectures

Finite Element Numerical Integration on Xeon Phi coprocessor

Finite Pointset Method for 2D Dam-Break Problem with GPU-Acceleration

Finite temperature lattice QCD with GPUs

Finite-difference time-domain simulations of metamaterials

Finite-difference time-domain solver for room acoustics using graphics processing units

Finite-size scaling method for the Berezinskii-Kosterlitz-Thouless transition

FIR filtering and AES encryption with OpenCL 2.0

Fireflies: New software for interactively exploring dynamical systems using GPU computing

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs

Firepile: Run-time Compilation for GPUs in Scala

First Application of Lattice QCD to Pezy-SC Processor

First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

First Experiences Optimizing Smith-Waterman on Intel’s Knights Landing Processor

First experiences with the Intel MIC architecture at LRZ

First Steps Towards More Numerical Reproducibility

Fitting multi-planet transit models to photometric time-data series by evolution strategies

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs

FLASH: Fast All-to-All Communication in GPU Clusters

FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

Flashlight: Enabling Innovation in Tools for Machine Learning

FlexGrip: A Soft GPGPU for FPGAs

Flexible FPGA design for FDTD using OpenCL

Flexible Hardware Mapping for Finite Element Simulations on Hybrid CPU / GPU Clusters

Flexible Linear Algebra Development and Scheduling with Cholesky Factorization

Flexible N-Way MIMO Detector on GPU

Flexible neuronal network simulation framework using code generation for NVidia CUDA

Flexible OpenCL accelerated disparity estimation for video communication applications

Flexible Performant GEMM Kernels on GPUs

Flexible Pixel Compositor for Plug-and-Play Multi-Projector Displays

Flexible Software Profiling of GPU Architectures

Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS

Flexible, high performance convolutional neural networks for image classification

FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System

FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing

Flip-Flop: Convex Hull Construction via Star-Shaped Polyhedron in 3D

Floating Point Arithmetic for Transport Triggered Architectures

Floating-Point Arithmetic in Transport Triggered Architectures

Floating-point data compression at 75 Gb/s on a GPU

Floating-point Mixed-radix FFT Core Generation for FPGA and Comparison with GPU and CPU

Flocking Implementation for the Blender Game Engine

Flow Charts: Visualization of Vector Fields on Arbitrary Surfaces

FLOWER: A Comprehensive Dataflow Compiler for High-Level Synthesis

FlowPM: Distributed TensorFlow Implementation of the FastPM Cosmological N-body Solver

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

FlowTour: An Automatic Guide for Exploring Internal Flow Features

Fluid Dynamics Simulations on Multi-GPU Systems

Fluid Motion Modelling Using Vortex Particle Method on GPU

Fluid Simulation and Generating Textures with Reaction-Diffusion Systems on Surfaces in the GPU

Fluid Simulation by the Smoothed Particle Hydrodynamics Method: A Survey

Fluid Simulation on Surfaces in the GPU

Fluid simulation with SIMPLE method using graphic processors

Fluid Simulation: Smoothed Particle Hydrodynamics on the GPU

Fluid-solid coupling on a cluster of GPU graphics cards for seismic wave propagation

FluidFFT: common API (C++ and Python) for Fast Fourier Transform HPC libraries

FluoroSim: A Visual Problem-Solving Environment for Fluorescence Microscopy

Flux tubes at Finite Temperature

FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method

fMRI analysis on the GPU - possibilities and challenges

Focus measurement on programmable graphics hardware for all in-focus rendering from light fields

Focused Volumetric Visual Hull with Color Extraction

Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ

Forecasting time series with constraints

Forensics on GPU Coprocessing in Databases – Research Challenges, First Experiments, and Countermeasures

Formal Analysis of GPU Programs with Atomics via Conflict-Directed Delay-Bounding

Formal Description and Optimization Based High – Performance Computing on CUDA

Formal Semantics of Heterogeneous CUDA-C: A Modular Approach with Applications

Formal specification and verification of OpenCL Kernel optimization

Formalizing Address Spaces with application to Cuda, OpenCL, and beyond

ForOpenCL: Transformations Exploiting Array Syntax in Fortran for Accelerator Programming

Fortran High-Level Synthesis: Reducing the barriers to accelerating HPC codes on FPGAs

Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang

FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran

Four styles of parallel and net programming

Four-dimensional Cone Beam CT Reconstruction and Enhancement using a Temporal Non-Local Means Method

Fourier Volume Rendering on the GPU Using a Split-Stream-FFT

FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error

FPGA accelerated 3D reconstruction using compressive sensing

Titles: 100
open PDFs: 98
packages: 21
