Papers on hgpu.org (.txt-file)
Fast, Memory-Efficient Construction of Voxelized Shadows

Fast, parallel and secure cryptography algorithm using Lorenz’s attractor

Fast, parallel implementation of particle filtering on the GPU architecture

Fast, parallel, GPU-based construction of space filling curves and octrees

Fast, Processor-Cardinality Agnostic PRNG with a Tracking Application

Fast, Realistic Terrain Synthesis

FAST: fast architecture sensitive tree search on modern CPUs and GPUs

FastCollect: Offloading Generational Garbage Collection to Integrated GPUs

Faster across the PCIe bus: A GPU library for lightweight decompression

Faster Algorithms for RNA-folding using the Four-Russians method

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Faster Dark Matter Calculations Using the GPU

Faster File Matching using GPGPUs

Faster GPU Based Genetic Programming Using A Two Dimensional Stack

Faster GPU-based convolutional gridding via thread coarsening

Faster Maliciously Secure Two-Party Computation Using the GPU

Faster matrix-vector multiplication on GeForce 8800GTX

Faster Multipattern Matching System on GPU Based on Aho-Corasick Algorithm

Faster Multiple Pattern Matching System on GPU based on Bit-Parallelism

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster Radix Sort via Virtual Memory and Write-Combining

Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space

Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO

Faster Upper Body Pose Estimation and Recognition Using CUDA

Faster Upper Body Pose Estimation Using CUDA

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

fastHOG – a real-time GPU implementation of HOG

FastMag: Fast micromagnetic simulator for complex magnetic structures

Fastplay: A Parallelization Model and Implementation of SMC on CUDA Based GPU Cluster Architecture

Fastrack: Fast IO for Secure ML using GPU TEEs

FastSpMM: An Efficient Library for Sparse Matrix Matrix Product on GPUs

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

FastTree: A Hardware KD-Tree Construction Acceleration Engine for Real-Time Ray Tracing

Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

Fat vs. Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

FATSEA-An Architectural Simulator for General Purpose Computing on GPUs

Fault Injection techniques for GPU Reliability Evaluation

Fault Table Computation on GPUs

Fault table generation using Graphics Processing Units

Fault Tree Analysis Speed-up with GPU Parallel Computing

FBLAS: Streaming Linear Algebra Kernels on FPGA

FBLAS: Streaming Linear Algebra on FPGA

FC_ACCEL: Enabling Efficient, Low-Latency and Flexible Inference in DNN Fully Connected Layers, using Optimized Checkerboard Block matrix decomposition, fast scheduling, and a resource efficient 1D PE array with a custom HBM2 memory subsystem

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs

FDTD calculations using graphical processing units

FDTD on Distributed Heterogeneous Multi-GPU Systems

Feasibility Analysis of Bilateral Filtering by General Purpose Graphical Processing Unit Computing

Feasibility Analysis of Low Cost Graphical Processing Units for Electromagnetic Field Simulations by Finite Difference Time Domain Method

FEAST – Realisation of hardware-oriented Numerics for HPC simulations with Finite Elements

Feature Aligned Volume Manipulation for Illustration and Visualization

Feature based terrain generation using diffusion equation

Feature Extraction and Visualization from Higher-Order CFD Data

Feature Generation for Quantification of Visual Similarity

Feature tracking and matching in video using programmable graphics hardware

Feature Tracking in Time-Varying Volumetric Data through Scale Invariant Feature Transform

Feature-based speed limit sign detection using a graphics processing unit

Feature-preserving triangular geometry images for level-of-detail representation of static and skinned meshes

FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10

FELARE: Fair Scheduling of Machine Learning Applications on Heterogeneous Edge Systems

Ferrofluid Simulations with the Barnes-Hut Algorithm on Graphics Processing Units

Feynman Machine: The Universal Dynamical Systems Computer

FFT and Convolution Performance in Image Filtering on GPU

FFT Implementation on a Streaming Architecture

FFT Parallel Implementation for MRI Image Reconstruction

FFT-SPA Non-Binary LDPC Decoding on GPU

FIELA: A Fast Image Encryption with Lorenz Attractor using Hybrid Computing

Field modelling acceleration on ultrasonic systems using graphic hardware

FIESTA 4: optimized Feynman integral calculations with GPU support

FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification

File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre

Filtered Blending: A new, minimal Reconstruction Filter for Ghosting-Free Projective Texturing with Multiple Images

Final Project Implementing Extremely Randomized Trees in CUDA

Financial Derivatives Modeling Using GPU’s

Financial modeling on the cell broadband engine

Finding Convex Hulls Using Quickhull on the GPU

Finding faint HI structure in and around galaxies: scraping the barrel

Finding Longest Common Subsequences by GPU-Based Parallel Ant Colony Optimization

Finding Missed Code Size Optimizations in Compilers using LLMs

Finding Next Best Views for Autonomous UAV Mapping through GPU-Accelerated Particle Simulation

Finding the Force – Consistent Particle Seeding for Satellite Aerodynamics

Finding, Measuring, and Reducing Inefficiencies in Contemporary Computer Systems

Fine-Grain Acceleration of Graph Algorithms on a Heterogeneous Chip

Fine-grain Parallelism using Multi-core, Cell/BE, and GPU Systems

Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function

Fine-grain Task Aggregation and Coordination on GPUs

Fine-grained Parallel ILU Preconditioners with Fill-ins for Multi-core CPUs and GPUs

Fine-Grained Parallel Incomplete LU Factorization

Fine-grained parallelization of a Vlasov-Poisson application on GPU

Fine-Grained Resource Sharing for Concurrent GPGPU Kernels

Fine-Grained Synchronizations and Dataflow Programming on GPUs

Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation

Fine-Granular Parallel EBCOT and Optimization with CUDA for Digital Cinema Image Compression

Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit

Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices

Fingerprint grid enhancement on GPU

Fingerprint Local Invariant Feature Extraction on GPU with CUDA

Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

Finite Difference Time-Domain Modelling of Metamaterials: GPU Implementation of Cylindrical Cloak

Titles: 100
open PDFs: 93
packages: 19
