Papers on hgpu.org (.txt-file)
Feature-based speed limit sign detection using a graphics processing unit
Feature-preserving triangular geometry images for level-of-detail representation of static and skinned meshes
FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10
FELARE: Fair Scheduling of Machine Learning Applications on Heterogeneous Edge Systems
Ferrofluid Simulations with the Barnes-Hut Algorithm on Graphics Processing Units
Feynman Machine: The Universal Dynamical Systems Computer
FFT and Convolution Performance in Image Filtering on GPU
FFT Implementation on a Streaming Architecture
FFT Parallel Implementation for MRI Image Reconstruction
FFT-SPA Non-Binary LDPC Decoding on GPU
FIELA: A Fast Image Encryption with Lorenz Attractor using Hybrid Computing
Field modelling acceleration on ultrasonic systems using graphic hardware
FIESTA 4: optimized Feynman integral calculations with GPU support
FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification
File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre
Filtered Blending: A new, minimal Reconstruction Filter for Ghosting-Free Projective Texturing with Multiple Images
Final Project Implementing Extremely Randomized Trees in CUDA
Financial Derivatives Modeling Using GPU’s
Financial modeling on the cell broadband engine
Finding Convex Hulls Using Quickhull on the GPU
Finding faint HI structure in and around galaxies: scraping the barrel
Finding Longest Common Subsequences by GPU-Based Parallel Ant Colony Optimization
Finding Missed Code Size Optimizations in Compilers using LLMs
Finding Next Best Views for Autonomous UAV Mapping through GPU-Accelerated Particle Simulation
Finding the Force – Consistent Particle Seeding for Satellite Aerodynamics
Finding, Measuring, and Reducing Inefficiencies in Contemporary Computer Systems
Fine-Grain Acceleration of Graph Algorithms on a Heterogeneous Chip
Fine-grain Parallelism using Multi-core, Cell/BE, and GPU Systems
Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function
Fine-grain Task Aggregation and Coordination on GPUs
Fine-grained Parallel ILU Preconditioners with Fill-ins for Multi-core CPUs and GPUs
Fine-Grained Parallel Incomplete LU Factorization
Fine-grained parallelization of a Vlasov-Poisson application on GPU
Fine-Grained Resource Sharing for Concurrent GPGPU Kernels
Fine-Grained Synchronizations and Dataflow Programming on GPUs
Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation
Fine-Granular Parallel EBCOT and Optimization with CUDA for Digital Cinema Image Compression
Fine-sorting One-dimensional Particle-In-Cell Algorithm with Monte-Carlo Collisions on a Graphics Processing Unit
Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices
Fingerprint grid enhancement on GPU
Fingerprint Local Invariant Feature Extraction on GPU with CUDA
Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors
Finite Difference Time-Domain Modelling of Metamaterials: GPU Implementation of Cylindrical Cloak
Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations
Finite element assembly strategies on multi- and many-core architectures
Finite Element Integration on GPUs
Finite Element Integration with Quadrature on the GPU
Finite Element Matrix Generation on a GPU
Finite Element Modelling of Prostate Deformation and Needle-Tissue Interactions
Finite element numerical integration for first order approximations on multi-core architectures
Finite Element Numerical Integration on Xeon Phi coprocessor
Finite Pointset Method for 2D Dam-Break Problem with GPU-Acceleration
Finite temperature lattice QCD with GPUs
Finite-difference time-domain simulations of metamaterials
Finite-difference time-domain solver for room acoustics using graphics processing units
Finite-size scaling method for the Berezinskii-Kosterlitz-Thouless transition
FIR filtering and AES encryption with OpenCL 2.0
Fireflies: New software for interactively exploring dynamical systems using GPU computing
Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs
Firepile: Run-time Compilation for GPUs in Scala
First Application of Lattice QCD to Pezy-SC Processor
First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC
First Experiences Optimizing Smith-Waterman on Intel’s Knights Landing Processor
First experiences with the Intel MIC architecture at LRZ
First Steps Towards More Numerical Reproducibility
Fitting multi-planet transit models to photometric time-data series by evolution strategies
Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
Flashlight: Enabling Innovation in Tools for Machine Learning
FlexGrip: A Soft GPGPU for FPGAs
Flexible FPGA design for FDTD using OpenCL
Flexible Hardware Mapping for Finite Element Simulations on Hybrid CPU / GPU Clusters
Flexible Linear Algebra Development and Scheduling with Cholesky Factorization
Flexible N-Way MIMO Detector on GPU
Flexible neuronal network simulation framework using code generation for NVidia CUDA
Flexible OpenCL accelerated disparity estimation for video communication applications
Flexible Performant GEMM Kernels on GPUs
Flexible Pixel Compositor for Plug-and-Play Multi-Projector Displays
Flexible Software Profiling of GPU Architectures
Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS
Flexible, high performance convolutional neural networks for image classification
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing
Flip-Flop: Convex Hull Construction via Star-Shaped Polyhedron in 3D
Floating Point Arithmetic for Transport Triggered Architectures
Floating-Point Arithmetic in Transport Triggered Architectures
Floating-point data compression at 75 Gb/s on a GPU
Floating-point Mixed-radix FFT Core Generation for FPGA and Comparison with GPU and CPU
Flocking Implementation for the Blender Game Engine
Flow Charts: Visualization of Vector Fields on Arbitrary Surfaces
FLOWER: A Comprehensive Dataflow Compiler for High-Level Synthesis
FlowPM: Distributed TensorFlow Implementation of the FastPM Cosmological N-body Solver
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow
FlowTour: An Automatic Guide for Exploring Internal Flow Features
Fluid Dynamics Simulations on Multi-GPU Systems
Fluid Motion Modelling Using Vortex Particle Method on GPU
Titles: 100
open PDFs: 94
packages: 17