Papers on hgpu.org
The 3D Flow Field Around an Embedded Planet
The accelerating implementation of BLAST with stream processor
The Accelerator Wall: Limits of Chip Specialization
The AES Implantation Based on OpenCL for Multi/many Core Architecture
The AGILE library for image reconstruction in biomedical sciences using graphics card hardware acceleration
The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition
The AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs
The Anatomy of High-Performance 2D Similarity Calculations
The ANTAREX Approach to Autotuning and Adaptivity for Energy Efficient HPC Systems
The ANTAREX Domain Specific Language for High Performance Computing
The Application of AI Technology in GPU Scheduling Algorithm Optimization
The Application of CUDA Architecture in Facial Expression Recognition
The application of GPU particle tracing to diffusion tensor field visualization
The Application Perspective: Seeking Productivity and Performance
The Arcane development framework
The Architecture and Evolution of CPU-GPU Systems for General Purpose Computing
The architecture of the DecentVM: towards a decentralized virtual machine for many-core computing
The Art of Balance: A RateupDB Experience of Building a CPU/GPU Hybrid Database Product
The Astrophysical Multipurpose Software Environment
The battle of the giants: a case study of GPU vs FPGA optimisation for real-time image processing
The BiConjugate gradient method on GPUs
The Boat Hull Model: Adapting the Roofline Model to Enable Performance Prediction for Parallel Computing
The BondMachine toolkit: Enabling Machine Learning on FPGA
The Bones Source-to-Source Compiler Manual
The Case for Higher Computational Density in the Memory-Bound FDTD Method within Multicore Environments
The case for VOS: the vector operating system
The Celerity High-level API: C++20 for Accelerator Clusters
The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units
The Comparisons of OpenCL and OpenMP Computing Paradigm
The Complete Rank Transform: A Tool for Accurate and Morphologically Invariant Matching of Structures
The computer graphics wars heat up
The conjugate gradient solver accelerated by GPU for solving wave-propagation problems
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography
The CUDA Handbook: A Comprehensive Guide to GPU Programming
The CUDA implementation of the method of lines for the curvature dependent flows
The CUDA LATCH Binary Descriptor: Because Sometimes Faster Means Better
The DabR – A multitouch system for intuitive 3D scene navigation
The Deep Learning Compiler: A Comprehensive Survey
The density matrix renormalization group algorithm on kilo-processor architectures: implementation and trade-offs
The Design and Implementation of a GPU-enabled Multi-objective Tabu-search Intended for Real World and High-dimensional Applications
The Design and Implementation of a Verification Technique for GPU Kernels
The design and verification of Mumax3
The development and expansion of HOOMD-blue through six years of GPU proliferation
The discrete dipole approximation code DDscat.C++: features, limitations and plans
The distributed diagonal force decomposition method for parallelizing molecular dynamics simulations
The Distribution of OpenCL Kernel Execution Across Multiple Devices
The Dual-Path Execution Model for Efficient GPU Control Flow
The Dynamical Kernel Scheduler – Part 1
The Ecological Footprint of Neural Machine Translation Systems
The effects of nutrient chemotaxis on bacterial aggregation patterns with non-linear degenerate cross diffusion
The Fast and Wideband MoM Based on GPU and Two-Path AFS Acceleration
The fast evaluation of hidden Markov models on GPU
The fast multipole method on parallel clusters, multicore processors, and graphics processing units
The Fast Multipole Method on the Cell processor
The Fat-Link Computation On Large GPU Clusters for Lattice QCD
The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming
The Flocking Based and GPU Accelerated Internet Traffic Classification
The Framework and Compilation Techniques for Directive-based GPU Cluster Programming
The Future in Mobile Multicore Computing
The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?
The GASPI API specification and its implementation GPI 2.0
The Geant4 Visualisation System – a multi-driver graphics system
The GeForce 6 series GPU architecture
The Genetic Convolutional Neural Network Model Based on Random Sample
The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration
The GPU as a high performance computational resource
The GPU as numerical simulation engine
The GPU Computing Revolution: From Multi-Core CPUs To Many-Core Graphics Processors
The GPU Enhanced Parallel Computing for Large Scale Data Clustering
The GPU enters computing’s mainstream
The GPU on biomedical image processing for color and phenotype analysis
The GPU on irregular computing: performance issues and contributions
The GPU on the simulation of cellular computing models
The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing
The GPU-based High-performance Pattern-matching Algorithm for Intrusion Detection
The GPU-based Parallel Ant Colony System
The GPU-based String Matching System in Advanced AC Algorithm
The gputools package enables GPU computing in R
The GPUVerify Method: a Tutorial Overview
The Graphics Card as a Streaming Computer
The Graphics Processor as a Mathematical Coprocessor in MATLAB
The Heisenberg spin glass model on GPU: myths and actual facts
The Hierarchical Memory Machine Model for GPUs
The Hitchhiker’s Guide to Cross-Platform OpenCL Application Development
The impact of accelerator processors for high-throughput molecular modeling and simulation
The impact of diverse memory architectures on multicore consumer software: an industrial perspective from the video games domain
The Impact of GPU DVFS on the Energy and Performance of Deep Learning: an Empirical Study
The impact of GPU/Multicore in Signal Processing: a quantitative approach
The Impact of Modern Consumer GPUs on Commonly Used Secure Password Standards
The Implement of Common Beam Forming Using GPU
The implementation and optimization of Bitonic sort algorithm based on CUDA
The Implementation of a Real-Time Polyphase Filter
The implementation of Multi-Scale Retinex image enhancement algorithm based on GPU via CUDA
Titles: 100
open PDFs: 87
packages: 18