Papers on hgpu.org
CoreTSAR: Task Scheduling for Accelerator-aware Runtimes
Correctly rounding elementary functions on GPU
Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU
Correlating Radio Astronomy Signals with Many-Core Hardware
Correlation analysis on GPU systems using NVIDIA’s CUDA
Cortical architectures on a GPGPU
CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Cosmological Calculations on the GPU
Cost Efficient PageRank Computation using GPU
Cost-aware function migration in heterogeneous systems
Cost-effective low-power graphics processing unit for handheld devices
Cost-effective medical image reconstruction: from clusters to graphics processing units
Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality
Cost-Effective Soft-Error Protection for SRAM-Based Structures in GPGPUs
COTS cluster-based sort-last rendering: performance evaluation and pipelined implementation
Coulomb and Landau Gauge Fixing in GPUs using CUDA and MILC
Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs
Counting and Occurrence Sort for GPUs using an Embedded Language
Counting Triangles in Large Graphs on GPU
Coupled Vlasov and two-fluid codes on GPUs
Coupler Design and Optimization by GPU-Accelerated DG-FEM
Coupling a Generalized DEM and an SPH Models Under a Heterogeneous Massively Parallel Framework
Coupling between Meshless FEM Modeling and Rendering on GPU for Real-time Physically-based Volumetric Deformation
Coupling Lattice Boltzmann Gas and Level Set Method for Simulating Free Surface Flow in GPU/CUDA Environment
COVRA: A compression-domain output-sensitive volume rendering architecture based on a sparse representation of voxel blocks
COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs
COX: Exposing CUDA Warp-Level Functions to CPUs
cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
Cpp-Taskflow: A General-purpose Parallel and Heterogeneous Task Programming System at Scale
CPU and GPU Co-processing for Sound
CPU and GPU Implementation of QCD by using OpenCL
CPU and/or GPU: Revisiting the GPU Vs. CPU Myth
CPU-GPU Algorithms for Triangular Surface Mesh Simplification
CPU-GPU Collaboration for Output Quality Monitoring
CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction application
CPU-GPU Hybrid Parallel Binomial American Option Pricing
CPU-GPU Layer-Switched Low Latency CNN Inference
CPU, GPU and FPGA Implementations of MALD: Ceramic Tile Surface Defects Detection Algorithm
CPU, SMP and GPU implementations of Nohalo level 1, a fast co-convex antialiasing image resampler
CPU/GPGPU/HW comparison of an Eigenfaces face recognition system
CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications
CPU/GPU computing for long-wave radiation physics on large GPU clusters
CPUless PCs inside networked control systems
CRAC: Checkpoint-Restart Architecture for CUDA with Streams and UVM
Crack-free rendering of dynamically tesselated B-Rep models
Cracks in the Sky: Abelian-Higgs Cosmic String Evolution with CUDA
Cramming: Training a Language Model on a Single GPU in One Day
Crane – Fast and Migratable GPU Passthrough for OpenCL applications
Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++
Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code
Creating HW/SW co-designed MPSoPC’s from high level programming models
Creating Optimal Code for GPU-Accelerated CT Reconstruction Using Ant Colony Optimization
Creation and control of rain in virtual environments
CRINK: Automatic CUDA code generation for affine C programs
Critical Comparison of the Classification Ability of Deep Convolutional Neural Network Frameworks with Support Vector Machine Techniques in the Image Classification Process
Critical Links Detection using CUDA
Criticality of the XY model in complex topologies
Cropped Quad-Tree Based Solid Object Colouring with CUDA
Cross Teaching Parallelism and Ray Tracing: A Project-based Approach to Teaching Applied Parallel Computing
Cross-Compiling Shading Languages
Cross-Platform OpenCL Code and Performance Portability for CPU and GPU Architectures Investigated with a Climate and Weather Physics Model
Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels
Cross-platform programming model for many-core lattice Boltzmann simulations
CrowdCL: Web-Based Volunteer Computing with WebCL
CRUM: Checkpoint-Restart Support for CUDA’s Unified Memory
Cryptanalysis of the Full AES Using GPU-Like Special-Purpose Hardware
Cryptanalysis of the McEliece Cryptosystem on GPGPUs
CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU
CryptoGraphics: Secret Key Cryptography Using Graphics Cards
Cryptography on Graphics Processing Unit: A Survey
CrystalGPU: Transparent and Efficient Utilization of GPU Power
CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
CST: Constructive Solid Trimming for Rendering BReps and CSG
CT image reconstruction using hexagonal grids
CT image reconstruction with half precision floating-point values
CT to Cone-beam CT Deformable Registration With Simultaneous Intensity Correction
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures
CU2rCU: A CUDA-to-rCUDA Converter
CuBA – a CUDA implementation of BAMPS
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU
CUBPT: Lock-free bulk insertions to B+ tree on GPU architecture
cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications
CUD@ASP: Experimenting with GPUs in ASP solving
CUDA 2D Stencil Computations for the Jacobi Method
CUDA Accelerated Entropy Constrained Vector Quantization and Multiple K-Means
CUDA Accelerated Face Recognition Using Local Binary Patterns
CUDA accelerated iris template matching on Graphics Processing Units (GPUs)
CUDA accelerated large scale vehicular area network simulator
CUDA Accelerated LTL Model Checking
CUDA Accelerated Robot Localization and Mapping
CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging
CUDA and OpenCL-based asynchronous PSO
CUDA Application Design and Development
CUDA au Coq: A Framework for Machine-validating GPU Assembly Programs
CUDA Based CAMshift Algorithm for Object Tracking Systems
CUDA Based Enhanced Differential Evolution: a Computational Analysis
CUDA Based Fast Implementation of Very Large Matrix Computation
CUDA Based GPU Programming to Simulate 3D Tissue Deformation
Titles: 100
open PDFs: 94
packages: 19