Papers on hgpu.org (.txt-file)
Coprocessor Computing with FPGA and GPU

CoreTSAR: Task Scheduling for Accelerator-aware Runtimes

Correctly rounding elementary functions on GPU

Correctly Treating Synchronizations in Compiling Fine-Grained SPMD-Threaded Programs for CPU

Correlating Radio Astronomy Signals with Many-Core Hardware

Correlation analysis on GPU systems using NVIDIA’s CUDA

Cortical architectures on a GPGPU

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Cosmological Calculations on the GPU

Cost Efficient PageRank Computation using GPU

Cost-aware function migration in heterogeneous systems

Cost-effective low-power graphics processing unit for handheld devices

Cost-effective medical image reconstruction: from clusters to graphics processing units

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality

Cost-Effective Soft-Error Protection for SRAM-Based Structures in GPGPUs

Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures

COTS cluster-based sort-last rendering: performance evaluation and pipelined implementation

Coulomb and Landau Gauge Fixing in GPUs using CUDA and MILC

Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs

Counting and Occurrence Sort for GPUs using an Embedded Language

Counting Triangles in Large Graphs on GPU

Coupled Vlasov and two-fluid codes on GPUs

Coupler Design and Optimization by GPU-Accelerated DG-FEM

Coupling a Generalized DEM and an SPH Models Under a Heterogeneous Massively Parallel Framework

Coupling between Meshless FEM Modeling and Rendering on GPU for Real-time Physically-based Volumetric Deformation

Coupling Lattice Boltzmann Gas and Level Set Method for Simulating Free Surface Flow in GPU/CUDA Environment

COVRA: A compression-domain output-sensitive volume rendering architecture based on a sparse representation of voxel blocks

COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs

COX: Exposing CUDA Warp-Level Functions to CPUs

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Cpp-Taskflow: A General-purpose Parallel and Heterogeneous Task Programming System at Scale

CPPJoules: An Energy Measurement Tool for C++

CPU and GPU Co-processing for Sound

CPU and GPU Implementation of QCD by using OpenCL

CPU and/or GPU: Revisiting the GPU Vs. CPU Myth

CPU-GPU Algorithms for Triangular Surface Mesh Simplification

CPU-GPU co-execution through the exploitation of hybrid technologies via SYCL

CPU-GPU Collaboration for Output Quality Monitoring

CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction application

CPU-GPU Hybrid Parallel Binomial American Option Pricing

CPU-GPU Layer-Switched Low Latency CNN Inference

CPU, GPU and FPGA Implementations of MALD: Ceramic Tile Surface Defects Detection Algorithm

CPU, SMP and GPU implementations of Nohalo level 1, a fast co-convex antialiasing image resampler

CPU/GPGPU/HW comparison of an Eigenfaces face recognition system

CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications

CPU/GPU computing for long-wave radiation physics on large GPU clusters

CPUless PCs inside networked control systems

CRAC: Checkpoint-Restart Architecture for CUDA with Streams and UVM

Crack-free rendering of dynamically tesselated B-Rep models

Cracks in the Sky: Abelian-Higgs Cosmic String Evolution with CUDA

Cramming: Training a Language Model on a Single GPU in One Day

Crane – Fast and Migratable GPU Passthrough for OpenCL applications

Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++

Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code

Creating HW/SW co-designed MPSoPC’s from high level programming models

Creating Optimal Code for GPU-Accelerated CT Reconstruction Using Ant Colony Optimization

Creation and control of rain in virtual environments

CRINK: Automatic CUDA code generation for affine C programs

Critical Comparison of the Classification Ability of Deep Convolutional Neural Network Frameworks with Support Vector Machine Techniques in the Image Classification Process

Critical Links Detection using CUDA

Criticality of the XY model in complex topologies

CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads

Cropped Quad-Tree Based Solid Object Colouring with CUDA

Cross Teaching Parallelism and Ray Tracing: A Project-based Approach to Teaching Applied Parallel Computing

Cross-Compiling Shading Languages

Cross-Platform OpenCL Code and Performance Portability for CPU and GPU Architectures Investigated with a Climate and Weather Physics Model

Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels

Cross-platform programming model for many-core lattice Boltzmann simulations

CrossTL: A Universal Programming Language Translator with Unified Intermediate Representation

CrowdCL: Web-Based Volunteer Computing with WebCL

CRUM: Checkpoint-Restart Support for CUDA’s Unified Memory

Cryptanalysis of the Full AES Using GPU-Like Special-Purpose Hardware

Cryptanalysis of the McEliece Cryptosystem on GPGPUs

CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU

CryptoGraphics: Secret Key Cryptography Using Graphics Cards

Cryptography on Graphics Processing Unit: A Survey

CrystalGPU: Transparent and Efficient Utilization of GPU Power

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

CST: Constructive Solid Trimming for Rendering BReps and CSG

CT image reconstruction using hexagonal grids

CT image reconstruction with half precision floating-point values

CT to Cone-beam CT Deformable Registration With Simultaneous Intensity Correction

CU2CL: A CUDA-to-OpenCL Translator for Multi-and Many-core Architectures

CU2rCU: A CUDA-to-rCUDA Converter

CuBA – a CUDA implementation of BAMPS

cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU

CUBPT: Lock-free bulk insertions to B+ tree on GPU architecture

cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications

CUD@ASP: Experimenting with GPUs in ASP solving

CUDA 2D Stencil Computations for the Jacobi Method

CUDA Accelerated Entropy Constrained Vector Quantization and Multiple K-Means

CUDA Accelerated Face Recognition Using Local Binary Patterns

CUDA accelerated iris template matching on Graphics Processing Units (GPUs)

CUDA accelerated large scale vehicular area network simulator

CUDA Accelerated LTL Model Checking

CUDA Accelerated Robot Localization and Mapping

CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging

CUDA and OpenCL-based asynchronous PSO

Titles: 100
open PDFs: 95
packages: 22
