Papers on hgpu.org (.txt-file)
Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA
Coating Process Monitoring Using Computer Vision
CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning
Code Generation Compiler for the OpenMP 4.0 Accelerator Model onto OMPSS
Code Generation for a Variety of Accelerators for a Graph DSL
Code Generation for Embedded Heterogeneous Architectures on Android
Code Generation for High-Level Synthesis of Multiresolution Applications on FPGAs
Code Generation from Functional to Imperative: Combining Destination-Passing Style and Views
Code Optimization and Performance Analysis of Oceanographic Software Package NEMO for GPGPU Systems
Code Optimization and Scaling of the Astrophysics Software Gadget on Intel Xeon Phi
Code optimization based on source to source transformations using profile guided metrics
Code Optimization on Kepler GPUs and Xeon Phi
Code Optimization Techniques for Graphics Processing Units
Code Refinement of Stencil Codes
Coding Ants: Using Ant Colony Optimization to Accelerate CT Reconstruction
CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices
Cofactorization on Graphics Processing Units
COFFEE: an Optimizing Compiler for Finite Element Local Assembly
Cognitive radio network for the smart grid: Experimental system architecture, control algorithms, security, and microgrid testbed
Coherence aware GPU-based ray casting for virtual colonoscopy
Coherent Photon Mapping on the Intel MIC Architecture
Coherent Spatiotemporal Filtering, Upsampling and Rendering of RGBZ Videos
Coherent transport by adiabatic passage on atom chips
Collaborative design and optimization using Collective Knowledge
Collaborative Diffusion on the GPU for Path-Finding in Games
Collaborative diffusion: programming antiobjects
Collaborative execution environment for heterogeneous parallel systems
Collage: Automated Integration of Deep Learning Backends
Collection skeletons: declarative abstractions for data collections
Collision Detection Based on Fuzzy Scene Subdivision
Collision Detection of Triangle Meshes using GPU
Collision detection on the GPU
Collision Detection: Broad Phase Adaptation from Multi-Core to Multi-GPU Architecture
Collision for 75-step SHA-1: Intensive Parallelization with GPU
Collision-Driven Volumetric Deformation on the GPU
Collision-streams: fast GPU-based collision detection for deformable models
Color and motion-based particle filter target tracking in a network of overlapping cameras with multi-threading and GPGPU
Color Correction Acceleration Using a Color Cube and OpenCL
Color Me Noisy: Example-based Rendering of Hand-colored Animations with Temporal Noise Control
Color Seamlessness in Multi-Projector Displays Using Constrained Gamut Morphing
Colored stochastic shadow maps
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Colour flux-tubes in static Pentaquark and Tetraquark systems
Combinatorial Optimization of Work Distribution on Heterogeneous Systems
Combined acoustic and optical trapping
Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL
Combining approximate inference methods for efficient learning on large computer clusters
Combining Belief Propagation and Successive Cancellation List Decoding of Polar Codes on a GPU Platform
Combining computer vision and physics simulations using GPGPU
Combining Data Parallelism and Task Parallelism for Efficient Performance on Hybrid CPU and GPU Systems
Combining Multiple Optimised FPGA-based Pulsar Search Modules Using OpenCL
Combining recent HPC techniques for 3D geophysics acceleration
Combustion Simulations Using Graphic Processing Units
Coming Soon: Research in a Cloud
Communication and Coordination Paradigms for Highly-Parallel Accelerators
Communication Architectures for Scalable GPU-centric Computing Systems
Communication Optimization for Multi GPU Implementation of Smith-Waterman Algorithm
Communication-Avoiding Optimization of Geometric Multigrid on GPUs
Communication-avoiding QR decomposition for GPUs
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Communication-Minimizing 2D Convolution in GPU Registers
Communication-minimizing Asynchronous Tensor Parallelism
Community Structure Discovery algorithm on GPU with CUDA
Compact data structure and scalable algorithms for the sparse grid technique
Comparative Analysis of OpenACC, OpenMP and CUDA using Sequential and Parallel Algorithms
Comparative Evaluation of Binary Features
Comparative evaluation of platforms for parallel Ant Colony Optimization
Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU
Comparative Performance and Scalability Analysis of GPU-accelerated Database Operations
Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning
Comparative Study of Frequent Itemset Mining Techniques on Graphics Processor
Comparative Study of High Performance Computing Using Multi-core Parallel Systems
Comparative study of parallel programming models for multicore computing
Comparative Study of the Parallelization of the Smith-Waterman Algorithm on OpenMP and Cuda C
Comparing CUDA and OpenGL implementations for a Jacobi iteration
Comparing CUDA, OpenCL and OpenGL Implementations of the Cardiac Monodomain Equations
Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels
Comparing FPGAs to Graphics Accelerators and the Playstation 2 Using a Unified Source Description
Comparing GPU and CPU in OLAP Cubes Creation
Comparing GPU-based multi-volume ray casting techniques
Comparing Hardware Accelerators in Scientific Applications: A Case Study
Comparing Intra- and Inter-Processor Parallelism on Multi-Core Cell Processors for Scientific Simulations
Comparing Linear and Convex Relaxations for Stereo and Motion
Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation
Comparing Many-Core Accelerator Frameworks
Comparing Parallel Hardware Architectures for Visually Guided Robot Navigation
Comparing Parallel Simulation of Social Agents using Cilk and OpenCL
Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing
Comparing Performance and Portability between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs
Comparing Programmer Productivity in OpenACC and CUDA: an Empirical Investigation
Comparing SYCL data transfer strategies for tracking use cases
Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips
Comparing the Power and Performance of Intel’s SCC to State-of-the-Art CPUs and GPUs
Comparing the Treecode with FMM on GPUs for vortex particle simulations of a leapfrogging vortex ring
Comparing Two Generations of Embedded GPUs Running a Feature Detection Algorithm
Comparison and Analysis of GPGPU and Parallel Computing on Multi-Core CPU
Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL
Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL
Titles: 100
open PDFs: 94
packages: 12