hgpu.org » Operating systems
Ardalan Amiri Sani, Lin Zhong, Dan S. Wallach
November 18, 2014 by hgpu
Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, Kenji Kono
July 4, 2014 by hgpu
Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil
Tags: Computer science, CUDA, nVidia, Operating systems, Performance, Tesla K20
June 25, 2014 by hgpu
Mario Kicherer, Wolfgang Karl
Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 275, nVidia GeForce GTX 560 Ti, Operating systems
May 17, 2014 by hgpu
Samaneh Kazemi, Rohan Garg, Gene Cooperman
December 24, 2013 by hgpu
Martin Krulis, Zbynek Falt, David Bednarek, Jakub Yaghob
Tags: Computer science, Heterogeneous systems, nVidia, nVidia GeForce GTX 580, OpenCL, Operating systems, Task scheduling, Tesla M2090
June 2, 2013 by hgpu
Mark Silberstein, Bryan Ford, Idit Keidar, Emmett Witchel
Tags: Computer science, CUDA, nVidia, Operating systems, Tesla C2075
January 26, 2013 by hgpu
Shinpei Kato
Tags: Computer science, CUDA, nVidia, Operating systems, Package
January 23, 2013 by hgpu
Liberios Vokorokos, Anton Balaz, Branislav Mados
January 12, 2013 by hgpu
Peter Fodrek, Tomas Murgas, Michal Blaho
Tags: Algorithms, Computer science, nVidia, OpenCL, Operating systems
December 1, 2012 by hgpu
Flavio Vella, Igor Neri, Osvaldo Gervasi, Sergio Tasso
September 17, 2012 by hgpu
Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee
Tags: Code generation, Computer science, GPU cluster, Heterogeneous systems, MPI, nVidia, nVidia GeForce GTX 480, OpenCL, Operating systems, Package, Programming Languages, Programming techniques
July 26, 2012 by hgpu