hgpu.org » Operating systems
Ardalan Amiri Sani, Lin Zhong, Dan S. Wallach
November 18, 2014 by hgpu
Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, Kenji Kono
July 4, 2014 by hgpu
Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil
Tags: Computer science, CUDA, nVidia, Operating systems, Performance, Tesla K20
June 25, 2014 by hgpu
Mario Kicherer, Wolfgang Karl
Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 275, nVidia GeForce GTX 560 Ti, Operating systems
May 17, 2014 by hgpu
Samaneh Kazemi, Rohan Garg, Gene Cooperman
December 24, 2013 by hgpu
Martin Krulis, Zbynek Falt, David Bednarek, Jakub Yaghob
Tags: Computer science, Heterogeneous systems, nVidia, nVidia GeForce GTX 580, OpenCL, Operating systems, Task scheduling, Tesla M2090
June 2, 2013 by hgpu
Mark Silberstein, Bryan Ford, Idit Keidar, Emmett Witchel
Tags: Computer science, CUDA, nVidia, Operating systems, Tesla C2075
January 26, 2013 by hgpu
Shinpei Kato
Tags: Computer science, CUDA, nVidia, Operating systems, Package
January 23, 2013 by hgpu
Liberios Vokorokos, Anton Balaz, Branislav Mados
January 12, 2013 by hgpu
Peter Fodrek, Tomas Murgas, Michal Blaho
Tags: Algorithms, Computer science, nVidia, OpenCL, Operating systems
December 1, 2012 by hgpu
Flavio Vella, Igor Neri, Osvaldo Gervasi, Sergio Tasso
September 17, 2012 by hgpu
Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee
Tags: Code generation, Computer science, GPU cluster, Heterogeneous systems, MPI, nVidia, nVidia GeForce GTX 480, OpenCL, Operating systems, Package, Programming Languages, Programming techniques
July 26, 2012 by hgpu




