hgpu.org » Operating systems
Ardalan Amiri Sani, Lin Zhong, Dan S. Wallach
November 18, 2014 by hgpu
Yusuke Suzuki, Shinpei Kato, Hiroshi Yamada, Kenji Kono
July 4, 2014 by hgpu
Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil
Tags: Computer science, CUDA, nVidia, Operating systems, Performance, Tesla K20
June 25, 2014 by hgpu
Mario Kicherer, Wolfgang Karl
Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 275, nVidia GeForce GTX 560 Ti, Operating systems
May 17, 2014 by hgpu
Samaneh Kazemi, Rohan Garg, Gene Cooperman
December 24, 2013 by hgpu
Martin Krulis, Zbynek Falt, David Bednarek, Jakub Yaghob
Tags: Computer science, Heterogeneous systems, nVidia, nVidia GeForce GTX 580, OpenCL, Operating systems, Task scheduling, Tesla M2090
June 2, 2013 by hgpu
Mark Silberstein, Bryan Ford, Idit Keidar, Emmett Witchel
Tags: Computer science, CUDA, nVidia, Operating systems, Tesla C2075
January 26, 2013 by hgpu
Shinpei Kato
Tags: Computer science, CUDA, nVidia, Operating systems, Package
January 23, 2013 by hgpu
Liberios Vokorokos, Anton Balaz, Branislav Mados
January 12, 2013 by hgpu
Peter Fodrek, Tomas Murgas, Michal Blaho
Tags: Algorithms, Computer science, nVidia, OpenCL, Operating systems
December 1, 2012 by hgpu
Flavio Vella, Igor Neri, Osvaldo Gervasi, Sergio Tasso
September 17, 2012 by hgpu
Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee
Tags: Code generation, Computer science, GPU cluster, Heterogeneous systems, MPI, nVidia, nVidia GeForce GTX 480, OpenCL, Operating systems, Package, Programming Languages, Programming techniques
July 26, 2012 by hgpu




