Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Janghaeng Lee, and Scott Mahlke
Tags: Compiler, CUDA, GPU, nVidia, nVidia GeForce GTX 285, Optimization, Portability, Streaming, Tesla C2050
March 31, 2012 by Moaddeli