Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 480, Operating systems, Package
Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 480, Operating systems, Package, Task scheduling
Tags: Algorithms, Benchmarking, Computer science, nVidia, nVidia Quadro FX 3800, OpenCL, OpenMP, Operating systems, Performance
Tags: Computer science, CUDA, Heterogeneous systems, nVidia, Operating systems, Software Engineering, Tesla C2070
Tags: ATI, ATI Radeon HD 6970, Computer science, Heterogeneous systems, MPI, nVidia, nVidia GeForce GTX 480, OpenCL, Operating systems, Package
Tags: AES, ATI, ATI Radeon HD 6750 M, Computer science, nVidia, nVidia GeForce GTX 580, nVidia GeForce GTX 590, OpenCL, Operating systems, Security, Tesla M2070
Tags: Computer science, CUDA, Energy-efficient computing, nVidia, nVidia GeForce GTX 210, nVidia GeForce GTX 280, nVidia GeForce GTX 570, Operating systems, Task scheduling
Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 480, Operating systems, Package
Tags: Cloud, Computer science, GPU cluster, nVidia, nVidia GeForce GTX 480, OpenCL, Operating systems, Package
Tags: ATI, ATI Radeon HD 5870, Computer science, Heterogeneous systems, OpenCL, Operating systems, Package