hgpu.org » Local memory
Jianbin Fang, Henk Sips, Ana Lucia Varbanescu
Tags: ATI, ATI Radeon HD 7970, Computer science, Intel Xeon Phi, Local memory, nVidia, OpenCL, Performance, Portability, Tesla C1060, Tesla C2050, Tesla K20
July 29, 2014 by jfang
Jianbin Fang, Henk Sips, Pekka Jaaskelainen, Ana Lucia Varbanescu
Tags: ATI, ATI Radeon HD 7970, Intel Xeon Phi, Local memory, nVidia, OpenCL, Reverse Engineering, Tesla C2050, Tesla K20
June 16, 2014 by jfang