hgpu.org » Streaming
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Janghaeng Lee, and Scott Mahlke
Tags: Compiler, CUDA, GPU, nVidia, nVidia GeForce GTX 285, Optimization, Portability, Streaming, Tesla C2050
March 31, 2012 by Moaddeli
* * *
Most viewed papers (last 30 days)
- Fortran High-Level Synthesis: Reducing the barriers to accelerating HPC codes on FPGAs
- PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability
- Compute units in OpenMP: Extensions for heterogeneous parallel programming
- Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation
- Many Cores, Many Models: GPU Programming Model vs. Vendor Compatibility Overview
- Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs
- Scope is all you need: Transforming LLMs for HPC Code
- Novel insights on atomic synchronization for sort-based group-by on GPUs
- Performant low-order matrix-free finite element kernels on GPU architectures
- HPAC-Offload: Accelerating HPC Applications with Portable Approximate Computing on the GPU
* * *