Sorry, but there aren't any posts in the Downloads category yet.
These might be of interest, though:
- InteropUnityCUDA: A Tool for Interoperability Between Unity and CUDA
- MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications
- DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training
- LithOS: An Operating System for Efficient Machine Learning on GPUs
- Data-efficient LLM Fine-tuning for Code Generation
- Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework
- Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs
- A Power-Efficient Scheduling Approach in a CPU-GPU Computing System by Thread-Based Parallel Programming
- GPU-centric Communication Schemes for HPC and ML Applications
- GigaAPI for GPU Parallelization
- Advances in Semantic Patching for HPC-oriented Refactorings with Coccinelle
- TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
- Hardware-Assisted Software Testing and Debugging for Heterogeneous Computing
- Efficient allocation of image recognition and LLM tasks on multi-GPU system
- PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch