high performance computing on graphics processing units: hgpu.org

Approaches for the Parallelization of Software Implementation of Integer Multiplication

Vladislav Kovtun, Andrew Okhrimenko

Tags: Algorithms, AMD FireStream 9370, ATI, Computational Complexity, Computer science, nVidia, OpenCL, OpenMP, Security, Tesla M2090

August 22, 2012 by hgpu

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
