high performance computing on graphics processing units: hgpu.org

hgpu.org » Vector Machine

MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs

Zeyi Wen, Rui Zhang, Kotagiri Ramamohanarao, Jianzhong Qi, Kerry Taylor

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce GTX 460, Vector Machine

September 25, 2014 by hgpu

MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures

Yang You, Shuaiwen Leon Song, Haohuan Fu, Andres Marquez, Guangwen Yang, Kevin Barker, Kirk W. Cameron, Maryam Mehri Dehnavi, Amanda Peters Randles

Tags: Computer science, CUDA, Intel Xeon Phi, Machine learning, nVidia, Tesla K20, Vector Machine

June 20, 2014 by hgpu

Luthier: Bridging Auto-Tuning and Vendor Libraries for Efficient Deep Learning Inference

Fused Kernel Library (FKL)

The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries

GPUHammer: Rowhammer Attacks on GPU Memories are Practical

Block: Balance Loader of LLM Serving with Context, Knowledge and Predictive Scheduling

Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling

SIGMo: Scalable Isomorphism Graph Matching on GPUs

SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching

DGEMM without FP64 Arithmetic - using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme

GEAK-agent: LLM-based AI agent, which can write correct and efficient GPU kernels automatically

Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

OpenDwarfs 2025: re-engineered version of the OpenDwarfs benchmark suite, for compatibility with modern platforms

OpenDwarfs 2025: Modernizing the OpenDwarfs Benchmark Suite for Heterogeneous Computing

Specx: Speculative task-based runtime system

Specx: a C++ task-based runtime system for heterogeneous distributed architectures

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org