high performance computing on graphics processing units: hgpu.org

Algorithms

HPC acceleration of large (min, +) matrix products to compute domination-type parameters in graphs

E.M. Garzón, J.A. Martínez, J.J. Moreno, M.L. Puertas

Source codes

Tags: Algorithms, Computer science, CUDA, Graph theory, HPC, nVidia, OpenMP, Package, Tesla V100

September 29, 2024 by hgpu

Supercharging Federated Learning with Flower and NVIDIA FLARE

Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac (Te-Chung) Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

Source codes

Tags: AI, Algorithms, Computer science, nVidia, Package, Security

July 7, 2024 by hgpu

GPU Implementations for Midsize Integer Addition and Multiplication

Cosmin E. Oancea, Stephen M. Watt

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia A100, Performance, Programming Languages

May 26, 2024 by hgpu

Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors

Andres E. Tomas, Enrique S. Quintana-Orti, Hartwig Anzt

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, nVidia A100

March 18, 2024 by hgpu

A Review of the Parallelization Strategies for Iterative Algorithms

Xingxing Zhou, Ming Ling, Shidi Tang, YanXiang Zhu

Tags: Algorithms, Computational Complexity, Computer science, CUDA, Data parallelism, OpenCL

December 3, 2023 by hgpu

RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs

Benjamin Brock, Aydın Buluç, Katherine Yelick

Tags: Algorithms, Computer science, CUDA, Matrix multiplication, MPI, nVidia, nVidia DGX-2, Sparse matrix, Tesla V100

December 3, 2023 by hgpu

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

Hyun Park, Parth Patel, Roland Haas, E. A. Huerta

Source codes

Tags: AI, Algorithms, Biology, Biomolecules, Biophysics, nVidia, nVidia A100, nVidia A40, Package

August 20, 2023 by hgpu

Fast Knowledge Graph Completion using Graphics Processing Units

Chun-Hee Lee, Dong-oh Kang, Hwa Jeon Song

Tags: AI, Algorithms, Computer science, CUDA, Databases, Graph theory, nVidia, nVidia A100

July 30, 2023 by hgpu

Tile-based Lightweight Integer Compression in GPU

Anil Shanbhag, Bobbi W. Yogatama, Xiangyao Yu, Samuel Madden

Source codes

Tags: Algorithms, Compression, Computer science, CUDA, nVidia, nVidia V100, Package

July 16, 2023 by hgpu

Matrix Multiplication Using Only Addition

Daniel Cussen, Jeffrey D. Ullman

Tags: Algorithms, Computer science, Matrix multiplication

July 9, 2023 by hgpu

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

Corey J. Nolet, Divye Gala, Alex Fender, Mahesh Doijade, Joe Eaton, Edward Raff, John Zedlewski, Brad Rees, Tim Oates

Source codes

Tags: Algorithms, Cluster analysis, Clustering, Computer science, CUDA, Hierarchical clustering, Machine learning, Nearest neighbour, nVidia, nVidia A100, nVidia DGX-1, Package

July 2, 2023 by hgpu

ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña

Tags: Algorithms, Benchmarking, Code generation, Compilers, Computer science, nVidia, OpenACC, OpenMP, Tesla A100

June 25, 2023 by hgpu

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

No More Shading Languages: Compiling C++ to Vulkan Shaders

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

Recent source codes

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
