high performance computing on graphics processing units: hgpu.org

Event-Based OpenMP Tasks for Time-Sensitive GPU-Accelerated Systems

Cyril Cetre, Chenle Yu, Sara Royuela, Rémi Barrere, Eduardo Quiñones, Damien Gratadour

Tags: AMD Radeon Instinct MI100, ATI, Computer science, CUDA, HIP, HPC, nVidia, nVidia A100, Performance

October 6, 2024 by hgpu

Benchmarking Thread Block Cluster

Tim Lühnen, Tobias Marschner, Sohan Lal

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia H100, PTX

October 6, 2024 by hgpu

HPC acceleration of large (min, +) matrix products to compute domination-type parameters in graphs

E.M. Garzón, J.A. Martínez, J.J. Moreno, M.L. Puertas

Tags: Algorithms, Computer science, CUDA, Graph theory, HPC, nVidia, OpenMP, Package, Tesla V100

September 29, 2024 by hgpu

Automatic Generation of OpenCL Code through Polyhedral Compilation with LLM

Marek Palkowski, Mateusz Gruzewski

Tags: AMD Radeon RX 6800, ATI, Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, OpenCL, OpenMP

September 29, 2024 by hgpu

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Xinyao Yi

Tags: Computer science, CUDA, Heterogeneous systems, HPC, nVidia, nVidia A100, OpenMP, Performance, Programming techniques

September 22, 2024 by hgpu

DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL

Arturo Gonzalez-Escribano, Diego García-Álvarez, Jesús Cámara

Tags: Bioinformatics, Biology, CUDA, MPI, nVidia, nVidia A100, nVidia GeForce GTX Titan X, nVidia GeForce RTX 4050, OpenCL, Optimization, Sequence alignment

September 15, 2024 by hgpu

Enhancing Code Portability, Problem Scale, and Storage Efficiency in Exascale Applications

Nigel Tan

Tags: AMD Radeon Vega VII, ATI, Computer science, CUDA, Heterogeneous systems, HPC, nVidia, nVidia A100, nVidia DGX-A100, nVidia Quadro RTX 5000, OpenMP, OpenMPI, Particle-in-cell methods, performance portability, Tesla V100, Thesis

September 15, 2024 by hgpu

Refining HPCToolkit for application performance analysis at exascale

Laksono Adhianto, Jonathon Anderson, Robert Matthew Barnett, Dragana Grbic, Vladimir Indic, Mark Krentel, Yumeng Liu, Srdan Milaković, Wileam Phan, John Mellor-Crummey

Tags: AMD Radeon Instinct MI300A, ATI, Computer science, CUDA, HPC, MPI, OpenCL, Package, Performance

September 15, 2024 by hgpu

Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee

Chayanon (Namo) Wichitrnithed, Woo-Sun Yang, Yun (Helen) He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson Jr., Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste

Tags: Computer science, CUDA, Fortran, MPI, nVidia, nVidia A100, OpenMP, Optimization, Package, Weather prediction

September 15, 2024 by hgpu

Owl: Differential-based Side-Channel Leakage Detection for CUDA Applications

Yu Zhao, Wenjie Xue, Weijie Chen, Weizhong Qiang, Deqing Zou, Hai Jin

Tags: Computer science, CUDA, nVidia, nVidia RTX A4000, Package, Security

September 1, 2024 by hgpu

VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing

Jaebeom Jeon, Minseong Gil, Junsu Kim, Jaeyong Park, Gunjae Koo, Myung Kuk Yoon, Yunho Oh

Tags: AI, Artificial intelligence, Computer science, CUDA, Deep learning, Neural networks, nVidia, nVidia Jetson AGX Orin, Performance

September 1, 2024 by hgpu

Exploring Scalability in C++ Parallel STL Implementations

Ruben Laso, Diego Krupitza, Sascha Hunold

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia Ampere A2, OpenMP, Package, Performance, Tesla T4

September 1, 2024 by hgpu

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
