Views of posts on hgpu.org

Tensor Computation Based on Heterogeneous Memory  595 views

Performance portability study of epistasis detection using SYCL on NVIDIA GPU  595 views

DGEMM on Integer Matrix Multiplication Unit  594 views

FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL  594 views

Deep Language Models for Software Testing and Optimisation  594 views

Statistical Computing With Graphics Processing Units  594 views

Dropbear: Machine Learning Marketplaces made Trustworthy with Byzantine Model Agreement  593 views

Pulsar search acceleration using FPGAs and OpenCL templates  589 views

Thwarting Piracy: Anti-debugging Using GPU-assisted Self-healing Codes  589 views

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference  588 views

Thread-safe lattice Boltzmann for high-performance computing on GPUs  587 views

PySAGES: flexible, advanced sampling methods accelerated with GPUs  587 views

Porting numerical integration codes from CUDA to oneAPI: a case study  586 views

Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL)  585 views

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality  585 views

GPU First – Execution of Legacy CPU Codes on GPUs  585 views

Software Optimization and Orchestration for Heterogeneous and Distributed Architectures  584 views

Novel Parallel Approaches to Efficiently Solve Spatial Problems on Heterogeneous CPU-GPU Systems  584 views

Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL  582 views

Solving MaxSAT with Matrix Multiplication  581 views

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis  581 views

Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow  580 views

Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments  580 views

Efficient Incremental Text-to-Speech on GPUs  579 views

Exploring Thread Coarsening on FPGA  578 views

Gallatin: A General-Purpose GPU Memory Manager  576 views

GPU Load Balancing  575 views

A Domain-Extensible Compiler with Controllable Automation of Optimisations  575 views

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures  569 views

Fuzzing Loop Optimizations in Compilers for C++ and Data-Parallel Languages  569 views

Simple and efficient GPU accelerated topology optimisation: Codes and applications  569 views

cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs  567 views

High Performance Simulation for Scalable Multi-Agent Reinforcement Learning  566 views

Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power  566 views

RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing  564 views

Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training  563 views

A Survey on Optimization Techniques for Edge Artificial Intelligence (AI)  562 views

Auto-SpMV: Automated Optimizing SpMV Kernels on GPU  562 views

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU  561 views

AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication  561 views

SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs  560 views

GPUNet: Searching the Deployable Convolution Neural Networks for GPUs  560 views

Improving Energy Efficiency of Basic Linear Algebra Routines on Heterogeneous Systems with Multiple GPUs  558 views

Edge AI for Internet of Energy: Challenges and Perspectives  558 views

Long Code for Code Search  557 views

ARK: GPU-driven Code Execution for Distributed Deep Learning  557 views

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay  556 views

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s  555 views

BenchDirect: A Directed Language Model for Compiler Benchmarks  553 views

SkyFlow: Heterogeneous streaming for skyline computation using FlowGraph and SYCL  551 views

Seer: Predictive Runtime Kernel Selection for Irregular Problems  551 views

Frameworks in Medical Image Analysis with Deep Neural Networks  550 views

mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL  549 views

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers  549 views

BaCO: A Fast and Portable Bayesian Compiler Optimization Framework  548 views

Optimization of massive data applications on heterogeneous architectures  546 views

Static and Dynamic Analyses for Efficient GPU Execution  545 views

Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU  544 views

ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills  544 views

HiRace: Accurate and Fast Source-Level Race Checking of GPU Programs  543 views

PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability  540 views

Using scheduling entropy amplification in CUDA/OpenMP code to exhibit non-reproducibility issues  539 views

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics  537 views

On the Three P’s of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability  537 views

Orca: FSS-based Secure Training with GPUs  531 views

Porting Batched Iterative Solvers onto Intel GPUs with SYCL  531 views

PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks  531 views

Analyzing GPU Performance in Virtualized Environments: A Case Study  530 views

Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems  529 views

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems  528 views

A Deep Learning Model for Loop Interchange  528 views

cuSLINK: Single-linkage Agglomerative Clustering on the GPU  528 views

Fortran High-Level Synthesis: Reducing the barriers to accelerating HPC codes on FPGAs  527 views

Hybrid quantum programming with PennyLane Lightning on HPC platforms  525 views

UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework  525 views

Compilation and Design Space Exploration of Dataflow Programs for Heterogeneous CPU-GPU Platforms  523 views

FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing  523 views

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies  521 views

An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing  521 views

CHARM-SYCL: New Unified Programming Environment for Multiple Accelerator Types  519 views

SYCL compute kernels for ExaHyPE  518 views

Performant low-order matrix-free finite element kernels on GPU architectures  518 views

cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications  517 views

DSDP: A Blind Docking Strategy Accelerated by GPUs  515 views

Matrix Multiplication Using Only Addition  512 views

EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs  511 views

Beehive SPIR-V Toolkit: A Composable and Functional API for Runtime SPIR-V Code Generation  511 views

CMLCompiler: A Unified Compiler for Classical Machine Learning  510 views

Hardware Checkpointing and Productive Debugging Flows for FPGAs  509 views

Scope is all you need: Transforming LLMs for HPC Code  508 views

Revisiting Query Performance in GPU Database Systems  506 views

Novel insights on atomic synchronization for sort-based group-by on GPUs  506 views

Efficient OpenCL system integration of non-blocking FPGA accelerators  506 views

A Study on the Intersection of GPU Utilization and CNN Inference  504 views

A Performance-Portable SYCL Implementation of CRK-HACC for Exascale  504 views

Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution  504 views

Out-of-the-box library support for DBMS operations on GPUs  503 views

Precision and Performance Analysis of C Standard Math Library Functions on GPUs  500 views

A High-Performance Computing Cluster for Distributed Deep Learning: A Practical Case of Weed Classification Using Convolutional Neural Network Models  499 views

Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications  499 views

Brief statistics for this page

Titles: 100

Total views: 54941

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
