Views of posts on hgpu.org
Tensor Computation Based on Heterogeneous Memory 595 views
Performance portability study of epistasis detection using SYCL on NVIDIA GPU 595 views
DGEMM on Integer Matrix Multiplication Unit 594 views
FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL 594 views
Deep Language Models for Software Testing and Optimisation 594 views
Statistical Computing With Graphics Processing Units 594 views
Dropbear: Machine Learning Marketplaces made Trustworthy with Byzantine Model Agreement 593 views
Pulsar search acceleration using FPGAs and OpenCL templates 589 views
Thwarting Piracy: Anti-debugging Using GPU-assisted Self-healing Codes 589 views
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference 588 views
Thread-safe lattice Boltzmann for high-performance computing on GPUs 587 views
PySAGES: flexible, advanced sampling methods accelerated with GPUs 587 views
Porting numerical integration codes from CUDA to oneAPI: a case study 586 views
Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL) 585 views
GPU First – Execution of Legacy CPU Codes on GPUs 585 views
Software Optimization and Orchestration for Heterogeneous and Distributed Architectures 584 views
Novel Parallel Approaches to Efficiently Solve Spatial Problems on Heterogeneous CPU-GPU Systems 584 views
Solving MaxSAT with Matrix Multiplication 581 views
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis 581 views
Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow 580 views
Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments 580 views
Efficient Incremental Text-to-Speech on GPUs 579 views
Exploring Thread Coarsening on FPGA 578 views
Gallatin: A General-Purpose GPU Memory Manager 576 views
GPU Load Balancing 575 views
A Domain-Extensible Compiler with Controllable Automation of Optimisations 575 views
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures 569 views
Fuzzing Loop Optimizations in Compilers for C++ and Data-Parallel Languages 569 views
Simple and efficient GPU accelerated topology optimisation: Codes and applications 569 views
cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs 567 views
High Performance Simulation for Scalable Multi-Agent Reinforcement Learning 566 views
Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power 566 views
RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing 564 views
Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training 563 views
A Survey on Optimization Techniques for Edge Artificial Intelligence (AI) 562 views
Auto-SpMV: Automated Optimizing SpMV Kernels on GPU 562 views
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU 561 views
AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication 561 views
SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs 560 views
GPUNet: Searching the Deployable Convolution Neural Networks for GPUs 560 views
Edge AI for Internet of Energy: Challenges and Perspectives 558 views
Long Code for Code Search 557 views
ARK: GPU-driven Code Execution for Distributed Deep Learning 557 views
Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay 556 views
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s 555 views
BenchDirect: A Directed Language Model for Compiler Benchmarks 553 views
SkyFlow: Heterogeneous streaming for skyline computation using FlowGraph and SYCL 551 views
Seer: Predictive Runtime Kernel Selection for Irregular Problems 551 views
Frameworks in Medical Image Analysis with Deep Neural Networks 550 views
mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL 549 views
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers 549 views
BaCO: A Fast and Portable Bayesian Compiler Optimization Framework 548 views
Optimization of massive data applications on heterogeneous architectures 546 views
Static and Dynamic Analyses for Efficient GPU Execution 545 views
Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU 544 views
ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills 544 views
HiRace: Accurate and Fast Source-Level Race Checking of GPU Programs 543 views
Using scheduling entropy amplification in CUDA/OpenMP code to exhibit non-reproducibility issues 539 views
APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics 537 views
Orca: FSS-based Secure Training with GPUs 531 views
Porting Batched Iterative Solvers onto Intel GPUs with SYCL 531 views
Analyzing GPU Performance in Virtualized Environments: A Case Study 530 views
Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems 529 views
Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems 528 views
A Deep Learning Model for Loop Interchange 528 views
cuSLINK: Single-linkage Agglomerative Clustering on the GPU 528 views
Fortran High-Level Synthesis: Reducing the barriers to accelerating HPC codes on FPGAs 527 views
Hybrid quantum programming with PennyLane Lightning on HPC platforms 525 views
UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework 525 views
Compilation and Design Space Exploration of Dataflow Programs for Heterogeneous CPU-GPU Platforms 523 views
FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing 523 views
An Asynchronous Dataflow-Driven Execution Model For Distributed Accelerator Computing 521 views
CHARM-SYCL: New Unified Programming Environment for Multiple Accelerator Types 519 views
SYCL compute kernels for ExaHyPE 518 views
Performant low-order matrix-free finite element kernels on GPU architectures 518 views
cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications 517 views
DSDP: A Blind Docking Strategy Accelerated by GPUs 515 views
Matrix Multiplication Using Only Addition 512 views
EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs 511 views
Beehive SPIR-V Toolkit: A Composable and Functional API for Runtime SPIR-V Code Generation 511 views
CMLCompiler: A Unified Compiler for Classical Machine Learning 510 views
Hardware Checkpointing and Productive Debugging Flows for FPGAs 509 views
Scope is all you need: Transforming LLMs for HPC Code 508 views
Revisiting Query Performance in GPU Database Systems 506 views
Novel insights on atomic synchronization for sort-based group-by on GPUs 506 views
Efficient OpenCL system integration of non-blocking FPGA accelerators 506 views
A Study on the Intersection of GPU Utilization and CNN Inference 504 views
A Performance-Portable SYCL Implementation of CRK-HACC for Exascale 504 views
Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution 504 views
Out-of-the-box library support for DBMS operations on GPUs 503 views
Precision and Performance Analysis of C Standard Math Library Functions on GPUs 500 views
Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications 499 views
Titles: 100
Total views: 54,941
- Programming - 186,132 views
- Login - 164,498 views
- User dashboard - 90,966 views
- Paper titles list - 70,484 views
- Add new event - 64,743 views
- Add new post - 59,470 views
- Register - 49,293 views
- Statistics - 36,832 views
- Modification of self-organizing migration algorithm for OpenCL framework - 34,169 views
- Books on OpenCL and CUDA - 28,871 views