
Views of posts on hgpu.org

Benchmarking optimization algorithms for auto-tuning GPU kernels  752 views

Domain-Specific On-Device Object Detection Method  752 views

Machine Learning for CUDA+MPI Design Rules  751 views

Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures  751 views

Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training  750 views

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning  750 views

Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge  750 views

Training a Vision Transformer from scratch in less than 24 hours with 1 GPU  749 views

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml  747 views

Beyond Desktop Computation: Challenges in Scaling a GPU Infrastructure  747 views

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators  747 views

BenchPress: A Deep Active Benchmark Generator  746 views

Fully Concurrent GPU Data Structures  745 views

cuPSO: GPU Parallelization for Particle Swarm Optimization Algorithms  743 views

Benchmarking and Dissecting the Nvidia Hopper GPU Architecture  742 views

Teaching Parallel Programming in Containers: Virtualization of a Heterogeneous Local Infrastructure  741 views

Performance portability evaluation of blocked stencil computations on GPUs  741 views

Deep Learning Models on CPUs: A Methodology for Efficient Training  739 views

Source-to-Source Automatic Differentiation of OpenMP Parallel Loops  739 views

Parallel programming in mobile devices with FancyJCL  738 views

Performance study on GPU offloading techniques using the Gauss matrix inverse algorithm  738 views

FPGA Accelerators on Heterogeneous Systems: An Approach Using High Level Synthesis  738 views

Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL  737 views

Monitoring Collective Communication Among GPUs  736 views

Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning  736 views

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs  733 views

Fast Arbitrary Precision Floating Point on FPGA  732 views

A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code  732 views

Principles for Automated and Reproducible Benchmarking  731 views

Collage: Automated Integration of Deep Learning Backends  731 views

A Ray Tracing Implementation Performance Comparison between the CPU and the GPU  731 views

A Variant of Concurrent Constraint Programming on GPU  731 views

Parallel and Heterogeneous Timing Analysis: Partition, Algorithm, and System  730 views

OpenCL-HPX Integration  730 views

Fast convolution kernels on pascal GPU with high memory efficiency  730 views

Towards making the most of NLP-based device mapping optimization for OpenCL kernels  729 views

Autotuning CUDA: Applying NLP Techniques to LS-CAT  728 views

Fast GPU bounding boxes on tree-structured scenes  728 views

ALPINIST: An Annotation-Aware GPU Program Optimizer  728 views

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks  728 views

Concurrency Mapping to FPGAs with OpenCL: A Case Study with a Shallow Water Kernel  726 views

Capturing the Memory Topology of GPUs  725 views

Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis  724 views

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism  724 views

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity  723 views

GGArray: A Dynamically Growable GPU Array  720 views

Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads  720 views

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models  719 views

The Application of AI Technology in GPU Scheduling Algorithm Optimization  718 views

MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring  718 views

IgNet. A Super-precise Convolutional Neural Network  718 views

Testing and Mutation Testing for GPU Kernels  715 views

Compiler Technologies in Deep Learning Co-Design: A Survey  715 views

Demystifying Dependency Bugs in Deep Learning Stack  711 views

OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials  710 views

Simulation Methodologies for Mobile GPUs  708 views

Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations  708 views

Providing performance portable numerics for Intel GPUs  706 views

An Open-source FPGA Library for Data Sorting  706 views

Deductive verification for SYCL  704 views

Extending MAGMA Portability with OneAPI  703 views

Three Contributions to the Theory and Practice of Optimizing Compilers  701 views

Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI  700 views

On Scheduling Ring-All-Reduce Learning Jobs in Multi-Tenant GPU Clusters with Communication Contention  699 views

SciAI4Industry – Solving PDEs for industry-scale problems with deep learning  698 views

CPU-GPU Layer-Switched Low Latency CNN Inference  698 views

Reducing Synchronous GPU Memory Transfers: Design and implementation of a Futhark compiler optimisation  698 views

Optimizing Deep Learning Models For Raspberry Pi  697 views

OpenMP Advisor  696 views

Safe and Practical GPU Acceleration in TrustZone  696 views

Accelerating bioinformatics applications on CUDA-enabled multi-GPU systems  696 views

Evaluation of Rust for GPGPU high-performance computing  696 views

GT4Py: High Performance Stencils for Weather and Climate Applications using Python  696 views

From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels  696 views

Using AI libraries for Incompressible Computational Fluid Dynamics  694 views

Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading  693 views

Electrical-Level Attacks on CPUs, FPGAs, and GPUs: Survey and Implications in the Heterogeneous Era  693 views

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs  693 views

Julia as a unifying end-to-end workflow language on the Frontier exascale system  693 views

OpenMP offload at the Exascale using Intel GPU Max 1550: evaluation of STREAmS compressible solver  693 views

High Performance Privacy Preserving AI  692 views

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations  691 views

Manas: Mining Software Repositories to Assist AutoML  691 views

A tool set for random number generation on GPUs in R  691 views

Challenges and Opportunities in C/C++ Source-To-Source Compilation  690 views

Strega: An HTTP Server for FPGAs  689 views

Evaluating the Wide Area Classroom After 24,000 HPC Students  687 views

Cramming: Training a Language Model on a Single GPU in One Day  687 views

An OpenCL-Based FPGA Accelerator for Faster R-CNN  685 views

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU  685 views

Real-Time High-Performance Computing for Embedded Control Systems  685 views

Evaluation of Pseudo-Random Number Generation on GPU Cards  685 views

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays  685 views

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks  684 views

The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science  682 views

Challenges and Techniques for Transparent Acceleration of Unmodified Big Data Applications  681 views

Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance  680 views

AsymML: An Asymmetric Decomposition Framework for Privacy-Preserving DNN Training and Inference  680 views

Behavioral graph fraud detection in E-commerce  679 views

Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment  679 views

 

Brief statistics for this page

Titles: 100

Total views: 71512

 


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
