Views of posts on hgpu.org
Automatically Harnessing Sparse Acceleration 1,116 views
A Newcomer In The PGAS World – UPC++ vs UPC: A Comparative Study 1,115 views
Transparent Checkpointing for OpenGL Applications on GPUs 1,115 views
GPU-Based Hierarchical Computations for View Independent Visibility 1,114 views
CASE: A Compiler-Assisted SchEduling Framework for Multi-GPU Systems 1,112 views
Supporting CUDA for an extended RISC-V GPU architecture 1,112 views
Exploiting SPMD Horizontal Locality to Improve Memory Efficiency 1,112 views
Scalable and Parallel Implementation of a Financial Application on a GPU: With Focus on Out-of-Core Case 1,109 views
An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs 1,109 views
Efficient Video Compression via Content-Adaptive Super-Resolution 1,108 views
Scalable instruction set simulator for thousand-core architectures running on GPGPUs 1,106 views
Instruments of Productivity for High Performance Computing 1,106 views
Implementing a GPU-Enhanced Cluster for Large-Scale Simulations 1,105 views
Challenging cloning related problems with GPU-based algorithms 1,103 views
Exploiting GPU On-chip Shared Memory for Accelerating Schedulability Analysis 1,102 views
NVIDIA CUDA software and gpu parallel computing architecture 1,101 views
Streaming architectures and technology trends 1,101 views
Migrating real-time depth image-based rendering from traditional to next-gen GPGPU 1,101 views
Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach 1,101 views
LAMDA: Learning-Assisted Multi-Stage Autotuning for FPGA Design Closure 1,100 views
Hardware Acceleration of HPC Computational Flow Dynamics using HBM-enabled FPGAs 1,100 views
Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs 1,099 views
TopicBERT for Energy Efficient Document Classification 1,097 views
A Survey of Machine Learning for Computer Architecture and Systems 1,097 views
LoopBench: An Evaluation of Loop Acceleration in Heterogeneous Systems 1,094 views
Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach 1,094 views
Parallel Arbitrary-precision Integer Arithmetic 1,092 views
Dynamic GPU Energy Optimization for Machine Learning Training Workloads 1,091 views
Performance study of mapping irregular computations on GPUs 1,089 views
Performance Optimisations for Heterogeneous Managed Runtime Systems 1,088 views
End-to-end Optimization of Machine Learning Prediction Queries 1,088 views
Multicore performance optimization using partner cores 1,086 views
Compiler-Based Tools to Aid in Data Transfer Optimization and On-Chip Debug of Heterogeneous Compute Systems 1,086 views
Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels 1,086 views
Optimisation and GPU code generation of Stencils for Futhark 1,086 views
Performance Analysis of a High-level Abstractions-based Hydrocode on Future Computing Systems 1,085 views
Modeling Parallel Programs using Large Language Models 1,084 views
Exploring Applications in CUDA 1,084 views
Optimization of tele-immersion codes 1,083 views
Level-of-Detail Triangle Strips for Deforming Meshes 1,082 views
Accelerating Concurrent Heap on GPUs 1,080 views
Visualization of level-of-detail meshes on the GPU 1,080 views
Study for measurement method for coal volume on base of GPU 1,079 views
Heuristic Optimization Methods for Improving Performance of Recursive General Purpose Applications on GPUs 1,079 views
HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark 1,079 views
Taking the graphics processor beyond graphics 1,078 views
Direct Self-Consistent Field Computations on GPU Clusters 1,077 views
KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks 1,076 views
Using a GPU to accelerate die and mold fabrication 1,076 views
Implicit Feature-Based Alignment System for Radiotherapy 1,076 views
Employ Bump Mapping to Enrich the 3D NPR Image 1,076 views
Simulation Studies of Viral Advertisement Diffusion on Multi-GPU 1,074 views
FSpGEMM: An OpenCL-based HPC Framework for Accelerating General Sparse Matrix-Matrix Multiplication on FPGAs 1,073 views
Real-time Geometric Calibration on graphics processing unit with CUDA 1,072 views
CuPBoP-AMD: Extending CUDA to AMD Platforms 1,072 views
A load balance multi-scheduling model for OpenCL kernel tasks in an integrated cluster 1,071 views
Performance analysis and optimization of highly diverging algorithms on GPUs 1,071 views
Embedded Software Synthesis using Heterogeneous Dataflow Models 1,071 views
Fast Turnaround HLS Debugging using Dependency Analysis and Debug Overlays 1,068 views
TorchAudio: Building Blocks for Audio and Speech Processing 1,068 views
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer 1,067 views
iGUARD: In-GPU Advanced Race Detection 1,066 views
Simulation Modelling and Visualisation: Toolkits for Building Artificial Worlds 1,066 views
AVEC: Accelerator Virtualization in Cloud-Edge Computing for Deep Learning Libraries 1,065 views
Model-based optimization of MPDATA on Intel Xeon Phi through load imbalancing 1,065 views
Analysis and Comparison of Performance and Power Consumption of Neural Networks on CPU, GPU, TPU and FPGA 1,065 views
Efficient code generation for hardware accelerators by refining partially specified implementation 1,064 views
Productive Performance Engineering for Weather and Climate Modeling with Python 1,064 views
Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks? 1,062 views
BASEMENT v3: a modular freeware for river process modelling over multiple computational backends 1,060 views
Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs 1,060 views
The Ecological Footprint of Neural Machine Translation Systems 1,057 views
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices 1,055 views
Reducing IO bandwidth for GPU based moment invariant classifier systems 1,053 views
94% on CIFAR-10 in 3.29 Seconds on a Single GPU 1,052 views
Case Study: GPU-based implementation of sequence pair based floorplanning using CUDA 1,051 views
The Celerity High-level API: C++20 for Accelerator Clusters 1,051 views
An Accelerated IHS Transform Fusion of Remote Sensing Image Data Based on GPU 1,051 views
PeriPy – A High Performance OpenCL Peridynamics Package 1,050 views
GPGPU flow 1,050 views
Software Testing – Test Suite Compilation and Execution Optimizations 1,049 views
Acceleration of the Method of Moments Calculations by Using Graphics Processing Units 1,049 views
Enhancing Performance of Simulations using GPGPU 1,043 views
Effective GPU Sharing Under Compiler Guidance 1,042 views
How to Render FDTD Computations More Effective Using a Graphics Accelerator 1,041 views
BootCMatchG: An adaptive Algebraic MultiGrid linear solver for GPUs 1,041 views
Deep Graph Learning for Program Analysis and System Optimization 1,040 views
Custom Code Generation for a Graph DSL 1,040 views
Accelerating Regular-Expression Matching on FPGAs with High-Level Synthesis 1,040 views
High Performance GPU Code Generation for Matrix-Matrix Multiplication using MLIR: Some Early Results 1,040 views
AZP: Automatic Specialization for Zero Values in Gaming Applications 1,040 views
Migrating CUDA to oneAPI: A Smith-Waterman Case Study 1,039 views
Worst-Case Execution Time Guarantees for Runtime-Reconfigurable Architectures 1,038 views
Fast CUDA-Aware MPI Datatypes without Platform Support 1,036 views
Multi-level parallelization for hybrid ACO 1,036 views
Parallel computing with CUDA 1,034 views
Titles: 100
Total views: 107,574