Views of posts on hgpu.org
Modification of self-organizing migration algorithm for OpenCL framework 34,162 views
The OoO VLIW JIT Compiler for GPU Inference 18,406 views
Parallel Ray Tracing Simulations with MATLAB for Dynamic Lens Systems 15,265 views
Data Layout Pruning on GPU 12,276 views
Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks 10,918 views
Code Optimization Techniques for Graphics Processing Units 10,632 views
FPGA implementation of a Convolutional Neural Network for "Wake up word" detection 10,172 views
OpenMP Programming on Intel® Xeon Phi™ Coprocessors: An Early Performance Comparison 9,990 views
Matrix inversion speed up with CUDA 9,938 views
Performance Evaluation of Container-based Virtualization for High Performance Computing Environments 9,854 views
OpenCL Programming by Example 9,689 views
Computing Treewidth on the GPU 9,676 views
Energy efficiency of finite difference algorithms on multicore CPUs, GPUs, and Intel Xeon Phi processors 9,518 views
OpenCL Actors – Adding Data Parallelism to Actor-based Programming with CAF 9,462 views
An Efficient Load Balancing Method for Tree Algorithms 9,373 views
GMM based Fisher vector calculation on GPGPU 9,327 views
Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices 9,303 views
GALARIO: a GPU Accelerated Library for Analysing Radio Interferometer Observations 9,133 views
Breaking DVB-CSA 8,667 views
Torch7: A Matlab-like Environment for Machine Learning 8,526 views
End-to-end Deep Learning of Optimization Heuristics 8,519 views
Experiences Building an MLIR-based SYCL Compiler 8,209 views
Accelerating Radio Astronomy with Auto-Tuning 7,875 views
GPU implementation of a deep learning network for image recognition tasks 7,860 views
4kUHD H264 wireless live video streaming using CUDA 7,838 views
Monte Carlo methods for massively parallel computers 7,720 views
PySPH: A Python framework for SPH 7,691 views
GPU Octrees and Optimized Search 7,600 views
Out-of-core Implementation for Accelerator Kernels on Heterogeneous Clouds 7,504 views
CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs 7,474 views
On Optimizing Complex Stencils on GPUs 7,474 views
Distributed wideband software-defined radio receiver for heterogeneous systems 7,397 views
Automated Testing of Graphics Shader Compilers 7,386 views
Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures 7,377 views
Compoundly weighted Voronoi: a sequential and parallel implementation 7,304 views
Meta Networks for Neural Style Transfer 7,296 views
IBM Deep Learning Service 7,249 views
NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems 7,206 views
Empower Sequence Labeling with Task-Aware Neural Language Model 7,191 views
Understanding the Topics and Challenges of GPU Programming by Classifying and Analyzing Stack Overflow Posts 7,145 views
Fast Parallel Sorting Algorithms on GPUs 6,994 views
A code-based analytical approach for using separate device coprocessors in computing systems 6,909 views
An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture 6,738 views
Quasi-real-time analysis of dynamic near field scattering data using a graphics processing unit 6,565 views
A Common GPU n-Dimensional Array for Python and C 6,553 views
Random Forests of Very Fast Decision Trees on GPU for Mining Evolving Big Data Streams 6,271 views
gSLIC: a real-time implementation of SLIC superpixel segmentation 6,226 views
A Comparative Study of 2D Numerical Methods with GPU Computing 6,206 views
Interactive Soft Tissue for Surgical Simulation 6,174 views
An octree-based proxy for collision detection in large-scale particle systems 6,126 views
SoAx: A generic C++ Structure of Arrays for handling Particles in HPC Codes 6,123 views
Implementing Level-3 BLAS Routines in OpenCL on Different Processing Units 6,058 views
GPU-PIV 6,029 views
Deep learning for galaxy surface brightness profile fitting 5,963 views
The CUDA Handbook: A Comprehensive Guide to GPU Programming 5,961 views
libWater: Heterogeneous Distributed Computing Made Easy 5,904 views
Advanced 2D Rasterization on Modern CPUs 5,862 views
Sorting with GPUs: A Survey 5,813 views
Report: Performance comparison between C2075 and P100 GPU cards using cosmological correlation functions 5,728 views
Accelerating Genomics Research with OpenCL and FPGAs 5,728 views
Optimization of the Brillouin operator on the KNL architecture 5,697 views
BIDMach: Large-scale Learning with Zero Memory Allocation 5,686 views
Fast in-place sorting with CUDA based on bitonic sort 5,671 views
DTAM: Dense tracking and mapping in real-time 5,608 views
Industrial Robot Collision Handling in Harsh Environments 5,597 views
Language Modeling with Gated Convolutional Networks 5,593 views
Build and Travel KD-Tree with CUDA 5,592 views
Synkhronos: a Multi-GPU Theano Extension for Data Parallelism 5,583 views
Usage of GPU in LS-DYNA 5,569 views
Hydra: a C++11 framework for data analysis in massively parallel platforms 5,532 views
Best Practice Guide – GPGPU 5,513 views
OpenCL Programming Guide 5,471 views
GPU sample sort 5,462 views
Vectorized algorithm for multidimensional Monte Carlo integration on modern GPU, CPU and MIC architectures 5,445 views
vCUDA Framework Development for GPU Virtualization 5,433 views
Parallel Medical Image Reconstruction: From Graphics Processors to Grids 5,341 views
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs 5,326 views
Radeon PRO Solid State Graphics (SSG) API User Manual 5,313 views
Acceleration of tensor-product operations for high-order finite element methods 5,312 views
Collision Detection Based on Fuzzy Scene Subdivision 5,284 views
Launch-time Optimization of OpenCL Kernels 5,282 views
Simulation of Biological Tissue using Mass-Spring-Damper Models 5,277 views
Efficient Algorithms for Sorting on GPUs 5,265 views
Implementing Neural Networks Efficiently 5,264 views
On Pre-Trained Image Features and Synthetic Images for Deep Learning 5,190 views
A Dynamic Hash Table for the GPU 5,187 views
A novel sorting algorithm for many-core architectures based on adaptive bitonic sort 5,167 views
Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs 5,141 views
Parallel Neural Network Training with OpenCL 5,137 views
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort 5,129 views
Cue-independent extending inverse kinematics for robust pose estimation in 3D point clouds 5,125 views
Titles: 100
Total views: 736,254
- Programming - 186,127 views
- Login - 164,291 views
- User dashboard - 90,394 views
- Paper titles list - 69,712 views
- Add new event - 64,542 views
- Add new post - 59,122 views
- Register - 49,136 views
- Statistics - 36,280 views
- Modification of self-organizing migration algorithm for OpenCL framework - 34,162 views
- Books on OpenCL and CUDA - 28,763 views