Views of posts on hgpu.org
GPU Accelerated Greedy Algorithms for Compressed Sensing 3,025 views
libmolgrid: GPU Accelerated Molecular Gridding for Deep Learning Applications 3,024 views
Implementing Strassen’s Algorithm with CUTLASS on NVIDIA Volta GPUs 3,022 views
Hybrid CPU-GPU Implementation of Tracking-Learning-Detection Algorithm 3,022 views
Accelerating the ANSYS Direct Sparse Solver with GPUs 3,019 views
A Parallel Algorithm of PCA-SIFT Based on CUDA 3,019 views
Programming Frameworks for Distributed Smartphone Computing 3,019 views
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? 3,019 views
KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting 3,017 views
A simple GPU-based approach for 3D Voronoi diagram construction and visualization 3,014 views
Support for Parallel Scan in OpenMP 3,014 views
Legolizer: A Real-Time System for Modeling and Rendering LEGO Representations of Boundary Models 3,013 views
The Accelerator Wall: Limits of Chip Specialization 3,011 views
SWM: Simplified Wu-Manber for GPU-based Deep Packet Inspection 3,010 views
Hybrid CPU-GPU Pipeline Framework 3,010 views
GPU Virtualization 3,010 views
Interleaving and Lock-Step Semantics for Analysis and Verification of GPU Kernels 3,010 views
A Case for Work-stealing on FPGAs with OpenCL Atomics 3,010 views
GPU Accelerated Face Detection (thesis) 3,009 views
A Data-Parallel Graphics Pipeline Implemented in OpenCL 3,008 views
Swendsen-Wang Multi-Cluster Algorithm for the 2D/3D Ising Model on Xeon Phi and GPU 3,008 views
Towards Interactive Visual Exploration of Parallel Programs using a Domain-specific Language 3,007 views
Fractal Based Method on Hardware Acceleration for Natural Environments 3,006 views
On the Fly Porn Video Blocking Using Distributed Multi-GPU and Data Mining Approach 3,006 views
A design case study: CPU vs. GPGPU vs. FPGA 3,006 views
GPU-Based Airway Tree Segmentation and Centerline Extraction 3,006 views
High Performance Histograms on SIMT and SIMD Architectures 3,006 views
A Hybrid Approach to Parallel Connected Component Labeling Using CUDA 3,005 views
Hardware Transactional Memory for GPU Architectures 3,004 views
A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers 3,004 views
Swarm-NG: a CUDA Library for Parallel n-body Integrations with focus on Simulations of Planetary Systems 3,003 views
Computing Strongly Connected Components with CUDA 3,003 views
Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit 3,002 views
Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths 3,002 views
Multi-Scale, Multi-Level, Heterogeneous Features Extraction and Classification of Volumetric Medical Images 3,002 views
Molecular dynamics simulations through GPU video games technologies 3,001 views
3D tumor localization through real-time volumetric x-ray imaging for lung cancer radiotherapy 3,001 views
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication 3,001 views
Face Recognition Using OpenCL 3,000 views
Theano-based Large-Scale Visual Recognition with Multiple GPUs 3,000 views
Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method 2,999 views
OpenGL SuperBible: Comprehensive Tutorial and Reference (5th Edition) 2,998 views
Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer 2,997 views
The GPU Computing Era 2,996 views
Rubus: A compiler for seamless and extensible parallelism 2,996 views
A short guide to CUDA C: For physicists with multi-core graphics cards 2,994 views
Fast Burrows Wheeler Compression Using CPU and GPU 2,994 views
FastTree: A Hardware KD-Tree Construction Acceleration Engine for Real-Time Ray Tracing 2,994 views
Adapting the GA Approach to Solve Traveling Salesman Problems on CUDA Architecture 2,993 views
Matrix Multiplication with CUDA – A basic introduction to the CUDA programming model 2,992 views
Anisotropic Kuwahara Filtering on the GPU 2,991 views
Parallelization of the Generalized Hough Transform on GPU 2,991 views
Demystifying GPU microarchitecture through microbenchmarking 2,989 views
Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems 2,989 views
A Comparison of Modern GPU and CPU Architectures: And the Common Convergence of Both 2,989 views
3D GPU Architecture using Cache Stacking: Performance, Cost, Power and Thermal analysis 2,988 views
Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning 2,987 views
Machine Learning from Streaming Data in Heterogeneous Computing Environments 2,986 views
Design and Development of an Efficient H.264 Video Encoder for CPU/GPU using OpenCL 2,986 views
Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling 2,985 views
cf4ocl: a C framework for OpenCL 2,985 views
GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model 2,985 views
GPU-based ultrafast IMRT plan optimization 2,983 views
Finding Longest Common Subsequences by GPU-Based Parallel Ant Colony Optimization 2,983 views
Optimizing Performance of Recurrent Neural Networks on GPUs 2,982 views
A code motion technique for accelerating general-purpose computation on the GPU 2,982 views
Real-time Image Processing on Low Cost Embedded Computers 2,982 views
Improved Finite Difference Schemes for a 3-D Viscothermal Wave Equation on a GPU 2,980 views
GPU Array Access Auto-Tuning 2,980 views
On Vectorization of Deep Convolutional Neural Networks for Vision Tasks 2,979 views
GHOST: GPGPU-Offloaded High Performance Storage I/O Deduplication for Primary Storage System 2,977 views
Implementation of the SYCL Heterogeneous Computing Library 2,975 views
The Comparisons of OpenCL and OpenMP Computing Paradigm 2,973 views
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning 2,972 views
Scalable Kernel Fusion for Memory-Bound GPU Applications 2,970 views
Implementation of digital down converter in GPU 2,969 views
Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs 2,968 views
FastMag: Fast micromagnetic simulator for complex magnetic structures 2,968 views
Dynamic Memory Allocation for OpenCL 2,968 views
Multi-Threaded Automatic Integration Using OpenMP and CUDA 2,967 views
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures 2,967 views
Molecular dynamics recipes for genome research 2,967 views
Sparser, Better, Faster GPU Parsing 2,967 views
Best Practice Guide Intel Xeon Phi v2.0 2,966 views
clMAGMA: High Performance Dense Linear Algebra with OpenCL 2,966 views
Using GPUs for Machine Learning Algorithms 2,966 views
Converting Data to Task-Parallelism by Rewrites 2,965 views
Data-Parallel Octrees for Surface Reconstruction 2,965 views
Hybrid GPU-Based Single- and Double-Bounce SAR Simulation 2,964 views
A framework to implement a multifrontal scheme on GPU architectures with OpenCL 2,963 views
Pseudorandom Numbers Generation for Monte Carlo Simulations on GPUs: OpenCL Approach 2,963 views
State Lattice-based Motion Planning for Autonomous On-Road Driving 2,962 views
High productivity multi-device exploitation with the Heterogeneous Programming Library 2,961 views
Parallel Execution of AES-CTR Algorithm Using Extended Block Size 2,960 views
GeNN: a code generation framework for accelerated brain simulations 2,960 views
Titles: 100
Total views: 299,183