Views of posts on hgpu.org
Efficient 2D Software Rendering 5,073 views
Unified Deep Learning with CPU, GPU, and FPGA Technologies 5,066 views
Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model 5,043 views
SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks 5,032 views
GPU Pro 6: Advanced Rendering Techniques 5,028 views
OpenCL in Action: How to Accelerate Graphics and Computations 5,017 views
Flexible FPGA design for FDTD using OpenCL 5,017 views
Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR 4,972 views
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network 4,966 views
Scandalously Parallelizable Mesh Generation 4,886 views
Introduction to GPU Radix Sort 4,878 views
Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing 4,858 views
Light Loss-Less Data Compression, with GPU Implementation 4,824 views
Nemo: A parallelized Lagrangian particle-tracking model 4,819 views
GMP implementation on CUDA – A Backward Compatible Design With Performance Tuning 4,795 views
Distributed Training Large-Scale Deep Architectures 4,776 views
Fast parallel GPU-sorting using a hybrid algorithm 4,763 views
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems 4,762 views
OpenFace: A general-purpose face recognition library with mobile applications 4,748 views
CUSIMANN: An optimized simulated annealing software for GPUs 4,713 views
Theano: Deep Learning on GPUs with Python 4,708 views
Deep Voice 3: 2000-Speaker Neural Text-to-Speech 4,695 views
Deep Learning for Obfuscated Code Analysis 4,668 views
A cluster for CS education in the manycore era 4,666 views
Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective 4,662 views
Nengo: a Python tool for building large-scale functional brain models 4,658 views
Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition 4,654 views
High performance comparison-based sorting algorithm on many-core GPUs 4,647 views
A GPU Parallelized Spectral Method for Elliptic Equations 4,639 views
Predicting the Execution Time of a kernel on a specific GPU using PTX code 4,627 views
Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing 4,624 views
GooFit 2.0 4,597 views
A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing 4,593 views
Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems 4,589 views
Advanced Simulation Library: Expanding software ecosystem for the DSP/FPGA/GPU market 4,586 views
cudaMap: a GPU accelerated program for gene expression connectivity mapping 4,583 views
Data Sorting Using Graphics Processing Units 4,576 views
MATLAB and Python for GPU Computing 4,560 views
HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads 4,543 views
GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications 4,524 views
Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL 4,523 views
Espresso: Efficient Forward Propagation for BCNNs 4,515 views
Towards Portable Performance for Explicit Hydrodynamics Codes 4,506 views
Real-Time Rendering of Molecular Dynamics Simulation Data: A Tutorial 4,502 views
GPUTeraSort: high performance graphics co-processor sorting for large database management 4,502 views
Performance Analysis and Tuning For: General-Purpose Graphics Processing Units (GPGPU) 4,481 views
Combining Belief Propagation and Successive Cancellation List Decoding of Polar Codes on a GPU Platform 4,479 views
Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels 4,430 views
Intel Xeon Phi Coprocessor High-Performance Programming 4,424 views
ChainerMN: Scalable Distributed Deep Learning Framework 4,422 views
Designing efficient sorting algorithms for manycore GPUs 4,418 views
OpenCL 2.2 API Specification 4,412 views
A Framework for Productive, Efficient and Portable Parallel Computing 4,407 views
Techniques for efficient DCT/IDCT implementation on generic GPU 4,396 views
Strategy Preserving Compilation for Parallel Functional Code 4,396 views
A Highly Extensible Framework for Molecule Dynamic Simulation on GPUs 4,395 views
PCIeHLS: an OpenCL HLS framework 4,375 views
Data Coherence Analysis and Optimization for Heterogeneous Computing 4,366 views
GPGPU Programming for Games and Science 4,360 views
Multi-GPU Rendering with Vulkan API 4,352 views
Sorting on GPUs for large scale datasets: A thorough comparison 4,351 views
An EoS-meter of QCD transition from deep learning 4,336 views
Automatic Scan Parallelization in OpenMP 4,325 views
Exponential integrators on graphic processing units 4,324 views
Toward Performance Portability for CPUs and GPUs Through Algorithmic Compositions 4,320 views
The Parallel Bayesian Toolbox for High-performance Bayesian Filtering in Metrology 4,318 views
A Fast and Generic GPU-Based Parallel Reduction Implementation 4,313 views
VertexAPI2 – A Vertex-Program API for Large Graph Computations on the GPU 4,292 views
Parallel Prefix Sum (Scan) with CUDA 4,282 views
GPU Computing Gems: Jade Edition 4,280 views
GLSL Essentials 4,265 views
High-Performance Tensor Contractions for GPUs 4,260 views
k+-buffer: Fragment Synchronized k-buffer 4,257 views
MatConvNet – Convolutional Neural Networks for MATLAB 4,242 views
Robust GPGPU plugin development for RapidMiner 4,236 views
Caffe: Convolutional Architecture for Fast Feature Embedding 4,234 views
OpenDNN: An Open-source, cuDNN-like Deep Learning Primitive Library 4,222 views
High Performance Algorithms to Improve the Runtime Computation of Spacecraft Trajectories 4,203 views
An efficient GPU algorithm for tetrahedron-based Brillouin-zone integration 4,198 views
XGBoost: Scalable GPU Accelerated Learning 4,170 views
Dynamic Load Balancing Strategies for Graph Applications on GPUs 4,164 views
Computer Graphics: From Pixels to Programmable Graphics Hardware 4,158 views
GPU Pro 2 4,156 views
CuNeuQuant: A CUDA Implementation of the NeuQuant Image Quantization Algorithm 4,147 views
Medusa: Simplified Graph Processing on GPUs 4,146 views
Parallelize L-BFGS-B on the GPU 4,143 views
Graphics Processing Units in Acceleration of Bandwidth Selection for Kernel Density Estimation 4,142 views
A Compiler Infrastructure for Embedded Multicore SoCs 4,137 views
U-Net: Convolutional Networks for Biomedical Image Segmentation 4,133 views
Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead 4,132 views
cufftShift: High Performance CUDA-accelerated FFT-shift Library 4,130 views
Early Results of Deep Learning on the Stampede2 Supercomputer 4,126 views
A Convolutional Neural Network Cascade for Face Detection 4,107 views
Parallel Computing for the Inverse of SPD matrix 4,100 views
VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron 4,093 views
cuDNN: Efficient Primitives for Deep Learning 4,070 views
BENCHIP: Benchmarking Intelligence Processors 4,070 views
Titles: 100
Total views: 448,917
- Programming - 186,119 views
- Login - 163,469 views
- User dashboard - 89,347 views
- Paper titles list - 67,566 views
- Add new event - 64,323 views
- Add new post - 58,677 views
- Register - 48,947 views
- Statistics - 35,190 views
- Modification of self-organizing migration algorithm for OpenCL framework - 34,139 views
- Books on OpenCL and CUDA - 28,545 views