Views of posts on hgpu.org
On Pre-Trained Image Features and Synthetic Images for Deep Learning 5,557 views
Flexible FPGA design for FDTD using OpenCL 5,556 views
Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model 5,541 views
Efficient 2D Software Rendering 5,531 views
Cue-independent extending inverse kinematics for robust pose estimation in 3D point clouds 5,531 views
Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs 5,490 views
Fast parallel GPU-sorting using a hybrid algorithm 5,486 views
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network 5,476 views
GPU Pro 6: Advanced Rendering Techniques 5,454 views
Parallel Neural Network Training with OpenCL 5,454 views
SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks 5,421 views
Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR 5,408 views
OpenFace: A general-purpose face recognition library with mobile applications 5,375 views
Advanced Simulation Library: Expanding software ecosystem for the DSP/FPGA/GPU market 5,372 views
Introduction to GPU Radix Sort 5,351 views
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems 5,319 views
GMP implementation on CUDA – A Backward Compatible Design With Performance Tuning 5,317 views
Deep Learning for Obfuscated Code Analysis 5,308 views
Nemo: A parallelized Lagrangian particle-tracking model 5,261 views
High performance comparison-based sorting algorithm on many-core GPUs 5,254 views
Scandalously Parallelizable Mesh Generation 5,252 views
Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing 5,249 views
Theano: Deep Learning on GPUs with Python 5,234 views
Distributed Training Large-Scale Deep Architectures 5,226 views
Light Loss-Less Data Compression, with GPU Implementation 5,224 views
Data Sorting Using Graphics Processing Units 5,223 views
cudaMap: a GPU accelerated program for gene expression connectivity mapping 5,192 views
Predicting the Execution Time of a kernel on a specific GPU using PTX code 5,158 views
Performance Analysis and Tuning For: General-Purpose Graphics Processing Units (GPGPU) 5,156 views
GPUTeraSort: high performance graphics co-processor sorting for large database management 5,145 views
Deep Voice 3: 2000-Speaker Neural Text-to-Speech 5,133 views
A cluster for CS education in the manycore era 5,129 views
Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition 5,111 views
Designing efficient sorting algorithms for manycore GPUs 5,106 views
Nengo: a Python tool for building large-scale functional brain models 5,098 views
High-Performance Tensor Contractions for GPUs 5,097 views
A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing 5,080 views
CUSIMANN: An optimized simulated annealing software for GPUs 5,078 views
Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective 5,078 views
Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing 5,070 views
Exponential integrators on graphic processing units 5,060 views
Intel Xeon Phi Coprocessor High-Performance Programming 5,048 views
MATLAB and Python for GPU Computing 5,048 views
A GPU Parallelized Spectral Method for Elliptic Equations 5,033 views
GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications 5,019 views
Multi-GPU Rendering with Vulkan API 5,010 views
Parallel Prefix Sum (Scan) with CUDA 5,005 views
GooFit 2.0 5,004 views
GPU Computing Gems: Jade Edition 4,984 views
Real-Time Rendering of Molecular Dynamics Simulation Data: A Tutorial 4,972 views
Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems 4,962 views
Sorting on GPUs for large scale datasets: A thorough comparison 4,943 views
HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads 4,937 views
Espresso: Efficient Forward Propagation for BCNNs 4,935 views
OpenCL 2.2 API Specification 4,935 views
Towards Portable Performance for Explicit Hydrodynamics Codes 4,916 views
Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL 4,912 views
GPGPU Programming for Games and Science 4,867 views
Techniques for efficient DCT/IDCT implementation on generic GPU 4,863 views
Combining Belief Propagation and Successive Cancellation List Decoding of Polar Codes on a GPU Platform 4,860 views
A Framework for Productive, Efficient and Portable Parallel Computing 4,848 views
U-Net: Convolutional Networks for Biomedical Image Segmentation 4,843 views
GLSL Essentials 4,823 views
Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels 4,817 views
PCIeHLS: an OpenCL HLS framework 4,812 views
An EoS-meter of QCD transition from deep learning 4,808 views
Strategy Preserving Compilation for Parallel Functional Code 4,803 views
A Highly Extensible Framework for Molecule Dynamic Simulation on GPUs 4,795 views
Atmospheric Chemistry 4,780 views
ChainerMN: Scalable Distributed Deep Learning Framework 4,776 views
Medusa: Simplified Graph Processing on GPUs 4,754 views
Toward Performance Portability for CPUs and GPUs Through Algorithmic Compositions 4,751 views
Data Coherence Analysis and Optimization for Heterogeneous Computing 4,739 views
Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead 4,736 views
A Comparison of Potential Interfaces for Batched BLAS Computations 4,734 views
The Parallel Bayesian Toolbox for High-performance Bayesian Filtering in Metrology 4,731 views
Parallelize L-BFGS-B on the GPU 4,729 views
Automatic Scan Parallelization in OpenMP 4,722 views
Caffe: Convolutional Architecture for Fast Feature Embedding 4,722 views
A Fast and Generic GPU-Based Parallel Reduction Implementation 4,715 views
cufftShift: High Performance CUDA-accelerated FFT-shift Library 4,714 views
VertexAPI2 – A Vertex-Program API for Large Graph Computations on the GPU 4,702 views
VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron 4,693 views
GPU Pro 2 4,676 views
k+-buffer: Fragment Synchronized k-buffer 4,674 views
OpenDNN: An Open-source, cuDNN-like Deep Learning Primitive Library 4,657 views
Dissecting GPU Memory Hierarchy through Microbenchmarking 4,656 views
A Compiler Infrastructure for Embedded Multicore SoCs 4,656 views
MatConvNet – Convolutional Neural Networks for MATLAB 4,651 views
Computer Graphics: From Pixels to Programmable Graphics Hardware 4,621 views
A Convolutional Neural Network Cascade for Face Detection 4,606 views
CuNeuQuant: A CUDA Implementation of the NeuQuant Image Quantization Algorithm 4,601 views
cuDNN: Efficient Primitives for Deep Learning 4,599 views
XGBoost: Scalable GPU Accelerated Learning 4,594 views
Dynamic Load Balancing Strategies for Graph Applications on GPUs 4,592 views
An efficient GPU algorithm for tetrahedron-based Brillouin-zone integration 4,584 views
High Performance Algorithms to Improve the Runtime Computation of Spacecraft Trajectories 4,577 views
Titles: 100
Total views: 499,882
- Programming - 186,230 views
- Login - 172,112 views
- User dashboard - 98,572 views
- Paper titles list - 92,669 views
- Add new event - 69,198 views
- Add new post - 62,784 views
- Register - 53,094 views
- Statistics - 44,235 views
- Modification of self-organizing migration algorithm for OpenCL framework - 34,520 views
- Books on OpenCL and CUDA - 31,158 views