Views of posts on hgpu.org

On Pre-Trained Image Features and Synthetic Images for Deep Learning  5,557 views

Flexible FPGA design for FDTD using OpenCL  5,556 views

Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model  5,541 views

Efficient 2D Software Rendering  5,531 views

Cue-independent extending inverse kinematics for robust pose estimation in 3D point clouds  5,531 views

Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs  5,490 views

Fast parallel GPU-sorting using a hybrid algorithm  5,486 views

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network  5,476 views

GPU Pro 6: Advanced Rendering Techniques  5,454 views

Parallel Neural Network Training with OpenCL  5,454 views

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks  5,421 views

Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR  5,408 views

OpenFace: A general-purpose face recognition library with mobile applications  5,375 views

Advanced Simulation Library: Expanding software ecosystem for the DSP/FPGA/GPU market  5,372 views

Introduction to GPU Radix Sort  5,351 views

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores  5,329 views

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems  5,319 views

GMP implementation on CUDA – A Backward Compatible Design With Performance Tuning  5,317 views

Deep Learning for Obfuscated Code Analysis  5,308 views

Nemo: A parallelized Lagrangian particle-tracking model  5,261 views

High performance comparison-based sorting algorithm on many-core GPUs  5,254 views

Scandalously Parallelizable Mesh Generation  5,252 views

Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing  5,249 views

Theano: Deep Learning on GPUs with Python  5,234 views

Distributed Training Large-Scale Deep Architectures  5,226 views

Light Loss-Less Data Compression, with GPU Implementation  5,224 views

Data Sorting Using Graphics Processing Units  5,223 views

cudaMap: a GPU accelerated program for gene expression connectivity mapping  5,192 views

Predicting the Execution Time of a kernel on a specific GPU using PTX code  5,158 views

Performance Analysis and Tuning For: General-Purpose Graphics Processing Units (GPGPU)  5,156 views

GPUTeraSort: high performance graphics co-processor sorting for large database management  5,145 views

Deep Voice 3: 2000-Speaker Neural Text-to-Speech  5,133 views

A cluster for CS education in the manycore era  5,129 views

Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition  5,111 views

Designing efficient sorting algorithms for manycore GPUs  5,106 views

Nengo: a Python tool for building large-scale functional brain models  5,098 views

High-Performance Tensor Contractions for GPUs  5,097 views

A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing  5,080 views

CUSIMANN: An optimized simulated annealing software for GPUs  5,078 views

Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective  5,078 views

Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing  5,070 views

Exponential integrators on graphic processing units  5,060 views

Intel Xeon Phi Coprocessor High-Performance Programming  5,048 views

MATLAB and Python for GPU Computing  5,048 views

A GPU Parallelized Spectral Method for Elliptic Equations  5,033 views

GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications  5,019 views

Multi-GPU Rendering with Vulkan API  5,010 views

Parallel Prefix Sum (Scan) with CUDA  5,005 views

GooFit 2.0  5,004 views

GPU Computing Gems: Jade Edition  4,984 views

Real-Time Rendering of Molecular Dynamics Simulation Data: A Tutorial  4,972 views

Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems  4,962 views

Sorting on GPUs for large scale datasets: A thorough comparison  4,943 views

HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads  4,937 views

Espresso: Efficient Forward Propagation for BCNNs  4,935 views

OpenCL 2.2 API Specification  4,935 views

Towards Portable Performance for Explicit Hydrodynamics Codes  4,916 views

Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL  4,912 views

GPGPU Programming for Games and Science  4,867 views

Techniques for efficient DCT/IDCT implementation on generic GPU  4,863 views

Combining Belief Propagation and Successive Cancellation List Decoding of Polar Codes on a GPU Platform  4,860 views

A Framework for Productive, Efficient and Portable Parallel Computing  4,848 views

U-Net: Convolutional Networks for Biomedical Image Segmentation  4,843 views

BbmTTP: Beat-based Parallel Simulated Annealing Algorithm on GPGPUs for the Mirrored Traveling Tournament Problem  4,833 views

GLSL Essentials  4,823 views

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels  4,817 views

PCIeHLS: an OpenCL HLS framework  4,812 views

An EoS-meter of QCD transition from deep learning  4,808 views

Strategy Preserving Compilation for Parallel Functional Code  4,803 views

A Highly Extensible Framework for Molecule Dynamic Simulation on GPUs  4,795 views

Atmospheric Chemistry  4,780 views

ChainerMN: Scalable Distributed Deep Learning Framework  4,776 views

Medusa: Simplified Graph Processing on GPUs  4,754 views

Toward Performance Portability for CPUs and GPUs Through Algorithmic Compositions  4,751 views

Data Coherence Analysis and Optimization for Heterogeneous Computing  4,739 views

Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead  4,736 views

A Comparison of Potential Interfaces for Batched BLAS Computations  4,734 views

The Parallel Bayesian Toolbox for High-performance Bayesian Filtering in Metrology  4,731 views

Parallelize L-BFGS-B on the GPU  4,729 views

Automatic Scan Parallelization in OpenMP  4,722 views

Caffe: Convolutional Architecture for Fast Feature Embedding  4,722 views

A Fast and Generic GPU-Based Parallel Reduction Implementation  4,715 views

cufftShift: High Performance CUDA-accelerated FFT-shift Library  4,714 views

VertexAPI2 – A Vertex-Program API for Large Graph Computations on the GPU  4,702 views

VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron  4,693 views

GPU Pro 2  4,676 views

k+-buffer: Fragment Synchronized k-buffer  4,674 views

OpenDNN: An Open-source, cuDNN-like Deep Learning Primitive Library  4,657 views

Dissecting GPU Memory Hierarchy through Microbenchmarking  4,656 views

A Compiler Infrastructure for Embedded Multicore SoCs  4,656 views

MatConvNet – Convolutional Neural Networks for MATLAB  4,651 views

Computer Graphics: From Pixels to Programmable Graphics Hardware  4,621 views

DeepLearningKit – an Open Source Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift  4,615 views

A Convolutional Neural Network Cascade for Face Detection  4,606 views

CuNeuQuant: A CUDA Implementation of the NeuQuant Image Quantization Algorithm  4,601 views

cuDNN: Efficient Primitives for Deep Learning  4,599 views

XGBoost: Scalable GPU Accelerated Learning  4,594 views

Dynamic Load Balancing Strategies for Graph Applications on GPUs  4,592 views

An efficient GPU algorithm for tetrahedron-based Brillouin-zone integration  4,584 views

High Performance Algorithms to Improve the Runtime Computation of Spacecraft Trajectories  4,577 views

 

Brief statistics for this page

Titles: 100

Total views: 499,882

 

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org