Views of posts on hgpu.org

Efficient 2D Software Rendering  5,073 views

Unified Deep Learning with CPU, GPU, and FPGA Technologies  5,066 views

Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model  5,043 views

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks  5,032 views

GPU Pro 6: Advanced Rendering Techniques  5,028 views

OpenCL in Action: How to Accelerate Graphics and Computations  5,017 views

Flexible FPGA design for FDTD using OpenCL  5,017 views

Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR  4,972 views

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network  4,966 views

A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models  4,919 views

Scandalously Parallelizable Mesh Generation  4,886 views

Introduction to GPU Radix Sort  4,878 views

Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing  4,858 views

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores  4,856 views

Light Loss-Less Data Compression, with GPU Implementation  4,824 views

Nemo: A parallelized Lagrangian particle-tracking model  4,819 views

GMP implementation on CUDA – A Backward Compatible Design With Performance Tuning  4,795 views

Distributed Training Large-Scale Deep Architectures  4,776 views

Fast parallel GPU-sorting using a hybrid algorithm  4,763 views

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems  4,762 views

OpenFace: A general-purpose face recognition library with mobile applications  4,748 views

CUSIMANN: An optimized simulated annealing software for GPUs  4,713 views

Theano: Deep Learning on GPUs with Python  4,708 views

Deep Voice 3: 2000-Speaker Neural Text-to-Speech  4,695 views

Deep Learning for Obfuscated Code Analysis  4,668 views

A cluster for CS education in the manycore era  4,666 views

Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective  4,662 views

Nengo: a Python tool for building large-scale functional brain models  4,658 views

Calamari – A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition  4,654 views

High performance comparison-based sorting algorithm on many-core GPUs  4,647 views

A GPU Parallelized Spectral Method for Elliptic Equations  4,639 views

Predicting the Execution Time of a kernel on a specific GPU using PTX code  4,627 views

Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing  4,624 views

GooFit 2.0  4,597 views

A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing  4,593 views

Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems  4,589 views

Advanced Simulation Library: Expanding software ecosystem for the DSP/FPGA/GPU market  4,586 views

cudaMap: a GPU accelerated program for gene expression connectivity mapping  4,583 views

Data Sorting Using Graphics Processing Units  4,576 views

MATLAB and Python for GPU Computing  4,560 views

HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads  4,543 views

GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications  4,524 views

Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL  4,523 views

Espresso: Efficient Forward Propagation for BCNNs  4,515 views

Towards Portable Performance for Explicit Hydrodynamics Codes  4,506 views

Real-Time Rendering of Molecular Dynamics Simulation Data: A Tutorial  4,502 views

GPUTeraSort: high performance graphics co-processor sorting for large database management  4,502 views

Performance Analysis and Tuning For: General-Purpose Graphics Processing Units (GPGPU)  4,481 views

Combining Belief Propagation and Successive Cancellation List Decoding of Polar Codes on a GPU Platform  4,479 views

BbmTTP: Beat-based Parallel Simulated Annealing Algorithm on GPGPUs for the Mirrored Traveling Tournament Problem  4,464 views

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels  4,430 views

Intel Xeon Phi Coprocessor High-Performance Programming  4,424 views

ChainerMN: Scalable Distributed Deep Learning Framework  4,422 views

Designing efficient sorting algorithms for manycore GPUs  4,418 views

OpenCL 2.2 API Specification  4,412 views

A Framework for Productive, Efficient and Portable Parallel Computing  4,407 views

Techniques for efficient DCT/IDCT implementation on generic GPU  4,396 views

Strategy Preserving Compilation for Parallel Functional Code  4,396 views

A Highly Extensible Framework for Molecule Dynamic Simulation on GPUs  4,395 views

PCIeHLS: an OpenCL HLS framework  4,375 views

Data Coherence Analysis and Optimization for Heterogeneous Computing  4,366 views

GPGPU Programming for Games and Science  4,360 views

Multi-GPU Rendering with Vulkan API  4,352 views

Sorting on GPUs for large scale datasets: A thorough comparison  4,351 views

An EoS-meter of QCD transition from deep learning  4,336 views

Automatic Scan Parallelization in OpenMP  4,325 views

Exponential integrators on graphic processing units  4,324 views

Toward Performance Portability for CPUs and GPUs Through Algorithmic Compositions  4,320 views

The Parallel Bayesian Toolbox for High-performance Bayesian Filtering in Metrology  4,318 views

A Fast and Generic GPU-Based Parallel Reduction Implementation  4,313 views

VertexAPI2 – A Vertex-Program API for Large Graph Computations on the GPU  4,292 views

Parallel Prefix Sum (Scan) with CUDA  4,282 views

GPU Computing Gems: Jade Edition  4,280 views

GLSL Essentials  4,265 views

High-Performance Tensor Contractions for GPUs  4,260 views

k+-buffer: Fragment Synchronized k-buffer  4,257 views

MatConvNet – Convolutional Neural Networks for MATLAB  4,242 views

Robust GPGPU plugin development for RapidMiner  4,236 views

Caffe: Convolutional Architecture for Fast Feature Embedding  4,234 views

OpenDNN: An Open-source, cuDNN-like Deep Learning Primitive Library  4,222 views

High Performance Algorithms to Improve the Runtime Computation of Spacecraft Trajectories  4,203 views

An efficient GPU algorithm for tetrahedron-based Brillouin-zone integration  4,198 views

XGBoost: Scalable GPU Accelerated Learning  4,170 views

Dynamic Load Balancing Strategies for Graph Applications on GPUs  4,164 views

Computer Graphics: From Pixels to Programmable Graphics Hardware  4,158 views

GPU Pro 2  4,156 views

CuNeuQuant: A CUDA Implementation of the NeuQuant Image Quantization Algorithm  4,147 views

Medusa: Simplified Graph Processing on GPUs  4,146 views

Parallelize L-BFGS-B on the GPU  4,143 views

Graphics Processing Units in Acceleration of Bandwidth Selection for Kernel Density Estimation  4,142 views

A Compiler Infrastructure for Embedded Multicore SoCs  4,137 views

U-Net: Convolutional Networks for Biomedical Image Segmentation  4,133 views

Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead  4,132 views

cufftShift: High Performance CUDA-accelerated FFT-shift Library  4,130 views

Early Results of Deep Learning on the Stampede2 Supercomputer  4,126 views

A Convolutional Neural Network Cascade for Face Detection  4,107 views

Parallel Computing for the Inverse of SPD matrix  4,100 views

VASP on a GPU: application to exact-exchange calculations of the stability of elemental boron  4,093 views

cuDNN: Efficient Primitives for Deep Learning  4,070 views

BENCHIP: Benchmarking Intelligence Processors  4,070 views

 

Brief statistics for this page

Titles: 100

Total views: 448,917

 

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
