high performance computing on graphics processing units: hgpu.org

hgpu.org » Tesla M2090

GPU Computing with Python: Performance, Energy Efficiency and Usability

Håvard H. Holm, André R. Brodtkorb, Martin L. Sætra

View

Tags: Computer science, CUDA, Energy-efficient computing, nVidia, nVidia GeForce GTX 780, nVidia GeForce GTX 840 M, OpenCL, Performance, Python, Tesla K20, Tesla M2090, Tesla P100, Tesla V100

December 8, 2019 by hgpu

Dense and sparse parallel linear algebra algorithms on graphics processing units

Alejandro Lamas Davina

View

Download (PDF)

Tags: Algorithms, Computer science, CUDA, Linear Algebra, MPI, nVidia, Sparse, Tesla K20, Tesla M2090, Thesis

November 25, 2018 by hgpu

Ray-traced Radiative Transfer on Massively Threaded Architectures

Samuel Paul Thomson

View

Download (PDF)

Tags: Astrophysics, CUDA, nVidia, Raytracing, Tesla M2090, Thesis

July 1, 2018 by hgpu

Research on the simulation of PF-LBM model based on MPI+CUDA mixed granularity parallel

Changsheng Zhu, Jieqiong Liu, Li Feng, Xin Deng

View

Download (PDF)

Tags: CUDA, Fluid dynamics, Heterogeneous systems, Lattice Boltzmann model, MPI, nVidia, Tesla M2090

June 24, 2018 by hgpu

Automatic generation of CUDA code performing tensor manipulations using C++ expression templates

Adam G.M. Lewis, Harald P. Pfeiffer

View

Download (PDF)

Tags: Benchmarking, Code generation, Computer science, CUDA, General Relativity and Quantum Cosmology, nVidia, Tesla K80, Tesla M2090, Tesla P100

April 28, 2018 by hgpu

GPU Accelerated Finite Element Assembly with Runtime Compilation

Tao Cui, Xiaohu Guo, Hui Liu

View

Download (PDF)

Tags: Computer science, CUDA, Differential equations, FEM, Finite element method, Mathematical Software, Numerical Analysis, nVidia, Partial differential equations, PDEs, Symbolic Computation, Tesla K20, Tesla M2090, Tesla V100

February 15, 2018 by hgpu

POMPEI: Programming with OpenMP4 for Exascale Investigations

Jack Dongarra, Azzam Haidar, Oscar Hernandez, Stanimire Tomov, Manjunath Gorentla Venkata

View

Download (PDF)

Tags: Computer science, Intel Xeon Phi, Linear Algebra, nVidia, OpenACC, OpenMP, Programming techniques, Tesla M2090

December 15, 2017 by hgpu

Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures

Francesc Sastre Cabot

View

Download (PDF)

Tags: Algorithms, Computer science, Computer vision, CUDA, Deep learning, GPU cluster, Machine learning, Neural networks, nVidia, Tesla K80, Tesla M2090, Thesis

July 22, 2017 by hgpu

Optimizing Communication by Compression for Multi-GPU Scalable Breadth-First Searches

Julian Romera

View

Download (PDF)

Tags: Algorithms, Compression, Computer science, CUDA, nVidia, nVidia GeForce GTX 480, OpenMPI, Performance, Tesla K80, Tesla M2090, Thesis

April 7, 2017 by hgpu

GOTHIC: Gravitational oct-tree code accelerated by hierarchical time step controlling

Yohei Miki, Masayuki Umemura

View

Download (PDF)

Tags: Astrophysics, CUDA, Gravitation, Instrumentation and Methods for Astrophysics, N-body simulation, nVidia, nVidia GeForce GTX Titan X, Tesla K20, Tesla M2090

October 29, 2016 by hgpu

OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross-Pitaevskii equation

Vladimir Loncar, Luis E. Young-S., Srdjan Skrbic, Paulsamy Muruganandam, Sadhan K. Adhikari, Antun Balaz

View

Download (PDF)

Source codes

Tags: Condensed matter, CUDA, MPI, nVidia, OpenMP, Package, Physics, Tesla M2090

October 22, 2016 by hgpu

A parallel pattern for iterative stencil + reduce

M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, C. Misale, G. Peretti Pezzi, M. Torquati

View

Download (PDF)

Tags: ARM, Computer science, nVidia, OpenCL, Stencil computation, Tesla K40, Tesla M2090

September 17, 2016 by hgpu

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

OpenMC Monte Carlo Code

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Polygeist: C/C++ frontend for MLIR

Retargeting and Respecializing GPU Workloads for Performance Portability

Parallel Gaussian process with kernel approximation in CUDA

See all packages

* * *

high performance computing on graphics processing units: hgpu.org

GPU Computing with Python: Performance, Energy Efficiency and Usability

Dense and sparse parallel linear algebra algorithms on graphics processing units

Ray-traced Radiative Transfer on Massively Threaded Architectures

Research on the simulation of PF-LBM model based on MPI+CUDA mixed granularity parallel

Automatic generation of CUDA code performing tensor manipulations using C++ expression templates

GPU Accelerated Finite Element Assembly with Runtime Compilation

POMPEI: Programming with OpenMP4 for Exascale Investigations

Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures

Optimizing Communication by Compression for Multi-GPU Scalable Breadth-First Searches

GOTHIC: Gravitational oct-tree code accelerated by hierarchical time step controlling

OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross-Pitaevskii equation

A parallel pattern for iterative stencil + reduce

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)