high performance computing on graphics processing units: hgpu.org

Optimizing CUDA Shared Memory Usage

Shuang Gao, Gregory D. Peterson

Tags: Computer science, CUDA, nVidia, Optimization, Poster

December 4, 2015 by hgpu

Speculative Parallelization on GPGPUs

Min Feng, Rajiv Gupta, Laxmi N. Bhuyan

Tags: Compilers, Computer science, CUDA, nVidia, Performance, Poster, Tesla C1060

February 23, 2012 by hgpu

Design and Optimization of Image Processing Algorithms on Mobile GPU

Nitin Singhal, Jin Woo Yoo, Ho Yeol Choi, In Kyu Park

Tags: Algorithms, Image processing, Optimization, Poster, Rendering

December 8, 2011 by hgpu

Dax: Data Analysis at Extreme

Kenneth Moreland, Utkarsh Ayachit, Berk Geveci, Kwan-Liu Ma

Tags: Algorithms, Computer science, Poster, Visualization

October 28, 2011 by hgpu

Flexible neuronal network simulation framework using code generation for NVidia CUDA

Thomas Nowotny

Tags: Bioinformatics, Biology, CUDA, Neural networks, Neuroscience, nVidia, nVidia Quadro FX 5800, Poster

October 26, 2011 by hgpu

Towards dynamic reconfigurable load-balancing for hybrid desktop platforms

Alecio P.D. Binotto, Carlos E. Pereira, Dieter W. Fellner

Tags: Fluid dynamics, Heterogeneous systems, nVidia, nVidia GeForce 8800 GT, nVidia GeForce GTX 285, OpenCL, Poster

August 31, 2011 by hgpu

CT image reconstruction using hexagonal grids

Michael Knaup, Sven Steckmann, Olivier Bockenbach, Marc Kachelriess

Tags: Algorithms, Computed tomography, CT, Image processing, Image reconstruction, Poster

July 18, 2011 by hgpu

Scalable Streaming-Array of Simple Soft-Processors for Stencil Computations with Constant Memory-Bandwidth

Kentaro Sano, Yoshiaki Hatsuda, Satoru Yamamoto

Tags: Computer science, FPGA, nVidia, Poster, Tesla C1060

June 21, 2011 by hgpu

Fast GPU-Based CT Reconstruction using the Common Unified Device Architecture (CUDA)

Holger Scherl, Benjamin Keck, Markus Kowarschik, Joachim Hornegger

Tags: Computed tomography, CT, CUDA, Image processing, Image reconstruction, Medicine, nVidia, nVidia GeForce 8800 GTX, Poster

May 28, 2011 by hgpu

A GPU-based architecture for real-time data assessment at synchrotron experiments

Suren Chilingaryan, Alessandro Mirone, Andrew Hammersley, Claudio Ferrero, Lukas Helfen, Andreas Kopmann, Tomy dos Santos Rolo

Tags: Biology, Image reconstruction, Materials Science, Medicine, nVidia, nVidia GeForce GTX 280, nVidia GeForce GTX 295, nVidia GeForce GTX 480, Poster, Tomography

May 8, 2011 by hgpu

Poster: GPU-accelerated rigid body fitting of atomic structures into electron density maps

Edward W. Lowe Jr., Nils Woetzel, Jens Meiler

Tags: ATI, ATI Radeon HD 5970, Biology, Cryo-EM, Molecular modeling, nVidia, nVidia GeForce GTX 470, OpenCL, Package, Poster, Tesla C1060, Tesla C2050

April 1, 2011 by hgpu

Poster: GPU-accelerated artificial neural network for QSAR modeling

Edward W. Lowe Jr., Nils Woetzel, Jens Meiler

Tags: ATI, ATI Radeon HD 5970, Biology, Neural networks, nVidia, nVidia GeForce GTX 470, OpenCL, Poster, Tesla C1060, Tesla C2050

April 1, 2011 by hgpu

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

Accelerated discovery and design of Fe-Co-Zr magnets with tunable magnetic anisotropy through machine learning and parallel computing

ParEval: A Parallel Code Evaluation Benchmark

ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

No More Shading Languages: Compiling C++ to Vulkan Shaders

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications
