high performance computing on graphics processing units: hgpu.org

Algorithm optimization

Advancing the distributed Multi-GPU ChASE library through algorithm optimization and NCCL library

Xinzhe Wu, Edoardo Di Napoli

Tags: Algorithm optimization, Computer science, CUDA, MPI, nVidia, nVidia A100, OpenMP, OpenMPI, Package

October 8, 2023 by hgpu

EDSSA: An Encoder-Decoder Semantic Segmentation Networks Accelerator on OpenCL-Based FPGA Platform

Hongzhi Huang, Yakun Wu, Mengqi Yu, Xuesong Shi, Fei Qiao, Li Luo, Qi Wei, Xinjun Liu

Tags: Algorithm optimization, Computer science, Computer vision, FPGA, Neural networks, OpenCL

July 26, 2020 by hgpu

Implementing Push-Pull Efficiently in GraphBLAS

Carl Yang, Aydin Buluc, John D. Owens

Tags: Algorithm optimization, Algorithms, BLAS, Computer science, CUDA, Graph theory, Linear Algebra, nVidia, Sparse matrix, Tesla K40

April 15, 2018 by hgpu

Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP

Alexandru Voicu

Tags: Algorithm optimization, Ant colony optimization, C++ AMP

September 8, 2014 by AlexVlx

Investigating performance variations of an optimized GPU-ported granulometry algorithm

Vincent Boulos, Vincent Fristot, Dominique Houzet, Luc Salvo, Pierre Lhuissier

Tags: Algorithm optimization, Algorithms, CUDA, FEM, Finite element method, Image processing, Materials Science, nVidia, nVidia GeForce GTX 285, nVidia GeForce GTX 480, nVidia Quadro FX 4000

February 22, 2013 by hgpu

Is the game worth the candle? Evaluation of OpenCL for object detection algorithm optimization

Floris De Smedt, Lars Stuyf, Sander Beckers, Joost Vennekens, Gorik De Samblanx, Toon Goedeme

Tags: Algorithm optimization, Algorithms, Computer science, Computer vision, nVidia, nVidia GeForce GTX 295, OpenCL

October 5, 2012 by hgpu

Image processing algorithm optimization with CUDA for Pure Data

Rudi Giot, Abilio Rodrigues e Sousa

Tags: Algorithm optimization, Algorithms, CUDA, Image processing, nVidia, Optimization

December 24, 2011 by hgpu

Automatic transformation and optimization of applications on GPUs and GPU clusters

Wenjing Ma

Tags: Algorithm optimization, Algorithms, Benchmarking, Code generation, Computer science, CUDA, Data mining, GPU cluster, Heterogeneous systems, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce 9800 GX2, Optimization, Tesla T10, Thesis

November 9, 2011 by hgpu

A control-structure splitting optimization for GPGPU

Snaider Carrillo, Jakob Siegel, Xiaoming Li

Tags: Algorithm optimization, Computer science, CUDA, nVidia, Programming techniques

November 1, 2010 by hgpu

Performance study of interference on GPU and CPU resources with multiple applications

Shinichi Yamagiwa, Koichi Wada

Tags: Algorithm optimization, Computer science, Performance

October 29, 2010 by hgpu

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, Pradeep Dubey

Tags: Algorithm optimization, Computer science, nVidia, nVidia GeForce GTX 280, Performance

October 27, 2010 by hgpu

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

GPU plugin for PySCF

Python-Based Quantum Chemistry Calculations with GPU Acceleration

QArray

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays

Celerity: High-level C++ for Accelerator Clusters

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

OpenMC Monte Carlo Code

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Polygeist: C/C++ frontend for MLIR

Retargeting and Respecializing GPU Workloads for Performance Portability

Parallel Gaussian process with kernel approximation in CUDA

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
