high performance computing on graphics processing units: hgpu.org

Posts

Jun, 11

Enabling High Performance Computing in Cloud Infrastructure using rCUDA

With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader technical computing community is considering the use of clouds for its technical computing needs. This is due to the relative scalability, ease of use, and advanced user-environment customization that clouds provide, as well as the many novel computing paradigms available for data-intensive applications. However, […]

CUDA

Jun, 11

Canadian Hydrogen Intensity Mapping Experiment (CHIME) Pathfinder

A pathfinder version of CHIME (the Canadian Hydrogen Intensity Mapping Experiment) is currently being commissioned at the Dominion Radio Astrophysical Observatory (DRAO) in Penticton, BC. The instrument is a hybrid cylindrical interferometer designed to measure the large scale neutral hydrogen power spectrum across the redshift range 0.8 to 2.5. The power spectrum will be used […]

OpenCL

Jun, 9

Improvement Study of EEMD Decomposition Efficiency Based on CUDA Architecture

EEMD can inhibit the mode mixing that may occur in EMD. EEMD adds many groups of white noise to the original signal to assist analysis on the basis of EMD; however, this greatly reduces the decomposition efficiency of the signal. In order to eliminate the effects of mode mixing, and improve […]

CUDA

Jun, 9

Efficient all-against-all protein similarity matrix computation using OpenCL

In this report we introduced CLSW, a fast GPU-based Smith-Waterman score-only alignment calculator. While generally applicable to any protein alignment problem, it was designed specifically as a proof-of-concept application for SIMAP. Even though we had only two weeks to develop a fully functional, validated and optimized implementation and all related concepts, our results show that in […]

OpenCL

Jun, 9

GPU-Accelerated Dynamic Functional Connectivity Analysis for Functional MRI Data Using OpenCL

Intense computations in engineering and science, especially bioinformatics, have been made practical by the recent advances in Graphical Processing Unit (GPU) computing technology. In this study, the implementation and performance evaluation of a GPU-accelerated dynamic functional connectivity (DFC) analysis, a method for investigating dynamic interactions among different brain networks, are presented. Open Computing […]

OpenCL

Jun, 9

3D Skeleton Extraction Method using Potential Field on OpenCL

For 3D skeleton extraction, the algorithm based on generalized potential fields, known as an outstandingly flexible and robust method, suffers from a heavy computational burden. In this paper, we put forward a parallel algorithm based on the OpenCL heterogeneous parallel framework, which can make full use of the great computing power provided by heterogeneous model […]

OpenCL

Jun, 9

Multi-level parallelization for hybrid ACO

The Graphics Processing Unit (GPU) has become one of the main platforms for designing massively parallel metaheuristics. This advance is due to the highly parallel architecture of the GPU and especially to the publication of languages like CUDA. In this paper, we deal with a multilevel parallel hybrid Ant System (AS) to solve the Travelling Salesman Problem (TSP). […]

CUDA

Jun, 8

The Performance Analysis Based on Heterogeneous Parallel Processors for Anisotropic Diffusion Filters

Noise in a digital image degrades the performance of image processing. Such images are most often used in the medical field for diagnosis and treatment. Thus, there is a huge demand for high quality images from the medical field. The current algorithms to process usable images are derived using a Gaussian blur filter. However, using such isotropic […]

CUDA

Jun, 8

Native Offload of Haskell Repa Programs to GPGPU

In light of recent hardware advances, General Purpose Graphics Processing Units (GPGPUs) are becoming increasingly commonplace, and demand novel programming models to account for their radically different architecture. For the most part, existing approaches to programming GPGPUs within a high-level programming language choose to embed a domain specific language (DSL) within a host metalanguage and […]

OpenCL

Jun, 8

A numerical tour of wave propagation

This tutorial is written for beginners as an introduction to basic wave propagation using finite difference method, from acoustic and elastic wave modeling, to reverse time migration and full waveform inversion. Most of the theoretical delineations summarized in this tutorial have been implemented in Madagascar with Matlab, C and CUDA programming, which will benefit readers’ […]

CUDA

Jun, 8

Efficient 3D Isotropic Volume Reconstruction Based On 2D Localized Ultrasound Images

A miniature 3D tracked ultrasonic probe has been developed to acquire intra-articular cartilage images under arthroscopic surgical conditions. The aim is to detect cartilaginous lesions (arthritis) and quantify their precise sizes and locations to help the clinician in diagnostic and therapeutic decision making. The ultrasonic transducer is tracked by an optical sensor, which […]

CUDA

Jun, 8

Review and Comparative Study of Ray Traversal Algorithms on a Modern GPU Architecture

In this paper we present a chronological review of five distinct data structures commonly found in literature and ray tracing systems: Bounding Volume Hierarchies (BVH), Octrees, Uniform Grids, KD-Trees, and Bounding Interval Hierarchies (BIH). This review is then followed by an extensive comparative study of six different ray traversal algorithms implemented on a modern Kepler […]

CUDA

