high performance computing on graphics processing units: hgpu.org

Posts

Sep, 25

GPGPU workload analysis and media performance studies

This project was done with the Mobile Microprocessor Group at Intel Corporation as part of a six-month internship. The primary objective of this project was to study the performance of GPGPUs (General-purpose computation on Graphics Processing Units) for various benchmark applications. GPGPUs have gained widespread importance in recent years because of […]

OpenCL

Sep, 11

Parallel programming with NVIDIA CUDA

Using hardware acceleration via General Programming on stock GPUs (GPGPU), I’ve sped up my algorithms by more than tenfold. This article shows how you can achieve these results too! Programmers have been interested in leveraging the highly parallel processing power of video cards to speed up applications that are not graphical in nature for a […]

CUDA

Sep, 8

GPU Computation in Bioinspired Algorithms: A Review

Bioinspired methods usually need a large amount of computational resources. For this reason, parallelization is an interesting alternative for decreasing execution time while still providing accurate results. Accordingly, there has recently been growing interest in developing parallel algorithms using graphics processing units (GPUs), also referred to as GPU computation. Advances […]

CUDA • OpenCL

Sep, 8

Towards GPGPU Assisted Computing in Virtualized Environments

General Purpose Computation on Graphics Processing Units (GPGPU) makes it possible to use the massive computing power of modern graphics cards for generic high-performance computing. However, the new virtualization technologies will typically not support high-performance graphics cards and, as a consequence, GPGPU resources cannot be used in typical virtualization setups. In this paper we […]

CUDA • OpenCL

Sep, 8

Implementing Independent Component Analysis in General-Purpose GPU Architectures

New computational architectures, such as multi-core processors and graphics processing units (GPUs), pose challenges to application developers. Although in the case of general-purpose GPU programming, environments and toolkits such as CUDA and OpenCL have simplified application development, different ways of thinking about memory access, storage, and program execution are required. This paper presents a strategy […]

CUDA • OpenCL

Aug, 31

Partial wave analysis at BES III harnessing the power of GPUs

Partial wave analysis is a core tool in hadron spectroscopy. With the high statistics data available at facilities such as the Beijing Spectrometer III, this procedure becomes computationally very expensive. We have successfully implemented a framework for performing partial wave analysis on graphics processors. We discuss the implementation, the parallel computing frameworks employed and the […]

OpenCL

Aug, 21

EpiGPU

MOTIVATION: Hundreds of genome-wide association studies have been performed over the last decade, but as single nucleotide polymorphism (SNP) chip density has increased so has the computational burden to search for epistasis [for n SNPs the computational time resource is O(n(n-1)/2)]. While the theoretical contribution of epistasis toward phenotypes of medical and economic importance is […]

OpenCL
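The quadratic cost mentioned in the EpiGPU excerpt above comes from testing every unordered pair of SNPs. The following is a minimal sketch of that pair count only, not of EpiGPU itself; the function names `pair_count` and `snp_pairs` and the SNP identifiers are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered SNP pairs for n SNPs: n(n-1)/2."""
    return n * (n - 1) // 2


def snp_pairs(snp_ids):
    """Enumerate every unordered pair of SNP identifiers (hypothetical helper)."""
    return list(combinations(snp_ids, 2))


if __name__ == "__main__":
    ids = ["snp_a", "snp_b", "snp_c", "snp_d"]  # made-up identifiers
    print(pair_count(len(ids)))
    print(snp_pairs(ids))
```

Because each pair can be tested independently of the others, an exhaustive pairwise scan of this kind is a natural fit for data-parallel hardware, which is presumably why a GPU is attractive here.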

Aug, 21

Visual Computing in Biology and Medicine: Interactive visual analysis of contrast-enhanced ultrasound data based on small neighborhood statistics

Contrast-enhanced ultrasound (CEUS) has recently become an important technology for lesion detection and characterization in cancer diagnosis. CEUS is used to investigate the perfusion kinetics in tissue over time, which relates to tissue vascularization. In this paper we present a pipeline that enables interactive visual exploration and semi-automatic segmentation and classification of CEUS data. For […]

OpenCL

Aug, 21

Reducing data access latency in SDSM systems using runtime optimizations

Software Distributed Shared Memory (SDSM) systems offer a convenient way to run applications developed for shared memory systems on distributed systems with no changes to them. However, since SDSM systems add an extra layer of abstraction to the memory hierarchy, applications may suffer performance problems when running on top of them. Our main research interest […]

OpenCL

Aug, 21

A new method for GPU based irregular reductions and its application to k-means clustering

A frequently used method of clustering is a technique called k-means clustering. The k-means algorithm consists of two steps: a map step, which is simple to execute on a GPU, and a reduce step, which is more problematic. Previous researchers have used a hybrid approach in which the map step is computed on the GPU […]

OpenCL
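The two k-means steps described in the excerpt above can be sketched on the CPU as follows. This is a plain Python illustration of the general algorithm, not the GPU method proposed in the paper; the function names `assign_points` and `update_centroids` are hypothetical. The reduce step is "irregular" in the sense that the accumulator each point writes to depends on its label, which is only known at run time.

```python
def assign_points(points, centroids):
    """Map step: label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        # Squared Euclidean distance from p to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update_centroids(points, labels, centroids):
    """Reduce step: average the points assigned to each cluster."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        # Data-dependent write target: this is what makes the reduction irregular.
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    new_centroids = []
    for j in range(k):
        if counts[j] == 0:
            # Empty cluster: keep the previous centroid unchanged.
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append([s / counts[j] for s in sums[j]])
    return new_centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    cents = [(0, 0), (10, 10)]
    labs = assign_points(pts, cents)
    print(labs, update_centroids(pts, labs, cents))
```

In the map step every point is processed independently, so it parallelizes trivially. In the reduce step many points update the same per-cluster sums, so a parallel version needs atomics, privatized accumulators, or a sort-then-segment strategy to avoid write conflicts.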

Aug, 21

Multi- and many-core data mining with adaptive sparse grids

Extracting knowledge from vast datasets is a major challenge in today's data-driven applications. Sparse grids provide a numerical method for both classification and regression in data mining which scales only linearly in the number of data points and is thus well-suited for huge amounts of data. Due to the recursive nature of sparse grid […]

OpenCL

Aug, 21

Sponge: portable stream programming on graphics engines

Graphics processing units (GPUs) provide a low-cost platform for accelerating high-performance computations. The introduction of new programming languages, such as CUDA and OpenCL, makes GPU programming attractive to a wide variety of programmers. However, programming GPUs is still a cumbersome task for two primary reasons: tedious performance optimizations and lack of portability. First, […]

CUDA
