high performance computing on graphics processing units: hgpu.org

Posts

May, 11

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer

General-purpose GPU computing (GPGPU) has taken off in the past few years, promising greater desktop processing power thanks to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups of as much as 1000x over code running on multi-core CPUs. Other […]

May, 11

Fast JND-Based Video Carving With GPU Acceleration for Real-Time Video Retargeting

A recently developed image resizing technique, seam carving, has proven to be a useful tool for content-adaptive, spatially nonuniform image resizing, with the aim of optimal display on a screen of reduced resolution or different aspect ratio. In this paper, we present a fast algorithm for real-time content-aware video retargeting based on the improved […]

CUDA

May, 11

Differential evolution algorithm on the GPU with C-CUDA

Several areas of knowledge are benefiting from the reduced computing time made possible by Graphics Processing Units (GPUs) and the Compute Unified Device Architecture (CUDA) platform. In the case of evolutionary algorithms, which are inherently parallel, this technology may be advantageous for running experiments that demand long computing times. In this paper, […]

CUDA

May, 11

Whole-function vectorization

Data-parallel programming languages are an important component in today’s parallel computing landscape. Among those are domain-specific languages like shading languages in graphics (HLSL, GLSL, RenderMan, etc.) and “general-purpose” languages like CUDA or OpenCL. Current implementations of those languages on CPUs solely rely on multi-threading to implement parallelism and ignore the additional intra-core parallelism provided by […]

OpenCL

May, 11

Gemma in April: A matrix-like parallel programming architecture on OpenCL

Graphics Processing Units (GPUs), as massively parallel processors, are now widely used for general-purpose computing tasks. Although mature development tools exist, writing GPU programs is not a trivial task for programmers. With this in mind, we propose a novel parallel computing architecture. The architecture includes a […]

OpenCL

May, 11

High performance memetic algorithm particle filter for multiple object tracking on modern GPUs

This work presents an effective approach to visual tracking that uses a graphics processing unit (GPU) for computation. To obtain a performance improvement over other platforms, it is convenient to select suitable algorithms, such as population-based ones. These are parallel-friendly by nature, requiring many independent evaluations that map well to the parallel […]

CUDA

May, 10

Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory

In this paper, we describe a runtime to automatically enhance the performance of applications running on heterogeneous platforms consisting of a multi-core (CPU) and a throughput-oriented many-core (GPU). The CPU and GPU are connected by a non-coherent interconnect such as PCI-E, and as such do not have shared memory. Heterogeneous platforms available today such as […]

CUDA

May, 10

The GPU on the simulation of cellular computing models

Membrane Computing is a discipline aiming to abstract formal computing models, called membrane systems or P systems, from the structure and functioning of the living cells as well as from the cooperation of cells in tissues, organs, and other higher order structures. This framework provides polynomial time solutions to NP-complete problems by trading space for […]

May, 10

Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming

MOTIVATION: The introduction of next-generation sequencing techniques, and especially the high-throughput systems Solexa (Illumina Inc.) and SOLiD (ABI), made the mapping of short reads to reference sequences a standard application in modern bioinformatics. Short-read alignment is needed for reference-based re-sequencing of complete genomes as well as for gene expression analysis based on transcriptome sequencing. […]

CUDA

May, 10

Fast Parallel Tandem Mass Spectral Library Searching Using GPU Hardware Acceleration

Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and […]

CUDA

May, 10

Accelerating image registration of MRI by GPU-based parallel computation

Automatic image registration for MRI applications generally requires many iteration loops and is, therefore, a time-consuming task. This drawback prolongs data analysis and delays the workflow of clinical routines. Recent advances in massively parallel computation on graphics processing units (GPUs) may be a solution to this problem. This study proposes a method to accelerate […]

May, 10

Astrophysical particle simulations with large custom GPU clusters on three continents

We present direct astrophysical N-body simulations with up to six million bodies using our parallel MPI-CUDA code on large GPU clusters in Beijing, Berkeley, and Heidelberg, with different kinds of GPU hardware. The clusters are linked in the cooperation of ICCS (International Center for Computational Science). We reach about one third of the peak performance […]

CUDA
