high performance computing on graphics processing units: hgpu.org

Posts

Dec, 2

Effective GPU Strategies for LU Decomposition

GPUs are becoming an attractive computing platform not only for traditional graphics computation but also for general-purpose computation because of the computational power, programmability and comparatively low cost of modern GPUs. This has led to a variety of complex GPGPU applications with significant performance improvements. The LU decomposition represents a fundamental step in many computationally […]

CUDA

Dec, 2

Parallel-META: A high-performance computational pipeline for metagenomic data analysis

The metagenomics method directly sequences and analyzes genome information from microbial communities. There are usually hundreds of genomes from different microbial species in the same community, and the main computational tasks for metagenomic data analysis include identifying the taxonomical and functional components of these genomes in the microbial community. Metagenomic data analysis is both data- and […]

Dec, 2

Efficient Cubic B-spline Image Interpolation on a GPU

Applying a geometric transformation to an image requires an interpolation step. When applied to image rotation, the most efficient GPU implementation of cubic spline image interpolation at present still costs about 10 times as much as linear interpolation. This implementation involves two steps: a prefilter step performs a two-pass forward-backward recursive filter, then a cubic polynomial […]

CUDA

Dec, 2

Massively Parallelized Monte Carlo Simulation and its Applications in Finance

In this paper, we propose, develop and implement a tool that increases the computational speed of exotic derivatives pricing at a fraction of the cost of traditional methods. Our paper focuses on investigating the computing efficiencies of GPU systems. We utilize the GPU’s natural parallelization capabilities to price financial instruments. We outline our implementation, solutions […]

Dec, 2

An error correction solver for linear systems: Evaluation of mixed precision implementations

This paper proposes an error correction method for solving linear systems of equations and evaluates an implementation using mixed precision techniques. While different technologies are available, graphics processing units (GPUs) have been established as particularly powerful coprocessors in recent years. For this reason, our error correction approach is focused on a CUDA implementation […]

CUDA

Dec, 2

Auto-optimization of a Feature Selection Algorithm

Advanced visualization algorithms are typically computationally expensive but highly data parallel, which makes them attractive candidates for GPU architectures. However, porting algorithms to a GPU remains a challenging process. The Mint programming model addresses this issue with its simple and high-level interface. It targets users who seek real-time performance without investing in […]

CUDA

Dec, 2

Evaluation of Fermi Features for Data Mining Algorithms

A recent development in High Performance Computing is the availability of NVIDIA’s Fermi or the 20-series GPUs. These offer features such as inbuilt atomic double precision support and increased shared memory. This thesis focuses on optimizing and evaluating the new features offered by the Fermi series GPUs for data mining algorithms involving reductions. Using three […]

CUDA

Dec, 2

Implementation of the FDTD Method Based on Lorentz-Drude Dispersive Model on GPU for Plasmonics Applications

We present a three-dimensional finite difference time domain (FDTD) method on a graphics processing unit (GPU) for plasmonics applications. For the simulation of plasmonics devices, the Lorentz-Drude (LD) dispersive model is incorporated into Maxwell's equations, while the auxiliary differential equation (ADE) technique is applied to the LD model. Our numerical experiments based on typical domain sizes […]

CUDA

Dec, 2

Spotting Radio Transients with the help of GPUs

Exploration of the time-domain radio sky has huge potential for advancing our knowledge of the dynamic universe. Past surveys have discovered large numbers of pulsars, rotating radio transients and other transient radio phenomena; however, they have typically relied upon off-line processing to cope with the high data and processing rate. This paradigm rules out the […]

CUDA

Dec, 1

A programming language interface to describe transformations and code generation

This paper presents a programming language interface, a complete scripting language, to describe composable compiler transformations. These transformation programs can be written, shared and reused by non-expert application and library developers. From a compiler writer’s perspective, a scripting language interface permits rapid prototyping of compiler algorithms that can mix levels and compose different sequences of […]

CUDA

Dec, 1

GPU Acceleration of Solving Parabolic Partial Differential Equations Using Difference Equations

Parabolic partial differential equations are often used to model systems involving heat transfer, acoustics, and electrostatics. The need for more complex models with increasing precision drives greater computational demands from processors. Since solving these types of equations is inherently parallel, GPU computing offers an attractive solution for drastically decreasing time to completion, power usage, and […]

CUDA

Dec, 1

Scalable Data Clustering using GPU Clusters

The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry data, is very time consuming on a single CPU. Fortunately these techniques lend themselves naturally to large scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA’s CUDA framework and Tesla architecture, […]

CUDA
