high performance computing on graphics processing units: hgpu.org

Posts

Jul, 2

On Expressing Different Concurrency Paradigms on Virtual Execution Systems

Virtual machines emerged during the 90s as the platform for developing frameworks and applications, offering large base class libraries, dynamic loading, and reflection. The design of these machines was influenced by the then-dominant idea that processors would maintain a von Neumann model while hiding non-von Neumann aspects in their internal structure. Recently Graphics Processing […]

Jul, 2

Approximation of Loop Subdivision Surfaces for Fast Rendering

This paper describes an approach to the approximation of Loop subdivision surfaces for real-time rendering. The approach consists of two phases, which separately construct the approximation geometry and the normal field of a subdivision surface. It first exploits quartic triangular Bezier patches to approximate the geometry of the subdivision surface by interpolating a grid of […]

Jul, 2

Nonnegative Tensor Factorization Accelerated Using GPGPU

This article presents an optimized algorithm for Nonnegative Tensor Factorization (NTF), implemented in the CUDA (Compute Unified Device Architecture) framework, that runs on contemporary graphics processors and exploits their massive parallelism. The NTF implementation is primarily targeted at the analysis of high-dimensional spectral images, including dimensionality reduction, feature extraction, and other tasks related to spectral imaging; […]

CUDA

Jul, 2

Integrating Object Detection with 3D Tracking Towards a Better Driver Assistance System

Driver assistance helps save lives. Accurate 3D pose is required to establish if a traffic sign is relevant to the driver. We propose a real-time system that integrates single view detection with region-based 3D tracking of road signs. The optimal set of candidate detections is found, followed by AdaBoost cascades and SVMs. The 2D detections […]

Jul, 2

Massively parallel two-dimensional TLM algorithm on graphics processing units

Recent advances in computing technology have brought massively parallel computing power to desktop PCs. As multi-core processor technology matures, a new front in parallel technology based on graphics processors has emerged. A massively parallel 2D-TLM algorithm for NVIDIA advanced graphics processors has been developed. The proposed parallel computing paradigm can be adopted straightforwardly to […]

CUDA

Jul, 2

Compressed Facade Displacement Maps

We describe an approach to render massive urban models. To prevent a memory transfer bottleneck we propose to render the models from a compressed representation directly. Our solution is based on rendering crude building outlines as polygons and generating details by ray-tracing displacement maps in the fragment shader. We demonstrate how to compress a displacement […]

OpenGL

Jul, 2

Real-Time Reconstruction of Sensitivity Encoded Radial Magnetic Resonance Imaging Using a Graphics Processing Unit

A barrier to the adoption of non-Cartesian parallel magnetic resonance imaging for real-time applications has been the times required for the image reconstructions. These times have exceeded the underlying acquisition time thus preventing real-time display of the acquired images. We present a reconstruction algorithm for commodity graphics hardware (GPUs) to enable real time reconstruction of […]

Jul, 2

Parallelization of RSA Algorithm Based on Compute Unified Device Architecture

In the domain of computer security, speeding up the RSA algorithm has been a research hot spot. With the recent tremendous increase in the computing capability of the Graphics Processing Unit as a co-processor of the CPU, Nvidia’s Compute Unified Device Architecture (CUDA) can greatly benefit computationally expensive programs written in the single-instruction, multiple-thread style. This […]

CUDA

Jul, 2

Visualizing and Analyzing the Mona Lisa

As technologies for acquiring 3D data and algorithms for constructing integrated models evolve, very large data sets representing objects or environments are emerging in various application areas. As a result, significant research in computer graphics has aimed to interactively render such models on affordable commodity computers. Interest is growing in the possibility of integrating real-time […]

Jul, 1

Size-based Transfer Functions: A New Volume Exploration Technique

The visualization of complex 3D images remains a challenge, a fact that is magnified by the difficulty of classifying or segmenting volume data. In this paper, we introduce size-based transfer functions, which map the local scale of features to color and opacity. Features in a data set with similar or identical scalar values can be […]

OpenGL

Jul, 1

Adaptive proxy geometry for direct volume manipulation

This paper introduces a new design to allow interactive, direct manipulation of volume data on volumetrically rendered images. We present an adaptive volume proxy mesh which serves not to define surfaces, but to encode the geometry and physical state of the volume. This system performs a modeling-free form of direct volume deformation by adaptively constructing […]

OpenGL

Jul, 1

Graphics processing unit accelerated non-uniform fast Fourier transform for ultrahigh-speed, real-time Fourier-domain OCT

We implemented fast Gaussian gridding (FGG)-based non-uniform fast Fourier transform (NUFFT) on the graphics processing unit (GPU) architecture for ultrahigh-speed, real-time Fourier-domain optical coherence tomography (FD-OCT). The Vandermonde matrix-based non-uniform discrete Fourier transform (NUDFT) as well as the linear/cubic interpolation with fast Fourier transform (InFFT) methods are also implemented on GPU to compare their performance […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
