high performance computing on graphics processing units: hgpu.org

Posts

Jul, 25

COTS cluster-based sort-last rendering: performance evaluation and pipelined implementation

Sort-last parallel rendering is an efficient technique to visualize huge datasets on COTS clusters. The dataset is subdivided and distributed across the cluster nodes. For every frame, each node renders a full resolution image of its data using its local GPU, and the images are composited together using a parallel image compositing algorithm. In this […]

OpenGL
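
As a rough illustration of the compositing step described in the abstract above, the following minimal sketch merges per-node partial images by keeping the nearest fragment at each pixel. It assumes opaque geometry and a per-pixel depth value, and it runs as a serial reduction rather than the parallel compositing algorithm the paper evaluates. The names `composite_pair`, `composite`, and `node_images` are hypothetical and do not come from the paper.

```python
from functools import reduce

# Assumption: each node returns a full-resolution partial image as a list of
# (color, depth) pixels; pixels the node did not cover carry infinite depth.
INF = float("inf")


def composite_pair(a, b):
    """Depth-composite two equally sized partial images pixel by pixel."""
    if len(a) != len(b):
        raise ValueError("partial images must have the same resolution")
    # Keep whichever fragment is closer to the viewer (smaller depth).
    return [pa if pa[1] <= pb[1] else pb for pa, pb in zip(a, b)]


def composite(partials):
    """Serially reduce all partial images into one final image."""
    return reduce(composite_pair, partials)


# Three hypothetical nodes, each rendering a 3-pixel image of its own data.
node_images = [
    [("red", 2.0), (None, INF), ("red", 5.0)],
    [("green", 3.0), ("green", 1.0), (None, INF)],
    [(None, INF), ("blue", 4.0), ("blue", 0.5)],
]

final = composite(node_images)
print([color for color, _ in final])
```

Because the depth test is associative for opaque fragments, the pairwise merge can be applied in any grouping, which is what allows compositing work to be spread across nodes.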

Jul, 25

A Fixed-Complexity Sphere Decoder for MIMO Systems on Graphics Processing Units

Graphics processing units (GPUs) are low-cost parallel programmable co-processors that can deliver extremely high computation throughput and are well suited for large-scale system design and simulation. In this paper, we utilize the parallel processing power of the GPU to accelerate the simulation of MIMO systems. In our work, a flat fading channel is considered and an […]

Jul, 25

A Coarse Grain Reconfigurable Architecture for sequence alignment problems in bio-informatics

A Coarse Grain Reconfigurable Architecture (CGRA) tailored for accelerating bio-informatics algorithms is proposed. The key innovation is a lightweight bio-informatics processor that can be reconfigured to perform the different Add, Compare and Select operations of the popular sequencing algorithms. A programmable and scalable architectural platform instantiates an array of such processing elements and allows arbitrary […]

Jul, 25

Real-time interactive object extraction system for high resolution remote sensing images based on parallel computing architecture

Random Walks requires less interaction, achieves better accuracy, and offers greater computational independence. We introduce local intensity entropy to modify the weight function in Random Walks so that it considers not only the intensity change between adjacent pixels but also the statistical features of regions. We then put forward a real-time interactive object extraction system for high […]

CUDA

Jul, 25

Modular Technology in the Modelling of Large Virtual Environments in Driving Simulators

This paper presents the latest research and developments in Modular Technology, that is, the optimized repetition of the same geometry or module to generate large virtual environments for the simulators designed by CITEF. The current trend is to redirect the maximum possible share of graphical calculation to the GPU to lighten […]

Jul, 24

Gpu architecture for stationary multisensor pedestrian detection at smart intersections

We present a real-time multisensor architecture for combined laser scanner and infra-red video-based pedestrian detection and tracking, used within a roadside unit for intersection assistance. In order to achieve utmost classification performance, we propose a cascaded classifier using laser scanner hypothesis generation and histogram of oriented gradients (HOG) descriptors for video-based classification together with […]

CUDA

Jul, 24

Central Force Optimization on a GPU: A case study in high performance metaheuristics using multiple topologies

Central Force Optimization (CFO) is a powerful new metaheuristic algorithm that has been demonstrated to be competitive with other metaheuristic algorithms such as Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Group Search Optimization (GSO). While CFO often shows superiority in terms of functional evaluations and solution quality, the algorithm is complex and often requires […]

CUDA

Jul, 24

GPGPU Acceleration Algorithm for Medical Image Reconstruction

Medical imaging techniques such as X-ray, ultrasound, CT and MRI scans are widely used for diagnosis. The 2D medical images from these scans are difficult to interpret because they can only show cross-sectional views of the human body. Interpreting these images requires experts or trained professionals. Reconstructing 2D images into 3D models can help […]

OpenGL

Jul, 24

Query-Driven Visualization of Time-Varying Adaptive Mesh Refinement Data

The visualization and analysis of AMR-based simulations is integral to the process of obtaining new insight in scientific research. We present a new method for performing query-driven visualization and analysis on AMR data, with specific emphasis on time-varying AMR data. Our work introduces a new method that directly addresses the dynamic spatial and temporal properties […]

CUDA

Jul, 24

Out-of-core cone beam reconstruction using multiple GPUs

This paper presents a graphics processing unit (GPU) based method capable of accelerating cone-beam reconstruction of large volume data, which cannot be entirely stored in video memory. Our method accelerates the Feldkamp, Davis and Kress (FDK) algorithm in a multi-GPU environment. We present how the entire volume can be efficiently decomposed into small portions to […]

CUDA
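
The abstract above is truncated before it explains how the volume is decomposed, so the following is only a minimal sketch of one plausible out-of-core plan: cut the volume into slabs along one axis so that each slab fits a per-GPU memory budget, then hand the slabs out round-robin. The functions `plan_slabs` and `assign_round_robin`, the axis choice, and the sizes are all assumptions for illustration, not the paper's method.

```python
def plan_slabs(nx, ny, nz, bytes_per_voxel, budget_bytes):
    """Split an nx*ny*nz volume into z-slabs that each fit in budget_bytes.

    Returns a list of half-open (z_start, z_end) ranges.
    """
    slice_bytes = nx * ny * bytes_per_voxel
    max_slices = budget_bytes // slice_bytes
    if max_slices < 1:
        raise ValueError("a single slice does not fit in the memory budget")
    return [(z0, min(z0 + max_slices, nz)) for z0 in range(0, nz, max_slices)]


def assign_round_robin(slabs, num_gpus):
    """Distribute slabs across GPUs in round-robin order."""
    return {gpu: slabs[gpu::num_gpus] for gpu in range(num_gpus)}


# Hypothetical 512 x 512 x 1000 float32 volume with 256 MiB usable per GPU.
slabs = plan_slabs(nx=512, ny=512, nz=1000,
                   bytes_per_voxel=4, budget_bytes=256 * 2**20)
plan = assign_round_robin(slabs, num_gpus=2)

for gpu, work in plan.items():
    print(gpu, work)
```

Each slab can then be reconstructed independently and written back to host memory before the next slab is loaded, which is the general idea behind processing data that exceeds video memory.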

Jul, 24

Real-Time Illustration of Vascular Structures

We present real-time vascular visualization methods that extend illustrative rendering techniques to accentuate spatial depth and to improve the perceptual separation of important vascular properties such as branching level and supply area. The resulting visualization can be, and already has been, used for direct projection onto a patient’s organ in the operating theater where […]

OpenGL

Jul, 24

Fast and robust CAMShift tracking

CAMShift is a well-established and fundamental algorithm for kernel-based visual object tracking. While it performs well with objects that have a simple and constant appearance, it is not robust in more complex cases. Because it relies solely on back-projected probabilities, it can fail when the object’s appearance changes (e.g., due to object […]

OpenGL

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
