high performance computing on graphics processing units: hgpu.org

Posts

Aug, 28

Global Illumination for Advanced Computer Graphics

Real-time 3D graphics is present today on various devices, from high-end PCs powered by highly complex GPUs to simpler handheld consoles and mobile phones. All these solutions are based on an aging technique called immediate mode rasterization, which is very efficient for rendering simple scenes but unable to capture essential visual features such as soft shadows, […]

Aug, 27

Switching to High Gear: Opportunities for Grand-Scale Real-Time Parallel Simulations

The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with $10^5$ processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher […]

CUDA

Aug, 27

directCell: hybrid systems with tightly coupled accelerators

The Cell Broadband Engine (Cell/B.E.) processor is a hybrid IBM PowerPC processor. In blade servers and PCI Express card systems, it has been used primarily in a server context, with Linux as the operating system. Because neither Linux as an operating system nor a PowerPC processor-based architecture is the preferred choice for all applications, some […]

Aug, 27

A breadth-first course in multicore and manycore programming

The technique of scaling hardware performance through increasing the number of cores on a chip requires programmers to learn to write parallel code that can exploit this hardware. In order to expose students to a variety of multicore programming models, our university offered a breadth-first introduction to multicore and manycore programming for upper-level undergraduates. Our […]

CUDA

Aug, 27

A structured parallel periodic Arnoldi shooting algorithm for RF-PSS analysis based on GPU platforms

Recent multi/many-core CPUs and GPUs provide an ideal parallel computing platform to accelerate the time-consuming analysis of radio-frequency/millimeter-wave (RF/MM) integrated circuits (ICs). This paper develops a structured shooting algorithm that can take full advantage of parallelism in periodic steady state (PSS) analysis. Utilizing the periodic structure of the state matrix of RF/MM-IC simulation, a […]

CUDA

Aug, 27

Automatic contention detection and amelioration for data-intensive operations

To take full advantage of the parallelism offered by a multi-core machine, one must write parallel code. Writing parallel code is difficult. Even when one writes correct code, there are numerous performance pitfalls. For example, an unrecognized data hotspot could mean that all threads effectively serialize their access to the hotspot, and throughput is dramatically […]

Aug, 27

Motion planning for autonomous driving with a conformal spatiotemporal lattice

We present a motion planner for autonomous highway driving that adapts the state lattice framework pioneered for planetary rover navigation to the structured environment of public roadways. The main contribution of this paper is a search space representation that allows the search algorithm to systematically and efficiently explore both spatial and temporal dimensions in real […]

Aug, 27

Fast and sleek glyph rendering for interactive HARDI data exploration

High angular resolution diffusion imaging (HARDI) is an emerging magnetic resonance imaging (MRI) technique that overcomes some decisive limitations of its predecessor diffusion tensor imaging (DTI). HARDI can resolve locally more than one direction in the diffusion pattern of water molecules and thereby opens up the opportunity to display and track crossing fibers. Showing the […]

Aug, 27

Rapid RNA Folding: Analysis and Acceleration of the Zuker Recurrence

RNA folding is a compute-intensive task that lies at the core of search applications in bioinformatics such as RNAfold and UNAFold. In this work, we analyze the Zuker RNA folding algorithm, which is challenging to accelerate because it is resource intensive and has a large number of variable-length dependencies. We use a technique of Lyngso […]

CUDA

Aug, 27

A New Approach for Color Character Extraction Based on Parallel Clustering

A new approach of fast color character extraction was proposed. Clustering algorithm was adopted in our method to differentiate between objective character regions and background regions on the premise that character regions are nearly monochromatic. However, the key point of this approach was how to select suitable elements’ features based upon the original image information […]

CUDA

Aug, 27

A new adaptive model for real-time fluid simulation with complex boundaries

In this paper, we present a new adaptive model for real-time fluid simulation with complex boundaries based on the smoothed particle hydrodynamics (SPH) framework. First, we introduce an adaptive SPH framework that is based on our character field function composed of 4 factors: geometrical complexity, boundary condition, physical complexity and complementary condition in terms of the […]

CUDA

Aug, 26

Parallel Fast Gauss Transform

We present fast adaptive parallel algorithms to compute the sum of $N$ Gaussians at $N$ points. Direct sequential computation of this sum would take $O(N^2)$ time. The parallel time complexity estimates for our algorithms are $O(N/n_p)$ for uniform point distributions and $O\left(\frac{N}{n_p}\log\frac{N}{n_p} + n_p\log n_p\right)$ for nonuniform distributions using $n_p$ CPUs. We incorporate a […]


Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
