high performance computing on graphics processing units: hgpu.org

Posts

Jun, 28

Performance and Power Efficiency Analysis of the Symmetric Cryptograph on Two Stream Processor Architectures

Multimedia and some scientific applications have achieved good performance on stream processor architectures by employing the stream programming model. To find out how to accelerate symmetric cryptography on stream processors, we implement and analyze cryptographic algorithms on different stream processors in this paper. Four cipher algorithms including RC5, AES, TWOFISH […]

Jun, 28

Petascale turbulence simulation using a highly parallel fast multipole method

We present a 0.5 Petaflop/s calculation of homogeneous isotropic turbulence in a cube of 2048^3 particles, using a highly parallel fast multipole method (FMM) running on 2048 GPUs of the TSUBAME 2.0 system. We compare this particle-based code with a spectral DNS code under the same calculation conditions and on the same machine. The results of our […]

CUDA

Jun, 27

A High Performance Massively Parallel Approach for Real Time Deformable Body Physics Simulation

Single processor technology has been evolving over the last decades, but due to physical limitations of the chip manufacturing process, the industry is pursuing alternatives to sustain computational power growth, including the creation of multi-core systems. Parallel computing targets problems that are scalable and possibly distributed, dividing the problem into smaller pieces. This approach may be explored to […]

CUDA

Jun, 27

Looking at the surprise: Bottom-up attentional control of an active camera system

Inspired by the expectation-based perception of humans, a surprise-driven active vision system is proposed. This vision system not only considers spatial saliency of objects in the environment, but also investigates temporal novelty in the neighborhood. Surprise is defined as the difference of the saliency probability distributions of two consecutive input images, which is measured using […]

CUDA

Jun, 27

Real-Time Simulation of Granular Materials Using Graphics Hardware

We present a method to compute friction in a particle-based simulation of granular materials on GPUs, together with its data structure. We use the Distinct Element Method to compute the force between particles. An existing method accelerates the Distinct Element Method using GPUs, but it does not compute friction. We implemented friction into the […]

CUDA

Jun, 27

Generating and Rendering Procedural Clouds in Real Time on Programmable 3D Graphics Hardware

This paper discusses a process of generating and rendering procedural clouds for 3D environments using programmable 3D graphics hardware. Cloud texture generation is performed using Perlin noise and turbulence functions. Our implementation targets OpenGL-capable GPUs with a programmable vertex and fragment processing pipeline that supports the OpenGL Shading Language (GLSL). We have performed a […]

OpenGL

Jun, 27

High Quality Interactive Rendering of Massive Point Models Using Multi-way kd-Trees

We present a simple and efficient technique for out-of-core multiresolution construction and high-quality visualization of large point datasets. The method introduces a novel hierarchical LOD data organization based on multi-way kd-trees that simplifies memory management and allows controlling the LOD tree’s height. The technique is incorporated in a full end-to-end system, which is […]

OpenGL

Jun, 27

Rapid Texture-based Volume Rendering

Nowadays, large numbers of 3D data sets can be obtained from different sources, as is common in medical diagnosis, but how to explore the information content of these data sets is still a problem. One effective method is computer-aided volume rendering. As the 3D data sets are usually large in scale, the capability […]

Jun, 27

Voreen: A Rapid-Prototyping Environment for Ray-Casting-Based Volume Visualizations

By splitting a complex ray-casting process into different tasks performed on different processors, Voreen provides a lot of flexibility because users can intervene at different points during ray casting. Voreen’s object-oriented design lets users easily create customized processor classes that cooperate seamlessly with existing classes. A user-friendly GUI supports rapid prototyping of visualization ideas. We’ve […]

OpenGL

Jun, 27

Uncluttering Graph Layouts Using Anisotropic Diffusion and Mass Transport

Many graph layouts include very dense areas, making the layout difficult to understand. In this paper, we propose a technique for modifying an existing layout in order to reduce the clutter in dense areas. A physically inspired evolution process based on a modified heat equation is used to create an improved layout density image, making […]

OpenGL

Jun, 27

Accelerating Smith-Waterman Local Sequence Alignment on GPU Cluster

Although highly accurate, the Smith-Waterman local sequence alignment algorithm requires a very large amount of memory and computation, which makes implementations on common computing systems less practical. In this paper, we present swGPUCluster, an implementation of the Smith-Waterman algorithm on a cluster equipped with NVIDIA graphics cards (called a GPU cluster). Our […]

CUDA
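
For readers unfamiliar with why Smith-Waterman is costly, the following is a minimal sequential sketch of the textbook recurrence, not the swGPUCluster implementation. The function name and the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration. It fills one score cell per pair of characters, so the work grows with the product of the two sequence lengths; this version keeps only two rows and returns the score alone, whereas recovering the alignment itself would need the full matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b.

    Textbook Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # scores for row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # scores for row i
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another, which is the usual starting point for parallel implementations.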

Jun, 27

Efficient implementation of the overlap operator on multi-GPUs

Lattice QCD calculations were one of the first applications to show the potential of GPUs in the area of high performance computing. Our interest is to find ways to effectively use GPUs for lattice calculations using the overlap operator. The large memory footprint of these codes requires the use of multiple GPUs in parallel. In […]
