high performance computing on graphics processing units: hgpu.org

Posts

Mar, 9

Comparison of FPGA and GPU implementations of real-time stereo vision

Real-time stereo vision systems have many applications – from autonomous navigation for vehicles through surveillance to materials handling. Accurate scene interpretation depends on an ability to process high resolution images in real time, but, although the calculations for stereo matching are basically simple, a practical system needs to evaluate at least 10^9 disparities every second – […]

CUDA

Mar, 9

Benchmarking GPU and CPU codes for Heisenberg spin glass overrelaxation

We present a set of possible implementations for Graphics Processing Units (GPUs) of the Overrelaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits […]

CUDA

Mar, 9

GPU Computing Gems: Emerald Edition

Graphics Processing Units (GPUs) are designed to be parallel – having hundreds of cores versus traditional CPUs. Increasingly, you can leverage GPU power for many computationally-intense applications – not just for graphics. If you’re facing the challenge of programming systems to effectively use these massively parallel processors to achieve efficiency and performance goals, GPU Computing […]

CUDA

Mar, 9

Visualization of level-of-detail meshes on the GPU

Extensive research has been carried out in multiresolution models for many decades. The tendency in recent years has been to harness the potential of GPUs to perform the level-of-detail extraction on graphics hardware. The aim of this work is to present a new level-of-detail scheme based on triangles which is both simple and efficient. In […]

Mar, 9

GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors

Next-generation, high-throughput sequencers are now capable of producing hundreds of billions of short sequences (reads) in a single day. The task of accurately mapping the reads back to a reference genome is of particular importance because it is used in several other biological applications, e.g., genome re-sequencing, DNA methylation, and ChIP sequencing. On a personal […]

Mar, 9

GPGPU flow

Abstract is not available.

Mar, 9

Classical Simulation of Quantum Adiabatic Algorithms using Mathematica on GPUs

In this paper we present a simulation environment enhanced with parallel processing which can be used on personal computers, based on a high-level user interface developed on Mathematica© which is connected to C++ code in order to make our platform capable of communicating with a Graphics Processing Unit. We introduce the reader to the behavior […]

CUDA

Mar, 8

Using common graphics hardware for multi-agent traffic simulation with CUDA

Today’s graphics processing units (GPUs) offer tremendous raw computing power. Simulating large groups of agents in transport simulation demands a great deal of computation time, so it seems reasonable to try to harness this computing power for traffic simulation. Unfortunately, simulating a network of traffic is inherently connected […]

CUDA

Mar, 8

Fast heterogeneous computing with CUDA compatible Tesla GPU computing processor (personal supercomputing)

This paper presents how fast heterogeneous computing can be achieved with the Tesla GPU computing processor. The Tesla GPU supercomputer brings the performance of a cluster to a workstation, turning it into a supercomputer. We have chosen the field of molecular dynamics to demonstrate fast, high-performance computing with the Tesla GPU. We have given a DCS […]

CUDA

Mar, 8

Performance and Scalability of GPU-Based Convolutional Neural Networks

In this paper we present the implementation of a framework for accelerating training and classification of arbitrary Convolutional Neural Networks (CNNs) on the GPU. CNNs are a derivative of standard Multilayer Perceptron (MLP) neural networks optimized for two-dimensional pattern recognition problems such as Optical Character Recognition (OCR) or face detection. We describe the basic parts […]

CUDA

Mar, 8

A GPU-based finite-size pencil beam algorithm with 3D-density correction for radiotherapy dose calculation

Aiming to develop an accurate and efficient dose calculation engine for online adaptive radiotherapy, we have implemented a finite-size pencil beam (FSPB) algorithm with a 3D-density correction method on the GPU. This new GPU-based dose engine is built on our previously published ultrafast FSPB computational framework [Gu et al. Phys. Med. Biol. 54 6287-97, 2009]. […]

CUDA

Mar, 8

General-purpose molecular dynamics simulations on GPU-based clusters

We present a GPU implementation of LAMMPS, a widely-used parallel molecular dynamics (MD) software package, and show 5x to 13x single node speedups versus the CPU-only version of LAMMPS. This new CUDA package for LAMMPS also enables multi-GPU simulation on hybrid heterogeneous clusters, using MPI for inter-node communication, CUDA kernels on the GPU for all […]

CUDA