Posts

Nov, 3

Studying Thermal Management for Graphics-Processor Architectures

We have previously presented Qsilver, a flexible simulation system for graphics architectures. In this paper we describe our extensions to this system, which we use – instrumented with a power model and HotSpot – to analyze the application of standard CPU static and runtime thermal management techniques on the GPU. We describe experiments implementing clock […]
Nov, 3

Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA

We port a high-order finite-element application that performs the numerical simulation of seismic wave propagation resulting from earthquakes in the Earth on NVIDIA GeForce 8800 GTX and GTX 280 graphics cards using CUDA. This application runs in single precision and is therefore a good candidate for implementation on current GPU hardware, which either does not […]
Nov, 3

Harvesting graphics power for MD simulations

We discuss an implementation of molecular dynamics (MD) simulations on a graphic processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms suitable for short-ranged and long-ranged interactions, and a congruential shift random number generator are presented. The performance […]
Nov, 3

Real-time Visual Tracker by Stream Processing

In this work, we implement a real-time visual tracker that targets the position and 3D pose of objects in video sequences, specifically faces. The use of stream processors for the computations and efficient Sparse-Template-based particle filtering allows us to achieve real-time processing even when tracking multiple objects simultaneously in high-resolution video frames. Stream processing is […]
Nov, 3

Parallel, stochastic measurement of molecular surface area

Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface […]
Nov, 3

Maximum mipmaps for fast, accurate, and scalable dynamic height field rendering

This paper presents a GPU-based, fast, and accurate dynamic height field rendering technique that scales well to large scale height fields. Current real-time rendering algorithms for dynamic height fields employ approximate ray-height field intersection methods, whereas accurate algorithms require pre-computation in the order of seconds to minutes and are thus not suitable for dynamic height […]
Nov, 3

Neural Network Implementation Using CUDA and OpenMP

Many algorithms for image processing and pattern recognition have recently been implemented on GPU (graphic processing unit) for faster computational times. However, the implementation using GPU encounters two problems. First, the programmer should master the fundamentals of the graphics shading languages that require the prior knowledge on computer graphics. Second, in a job which needs […]
Nov, 3

High-Precision Numerical Simulations of Rotating Black Holes Accelerated by CUDA

Hardware accelerators (such as Nvidia’s CUDA GPUs) have tremendous promise for computational science, because they can deliver large gains in performance at relatively low cost. In this work, we focus on the use of Nvidia’s Tesla GPU for high-precision (double, quadruple and octal precision) numerical simulations in the area of black hole physics — more […]
Nov, 3

PAPER – Accelerating parallel evaluations of ROCS

Modern graphics processing units (GPUs) are flexibly programmable and have peak computational throughput significantly faster than conventional CPUs. Herein, we describe the design and implementation of PAPER, an open-source implementation of Gaussian molecular shape overlay for NVIDIA GPUs. We demonstrate one to two order-of-magnitude speedups on high-end commodity GPU hardware relative to a reference CPU […]
Nov, 3

Real-time KD-tree construction on graphics hardware

We present an algorithm for constructing kd-trees on GPUs. This algorithm achieves real-time performance by exploiting the GPU’s streaming architecture at all stages of kd-tree construction. Unlike previous parallel kd-tree algorithms, our method builds tree nodes completely in BFS (breadth-first search) order. We also develop a special strategy for large nodes at upper tree levels […]
Nov, 3

Optimal loop unrolling for GPGPU programs (thesis)

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. GPUs are primarily designed for accelerating 3D graphics applications on modern computer systems and are therefore, specialized for highly data parallel, compute intensive problems, unlike general-purpose CPUs. In recent times, there has been significant interest in finding ways to accelerate general purpose […]
Nov, 3

Optimal loop unrolling for GPGPU programs

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. With the advent of general purpose programming models such as NVIDIA’s CUDA and the new standard OpenCL, general purpose programming using GPUs (GPGPU) has become very popular. However, the GPU architecture and programming model have brought along […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
