Posts

Jan, 15

Interleaving and Lock-Step Semantics for Analysis and Verification of GPU Kernels

We study semantics of GPU kernels – the parallel programs that run on Graphics Processing Units (GPUs). We provide a novel lock-step execution semantics for GPU kernels represented by arbitrary reducible control flow graphs and compare this semantics with a traditional interleaving semantics. We show for terminating kernels that either both semantics compute identical results […]
Jan, 15

Three Dimensional Fast Fourier Transform CUDA Implementation

A 3-dimensional DFT can be expressed as three 1-dimensional DFTs applied to 3-dimensional data, one along each dimension. Each of these 1-dimensional DFTs can be computed efficiently owing to the properties of the transform. This class of algorithms is known as the Fast Fourier Transform (FFT). We introduce the one dimensional FFT algorithm in […]
Jan, 15

Two Approaches to Particle Simulation: OpenMPI and CUDA

In this project, our goal is to make particle simulation run in parallel using both CUDA and MPI implementations. The scale of parallelism between the two is vastly different, and we wish to compare the implementations to gain insight into the strengths and weaknesses of the two paradigms of parallelization. Particle simulations are of huge […]
Jan, 15

Ray Tracing in Real-Time Games

This thesis describes efficient rendering algorithms based on ray tracing, and the application of these algorithms to real-time games. Compared to rasterization-based approaches, rendering based on ray tracing allows elegant and correct simulation of important global effects, such as shadows, reflections and refractions. The price for these benefits is performance: ray tracing is compute-intensive. This […]
Jan, 15

Massively parallelizable list-mode reconstruction using a Monte Carlo-based elliptical Gaussian model

PURPOSE: A fully three-dimensional (3D) massively parallelizable list-mode ordered-subsets expectation-maximization (LM-OSEM) reconstruction algorithm has been developed for high-resolution PET cameras. System response probabilities are calculated online from a set of parameters derived from Monte Carlo simulations. The shape of a system response for a given line of response (LOR) has been shown to be asymmetrical […]
Jan, 14

Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search

The emergence of new hybrid and heterogeneous multi-GPU multi-CPU large scale platforms offers new opportunities and poses new challenges when solving difficult optimization problems. This paper targets irregular tree search algorithms in which workload is unpredictable. We propose an adaptive distributed approach that distributes the load dynamically at runtime while taking into account the […]
Jan, 13

HiDP: A Hierarchical Data Parallel Language

Problem domains are commonly decomposed hierarchically to fully utilize parallel resources in modern microprocessors. Such decompositions can be provided as library routines, written by experienced experts, for general algorithmic patterns. But such APIs tend to be constrained to certain architectures or data sizes. Integrating them with application code is often an unnecessarily daunting task, especially […]
Jan, 13

A master-slave robotic simulator based on GPUDirect

As in traditional surgery, surgeons in telerobotic surgery need extensive training to gain experience and achieve highly accurate instrument manipulation. Traditional training methods such as practice in the operating room have major drawbacks, including high risk and limited opportunity, for which virtual reality (VR) and computer technologies can offer solutions. To accelerate the data transmission […]
Jan, 13

Acceleration of Selective Cationic Antibacterial Peptides computation: A comparison of FPGA and GPU approaches

Prediction of physicochemical properties of peptide sequences can be used for the identification of "Selective Cationic Amphipatic Antibacterial Peptides" (SCAAP), with possible applications in the treatment of different diseases. The exhaustive computation of physicochemical properties of peptide sequences can help reduce the search space of SCAAP, but the combinatorial complexity of these calculations is a high-performance […]
Jan, 13

Toward Practical Real-Time Photon Mapping: Efficient GPU Density Estimation

We describe the design space for real-time photon density estimation, the key step of rendering global illumination (GI) via photon mapping. We then detail and analyze efficient GPU implementations of four best-of-breed algorithms. All produce reasonable results on NVIDIA GeForce 670 at 1920×1080 for complex scenes with multiple-bounce diffuse effects, caustics, and glossy reflection in […]
Jan, 13

Exploring Traditional and Emerging Parallel Programming Models using a Proxy Application

Parallel computing architectures are becoming more complex with increasing core counts and more heterogeneous architectures. However, the most commonly used programming models, C/C++ with MPI and/or OpenMP, make it very difficult to write source code that is easily tuned for many targets. Newer language approaches attempt to ease this burden by providing optimization features such […]
Jan, 12

Accelerating Topic Model Training on a Single Machine

We present the design and implementation of GLDA, a library that utilizes the GPU (Graphics Processing Unit) to perform Gibbs sampling of Latent Dirichlet Allocation (LDA) on a single machine. LDA is an effective topic model used in many applications, e.g., classification, feature selection, and information retrieval. However, training an LDA model on large data […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors