The most recent entries
In this text we present the real-time implementation of a Bayesian framework for robotic multisensory perception on a graphics processing unit (GPU) using the Compute Unified Device Architecture (CUDA). As an additional objective, we intend to show the benefits of parallel computing for similar problems (i.e. probabilistic grid-based frameworks), and the user-friendly nature of CUDA as a programming tool. Inspired by the study of biological systems, several Bayesian inference algorithms for artificial perception have been proposed. Their high computational cost has been a prohibitory factor...
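Grid-based Bayesian frameworks like this one parallelize naturally because each cell's posterior can be updated independently. As a minimal sketch (kernel name, layout, and the precomputed per-cell sensor term are assumptions for illustration, not the authors' implementation):

```cuda
// Minimal sketch of per-cell Bayesian log-odds fusion on the GPU.
// Names and layout are assumptions, not the authors' implementation.
#include <cuda_runtime.h>

__global__ void fuseLogOdds(float *grid,         // per-cell occupancy log-odds
                            const float *sensor, // log-odds term from the latest measurement
                            int nCells)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nCells) {
        // In log-odds form, Bayes' rule reduces to one addition per cell,
        // so every cell can be updated by an independent thread.
        grid[i] += sensor[i];
    }
}

// Host-side launch (error checking omitted):
//   int threads = 256;
//   int blocks  = (nCells + threads - 1) / threads;
//   fuseLogOdds<<<blocks, threads>>>(d_grid, d_sensor, nCells);
```

Because the fusion collapses to an addition per cell, the kernel is memory-bound and scales directly with grid resolution, which is why this class of framework benefits so much from GPUs.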
Granular flows are extremely important for the pharmaceutical and chemical industries, as well as for other scientific areas. Understanding the impact of particle size and related effects on both the mean and the fluctuating flow field in granular flows is therefore critical for the design and optimization of powder processing operations. We use a specialized simulation tool written in C and CUDA (Compute Unified Device Architecture), a massively parallel programming model that runs on the graphics processing unit (GPU). We focus both on a new implementation approach using CUDA/GPU, as...
We propose a Parallel Banding Algorithm (PBA) on the GPU to compute the exact Euclidean Distance Transform (EDT) for a binary image in 2D and higher dimensions. By partitioning the image into small bands to process and then merging them concurrently, PBA computes the exact EDT with optimal linear total work, a high level of parallelism, and a good memory access pattern. This work is the first attempt to exploit the enormous power of the GPU in computing the exact EDT, whereas prior works offer only approximations. Compared to these other algorithms in our experiments, our exact algorithm is still a...
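For contrast with PBA's banded approach, the brute-force exact EDT it improves on assigns one thread per pixel and scans every foreground seed. The sketch below (hypothetical names, not PBA itself) shows why that baseline is quadratic in total work rather than linear:

```cuda
// Naive exact EDT baseline: each thread scans every seed pixel.
// O(pixels * seeds) total work versus PBA's optimal linear work;
// shown only to make the contrast with the banded approach concrete.
#include <cuda_runtime.h>
#include <float.h>

__global__ void naiveEDT(float *dist,        // output: distance per pixel
                         const int2 *seeds,  // coordinates of the foreground pixels
                         int nSeeds, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float best = FLT_MAX;
    for (int s = 0; s < nSeeds; ++s) {
        float dx = (float)(x - seeds[s].x);
        float dy = (float)(y - seeds[s].y);
        best = fminf(best, dx * dx + dy * dy);
    }
    dist[y * width + x] = sqrtf(best); // exact, but quadratic total work
}
```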
Over the last few years, the traditional fixed-function graphics accelerator has evolved into a programmable general-purpose graphics processing unit. These powerful computing cores are mainly used for accelerating graphics applications or enabling low-cost scientific computing. To further reduce cost and form factor, an emerging trend is to integrate the GPU, along with the memory controllers, onto the same die as the processor cores. However, given such a system-on-chip, the GPU, while occupying a substantial part of the silicon, will sit idle and contribute nothing to the overall system...
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level...
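Explicit time integration on a structured grid means each cell is updated from a fixed local neighborhood, which is exactly the access pattern GPU (and Cell) backends handle well. A minimal sketch of one Lax-Friedrichs step for 1D linear advection, an assumed example scheme rather than the paper's solver:

```cuda
// One explicit Lax-Friedrichs step for u_t + a * u_x = 0 on a 1D
// structured grid. Assumed example scheme; the paper's solvers and
// MPI halo exchange are not reproduced here.
#include <cuda_runtime.h>

__global__ void laxFriedrichsStep(float *uNew, const float *uOld,
                                  int n, float a, float dt, float dx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i == 0 || i >= n - 1) return;  // boundary cells handled elsewhere
                                       // (e.g., filled from MPI halo exchange)
    float c = a * dt / dx;             // CFL number, must satisfy |c| <= 1
    uNew[i] = 0.5f * (uOld[i - 1] + uOld[i + 1])
            - 0.5f * c * (uOld[i + 1] - uOld[i - 1]);
}
```

At cluster scale, only the boundary (halo) cells of each subdomain need MPI communication per step, so the stencil work stays local to each accelerator.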
Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS
MOTIVATION: Epistasis, the presence of gene-gene interactions, has been hypothesized to be at the root of many common human diseases, but current genome-wide association studies largely ignore its role. Multifactor dimensionality reduction (MDR) is a powerful model-free method for detecting epistatic relationships between genes, but computational costs have made its application to genome-wide data difficult. Graphics processing units (GPUs), the hardware responsible for rendering computer games, are powerful parallel processors. Using GPUs to run MDR on a genome-wide dataset allows for...
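The inner loop of MDR maps well onto GPUs because the contingency table for each SNP pair can be accumulated by an independent thread. A reduced sketch of that counting step, with an assumed 0/1/2 genotype encoding (the published GPU MDR kernels are more elaborate):

```cuda
// Reduced sketch of the MDR counting step: one thread per SNP pair
// accumulates a 3x3 case/control contingency table. The data layout
// (genotypes coded 0/1/2, one row per SNP) and all names are assumptions.
#include <cuda_runtime.h>

__global__ void pairwiseCounts(const char *genotypes, // [nSnps * nSamples], values 0..2
                               const char *isCase,    // [nSamples], 1 = case, 0 = control
                               int nSnps, int nSamples,
                               int *tables)           // [nSnps * nSnps * 18], flattened
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // first SNP of the pair
    int j = blockIdx.y * blockDim.y + threadIdx.y; // second SNP of the pair
    if (i >= nSnps || j >= nSnps || i >= j) return; // keep i < j only

    int counts[18] = {0}; // 9 genotype cells x {control, case}
    for (int s = 0; s < nSamples; ++s) {
        int gi = genotypes[(long)i * nSamples + s];
        int gj = genotypes[(long)j * nSamples + s];
        counts[(gi * 3 + gj) * 2 + isCase[s]]++;
    }

    // Dense (wasteful) pair indexing keeps the sketch simple; a real
    // genome-wide run would pack pairs or stream them in batches.
    long pair = (long)i * nSnps + j;
    for (int c = 0; c < 18; ++c)
        tables[pair * 18 + c] = counts[c];
}
```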
The particle-in-cell (PIC) algorithm is one of the most widely used algorithms in computational plasma physics. With the advent of graphics processing units (GPUs), large-scale plasma simulations on inexpensive GPU clusters are within reach. We present an implementation of a fully relativistic plasma PIC algorithm for GPUs based on the NVIDIA CUDA library. It supports a hybrid architecture consisting of single computation nodes interconnected in a standard cluster topology, with each node carrying one or more GPUs. The internode communication is realized using the message-passing interface. The...
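The particle push at the core of each PIC step is one-thread-per-particle work. A deliberately reduced, non-relativistic sketch with a pre-gathered field (the paper's implementation is fully relativistic and also handles field gathering and current deposition):

```cuda
// Deliberately reduced PIC particle push: one thread per particle,
// leapfrog update using an electric field already gathered to each
// particle. Non-relativistic simplification with assumed names; the
// paper's push is fully relativistic.
#include <cuda_runtime.h>

__global__ void pushParticles(float *x, float *v,  // particle position and velocity
                              const float *Ex,     // E field gathered at each particle
                              int nParticles, float qOverM, float dt)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nParticles) return;

    v[p] += qOverM * Ex[p] * dt; // accelerate (kick)...
    x[p] += v[p] * dt;           // ...then move (drift)
}
```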
Human vision has been studied in depth over the past years, and several models have been proposed to simulate it on computers. Some of these models concern visual saliency, which is potentially very useful in many applications such as robotics, image analysis, compression, and video indexing. Unfortunately, they are compute-intensive and have tight real-time requirements. Among the existing models, we have chosen a spatio-temporal one that combines static and dynamic information. In this paper we propose a very efficient multi-GPU implementation of this model that reaches real-time performance. We...
Path planning is an active topic in the literature, and efficient navigation over non-planar surfaces is an open research question. In this work we present a novel technique for navigation of multiple agents over arbitrary triangular domains. The proposed solution uses a fast hierarchical computation of geodesic distances over triangular meshes to allow interactive frame rates, and a GPU-based collision avoidance technique to guide individual agents. Unlike most previous work, the method imposes no limitations on the surface over which the agents are moving, and can naturally deal with...
We present a GPU algorithm to render path-based 3D surface detail in real time. Our method models these features using a vector representation that is efficiently stored in two textures. The first texture specifies the position of the features, while the second contains their paths, profiles, and material information. A fragment shader is then proposed to evaluate this data on the GPU, performing accurate and fast rendering of the details, including visibility computations and antialiasing. Some of our main contributions include a CSG approach to efficiently deal with...
Performance Analysis of General-Purpose Computation on Commodity Graphics Hardware: A Case Study Using Bioinformatics
Using modern graphics processing units for non-graphics high-performance computing is motivated by their enhanced programmability, attractive cost/performance ratio, and incredible growth in speed. Although the pipeline of a modern graphics processing unit (GPU) permits high throughput and high concurrency, it complicates the performance analysis of GPU-based applications. In this paper, we identify factors that determine the performance of GPU-based applications. We then classify them into three categories: data-linear, data-constant and computation-dependent. According to the...
Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search-based detector is needed. To meet the challenge of MIMO detection, typical suboptimal MIMO detectors are ASIC or FPGA designs. We aim to show that a MIMO detector on a graphics processing unit (GPU), a low-cost programmable parallel co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful...
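To see why the search is the bottleneck, consider exact maximum-likelihood detection, which scores every candidate transmit vector s by the residual ||y - Hs||^2. A toy CUDA sketch for an assumed 2x2 QPSK system (practical detectors prune this exhaustive search):

```cuda
// Brute-force ML metric computation for a tiny MIMO system: each thread
// evaluates one candidate symbol vector. A 2x2 QPSK system (4^2 = 16
// candidates) keeps the sketch small; names and layout are assumptions.
#include <cuda_runtime.h>

#define NT 2       // transmit antennas
#define NCAND 16   // 4^NT QPSK candidate vectors

__constant__ float2 qpsk[4] = { { 1.f,  1.f}, { 1.f, -1.f},
                                {-1.f,  1.f}, {-1.f, -1.f} };

__global__ void mlMetrics(const float2 *H,  // NT x NT channel, row-major
                          const float2 *y,  // received vector, length NT
                          float *metric)    // NCAND output metrics
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= NCAND) return;

    float m = 0.f;
    for (int r = 0; r < NT; ++r) {
        float2 acc = y[r];
        int idx = c;
        for (int t = 0; t < NT; ++t) {
            float2 s = qpsk[idx & 3];        // t-th symbol of candidate c
            idx >>= 2;
            float2 h = H[r * NT + t];
            acc.x -= h.x * s.x - h.y * s.y;  // complex multiply-subtract
            acc.y -= h.x * s.y + h.y * s.x;
        }
        m += acc.x * acc.x + acc.y * acc.y;
    }
    metric[c] = m;  // host (or a reduction kernel) picks the argmin
}
```

The candidate count grows as (constellation size)^antennas, which is exactly why suboptimal search strategies, and hardware able to evaluate many candidates in parallel, are needed.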
Most viewed papers (last 30 days)
- OpenCL Performance Evaluation on Modern Multi Core CPUs
- JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems
- Surface Reconstruction from Scattered Point via RBF Interpolation on GPU
- Parallelization of the Ant Colony Optimization for the Shortest Path Problem using OpenMP and CUDA
- Enabling OS Research by Inferring Interactions in the Black-Box GPU Stack
- CLgrep: A Parallel String Matching Tool
- Rapid Computation of Sodium Bioscales Using GPU-Accelerated Image Reconstruction
- Using GPU Simulation to Accurately Fit to the Power-Law Distribution
- OCLoptimizer: An Iterative Optimization Tool for OpenCL
- 3DES ECB Optimized for Massively Parallel CUDA GPU Architecture
- Parallel GPU-accelerated Recursion-based Generators of Pseudorandom Numbers
- Optimizing a Biomedical Imaging Orientation Score Framework
- Accelerating Computer Vision Algorithms Using OpenCL on Mobile GPU - A Case Study
- Parallel AES Encryption Engines for Many-Core Processor Arrays
- Speeding up Large-Scale Point-in-Polygon Test Based Spatial Join on GPUs
- Real-space density functional theory on graphical processing units: computational approach and comparison to Gaussian basis set methods
- OCLoptimizer: An Iterative Optimization Tool for OpenCL
- A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures
- A CUDA-Based Cooperative Evolutionary Multi-Swarm Optimization Applied to Engineering Problems
- Implementations of the FFT algorithm on GPU
Registered users can now run their OpenCL applications at hgpu.org. We provide one minute of compute time per run on two nodes: one with two AMD graphics processing units, and one with an AMD and an nVidia GPU. There are no restrictions on the number of runs.
The platforms are:

Node 1:
- GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
- GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
- CPU: AMD Phenom II X6 1055T @ 2.8GHz
- RAM: 12GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 11.4
- SDK: AMD APP SDK 2.8
Node 2:
- GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
- GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
- CPU: Intel Core i7-2600 @ 3.4GHz
- RAM: 16GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 12.2
- SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8
A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.