The most recent entries
The objective of this paper is to demonstrate a topology optimization method that can handle multiple constraints. The method relies on the concept of topological sensitivity, which captures the first-order change in any quantity of interest due to a topological change. Specifically, in this paper, the topological sensitivity field for each of the constraints is first computed. These fields are then dynamically combined into a single topological level-set. Finally, by relying on a fixed-point iteration, the topological level-set leads to optimal topologies (with decreasing volume fractions)...
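As a rough illustration of the scheme the abstract describes, here is a minimal NumPy sketch: per-constraint sensitivity fields are combined into one scalar field, which is thresholded to meet a target volume fraction inside a fixed-point loop. The weighted-sum combination, the quantile thresholding, and the random stand-in fields are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def combined_level_set(sensitivity_fields, weights):
    """Dynamically combine per-constraint topological sensitivity
    fields into one scalar field (here: a simple weighted sum)."""
    phi = np.zeros_like(sensitivity_fields[0])
    for field, w in zip(sensitivity_fields, weights):
        phi += w * field
    return phi

def threshold_for_volume(phi, volume_fraction):
    """Choose the level-set cutoff so the retained material matches
    the requested volume fraction."""
    return np.quantile(phi, 1.0 - volume_fraction)

# Fixed-point iteration over decreasing volume fractions; in the real
# method the sensitivities would be recomputed from a finite element
# solve each pass (stubbed here with random fields).
rng = np.random.default_rng(0)
shape = (64, 64)
for vf in np.linspace(0.9, 0.5, 5):
    fields = [rng.standard_normal(shape) for _ in range(2)]
    phi = combined_level_set(fields, weights=[0.7, 0.3])
    density = (phi >= threshold_for_volume(phi, vf)).astype(float)
    print(vf, density.mean())        # retained fraction tracks vf
```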
Exploiting Uniform Vector Instructions for GPGPU Performance, Energy Efficiency, and Opportunistic Reliability Enhancement
State-of-the-art graphics processing units (GPUs) employ single-instruction multiple-data (SIMD) execution to achieve both high computational throughput and energy efficiency. As previous works have shown, there exists significant computational redundancy in SIMD execution, where different execution lanes operate on the same operand values. Such value locality is referred to as uniform vectors. In this paper, we first show that besides redundancy within a uniform vector, different vectors can also hold identical values. Then, we propose detailed architecture designs to exploit both...
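For intuition, here is a small sketch of the two kinds of value locality mentioned above, modeled in NumPy rather than in hardware: a vector is uniform when every SIMD lane holds the same operand, and separate vectors are redundant when their whole contents coincide. The function names are hypothetical.

```python
import numpy as np

def is_uniform(vector):
    """Intra-vector redundancy: every SIMD lane holds the same operand."""
    return bool(np.all(vector == vector[0]))

def find_duplicate_vectors(vectors):
    """Inter-vector redundancy: group instruction indices whose whole
    operand vectors are identical."""
    seen = {}
    for i, v in enumerate(vectors):
        seen.setdefault(v.tobytes(), []).append(i)
    return [idxs for idxs in seen.values() if len(idxs) > 1]

warp = np.full(32, 7)                # all 32 lanes carry operand 7
print(is_uniform(warp))              # True: one scalar op suffices
vecs = [np.full(32, 7), np.arange(32), np.full(32, 7)]
print(find_duplicate_vectors(vecs))  # [[0, 2]]: vectors 0 and 2 match
```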
Ray casting is an important visualization technique, used to render 3D datasets such as the CT data used in medical imaging. High-quality image generation algorithms of this kind cast rays through the volume, compositing each sampled voxel into the corresponding pixel based on voxel opacity and color. Since all rays perform their computations independently, the problem maps naturally onto parallel architectures. Tracing multiple rays using SIMD is challenging, however, because rays can access non-contiguous memory locations, resulting in incoherent and irregular memory accesses. The aim...
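The compositing the abstract refers to is the standard front-to-back accumulation; a minimal per-ray sketch in Python follows, where the early-termination cutoff is a common optimization rather than something the abstract specifies.

```python
def composite_ray(samples, opacity_cutoff=0.99):
    """samples: iterable of (alpha, color) pairs along one ray,
    ordered front to back, as produced by a transfer function."""
    color_acc, alpha_acc = 0.0, 0.0
    for alpha, color in samples:
        # Standard front-to-back compositing equations.
        color_acc += (1.0 - alpha_acc) * alpha * color
        alpha_acc += (1.0 - alpha_acc) * alpha
        if alpha_acc >= opacity_cutoff:   # ray is effectively opaque
            break
    return color_acc

# Two translucent voxels followed by an opaque one:
print(composite_ray([(0.3, 1.0), (0.5, 0.5), (1.0, 0.2)]))
```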
We present high performance implementations on a CPU and an NVIDIA GPU of a Monte Carlo pricer for a simple FX basket option driven by a multi-factor local volatility model. Basket options such as these are typically considered too complicated to tackle analytically in a market-consistent manner, and are too high dimensional for PDE methods. Consequently these products are valued using Monte Carlo methods. This results in a compute intensive, massively parallel problem which is ideally suited to modern CPUs and GPUs. We develop fully parallelized, fully vectorized code and study the effects...
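A stripped-down version of such a pricer, assuming constant volatilities and correlated geometric Brownian motion instead of the paper's multi-factor local volatility model, shows why the problem is embarrassingly parallel: every path is independent.

```python
import numpy as np

def price_basket_call(s0, vols, corr, weights, strike, r, T, n_paths, seed=0):
    """Monte Carlo price of a European call on a weighted basket."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(corr)              # correlate the drivers
    z = rng.standard_normal((n_paths, len(s0))) @ chol.T
    drift = (r - 0.5 * vols**2) * T
    s_T = s0 * np.exp(drift + vols * np.sqrt(T) * z)
    payoff = np.maximum(s_T @ weights - strike, 0.0)
    return np.exp(-r * T) * payoff.mean()

corr = np.array([[1.0, 0.5], [0.5, 1.0]])
print(price_basket_call(np.array([100.0, 95.0]), np.array([0.2, 0.25]),
                        corr, np.array([0.5, 0.5]),
                        strike=100.0, r=0.02, T=1.0, n_paths=100_000))
```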
This work presents the implementation of a topology optimization approach based on the level set method on massively parallel computer architectures, in particular on a Graphics Processing Unit (GPU). Such architectures have become increasingly popular in recent years for complex and demanding scientific computation. They are composed of dozens, hundreds, or even thousands of cores specially designed for parallel computing. The speedup comes from using these graphics units to exploit the data parallelism of the expensive, parallelizable parts of the method, while the non-parallelizable parts are calculated...
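The data-parallel core of a level-set method is a pointwise update of the level-set field. A minimal NumPy sketch of one explicit evolution step follows; the velocity field and step size are placeholders, and on a GPU each grid point would map to its own thread.

```python
import numpy as np

def evolve_level_set(phi, V, dt, h=1.0):
    """One explicit step phi <- phi - dt * V * |grad(phi)|, written as
    whole-array operations: every grid point updates independently."""
    gy, gx = np.gradient(phi, h)        # central differences per axis
    grad_norm = np.sqrt(gx**2 + gy**2)
    return phi - dt * V * grad_norm

# Signed-distance-like field for a circle, advected outward:
phi = np.fromfunction(lambda i, j: (i - 32.0)**2 + (j - 32.0)**2 - 100.0,
                      (64, 64))
phi = evolve_level_set(phi, V=np.ones_like(phi), dt=0.1)
```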
Managing large-scale data is typically memory intensive. The current generation of GPUs has much lower memory capacity than CPUs, which is often a limiting factor in processing large data. It is desirable to reduce the memory footprint of spatially joining large-scale datasets through query optimization. In this study, we present a selectivity estimation technique for optimizing spatial join processing on GPUs. By seamlessly integrating multi-dimensional cumulative histograms and the summed-area-table algorithm, our technique can be realized efficiently on GPUs with good portability. Our...
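The summed-area-table idea can be sketched compactly: after one cumulative pass over a gridded point histogram, the number of points in any query rectangle follows from four table lookups, which is what makes cheap selectivity estimates possible. The grid resolution and binning conventions below are illustrative choices, not the paper's.

```python
import numpy as np

def build_sat(xs, ys, bins=64):
    """Histogram the points onto a grid, then prefix-sum both axes."""
    hist, xedges, yedges = np.histogram2d(xs, ys, bins=bins)
    sat = hist.cumsum(axis=0).cumsum(axis=1)   # summed-area table
    return sat, xedges, yedges

def estimate_count(sat, xedges, yedges, x0, x1, y0, y1):
    """Estimate how many points fall in [x0,x1] x [y0,y1]."""
    i0, i1 = np.searchsorted(xedges, [x0, x1]) - 1
    j0, j1 = np.searchsorted(yedges, [y0, y1]) - 1
    def at(i, j):                              # SAT lookup with border guard
        return sat[i, j] if i >= 0 and j >= 0 else 0.0
    # Inclusion-exclusion over the four rectangle corners.
    return at(i1, j1) - at(i0 - 1, j1) - at(i1, j0 - 1) + at(i0 - 1, j0 - 1)

rng = np.random.default_rng(1)
xs, ys = rng.uniform(0, 1, 100_000), rng.uniform(0, 1, 100_000)
sat, xe, ye = build_sat(xs, ys)
print(estimate_count(sat, xe, ye, 0.2, 0.4, 0.2, 0.4))  # ~4% of points
```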
Analyzing how species are distributed on the Earth has long been one of the fundamental questions in biogeography and ecology. With worldwide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited to the Global Biodiversity Information Facility (GBIF) data portal. The sheer amount of point and polygon data, and the computation-intensive point-in-polygon tests required for zonal summations in biodiversity studies, have imposed significant technical challenges. In this study, we have developed a set of data parallel designs of...
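The kernel at the heart of such zonal summation is the point-in-polygon test. Below is a vectorized ray-crossing version in NumPy; on a GPU, each point (or point/edge pair) would typically be assigned to its own thread.

```python
import numpy as np

def points_in_polygon(px, py, poly):
    """poly: (M, 2) array of vertices; returns a boolean mask per point.
    Ray-crossing rule: a point is inside if a horizontal ray from it
    crosses the polygon boundary an odd number of times."""
    inside = np.zeros_like(px, dtype=bool)
    x0, y0 = poly[-1]
    for x1, y1 in poly:                     # one edge per iteration
        crosses = (y0 > py) != (y1 > py)    # edge straddles the ray
        dy = y1 - y0
        x_at = (x1 - x0) * (py - y0) / (dy if dy != 0 else 1.0) + x0
        inside ^= crosses & (px < x_at)     # toggle parity on crossing
        x0, y0 = x1, y1
    return inside

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
px = np.array([0.5, 2.0]); py = np.array([0.5, 0.5])
print(points_in_polygon(px, py, square))    # [ True False]
```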
The development of increasingly powerful and low-cost massively parallel processors, known as GPUs, has created new opportunities for high-speed and high-precision computational work in physics. GPUs are extremely well suited to solving computationally intense problems at speeds much greater than traditional processors. They are now found in most personal computers, with research-grade models available at reasonable prices. This makes a wide variety of previously intractable, computationally intense problems solvable at a personal workstation. In this thesis I explore how these massively...
We develop the first parallel algorithm for Coalition Structure Generation (CSG), which is central to many multi-agent systems applications. Our approach involves distributing the key steps of a dynamic programming approach to CSG across computational nodes on a Graphics Processing Unit (GPU) such that each of the thousands of threads of computation can be used to perform small computations that speed up the overall process. In so doing, we solve important challenges that arise in solving combinatorial optimisation problems on GPUs such as the efficient allocation of memory and computational...
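The dynamic program being distributed can be stated compactly: for every agent subset S, the best structure value is either the value of S as one coalition or the best split f(T) + f(S \ T) over proper subsets T. Here is a sequential Python sketch with a random characteristic function as a stand-in; the GPU version assigns the independent splits of each subset to separate threads.

```python
import random

def best_coalition_structure_value(n, v):
    """v: dict mapping each nonempty agent bitmask to its coalition value.
    Returns the value of the optimal coalition structure over n agents."""
    f = {0: 0.0}
    for S in range(1, 1 << n):        # subsets in increasing bitmask order
        best = v[S]                   # option 1: S stays one coalition
        T = (S - 1) & S               # enumerate proper submasks of S
        while T:
            best = max(best, f[T] + f[S ^ T])   # option 2: split S
            T = (T - 1) & S
        f[S] = best
    return f[(1 << n) - 1]

random.seed(0)
n = 10
v = {S: random.random() * bin(S).count("1") for S in range(1, 1 << n)}
print(best_coalition_structure_value(n, v))
```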
The all-pairs shortest paths (APSP) problem finds the shortest path distances between all pairs of vertices, and is one of the most fundamental graph problems. In this paper, a parallel recursive partitioning approach to the APSP problem is presented, using the Open Computing Language (OpenCL) for directed and dense graphs with no negative cycles, based on the R-Kleene algorithm; it recursively partitions the dense adjacency matrix into sub-matrices and computes the shortest paths. Graphics Processing Units (GPUs) are massively parallel in nature and provide high computational speedup at very low cost in...
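A compact NumPy sketch of the R-Kleene recursion follows: it reduces APSP to min-plus matrix products on quadrants, and those dense products are the GPU-friendly kernels. The power-of-two matrix size and the small base case are simplifications for illustration.

```python
import numpy as np

def minplus(A, B):
    """Min-plus product: C[i,j] = min_k A[i,k] + B[k,j]."""
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

def rkleene(D, base=32):
    n = D.shape[0]
    if n <= base:                     # Floyd-Warshall base case
        for k in range(n):
            D = np.minimum(D, D[:, k:k+1] + D[k:k+1, :])
        return D
    h = n // 2
    A, B = D[:h, :h], D[:h, h:]
    C, E = D[h:, :h], D[h:, h:]
    A = rkleene(A, base)                          # close top-left
    B = minplus(A, B); C = minplus(C, A)
    E = rkleene(np.minimum(E, minplus(C, B)), base)   # close bottom-right
    C = minplus(E, C); B = minplus(B, E)
    A = np.minimum(A, minplus(B, C))
    return np.block([[A, B], [C, E]])

rng = np.random.default_rng(2)
n = 64
D = rng.uniform(1, 10, (n, n)); np.fill_diagonal(D, 0.0)
ref = D.copy()
for k in range(n):                    # plain Floyd-Warshall as a check
    ref = np.minimum(ref, ref[:, k:k+1] + ref[k:k+1, :])
print(np.allclose(rkleene(D), ref))   # True
```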
Acceleration of cryptographic applications on massively parallel computing platforms, such as Graphics Processing Units (GPUs), becomes a real challenge as their decreasing cost and mass production make practical implementations attractive. We propose a layered trusted architecture integrating random bits generation and parallelized RSA cryptographic computations on such platforms. The GPU-resident, three-tier MR architecture consists of an RBG, using the GPU as a deep entropy pool; a bignum modular arithmetic library using the Residue Number System; and GPU APIs for RSA key generation,...
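The Residue Number System trick the abstract mentions: a large integer becomes a tuple of residues modulo pairwise coprime moduli, so multiplication is carry-free and each residue channel can be processed independently (e.g. one GPU thread per channel). A toy sketch with small moduli, not the library's actual parameters:

```python
from math import prod

# Three pairwise coprime (here, prime Mersenne) moduli; real RNS bignum
# libraries use many more and much larger channels.
MODULI = (2**13 - 1, 2**17 - 1, 2**19 - 1)

def to_rns(x):
    return tuple(x % m for m in MODULI)

def mul_rns(a, b):
    """Channel-wise multiplication: no carries cross channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(residues):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m): modular inverse
    return x % M

a, b = 123_456, 789_012
assert from_rns(mul_rns(to_rns(a), to_rns(b))) == a * b
```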
May 17, 2013
In this paper we present the vortex-in-cell method targeted at graphics processing units. An inviscid fluid model is examined in a domain with periodic boundary conditions. Simulation results for leap-frogging vortex rings are presented, together with a sample visualization of a vortex ring collision. Finally, the GPU solver's performance advantage over the CPU solver is presented.
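Vortex-in-cell schemes exist precisely to avoid evaluating all pairwise vortex interactions directly (particle vorticity is interpolated to a grid where a Poisson problem is solved). As a much smaller illustration of the underlying dynamics, here is the direct 2D point-vortex (Biot-Savart) evaluation; the softening term is an illustrative regularization.

```python
import numpy as np

def point_vortex_velocities(pos, gamma, eps=1e-6):
    """pos: (N,2) vortex positions; gamma: (N,) circulations.
    Returns the velocity induced at each vortex by all the others."""
    dx = pos[:, None, 0] - pos[None, :, 0]   # dx[i,j] = x_i - x_j
    dy = pos[:, None, 1] - pos[None, :, 1]
    r2 = dx**2 + dy**2 + eps                 # softened: self-term vanishes
    u = (-gamma[None, :] * dy / (2 * np.pi * r2)).sum(axis=1)
    v = ( gamma[None, :] * dx / (2 * np.pi * r2)).sum(axis=1)
    return np.stack([u, v], axis=1)

# Two co-rotating vortices advect each other around their midpoint:
pos = np.array([[-0.5, 0.0], [0.5, 0.0]])
print(point_vortex_velocities(pos, np.array([1.0, 1.0])))
```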
May 17, 2013
Most viewed papers (last 30 days)
- Graphics Programming on the Web WebCL Course Notes
- Use NVIDIA CUDA technology to create genetic algorithms with extensive population
- Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples
- Implementations of the FFT algorithm on GPU
- Secrets from the GPU
- GPU Scripting and Code Generation with PyCUDA
- A General-Purpose GPU Reservoir Computer
- One OpenCL to Rule Them All?
- Fluid Motion Modelling Using Vortex Particle Method on GPU
- Adding GPU Computing to Computer Organization Courses
- Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search
- Graphics Programming on the Web WebCL Course Notes
- Automatic Compilation for Heterogeneous Architectures with Single Assignment C
- A parallel decoding algorithm of LDPC codes using CUDA
- Mr. Scan: Extreme Scale Density-Based Clustering using a Tree-Based Network of GPGPU Nodes
- Optimizing MapReduce for GPUs with effective shared memory usage
- Comprehensive Analysis of High-Performance Computing Methods for Filtered Back-Projection
- Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
- CUDA implementation of the algorithm for simulating the epidemic spreading over large networks
- Stencil-Aware GPU Optimization of Iterative Solvers
Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of computer time per run on two nodes: one with two AMD GPUs and one with an AMD and an nVidia graphics processing unit. There are no restrictions on the number of runs.
The platforms are:
Node 1:
- GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
- GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
- CPU: AMD Phenom II X6 @ 2.8GHz 1055T
- RAM: 12GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 11.4
- SDK: AMD APP SDK 2.8
Node 2:
- GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
- GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
- CPU: Intel Core i7-2600 @ 3.4GHz
- RAM: 16GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 12.2
- SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8
A completed OpenCL project should be uploaded via the User dashboard (see the instructions and an example there); the compilation and execution terminal output logs will be provided to the user.