The most recent entries
This work presents the implementation of a topology optimization approach based on the level set method on massively parallel computer architectures, in particular on a Graphics Processing Unit (GPU). Such architectures have become increasingly popular in recent years for complex and demanding scientific computation. They are composed of dozens, hundreds, or even thousands of cores specially designed for parallel computing. The speedup process consists of using these graphics units to exploit the data parallelism of expensive and parallelizable parts of the method, while non-parallelizable parts are calculated...
Managing large-scale data is typically memory intensive. The current generation of GPUs has much lower memory capacity than CPUs, which is often a limiting factor in processing large data. It is desirable to reduce the memory footprint of spatially joining large-scale datasets through query optimization. In this study, we present a selectivity estimation technique for optimizing spatial join processing on GPUs. By seamlessly integrating multi-dimensional cumulative histograms and the summed-area-table algorithm, our technique can be efficiently realized on GPUs with good portability. Our...
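The excerpt does not include the authors' GPU implementation; as an illustration of the underlying primitive, here is a minimal serial Python sketch of a summed-area table (2D inclusive prefix sum), which lets the sum over any rectangular region be read in O(1) — the property that makes histogram-based selectivity estimation cheap:

```python
# Summed-area table: sat[i][j] holds the sum of all grid cells with
# row <= i and col <= j (2D inclusive prefix sum).
def build_sat(grid):
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            sat[i][j] = (grid[i][j]
                         + (sat[i - 1][j] if i else 0)
                         + (sat[i][j - 1] if j else 0)
                         - (sat[i - 1][j - 1] if i and j else 0))
    return sat

def region_sum(sat, r0, c0, r1, c1):
    """Sum of grid[r0..r1][c0..c1] (inclusive) via four table lookups."""
    total = sat[r1][c1]
    if r0:
        total -= sat[r0 - 1][c1]
    if c0:
        total -= sat[r1][c0 - 1]
    if r0 and c0:
        total += sat[r0 - 1][c0 - 1]
    return total
```

On a GPU the build step is typically realized as row-wise then column-wise parallel prefix sums rather than this serial double loop.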
Analyzing how species are distributed on the Earth has been one of the fundamental questions in biogeography and ecology for a long time. With world-wide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited to the Global Biodiversity Information Facility (GBIF) data portal. The sheer amounts of point and polygon data and the computation-intensive point-in-polygon tests for zonal summations for biodiversity studies have imposed significant technical challenges. In this study, we have developed a set of data parallel designs of...
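The point-in-polygon test mentioned above is, in its common serial form, the ray-casting (even-odd) rule; a minimal Python sketch of that rule follows (an illustration only, not the authors' data-parallel design):

```python
# Ray-casting point-in-polygon test: cast a horizontal ray rightward from
# the point and count how many polygon edges it crosses; odd count => inside.
def point_in_polygon(x, y, poly):
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

For zonal summation over millions of occurrence points, a data-parallel design would assign points to threads and use spatial indexing to prune candidate polygons before running this per-point test.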
The development of increasingly powerful and low-cost massively parallel processors, known as GPUs, has created new opportunities for high-speed and high-precision computational work in physics. GPUs are extremely well suited to solving computationally intense problems at speeds much greater than traditional processors. They are now found in most personal computers, with research-grade models available at reasonable prices. This makes a wide variety of previously intractable, computationally intense problems solvable at a personal workstation. In this thesis I explore how these massively...
We develop the first parallel algorithm for Coalition Structure Generation (CSG), which is central to many multi-agent systems applications. Our approach involves distributing the key steps of a dynamic programming approach to CSG across computational nodes on a Graphics Processing Unit (GPU) such that each of the thousands of threads of computation can be used to perform small computations that speed up the overall process. In so doing, we solve important challenges that arise in solving combinatorial optimisation problems on GPUs such as the efficient allocation of memory and computational...
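The dynamic programming step that such CSG algorithms distribute across GPU threads is, in serial form, a recurrence over agent subsets: the best value of a set S is either keeping S as one coalition or splitting off a coalition T and optimally structuring the rest. A minimal bitmask sketch (the parallel version distributes the inner subset loop across threads; the recurrence itself is standard, not taken from the paper's excerpt):

```python
# Dynamic programming for Coalition Structure Generation: f[S] is the best
# total value achievable by partitioning agent set S (a bitmask) into
# coalitions, given a value v[C] for every coalition bitmask C.
def best_structure_value(n, v):
    f = [0.0] * (1 << n)
    for S in range(1, 1 << n):
        best = v[S]  # option 1: keep S as a single coalition
        # Enumerate proper non-empty subsets T of S; to avoid counting each
        # split twice, only consider T that contains S's lowest agent.
        low = S & -S
        T = (S - 1) & S
        while T:
            if T & low:
                best = max(best, v[T] + f[S ^ T])
            T = (T - 1) & S
        f[S] = best
    return f[(1 << n) - 1]
```

The `(T - 1) & S` trick walks every non-empty proper subset of S, which is the combinatorial workload the GPU version spreads over thousands of threads.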
The all-pairs shortest paths (APSP) problem finds the shortest path distances between all pairs of vertices, and is one of the most fundamental graph problems. In this paper, a parallel recursive partitioning approach to the APSP problem, based on the R-Kleene algorithm and using the Open Computing Language (OpenCL), is presented for directed and dense graphs with no negative cycles; it recursively partitions the dense adjacency matrix into sub-matrices and computes the shortest paths. Graphics Processing Units (GPUs) are massively parallel in nature and provide high computational speedup at very low cost in...
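For reference, the serial baseline that recursive schemes such as R-Kleene reorganize is the classic Floyd-Warshall triple loop; a minimal Python sketch (the recursive GPU variant recasts this as block min-plus matrix products over the partitioned sub-matrices):

```python
# Classic Floyd-Warshall APSP on a dense adjacency matrix: dist[i][j] is
# relaxed through every intermediate vertex k. adj[i][j] is the edge weight
# from i to j, INF if no edge, and 0 on the diagonal.
INF = float('inf')

def floyd_warshall(adj):
    n = len(adj)
    dist = [row[:] for row in adj]  # copy the adjacency matrix
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:
                continue  # no path i -> k; nothing to relax through k
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist
```
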
Acceleration of cryptographic applications on massively parallel computing platforms, such as Graphics Processing Units (GPUs), becomes a real challenge as their decreasing cost and mass production makes practical implementations attractive. We propose a layered trusted architecture integrating random bits generation and parallelized RSA cryptographic computations on such platforms. The GPU-resident, three-tier, MR architecture consists of a RBG, using the GPU as a deep entropy pool; a bignum modular arithmetic library using the Residue Number System; and GPU APIs for RSA key generation,...
In this paper we present the vortex-in-cell method aimed at graphics processing units. An inviscid fluid model is examined in a domain with periodic boundary conditions. Leap-frogging vortex ring simulation results are presented, with a sample visualization of a vortex ring collision. Finally, the performance advantage of the GPU solver over the CPU solver is presented.
In this thesis, we present a CUDA-implementation of two sub-steps of the Parallel Multilevel Partition of Unity Method (PMPUM). The PMPUM is a method for the approximation of Partial Differential Equations (PDEs) whose main computational effort is caused by the integration of the weak formulation. Therefore, an efficient CUDA-implementation of the required steps could speed up a given PMPUM-implementation. The core of this thesis is the analysis of the applicability of CUDA in the PMPUM. To this end the required steps, the decomposition of the domain and the integration, were implemented...
Performance Evaluation of CPU-GPU Communication Depending on the Characteristics of Co-Located Workloads
Today, there are many studies of complicated computation and big data processing that use the high-performance computing capability of GPUs. The Tesla K20X, recently announced by NVIDIA, provides 3.95 TFLOPS of single-precision floating-point performance. The performance of the K20X is 10 times higher than that of Intel's high-end CPUs. Due to the high-performance computing capability of GPUs, the K20X was adopted in Titan, then the top-ranked supercomputer in the world. However, additional steps are needed in the GPU computing process that aren't needed in computation using only the CPU. The data required for execution on the GPU has to move...
Most relatively modern desktop or even laptop computers contain a graphics card useful for more than showing colors on a screen. In this paper, we make a case for why you should learn enough about GPU (graphics processing unit) computing to use it as an accelerator for, or even a replacement of, your CPU code. We include an example of our own as a case study to show what can realistically be expected.
ICPP-EMS 2013 is organized in conjunction with ICPP 2013, the 42nd International Conference on Parallel Processing. The 2013 International Workshop on Embedded Multicore Systems (ICPP-EMS 2013) will bring researchers and experts together to present and discuss the latest developments and technical solutions concerning various aspects of embedded multicore computing. ICPP-EMS 2013 seeks original unpublished papers focusing on emerging applications, embedded compilers, embedded memory and architecture design, DSP/GPU systems, ESLs, embedded multicore programming models, and WCET analysis....
Most viewed papers (last 30 days)
- Graphics Programming on the Web WebCL Course Notes
- Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples
- Secrets from the GPU
- Implementations of the FFT algorithm on GPU
- Fluid Motion Modelling Using Vortex Particle Method on GPU
- Adding GPU Computing to Computer Organization Courses
- libWater: Heterogeneous Distributed Computing Made Easy
- Fast Implementation of Scale Invariant Feature Transform Based on CUDA
- Faster Upper Body Pose Estimation and Recognition Using CUDA
- Analyzing Locality of Memory References in GPU Architectures
- Optimizing a Biomedical Imaging Orientation Score Framework
- Graphics Programming on the Web WebCL Course Notes
- Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search
- Duality based optical flow algorithms with applications
- In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL
- A parallel decoding algorithm of LDPC codes using CUDA
- Optimizing MapReduce for GPUs with effective shared memory usage
- OpenCL parallel Processing using General Purpose Graphical Processing units - TiViPE software development
- Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
- Stencil-Aware GPU Optimization of Iterative Solvers
Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of computer time per run on two nodes equipped with AMD and NVIDIA graphics processing units. There are no restrictions on the number of runs.
The platforms are:

Node 1:
- GPU device 0: AMD/ATI Radeon HD 5870, 2GB, 850MHz
- GPU device 1: AMD/ATI Radeon HD 6970, 2GB, 880MHz
- CPU: AMD Phenom II X6 1055T @ 2.8GHz
- RAM: 12GB
- HDD: 2TB, RAID-0
- OS: openSUSE 11.4
- SDK: AMD APP SDK 2.8

Node 2:
- GPU device 0: AMD/ATI Radeon HD 7970, 3GB, 1000MHz
- GPU device 1: nVidia GeForce GTX 560 Ti, 2GB, 822MHz
- CPU: Intel Core i7-2600 @ 3.4GHz
- RAM: 16GB
- HDD: 2TB, RAID-0
- OS: openSUSE 12.2
- SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8
A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.