The most recent entries
Analysis of Multicore CPU and GPU Toward Parallelization of Total Focusing Method Ultrasound Reconstruction
Ultrasonic imaging and reconstruction tools are commonly used to detect, identify and measure defects in different mechanical parts. Due to the complexity of the underlying physics, and due to the ever-growing quantity of acquired data, computation time is becoming a limitation to the optimal inspection of a mechanical part. This article presents the performance of several implementations of a computationally heavy algorithm, named Total Focusing Method, on both graphics processing units (GPU) and general purpose processors (GPP). The scope of this study is narrowed to planar parts tested...
April 29, 2013
|
Local Histogram Modification Based Contrast Enhancement with GPU Acceleration
This paper presents a novel local contrast enhancement algorithm based on local histogram modification. The computation of local contrast enhancement operators is usually slow though they produce better local contrast and details. We have addressed this issue by subtly designing a highly parallel algorithm, which could be easily implemented on Graphics Processing Units (GPU) to harvest high computational efficiency. Our method is fast and easy to use, and the experiment results show that the technique can produce good results on a variety of images.
April 29, 2013
|
Split tiling for GPUs: automatic parallelization using trapezoidal tiles
Tiling is a key technique to enhance data reuse. For computations structured as one sequential outer "time" loop enclosing a set of parallel inner loops, tiling only the parallel inner loops may not enable enough data reuse in the cache. Tiling the inner loops along with the outer time loop enhances data locality but may require other transformations like loop skewing that inhibit inter-tile parallelism. One approach to tiling that enhances data locality without inhibiting inter-tile parallelism is split tiling, where tiles are subdivided into a sequence of trapezoidal computation...
April 29, 2013
|
CUDA Based CAMshift Algorithm for Object Tracking Systems
In this paper, we present an image object tracking system based on a GPGPU implementation of the CAMshift algorithm. For image object tracking, we use the parallel CAMshift tracking algorithm based on the HSV color image distribution of detected moving objects. Here, RGB-to-HSV color conversion, image masking such as open and close operations for image morphology, and computing of the centroid are executed in parallel. The CAMshift algorithm is very efficient for real-time tracking because of its fast and robust performance. In this system, CUDA environment and C++ program are used for image processing and accessing the...
April 27, 2013
|
Efficient Computation of the Kleene Star in Max-Plus Algebra using a CUDA GPU
This research aims to accelerate the computation of the Kleene star in max-plus algebra using CUDA technology on graphics processing units (GPUs). The target module is the Kleene star of a weighted adjacency matrix for directed acyclic graphs (DAGs), which plays an essential role in calculating the earliest and/or latest schedule for a class of discrete event systems. In recent NVIDIA GPU cards, an environment for high performance computing is provided to general developers, for which we aim to exploit the benefit of using GPUs. Using an NVIDIA Tesla C2075 for our experiments, we obtained...
April 27, 2013
|
Modeling and Optimization of Parallel Matrix-based Computations on GPU
As graphics processing units (GPUs) are continually being utilized as coprocessors, the demand for optimally utilizing them for various applications continues to grow. This work narrows the gap between programmers and minimum execution time for matrix-based computations on a GPU. To minimize execution time, computation and communication time must be considered. For computation, the placement of data in GPU memory significantly affects computation time and therefore is considered. Various matrix-based computation patterns are examined with respect to the layout of GPU memory. A computation...
April 27, 2013
|
Orchestrated Scheduling and Prefetching for GPGPUs
In this paper, we present techniques that coordinate the thread scheduling and prefetching decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better tolerate long memory latencies. We demonstrate that existing warp scheduling policies in GPGPU architectures are unable to effectively incorporate data prefetching. The main reason is that they schedule consecutive warps, which are likely to access nearby cache blocks and thus prefetch accurately for one another, back-to-back in consecutive cycles. This either 1) causes prefetches to be generated by a warp too close...
April 27, 2013
|
H.264 Parallel Optimization on Graphics Processors
Multimedia applications are present in most mobile hand-held devices. The H.264 standard is currently dominating the video compression world. H.264 has high computational complexity, requiring a large amount of processing resources. Many techniques have emerged that optimize H.264 using parallelization on multicore systems, ranging from groups of pictures down to the smallest block of pixels. We propose a parallelization technique based on rows of macroblocks with a light dependency detection algorithm that optimizes data parallelization and minimizes dependency synchronization stall time. The parallel...
April 27, 2013
|
Using High Performance Computing for Optimizing Credit Risk Calculation
The volume of banks' data calculation is increasing each year on an extraordinary scale, and with that, new forms of computation are needed. High performance computing is a very attractive field for optimizing such bank calculations and can give promising results. This paper shows an implementation of a known model for assessing the credit risk of a company. To obtain the most accurate price and speedup comparison, this method was implemented in both CPU and GPU versions. The GPU version was built using the CUDA architecture and shows some reasons for and advantages of using such GPU computing for...
April 26, 2013
|
A method for speeding up beam-tracing simulation using thread-level parallelization
In recent years, the computational power of modern processors has been increasing mainly because of the increase in the number of processor cores. Computationally intensive applications can gain from this trend only if they employ parallelism, such as thread-level parallelization. Geometric simulations can employ thread-level parallelization because the main part of a geometric simulation can be divided into a subset of mutually independent tasks. This approach is especially interesting for acoustic beam tracing because it is an intensive computing task. This paper presents the...
April 26, 2013
|
Parallel Variable Distribution Algorithm for Constrained Optimization with Nonmonotone Technique
A modified parallel variable distribution (PVD) algorithm for solving large-scale constrained optimization problems is developed, which modifies the quadratic subproblem QPl at each iteration instead of the QPl of the SQP-type PVD algorithm proposed by C. A. Sagastizabal and M. V. Solodov in 2002. The algorithm can circumvent the difficulties associated with the possible inconsistency of the subproblem of the original SQP method. Moreover, we introduce a nonmonotone technique instead of the penalty function to carry out the line search procedure more flexibly. Under appropriate conditions, the...
April 26, 2013
|
Design and Performance Analysis of Parallel Processing of SRTP Packets
Encryption of real-time multimedia data transfers is one of the tasks for telecommunication infrastructure which should be considered in order to reach an essential level of security. The execution time of the ciphering algorithm can play a fundamental role in packet delay and therefore presents an interesting challenge in terms of optimization methods. This work focuses on the possibilities for parallelizing SRTP processing for a private gateway using the OpenCL framework, on the utilization of the gateway's resources, and on an analysis of the potential improvement.
April 26, 2013
|
Most viewed papers (last 30 days)
- Graphics Programming on the Web WebCL Course Notes
- Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples
- Secrets from the GPU
- Implementations of the FFT algorithm on GPU
- Fluid Motion Modelling Using Vortex Particle Method on GPU
- Adding GPU Computing to Computer Organization Courses
- libWater: Heterogeneous Distributed Computing Made Easy
- Fast Implementation of Scale Invariant Feature Transform Based on CUDA
- Faster Upper Body Pose Estimation and Recognition Using CUDA
- Analyzing Locality of Memory References in GPU Architectures
Rating
Optimizing a Biomedical Imaging Orientation Score Framework
Graphics Programming on the Web WebCL Course Notes
Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search
Duality based optical flow algorithms with applications
In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL
A parallel decoding algorithm of LDPC codes using CUDA
Optimizing MapReduce for GPUs with effective shared memory usage
OpenCL parallel Processing using General Purpose Graphical Processing units - TiViPE software development
Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
Stencil-Aware GPU Optimization of Iterative Solvers
Events
October 1-4, 2013, Lyon, France: The 2013 International Workshop on Embedded Multicore Systems, ICPP-EMS 2013
November 13-15, 2013, Zhangjiajie, China: 3rd International Workshop on Embedded Multi-core Computing and Applications, EMCA 2013
February 2-6, 2014, San Francisco, USA
February 12-14, 2014, Turin, Italy
November 11-14, 2013, San Jose, California, USA
Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: one with two AMD graphics processing units, the other with one AMD and one nVidia graphics processing unit. There are no restrictions on the number of runs.
The platforms are:
Node 1
- GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
- GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
- CPU: AMD Phenom II X6 1055T @ 2.8GHz
- RAM: 12GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 11.4
- SDK: AMD APP SDK 2.8
Node 2
- GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
- GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
- CPU: Intel Core i7-2600 @ 3.4GHz
- RAM: 16GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 12.2
- SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8
A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.
The information sent to hgpu.org will be treated according to our Privacy Policy.
HGPU Group © 2010-2013 hgpu.org
All rights belong to the respective authors
Contact information:
contact@hgpu.org

