The most recent entries
The GPU-based High-performance Pattern-matching Algorithm for Intrusion Detection
The graphics processing unit (GPU) has evolved from a dedicated rendering device into a general-purpose parallel processor, and it performs far better than the CPU in many fields of science. String matching is widely used, especially in information retrieval, intrusion detection, and computational biology. In this paper, we design and implement G-WM, a GPU-based multi-string matching algorithm that improves on the traditional serial WM algorithm. G-WM achieves 12 and 11.2 times the performance of the serial WM algorithm on equal-length and unequal-length pattern sets, respectively.
May 11, 2013
|
A portable and high-performance matrix operations library for CPUs, GPUs and beyond
High-performance computing systems today include a variety of compute devices such as multi-core CPUs, GPUs and many-core accelerators. OpenCL allows programming different types of compute devices using a single API and kernel language. However, there is no standard matrix operations library in OpenCL for operations such as matrix multiplication that works well on a variety of hardware from multiple vendors. We implemented an OpenCL auto-tuning library for real and complex variants of general matrix multiply (GEMM) and present detailed performance results and analysis on a variety of GPU and...
May 11, 2013
|
Real-Time Object Tracking by CUDA-accelerated Neural Network
An algorithm is proposed for tracking objects in real time. The algorithm is based on a neural network implemented on a GPU. The algorithm is investigated and its parameters are optimized. Tracking is accelerated 10 times and training 2 times relative to the sequential version of the algorithm. The maximum frame resolution for real-time tracking and the optimum frame sampling from a movie are calculated.
May 11, 2013
|
A GPU-based Parallel Fireworks Algorithm for Optimization
Swarm intelligence algorithms have been widely used to solve difficult real-world problems in both academic and engineering domains. Thanks to their inherent parallelism, various parallelized swarm intelligence algorithms have been proposed to speed up the optimization process, especially on massively parallel GPU architectures. However, conventional swarm intelligence algorithms are usually not designed specifically for the GPU architecture. They either cannot fully exploit the tremendous computational power of GPUs or cannot scale effectively as the problem size grows...
May 11, 2013
|
An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units
Computing highly-accurate approximate solutions to partial differential equations (PDEs) requires both a robust numerical method and a powerful machine. We present a parallel implementation of the discontinuous Galerkin (DG) method on graphics processing units (GPUs). In addition to being flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely element-local approximations. While GPUs were originally intended to compute and display computer graphics, they have recently become a popular general purpose computing device...
May 11, 2013
|
Auto-tuning a LOFAR radio astronomy pipeline in JavaCL
Modern radio telescopes, such as the Low Frequency Array (LOFAR) in the north of the Netherlands, process the signal from the sky in software rather than in expensive special-purpose hardware. This gives astronomers unprecedented flexibility to perform a wide variety of scientific experiments. However, designing software that gives optimal performance for many different experiments, possibly also running on different hardware, is a challenging task. Since optimizing the software by hand to fit the various experiments and hardware is infeasible, we employ a technique...
May 11, 2013
|
Three-dimensional LBM simulations of buoyancy-driven flow using Graphics processing units
Three-dimensional simulations of buoyancy-driven flow of two immiscible liquids are performed using the lattice Boltzmann method (LBM) implemented on a graphics processing unit (GPU). The graphics processing unit is a new paradigm for computing fluid flows and has become more popular in recent years. It is powerful and convenient to use. LBM, which is an excellent alternative technique for fluid flow simulation, gives a very high computational speed-up when implemented on GPUs. Our present GPU-based LBM solver gives a 25-fold speed-up over the corresponding CPU-based code.
May 9, 2013
|
GPU Sparse Matrix Multiplication with CUDA
Matrix multiplication is a commonly used mathematical operation that has many practical applications. It is used to solve a number of problems in a wide variety of fields including science, engineering, and computer science. Given two matrices, A and B, and a resultant matrix C. The concept of density is used to describe the number of nonzero elements in a matrix relative to the total number of elements. For an N×M matrix with Z nonzero elements, the density is defined as Z/(N×M). A sparse matrix is one which has a low density. Sparse matrices can be stored in special formats to eliminate the...
May 9, 2013
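The density definition in this abstract (nonzero count divided by the total number of entries) can be illustrated with a minimal Python sketch. The function name `density` and the example matrix are hypothetical and chosen only for illustration; they do not come from the paper, which targets CUDA.

```python
def density(matrix):
    """Return Z / (N * M): nonzero entries over total entries.

    `matrix` is assumed to be a rectangular list of lists (N rows, M columns).
    This is an illustrative helper, not code from the paper.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        raise ValueError("matrix must have at least one row and one column")
    # Z: count every entry that is not zero.
    nonzeros = sum(1 for row in matrix for value in row if value != 0)
    return nonzeros / (n_rows * n_cols)


# Hypothetical 3x4 example: 2 nonzero entries out of 12.
example = [
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
]
example_density = density(example)
```

Here `example_density` is 2/12, so the example matrix would count as sparse under any threshold above roughly 0.17; the abstract itself does not state a threshold.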
|
OpenCL Implementation of Motion Estimation for Cloud Video Processing
With the rise of cloud computing infrastructures on one side and the increased accessibility of parallel computational devices on the other, such as GPUs and multi-core CPUs, parallel programming has recently gained renewed interest. This is particularly true in the domain of video coding, where the complexity and time consumption of the algorithms tend to limit access to the core technology. In this work, we focus on the motion estimation problem, well known to be the most time-consuming step of a majority of video coding techniques. By relying on the use of the OpenCL standard, which...
May 9, 2013
|
libWater: Heterogeneous Distributed Computing Made Easy
Clusters of heterogeneous nodes composed of multi-core CPUs and GPUs are increasingly being used for High Performance Computing (HPC) due to the benefits in peak performance and energy efficiency. In order to fully harvest the computational capabilities of such architectures, application developers often employ a combination of different parallel programming paradigms (e.g. OpenCL, CUDA, MPI and OpenMP), also known in literature as hybrid programming, which makes application development very challenging. Furthermore, these languages offer limited support to orchestrate data and computations...
May 9, 2013
|
Parallel Implementations of a Disparity Estimation Algorithm Based on a Proximal Splitting Method
The Parallel Proximal Algorithm (PPXA+) has been recently introduced as an efficient tool for solving convex optimization problems. It has proved particularly effective in the context of stereo vision, used as the methodological core of a novel disparity estimation technique. In this work, the main methodological issues limiting the efficient parallelization of this technique are addressed, and further modifications are proposed to enable and optimize the design of parallel implementations. Finally, actual implementations that fit both the multi-core CPU and GPU devices are provided and...
May 9, 2013
|
Programming and Performance of Graphics Processors in Shock Waves Simulation by Finite Volume Method
In this paper, we mainly report on our experience and strategy in programming graphics processing units (GPUs) as fast parallel floating-point coprocessors to accelerate the simulation of travelling shock waves of the 2-D Euler equation by the finite volume method. The GPU code is written in CUDA (Compute Unified Device Architecture), for which we develop a dedicated algorithm that manages memory access for high efficiency. Simulations of the Rayleigh-Taylor instability problem show that it is about 119 to 174 times faster than the CPU code. Besides that, the...
May 8, 2013
|
Most viewed papers (last 30 days)
- Graphics Programming on the Web WebCL Course Notes
- Use NVIDIA CUDA technology to create genetic algorithms with extensive population
- Simulating the universe with GPU-accelerated supercomputers: n-body methods, tests, and examples
- Secrets from the GPU
- Implementations of the FFT algorithm on GPU
- Fluid Motion Modelling Using Vortex Particle Method on GPU
- GPU Scripting and Code Generation with PyCUDA
- A General-Purpose GPU Reservoir Computer
- Adding GPU Computing to Computer Organization Courses
- libWater: Heterogeneous Distributed Computing Made Easy
Rating
- Duality based optical flow algorithms with applications
- Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search
- Graphics Programming on the Web WebCL Course Notes
- OpenCL parallel Processing using General Purpose Graphical Processing units - TiViPE software development
- Comprehensive Analysis of High-Performance Computing Methods for Filtered Back-Projection
- Optimizing MapReduce for GPUs with effective shared memory usage
- A parallel decoding algorithm of LDPC codes using CUDA
- Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
- CUDA implementation of the algorithm for simulating the epidemic spreading over large networks
- Stencil-Aware GPU Optimization of Iterative Solvers
Events
- October 1-4, 2013, Lyon, France: The 2013 International Workshop on Embedded Multicore Systems, ICPP-EMS 2013
- November 13-15, 2013, Zhangjiajie, China: 3rd International Workshop on Embedded Multi-core Computing and Applications, EMCA 2013
- February 2-6, 2014, San Francisco, USA
- February 12-14, 2014, Turin, Italy
- November 11-14, 2013, San Jose, California, USA
Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: the first has two AMD graphics processing units, and the second has one AMD and one nVidia graphics processing unit. There are no restrictions on the number of runs.
The platforms are:
Node 1
- GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
- GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
- CPU: AMD Phenom II X6 1055T @ 2.8GHz
- RAM: 12GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 11.4
- SDK: AMD APP SDK 2.8
Node 2
- GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
- GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
- CPU: Intel Core i7-2600 @ 3.4GHz
- RAM: 16GB
- HDD: 2TB, Raid-0
- OS: OpenSUSE 12.2
- SDK: nVidia CUDA Toolkit 5.0.35, AMD APP SDK 2.8
A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). Terminal output logs from compilation and execution will be provided to the user.
The information sent to hgpu.org will be treated according to our Privacy Policy.
HGPU Group © 2010-2013 hgpu.org
All rights belong to the respective authors
Contact information:
contact@hgpu.org

