Posts
Dec, 24
Scene Boundary Detection Technique Based on Bottom-Up Attention System and OpenCL Parallel Implementation
This paper focuses on maintaining a scene boundary detection system for video and on the process of porting it to OpenCL. The scene boundary detection algorithm proposed by the authors is based on the bottom-up attention principle. The system builds Gaussian pyramids from the input image, calculates a saliency map from the image and then detects the most […]
Dec, 24
Transparent Checkpoint-Restart for Hardware-Accelerated 3D Graphics
A mechanism for transparent GPU-independent checkpoint-restart of 3D graphics is described. The approach is based on a record-prune-replay paradigm: all OpenGL calls relevant to the graphics driver state are recorded; calls not relevant to the internal driver state as of the last graphics frame prior to checkpoint are discarded; and the remaining calls are […]
Dec, 24
GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training
The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g. A-SGD, whose large scale has been […]
Dec, 24
Large-Scale Paralleled Sparse Principal Component Analysis
Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain since the principal components (PCs) are linear combinations of the original variables. Sparse PCA (SPCA) aims to balance statistical fidelity and interpretability by approximating sparse PCs whose projections capture the maximal variance […]
Dec, 23
GPU Acceleration of Melody Accurate Matching in Query-by-Humming
With the increasing scale of the melody database, the query-by-humming system faces a tradeoff between response speed and retrieval accuracy. Accurate melody matching is the key factor limiting response speed. In this paper, we present a GPU acceleration method for accurate melody matching that improves response speed without reducing retrieval accuracy. […]
Dec, 23
Enabling High Performance Computing in Cloud Infrastructure using Virtualized GPUs
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their technical computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities clouds provide, as well as many novel computing paradigms available for data-intensive applications. However, there is […]
Dec, 23
Hardware Acceleration Technologies in Computer Algebra: Challenges and Impact
The objective of high performance computing (HPC) is to ensure that the computational power of hardware resources is well utilized to solve a problem. Various techniques are usually employed to achieve this goal: improving algorithms to reduce the number of arithmetic operations, modifying how data is accessed, or rearranging data in order to reduce […]
Dec, 23
Single Server Multi-GPU Training of ConvNets
In this work we evaluate different approaches to parallelize computation of convolutional neural networks across several GPUs within the same server.
Dec, 23
Fast Training of Convolutional Networks through FFTs
Convolutional networks are one of the most widely employed architectures in computer vision and machine learning. In order to leverage their ability to learn complex functions, large amounts of data are required for training. Training a large convolutional network to produce state-of-the-art results can take weeks, even when using modern GPUs. Producing labels using a […]
Dec, 22
Resource Centered Computing delivering high parallel performance
Modern parallel programming requires a combination of different paradigms, expertise and tuning that correspond to the different levels in today’s hierarchical architectures. To cope with the inherent difficulty, ORWL (ordered read-write locks) presents a new paradigm and toolbox centered around local or remote resources, such as data, processors or accelerators. ORWL programmers describe their computation […]
Dec, 22
Energy Auto-tuning using the Polyhedral Approach
As the HPC community moves into the exascale computing era, application energy has become a big concern. Tuning for energy will be essential in the effort to overcome the limited power envelope. How is tuning for lower energy related to tuning for faster execution? Understanding that relationship can guide both performance and energy tuning for […]
Dec, 22
Speed-Up Improvement Using Parallel Approach in Image Steganography
This paper presents a parallel approach to address the time complexity problem associated with sequential algorithms. An image steganography algorithm in the transform domain is considered for implementation. Image steganography is a technique for hiding a secret message in an image. With the parallel implementation, a large message can be hidden in a large image since it does not […]