Posts
Dec, 24
GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training
The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g. A-SGD, whose large scale has been […]
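As a rough illustration of the data-parallel half of that split, here is a minimal sketch of asynchronous SGD on a toy least-squares problem; the problem, the thread-based workers, and all sizes are illustrative assumptions, not taken from the post.

import threading
import numpy as np

# Toy least-squares data (illustrative, not from the post).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.arange(5.0) + 0.01 * rng.normal(size=1000)

w = np.zeros(5)   # shared parameters, updated without a lock
lr = 0.01

def worker(shard, seed):
    global w
    local_rng = np.random.default_rng(seed)
    for _ in range(2000):
        i = local_rng.choice(shard)
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (X[i] @ w - y[i])**2
        w = w - lr * grad                 # asynchronous update; races are tolerated

shards = np.array_split(np.arange(1000), 4)
threads = [threading.Thread(target=worker, args=(s, k)) for k, s in enumerate(shards)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("recovered weights:", np.round(w, 2))   # close to [0. 1. 2. 3. 4.]

In CPython the global interpreter lock serializes these updates, so the sketch only mimics the structure of lock-free A-SGD, not its speed; real systems run the workers on separate GPUs or machines against shared parameters.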
Dec, 24
Large-Scale Paralleled Sparse Principal Component Analysis
Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain since the principal components (PCs) are linear combinations of the original variables. Sparse PCA (SPCA) aims to balance statistical fidelity and interpretability by approximating sparse PCs whose projections capture the maximal variance […]
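For readers new to the trade-off, a standard formulation of the first sparse PC (the excerpt does not pin down the exact variant the paper uses) is

\max_{v \in \mathbb{R}^p} \; v^\top \Sigma v \quad \text{subject to} \quad \|v\|_2 = 1, \;\; \|v\|_0 \le k,

where \Sigma is the sample covariance matrix and k caps the number of nonzero loadings. With k = p this reduces to ordinary PCA; smaller k sacrifices some explained variance for components that involve only a few of the original variables.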
Dec, 23
GPU Acceleration of Melody Accurate Matching in Query-by-Humming
With the increasing scale of melody databases, query-by-humming systems face a trade-off between response speed and retrieval accuracy. Accurate melody matching is the key factor limiting response speed. In this paper, we present a GPU acceleration method for accurate melody matching, in order to improve response speed without reducing retrieval accuracy. […]
Dec, 23
Enabling High Performance Computing in Cloud Infrastructure using Virtualized GPUs
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their technical computing needs. This is due to the relative scalability, ease of use, and advanced user-environment customization that clouds provide, as well as the many novel computing paradigms available for data-intensive applications. However, there is […]
Dec, 23
Hardware Acceleration Technologies in Computer Algebra: Challenges and Impact
The objective of high performance computing (HPC) is to ensure that the computational power of hardware resources is well utilized to solve a problem. Various techniques are usually employed to achieve this goal: improving the algorithm to reduce the number of arithmetic operations, and modifying data accesses or rearranging data in order to reduce […]
Dec, 23
Single Server Multi-GPU Training of ConvNets
In this work we evaluate different approaches to parallelize computation of convolutional neural networks across several GPUs within the same server.
Dec, 23
Fast Training of Convolutional Networks through FFTs
Convolutional networks are one of the most widely employed architectures in computer vision and machine learning. In order to leverage their ability to learn complex functions, large amounts of data are required for training. Training a large convolutional network to produce state-of-the-art results can take weeks, even when using modern GPUs. Producing labels using a […]
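The gain the title refers to rests on the convolution theorem: convolution in the spatial domain becomes pointwise multiplication in the frequency domain. A minimal numpy check of that identity, in 1-D with illustrative sizes:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=64)   # input signal
k = rng.normal(size=16)   # filter

direct = np.convolve(x, k)   # O(n*m) direct convolution

n = len(x) + len(k) - 1      # full output length; rfft zero-pads to n
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)

print(np.allclose(direct, via_fft))   # True: identical up to round-off

For a convolutional layer the same identity is applied per 2-D feature map, and the cost of the transforms is amortized because one transformed input is reused across many filters.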
Dec, 22
Resource Centered Computing delivering high parallel performance
Modern parallel programming requires a combination of different paradigms, expertise, and tuning that correspond to the different levels in today’s hierarchical architectures. To cope with the inherent difficulty, ORWL (ordered read-write locks) presents a new paradigm and toolbox centered around local or remote resources, such as data, processors, or accelerators. ORWL programmers describe their computation […]
Dec, 22
Energy Auto-tuning using the Polyhedral Approach
As the HPC community moves into the exascale computing era, application energy has become a major concern. Tuning for energy will be essential in the effort to overcome the limited power envelope. How is tuning for lower energy related to tuning for faster execution? Understanding that relationship can guide both performance and energy tuning for […]
Dec, 22
Speed-Up Improvement Using Parallel Approach in Image Steganography
This paper presents a parallel approach to address the long running times associated with sequential algorithms. An image steganography algorithm in the transform domain is considered for implementation. Image steganography is a technique for hiding a secret message in an image. With the parallel implementation, a large message can be hidden in a large image since it does not […]
Dec, 22
Numerical Simulation for the MHD System in 2D Using OpenCL
In this work we solve the MHD equations with divergence cleaning on the GPU. The method is based on the finite volume approach and Strang dimensional splitting. The simplicity of the approach makes it a good candidate for a GPU implementation with OpenCL. With adequate memory access optimization, we achieve very high speedups compared to a […]
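Strang dimensional splitting, mentioned in the excerpt, advances the 2-D system by composing one-dimensional solves. Writing S_x and S_y for the 1-D solution operators, a common form of the scheme is

u^{n+1} = S_x(\Delta t / 2) \, S_y(\Delta t) \, S_x(\Delta t / 2) \, u^n,

with a local error of O(\Delta t^3), hence second-order accuracy overall; each sweep touches contiguous lines of the grid, which is part of what makes the approach a natural fit for a GPU implementation.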
Dec, 22
Accelerating Pairwise DNA Sequence Alignment using the CUDA Compatible GPU
We present a novel approach to the pairwise DNA sequence alignment problem, as an alternative to the dynamic programming solution of the Smith-Waterman algorithm. The proposed implementation uses CUDA, the parallel computing platform and programming model created by NVIDIA. The main idea of the proposed implementation is to assign different nucleotide weights and then merge the sub-sequences of […]
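For context, the dynamic-programming baseline the authors move away from is the Smith-Waterman recurrence; with a substitution score s and a linear gap penalty g it reads

H_{i,j} = \max\{\, 0,\; H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} - g,\; H_{i,j-1} - g \,\}, \qquad H_{i,0} = H_{0,j} = 0.

Each cell depends on its left, upper, and upper-left neighbors, so only cells on the same anti-diagonal can be computed in parallel, which is the usual bottleneck that alternative formulations such as the one proposed here try to avoid.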