Posts
Dec, 14
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives
The use of GPUs for general purpose computation has increased dramatically in the past years due to the rising demands of computing power and their tremendous computing capacity at low cost. Hence, new programming models have been developed to integrate these accelerators with high-level programming languages, giving place to heterogeneous computing systems. Unfortunately, this heterogeneity […]
Dec, 14
Acceleration of Hessenberg Reduction for Nonsymmetric Matrix
The worth of finding a general solution for nonsymmetric eigenvalue problems is specified in many areas of engineering and science computations, such as reducing noise to have a quiet ride in automotive industrial engineering or calculating the natural frequency of a bridge in civil engineering. The main objective of this thesis is to design a […]
Dec, 14
Graph Processing on GPU
Graph mining and data management has become a significant area because more and more new applications to various data mining problems in social networking, computational biology, chemical data analysis and drug discovery are emerging recently. Although traditional mining methods have been extended to process graphs, many graph applications still confront huge challenges due to continuous […]
Dec, 13
C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++
Capitalize on the faster GPU processors in today’s computers with the C++ AMP code library—and bring massive parallelism to your project. With this practical book, experienced C++ developers will learn parallel programming fundamentals with C++ AMP through detailed examples, code snippets, and case studies. Learn the advantages of parallelism and get best practices for harnessing […]
Dec, 12
Real-Time Grasp Detection Using Convolutional Neural Networks
We present an accurate, real-time approach to robotic grasp detection based on convolutional neural networks. Our network performs single-stage regression to graspable bounding boxes without using standard sliding window or region proposal techniques. The model outperforms state-of-the-art approaches by 14 percentage points and runs at 13 frames per second on a GPU. Our network can […]
Dec, 12
A Survey Paper on Solving TSP using Ant Colony Optimization on GPU
Ant Colony Optimization (ACO) is meta-heuristic algorithm inspired from nature to solve many combinatorial optimization problem such as Travelling Salesman Problem (TSP). There are many versions of ACO used to solve TSP like, Ant System, Elitist Ant System, Max-Min Ant System, Rank based Ant System algorithm. For improved performance, these methods can be implemented in […]
Dec, 12
cuLGT: Lattice Gauge Fixing on GPUs
We adopt CUDA-capable Graphic Processing Units (GPUs) for Landau, Coulomb and maximally Abelian gauge fixing in 3+1 dimensional SU(3) and SU(2) lattice gauge field theories. A combination of simulated annealing and overrelaxation is used to aim for the global maximum of the gauge functional. We use a fine grained degree of parallelism to achieve the […]
Dec, 12
Compiler-Level Explicit Cache for a GPGPU Programming Framework
GPU is widely used for high-performance computing. However, standard programming framework such as CUDA and OpenCL requires low-level specifications, thus programming is difficult and the performance is not portable. Therefore, we are developing a new framework named MESI-CUDA. Providing virtual shared variables accessible from both CPU and GPU, MESI-CUDA hides complex memory architecture and eliminates […]
Dec, 12
Strong scaling of general-purpose molecular dynamics simulations on GPUs
We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson […]
Dec, 9
Theano-based Large-Scale Visual Recognition with Multiple GPUs
In this report, we describe a Theano-based AlexNet (Krizhevsky et al., 2012) implementation and its naive data parallelism on multiple GPUs. Our performance on 2 GPUs is comparable with the state-of-art Caffe library (Jia et al., 2014) run on 1 GPU. To the best of our knowledge, this is the first open-source Python-based AlexNet implementation […]
Dec, 9
Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors
The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such […]
Dec, 9
MLitB: Machine Learning in the Browser
With few exceptions, the field of Machine Learning (ML) research has largely ignored the browser as a computational engine. Beyond an educational resource for ML, the browser has vast potential to not only improve the state-of-the-art in ML research, but also, inexpensively and on a massive scale, to bring sophisticated ML learning and prediction to […]