Posts

Dec, 12

A Survey Paper on Solving TSP using Ant Colony Optimization on GPU

Ant Colony Optimization (ACO) is a meta-heuristic algorithm inspired by nature for solving many combinatorial optimization problems, such as the Travelling Salesman Problem (TSP). Many versions of ACO are used to solve TSP, including Ant System, Elitist Ant System, Max-Min Ant System, and Rank-based Ant System. For improved performance, these methods can be implemented in […]
Dec, 12

cuLGT: Lattice Gauge Fixing on GPUs

We adopt CUDA-capable Graphics Processing Units (GPUs) for Landau, Coulomb and maximally Abelian gauge fixing in 3+1 dimensional SU(3) and SU(2) lattice gauge field theories. A combination of simulated annealing and overrelaxation is used to aim for the global maximum of the gauge functional. We use a fine-grained degree of parallelism to achieve the […]
Dec, 12

Compiler-Level Explicit Cache for a GPGPU Programming Framework

GPUs are widely used for high-performance computing. However, standard programming frameworks such as CUDA and OpenCL require low-level specifications, so programming is difficult and performance is not portable. Therefore, we are developing a new framework named MESI-CUDA. By providing virtual shared variables accessible from both CPU and GPU, MESI-CUDA hides the complex memory architecture and eliminates […]
Dec, 12

Strong scaling of general-purpose molecular dynamics simulations on GPUs

We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson […]
Dec, 9

Theano-based Large-Scale Visual Recognition with Multiple GPUs

In this report, we describe a Theano-based AlexNet (Krizhevsky et al., 2012) implementation and its naive data parallelism on multiple GPUs. Our performance on 2 GPUs is comparable with the state-of-the-art Caffe library (Jia et al., 2014) run on 1 GPU. To the best of our knowledge, this is the first open-source Python-based AlexNet implementation […]
Dec, 9

Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such […]
Dec, 9

MLitB: Machine Learning in the Browser

With few exceptions, the field of Machine Learning (ML) research has largely ignored the browser as a computational engine. Beyond an educational resource for ML, the browser has vast potential to not only improve the state-of-the-art in ML research, but also, inexpensively and on a massive scale, to bring sophisticated ML learning and prediction to […]
Dec, 9

Risk Estimation Without Using Stein’s Lemma — Application to Image Denoising

Image denoising is a classical problem in image processing and has applications in areas ranging from photography to medical imaging. In this paper, we examine the denoising performance of an optimized spatially-varying Gaussian filter. The parameters of the Gaussian filter are tuned by optimizing a mean squared error estimate which is similar to Stein’s Unbiased Risk […]
Dec, 9

Portable OpenCL Out-of-Order Execution Framework for Heterogeneous Platforms

Heterogeneous computing has become a viable option in seeking computing performance, alongside conventional homogeneous multi-/single-processor approaches. The advantage of heterogeneity is the possibility of choosing the best device on the platform for each distinct workload in the application, to gain performance and/or to lower power consumption. The drawback of heterogeneity is the […]
Dec, 8

XIII International Conference on Parallel Processing, ICPP 2015

The ICPP 2015: XIII International Conference on Parallel Processing is the premier interdisciplinary forum for the presentation of new advances and research results in the field of Parallel Processing. The conference will bring together leading academic scientists, researchers and scholars in the domain of interest from around the world. Topics of interest for submission […]
Dec, 8

The Sixth International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2015

The HEART symposium is an international forum on state-of-the-art research in high-performance and power-efficient computing using accelerator technologies such as FPGAs, GPGPUs, and/or specialized accelerators. The sixth edition of HEART will take place in Boston, MA, USA. The Sixth International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) is a forum to present and […]
Dec, 8

Computer Graphics International, CGI’15

Computer Graphics International is one of the oldest truly international conferences in Computer Graphics and one of the five most important ones worldwide. It is an essential yearly meeting where academics present their latest models and technologies, and explore new trends and ideas. In previous years, it has been held in many different places […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org