Posts
Jun, 16
Splotch: porting and optimizing for the Xeon Phi
With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous High Performance Computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize […]
Jun, 16
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
We perform a study of the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, we study how to minimize the time to train this model on a cluster of commodity CPUs and GPUs. Our first contribution focuses on the single-node setting, in which we show that […]
Jun, 16
Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application
Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on each node of the cluster is not efficient, resulting in high costs and power consumption as well as underutilisation of the accelerator. The research reported in this paper is motivated towards the use of few physical GPUs by […]
Jun, 14
International Conference on Robotics and Machine Vision (ICRMV’16), 2016
Index: Scopus, Ei Compendex, Web of Science (CPCI), Inspec, Google Scholar, Microsoft Academic Search, etc.
AGENDA:
September 14, 2016: Registration & Conference Materials Collection
September 15, 2016: Keynote Speeches & Participants’ Oral Presentation
September 16, 2016: Visit
PUBLICATION: ICRMV 2016 conference Proceedings
CONTACT US: Ms. Janet Hsiao, E-mail: icrmv@academic.net
Jun, 14
International Conference on Cybernetics, Robotics and Control (ICCRC’16), 2016
Publication: All accepted papers of CRC 2016 (Registered & Presented) will be collected in the conference proceedings, which will be indexed by EI and Scopus. Selected papers will be published in International Journal of Mechanical Engineering and Robotics Research (ISSN: 2278-0149), which is indexed by Index Copernicus, Scopus (since 2016), etc. Contact: Ethell Shin E-mail: […]
Jun, 14
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond
With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. Towards utilizing their full potential, it is worth investigating performance-portable algorithms that allow one to choose the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGPUs, […]
Jun, 14
First Application of Lattice QCD to Pezy-SC Processor
The Pezy-SC processor is a novel architecture developed by Pezy Computing K. K. that has achieved large computational power with low electric power consumption. It works as an accelerator device, similarly to GPGPUs. A programming environment that resembles OpenCL is provided. Using a hybrid parallel system "Suiren" installed at KEK, we port and tune a […]
Jun, 14
OpenCL-Based Erasure Coding on Heterogeneous Architectures
Erasure coding, Reed-Solomon coding in particular, is a key technique to deal with failures in scale-out storage systems. However, due to the algorithmic complexity, the performance overhead of erasure coding can become a significant bottleneck in storage systems attempting to meet service level agreements (SLAs). Previous work has mainly leveraged SIMD (single-instruction multiple-data) instruction extensions […]
Jun, 14
Multi-GPU Implementation of Machine Learning Algorithm using CUDA and OpenCL
Modern Graphics Processing Units (GPUs) have become very useful for computing complex and time-consuming processes. GPUs provide high-performance computation capabilities at a good price. This paper deals with multi-GPU OpenCL and CUDA implementations of the k-Nearest Neighbor (k-NN) algorithm. This work compares the performance of OpenCL and CUDA implementations where each of them is suitable for […]
Jun, 14
Processing Big Data in Main Memory and on GPU
Many large-scale systems were designed with the assumption that I/O is the bottleneck, but this assumption has been challenged in the past decade with new trends in hardware capabilities and workload demands. The computational power of CPU cores has not improved in proportion to the performance of disks and network interfaces in the past decade, but […]
Jun, 9
Analysis and Parameter Prediction of Compiler Transformation for Graphics Processors
In the last decade graphics processors (GPUs) have been extensively used to solve computationally intensive problems. A variety of GPU architectures by different hardware manufacturers have been shipped in a few years. OpenCL has been introduced as the standard cross-vendor programming framework for GPU computing. Writing and optimising OpenCL applications is a challenging task, the […]
Jun, 9
Decoupled Vector-Fetch Architecture with a Scalarizing Compiler
As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide […]