Posts
Jun, 16
Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application
Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on every node of the cluster is inefficient, resulting in high cost and power consumption as well as underutilisation of the accelerators. The research reported in this paper aims to use a few physical GPUs by […]
Jun, 14
International Conference on Robotics and Machine Vision (ICRMV’16), 2016
Index: Scopus, Ei Compendex, Web of Science (CPCI), Inspec, Google Scholar, Microsoft Academic Search, etc.
AGENDA:
September 14, 2016: Registration & Conference Materials Collection
September 15, 2016: Keynote Speeches & Participants’ Oral Presentations
September 16, 2016: Visit
PUBLICATION: ICRMV 2016 conference proceedings
CONTACT US: Ms. Janet Hsiao, E-mail: icrmv@academic.net
Jun, 14
International Conference on Cybernetics, Robotics and Control (ICCRC’16), 2016
Publication: All accepted papers of CRC 2016 (registered and presented) will be collected in the conference proceedings, which will be indexed by EI and Scopus. Selected papers will be published in the International Journal of Mechanical Engineering and Robotics Research (ISSN: 2278-0149), which is indexed by Index Copernicus, Scopus (since 2016), etc. Contact: Ethell Shin, E-mail: […]
Jun, 14
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond
With the appearance of the heterogeneous OpenPower platform, many-core accelerator devices have been coupled with Power host processors for the first time. To utilize their full potential, it is worth investigating performance-portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGPUs, […]
Jun, 14
First Application of Lattice QCD to Pezy-SC Processor
The Pezy-SC processor is a novel architecture developed by Pezy Computing K. K. that achieves large computational power with low electric power consumption. It works as an accelerator device, similar to GPGPUs, and is programmed through an environment that resembles OpenCL. Using a hybrid parallel system "Suiren" installed at KEK, we port and tune a […]
Jun, 14
OpenCL-Based Erasure Coding on Heterogeneous Architectures
Erasure coding, Reed-Solomon coding in particular, is a key technique to deal with failures in scale-out storage systems. However, due to the algorithmic complexity, the performance overhead of erasure coding can become a significant bottleneck in storage systems attempting to meet service level agreements (SLAs). Previous work has mainly leveraged SIMD (single-instruction multiple-data) instruction extensions […]
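To make the "algorithmic complexity" concrete: Reed-Solomon encoding is a matrix-vector product over the finite field GF(2^8), so every parity byte needs several carry-less field multiplications, which is the work that SIMD or OpenCL implementations try to speed up. The sketch below is plain Python, not the paper's OpenCL code. It assumes the reduction polynomial 0x11D and uses a Cauchy generator matrix (the "Cauchy Reed-Solomon" variant), which keeps the systematic code maximum-distance separable. The names `gf_mul`, `gf_inv`, `cauchy_coef` and `encode` are illustrative only.

```python
GF_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1, a common irreducible polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less (XOR) multiply with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # degree-8 overflow: reduce by the field polynomial
            a ^= GF_POLY
    return result


def gf_inv(a: int) -> int:
    """Inverse via a^254, since the 255 nonzero elements form a multiplicative group."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, base, exp = 1, a, 254
    while exp:                   # square-and-multiply exponentiation
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def cauchy_coef(j: int, i: int, k: int) -> int:
    """Hypothetical Cauchy generator entry 1 / (x_j + y_i), with x_j = k + j and y_i = i.

    The x and y sets are disjoint, so x_j XOR y_i is never zero."""
    return gf_inv((k + j) ^ i)


def encode(data_blocks: list[bytes], m: int) -> list[bytes]:
    """Compute m parity blocks for k equal-length data blocks (systematic encoding)."""
    k = len(data_blocks)
    if k == 0 or k + m > 256:
        raise ValueError("need 1 <= k and k + m <= 256 in GF(2^8)")
    size = len(data_blocks[0])
    if any(len(block) != size for block in data_blocks):
        raise ValueError("all data blocks must have the same length")
    parity = []
    for j in range(m):
        out = bytearray(size)
        for i, block in enumerate(data_blocks):
            c = cauchy_coef(j, i, k)
            for n, byte in enumerate(block):
                out[n] ^= gf_mul(c, byte)  # parity_j += c_{j,i} * d_i, byte by byte
        parity.append(bytes(out))
    return parity
```

A lost data block can be rebuilt from any one parity block. XOR away the contributions of the surviving blocks, then multiply by the inverse of the lost block's coefficient. Real encoders replace the bitwise `gf_mul` loop with lookup tables or SIMD shuffles, which is the kind of optimisation the abstract refers to.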
Jun, 14
Processing Big Data in Main Memory and on GPU
Many large-scale systems were designed with the assumption that I/O is the bottleneck, but this assumption has been challenged in the past decade by new trends in hardware capabilities and workload demands. The computational power of CPU cores has not improved in proportion to the performance of disks and network interfaces in the past decade, but […]
Jun, 14
Multi-GPU Implementation of Machine Learning Algorithm using CUDA and OpenCL
Modern Graphics Processing Units (GPUs) have become very useful for computing complex and time-consuming processes. GPUs provide high-performance computation capabilities at a good price. This paper deals with multi-GPU OpenCL and CUDA implementations of the k-Nearest Neighbor (k-NN) algorithm. This work compares the performance of the OpenCL and CUDA implementations, where each of them is suitable for […]
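The following plain-Python sketch shows why brute-force k-NN splits naturally across several GPUs. It is not the paper's OpenCL or CUDA code, and simulated "devices" stand in for real GPUs. Each device receives a slice of the training set and returns only its local k nearest candidates. The host then merges these small candidate lists and takes a majority vote. The function names, the `num_devices` parameter and the Euclidean metric are assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def local_topk(shard_x, shard_y, query, k):
    """Work one (simulated) device would do: distances to its shard, keep the k smallest."""
    return heapq.nsmallest(
        k,
        ((math.dist(x, query), y) for x, y in zip(shard_x, shard_y)),
        key=lambda pair: pair[0],
    )


def knn_multi_device(train_x, train_y, query, k, num_devices=2):
    """Hypothetical multi-device k-NN classifier using majority vote among the k nearest points."""
    n = len(train_x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of training points")
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    chunk = math.ceil(n / num_devices)
    candidates = []
    for start in range(0, n, chunk):  # one iteration per simulated device
        candidates.extend(
            local_topk(train_x[start:start + chunk], train_y[start:start + chunk], query, k)
        )
    # The global k nearest neighbours are always among the union of the local top-k lists.
    nearest = heapq.nsmallest(k, candidates, key=lambda pair: pair[0])
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Only `k` candidates per device cross back to the host, so the merge step stays cheap regardless of training-set size. The distance computation inside `local_topk` is the data-parallel part that a GPU kernel would spread over threads.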
Jun, 9
Analysis and Parameter Prediction of Compiler Transformation for Graphics Processors
In the last decade graphics processors (GPUs) have been extensively used to solve computationally intensive problems. A variety of GPU architectures from different hardware manufacturers have been shipped within a few years. OpenCL has been introduced as the standard cross-vendor programming framework for GPU computing. Writing and optimising OpenCL applications is a challenging task; the […]
Jun, 9
Decoupled Vector-Fetch Architecture with a Scalarizing Compiler
As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide […]
Jun, 9
OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithms
We investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nyström extension to calculate eigenvalues/eigenvectors of the graph Laplacian, and this is […]
Jun, 9
Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU
Sparse matrix-vector multiplication (SpMV) is the dominant kernel in scientific simulations. Many-core processors such as GPUs accelerate SpMV computations with higher parallelism and memory bandwidth than CPUs; however, even on many-core processors SpMV performance is still strongly limited by memory bandwidth, and the low locality of memory accesses to the input vector causes […]
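The following sketch shows where the bandwidth and locality problems come from, using a baseline SpMV over the common CSR (compressed sparse row) format. This is not the paper's adaptive multi-level blocking scheme. The function names `dense_to_csr` and `csr_spmv` are illustrative. Each nonzero is read exactly once, but `x[col_idx[j]]` jumps to arbitrary positions in the input vector, which is the poor access locality that blocking schemes try to improve.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for col, v in enumerate(row):
            if v != 0:
                values.append(v)     # nonzero value
                col_idx.append(col)  # its column index
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # on a GPU, rows (or row groups) map to threads or warps
        acc = 0.0
        for j in range(row_ptr[row], row_ptr[row + 1]):
            # values/col_idx are streamed contiguously; x is gathered irregularly
            acc += values[j] * x[col_idx[j]]
        y[row] = acc
    return y
```

For example, the 4x4 matrix `[[10, 0, 0, 2], [0, 3, 0, 0], [0, 0, 0, 0], [1, 0, 4, 5]]` times `[1, 2, 3, 4]` gives `[18.0, 6.0, 0.0, 33.0]`. Only 6 of the 16 entries are stored, and row 2 is empty, so it costs nothing beyond its `row_ptr` entry.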