Posts
Mar, 14
6th Int. Workshop on Computer Science and Engineering (WCSE), 2016
All accepted papers of WCSE 2016 will be published in the conference proceedings, which will be indexed by EI & Scopus. Keynote & Plenary Speakers: Prof. Hayato Ohwada, Tokyo University of Science, Japan; Prof. Taku Harada, Tokyo University of Science, Japan; Prof. Akiko Aizawa, National Institute of Informatics, Japan; Prof. Hiroyuki Nishiyama, Tokyo University of Science, Japan. Conference Program […]
Mar, 12
Machine Learning at the Limit
Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In roofline design, one exposes […]
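The abstract is truncated before it explains roofline design, so the following is only a sketch of the classic roofline model from computer architecture, not the paper's method. That model bounds a kernel's attainable throughput by min(peak compute, memory bandwidth × arithmetic intensity), where arithmetic intensity is FLOPs per byte moved. The machine numbers in the usage note are hypothetical.

```python
def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Attainable throughput (GFLOP/s) under the classic roofline model.

    intensity is arithmetic intensity in FLOPs per byte of memory traffic.
    """
    if intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # Below the ridge point the kernel is memory-bound; above it, compute-bound.
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs
```

For a hypothetical machine with 1000 GFLOP/s peak and 100 GB/s bandwidth, the ridge point is 10 FLOP/byte. A float32 dot product does 2 FLOPs per 8 bytes loaded (intensity 0.25), so it is capped at 25 GFLOP/s no matter how well it is vectorized. This gap between a kernel's roof and its measured throughput is the kind of limit the paper proposes to make explicit.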
Mar, 12
SGO: An ultrafast engine for atomic structure global optimization by differential evolution
This paper presents a fast method for the global search of atomic structures. The structure global optimization (SGO) engine consists of a high-efficiency differential evolution algorithm, accelerated local relaxation methods, and an ultrafast density functional theory plane-wave code running on GPU machines. It can search for the global minimum configuration of crystals, two-dimensional materials and quantum clusters […]
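The abstract does not describe SGO's particular differential evolution variant. As a rough illustration of the underlying idea, here is a minimal sketch of the textbook DE/rand/1/bin scheme. It minimizes a toy objective; in SGO the objective would instead be a DFT energy evaluated after local relaxation. The function name and default parameters (F, CR, population size) are illustrative assumptions, not values from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    F: float = 0.8,
    CR: float = 0.9,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over a box using classic DE/rand/1/bin (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, none equal to i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Binomial crossover; j_rand guarantees at least one mutated gene.
            j_rand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            ft = f(trial)
            # Greedy selection: keep the trial vector if it is no worse.
            if ft <= fitness[i]:
                pop[i], fitness[i] = trial, ft
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]
```

For example, `differential_evolution(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` drives a 3-D sphere function toward its minimum at the origin. Every trial evaluation in a generation is independent, which is why expensive energy evaluations can be farmed out in parallel.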
Mar, 12
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is to enable efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the […]
Mar, 12
Clinically applicable Monte Carlo-based biological dose optimization for the treatment of head and neck cancers with spot-scanning proton therapy
Purpose: To demonstrate the feasibility of fast Monte Carlo (MC) based inverse biological planning for the treatment of head and neck tumors in spot-scanning proton therapy. Methods: Recently, a fast and accurate Graphics Processing Unit (GPU)-based MC simulation of proton transport was developed and used as the dose calculation engine in a GPU-accelerated IMPT optimizer. […]
Mar, 12
Automatic and Explicit Parallelization Approaches for Mathematical Simulation Models
The move from single-core and single-processor systems to multi-core and many-processor systems comes with the requirement of implementing computations in a way that can utilize these multiple units efficiently. This task of writing efficient multi-threaded algorithms will not be possible without improving programming languages and compilers to provide the mechanisms to do so. Computer-aided mathematical modeling […]
Mar, 10
Accelerating Wright-Fisher Forward Simulations on the Graphics Processing Unit
Forward Wright-Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the CPU, thus limiting their usefulness. The single-locus Wright-Fisher forward algorithm is, however, exceedingly parallelizable, with many steps which are so-called embarrassingly parallel, consisting of a vast number of individual computations that are all […]
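To show why the single-locus forward algorithm parallelizes so well, here is a minimal sketch of the standard Wright-Fisher model for one biallelic locus in a diploid population of size N. Each generation applies genic selection (allele fitness 1+s versus 1) followed by binomial drift over 2N gene copies. The selection model and helper names are assumptions for illustration, not the paper's GPU implementation. Independent loci never interact, so the outer loop is the embarrassingly parallel part a GPU would spread across threads.

```python
import random
from typing import List


def wright_fisher_trajectory(N: int, p0: float, s: float,
                             generations: int, rng: random.Random) -> float:
    """Forward-simulate one locus; return the final allele frequency."""
    two_n = 2 * N
    count = round(p0 * two_n)
    for _ in range(generations):
        p = count / two_n
        if p == 0.0 or p == 1.0:
            break  # allele lost or fixed: absorbing state
        # Deterministic genic selection on the allele frequency.
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift: Binomial(2N, p_sel) drawn as a sum of Bernoulli trials
        # (stdlib only; adequate for small N in a sketch).
        count = sum(rng.random() < p_sel for _ in range(two_n))
    return count / two_n


def simulate_many(num_loci: int, N: int, p0: float, s: float,
                  generations: int, seed: int = 0) -> List[float]:
    """Independent replicate loci: the loop a GPU version would parallelize."""
    rng = random.Random(seed)
    return [wright_fisher_trajectory(N, p0, s, generations, rng)
            for _ in range(num_loci)]
```

On a GPU, each locus (or each gene-copy draw within a generation) would typically be given its own independent random stream rather than sharing the single `rng` used here.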
Mar, 10
Automatic Data Layout Generation and Kernel Mapping for CPU+GPU Architectures
The ubiquity of hybrid CPU+GPU architectures has led to renewed interest in automatic data layout generation owing to the fact that data layouts have a large impact on performance, and that different data layouts yield the best performance on CPUs vs. GPUs. Unfortunately, current programming models still fail to provide an effective solution to the […]
Mar, 10
Pragma Directed Shared Memory Centric Optimizations on GPUs
GPUs have become a ubiquitous choice as coprocessors because of their excellent concurrent-processing capability. In GPU architecture, shared memory plays a very important role in system performance, as it can greatly improve bandwidth utilization and accelerate memory operations. However, even for affine GPU applications that contain regular access patterns, optimizing for shared memory is […]
Mar, 10
Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures
Irregular applications are one class of computing applications that pose a significant performance-scalability challenge on Chip Multiprocessors (CMPs). Such applications perform very little computation and have unpredictable memory access patterns, which makes them memory-bound rather than compute-bound. Since the gap between processor and memory performance persists, the difficulty of hiding and reducing this gap […]
Mar, 10
Testing fine-grained parallelism for the ADMM on a factor-graph
There is an ongoing effort to develop tools that apply distributed computational resources to tackle large problems or reduce the time to solve them. In this context, the Alternating Direction Method of Multipliers (ADMM) arises as a method that can exploit distributed resources like the dual ascent method and has the robustness and improved convergence […]
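The paper applies ADMM on a factor graph, which the truncated abstract does not detail. The simplest related case is global-consensus ADMM in scaled form, sketched below. Each term f_i(x) = 0.5*(x - a_i)**2 keeps its own local copy x_i, and the copies are driven to agree on a shared z. The quadratic objective and the function name are illustrative assumptions chosen so the local step has a closed form. Because the per-term x-updates do not depend on one another, this is the kind of fine-grained parallelism the title refers to.

```python
from typing import List


def consensus_admm(a: List[float], rho: float = 1.0, iters: int = 100) -> float:
    """Scaled-form consensus ADMM for min_x sum_i 0.5*(x - a_i)**2.

    Hypothetical toy problem; its exact minimizer is the mean of a.
    """
    n = len(a)
    x = [0.0] * n
    u = [0.0] * n  # scaled dual variables, one per local copy
    z = 0.0
    for _ in range(iters):
        # Local proximal steps, closed form for a quadratic f_i; independent over i.
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # Consensus step: average the local copies plus their duals.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual update on the agreement constraint x_i = z.
        u = [u[i] + x[i] - z for i in range(n)]
    return z
```

For `consensus_admm([1.0, 2.0, 6.0])`, z contracts toward the mean 3.0 by a factor of rho/(1+rho) per iteration, so 100 iterations reach it to floating-point precision. In a factor-graph formulation, the single z becomes one consensus variable per graph variable, and each factor performs its own local update.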
Mar, 8
D-face: Parallel Implementation of CNN Based Face Classifier using Drone Data On K40 & Jetson TK1
Convolutional Neural Networks (CNNs) have been shown to perform very well in areas such as video surveillance, object classification and face classification. Face classification has become pertinent to numerous applications, especially in this big data era of social platforms and social media. With the usage of unmanned airborne vehicles like drones, the problem of face […]